I am an associate professor in the Department of Computer Science and Engineering at Korea University and the director of the Multimodal Interactive Intelligence Laboratory (MIIL). Previously, I was a research scientist at Google, working with Cordelia Schmid on research problems involving both vision and language. I completed my Ph.D. in 2020 at POSTECH under the supervision of Bohyung Han and Minsu Cho. Before that, I was advised by Gary Geunbae Lee at the same school for my M.S. During my Ph.D., I interned at Disney Research, Google and Facebook, working with Leonid Sigal, Jack Sim, Radu Soricut and Peter Vajda. My research interests lie primarily in computer vision and natural language processing, especially at the intersection of these areas.

Education

Ph.D. Computer Science and Engineering, POSTECH (Sep. 2016 - Feb. 2020)
- Combinatorial Classification: Learning by Combination of Classifiers on Heterogeneous Output Spaces
  Advisors: Bohyung Han and Minsu Cho; Committee: Suha Kwak, Seungyong Lee and Jinwoo Shin
M.S. Computer Science and Engineering, POSTECH (Mar. 2011 - Feb. 2013)
- Multiple User Intent Understanding for Spoken Dialog System
  Advisor: Gary Geunbae Lee; Committee: Jonghyeok Lee and Hwanjo Yu
B.S. Computer Engineering, Changwon National University (Mar. 2006 - Feb. 2011)

Work Experience

Vice Dean for Academic Affairs, Dept. of CSE, Korea University, South Korea (Sep. 2025 - )
Associate Professor, Korea University, South Korea (Sep. 2025 - )
Assistant Professor, Korea University, South Korea (Sep. 2023 - )
Research Scientist, Google, France (Mar. 2020 - Jun. 2023)
Research Intern, Facebook, USA (Sep. 2019 - Jan. 2020)
Research Intern, Google, USA (May 2019 - Aug. 2019)
Research Intern, Google, USA (Jun. 2017 - Dec. 2017)
Research Intern, Disney Research Pittsburgh, USA (Feb. 2017 - May 2017)
Lecturer, Dept. of Information Technology, Mongolia International University (Aug. 2013 - Jan. 2015)

Honors and Awards

1st Place in Ego4D AV Transcription Challenge (2022)
Best Ph.D. Dissertation Award (Engineering), POSTECH (2020)
CVPR 2019 Doctoral Consortium (2019)
Naver Ph.D. Fellowship (2017)
Best Team Project Award, SUNY Korea Hot Topics in Computer Science Workshop (2015)
Academic Scholarship, Kyobo Foundation for Education and Culture (2006 - 2011)
Academic Scholarship, Bakyeop Foundation (2007 - 2008)
Grand Prize, CNU Venture Item Contest (2007)
First Runner Up, CNU Programming Contest (2006)

Academic Services

Regular Program Committee Member (Area Chair) in NeurIPS, CVPR, ICML and ICCV.
Regular Program Committee Member (Reviewer) in CVPR, NeurIPS, ICLR, ICML, ICCV and ACL.
KCCV 2026 Organizing Committee Member (Finance Chair).
KCCV 2025 Organizing Committee Member (Finance Chair).
KCCV 2024 Organizing Committee Member (Finance Chair).
Workshop Organizer, The 1st Workshop on Customized Chat Grounding Persona and Knowledge, COLING 2022
Technical Committee Member, Conceptual Captions Challenge, CVPR 2019

Publications

Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers
Chaehyun Kim, Heeseong Shin, Heeji Yoon, Eunbeen Hong, Anurag Arnab, Paul Hongsuck Seo, +Sunghwan Hong, +Seungryong Kim (+ corresponding authors)
In NeurIPS 2025
ReTAG: Retrieval-Enhanced, Topic-Augmented Graph-Based Global Sensemaking
Boyoung Kim, Dosung Lee, Sumin An, Jinseong Jeong, Paul Hongsuck Seo
In EMNLP 2025 (Findings)
DialNav: Multi-turn Dialog Navigation with a Remote Guide
Leekyeung Han, Hyunji Min, Gyeom Hwangbo, Jonghyun Choi, Paul Hongsuck Seo
In ICCV 2025
Cross-Modal Watermarking for Authentic Audio Recovery and Tamper Localization in Synthesized Audiovisual Forgeries
Minyoung Kim, Sehwan Park, Sungmin Cha, Paul Hongsuck Seo
In Interspeech 2025
DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization
*Geonyoung Lee, *Geonhee Han, Paul Hongsuck Seo (* equal contribution)
In Interspeech 2025
Bridging Audio and Vision: Zero-Shot Audiovisual Segmentation by Connecting Pretrained Models
In Interspeech 2025
ReSCORE: Label-free Iterative Retriever Training for Multi-hop Question Answering with Relevance-Consistency Supervision
Dosung Lee, Wonjun Oh, Boyoung Kim, Minyoung Kim, +Joonsuk Park, +Paul Hongsuck Seo (+ corresponding authors)
In ACL 2025
Random Conditioning for Diffusion Model Compression with Distillation
*Dohyun Kim, *Sehwan Park, Geonhee Han, Seung Wook Kim, Paul Hongsuck Seo (* equal contribution)
In CVPR 2025
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An, Junyoung Sung, Wonpyo Park, +Chanjun Park, +Paul Hongsuck Seo (+ corresponding authors)
In NAACL 2025 (oral)
Multi-Granularity Video Object Segmentation
*Sangbeom Lim, *Seongchan Kim, *Seungjun An, Seokju Cho, +Paul Hongsuck Seo, +Seungryong Kim (+ corresponding authors)
In AAAI 2025
TrackIME: Enhanced Video Point Tracking via Instance Motion Estimation
Seong Hyeon Park, Huiwon Jang, Byungwoo Jeon, Sukmin Yun, Paul Hongsuck Seo, Jinwoo Shin
In NeurIPS 2024 (spotlight)
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Heeseong Shin, Chaehyun Kim, Sunghwan Hong, Seokju Cho, Anurag Arnab, +Paul Hongsuck Seo, +Seungryong Kim (+ corresponding authors)
In NeurIPS 2024
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu, +Paul Hongsuck Seo, +Jeany Son (+ corresponding authors)
In ECCV 2024
CAT-Seg: Cost Aggregation for Open-vocabulary Semantic Segmentation
*Seokju Cho, *Heeseong Shin, Sunghwan Hong, Anurag Arnab, +Paul Hongsuck Seo, +Seungryong Kim (+ corresponding authors)
In CVPR 2024 (highlight)
Learning Correlation Structures for Vision Transformers
Manjin Kim, +Paul Hongsuck Seo, Cordelia Schmid, +Minsu Cho (+ corresponding authors)
In CVPR 2024
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
In CVPR 2023
IFSeg: Image-free Semantic Segmentation via Vision-Language Model
Sukmin Yun, Seong Hyeon Park, Paul Hongsuck Seo, Jinwoo Shin
In CVPR 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid
In CVPR 2023
Zero-shot Referring Image Segmentation with Global-Local Context Features
Seonghoon Yu, Paul Hongsuck Seo, Jeany Son
In CVPR 2023
Learning Audio-Video Modalities from Image Captions
Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid
In ECCV 2022
AVATAR: Unconstrained Audiovisual Speech Recognition
*Valentin Gabeur, *Paul Hongsuck Seo, *Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid (* equal contribution)
In Interspeech 2022 (oral)
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
In CVPR 2022
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
In CVPR 2021
Reinforcing an Image Caption Generator by Human Feedback
Paul Hongsuck Seo, Piyush Sharma, Tomer Levinboim, Bohyung Han, Radu Soricut
In AAAI 2020 (oral)
Combinatorial Inference against Label Noise
Paul Hongsuck Seo, Geeho Kim, Bohyung Han
In NeurIPS 2019
Regularizing Neural Networks via Stochastic Branch Layers
*Wonpyo Park, *Paul Hongsuck Seo, Bohyung Han, Minsu Cho (* equal contribution)
In ACML 2019 (oral)
Learning for Single-Shot Confidence Calibration in Deep Neural Networks through Stochastic Inferences
*Seonguk Seo, *Paul Hongsuck Seo, Bohyung Han (* equal contribution)
In CVPR 2019
CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps
Paul Hongsuck Seo, Tobias Weyand, Jack Sim, Bohyung Han
In ECCV 2018
Attentive Semantic Alignment with Offset-Aware Correlation Kernels
Paul Hongsuck Seo, Jongmin Lee, Deunsol Jung, Bohyung Han, Minsu Cho
In ECCV 2018
Progressive Attention Networks for Visual Attribute Prediction
Paul Hongsuck Seo, Zhe Lin, Scott Cohen, Xiaohui Shen, Bohyung Han
In BMVC 2018
Visual Reference Resolution using Attention Memory for Visual Dialog
Paul Hongsuck Seo, Andreas Lehrmann, Bohyung Han, Leonid Sigal
In NIPS 2017
MarioQA: Answering Questions by Watching Gameplay Videos
*Jonghwan Mun, *Paul Hongsuck Seo, Ilchae Jung, Bohyung Han (* equal contribution)
In ICCV 2017
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction
Hyeonwoo Noh, Paul Hongsuck Seo, Bohyung Han
In CVPR 2016 (oral)
A Corpus for a Multimodal Dialog System for Presentation Controls
Paul Hongsuck Seo, Gary Geunbae Lee
In Proceedings of the International Workshop Series on Multimodal Corpora (MMC 2016)
Conversational Knowledge Teaching Agent that Uses a Knowledge Base
Kyusong Lee, Paul Hongsuck Seo, Junhwi Choi, Sangjun Koo, Gary Geunbae Lee
In SIGDIAL 2015
Grammatical Error Correction based on Learner Comprehension Model in Oral Conversation
Kyusong Lee, Seonghan Ryu, Paul Hongsuck Seo, Seokhwan Kim, Gary Geunbae Lee
In Proceedings of the IEEE Workshop on Spoken Language Technology (SLT 2014)
Generating Grammar Questions using Corpus Data in L2 Learning
Kyusong Lee, Soo-ok Kweon, Hongsuck Seo, Gary Geunbae Lee
In Proceedings of the IEEE Workshop on Spoken Language Technology (SLT 2012)
A Meta-Learning Approach to Grammatical Error Correction
Hongsuck Seo, Jonghoon Lee, Seokhwan Kim, Kyusong Lee, Sechun Kang, Gary Geunbae Lee
In ACL 2012
Grammatical Error Annotation for Korean Learners of Spoken English
Hongsuck Seo, Kyusong Lee, Gary Geunbae Lee, Soo-ok Kweon
In LREC 2012