I am an assistant professor in the Department of Computer Science and Engineering at Korea University and the director of the Multimodal Interactive Intelligence Laboratory (MIIL). Previously, I was a research scientist at Google, working with Cordelia Schmid on research problems involving both vision and language. I completed my Ph.D. in 2020 at POSTECH under the supervision of Bohyung Han and Minsu Cho. Before that, I was advised by Gary Geunbae Lee at the same school for my M.S. During my Ph.D., I interned at Disney Research, Google and Facebook, working with Leonid Sigal, Jack Sim, Radu Soricut and Peter Vajda. My research interests lie primarily in computer vision and natural language processing, especially at the intersection of the two.
Education
Work Experience
- Research Scientist, Google, France (Mar. 2020 - Jun. 2023)
- Research Intern, Facebook, USA (Sep. 2019 - Jan. 2020)
- Research Intern, Google, USA (May 2019 - Aug. 2019)
- Research Intern, Google, USA (Jun. 2017 - Dec. 2017)
- Research Intern, Disney Research Pittsburgh, USA (Feb. 2017 - May 2017)
- Lecturer, Dept. of Information Technology, Mongolia International University (Aug. 2013 - Jan. 2015)
Honors and Awards
- 1st Place in Ego4D AV Transcription Challenge (2022)
- Best Ph.D. Dissertation Award (Engineering), POSTECH (2020)
- CVPR 2019 Doctoral Consortium (2019)
- Naver Ph.D. Fellowship (2017)
- Best Team Project Award, SUNY Korea Hot Topics in Computer Science Workshop (2015)
- Academic Scholarship, Kyobo Foundation for Education and Culture (2006 - 2011)
- Academic Scholarship, Bakyeop Foundation (2007 - 2008)
- Grand Prize, CNU Venture Item Contest (2007)
- First Runner Up, CNU Programming Contest (2006)
Academic Services
- Workshop Organizer, The 1st Workshop on Customized Chat Grounding Persona and Knowledge, COLING 2022
- Technical Committee Member, Conceptual Captions Challenge, CVPR 2019
- Program Committee Member (Area Chair) in NeurIPS 2022, CVPR 2023, ICML 2023 and NeurIPS 2023
- Regular Program Committee Member (Reviewer) in CVPR, NeurIPS, ICLR, ICML, ICCV and ACL
Publications
-
TrackIME: Enhanced Video Point Tracking via Instance Motion Estimation
spotlight
Seong Hyeon Park, Huiwon Jang, Byungwoo Jeon, Sukmin Yun, Paul Hongsuck Seo, Jinwoo Shin
In NeurIPS 2024
-
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Heeseong Shin, Chaehyun Kim, Sunghwan Hong, Seokju Cho, Anurag Arnab, +Paul Hongsuck Seo, +Seungryong Kim (+ corresponding authors)
In NeurIPS 2024
-
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu, +Paul Hongsuck Seo, +Jeany Son (+ corresponding authors)
In ECCV 2024
-
CAT-Seg: Cost Aggregation for Open-vocabulary Semantic Segmentation
highlight
*Seokju Cho, *Heeseong Shin, Sunghwan Hong, Anurag Arnab, +Paul Hongsuck Seo, +Seungryong Kim (* equal contribution, + corresponding authors)
In CVPR 2024
-
Learning Correlation Structures for Vision Transformers
Manjin Kim, +Paul Hongsuck Seo, Cordelia Schmid, +Minsu Cho (+ corresponding authors)
In CVPR 2024
-
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
In CVPR 2023
-
IFSeg: Image-free Semantic Segmentation via Vision-Language Model
Sukmin Yun, Seong Hyeon Park, Paul Hongsuck Seo, Jinwoo Shin
In CVPR 2023
-
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid
In CVPR 2023
-
Zero-shot Referring Image Segmentation with Global-Local Context Features
Seonghoon Yu, Paul Hongsuck Seo, Jeany Son
In CVPR 2023
-
Learning Audio-Video Modalities from Image Captions
Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid
In ECCV 2022
-
AVATAR: Unconstrained Audiovisual Speech Recognition
oral
*Valentin Gabeur, *Paul Hongsuck Seo, *Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid (* equal contribution)
In Interspeech 2022
-
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
In CVPR 2022
-
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
In CVPR 2021
-
Reinforcing an Image Caption Generator by Human Feedback
oral
Paul Hongsuck Seo, Piyush Sharma, Tomer Levinboim, Bohyung Han, Radu Soricut
In AAAI 2020
-
Combinatorial Inference against Label Noise
Paul Hongsuck Seo, Geeho Kim, Bohyung Han
In NeurIPS 2019
-
Regularizing Neural Networks via Stochastic Branch Layers
oral
*Wonpyo Park, *Paul Hongsuck Seo, Bohyung Han, Minsu Cho (* equal contribution)
In ACML 2019
-
Learning for Single-Shot Confidence Calibration in Deep Neural Networks through Stochastic Inferences
*Seonguk Seo, *Paul Hongsuck Seo, Bohyung Han (* equal contribution)
In CVPR 2019
-
CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps
Paul Hongsuck Seo, Tobias Weyand, Jack Sim, Bohyung Han
In ECCV 2018
-
Attentive Semantic Alignment with Offset-Aware Correlation Kernels
Paul Hongsuck Seo, Jongmin Lee, Deunsol Jung, Bohyung Han, Minsu Cho
In ECCV 2018
-
Progressive Attention Networks for Visual Attribute Prediction
Paul Hongsuck Seo, Zhe Lin, Scott Cohen, Xiaohui Shen, Bohyung Han
In BMVC 2018
-
Visual Reference Resolution using Attention Memory for Visual Dialog
Paul Hongsuck Seo, Andreas Lehrmann, Bohyung Han, Leonid Sigal
In NIPS 2017
-
MarioQA: Answering Questions by Watching Gameplay Videos
*Jonghwan Mun, *Paul Hongsuck Seo, Ilchae Jung, Bohyung Han (* equal contribution)
In ICCV 2017
-
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction
oral
Hyeonwoo Noh, Paul Hongsuck Seo, Bohyung Han
In CVPR 2016
-
A Corpus for a Multimodal Dialog System for Presentation Controls
Paul Hongsuck Seo, Gary Geunbae Lee
In Proceedings of the International Workshop Series on Multimodal Corpora (MMC 2016)
-
Conversational Knowledge Teaching Agent that Uses a Knowledge Base
Kyusong Lee, Paul Hongsuck Seo, Junhwi Choi, Sangjun Koo, Gary Geunbae Lee
In SIGDIAL 2015
-
Grammatical Error Correction based on Learner Comprehension Model in Oral Conversation
Kyusong Lee, Seonghan Ryu, Paul Hongsuck Seo, Seokhwan Kim, Gary Geunbae Lee
In Proceedings of the IEEE Workshop on Spoken Language Technology (SLT 2014)
-
Generating Grammar Questions using Corpus Data in L2 Learning
Kyusong Lee, Soo-ok Kweon, Hongsuck Seo, Gary Geunbae Lee
In Proceedings of the IEEE Workshop on Spoken Language Technology (SLT 2012)
-
A Meta-Learning Approach to Grammatical Error Correction
Hongsuck Seo, Jonghoon Lee, Seokhwan Kim, Kyusong Lee, Sechun Kang, Gary Geunbae Lee
In ACL 2012
-
Grammatical Error Annotation for Korean Learners of Spoken English
Hongsuck Seo, Kyusong Lee, Gary Geunbae Lee, Soo-ok Kweon
In LREC 2012