CS Colloquium
Ming-Hsuan Yang
UC Merced and Google DeepMind
Host: Alex Wong
Title: Video Understanding and Generation with Multimodal Foundation Models
Abstract:
Recent advances in vision and language models have greatly improved performance on a wide range of visual understanding and generation tasks. In this talk, I will present our latest research on effective tokenizers for transformers and discuss our efforts to adapt frozen large language models to a range of vision tasks, including visual classification, video-text retrieval, visual captioning, visual question answering, visual grounding, video generation, stylization, outpainting, and video-to-audio conversion. If time permits, I will also share some recent findings in 3D vision.
Bio:
Ming-Hsuan Yang is a Professor at UC Merced and a Research Scientist at Google DeepMind. He received the Google Faculty Award in 2009, the NSF CAREER Award in 2012, and the Nvidia Pioneer Research Award in 2017 and 2018. His paper awards include Best Paper Honorable Mentions at UIST 2017 and CVPR 2018, a Best Student Paper Honorable Mention at ACCV 2018, the Longuet-Higgins Prize (test-of-time award) at CVPR 2023, and the Best Paper Award at ICML 2024. He serves as Associate Editor-in-Chief of PAMI and as an Associate Editor of IJCV; previously, he was Editor-in-Chief of CVIU and program co-chair of ICCV 2019. He is a Fellow of both the IEEE and the ACM.