Speaker: Kai Han, The University of Hong Kong.
Time: 2:00 pm, Jun. 5th
Location: SIST1A 200
Host: Xuming He
Abstract:
In this talk, I will introduce our recent work covering open-world visual categorization, reconstruction, and generation. First, I will discuss our recent studies on open-world learning, including generalized category discovery on images and open-vocabulary action recognition on videos, both leveraging foundation models. Next, I will present our recent work on generalizable visual SLAM, focusing on the development of a feed-forward SLAM system that eliminates the need for per-scene optimization; we propose an image-based depth fusion framework to achieve this goal. Finally, I will discuss our recent work on 3D human modeling, encompassing both reconstruction and generation.
Bio:
Kai Han is an Assistant Professor in the Department of Statistics and Actuarial Science at The University of Hong Kong, where he directs the Visual AI Lab. His research interests lie in computer vision, machine learning, and artificial intelligence. His current research focuses on open-world learning, 3D vision, generative AI, foundation models, and related areas. Previously, he was a Researcher at Google Research, an Assistant Professor in the Department of Computer Science at the University of Bristol, and a Postdoctoral Researcher in the Visual Geometry Group (VGG) at the University of Oxford. He received his Ph.D. degree from the Department of Computer Science at The University of Hong Kong. During his Ph.D., he also worked in the WILLOW team at Inria Paris and École Normale Supérieure (ENS) in Paris. He serves as an Area Chair for CVPR 2024 and ECCV 2024.