Fixing mode collapsing with diversification: one size doesn't fit all

Publisher: 闻天明    Release Time: 2019-06-03

Speaker:    Prof. Jianbo Shi

Time:        14:30-15:30, June 4

Location:    SIST 1A-200

Host:       Prof. Jingyi Yu

Abstract:

Many structure prediction tasks can be formulated as a domain transfer problem using GANs.  The crucial part is learning a function that distinguishes good from bad contextual structural relationships using sparse samples drawn from the target domain.  Mode collapse, in which the model cherry-picks what to learn, can cause all the solutions to look similar.  When multiple possible solutions coexist, as in future video prediction, the solution becomes highly unstable.

Our main idea is to recognize that 1) both target and domain images form complex nonlinear topological manifolds with highly uneven sample bias, and 2) modeling the manifolds more precisely, instead of fitting them to a single true vs. false value, leads to more diverse and accurate solutions.

We show two special solutions in the domains of image super slow motion and image segmentation-depth estimation.  We then demonstrate a general solution, called normalized diversification, for modeling one-to-many domain transfer problems that explicitly addresses mode collapse.

Bio:

Jianbo Shi is a Professor of Computer and Information Science at the University of Pennsylvania, where he served as Graduate Group Chair.  He studied Computer Science and Mathematics as an undergraduate at Cornell University, where he received his B.A. degree.  He received his Ph.D. degree in Computer Science from the University of California at Berkeley.  He was a research faculty member at The Robotics Institute at Carnegie Mellon University before joining the faculty of the University of Pennsylvania.  He has made fundamental contributions to computer vision and machine learning in image segmentation, motion tracking, and data clustering.  He was awarded the IEEE Longuet-Higgins Prize for fundamental contributions in computer vision.  According to Google Scholar, his work has been cited over 35,000 times.  His current research focuses on first-person vision, human behavior analysis, and image recognition-segmentation.  His other research interests include image/video retrieval, 3D vision, and vision-based desktop computing.  His long-term interests center on the broader area of machine intelligence: he wishes to develop a visual thinking module that allows computers not only to understand the environment around them, but also to achieve cognitive abilities such as machine memory and learning.

SIST-Seminar 18165