Recently, a number of papers from SIST’s Intelligent Vision Center (vic.shanghaitech.edu.cn) have been accepted by four top international artificial intelligence conferences. The 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) accepted seven papers, the 2019 Annual Meeting of the Association for Computational Linguistics (ACL) accepted three, the 2019 International Conference on Machine Learning (ICML) accepted three, and the 2019 International Joint Conference on Artificial Intelligence (IJCAI) accepted two. All four conferences are ranked by the China Computer Federation (CCF) as Class A international academic conferences, the highest level in the field of artificial intelligence.

In the field of Intelligent Vision, seven papers were accepted by CVPR 2019, four of them from Assistant Professor Gao Shenghua’s research group. “Density Map Regression Guided Detection Network for RGB-D Crowd Counting and Localization” proposes a regression guided detection network (RDNet) for RGB-D crowd counting that simultaneously estimates head counts and localizes heads with bounding boxes (See Figure 1). “Local to Global Learning: Gradually Adding Classes for Training Deep Neural Networks” proposes a new learning paradigm, Local to Global Learning (LGL, See Figure 2), for Deep Neural Networks (DNNs) to improve performance on classification problems. In the paper, the researchers incorporate the idea of LGL into the learning objective of DNNs and explain why LGL works better from an information-theoretic perspective. “PPGNet: Learning Point-Pair Graph for Line Segment Detection” describes junctions, line segments and the relationships between them with a simple graph (See Figure 3), and proposes a Point-Pair Graph Network (PPGNet) capable of detecting all the junctions in an image and outputting line segment detection results in the form of an adjacency matrix. “Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding” studies how to provide a compact 3D scene representation with an unfixed number of planes from a single RGB image. It uses a convolutional neural network to map each pixel to an embedding space and obtains the plane instances via an efficient mean shift clustering algorithm (See Figure 4). The method achieves state-of-the-art performance on the public dataset and runs at 30 fps at test time.

Figure 1. The RDNet for crowd counting

Figure 2. An illustration of the difference between transfer learning and LGL. A, B and C denote three classes in the training set

Figure 3. The framework of the Point-Pair Graph Network (PPGNet)

Figure 4. Planar reconstruction results. From left to right: input image, plane instance segmentation, depth map, and the planar 3D model
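The clustering step mentioned for the planar reconstruction paper can be illustrated with a minimal sketch. This is plain textbook mean shift with a Gaussian kernel on toy 2D "embeddings", not the efficient variant the paper uses; the function name `mean_shift`, the bandwidth and the toy data are all assumptions made for illustration.

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=30):
    """Textbook Gaussian mean shift (illustrative, not the paper's variant).

    points: (N, D) array of embeddings. Returns (labels, centers).
    """
    modes = points.copy()
    for _ in range(n_iter):
        # Squared distance from every current mode to every data point.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        # Move each mode to the kernel-weighted mean of the data.
        modes = (w @ points) / w.sum(axis=1, keepdims=True)

    # Merge modes that ended up close together into one cluster center.
    labels = np.empty(len(points), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

def toy_embeddings(seed=0):
    """Two well-separated blobs standing in for two plane instances."""
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=[0.0, 0.0], scale=0.2, size=(50, 2))
    b = rng.normal(loc=[5.0, 5.0], scale=0.2, size=(50, 2))
    return np.vstack([a, b])
```

Calling `mean_shift(toy_embeddings(), bandwidth=1.0)` groups the points by the mode they converge to, so the number of clusters does not have to be fixed in advance. That property is what makes this family of methods a natural fit for an unfixed number of planes.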

Two papers by Assistant Professor Laurent Kneip’s research group were accepted by CVPR 2019. “The Alignment of the Spheres: Globally-Optimal Spherical Mixture Alignment for Camera Pose Estimation” focuses on cross-modality registration, where only purely geometric information about the 3D model is available. The problem is cast as a 2D-3D mixture model alignment task, and its solution relies on a globally optimal branch-and-bound search algorithm (See Figure 5). “Motion Estimation of Non-holonomic Ground Vehicles from a Single Feature Correspondence Measured over n Views” establishes an n-linear constraint on the locally circular motion of non-holonomic vehicles that can handle an arbitrarily large and dense window of views (See Figure 6). The constraint is inspired by the planar tri-focal tensor and its ability to handle lines.

Figure 5. The spheres represent Gaussian modes of a 3D point cloud distribution. The goal is to find the pose of the camera such that its projection onto the camera unit sphere aligns with the measured projected point cloud distribution.

Figure 6. Approximation of car motion with the Instantaneous Centre of Rotation (ICR)

Professor Yu Jingyi’s research group proposed a novel ray-space projection model (See Figure 7) to transform sets of rays captured by multiple light field cameras in terms of Plücker coordinates in their paper “Ray-Space Projection Model for Light Field Camera,” also accepted by CVPR. They first derive a ray-space intrinsic matrix based on the multi-projection-center (MPC) model. A homogeneous ray-space projection matrix and a fundamental matrix are then proposed to establish ray-ray correspondences among multiple light fields. Experimental results on both synthetic and real light field data verify the effectiveness and robustness of the proposed model.

Figure 7. Ray-space projection model and ray-ray transformation between two light field cameras
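As background for the Plücker coordinates mentioned above, the sketch below shows the standard way a ray is encoded as a (direction, moment) pair and how that pair changes under a rigid motion. It does not reproduce the paper's ray-space intrinsic or projection matrices; the function names and the example rotation are assumptions chosen for illustration.

```python
import numpy as np

def plucker_from_point_dir(p, d):
    """Plücker coordinates (d, m) of the ray through point p with direction d."""
    d = d / np.linalg.norm(d)
    m = np.cross(p, d)  # moment vector, independent of which point on the ray is used
    return d, m

def transform_plucker(R, t, d, m):
    """Apply the rigid motion x -> R x + t to a ray given in Plücker coordinates."""
    d_new = R @ d
    # (R p + t) x (R d) = R (p x d) + t x (R d) for a proper rotation R
    m_new = R @ m + np.cross(t, d_new)
    return d_new, m_new

def rotation_z(theta):
    """Rotation about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
```

Transforming the coordinates directly gives the same result as transforming a point on the ray and its direction first and then re-encoding. This is the kind of ray-to-ray mapping between cameras that the text refers to.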

In the field of Language, three papers were accepted by ACL 2019, including two from Assistant Professor Tu Kewei’s research group. “Enhancing Unsupervised Generative Dependency Parser with Contextual Information” proposes a novel probabilistic model called the discriminative neural dependency model with valence (D-NDMV) (See Figure 8). “Second-Order Semantic Dependency Parsing with End-To-End Neural Networks” proposes a second-order semantic dependency parser that takes into account the interaction between two dependency edges, with the aim of identifying the semantic relationships between words in a sentence. The authors show that second-order parsing can be approximated using two inference algorithms and can be transformed into an end-to-end neural network for training (See Figure 9). The third paper, “Latent Variable Sentiment Grammar,” written in collaboration with Professor Zhang Yue at Westlake University, presents a range of sentiment grammars that use neural networks to model sentiment composition explicitly.

Figure 8. The neural network structure for computing grammar rule probabilities

Figure 9. Our model architecture

In the field of Machine Learning, three papers were accepted by ICML 2019. Two of these papers are from Assistant Professor Manolis Tsakiris. “Noisy Dual Principal Component Pursuit” builds on Dual Principal Component Pursuit (DPCP), a robust subspace learning method that fits a linear subspace to a dataset corrupted by outliers via non-convex optimization. The paper extends the global optimality and convergence theory of DPCP to the case of noisy data, and shows that DPCP outperforms RANSAC in 3D road plane detection applications. In “Homomorphic Sensing,” given a linear subspace and a finite set of linear transformations, the authors develop an algebraic theory that establishes conditions guaranteeing that points in the subspace are uniquely determined from their homomorphic image under some transformation in the set.
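To make the DPCP idea concrete, here is a minimal sketch for the hyperplane case: it looks for a unit vector b that minimizes the sum of |x·b| over all data points, so that b is orthogonal to the inlier subspace. The solver is a simple projected subgradient iteration with a geometrically decaying step and a spectral initialization; these choices, the parameter values and the synthetic plane data are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def dpcp_normal(X, n_iter=200, step=0.01, decay=0.9):
    """Estimate a unit normal b approximately minimizing ||X^T b||_1 with ||b||_2 = 1.

    X: (D, N) array whose columns are data points (inliers plus outliers).
    Illustrative projected subgradient sketch, not the paper's algorithm.
    """
    X = X / np.linalg.norm(X, axis=0, keepdims=True)  # put points on the unit sphere
    # Spectral initialization: eigenvector of X X^T with the smallest eigenvalue.
    _, vecs = np.linalg.eigh(X @ X.T)
    b = vecs[:, 0]
    for _ in range(n_iter):
        subgrad = X @ np.sign(X.T @ b) / X.shape[1]
        b = b - step * subgrad
        b /= np.linalg.norm(b)  # project back onto the unit sphere
        step *= decay
    return b

def make_plane_data(n_in=200, n_out=50, noise=0.01, seed=0):
    """Hypothetical test data: noisy points near the plane z = 0 plus random outliers."""
    rng = np.random.default_rng(seed)
    normal = np.array([0.0, 0.0, 1.0])
    inliers = rng.standard_normal((3, n_in))
    inliers[2] = noise * rng.standard_normal(n_in)
    outliers = rng.standard_normal((3, n_out))
    return np.hstack([inliers, outliers]), normal
```

On data like this, the returned vector should be close to the true plane normal up to sign. Because the objective sums absolute values rather than squares, a moderate fraction of outliers has limited influence on the minimizer.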

Associate Professor He Xuming’s group proposes an efficient yet flexible non-local relation representation based on a novel class of graph neural networks in their paper “LatentGNN: Learning Efficient Non-local Relations for Visual Recognition” (See Figure 10). Their key idea is to introduce a latent space that reduces the complexity of the graph, which allows them to achieve linear computational complexity. Extensive experimental evaluations on three major visual recognition tasks show that their method outperforms prior work by a large margin while maintaining a low computation cost.

Figure 10. Illustration of the Latent Graph Neural Network
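The complexity argument can be sketched as follows. With N visible nodes and a small number d of latent nodes, routing messages through the latent nodes amounts to using a low-rank N-by-N affinity without ever forming it, so the cost grows linearly in N. The code is a simplified illustration under assumed names (`W_proj`, `A_latent`) and a single softmax-normalized projection; it is not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def latent_propagate(X, W_proj, A_latent):
    """Non-local propagation through d latent nodes, cost O(N * d * C).

    X: (N, C) node features, W_proj: (C, d), A_latent: (d, d).
    """
    psi = softmax(X @ W_proj, axis=0)  # (N, d) visible-to-latent affinities
    Z = psi.T @ X                      # gather: visible -> latent, (d, C)
    Z = A_latent @ Z                   # mix information among latent nodes
    return psi @ Z                     # scatter: latent -> visible, (N, C)

def dense_propagate(X, W_proj, A_latent):
    """Same result via the explicit (N, N) affinity, cost O(N^2 * C)."""
    psi = softmax(X @ W_proj, axis=0)
    A_full = psi @ A_latent @ psi.T
    return A_full @ X
```

Both functions return the same features. The difference is that `latent_propagate` never builds the N-by-N matrix, which matters when N is the number of pixels or regions in an image.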

In the field of general artificial intelligence, two papers were accepted by IJCAI 2019.

Assistant Professor Gao Shenghua’s research group proposes a Margin Learning Embedded Prediction (MLEP) framework (See Figure 11) in their paper “Open-set Supervised Video Anomaly Detection with Margin Learning Embedded Prediction.” Extensive experiments validate the effectiveness of their framework for anomaly detection.

Figure 11. The network architecture of MLEP, which contains an encoder, a ConvLSTM, and a decoder

In “Diffusion and Auction on Graphs,” Assistant Professor Zhao Dengji and his research partner characterize a class of mechanisms (markets) that use people’s social connections to share (sell) information with their social neighbors. Dr. Zhao’s group has pioneered this research direction since their first work, “Mechanism Design in Social Networks,” published at AAAI 2017. These mechanisms harness the power of social connections for information sharing and challenge the costly advertising mechanisms used by search engines and social media by eliminating the costly platforms and ensuring increased revenue for advertisers and sellers.

In addition to the research results above, the paper “Hyperspectral Light Field Stereo Matching” from Professor Yu Jingyi and the paper “Minimal Case Relative Pose Computation Using Ray-Point-Ray Features” from Assistant Professor Laurent Kneip were published in *IEEE Transactions on Pattern Analysis and Machine Intelligence* (TPAMI), which has an impact factor of 9.455.