Learning Neural Networks with Adaptive Regularization

Speaker:   Mr. Han Zhao

Time:       15:00-16:00, Aug. 5

Location:  SIST 1A 200

Host:       Prof. Kewei Tu


Feed-forward neural networks can be understood as a combination of an intermediate representation and a linear hypothesis. While most previous work aims to diversify the representations, we explore the complementary direction by performing an adaptive, data-dependent regularization motivated by the empirical Bayes method. Specifically, we propose to construct a matrix-variate normal prior on the weights whose covariance matrix has a Kronecker product structure. This structure is designed to capture the correlations between neurons through backpropagation. Under this Kronecker factorization, the prior encourages neurons to borrow statistical strength from one another, which leads to an adaptive, data-dependent regularization when training networks on small datasets. To optimize the model, we present an efficient block coordinate descent algorithm with analytical solutions. Empirically, we demonstrate that the proposed method helps networks converge to local optima with smaller stable ranks and spectral norms. These properties suggest better generalization, and we present empirical results to support this expectation. We also verify the effectiveness of the approach on multiclass classification and multitask regression problems with various network structures.
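For readers unfamiliar with the quantities named in the abstract, the following is a minimal NumPy sketch, not the speaker's implementation. It assumes a zero-mean matrix-variate normal prior on a weight matrix W with row and column precision (inverse covariance) matrices, and shows the quadratic penalty that such a prior induces, its equivalent Kronecker-product form, and the stable rank and spectral norm of W. The function names (`matrix_normal_penalty`, `kron_penalty`, `stable_rank`, `spectral_norm`) are hypothetical and chosen only for illustration; the log-determinant terms of the prior and the block coordinate descent updates from the talk are not reproduced here.

```python
import numpy as np


def matrix_normal_penalty(W, row_prec, col_prec):
    """Quadratic term of the negative log-density of a zero-mean
    matrix-variate normal prior: 0.5 * tr(row_prec @ W @ col_prec @ W.T).

    W is (n, p); row_prec is (n, n); col_prec is (p, p). Both precision
    matrices are assumed symmetric positive definite.
    """
    return 0.5 * np.trace(row_prec @ W @ col_prec @ W.T)


def kron_penalty(W, row_prec, col_prec):
    """Same penalty written on vec(W), whose precision matrix is the
    Kronecker product col_prec (x) row_prec (column-major vectorization)."""
    w = W.reshape(-1, order="F")
    return 0.5 * w @ np.kron(col_prec, row_prec) @ w


def spectral_norm(W):
    """Largest singular value of W."""
    return float(np.linalg.svd(W, compute_uv=False)[0])


def stable_rank(W):
    """Squared Frobenius norm divided by squared spectral norm."""
    s = np.linalg.svd(W, compute_uv=False)
    return float(np.sum(s ** 2) / s[0] ** 2)


def random_spd(rng, d):
    """Helper for the demo: a random symmetric positive definite matrix."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 4))
    row_prec, col_prec = random_spd(rng, 3), random_spd(rng, 4)
    print("trace form:    ", matrix_normal_penalty(W, row_prec, col_prec))
    print("Kronecker form:", kron_penalty(W, row_prec, col_prec))
    print("stable rank:   ", stable_rank(W))
    print("spectral norm: ", spectral_norm(W))
```

With identity precision matrices the penalty reduces to ordinary weight decay, 0.5 times the squared Frobenius norm of W; non-identity precisions are what let the penalty depend on correlations between rows and columns of W.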


Han Zhao is a 4th-year PhD student in the Machine Learning Department at Carnegie Mellon University, advised by Prof. Geoffrey J. Gordon. He has broad interests in theoretical and applied machine learning and artificial intelligence. In particular, he works on efficient probabilistic reasoning with Sum-Product Networks (SPNs), adversarial representation learning, and computational social choice. Notably, his work on SPNs solves fundamental problems related to their expressiveness and efficient computation, and it serves as a bridge between probabilistic graphical models and deep neural networks. Han Zhao has research experience at Baidu Research, Microsoft Research, and D. E. Shaw. Before coming to CMU, he obtained his bachelor's degree in computer science from Tsinghua University (honored as a Distinguished Graduate) and his master's degree in mathematics from the University of Waterloo (honored with the Alumni Gold Medal Award).

SIST-Seminar 18191