Visit ShanghaiTech University | 中文 | How to find us
HOME > News and Events > Events
Coding Problems in Distributed Data Storage Systems
Date: 2014/3/18             Browse: 773
Speaker: Dr. Chao Tian 

Time:       March 18, 2014 (Tuesday), 3:15 to 4:15pm

Location: Room 220, Building 8#

Abstract:

The rapid advances in big data analytics and cloud-based applications require the deployment of high-performance, highly reliable, and cost-effective distributed data storage systems at a scale never expected before. In this two-part talk, results on two coding problems in such systems will be presented. In the first part, we provide answer to a fundamental question in repair-efficient erasure codes (i.e., regenerating codes) which remains open despite numerous research efforts, and show that the optimal storage-repair-bandwidth tradeoff for functional repair regenerating codes is not the same as that of the exact-repair counterpart. In order to overcome the main difficulty of deriving outer bounds using the conventional manual proof approach, a novel computer-aided proof (CAP) approach is developed based on linear programming. A new construction of exact-repair regenerating codes will also be presented. In the second part of the talk, we consider failure-resilient content delivery from distributed storage devices in the multiple description (MD) framework. Characterizing the rate-distortion (RD) region for MD coding is a well-known open problem in information theory. Knowing this difficulty, we instead aim to find an approximate characterization of the RD region. By making connection to the lossless version of the MD problem, we show that a simple separation-based approach is in fact approximately optimal, and thus it may be sufficient in many practical scenarios. This success also suggests that the general approach of translating the solution to a lossless coding problem into an approximate solution to its lossy coding counterpart can be effective on other coding problems.

Bio:

Dr. Tian received the B.E. degree in Electronic Engineering from Tsinghua University, Beijing, China, in 2000 and the M.S. and Ph. D. degrees in Electrical and Computer Engineering from Cornell University, Ithaca, NY in 2003 and 2005, respectively. He was a postdoctoral researcher at Swiss Federal Institute of Technology at Lausanne (EPFL) from 2005 to 2007, and then joined AT&T Labs-Research, Florham Park, New Jersey. Dr. Tian received the Liu-Memorial Award at Cornell University in 2004, and AT&T Key Contributor Award in 2010 and 2011. He is an associate editor of the IEEE Signal Processing Letters, and an adjunct associate professor at Columbia University. 

SIST-Seminar Series-14003 Chao TianChao TianChao TianChao TianThe rapid advances in big data analytics and cloud-based applications require the deployment of high-performance, highly reliable, and cost-effective distributed data storage systems at a scale never expected before. In this two-part talk, results on two coding problems in such systems will be presented. In the first part, we provide answer to a fundamental question in repair-efficient erasure codes (i.e., regenerating codes) which remains open despite numerous research efforts, and show that the optimal storage-repair-bandwidth tradeoff for functional repair regenerating codes is not the same as that of the exact-repair counterpart. In order to overcome the main difficulty of deriving outer bounds using the conventional manual proof approach, a novel computer-aided proof (CAP) approach is developed based on linear programming. A new construction of exact-repair regenerating codes will also be presented. In the second part of the talk, we consider failure-resilient content delivery from distributed storage devices in the multiple description (MD) framework. Characterizing the rate-distortion (RD) region for MD coding is a well-known open problem in information theory. Knowing this difficulty, we instead aim to find an approximate characterization of the RD region. By making connection to the lossless version of the MD problem, we show that a simple separation-based approach is in fact approximately optimal, and thus it may be sufficient in many practical scenarios. This success also suggests that the general approach of translating the solution to a lossless coding problem into an approximate solution to its lossy coding counterpart can be effective on other coding problems.

SIST-Seminar Series-14003