Routing and Scheduling in Optical Data Center Networks for Emerging Cloud Applications

Published: 2024-02-23

Speaker:  Jialong Li, MPI-INF.

Time:       10:00-11:00 am, Feb. 28

Location: SIST 1A200

Host:      Zhice Yang

Abstract:

Optical data center networks show promise as next-generation cloud infrastructure because of their cost and power benefits. As circuit-switched networks, they must set up dedicated optical circuits between endpoints before those endpoints can exchange data. On the one hand, this property makes them a good fit for bulk data transfers in critical cloud applications such as machine learning and parallel computing. On the other hand, it departs from traditional packet-switched networks and brings unique challenges for routing and task scheduling. In this talk, we present our solutions to routing and task scheduling in optical data center networks. We first introduce Hop-On Hop-Off (HOHO) routing, which leverages programmable switches to accelerate flow transmission by up to 35%, and then give an overview of Network-Aware GPU Sharing (NAGS), which allocates distributed training jobs to GPUs to minimize training time and maximize GPU utilization.

Bio:

Jialong Li is a postdoctoral researcher at the Max Planck Institute for Informatics (MPI-INF). Before joining MPI-INF, he received his B.E. and Ph.D. degrees in Electronic Engineering from Tsinghua University in 2016 and 2021, respectively. His research interests include optical networks, optical data center networks, and network-accelerated machine learning systems. He has published more than ten papers in journals and conferences such as JOCN, ToN, APNet, and OFC, and has served as a reviewer for journals including JOCN, ToN, and Mathematics.