DeepZS, a joint team comprising two SIST junior students, WANG Ruoyu (majoring in Computer Science and Technology) and ZHANG Huifan (majoring in Electronic Engineering), and four master's students from Zhejiang University, won 3rd place in the System Design Contest (SDC) organized by the Design Automation Conference (DAC) 2019, the flagship conference in the area of IC design and design automation. The two SIST juniors are advised by SIST Professor ZHOU Pingqiang.
The mission of the contest was to design a deep learning algorithm on an embedded platform that could achieve high-speed, high-accuracy, and low-power real-time object detection for drones. Each registered team could choose to implement its design on either a GPU (Nvidia Jetson TX2) or an FPGA (Xilinx Ultra96) platform. The training and test datasets used in the competition were provided by DJI, a world leader in camera drones/quadcopters, and contained 95 different object categories.
The contest attracted a total of 110 teams from all over the world: 52 in the GPU group and 58 in the FPGA group. The champion team in the GPU group comprised several PhD students and a postdoc from the University of Illinois at Urbana-Champaign, together with a researcher from the IBM Watson Research Center. The runner-up team consisted of PhD students and postdocs from Tsinghua University and researchers from the TsingMicro Intelligent Tech company in China.
The difficulty of this competition lies in two aspects. First, the data set contains many small objects that are hard to identify even with the human eye. Second, in addition to detection accuracy, the contest also compares designs in terms of processing speed (at least 20 frames per second, FPS) and power consumption. Therefore, each team has to strike a balance between the model's complexity and its processing time: in general, the more complex the model, the slower the processing speed.
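As a rough illustration (not part of the contest materials), the real-time requirement can be restated as a per-frame latency budget: at a given frame rate, each frame must be fully processed within a fixed number of milliseconds.

```python
def latency_budget_ms(fps: float) -> float:
    """Maximum time (in ms) available to process one frame at the given frame rate."""
    return 1000.0 / fps

# The contest's 20 FPS floor leaves at most 50 ms per frame;
# DeepZS's reported 26.37 FPS corresponds to roughly 37.9 ms per frame.
print(latency_budget_ms(20.0))
print(latency_budget_ms(26.37))
```

This is why model complexity matters so directly: every extra layer of computation eats into a budget of only a few tens of milliseconds on an embedded device.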
The solution adopted by DeepZS is based on convolutional neural networks, combining TinyYOLO with a Feature Pyramid Network. The framework adopts ResNet18 as the network backbone, and two feature maps are used for object detection: one generated by the last layer of the network, which handles the larger objects in the input images, and one generated by an intermediate convolutional layer, which helps to identify the small objects. The main contribution of the two SIST juniors was optimizing the inference stage: the model obtained by training TinyYOLOv3 in Caffe is automatically converted into a TensorRT inference-accelerated model, which achieves a high processing speed of over 47 FPS. The final performance of the DeepZS design is: IoU 0.7232, power 15,119 mW, and 26.37 FPS.
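For readers unfamiliar with the accuracy metric above, here is a minimal sketch (not the team's code) of Intersection-over-Union (IoU), the standard measure of how well a predicted bounding box matches the ground truth. Boxes are represented as (x_min, y_min, x_max, y_max) tuples; the function name and representation are illustrative choices.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Corners of the overlapping rectangle, if the boxes intersect at all.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Union = sum of areas minus the double-counted intersection.
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # identical boxes -> 1.0
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```

A score of 0.7232, averaged over the test set, means the predicted boxes overlap the ground-truth boxes substantially more than they miss them.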
Through this competition, we gained a deeper understanding of object detection algorithms, the design and implementation of neural network models, and the hardware acceleration of deep learning. Our sense of teamwork was strengthened, and we also noticed the gap between world-class students and ourselves. This experience helped us clarify our future direction of study and research and strengthened our determination to work harder.