Introduction

Cloud data centers have become one of the most important IT infrastructures. Building high-performance data centers at low cost requires the collective effort of the entire global community. As an attempt to initiate a platform that brings together the most important and forward-looking work in the area for intriguing and productive discussions, the Sixth Workshop on Hot Topics on Data Centers (HotDC 2021) will be held in Beijing, China on December 2nd, 2021.

HotDC 2021 consists of by-invitation-only presentations from top academic and industrial groups around the world. The topics cover a wide range of data-center issues, including state-of-the-art technologies for server architecture, storage systems, data-center networks, resource management, and more. In addition, HotDC 2021 includes a student poster session presenting recent research work from the data-center teams at the Institute of Computing Technology, Chinese Academy of Sciences. The HotDC workshop aims to provide a forum for cutting-edge data-center research, where researchers and engineers can exchange ideas and engage in discussions with colleagues around the world. Welcome to HotDC 2021!

Please join the workshop via VooV Meeting
Time: 9:00am - 5:00pm (GMT+8), Dec. 02, 2021 (Thursday)
Meeting ID: 426 627 315
Password: 211202
Online streaming: https://live.bilibili.com/22391464



Workshop Schedule

Date: Dec. 02, 2021 (Thursday)
Click here to join

09:00 - 09:10  Opening remarks
Yungang Bao, Institute of Computing Technology, Chinese Academy of Sciences
Keynote Session I, Chair: Yungang Bao
09:10 - 09:50  Persistent Memory System: A Computable View
Yu Hua, Huazhong University of Science and Technology
09:50 - 10:30  Exploring Data Deduplication Techniques for Storage Systems
Wen Xia, Harbin Institute of Technology (Shenzhen)
10:30 - 10:40  Break
Keynote Session II, Chair: Mi Zhang
10:40 - 11:20  Cloud Systems to the Next Level: From High Availability to Observability
Ryan Huang, Johns Hopkins University
11:20 - 12:00  Rethinking the Designs of Scalable Storage Class Memory
Jie Zhang, Peking University
12:00 - 13:30  Lunch
Keynote Session III, Chair: Ke Liu
13:30 - 14:10  Research Innovations and Outlook in Data Center Networks (DCN)
Hui Yuan, Huawei
14:10 - 14:50  An Aggregation Transport Protocol for Multi-tenant Machine Learning Training
Wenfei Wu, Peking University
14:50 - 15:00  Break
Keynote Session IV, Chair: Sa Wang
15:00 - 15:40  Towards High-throughput Computing in Serverless
Laiping Zhao, Tianjin University
15:40 - 16:20  Selective Replication in Memory-Side GPU Caches
Xia Zhao, Academy of Military Science
16:20 - 17:00  Boosting Data Centers Performance with the Entangling Instruction Prefetcher
Alberto Ros, University of Murcia
Student Poster Session, Chair: Dejun Jiang
17:00 - 19:00, Lecture hall on the fourth floor, ICT
  • SEER: A Time Prediction Model for CNNs from GPU Kernel's View, Guodong Liu

  • DNCSim: Learning micro-architecture behaviors with Differentiable Neural Computer,
    Yaoyang Zhou

  • Oops! It's Too Late. Your Autonomous Driving System Needs a Faster Middleware, Tianze Wu

  • Tighter Bounds of Speedup Factor of Partitioned EDF for Constrained-Deadline Sporadic Tasks, Zhenyu Sun

  • Precise Memory Controller Evaluation: Bridging the Gap Between Core and Memory on FPGA, Zuojun Li

  • Designing Remote Prefetching Runtime with Transparent Memory Tracking, Haifeng Li

  • Asynchronous Memory Access Extension for General Purpose Processors with Message Interface based Memory Systems, Luming Wang

  • Towards Holistic and Systematic Benchmarking for AI Inference Accelerators, Zihan Jiang

  • LightKV: A Cross Media Key Value Store with Persistent Memory to Cut Long Tail Latency, Shukai Han

  • Dalea: A Persistent Multi-level Extendible Hashing with Reduced Tail Latency, Ziwei Xiong

  • Qingyun: A High-Concurrency Protocol Stack, Haokun Wang

Keynote Speakers

Topic: Persistent Memory System: A Computable View
Speaker: Yu Hua, Huazhong University of Science and Technology

Bio: Dr. Yu Hua is a professor at the School of Computer Science and Technology, Huazhong University of Science and Technology. He was a postdoctoral research associate at McGill University in 2009 and a postdoctoral research fellow at the University of Nebraska-Lincoln in 2010-2011. He obtained his B.E. and Ph.D. degrees in 2001 and 2005, respectively. His research interests include cloud storage systems, file systems, and non-volatile memory architectures. His papers have been published in major conferences, including OSDI, FAST, MICRO, USENIX ATC, SC, and HPCA. He has served as PC (vice) chair of ACM APSys 2019 and ICDCS 2021, and as a PC member for OSDI, FAST, ASPLOS, USENIX ATC, EuroSys, and SC. He is a distinguished member of CCF and a senior member of ACM and IEEE, and has been selected as a Distinguished Speaker of both ACM and CCF.

Abstract: Persistent memory (PM) provides large capacity, near-zero standby power, and high performance for real-world applications. PM-enabled systems are becoming increasingly important for bridging the gap between applications and devices. In this talk, I will present our recent work that exploits the near-data property of computable PM to deliver high performance, handle crash consistency efficiently, and support concurrent operations.

Topic: Exploring Data Deduplication Techniques for Storage Systems
Speaker: Wen Xia, Harbin Institute of Technology (Shenzhen)

Bio: Wen Xia is an associate professor and Ph.D. supervisor at the School of Computer Science, Harbin Institute of Technology (Shenzhen). His research focuses on data storage systems and deduplication/compression. He has published more than 60 papers in conferences and journals such as FAST, USENIX ATC, IEEE TC, and Proceedings of the IEEE, and holds 25 granted domestic and international patents. His work has received honors including the First Prize of the Natural Science Award of the Ministry of Education, the First Prize for Scientific and Technological Progress of Hubei Province, and the Excellent Doctoral Dissertation Award of the Chinese Institute of Electronics. His research results have been adopted by well-known open-source projects such as Ceph and rdedup.

Abstract: Global data volume is growing explosively every year, which greatly increases the cost and burden of data-center storage systems. Data deduplication is a data-reduction technique well suited to large-scale storage systems. This talk introduces our group's recent work on deduplication chunking and post-deduplication storage management, along with some thoughts on making future data-reduction techniques more intelligent, diversified, and maintainable.
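The chunking-plus-fingerprinting pipeline at the heart of deduplication can be sketched in a few lines. The toy Python example below is an illustration only, not the speaker's actual system: the gear table, mask, and size limits are arbitrary illustrative choices. It splits a byte stream at content-defined boundaries and stores each unique chunk once.

```python
import hashlib
import random

random.seed(42)
GEAR = [random.getrandbits(32) for _ in range(256)]  # random value per byte
MASK = 0x1FFF  # boundary test bits -> roughly 8 KiB average chunks

def cdc_chunks(data, min_size=2048, max_size=65536):
    """Split bytes into variable-size chunks at content-defined boundaries."""
    chunks, start, fp = [], 0, 0
    for i in range(len(data)):
        fp = ((fp << 1) + GEAR[data[i]]) & 0xFFFFFFFF  # gear rolling hash
        size = i - start + 1
        if (size >= min_size and fp & MASK == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, fp = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # final partial chunk
    return chunks

def dedup(chunks):
    """Store each unique chunk once, keyed by its SHA-256 fingerprint."""
    store, recipe = {}, []
    for c in chunks:
        h = hashlib.sha256(c).hexdigest()
        store.setdefault(h, c)   # duplicate fingerprints are not stored again
        recipe.append(h)         # the recipe reconstructs the original stream
    return store, recipe
```

Because boundaries depend on content rather than fixed offsets, inserting a few bytes early in the stream only perturbs nearby chunk boundaries, so most duplicates are still detected.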

Topic: Cloud Systems to the Next Level: From High Availability to Observability
Speaker: Ryan Huang, Johns Hopkins University

Bio: Dr. Ryan (Peng) Huang is an Assistant Professor in the Department of Computer Science at Johns Hopkins University. He leads the Ordered Systems Lab at JHU, which conducts research broadly in distributed systems, operating systems, cloud computing, and mobile systems. His work has received multiple best-paper awards at top systems conferences. He is a recipient of the NSF CAREER Award.

Abstract: Classic techniques such as state-machine replication have made it feasible to construct fault-tolerant distributed systems at extremely large scales. However, they usually make simple assumptions about the failure model, which do not reflect the complex issues such as gray faults that cloud systems today frequently experience. These complex faults present significant challenges in building highly-available cloud systems. In this talk, I will discuss this problem, and make a case for observability as a critical system design metric. I will describe our recent work to enhance observability to effectively detect, localize, predict, and mitigate complex faults in large systems. I will conclude by outlining some open challenges.

Topic: Rethinking the Designs of Scalable Storage Class Memory
Speaker: Jie Zhang, Peking University

Bio: Dr. Jie Zhang is a tenure-track assistant professor at Peking University, China. Before that, he was a postdoctoral researcher at KAIST, South Korea. His research interests include storage systems, emerging non-volatile memory, and heterogeneous computing. He has published over 40 papers, including 10 CCF-A conference papers as first author, and his research has been listed in "KAIST Breakthroughs" for KAIST's 50th Innoversary. For more details, please visit his personal website: https://jiezhang-camel.github.io/.

Abstract: The computing power of supercomputers is increasing exponentially as they employ more computing nodes. However, the scalability of memory capacity has fallen behind this trend. Memory and storage systems have recently experienced significant technology shifts, which have motivated researchers to rethink and redesign existing system organizations and hardware architectures. This talk shares our experience building scalable storage class memory for existing computing systems. Our solutions address the challenges of heavy software-stack intervention and eliminate the overheads incurred by physical boundaries.

Topic: Research Innovations and Outlook in Data Center Networks (DCN)
Speaker: Hui Yuan, Huawei

Bio: Hui Yuan works at the Boole Lab of Huawei's Data Communication Product Line, researching cutting-edge data-center technologies. He received his Ph.D. from the UCL Optical Networks Group in February 2020, where his research focused on optical DCN and disaggregated DCN, with results published repeatedly in top conferences and journals in the optics field.

Abstract: With the emergence of artificial intelligence and other new technologies, the amount of data that data centers must process in real time has surged, demanding more computing power and faster storage media. Moreover, the network, as the technology that interconnects nodes, is gradually becoming the key bottleneck for improving computing power and resource-usage efficiency. This talk introduces some of our recent research on data-center networks and our outlook on future research directions.

Topic: An Aggregation Transport Protocol for Multi-tenant Machine Learning Training
Speaker: Wenfei Wu, Peking University

Bio: Wenfei Wu received his Ph.D. from the University of Wisconsin-Madison in 2015 and is now an assistant professor at the School of Information Science, Peking University. His research focuses on networked systems, and he has published 36 papers in top international conferences and journals. His Ph.D. thesis on virtual network fault diagnosis won the Best Student Paper Award at SoCC'13; his transport-layer design for 5G networks received a Best Paper nomination at IPCCC'19; and the high-performance machine-learning infrastructure he developed won the Best Paper Award at NSDI'21, the first time this honor has gone to a team from China.

Abstract: As machine-learning datasets and models grow, training is increasingly distributed across multiple servers; a typical architecture has multiple workers exchange gradients with, and update the model on, a parameter server (PS). Under this architecture, however, the PS easily becomes a communication bottleneck. We designed the Aggregation Transport Protocol (ATP) to resolve this bottleneck while supporting multi-tenant, multi-rack deployments in data centers. ATP leverages recent programmable-switch technology to offload gradient aggregation onto switches, reducing both the network traffic and the computation load on the PS. The protocol comprises an in-network aggregation service on switches, reliable transport at end hosts, and NIC acceleration for high throughput. We integrated ATP with PyTorch and evaluated it on common models such as AlexNet and VGG, demonstrating that ATP effectively accelerates training.
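The core idea, moving gradient aggregation from the parameter server into the network, can be illustrated with a toy Python model. This is a conceptual sketch only, not ATP's actual design: the real data plane runs on programmable switch ASICs with fixed-size integer aggregators, loss recovery, and multi-tenant slot sharing, none of which is modeled here.

```python
class AggregatingSwitch:
    """Toy in-network aggregator: sums per-sequence gradient fragments from
    all workers and emits a single aggregated packet toward the PS."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.slots = {}  # seq -> (partial_sum, set of worker ids seen)

    def receive(self, worker_id, seq, grad):
        partial, seen = self.slots.get(seq, ([0] * len(grad), set()))
        if worker_id in seen:
            return None  # retransmission of an already-aggregated fragment
        partial = [a + g for a, g in zip(partial, grad)]
        seen.add(worker_id)
        if len(seen) == self.num_workers:
            del self.slots[seq]   # free the slot for reuse
            return partial        # one aggregated packet reaches the PS
        self.slots[seq] = (partial, seen)
        return None               # aggregation still in progress
```

With N workers, the PS receives one aggregated packet per sequence number instead of N individual ones, which is where the traffic and computation savings come from.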

Topic: Towards High-throughput Computing in Serverless
Speaker: Laiping Zhao, Tianjin University

Bio: Dr. Laiping Zhao is an associate professor at the College of Intelligence and Computing, Tianjin University. He received his B.S. and M.S. degrees from Dalian University of Technology, China, in 2007 and 2009, and his Ph.D. from the Department of Informatics, Kyushu University, Japan, in 2012. His research interests include cloud computing and operating systems, and he has over 40 publications in this field (e.g., SC, EuroSys, HPDC, ICDCS, ICPP, TPDS, TCC, TSC, and JSA). His research is supported by funding from the National Key Research and Development Program, NSFC, the Tianjin Municipal S&T Bureau, Huawei, Meituan, etc.

Abstract: Serverless computing has grown rapidly in recent years thanks to its low cost and management-free operation. Many applications are now deployed on commercial serverless platforms. We characterize serverless computing and find that it tends to severely reduce resource efficiency. We explore improving resource efficiency in serverless computing through fine-grained resource allocation and proactive scheduling.

Topic: Selective Replication in Memory-Side GPU Caches
Speaker: Xia Zhao, Academy of Military Science

Bio: Xia Zhao received his Ph.D. in computer science and engineering from Ghent University in 2019. He is currently an Assistant Researcher at the Academy of Military Science, China. His research interests include GPU architecture in general, and multi-program execution, cache-hierarchy optimization, and Network-on-Chip (NoC) design in particular. He has served on the External Review Committees of the leading computer architecture conferences ISCA and MICRO.

Abstract: Data-intensive applications put immense strain on the memory systems of Graphics Processing Units (GPUs). To cater to this need, GPU memory systems distribute requests across independent units to provide high bandwidth by servicing requests (mostly) in parallel. We find that this strategy breaks down for shared data structures because the shared Last-Level Cache (LLC) organization used by contemporary GPUs stores shared data in a single LLC slice. Shared data requests are hence serialized — resulting in data-intensive applications not being provided with the bandwidth they require. A private LLC organization can provide high bandwidth, but it is often undesirable since it significantly reduces the effective LLC capacity.
In this talk, I will discuss our recent work, the Selective Replication (SelRep) LLC, which selectively replicates shared read-only data across LLC slices to improve bandwidth supply while ensuring that the LLC retains sufficient capacity to keep shared data cached. The compile-time component of SelRep LLC uses dataflow analysis to identify read-only shared data structures and uses a special-purpose load instruction for these accesses. The runtime component of SelRep LLC then monitors the caching behavior of these loads. Leveraging an analytical model, SelRep LLC chooses a replication degree that carefully balances the effective LLC bandwidth benefits of replication against its capacity cost. SelRep LLC consistently provides high performance to replication-sensitive applications across different data set sizes. More specifically, SelRep LLC improves performance by 19.7% and 11.1% on average (and up to 61.6% and 31.0%) compared to the shared LLC baseline and the state-of-the-art Adaptive LLC, respectively.

Topic: Boosting Data Centers Performance with the Entangling Instruction Prefetcher
Speaker: Alberto Ros, University of Murcia

Bio: Alberto Ros is a full professor in the Computer Engineering Department at the University of Murcia, Spain. Funded by the Spanish government to conduct his Ph.D. studies, he received the Ph.D. in computer science from the University of Murcia in 2009. He held postdoctoral positions at the Universitat Politècnica de València and Uppsala University. He received a European Research Council Consolidator Grant in 2018 to improve the performance of multicore architectures. Working on cache coherence, memory-hierarchy design, memory consistency, and processor microarchitecture, he has co-authored more than 80 peer-reviewed articles. He has been inducted into the ISCA Hall of Fame and is an IEEE Senior Member.

Abstract: As software-as-a-service and cloud computing become increasingly popular, server and cloud applications exhibit notoriously large instruction footprints that do not fit in the first-level instruction cache (L1I), leading to high L1I miss rates and therefore stalls. This causes significant performance degradation, in addition to wasteful energy expenditure and under-utilization of resources. Instruction prefetching thus emerges as a fundamental technique for designing high-performance data-center computers.
This talk introduces the concept of Entangling Prefetchers, which focus on timeliness. We present an Entangling Prefetcher for Instructions, which works by finding which instruction should trigger the prefetch for a subsequent instruction, accounting for the latency of each cache miss. Our evaluation using data-center applications shows that with 40KB of storage, Entangling can increase performance by up to 23%, outperforming state-of-the-art prefetchers.
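The "entangling" idea, pairing each miss with an earlier instruction whose distance leaves just enough time to hide the miss latency, can be sketched in Python. This is a conceptual toy, not the published microarchitecture: latency is measured here in accesses rather than cycles, and real entangled entries live in compact hardware tables rather than dictionaries.

```python
from collections import deque

class EntanglingPrefetcher:
    """Toy model: on a miss, entangle the missing address with the access
    `latency` steps earlier, so that next time the earlier (source) access
    triggers the prefetch soon enough to hide the miss latency."""

    def __init__(self, latency=8, history_len=64):
        self.latency = latency
        self.history = deque(maxlen=history_len)  # recent access addresses
        self.entangled = {}  # source address -> set of destinations to prefetch

    def access(self, addr, miss):
        prefetches = sorted(self.entangled.get(addr, ()))  # issue these now
        if miss and len(self.history) >= self.latency:
            src = self.history[-self.latency]  # ran `latency` accesses earlier
            self.entangled.setdefault(src, set()).add(addr)
        self.history.append(addr)
        return prefetches
```

After a cold pass over a repeating access trace, re-executing a source access returns the addresses it was entangled with, i.e., the prefetches issued early enough to be timely.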

Organizing Committee

General Chair

Yungang Bao, Institute of Computing Technology, Chinese Academy of Sciences

TPC Chair

Mi Zhang, Institute of Computing Technology, Chinese Academy of Sciences

Organization Committee

Ke Liu, Institute of Computing Technology, Chinese Academy of Sciences
Sa Wang, Institute of Computing Technology, Chinese Academy of Sciences
Dejun Jiang, Institute of Computing Technology, Chinese Academy of Sciences
Wanling Gao, Institute of Computing Technology, Chinese Academy of Sciences
Ke Zhang, Institute of Computing Technology, Chinese Academy of Sciences
Biwei Xie, Institute of Computing Technology, Chinese Academy of Sciences
Wenya Hu, Institute of Computing Technology, Chinese Academy of Sciences
Zhiwei Lai, Institute of Computing Technology, Chinese Academy of Sciences
Zirui Wang, Institute of Computing Technology, Chinese Academy of Sciences
Zhimeng Li, Institute of Computing Technology, Chinese Academy of Sciences

Contacts

Mi Zhang (zhangmi@ict.ac.cn)
Zirui Wang (wangzirui18@mails.ucas.ac.cn)