Projects

1) Open-source Planet-scale Computers (PSC) for Emerging and Future Applications

Participators: Frontier System Laboratory, Software System Laboratory, Architecture Laboratory, Data System Laboratory

Status: Ongoing

This project studies the requirements and challenges of emerging and future applications such as medical emergency response, industrial digital twins, intelligent defense, future driving, and the metaverse. Based on the concept of IoTs-Edges-Cloud-as-a-Computer, it investigates intelligent collaboration models and resource scheduling mechanisms, builds a scalable scenario simulator to support the verification and evaluation of new network, storage, and hardware technologies, and develops an open-source planet-scale computer system that integrates IoT, edge, and cloud and offers service capabilities on the scale of tens of billions.



2) Xiangshan: An Open-Source High-Performance RISC-V Processor

Participators: Frontier System Laboratory, Architecture Laboratory

Status: Ongoing.

Xiangshan outperforms every other open-source RISC-V processor worldwide, and its source code has earned more than 2,600 stars on GitHub. The project has drawn wide attention and is supported by domestic and foreign companies, 16 of which are collaborating on development based on Xiangshan. This research will accelerate the growth of the RISC-V ecosystem.



3) One Student One Chip Initiative

Participators: Frontier System Laboratory, Architecture Laboratory

Status: Ongoing.

The project guides students through designing an open-source processor and taking it to tape-out, combining EE with CS. It helps students strengthen their ability to implement software and hardware systems and learn how to design chips. The project also trains talent for teams and communities such as the high-performance processor "Xiangshan", open-source EDA, and open-source IP, building a continuing pipeline of skilled people for advanced computer science research and development in China.



4) Open source chip design method and whole-picture workload analysis tool based on observation-reference-fusion-feedback loop

Participators: Frontier System Laboratory, Software System Laboratory, Architecture Laboratory

Status: Ongoing

The design method based on the observation-reference-fusion-feedback loop systematically observes the fine-grained design space across the IR, ISA, and microarchitecture layers. It provides design feedback based on multi-layer fusion and on the differences from the standard reference implementation, and so realizes open source chip design through the loop of Observation, Reference, Fusion, and Feedback.



5) Platform for intelligent across-layer co-design

Participators: Frontier System Laboratory, Software System Laboratory

Status: Ongoing

This project characterizes typical IoT and AIoT workloads, builds large-scale performance datasets covering different levels of the design space, develops intelligent collaborative search algorithms, and realizes hardware/software co-design through automated, intelligent methods.



6) The Information Superbahn Measurement and Control System

Participators: Frontier System Laboratory, Software System Laboratory

Status: Ongoing.

The Information Superbahn is a new information infrastructure proposed by ICT. The measurement and control system is the core of the Information Superbahn: it provides low-latency, high-throughput services through control mechanisms such as labelling and regulation.



7) Data center node operating system: RainForest

Participators: Software System Laboratory

Status: Finished (2012-2017)

This project proposes that an operating system be designed and optimized for average performance and worst-case performance at the same time, introduces the subOS abstraction (heavy OS + light OS), and implements the node operating system RainForest. Operating system prototypes were transferred to Huawei, and 34 patent applications were filed, including 8 US patents and 3 PCT patents.



8) Unified abstraction method and system implementation for modern computer workloads (big data, artificial intelligence, Internet services, etc.)

Participators: Software System Laboratory

Status: Ongoing

To address the fragmentation and portability problems of modern computer workloads, this project studies their computing abstractions and management abstractions. It aims to build a unified abstraction theory in which complex workloads can be constructed by combining abstractions, and, on top of that theory, to build efficient, portable, and scalable machine learning compilers and big data systems. On the architecture side, it investigates extended instruction sets and microarchitectures.



9) Basic theories, methods, and tools for benchmarking science and engineering

Participators: Software System Laboratory

Status: Ongoing

Unlike classical quantities such as time and length, which have essential properties, the objects observed in computer science have only external properties, which are determined by the problem definition and the specific solutions. There is therefore a need for a benchmarking science and engineering that is independent of metrology. This project studies basic algorithms and intelligence sources, and develops benchmarks for AI, big data, the metaverse, and CPUs using benchmarking science and engineering methods.



10) An Open-Source EDA Project

Participators: Frontier System Laboratory

Status: Ongoing.

This project builds an open-source EDA software tool for automated chip design, which can reduce the cost of designing chips. The work proceeds in three steps. First, we abstract and formulate the key technical problems. Second, we design a software architecture around those problems, paying attention to the quality of code and documentation, and implement a prototype. Third, we collect more than 100 SoC chips, generate datasets that cover a variety of scenarios, and support tape-out, which helps improve the tool's performance and availability. The tool can also be combined with critical open-source IP to help solve the open-source chip problem.



11) Memory access pattern optimization engine on storage system

Participators: Architecture Laboratory

Status: Ongoing

This project designs lightweight hardware that optimizes memory access patterns on mobile memory systems. It converts random memory accesses into highly parallel ones (memory access sprinting), which improves performance and reduces energy consumption.



12) Message Interfaced Memory System

Participators: Architecture Laboratory

Status: Ongoing

Message Interfaced Memory System (MIMS) is a memory access architecture based on asynchronous request and response messages, proposed around 2012. MIMS departs from the traditional synchronous memory system that has been used for decades. By adopting a definable message-based memory interface, it introduces processing logic into the memory controller and the buffer scheduler. MIMS can flexibly support innovative technologies in various memory systems, improving memory channel utilization and the scalability of memory capacity, and letting processors adapt to new memory devices and memory acceleration components. Implementing MIMS efficiently raises design challenges in asynchronous memory access instructions for the processor, the message-based memory controller, the MIMS protocol, multi-functional memory servers, and other components.
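The idea of asynchronous, tagged request and response messages can be illustrated with a small software model. This is only a sketch under stated assumptions, not the MIMS protocol: the names `Request`, `Response`, `MessageMemory`, `send`, and `drain` are hypothetical, and serving the queue in address order is an arbitrary stand-in for whatever policy a real buffer scheduler would apply.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    req_id: int          # tag used to match a response to its request
    op: str              # "read" or "write"
    addr: int
    data: int = 0


@dataclass
class Response:
    req_id: int
    data: Optional[int]  # None for write acknowledgements


class MessageMemory:
    """Toy memory server (hypothetical) that answers tagged messages."""

    def __init__(self):
        self.cells = {}
        self.inbox = deque()

    def send(self, req):
        # The requester does not wait for completion; it only enqueues.
        self.inbox.append(req)

    def drain(self):
        # Assumed policy: serve in address order. sorted() is stable, so
        # requests to the same address keep their issue order.
        ordered = sorted(self.inbox, key=lambda r: r.addr)
        self.inbox.clear()
        responses = []
        for req in ordered:
            if req.op == "write":
                self.cells[req.addr] = req.data
                responses.append(Response(req.req_id, None))
            else:
                responses.append(Response(req.req_id, self.cells.get(req.addr, 0)))
        return responses


if __name__ == "__main__":
    mem = MessageMemory()
    mem.send(Request(0, "write", 0x200, 7))
    mem.send(Request(1, "write", 0x100, 9))
    mem.send(Request(2, "read", 0x200))
    mem.send(Request(3, "read", 0x100))
    for resp in mem.drain():
        print(resp)
```

Because each response carries its `req_id`, the requester can accept responses in a different order from the one in which it issued requests, which is the property a synchronous interface does not offer.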



13) Real-time Hybrid Memory Trace and Analyzing Toolkit (HMTT)

Participators: Architecture Laboratory

Status: Ongoing

HMTT monitors the command and address information flow on the DDRx bus and streams large volumes of traces out over a high-speed channel in real time. Combined with a small amount of metadata collected by software plug-ins embedded in the system under test, the traces are used to analyze program memory access behavior offline and to guide architecture design and application optimization. Traces can also be fed back to the system under test in real time to guide its online analysis and scheduling. Four generations of HMTT have been developed since 2007, and dozens of domestic and foreign research institutions and enterprises have used its memory traces or purchased the equipment.



14) DASICS (Dynamic in-Address Space Isolation by Code Segments)

Participators: Architecture Laboratory

Status: Ongoing

Traditional processor security mechanisms do not check memory accesses within a single address space (for example, within one process), yet many attacks, such as buffer overflows, take place inside one address space. By modifying the processor, DASICS divides an address space into fine-grained regions with their own permissions and checks memory access instructions in real time as they execute in the pipeline, effectively preventing unexpected memory accesses in each domain.
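The per-access check can be pictured with a small software model. This is a sketch under assumptions, not the DASICS hardware design: `Bound`, `Domain`, `AccessFault`, and `checked_store` are hypothetical names, and a Python `bytearray` stands in for the shared address space.

```python
from dataclasses import dataclass

READ, WRITE = 1, 2  # permission bits (illustrative encoding)


@dataclass(frozen=True)
class Bound:
    lo: int    # first byte of the region
    hi: int    # one past the last byte of the region
    perm: int  # bitwise OR of READ / WRITE


class AccessFault(Exception):
    """Raised when an access falls outside every permitted region."""


class Domain:
    """Set of regions that one piece of code is allowed to touch."""

    def __init__(self, bounds):
        self.bounds = list(bounds)

    def allows(self, addr, size, access):
        # The whole access must fit inside one region with that permission.
        return any(
            b.lo <= addr and addr + size <= b.hi and (b.perm & access) == access
            for b in self.bounds
        )


def checked_store(memory, domain, addr, payload):
    # The check happens on every store, before memory is modified.
    if not domain.allows(addr, len(payload), WRITE):
        raise AccessFault(f"store of {len(payload)} bytes at {addr:#x} denied")
    memory[addr:addr + len(payload)] = payload


if __name__ == "__main__":
    memory = bytearray(64)                           # one shared address space
    untrusted = Domain([Bound(16, 24, READ | WRITE)])  # an 8-byte buffer
    checked_store(memory, untrusted, 16, b"A" * 8)   # fits in the buffer
    try:
        checked_store(memory, untrusted, 16, b"B" * 12)  # would overflow
    except AccessFault as fault:
        print("blocked:", fault)
```

In this model the overflowing store is rejected before any byte is written, even though the buffer and its neighbours live in the same address space.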



15) C10M user-space network stack QStack

Participators: Architecture Laboratory

Status: Ongoing

To meet the needs and challenges of the new generation of cloud server applications for the mobile Internet, the Internet of Things, and the Internet, we proposed QStack, a user-space network stack that supports high concurrency and high quality of service. It adopts a flexible system architecture that schedules CPUs on demand according to application needs, and provides priority support along the full data path to bound the tail latency of delay-sensitive requests. It also supports zero copy across the whole send and receive path, a lock-free design throughout the stack to reduce system overhead, and optional offloading to intelligent network cards. We prototyped a system for typical IoT service scenarios that reaches tens of millions of concurrent connections on a single mainstream server, and hundreds of millions across multiple servers, while maintaining quality of service. By comparison, ten million concurrent connections on one mainstream x86 server (the C10M problem) is considered a major challenge. The work has produced a number of papers and patent applications, and the system is now open source.
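Priority support for delay-sensitive requests can be sketched at a single queueing point. The example below is only an illustration, not QStack code: `PriorityDispatcher`, `submit`, and `next_request` are hypothetical names, and a real stack would apply the same idea at every stage of the data path rather than at one queue.

```python
import heapq
from itertools import count


class PriorityDispatcher:
    """Single queueing point that serves lower priority numbers first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that keeps FIFO order per priority

    def submit(self, request, priority):
        # priority 0 = delay-sensitive, larger numbers = more tolerant
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def next_request(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    d = PriorityDispatcher()
    d.submit("bulk-1", priority=1)
    d.submit("latency-1", priority=0)
    d.submit("bulk-2", priority=1)
    d.submit("latency-2", priority=0)
    order = []
    while (req := d.next_request()) is not None:
        order.append(req)
    print(order)
```

Delay-sensitive requests leave the queue ahead of bulk requests that arrived earlier, while requests of equal priority keep their arrival order.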



16) SERVE: A cloud platform for agile development of open source processor chips

Participators: Architecture Laboratory

Status: Ongoing

In response to the national call to address the processor chip shortage, the SERVE cloud platform was developed to support the agile development of open source processor chips. It can accelerate the open source processor chip design process and speed up the education and training of chip design talent. The SERVE platform lets chip designers and developers move their entire hardware and software design process to the cloud. "One-click" automatic development on the cloud reduces the time needed to set up a development environment by orders of magnitude, halves design and development time, and significantly increases the number of concurrent workflows. The key research points of the platform are: 1. customized design of FPGA cloud servers to support and accelerate chip design simulation and verification; 2. abstraction of cloud resources and cloud services that meets the demands of chip design workloads; and 3. distributed computing resource scheduling methods that improve platform scalability. The research team continues to update and improve the platform and is currently developing the third-generation cloud platform.



17) FuYao: a distributed storage system with predictable tail latency for hybrid workloads in datacenters

Participators: Data System Laboratory

Status: Ongoing


Datacenter applications can be grouped into two broad classes: those with clear latency and tail latency requirements, such as search engines, and those that expect sustained high bandwidth, such as big data. To fully utilize system resources, datacenters use a distributed storage system to serve both classes at the same time. Such a system has to provide deterministic performance and high throughput so that each class meets its own performance requirements. The advent of low-latency storage devices, high-speed network devices, and new hardware with in-device computational logic has brought new opportunities and challenges to the design of distributed storage systems.

We are designing and developing FuYao, a distributed storage system. Its features include: an architecture that separates data from metadata; file and block interfaces; support for a variety of KV engines; a central controller that allocates resources on demand; deterministic performance QoS guaranteed across the entire data path; an RDMA-based network communication module that provides scalable, high-performance communication and fully utilizes high-speed network devices; and, within the storage server, which adopts an RTC architecture, an efficient request queue model and resource allocation and scheduling strategies that fully utilize high-performance storage devices. We also explore offloading some functionality to in-device computational logic and the impact of doing so on performance QoS.

Figure 1 Distributed storage system FuYao


Representative achievement: QWin, which enforces differentiated tail latency SLOs for multiple tenants by 1) accurately calculating core requirements by an SLO-to-model; 2) quickly detecting changes in core requirements and allocating sufficient cores; and 3) guaranteeing differentiated tail latency SLOs for multiple tenants while increasing resource utilization. This work was published in IEEE ICDCS 2022.



18) TianChi: a high-performance memory pool

Participators: Data System Laboratory

Status: Ongoing


A distributed memory pool allows dynamic memory allocation and provides high-performance, highly scalable, and highly reliable data access. The key-value interface is a common form of memory pool, and datacenter applications have a huge demand for key-value data stores. In recent years our research group has carried out extensive work on key-value stores as a basis for building efficient distributed memory pools. With the development of new storage hardware and network technologies, such as non-volatile memory (NVM), new solid-state drives (Optane SSD), and remote direct memory access (RDMA), the design and implementation of key-value storage systems face new opportunities and challenges. We have therefore designed a series of high-performance key-value storage systems and published the related papers at top international academic conferences: HiKV (ATC 2017), LightKV (MSST 2020), and SplitKV (HotStorage 2020). Among them, HiKV is the first hybrid index structure built on hybrid memory, and it significantly improves system throughput. We continue to study the design and implementation of high-performance key-value stores on new hardware to meet the low-latency data access needs of upper-layer applications.
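One way to picture a hybrid index is to pair a hash index for point operations with an ordered index for range scans. The sketch below assumes that reading of "hybrid index" and models it with plain Python containers; `HybridIndex` and its methods are hypothetical names, and the placement of each index in a particular memory type is not modelled here.

```python
import bisect


class HybridIndex:
    """Toy KV index: a hash map for point ops plus sorted keys for scans."""

    def __init__(self):
        self._hash = {}         # point index: key -> value
        self._sorted_keys = []  # ordered index over the same keys

    def put(self, key, value):
        if key not in self._hash:
            bisect.insort(self._sorted_keys, key)  # keep the key list ordered
        self._hash[key] = value

    def get(self, key):
        # Point lookups never touch the ordered index.
        return self._hash.get(key)

    def delete(self, key):
        if key in self._hash:
            del self._hash[key]
            i = bisect.bisect_left(self._sorted_keys, key)
            del self._sorted_keys[i]

    def scan(self, start, end):
        # Range scan over [start, end) driven by the ordered index.
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, end)
        return [(k, self._hash[k]) for k in self._sorted_keys[lo:hi]]


if __name__ == "__main__":
    idx = HybridIndex()
    for k, v in [("b", 2), ("a", 1), ("c", 3)]:
        idx.put(k, v)
    print(idx.get("a"))
    print(idx.scan("a", "c"))
```

The design choice this illustrates is that each operation type uses the structure suited to it: `get` costs one hash lookup, while `scan` walks keys in order without sorting at query time.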

  

Figure 3. Architecture of hybrid index and procedure to serve KV operations in HiKV.


Figure 4. Architecture of LightKV.


Figure 5. System overview of SplitKV.




19) Wukong: RISC-V-based Open-Source SSD Controller Platform

Participators: Data System Laboratory

Status: Ongoing


Software-hardware co-design is becoming the trend in storage system design. Solid State Drives (SSDs) are widely used in scenarios such as databases and distributed storage systems, so it is critical to be able to implement and evaluate new ideas on a real SSD platform. We therefore designed and implemented Wukong, an open-source SSD SoC platform with either an ARM-based or a RISC-V-based processor.

Figure 6 shows the overall architecture of the RISC-V-based Wukong. We designed and implemented the NVMe controller, the Flash controller, and the firmware for a standard SSD. We also explore new forms of SSD, such as ZNS SSD and open-channel SSD, on the Wukong platform. In addition, to meet the runtime requirements of the SSD controller and firmware, we designed and implemented a 9-stage, dual-issue, 64-bit RISC-V processor that supports RV64IMAC and uses a GShare branch predictor.
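For readers unfamiliar with GShare, the following is a minimal software model of the general scheme: the branch address is XORed with a global history register to index a table of 2-bit saturating counters. It is not the Wukong processor's implementation; the table size, the initial counter value, and the class and method names are assumptions made for illustration.

```python
class GSharePredictor:
    """Generic gshare model: PC XOR global history -> 2-bit counter table."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                         # global branch history
        self.counters = [1] * (1 << index_bits)  # start at weakly not-taken

    def _index(self, pc):
        # Drop bit 0 only: with the C extension, instructions are
        # 2-byte aligned, so bit 1 of the PC carries information.
        return ((pc >> 1) ^ self.history) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


if __name__ == "__main__":
    predictor = GSharePredictor(index_bits=4)
    pc = 0x80000010  # hypothetical branch address
    hits = 0
    for i in range(40):
        taken = (i % 2 == 0)  # strictly alternating branch
        if i >= 8 and predictor.predict(pc) == taken:
            hits += 1
        predictor.update(pc, taken)
    print(f"correct predictions after warm-up: {hits} of 32")
```

The alternating branch in the usage example is a case where a single per-branch counter would keep mispredicting, whereas the history bits steer the two outcomes to different counters, so the model predicts every iteration correctly after the short warm-up.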

Figure 6 SSD Controller Architecture (RISC-V processor)


The Wukong platform lets us pursue the following work:

1) developing and optimizing the NVMe controller, processor, Flash controller, and firmware;

2) software-hardware co-design for storage systems and databases, such as software-defined garbage collection and near-storage computing;

3) research on security issues within the SSD controller.

Current specs of the Wukong platform: support for ARM-based and RISC-V-based processors, 4 channels, 1 TB capacity, and 400 to 500 MB/s of bandwidth.