Innovation and Technology Solutions

Solution Reference Number

S-0217

Solution Name

nCloud Fault-Tolerant Geo-distributed Hyperconverged Infrastructure (HCI)

Solution Description

To support hyper-scale storage management across multiple geographical regions, we posit that Hyperconverged Infrastructure (HCI) should be geared towards geo-distributed deployment. A geo-distributed HCI can follow a hierarchical topology: it comprises multiple data centers (DCs) located in different geographical regions, and each DC comprises multiple storage nodes (servers). A geo-distributed HCI thus unifies the storage space across regions into a single global-scale storage pool.
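The hierarchical topology described above can be sketched as a small data model. This is only an illustration: the class names (`Node`, `DataCenter`, `GeoHCI`), the region labels, and the capacities are hypothetical and are not part of nCloud.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A storage server inside one data center (hypothetical model)."""
    name: str
    capacity_tb: float  # raw storage capacity in terabytes


@dataclass
class DataCenter:
    """A data center (DC) in one geographical region."""
    region: str
    nodes: list[Node] = field(default_factory=list)

    def capacity_tb(self) -> float:
        # Capacity of a DC is the sum over its nodes.
        return sum(node.capacity_tb for node in self.nodes)


@dataclass
class GeoHCI:
    """Top of the hierarchy: a set of DCs forming one global pool."""
    data_centers: list[DataCenter] = field(default_factory=list)

    def pool_capacity_tb(self) -> float:
        # The global pool unifies the capacity of every DC.
        return sum(dc.capacity_tb() for dc in self.data_centers)


# Illustrative values only.
hci = GeoHCI([
    DataCenter("region-a", [Node("a1", 10.0), Node("a2", 10.0)]),
    DataCenter("region-b", [Node("b1", 20.0)]),
])
print(hci.pool_capacity_tb())  # 40.0
```

The two-level nesting (DCs containing nodes) mirrors the topology in the text; a real deployment would also track network links and failure domains.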


One critical deployment requirement of a geo-distributed HCI is to maintain availability and durability guarantees in the presence of failures, ranging from transient failures where data is temporarily unavailable (e.g., power loss, network disconnection, system upgrades, reboots), to permanent failures where data is permanently lost (e.g., disk crashes, sector errors). Cloud outages are common in practice.


Storing data with redundancy is a typical fault-tolerant approach, as any lost or unavailable data can be recovered through the available redundant data. Traditional distributed storage systems use replication to provide fault tolerance. The idea is to create multiple identical copies (called replicas) for each data chunk and distribute the replicas across different nodes. However, the storage overhead of replication is significant, and poses a scalability concern with the unprecedented growth of data volume.
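Redundancy does not have to be a full replica. As a minimal illustration of recovering lost data from redundant data, the sketch below contrasts 2-way replication with a single XOR parity chunk. The chunk sizes and the helper `xor_bytes` are assumptions chosen for the example and do not represent nCloud's actual coding scheme.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Return the byte-wise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


data = b"geo-HCI!"              # 8 bytes of example data
d1, d2 = data[:4], data[4:]     # two data chunks on two nodes
parity = xor_bytes(d1, d2)      # one redundant chunk on a third node

# 2-way replication stores every byte twice and survives one node loss.
replication_overhead = 2 * len(data) / len(data)

# XOR parity stores d1, d2 and parity, and also survives one node loss.
parity_overhead = (len(d1) + len(d2) + len(parity)) / len(data)

# Simulate losing the node that holds d1, then rebuild it.
recovered_d1 = xor_bytes(d2, parity)

print(replication_overhead, parity_overhead)  # 2.0 1.5
print(recovered_d1 + d2 == data)              # True
```

Both layouts tolerate the loss of any one node, but the parity layout stores 1.5 times the original data instead of 2 times.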


We present our core technology, network coding, which was proposed by the research team at the Chinese University of Hong Kong in 2000. Network coding provides a promising low-cost redundancy mechanism for fault tolerance: it significantly reduces the storage overhead compared to traditional replication, and it achieves optimal repair performance with theoretical guarantees. It can be viewed as a special class of erasure coding. We first formalize the terminology of erasure coding, and then explain how network coding improves repair performance.
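To make the storage and repair claims concrete, the following sketch compares replication, a conventional (n, k) erasure code, and a network-coding-based repair. It assumes an (n, k) code in which any k of the n chunks suffice to rebuild an object, and it uses the minimum-storage regenerating (MSR) repair bound from the regenerating-codes literature, where a failed chunk is repaired by contacting d surviving nodes. The parameters n = 4, k = 2, d = 3 are illustrative assumptions; the text above does not state which parameters nCloud uses.

```python
def storage_overhead(n: int, k: int) -> float:
    """Raw bytes stored per byte of user data for an (n, k) code."""
    return n / k


def conventional_repair_traffic(object_size: float) -> float:
    """Conventional repair downloads k chunks of size object_size / k,
    i.e. as much data as the whole object."""
    return object_size


def msr_repair_traffic(object_size: float, k: int, d: int) -> float:
    """Repair traffic at the MSR point with d helper nodes (assumed model):
    each helper sends object_size / (k * (d - k + 1))."""
    if d < k:
        raise ValueError("need at least k helper nodes")
    return object_size * d / (k * (d - k + 1))


n, k, d = 4, 2, 3        # illustrative parameters, with d = n - 1
size_gb = 1.0            # object size in GB

print(storage_overhead(n, k))                  # 2.0
print(n - k)                                   # 2 chunk failures tolerated
print(conventional_repair_traffic(size_gb))    # 1.0
print(msr_repair_traffic(size_gb, k, d))       # 0.75
```

Under these assumptions the code tolerates two failures at 2 times storage, whereas replication needs three copies (3 times storage) for the same tolerance, and the network-coding repair moves 25% less data than conventional repair.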

Application Areas

Infrastructure

GovCloud

Technologies Used

Cloud Computing

Network Coding

Use Cases

High storage efficiency: nCloud leverages network coding to reduce the amount of storage redundancy compared to traditional replication, thereby lowering long-term operational costs.


High performance: nCloud reduces cross-DC traffic during repair to maintain high repair performance. This also limits the impact on other applications that share the cross-DC bandwidth.


High fault tolerance: nCloud achieves high fault tolerance via network coding in two aspects: (i) availability, which means that any unavailable data remains accessible through data redundancy, and (ii) durability, which means there is no data loss in the face of node or DC failures.


Security via diversity: nCloud exploits secret sharing (see our previous work CDStore [22]) to generate secure coded chunks (called shares), such that an adversary cannot recover the stored data if the number of shares is insufficient.


Software-defined storage management: nCloud provides a configurable, software-defined storage management framework to readily address the heterogeneity of storage resources and application requirements.
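The secret sharing idea mentioned under "Security via diversity" can be illustrated with Shamir's classic (k, n) threshold scheme, in which any k of n shares recover the secret and fewer than k reveal nothing about it. This is a generic textbook sketch and not necessarily the scheme used by CDStore or nCloud; the function names, the field prime, and the example secret are assumptions made for the illustration.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo PRIME


def split_secret(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares such that any k of them recover it."""
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation at point x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


secret = int.from_bytes(b"nCloud key", "big")   # example secret (80 bits)
shares = split_secret(secret, k=3, n=5)         # e.g. one share per DC

print(recover_secret(shares[:3]) == secret)                          # True
print(recover_secret([shares[0], shares[2], shares[4]]) == secret)   # True
```

With k = 3 and n = 5, an adversary who compromises at most two storage sites learns nothing about the secret, while the owner can still recover it after losing any two sites.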

Government departments wishing to conduct a proof-of-concept (PoC) trial or technical testing of an I&T solution may contact Smart LAB.