BabelBird Docs

Object storage and erasure coding

Self-built object storage server

Babel object storage is an object storage system built on the open-source MinIO system. It inherits MinIO's high performance, scalability, and data security, and is better suited to the application scenarios of BabelBird Enterprise Drive.

As an advanced object storage system, the Babel object storage system has unparalleled advantages over traditional storage systems in terms of speed, security, stability, high availability, and horizontal expansion.

Advantages of Babel object storage system

Performance

MinIO claims to be the fastest object storage server in the world. Its published benchmarks report GET and PUT throughput of over 325 GiB/sec and 165 GiB/sec, respectively, on 32 NVMe nodes with a 100GbE network, a clear advantage over Amazon S3 and far superior to traditional HDFS.

High availability: MinIO ensures high reliability and availability through multiple copies of data, failover and automatic recovery, thereby ensuring that data is not lost and business is not interrupted.

Low redundancy and high tolerance for disk damage: the standard (and highest) data redundancy coefficient is 2, meaning that storing a 1 MB data object actually takes up 2 MB of disk space. Even so, data can still be read if any n/2 disks are damaged, where n is the number of disks in an erasure coding set. Damage recovery works per object, not per storage volume.

Security:

MinIO delivers more functionality with the highest levels of encryption and extensive optimizations that virtually eliminate the overhead typically associated with storage encryption operations.

Objects are stored as blocks spread across the hard disks, so the data cannot be restored even by someone who obtains read and write permissions on a hard disk or the server.

Extremely high scalability: MinIO supports distributed deployment and can be expanded horizontally. When more storage space or higher performance is needed, MinIO can be easily expanded by adding new nodes.

At minimum, the object storage system can be deployed on a single machine, rather than on the three independent servers traditionally required.

Object storage system and erasure coding

About erasure coding

Babel Object Storage System (MinIO) uses erasure coding and checksums to protect data from hardware failures and silent data corruption. In the highest redundancy state, data can still be recovered even if half (N/2) of the hard drives are lost. Erasure coding is a mathematical algorithm for recovering lost and damaged data. MinIO uses Reed-Solomon codes to split each object into N/2 data blocks and N/2 parity blocks. For example, with 12 disks an object is divided into 6 data blocks and 6 parity blocks. Any 6 disks can be lost (regardless of whether they store data blocks or parity blocks) and the object can still be recovered from the data on the remaining disks. We call the data blocks D and the erasure code (parity) blocks P.

By default, each object is sliced across the drives into N/2 data blocks and N/2 parity blocks (the ratio is customizable to support higher space utilization).
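To make the idea of rebuilding lost data from parity concrete, here is a minimal Python sketch. It deliberately uses a single XOR parity block, which is far simpler than the Reed-Solomon code MinIO uses and survives only one lost block. The function names (`xor_blocks`, `encode`, `recover`) and the 4 D + 1 P layout are illustrative assumptions, not part of MinIO or Babel.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int):
    """Split data into k equal data blocks (zero-padded) plus 1 XOR parity block."""
    block_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_len * k, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(blocks, lost_index):
    """Rebuild the single block at lost_index from all surviving blocks."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_blocks(survivors)


payload = b"BabelBird object"
stored = encode(payload, k=4)            # 4 data blocks (D) + 1 parity block (P)

damaged = list(stored)
damaged[2] = None                        # simulate one failed disk
damaged[2] = recover(damaged, lost_index=2)

restored = b"".join(damaged[:4])[:len(payload)]
assert restored == payload
```

Reed-Solomon generalizes this idea so that P parity blocks can cover the loss of any P blocks, which is what allows half of the disks in a set to fail in the N/2 + N/2 layout described above.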

Erasure coding is different from traditional multi-copy technology. It has higher disk utilization and higher data recovery efficiency. (However, many storage manufacturers promote erasure coding as multiple copies and refer to the P number as the number of copies. This is inaccurate and will cause a lot of misunderstandings.)
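The disk utilization difference can be estimated with a few lines of Python. This is a rough sketch: the three-copy replication scheme is an assumed comparison point that does not come from this document, and the function names are hypothetical.

```python
def replication_overhead(copies: int) -> float:
    """Raw disk space consumed per unit of stored data when keeping full copies."""
    return float(copies)


def erasure_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw disk space consumed per unit of stored data with D data + P parity blocks."""
    return (data_blocks + parity_blocks) / data_blocks


# (raw space multiplier, number of lost disks or copies that can be survived)
schemes = {
    "3 full copies (assumed comparison)": (replication_overhead(3), 2),
    "6 D + 6 P on 12 disks": (erasure_overhead(6, 6), 6),
    "9 D + 3 P on 12 disks": (erasure_overhead(9, 3), 3),
}

for name, (overhead, tolerated) in schemes.items():
    print(f"{name}: {overhead:.2f}x raw space, survives {tolerated} losses")
```

Under these assumptions, the 6 D + 6 P layout uses 2x raw space while surviving six lost disks, whereas three full copies use 3x raw space and survive only two lost copies.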

Multiple copies in the context of Babel refers to adding backup servers to fully synchronize and store data.

Characteristics of erasure coding and differences from RAID technology

Erasure coding protects data from multiple drive failures

RAID6 tolerates two drive failures while MinIO erasure coding allows the loss of half the drives

RAID is a volume-level erasure code, while MinIO erasure code is an object-based erasure code (without downtime).

Effective space calculation (space utilization)

Assuming you need 100TB of storage space, the number of hard drives you need to buy depends on the redundancy standard you choose. For example, with 3+1 redundancy, 100TB of available space requires twelve 12TB hard drives arranged as 3 groups of 4: 9 data disks and 3 parity disks, giving 9 × 12TB = 108TB of available space. With 2+1 redundancy, fifteen 10TB hard drives are required: 10 data disks and 5 parity disks.

In the 3+1 plan, only 3 of the 12 hard disks can fail without affecting system usage and data restoration. However, if you use the 2+1 plan, although you need to purchase more hard disks, any 5 of the 15 hard disks can fail without affecting the system usage.
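The arithmetic in the two plans above can be reproduced with a short Python sketch. The function names and parameters are hypothetical helpers written for this illustration, and the model simply follows the D+P group description in this section; for real sizing, use the calculator linked below.

```python
import math


def usable_capacity_tb(drive_count, drive_size_tb, data, parity):
    """Return (usable TB, data disks, parity disks) for a data+parity scheme."""
    group_size = data + parity
    if drive_count % group_size != 0:
        raise ValueError("drive_count must be a multiple of data + parity")
    groups = drive_count // group_size
    data_drives = groups * data
    parity_drives = groups * parity
    return data_drives * drive_size_tb, data_drives, parity_drives


def drives_needed(target_tb, drive_size_tb, data, parity):
    """Smallest number of drives (in whole groups) that reaches target_tb usable."""
    usable_per_group = data * drive_size_tb
    groups = math.ceil(target_tb / usable_per_group)
    return groups * (data + parity)


# 3+1 plan with 12TB drives, target 100TB usable
print(drives_needed(100, 12, 3, 1))       # 12
print(usable_capacity_tb(12, 12, 3, 1))   # (108, 9, 3)

# 2+1 plan with 10TB drives, target 100TB usable
print(drives_needed(100, 10, 2, 1))       # 15
print(usable_capacity_tb(15, 10, 2, 1))   # (100, 10, 5)
```

The third value returned by `usable_capacity_tb` is the parity disk count, which matches the number of disk failures each plan is described as tolerating (3 and 5 respectively).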

After replacing the damaged hard drive, the system can automatically repair the data.

Please refer to the table below for space utilization. If the space requirement is large (more than 150TB) and continues to grow, considering storage density and cost performance, it is recommended that a server be equipped with 16 hard disks at a time.

Space utilization for different numbers of disks can be calculated with the calculator linked below:

https://min.io/product/erasure-code-calculator?ref=docs

Hardware preparation

Required hardware configuration. Babel supports building an object storage system on a single server (single node), using groups of hard disks as the redundancy unit. Up to half of the hard drives can be damaged without affecting the normal use of the system.

The Babel object storage system requires customers to prepare hardware servers in advance, as specified, for deployment. The minimum configuration is 2 Xeon CPUs, 64GB of memory, and 400GB solid-state drives (system disks), plus the mechanical hard drives needed to provide the required storage capacity.

To calculate the storage space required and the number of hard drives to purchase, use the calculator linked below.

https://min.io/product/erasure-code-calculator?ref=docs

The recommended hardware configuration is as follows:

Server type: Storage server
Configuration requirements: If storage density and future expansion investment are considered, 12TB hard drives can be used.
Operating system: Linux
Server purpose: Object storage server
Notes: It is recommended to add a server with the same disk capacity for future expansion.

Babel object storage uses software-defined storage technology, so the hard disks do not require RAID. Some servers require RAID to support multiple disks; in that case, each hard disk can be set to RAID0 pass-through mode.

Each expansion in single server mode requires adding a server with the same configuration. Uninterrupted service is possible during capacity expansion.

Single-server deployment also supports adding a server with the same configuration to establish active-standby mode for real-time synchronization. If the hardware of a server is damaged, it can be quickly switched to the backup server.

The effective disk capacity depends on the selected redundancy method. For example, building an object storage system from eight 10TB hard drives with a 3+1 redundancy scheme is equivalent to 2 groups of 4 hard drives, with an effective available space of 60TB. Any two of the 8 hard drives can be damaged without affecting data or system usage, and a damaged hard drive can be rebuilt automatically. With a 7+1 redundancy scheme, the effective available space is 70TB, and damage to any one of the eight hard drives will not affect data or system usage.

Multi-server deployment (distributed). The Babel object storage system supports multi-server deployment. If servers are used as the redundancy unit, at least 3 servers are required (2+1 mode). However, for better space utilization, the 3+1 solution (4 servers) is recommended.

The configuration requirements for each node (server) are consistent with the single-server deployment above.

It is recommended that the node configuration be consistent (same operating system, same number of disks and same network connection)

The number of drives provided by each node must be the same

The time difference between nodes cannot be greater than 15 minutes (it is recommended to use NTP to ensure time consistency)
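The node requirements above could be checked before deployment with a small script along these lines. This is only a sketch: the node dictionaries, their field names (`name`, `drives`, `clock`), and the way the clock readings are collected are all assumptions made for illustration, not a Babel or MinIO interface.

```python
from datetime import datetime, timedelta

MAX_SKEW = timedelta(minutes=15)  # limit stated in the requirements above


def check_nodes(nodes):
    """Return a list of problems found; an empty list means the checks pass."""
    if not nodes:
        return ["no nodes given"]
    problems = []

    # Every node must provide the same number of drives.
    drive_counts = {node["drives"] for node in nodes}
    if len(drive_counts) > 1:
        problems.append(f"drive counts differ: {sorted(drive_counts)}")

    # The largest clock difference between any two nodes must stay within MAX_SKEW.
    clocks = [node["clock"] for node in nodes]
    skew = max(clocks) - min(clocks)
    if skew > MAX_SKEW:
        problems.append(f"clock skew {skew} exceeds {MAX_SKEW}")

    return problems


# Hypothetical inventory: clock readings taken from each node at the same moment.
base = datetime(2024, 1, 1, 12, 0, 0)
nodes = [
    {"name": "node1", "drives": 12, "clock": base},
    {"name": "node2", "drives": 12, "clock": base + timedelta(minutes=2)},
    {"name": "node3", "drives": 12, "clock": base - timedelta(seconds=30)},
    {"name": "node4", "drives": 12, "clock": base + timedelta(seconds=5)},
]
print(check_nodes(nodes) or "all checks passed")
```

In practice, running NTP on every node keeps the skew far below the 15-minute limit, so a check like this mainly catches nodes where time synchronization is not working.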

The space utilization calculation is the same as for a single server.

BabelBird capabilities may change by product version, licensed modules and deployment configuration; actual availability depends on the deployed environment and administrator settings.