CORE: Cross-Object Redundancy for Efficient Data Repair



[NEW: see a visualized demo here]


Part I: The Idea

Erasure codes are an integral part of many distributed storage systems aimed at Big Data, since they provide high fault tolerance at low overheads. However, traditional erasure codes are inefficient at reading stored data in degraded environments (when nodes may be unavailable) and at replenishing lost data (vital for long-term resilience). Consequently, novel codes optimized for the nuances of distributed storage systems are being actively researched. In this work, we take an engineering alternative, exploring the use of simple and mature techniques: juxtaposing a standard erasure code with RAID-4-like parity.

Analytical as well as experimental studies (described in this manuscript) showcase the efficacy of this approach over traditional codes as well as some novel (local reconstruction) codes.

Repairing Single Failures in CORE

Repairing a single failure using the vertical parities not only requires reading fewer blocks, but also incurs a lower computational cost (XOR instead of RS decoding).
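
To make the contrast concrete, here is a minimal sketch of an XOR-based single-block repair, assuming a RAID-4-style vertical parity over equal-sized blocks; the class and method names are purely illustrative and are not StorageCORE's actual code. Rebuilding the missing block only needs the surviving blocks of its column and byte-wise XOR, whereas a horizontal repair would require a full Reed-Solomon decode over k blocks.

    import java.util.List;

    // Illustrative sketch (not StorageCORE's code): repairing the single missing
    // block of a column that is protected by a RAID-4-style XOR parity.
    public final class XorColumnRepair {

        // XOR-accumulates src into dst; both blocks are assumed to have equal length.
        private static void xorInto(byte[] dst, byte[] src) {
            for (int i = 0; i < dst.length; i++) {
                dst[i] ^= src[i];
            }
        }

        // survivors holds the remaining data blocks of the column plus its parity block.
        // Because the parity is the XOR of all data blocks in the column, XORing the
        // survivors together yields exactly the missing block.
        public static byte[] repairMissingBlock(List<byte[]> survivors, int blockSize) {
            byte[] repaired = new byte[blockSize];
            for (byte[] block : survivors) {
                xorInto(repaired, block);
            }
            return repaired;
        }
    }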

Scheduling Multi-Failure Repairs in CORE

We have proposed RGS (Recursively-Generated Scheduling), an algorithm that generates efficient schedules for repairing multiple failures.

Update in CORE

An efficient parity update solution (instead of re-encoding the data) has been integrated into CORE to handle updated data blocks.
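
As a rough illustration of the idea, assuming a RAID-4-style XOR vertical parity (the names below are illustrative, not StorageCORE's API): when one data block changes, the new parity can be computed from the old parity and a diff-block derived from the old and new versions of that block, so the unchanged blocks of the column never have to be read or re-encoded.

    // Illustrative sketch (not StorageCORE's code): updating an XOR parity block
    // after a single data block changes, using only the old and new versions of
    // that block (the "diff-block") instead of re-encoding the whole column.
    public final class XorParityUpdate {

        // Returns the byte-wise XOR of two equal-length blocks.
        private static byte[] xor(byte[] a, byte[] b) {
            byte[] out = new byte[a.length];
            for (int i = 0; i < a.length; i++) {
                out[i] = (byte) (a[i] ^ b[i]);
            }
            return out;
        }

        // newParity = oldParity XOR (oldBlock XOR newBlock), which holds because
        // the parity is the XOR of all data blocks in the column.
        public static byte[] updateParity(byte[] oldParity, byte[] oldBlock, byte[] newBlock) {
            byte[] diff = xor(oldBlock, newBlock);
            return xor(oldParity, diff);
        }
    }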



Part II: The Implementation

Important Note: The StorageCORE implementation has been integrated into HDFS-RAID, an erasure-code-supporting branch of Apache Hadoop developed at Facebook. To use StorageCORE, you first need to install this version of Apache Hadoop and then upgrade it using the following files.

NEW*: the new release of the CORE storage primitive supports data updates.


Download:

Component                                  Download Link
Binary distribution                        hadoop-raid.jar
XML configuration file                     raid.xml
Source code (for development purposes)     hadoop-raid.zip
Data generator (for testing purposes)      DataFileGenerator.java
Failure pattern (for testing purposes)     pattern.txt


Upgrade:

  • 1) Exclude the original raid subdirectory from the path and use the above jar archive instead.

  • 2) Replace the raid configuration file (conf/raid.xml) with the new one.


Usage:

  • 1) Adjust the configuration file (raid.xml) according to your choice of coding parameters. The default values are n=9, k=6, and t=3.

  • 2) Use the generator program (DataFileGenerator.java) to create data files. If the default settings are left intact, it will generate 3 files, each of size 6 * 64MB, and copy them into a folder named 'data'.

  • 3) Start Apache Hadoop and then upload the whole data directory to HDFS.

  • 4) Use StorageCORE's command-line interface to carry out your tasks (a sample session is sketched after the command reference below). Here is the general pattern of these commands:

    ./bin/hadoop org.apache.hadoop.raid.RaidShell -COMMAND PARAMETERS_LIST

    If this command is run without any parameters, it will list all possible options. Among them, here are the most important ones:
    raid2D dirPath
    Encodes (or, in HDFS-RAID's terminology, raids) all files in the input directory in two dimensions: first per file (horizontally) and then across files (vertically).
    The relevant parameters (n, k, and t) are fetched from the configuration file (conf/raid.xml).
    The generated horizontal and vertical parity files are stored under the /raidrs and /raid subdirectories respectively.
    Internally, it calls the raidFile and raidAcrossFiles commands iteratively.

    fsck dirPath
    HDFS-RAID's native file scan. It returns the number of files in the input directory that have missing/corrupt blocks.

    printBlockAllocations dirPath
    Outputs two matrices, each of size k*t, that contain comprehensive information about the distribution and status of all data blocks across the DataNodes.
    The first matrix shows the mapping between each file and the DataNodes that host its blocks.
    The second matrix shows the status of every block in the CORE matrix (0 for healthy, X for missing/corrupt).

    failUsingPattern dirPath patternFilePath
    Reads the input failure matrix (a text file representing a matrix of size k*t in which each 0/X entry marks a healthy/unhealthy block in the CORE matrix) and enforces it on the input directory.
    A sample failure pattern file for the coding parameters k=6 and t=3 is included in the download list above.
    Note: failures are enforced by deleting the physical block on the hosting DataNode through SSH connections.

    failRandomBlocks dirPath numberOfFailures
    Similar to the previous command, but it takes a number instead of a pattern and fails as many blocks as the input parameter specifies.

    recoverBlocks filePath
    The native recovery option in HDFS-RAID.

    recoverBlocksOptimized filePath
    Our optimized version of the previous command.

    recoverCRBlocks filePath
    The vertical counterpart of the recoverBlocks command. It uses cross-object parities to repair a failed block in the input file.

    recover2DBlocks dirPath recoverySchedule
    The most sophisticated recovery option. It performs a collective recovery and works at the level of the CORE matrix (in contrast to the other recovery commands, which work at the level of individual files).
    It first splits the failures into independent failure clusters and then repairs them one by one using a repair scheduling algorithm.
    Three repair scheduling algorithms are currently implemented in StorageCORE: Column-First (colFirs), Row-First (rowFirst), and Recursively-Generated Schedules (rgs).

    replaceBlocks dirPath blockMap newBlockPath
    A new feature that allows HDFS to update data files by replacing some of their data blocks.
    Upon calling this command, all the blocks marked in the blockMap file (similar to pattern.txt) are replaced by the content of the block residing at newBlockPath.

    updateParitiesNaive dirPath blockMap
    Implements the naive approach to reflecting data updates in the parity blocks: re-encoding the affected rows and columns.

    updateParitiesOptimized dirPath blockMap oldBlockPath
    Implements a more efficient parity update solution for handling updated data blocks. It first computes the diff-blocks and then uses them to compute the new versions of the parity files.
    Note that oldBlockPath provides the path to the older versions of the updated blocks (they are required for computing the diff-blocks).
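
For illustration only, an end-to-end test session with the default parameters (n=9, k=6, t=3) might look like the following. The HDFS path /data and the number of injected failures are assumptions chosen for this example, and the HDFS start-up script name may differ depending on your Hadoop installation.

    ./bin/start-dfs.sh
    ./bin/hadoop fs -put data /data
    ./bin/hadoop org.apache.hadoop.raid.RaidShell -raid2D /data
    ./bin/hadoop org.apache.hadoop.raid.RaidShell -printBlockAllocations /data
    ./bin/hadoop org.apache.hadoop.raid.RaidShell -failRandomBlocks /data 2
    ./bin/hadoop org.apache.hadoop.raid.RaidShell -recover2DBlocks /data rgs
    ./bin/hadoop org.apache.hadoop.raid.RaidShell -fsck /data

The failRandomBlocks/recover2DBlocks pair exercises the collective repair path, and fsck can be run before and after the repair to confirm that no files in the directory are left with missing blocks.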


Developed by the Coding for Distributed Networked Storage Systems team at Nanyang Technological University, Singapore.
2012-2013