Introduction


With the rapid increase of the amount of information and data in digital format, reliable storage of data poses significant challenges despite the advances in storage device technologies. Networked distributed storage systems provide a scalable way to store more and more data across networks of storage nodes. Yet individual storage devices (or network components) are bound to fail, and hence redundancy mechanisms at the node level are essential in order to achieve reliable storage.

Coding techniques promise a good trade-off between reliability and storage overhead for data storage. Erasure codes (such as Reed-Solomon erasure codes), while being widespread for storage in media like CDs/DVD, as well as RAID like systems, have become more and more popular for networked distributed storage systems. Existing coding techniques for communications however do not address all the issues appearing in the context of networked distributed storage systems, in particular they do not come with immediate repair mechanims (that is, failure of some storage devices call for recreation and storage of lost redundancy in new storage devices). This gives rise to the issue of designing codes for better repairability, a question that has generated many recent research results (see [M1] for a survey).

In this project, we study many aspects of code design suited for networked storage systems, including repairability, but also data insertion, some security aspects, and more recently mutable content, along with practical aspects such as algorithms and distributed networked storage system design issues in order to harness the salient properties of these new coding techniques. More details are found below.

Repairability


Erasure codes for communication tend to entail high communication overhead for maintenance, when some of the encoded fragments are lost and need to be replenished. Such overheads arise from the fundamental need to recreate (or keep separately) first a copy of the whole object before any individual encoded fragment can be generated and replenished. This issue is referred to as repairability.

Local Repairability

In [C1], we proposed to look at the repair process from the point of view of minimizing the number of live nodes to be contacted to perform one repair, and called Self-Repairing Codes (SRC) codes which have the property to repair one failure by contacting as low as 2 or 3 live nodes. Another family of self-repairing codes based on projective geometry was given in [C2], which has the feature of storing systematic pieces.

In [C4], we explored an alternative way to achieve good repairability by disentangling fault-tolerance and repairability. Specifically, we apply a RAID-4 like parity code on top of erasure coded object pieces of different objects, to achieve efficient repairs, while the individual objects are stored in a fault-tolerant manner using any erasure coding technique.

In [C6], bounds on the number of nodes to be contacted to perform a repair (called repair locality, or repair degree) are computed, while in [C8], it is noted that when trying to minimize too much the repair degree for one failure, there are risks that two failures may not enjoy local repair anymore. It was then proposed to build codes with multiple repair alternatives, that is, capable of repairing locally more than one failure. Incidentally, [C1, C4] supported multiple repair alternatives, even though this was not a priori a code design goal for those works.

Repair Bandwidth and Partial Collaboration

In [C9], constructions of codes which optimize the repair bandwidth instead of the repair degree are proposed, in the functional setting, that is, where repair does not have to be done bit by bit.

In the context of repair bandwidth efficient codes, collaboration among repair nodes to repair simultaneously several node failures has been proposed independently by Kermarrec et al. and Shum. In [C14], we propose the framework of partial collaboration as a way to nuance between no cooperation and full collaboration. In [C15], we propose two code constructions that realize partial collaboration.

Codes that enable collaborative repair are not easy to find. We suggest to approach this problem from a Grassmannian point of view in both [C13] where codes are constructed from cliques in Grassmann graphs, and in [C16], where codes are obtained by looking at the action of subgroups of GL(n) on suitable subspaces.

Data Insertion


During the storage process, traditional erasure codes require a unique source node to create and upload all the redundant data to the different storage nodes. However, such a source node may have limited communication and computation capabilities, which constrain the storage process throughput. Moreover, the source node and the different storage nodes might not be able to send and receive data simultaneously - e.g., nodes might be busy in a datacenter setting, or simply be offline in a peer-to-peer setting - which can further threaten the efficacy of the overall storage process. Codes with low repair degree can be leveraged to achieve "in-network" redundancy generation [J1]. Such an in-network redundancy generation allows storage nodes to generate new redundant data by exchanging partial information among themselves, improving the throughput of the storage process. The process is carried out asynchronously, utilizing spare bandwidth and computing resources from the storage nodes. Experiments show that such an approach can increase the throughput of the storage process significantly with respect to the classical naive storage approach.

Related to the above problem of how encoded data is created inside the storage network is that of migration from replicas to erasure coded data. This setting happens typically for systems which replicate hot data, and create coded archive of the data once it becomes cold. Families of codes that facilitate an erasure coded archival from available replicas in the storage systems have been proposed in [C5] and [C7].

A survey of the codes we proposed to either create fast erasure coded data from data objects, or from available replicas is available in [S3].

Mutable Content (updates & versioning)


Use of erasure codes is often limited to archival storage, where it is assumed that the existing content does not (or seldom) changes. This limits the kinds of applications where the storage savings from using erasure codes can be harnessed.

We investigate how to optimize the update process for erasure coded data, and integrate the ideas in the context of a cross-object coding based system in [C11].

In [P2], we discuss how versioning can be supported in erasure coded storage systems, where we show how I/O access can be improved with respect to a naive approach by exploiting sparsity in updates.

Some Security Aspects


Pollution Attacks and Regenerating Codes

Kermarrec et al. and independently K. Shum derived a trade-off between storage capacity and bandwidth needed for repair taking into account collaboration among regenerating nodes. It was shown that collaboration benefits the trade-off curve. In [C3] we study Byzantine fault tolerance of regenerating codes with collaboration, and show that in the presence of nodes polluting the network with corrupted data, the trade-off curve is actually worse than without any collaboration.

Systems


Erasure Coding Based Storage Systems

We explore the application and efficacy of the novel storage centric erasure codes with experimental setups and prototype implementations. These hands on and systems works include the following:

We integrate the ideas of cross-object coding [C4] with HDFS, to realize the CORE (Cross-Object Redundancy) system [C10].

We implement a multi-cloud (InterCloud) storage system (also called redundant array of cloud systems in the literature) using SRC codes in [C12].

In practice, many objects are stored in a storage system, and likewise failures simultaneously affect multiple objects. Thus, there will naturally be contention for shared resources (e.g., I/O, network) during the repairs, and other system design choices such as placement of the encoded objects can influence the repair process. We have carried out some preliminary experiments [M2] to benchmark repair of multiple objects under single as well as multiple storage node failures.

Proprietary cluster (using HP ThinClients)

In order to carry out small scale deployments and experiments, we have set up a cluster composed of 50 HP t5745 ThinClients, each with an Intel Atom N280 Processor 1.66 GHz, 1 GB DDR3 SDRAM and a solid state drive (SSD) of 2 GB. A PC with 4GB RAM is used as the master node orchestrating the cluster. The network connectivity is provided by three 24-port HP 2510 Switches (J9019B) and the whole setup has a maximum power consumption of 3.31 kW.

Software prototypes

The project currently includes the following storage code implementations:

ClusterDFS (Implemented by L. Pamies Juarez): Experimental distributed storage system implemented in python for fast prototyping of storage codes. The main features of ClusterDFS are: (1) Fast I/O and concurrency thanks to the asynchronous network library and lightweight threads offered by gevent. (2) Network buffers are wrapped with GaloisBuffer arrays that allow to perform encoding operations directly over them, without requiring extra memory allocations. The current implementation allows to execute different DataNodes (bin/datanode) across different computers, allowing to store, retrieve and encode data in each of them. The project is available on github.

RapidRAID Codes  (Implemented by L. Pamies Juarez):a new pipelined coding strategy that distributes the network and computing load of single-object encodings among different nodes, which also speeds up multiple object archival. RapidRAID codes, provide fast archival without compromising either data reliability or storage overheads. Experiments show that RapidRAID codes reduce a single object's coding time by up to 90%, while when multiple objects are encoded concurrently, the reduction is up to 20%.

Self Repairing Codes (Implemented by L. Pamies Juarez) : Implementation of self-repairing codes in Python using Jerasure's finite field arithmetic. The project is available on github.

StorageCORE for HDFS-RAID (Implemented by K. S. Esmaili) : Cross-object coding, by juxtaposing a standard erasure code on individual data objects followed by RAID-4 like parity of encoded pieces of different objects is used in the CORE storage primitive, yielding significantly better repairability for equivalent storage overheads and static fault-tolerance as traditional (MDS codes like Reed-Solomon) erasure codes. Integration of StorageCORE with HDFS-RAID, an erasure code supporting branch of Apache Hadoop developed at Facebook has been carried out and the source codes (as well as executables) are openly available.

Publications (2014-Present)


Journal papers


  1. Self-repairing codes: Local repairability for cheap & fast maintenance of erasure coded data, F. Oggier, A. Datta, Springer Computing
  2. On Repairing Erasure Coded Data in an Active-Passive Mixed Storage Network, F. Oggier, A. Datta, International Journal of Information and Coding Theory, 3(1), 2015

Conference papers


  1. Some Constructions of Storage Codes from Grassmann Graphs, F. Oggier, International Zurich Seminar on Communications (Invited Paper) 2014.
  2. On Storage Codes Allowing Partially Collaborative Repairs, S. Liu, F. Oggier, IEEE International Symposium on Information Theory (ISIT 2014).
  3. Two Storage Code Constructions Allowing Partially Collaborative Repairs, S. Liu, F. Oggier, IEEE International Symposium on Information Theory and Applications (ISITA 2014).
  4. On the Design of Storage Orbit Codes, S. Liu, F. Oggier, 4th International Castle Meeting on Coding Theory and Applications (4ICMCTA).
  5. Locally Repairable RapidRAID Systematic Codes — One simple convoluted way to get it all, A. Datta, ITW 2014.
  6. On Algebraic Manipulation Detection Codes from Linear Codes and their Application to Storage Systems, J. Harshan, F. Oggier, ITW 2015.
  7. Sparsity Exploiting Erasure Coding for Resilient Storage and Efficient I/O Access in Delta based Versioning Systems (extended abstract), J. Harshan, F. Oggier, A. Datta, ICDCS 2015.

Preprints


  1. Sparsity Exploiting Erasure Coding for Resilient Storage and Efficient I/O Access in Delta based Versioning Systems J. Harshan, Frédérique Oggier, Anwitaman Datta, Manuscript on arXiv, Nov 2014.

Patents and technology Disclosures


  1. Method For Fast Creation Of Erasure Coded Redundancy And Its Localized Repair In Distributed Data Stores. A. Datta, Filed in Singapore and for PCT (2014).
  2. A Method For Coding And Storing Multiple Versions Of Data In A Storage Efficient Manner In Erasure Coded Storage Systems. J. Harshan, A. Datta, F. Oggier, Filed in Singapore (2014).

Publications (2011-2013)


Monograph


  1. Coding Techniques for Repairability in Networked Distributed Storage Systems, F. Oggier, A. Datta, Foundations and Trends in Communications and Information Theory, Now Publishers, 2013.

Surveys and tutorials


  1. An Overview of Codes Tailor-made for Better Repairability in Networked Distributed Storage Systems, A. Datta, F. Oggier, ACM SIGACT News Distributed Computing Column, March 2013.
  2. Scalable Data Management and Storage on the Cloud: State of the Art and Emerging Trends , A. Datta & F. Oggier, ICDCN Tutorial, Hong Kong, 2012
  3. Data Insertion & Archiving in Erasure-coding Based Large-scale Storage Systems, L. Pamies-Juarez, F. Oggier, A. Datta, Paper accompanying invited talk at International Conference on Distributed Computing and Internet Technologies (ICDCIT 2013).
  4. Storage Codes: Managing Big Data with Small Overheads [slides | video - Part I & Part II], A. Datta, F. Oggier, Tutorial at the 2013 International Symposium on Network Coding (NetCod 2013).

Journal paper


  1. In-Network Redundancy Generation for Opportunistic Speedup of Data Backup, L. Pamies-Juarez, A. Datta, F. Oggier, Future Generation Computer Systems, Volume 29 Issue 6, August, 2013.

Conference papers


  1. Self-repairing Homomorphic Codes for Distributed Storage Systems, F. Oggier, A. Datta, The 30th IEEE International Conference on Computer Communications (Infocom 2011)
  2. Self-Repairing Codes for Distributed Storage - A Projective Geometric Construction, F. Oggier, A. Datta, IEEE Information Theory Workshop (ITW) 2011.
  3. Byzantine Fault Tolerance of Regenerating Codes, F. Oggier, A. Datta, The 11th IEEE International Conference on Peer-to-Peer Computing (P2P 2011).
  4. Redundantly Grouped Cross-object Coding for Repairable Storage, A. Datta, F. Oggier, The 3rd ACM SIGOPS Asia-Pacific Workshop on Systems (APSys 2012).
  5. Decentralized Erasure Coding for Efficient Data Archival in Distributed Storage Systems, L. Pamies-Juarez, F. Oggier, A. Datta, 14th International Conference on Distributed Computing and Networking (ICDCN 2013).
  6. Storage codes -- coding rate and repair locality, H. Hollmann, Invited paper at International Conference on Computing, Networking and Communications (ICNC 2013).
  7. RapidRAID: Pipelined Erasure Codes for Fast Data Archival in Distributed Storage Systems , L. Pamies-Juarez, A. Datta, F. Oggier, The 32nd IEEE International Conference on Computer Communications (Infocom 2013).
  8. Locally Repairable Codes with Multiple Repair Alternatives , L. Pamies-Juarez, H. Hollmann, F. Oggier, IEEE International Symposium on Information Theory (ISIT 2013).
  9. Characterizations and construction methods for linear functional-repair storage codes, H. Hollmann, W. Poh, IEEE International Symposium on Information Theory (ISIT 2013).
  10. CORE: Cross-Object Redundancy for Efficient Data Repair in Storage Systems, K. S. Esmaili, L. Pamies-Juarez, A. Datta, IEEE International Conference on Big Data (BigData 2013). Extended version.
  11. Efficient Updates in Cross-Object Erasure-Coded Storage Systems, K. S. Esmaili, A. Chiniah, A. Datta, Workshop on Distributed Storage Systems and Coding for BigData, 2013.
  12. InterCloud RAIDer: A Do It Yourself Multi-Cloud Private Data Backup System, C. W. Ling, A. Datta, 15th International Conference on Distributed Computing and Networks (ICDCN 2014). Best paper in Networking track

Preprints


  1. An Empirical Study of the Repair Performance of Novel Coding Schemes for Networked Distributed Storage Systems
    Lluis Pamies-Juarez, Frédérique Oggier, Anwitaman Datta
    Manuscript on arXiv, June 2012.

Patent


  1. Data encoding methods, Data decoding methods, Data reconstruction methods, Data encoding devices, Data decoding devices, and data reconstruction devices. F. Oggier, A. Datta, Granted in Singapore (April 2013, Patent No. 186781). Granted in US (January 2015, Patent No.: 8,928,503B2).

Talks and Courses (2014-Present)


Course


  • DigiCosme Spring School, Paris, France, May 2014: Coding for Networked Distributed Storage Systems by F. Oggier

Invited Talks


  • Talk at Bordeaux University (France), January 9 2014, Design of Codes for Repairability in Distributed Storage Systems, by F. Oggier
  • Talk at University of Melbourne (Australia), November 6 2014, Beyond repairability ­- Rethinking erasure codes for distributed storage, by A. Datta

Talks and Courses (2011-2013)


Courses


  • Open lectures for PhD students in computer science, Univ. of Warsaw, Poland, 2012: Coding Techniques for Distributed Storage Systems by F. Oggier
  • Lectures for PhD students in computer science, Univ. of Calcutta, India, 2013: Distributed Storage Systems and Applications of Coding Theory by F. Oggier & A. Datta

Invited Talks