BitDew: Open Source Data Management for Grid, Desktop Grid and Cloud Computing

Gilles Fedak, Haiwu He / INRIA-Futurs

What is BitDew?

The BitDew framework is a programmable environment for management and distribution of data for Grid, Desktop Grid and Cloud Systems.

BitDew is a subsystem that can be easily integrated into large-scale computational systems (XtremWeb, BOINC, Hadoop, Condor, Glite, Unicore, etc.). Our approach is to break the "data wall" by providing, in a single package, the key P2P technologies (DHT, BitTorrent) and high-level programming interfaces. We first target Desktop Grids with peta-scale data systems: up to 1K files per node, with sizes up to 1 GB, distributed to 10K to 100K nodes.
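
To give a sense of the scale these targets imply, the following back-of-the-envelope calculation multiplies the upper bounds quoted above. It is only a sketch: it assumes that every node holds the maximum number of files, that every file has the maximum size, and that units are decimal (1 GB = 10^9 bytes). The function name is illustrative and not part of BitDew.

```python
# Upper-bound estimate of the aggregate data volume from the stated targets.
# Assumption: every node holds the maximum file count at the maximum file size.

GB = 10**9   # decimal gigabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def aggregate_bytes(nodes: int, files_per_node: int = 1_000,
                    file_size_bytes: int = 1 * GB) -> int:
    """Return the total bytes stored if every node is filled to the limit."""
    return nodes * files_per_node * file_size_bytes


if __name__ == "__main__":
    # The two ends of the targeted node range.
    for nodes in (10_000, 100_000):
        total = aggregate_bytes(nodes)
        print(f"{nodes:>7} nodes -> {total // PB} PB")
```

Under these assumptions each node holds at most 1 TB, so the targeted range of 10K to 100K nodes corresponds to an aggregate in the tens of petabytes.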

The BitDew framework will support data-intensive parameter-sweep applications, long-running applications that require distributed checkpoint services, workflow applications and, possibly in the future, soft real-time and stream-processing applications.

What Can I Do with BitDew?

BitDew offers programmers a simple API for creating, accessing, storing and moving data with ease, even in highly dynamic and volatile environments.

The BitDew programming model relies on five abstractions to manage data: i) replication indicates how many copies of a data item should be available at the same time on the network; ii) fault-tolerance controls the policy applied when a machine crashes; iii) lifetime is an attribute, either absolute or relative to the existence of other data, that determines the life cycle of a data item in the system; iv) affinity drives the movement of data according to dependency rules; v) protocol gives the runtime environment hints about which protocol to use to distribute the data (HTTP, FTP or BitTorrent). Programmers define these simple criteria for each data item and let the BitDew runtime environment manage data creation, deletion, movement, replication and fault-tolerance operations.
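
One way to picture the five criteria is as a small record attached to each data item, which a runtime consults when deciding what to do next. The sketch below is not the BitDew API: the class, field and function names are hypothetical, and the rule used for fault-tolerance (copies on crashed hosts are replaced only when the flag is set) is an assumed reading of the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TransferProtocol(Enum):
    """The three protocols named in the text."""
    HTTP = "http"
    FTP = "ftp"
    BITTORRENT = "bittorrent"


@dataclass(frozen=True)
class DataAttributes:
    """Hypothetical container for the five per-data criteria."""
    replication: int = 1                      # copies wanted at the same time
    fault_tolerant: bool = False              # replace copies lost to crashes?
    lifetime_seconds: Optional[float] = None  # absolute lifetime; None = unbounded
    lifetime_tied_to: Optional[str] = None    # relative lifetime: another data item
    affinity: Optional[str] = None            # data item this one should follow
    protocol: TransferProtocol = TransferProtocol.HTTP  # transfer hint


def copies_to_schedule(attrs: DataAttributes, live: int, crashed: int = 0) -> int:
    """Return how many new copies a scheduler would need to place.

    Assumed policy: with fault tolerance, only live copies count towards
    the replication target; without it, copies on crashed hosts still
    count as placed and are not replaced.
    """
    placed = live if attrs.fault_tolerant else live + crashed
    return max(0, attrs.replication - placed)
```

For example, with a replication target of 3, two live copies and one copy lost to a crash, this policy schedules one new copy when the data item is fault-tolerant and none otherwise.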

BitDew Architecture

The BitDew runtime environment is a flexible environment implementing the API. It relies on both centralized and distributed protocols for indexing, storage and transfers, providing reliability, scalability and high performance.

The architecture follows a classical three-tier schema commonly found in Desktop Grids: it divides the world into two sets of nodes, stable nodes and volatile nodes. Stable nodes run various independent services which compose the runtime environment: Data Repository (DR), Data Catalog (DC), Data Transfer (DT) and Data Scheduler (DS). We call these nodes the service hosts. Volatile nodes can either ask for storage resources (we call them client hosts) or offer their local storage (we call them reservoir hosts). Usually, programmers will not use the various D* services directly; instead they will use the API, which in turn hides the complexity of the internal protocols.
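
The idea of an API that hides the D* services can be illustrated with a small facade. This is a toy, in-memory sketch under stated assumptions: all class and method names are hypothetical, each service is reduced to a dictionary or a list, and nothing here reflects the actual BitDew protocols or signatures.

```python
import hashlib


class DataCatalog:
    """Stand-in for the DC service: metadata lookup by data id."""
    def __init__(self):
        self.entries = {}

    def register(self, data_id, name, size):
        self.entries[data_id] = {"name": name, "size": size}


class DataRepository:
    """Stand-in for the DR service: holds the content itself."""
    def __init__(self):
        self.blobs = {}


class DataTransfer:
    """Stand-in for the DT service: moves bytes and logs each transfer."""
    def __init__(self):
        self.log = []

    def upload(self, repo, data_id, content):
        repo.blobs[data_id] = content
        self.log.append(("upload", data_id))

    def download(self, repo, data_id):
        self.log.append(("download", data_id))
        return repo.blobs[data_id]


class DataScheduler:
    """Stand-in for the DS service: remembers which host should hold what."""
    def __init__(self):
        self.placement = {}

    def assign(self, data_id, host):
        self.placement.setdefault(data_id, set()).add(host)


class DataApi:
    """Hypothetical facade: the only object a client program touches."""
    def __init__(self):
        self.dc, self.dr = DataCatalog(), DataRepository()
        self.dt, self.ds = DataTransfer(), DataScheduler()

    def put(self, name, content):
        # Derive an id from the content, record metadata, then store the bytes.
        data_id = hashlib.sha1(content).hexdigest()
        self.dc.register(data_id, name, len(content))
        self.dt.upload(self.dr, data_id, content)
        return data_id

    def schedule(self, data_id, reservoir_host):
        # Ask the scheduler to place the data on a reservoir host.
        self.ds.assign(data_id, reservoir_host)

    def get(self, data_id):
        return self.dt.download(self.dr, data_id)
```

A client host would call only `put`, `schedule` and `get`; which service handles each step stays an internal detail that could change without affecting client code.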

The BitDew runtime environment delegates a large number of operations to third-party components: 1) metadata is serialized using a traditional SQL database, 2) data transfers are performed out-of-band by specialized file transfer protocols, and 3) publishing and look-up of data replicas is enabled by means of DHT protocols. One feature of the system is that all of these components can be replaced and plugged in by users, allowing them to select the most suitable subsystem according to their own criteria, such as performance, reliability and scalability.
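
The replaceable-component idea can be sketched for the metadata case: the runtime depends only on a small interface, and any back-end that satisfies it can be plugged in. The interface, class names and table layout below are assumptions made for illustration (using Python's standard `sqlite3` module as the SQL database); they do not describe BitDew's actual schema or plug-in mechanism.

```python
import sqlite3
from typing import Optional, Protocol


class MetadataStore(Protocol):
    """Hypothetical interface every metadata back-end must satisfy."""
    def save(self, data_id: str, name: str) -> None: ...
    def load(self, data_id: str) -> Optional[str]: ...


class SqliteMetadataStore:
    """SQL-backed store; the table layout is illustrative only."""
    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS data (id TEXT PRIMARY KEY, name TEXT)")

    def save(self, data_id: str, name: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO data (id, name) VALUES (?, ?)",
            (data_id, name))
        self.conn.commit()

    def load(self, data_id: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM data WHERE id = ?", (data_id,)).fetchone()
        return row[0] if row else None


class DictMetadataStore:
    """Drop-in replacement that keeps everything in memory."""
    def __init__(self):
        self.rows = {}

    def save(self, data_id: str, name: str) -> None:
        self.rows[data_id] = name

    def load(self, data_id: str) -> Optional[str]:
        return self.rows.get(data_id)


class Runtime:
    """Toy runtime that works with whichever store it is given."""
    def __init__(self, metadata: MetadataStore):
        self.metadata = metadata

    def register(self, data_id: str, name: str) -> Optional[str]:
        # Store the metadata, then read it back through the same interface.
        self.metadata.save(data_id, name)
        return self.metadata.load(data_id)
```

Swapping `SqliteMetadataStore()` for `DictMetadataStore()` when constructing `Runtime` changes the storage back-end without touching the runtime code; the same pattern would apply to transfer protocols and replica indexes.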

Download and Licence

BitDew is still in a development phase. We have released several versions under the GPL-v3 or CeCill-v2.0 licence (at the user's choice). Be warned that this program is unstable, lacks documentation and should only be used by experienced developers. However, CS students and researchers may try it for experimentation or education.

Publications

BitDew: A Programmable Environment for Large-Scale Data Management and Distribution. Gilles Fedak, Haiwu He and Franck Cappello. Published at SC'08, Austin, Texas, 2008.

If you cite BitDew, please use the following citation:

 @inproceedings{bitdew,
 author = {Fedak, Gilles and He, Haiwu and Cappello, Franck},
 title = {BitDew: a programmable environment for large-scale data management and distribution},
 booktitle = {SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing},
 year = {2008},
 isbn = {978-1-4244-2835-9},
 pages = {1--12},
 location = {Austin, Texas},
 publisher = {IEEE Press},
 address = {Piscataway, NJ, USA},
 }

BitDew: A data management and distribution service with multi-protocol file transfer and metadata abstraction. Gilles Fedak, Haiwu He and Franck Cappello. Journal of Network and Computer Applications, Volume 32, Issue 5, September 2009, pages 961-975.

A File Transfer Service with Client/Server, P2P and Wide Area Storage Protocols. Gilles Fedak, Haiwu He and Franck Cappello. Proceedings of the First International Conference on Data Management in Grid and P2P Systems (Globe'08), LNCS, Turin, 2008.

BLAST Application with Data-aware Desktop Grid Middleware. Haiwu He, Gilles Fedak, Bing Tang and Franck Cappello. Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid'09), pages 284-291, Shanghai, China.