alex band's List: Cloud Database

Dare Obasanjo aka Carnage4Life - Project Cassandra: Facebook's Open Source Alternative to Google BigTable 2

Jul 30, 08

www.25hoursaday.com/...ternativeToGoogleBigTable.aspx
- Cassandra has several optimizations to make writes cheaper. When a write operation occurs, it doesn't immediately cause a write to the disk. Instead the record is updated in memory and the write operation is added to the commit log. Periodically the list of pending writes is processed and write operations are flushed to disk. As part of the flushing process the set of pending writes is analyzed and redundant writes eliminated. Additionally, the writes are sorted so that the disk is written to sequentially thus significantly improving seek time on the hard drive and reducing the impact of random writes to the system. How important is improving seek time when accessing data on a hard drive? It can make the difference between taking hours versus days to flush a hundred gigabytes of writes to a disk. Disk is the new tape.
- The Cassandra data model is fairly straightforward. The entire system is a giant table with lots of rows. Each row is identified by a unique key. Each row has a column family, which can be thought of as the schema for the row. A column family can contain thousands of columns which are a tuple of {name, value, timestamp} and/or super columns which are a tuple of {name, column+} where column+ means one or more columns. This is very similar to the data model behind Google's BigTable.
- Facebook Cassandra - alex band on 2008-07-30
- Disk is the new tape - alex band on 2008-07-30
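
The write path described in the first annotation above can be sketched in a few lines. This is a toy model under stated assumptions, not Cassandra's actual code: the class name `WriteBuffer`, its attributes, and the list standing in for the disk are all hypothetical. It only shows the three ideas from the excerpt: writes go to a log and to memory, redundant writes are collapsed, and the flush emits keys in sorted order so the disk can be written sequentially.

```python
class WriteBuffer:
    """Toy commit-log-plus-memory write path (hypothetical, not Cassandra's code)."""

    def __init__(self):
        self.commit_log = []  # append-only record of every write, in arrival order
        self.memtable = {}    # in-memory state: key -> latest value
        self.flushed = []     # stands in for sorted runs written to disk

    def write(self, key, value):
        # A write touches only the log and memory; no data file is written yet.
        self.commit_log.append((key, value))
        # A later write to the same key supersedes the earlier one in memory.
        self.memtable[key] = value

    def flush(self):
        # The dict has already collapsed redundant writes; sorting by key lets
        # the run be written to disk in one sequential pass.
        run = sorted(self.memtable.items())
        self.flushed.append(run)
        self.memtable = {}
        self.commit_log = []  # log entries are not needed once the run is on disk
        return run


buf = WriteBuffer()
buf.write("user:42", "alice")
buf.write("user:07", "bob")
buf.write("user:42", "alice-v2")  # overwrites the first write to user:42
pending_log_entries = len(buf.commit_log)
run = buf.flush()
print(run)
```

Three writes reach the log, but only two records are flushed, and they come out ordered by key.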
Cloud Computing | DBMS2 -- DataBase Management System Services 1

Jul 30, 08

www.dbms2.com/...cloud-computing
- Facebook has open-sourced Project Cassandra,
Cloud Computing with bigdata: OSCON 2008 - O'Reilly Conferences, July 21 - 25, 2008, Portland, Oregon 3

Jul 30, 08

en.oreilly.com/...2933
- The services layer uses Jini for service registration and discovery, but SCA and OSGi integrations are being considered.
- bigdata is a 100% Java project providing scale-out (distributed) indices, map/reduce style computing, a sparse row store (à la Hadoop’s HBase, Google’s bigtable, or CouchDB), a distributed file system (à la Hadoop’s HDFS or Google’s GFS), a high performance RDF database, and a flexible generic object model (GOM) database.
- bigdata begins with a distributed index architecture and derives a high concurrency row store, a high performance semantic web database, a generic object database, and a distributed file system with atomic append from some basic operations on those indices.
- bigtable semantic and DFS - alex band on 2008-07-30
- The components of bigdata - alex band on 2008-07-30
Databases and the Cloud 3

Jul 29, 08

www.10gen.com/...databases-and-the-cloud
- Now we are working on the 10gen database, named Mongo
- Scalability: object databases are easier to scale than relational databases; sharding is easier. In a relational database, distributed joins are a complex problem that must be solved if one desires true plug-and-play scalability without limits
- Our approach with Mongo is in some ways similar, and in some ways different, from that of Amazon SimpleDB and Google BigTable. It is similar in that all three are non-relational. It is different in that Mongo is a true object database, rather than a key/value data store.
- Mongo, one of the prototypes of a cloud database - alex band on 2008-07-29
Apache CouchDB: The CouchDB Project 1

Jul 30, 08

incubator.apache.org/couchdb
- Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.
  
  CouchDB is written in Erlang, but can be easily accessed from any environment that provides means to make HTTP requests. There are a multitude of third-party client libraries that make this even easier for a variety of programming languages and environments.
- apache CouchDB - alex band on 2008-07-30
Apache CouchDB: Introduction 1

Jul 30, 08

incubator.apache.org/...intro.html
- What CouchDB is
  
  
  A document database server, accessible via a RESTful JSON API.
  
  Ad-hoc and schema-free with a flat address space.
  
  Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.
  
  Query-able and index-able, featuring a table oriented reporting engine that uses Javascript as a query language.
  
  
  What it is Not
  
  
  A relational database.
  
  A replacement for relational databases.
  
  An object-oriented database. Or more specifically, meant to function as a seamless persistence layer for an OO programming language.
- CouchDB - alex band on 2008-07-30
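
The "table-oriented view engine" over schema-free documents can be illustrated with a small in-memory simulation. This is only a sketch of the idea: CouchDB's own views are defined in JavaScript and served over HTTP, as the excerpts say, whereas the function names `map_posts_by_author` and `build_view` and the sample documents below are hypothetical.

```python
import json

# Schema-free documents: each is plain JSON and may carry different fields.
docs = [
    {"_id": "a1", "type": "post", "author": "kim", "title": "Hello"},
    {"_id": "b2", "type": "comment", "author": "lee"},
    {"_id": "c3", "type": "post", "author": "ann", "title": "Views"},
]


def map_posts_by_author(doc):
    """Hypothetical map function: emit (key, value) pairs for matching documents."""
    if doc.get("type") == "post":
        yield doc["author"], doc["title"]


def build_view(documents, map_fn):
    # Run the map function over every document and return the rows ordered
    # by key, which is what makes the result behave like an index.
    rows = [
        {"id": doc["_id"], "key": key, "value": value}
        for doc in documents
        for key, value in map_fn(doc)
    ]
    return sorted(rows, key=lambda row: row["key"])


view_rows = build_view(docs, map_posts_by_author)
print(json.dumps(view_rows, indent=2))
```

The comment document has no `title` field and is simply skipped by the map function; no schema change is needed to store it alongside the posts.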
Nimbus: About Us 1

Jul 30, 08

www.nimbusdata.com/...index.php
- Nimbus’ state-of-the-art Breeze unified iSCSI SAN and NAS storage systems, featuring the HALO storage operating system and 10 Gigabit Ethernet technology, provide a scalable, easy-to-use storage infrastructure for midsize enterprises focused on storage consolidation, server virtualization, and digital content management. With over 10,000 installations, Nimbus’ MySAN software is the world’s most popular open iSCSI target for Microsoft Windows servers.
- Nimbus IP storage - alex band on 2008-07-30
Hbase/HbaseArchitecture - Hadoop Wiki 3

Jul 30, 08

wiki.apache.org/...HbaseArchitecture
- Data Model
  
  HBase uses a data model very similar to that of Bigtable. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have crazily-varying columns, if the user likes.
  
  A column name has the form "<family>:<label>" where <family> and <label> can be any string you like. A single table enforces its set of <family>s (called "column families"). You can only adjust this set of families by performing administrative operations on the table. However, you can use new <label> strings at any write without preannouncing it. HBase stores column families physically close on disk. So the items in a given column family should have roughly the same write/read behavior.
  
  Writes are row-locked only. You cannot lock multiple rows at once. All row-writes are atomic by default.
  
  All updates to the database have an associated timestamp. HBase will store a configurable number of versions of a given cell. Clients can get data by asking for the "most recent value as of a certain time". Or, clients can fetch all available versions at once.
- Conceptual View
  
  Conceptually a table may be thought of as a collection of rows that are located by a row key (and optional timestamp) and where any column may not have a value for a particular row key (sparse). The following example is a slightly modified form of the one on page 2 of the Bigtable Paper.
- Physical Storage View
  
  Although, at a conceptual level, tables may be viewed as a sparse set of rows, physically they are stored on a per-column basis. This is an important consideration for schema and application designers to keep in mind.
  
  Pictorially, the table shown in the conceptual view above would be stored as follows:
- Hbase - alex band on 2008-07-30
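
The data model in the annotations above (a fixed set of column families, free-form labels, sparse rows, and timestamped cell versions) can be mimicked with nested dictionaries. This is a minimal sketch, not the HBase client API: `SparseTable`, `put`, `get`, and `max_versions` are hypothetical names, and the row key and values are made up for illustration.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory table; names are hypothetical, not HBase's API."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)     # fixed set, changed only administratively
        self.max_versions = max_versions  # configurable number of versions per cell
        # row key -> column name -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        family, _, _label = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cell = self.rows[row][column]     # new labels need no pre-announcement
        cell.append((timestamp, value))
        cell.sort(key=lambda tv: tv[0], reverse=True)
        del cell[self.max_versions:]      # drop the oldest versions beyond the limit

    def get(self, row, column, as_of=None):
        # Most recent value with timestamp <= as_of, or the newest if as_of is None.
        for timestamp, value in self.rows.get(row, {}).get(column, []):
            if as_of is None or timestamp <= as_of:
                return value
        return None


table = SparseTable(families=["contents", "anchor"])
table.put("com.example.www", "contents:html", "<html>v1</html>", timestamp=1)
table.put("com.example.www", "contents:html", "<html>v2</html>", timestamp=5)
table.put("com.example.www", "anchor:news.example.org", "Example", timestamp=3)

latest = table.get("com.example.www", "contents:html")
older = table.get("com.example.www", "contents:html", as_of=4)
missing = table.get("com.example.www", "anchor:other.example.org")
```

Here `latest` is the version written at timestamp 5, `older` is the one written at timestamp 1, and `missing` is `None` because a sparse row simply has no cell for that column.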
Hbase - Hadoop Wiki 1

Jul 30, 08

wiki.apache.org/Hbase
- Just as Google's Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop Core. Data is organized into tables, rows and columns. An Iterator-like interface is available for scanning through a row range (and of course there is the ability to retrieve a column value for a specific key). Any particular column may have multiple versions for the same row key.
Welcome to HBase! 1

Jul 30, 08

hadoop.apache.org/hbase
- HBase is the Hadoop database. It's an open-source, distributed, column-oriented store modeled after the Google paper, Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.
- Bigtable and Hbase - alex band on 2008-07-30

Amazon's Dynamo - All Things Distributed 8

Jul 30, 08

allthingsdistributed.com/...amazons_dynamo.html

Dynamo and similar Amazon technologies are used to power parts of our Amazon Web Services, such as S3.
This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

Table 1: Summary of techniques used in Dynamo and their advantages.

Problem	Technique	Advantage
Partitioning	Consistent Hashing	Incremental Scalability
High Availability for writes	Vector clocks with reconciliation during reads	Version size is decoupled from update rates.
Handling temporary failures	Sloppy Quorum and hinted handoff	Provides high availability and durability guarantee when some of the replicas are not available.
Recovering from permanent failures	Anti-entropy using Merkle trees	Synchronizes divergent replicas in the background.
Membership and failure detection	Gossip-based membership protocol and failure detection.	Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information.
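
The first row of the table pairs consistent hashing with incremental scalability. The sketch below shows why, under simplifying assumptions: one ring position per node (no virtual nodes), SHA-256 as an arbitrary hash choice, and hypothetical names throughout (`HashRing`, `ring_position`, the `node-*` and `cart:*` strings). It is not Dynamo's implementation.

```python
import bisect
import hashlib


def ring_position(name):
    # Map any string to a point on the ring using the first 8 bytes of its digest.
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with one position per node (hypothetical)."""

    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        point = ring_position(node)
        bisect.insort(self._points, point)
        self._owner[point] = node

    def node_for(self, key):
        # Walk clockwise from the key's position to the first node, wrapping around.
        index = bisect.bisect_right(self._points, ring_position(key))
        return self._owner[self._points[index % len(self._points)]]


keys = [f"cart:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.node_for(key) for key in keys}

ring.add("node-d")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Adding a node only reassigns the keys that fall between the new node and its predecessor on the ring; every other key stays where it was, which is the "incremental" part of the advantage column.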

Partitioning Algorithm
Replication
Data Versioning
Execution of get () and put () operations
Handling Failures: Hinted Handoff
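
The "vector clocks with reconciliation during reads" technique from the table can be sketched with plain dictionaries. This is an illustrative sketch only: the helper names (`increment`, `descends`, `compare`, `merge`) and the node labels `sx`, `sy`, `sz` are assumptions, and real systems also attach the clock to a stored value and bound its size.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update that clock b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "conflict"  # concurrent versions: the reader has to reconcile them


def merge(a, b):
    """Component-wise maximum, used after the application reconciles a conflict."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


v1 = increment({}, "sx")   # first write, handled by node sx
v2 = increment(v1, "sx")   # second write through the same node
v3 = increment(v2, "sy")   # one client updates v2 via node sy
v4 = increment(v2, "sz")   # another client updates v2 via node sz

print(compare(v2, v1))
print(compare(v3, v4))
reconciled = increment(merge(v3, v4), "sx")
print(reconciled)
```

`v2` cleanly supersedes `v1`, while `v3` and `v4` are concurrent, so a read returns both and the application decides how to combine them before writing the reconciled version back.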



List Info

alex band

11 items | 28 visits

Source material for the article "What kind of DBs we need in Cloud Computing Era"

Updated on Jul 30, 08
Created on Jul 30, 08

Category: Computers & Internet
