I have read something similar before and wrote up a short summary:
http://web2.0coder.me/archives/630252
This link has been bookmarked by 49 people. It was first bookmarked on 06 May 2009 by feng bo.
-
29 May 11
-
Everyone who builds big applications builds them on CAP and BASE
-
compression - great gains in throughput, can store more, reduces IO bottleneck
-
single master - one node knows everything about all the other nodes (backed up and cached).
-
- row database - store objects together
- column database - store attributes of objects together. Makes sequential retrieval very fast, allows very efficient compression, reduces disks seeks and random IO.
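As a rough illustration of the row versus column distinction above, the sketch below pivots a few row records into per-attribute lists, scans a single column, and run-length encodes it. The records, field names, and helper functions are all hypothetical; real column stores use far more elaborate on-disk formats.

```python
# Hypothetical records; the field names are illustrative only.
rows = [
    {"id": 1, "name": "ann", "age": 31},
    {"id": 2, "name": "bob", "age": 27},
    {"id": 3, "name": "cho", "age": 31},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per attribute."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)

# Scanning one attribute touches only that column's list.
average_age = sum(columns["age"]) / len(columns["age"])

# Sorted column values compress well because equal values sit together.
compressed_age = run_length_encode(sorted(columns["age"]))
```

In the row layout, computing the average age would mean reading every full record; in the column layout only the "age" list is read.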
-
eventually consistent - append-only system using a row timestamp. When a client queries, it gets several versions and is in charge of picking the most recent.
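A minimal sketch of that idea, assuming an in-memory store and caller-supplied integer timestamps (the class and function names are made up for illustration): writes only ever append a new version, and the reader resolves which version is newest.

```python
class AppendOnlyStore:
    """Toy append-only store: every write adds a (timestamp, value) version."""

    def __init__(self):
        self._versions = {}

    def put(self, key, value, timestamp):
        # Never overwrite; just append another version.
        self._versions.setdefault(key, []).append((timestamp, value))

    def get_versions(self, key):
        # Return a copy of all versions seen so far, in arrival order.
        return list(self._versions.get(key, []))


def pick_latest(versions):
    """Client-side resolution: choose the value with the highest timestamp."""
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


store = AppendOnlyStore()
store.put("user:1", "alice@old.example", 100)
store.put("user:1", "alice@new.example", 105)
store.put("user:1", "alice@mid.example", 103)  # arrives late, but is older

latest = pick_latest(store.get_versions("user:1"))
```

Note that timestamp-based resolution assumes reasonably synchronized clocks, which is an assumption of this sketch rather than something the notes state.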
-
Uses consistent hashing to distribute data to one or more nodes for redundancy and performance.
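One common way to implement this is a hash ring with virtual nodes, sketched below. The class name, the choice of MD5, and the `vnodes` and `replicas` parameters are assumptions for illustration, not details taken from the system being described.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed on the ring at several points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def nodes_for(self, key, replicas=2):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        wanted = min(replicas, self._node_count)
        index = bisect.bisect(self._points, _hash(key))
        found = []
        while len(found) < wanted:
            node = self._ring[index % len(self._ring)][1]
            if node not in found:
                found.append(node)
            index += 1
        return found


ring = HashRing(["node-a", "node-b", "node-c"])
owners = ring.nodes_for("user:42", replicas=2)
```

The useful property is that adding a node only moves the keys that the new node takes over; every other key keeps its previous owner.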
-
Consistency between nodes is based on vector clocks and read repair.
-
Read repair - When a client does a read and the nodes disagree on the data, it's up to the client to select the correct data and tell the nodes the new correct state.
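The interaction between vector clocks and read repair can be sketched roughly as follows. Replicas are modelled as plain dicts mapping a key to a (value, clock) pair, and a clock is a dict of per-node counters; these shapes and the function names are assumptions made for the example.

```python
def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def read_with_repair(replicas, key):
    """Read a key from all replicas and repair stale copies when possible.

    Returns the list of versions that no other version supersedes. If there
    is exactly one, it is written back to every replica. If there are
    several, the writes were concurrent and the caller must merge them.
    """
    replies = [replica[key] for replica in replicas if key in replica]
    latest = []
    for value, clock in replies:
        dominated = any(
            descends(other, clock) and not descends(clock, other)
            for _, other in replies
        )
        if not dominated and (value, clock) not in latest:
            latest.append((value, clock))
    if len(latest) == 1:
        for replica in replicas:
            replica[key] = latest[0]  # tell the nodes the new correct state
    return latest


node1 = {"cart": ("v2", {"a": 2, "b": 1})}
node2 = {"cart": ("v1", {"a": 1, "b": 1})}  # stale copy
node3 = {}                                   # missed the write entirely

winners = read_with_repair([node1, node2, node3], "cart")
```

When two clocks are concurrent (neither descends from the other), this sketch deliberately leaves the replicas untouched and hands both versions back, which is where the "smart client" requirement comes from.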
-
Highly Available for Write
-
Clients have to be smart to handle read-repair
-
Not suitable for column-like workloads; it's just a key-value store
-
Distributed databases are the new web framework.
-
Pick one and start submitting patches. Don't start another half-baked clone.
-
Similar replication strategy to MySQL. Not useful for scalability as it limits the write throughput to one node.
-
It understands your values so you can operate on them
-
Can match on key spaces. You can look for all keys that match an expression.
-
Understands lists and sets.
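To make "understands your values" concrete, here is a toy in-memory store whose values can be lists or sets and whose keys can be matched with a glob-style pattern. It is a sketch only: the class, its method names, and the use of `fnmatch` patterns are assumptions and do not mirror any particular product's API.

```python
import fnmatch


class TypedStore:
    """Toy key-value store that knows about list and set values."""

    def __init__(self):
        self._data = {}

    def push(self, key, item):
        # Append to the list stored at key, creating it if needed.
        self._data.setdefault(key, []).append(item)

    def add(self, key, member):
        # Add to the set stored at key, creating it if needed.
        self._data.setdefault(key, set()).add(member)

    def intersect(self, key_a, key_b):
        # Because the store knows these are sets, it can combine them itself.
        return self._data.get(key_a, set()) & self._data.get(key_b, set())

    def keys(self, pattern):
        # Return all keys matching a glob-style expression, sorted.
        return sorted(k for k in self._data if fnmatch.fnmatchcase(k, pattern))


store = TypedStore()
store.add("user:1:tags", "python")
store.add("user:1:tags", "databases")
store.add("user:2:tags", "databases")
store.push("recent:logins", "user:2")

tag_keys = store.keys("user:*:tags")
shared_tags = store.intersect("user:1:tags", "user:2:tags")
```

With an opaque blob store, the client would have to fetch both values, decode them, and compute the intersection itself.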
-
It requires that the full data store fit in RAM
-
Documents can be nested, unlike CouchDB, which requires applications to maintain relationships.
-
Advantage is that the whole object doesn't have to be written and read because the system knows about the relationship.
-
Each column is stored separately, so IO is efficient as only the columns of interest are scanned. When using a column database you are almost always scanning the entire column.
-
Bitmap indexes for fast sequential scans.
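A bitmap index can be sketched with Python integers standing in for bit vectors: one bitmap per distinct column value, with bit i set when row i holds that value. The column data and function names below are hypothetical.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    index = {}
    for position, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << position)
    return index


def matching_rows(bitmap):
    """Expand a bitmap back into a list of row positions."""
    rows = []
    position = 0
    while bitmap:
        if bitmap & 1:
            rows.append(position)
        bitmap >>= 1
        position += 1
    return rows


# Two hypothetical columns over the same four rows.
country = ["us", "de", "us", "fr"]
status = ["active", "active", "inactive", "active"]

country_index = build_bitmap_index(country)
status_index = build_bitmap_index(status)

# WHERE country = 'us' AND status = 'active' becomes a single bitwise AND.
us_and_active = matching_rows(country_index["us"] & status_index["active"])

# WHERE (country = 'us' OR country = 'fr') AND status = 'active'
us_or_fr_active = matching_rows(
    (country_index["us"] | country_index["fr"]) & status_index["active"]
)
```

Predicates over low-cardinality columns reduce to bitwise operations, which is why such indexes suit sequential scans.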
-
No query language; generally need to iterate over each row using MapReduce to do queries
-
Only has an index for the row key
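With no query language and an index only on the row key, a query on any other attribute means visiting every row. The sketch below imitates that with an in-process map and reduce over a dict of rows; the table contents and function names are invented for illustration and this is not the Hadoop API.

```python
from collections import defaultdict


def map_reduce(rows, mapper, reducer):
    """Run mapper over every row, group its output by key, then reduce."""
    grouped = defaultdict(list)
    for row_key, row in rows.items():  # full scan: no secondary index to use
        for out_key, out_value in mapper(row_key, row):
            grouped[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in grouped.items()}


# Hypothetical table keyed by row key.
table = {
    "user:1": {"city": "berlin", "age": 31},
    "user:2": {"city": "paris", "age": 27},
    "user:3": {"city": "berlin", "age": 45},
}


def emit_city(row_key, row):
    # Emit one (city, 1) pair per row.
    yield row["city"], 1


def count(key, values):
    return sum(values)


users_per_city = map_reduce(table, emit_city, count)

# Lookup by row key is the only access path that avoids the scan.
user_two = table["user:2"]
```

The equivalent of `SELECT city, COUNT(*) ... GROUP BY city` costs a pass over the whole table here, whereas fetching by row key is a direct lookup.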
-
Even though both Yahoo (Pig) and Facebook (Hive) have their own analytics apps based on Hadoop, neither uses HBase for storage
-
And don't forget Yahoo Everest, which is basically an MPP column store for PostgreSQL
-
-
10 Mar 11
pshah2k: Partition Tolerance - if one or more nodes fail, the system still works and becomes consistent when the system comes on-line.
-
10 May 09
-
Yushi H: This talk explores the landscape of new technologies available today to augment your data layer to improve performance and reliability.