An Unorthodox Approach to Database Design : The Coming of the Shard | High Scalability

22 Apr 17

Daniel Dittmar

database scalability

14 Sep 16

patrick-unicorn

architecture database sharding patition

- High availability. If one box goes down the others still operate.
- Faster queries. Smaller amounts of data in each user group mean faster querying.
- More write bandwidth. With no master database serializing writes, you can write in parallel, which increases your write throughput. Writing is a major bottleneck for many websites.
- You can do more work. A parallel backend means you can do more work simultaneously. You can handle higher user loads, especially when writing data, because there are parallel paths through your system. You can load balance web servers, which access shards over different network paths, which are processed by separate CPUs, which use separate caches of RAM and separate disk IO paths to process work. Very few bottlenecks limit your work.
Replicating data from a master server to slave servers is a traditional approach to scaling. Data is written to a master server and then replicated to one or more slave servers. At that point read operations can be handled by the slaves, but all writes happen on the master.

Obviously the master becomes the write bottleneck and a single point of failure. And as load increases, the cost of replication increases. Replication costs CPU, network bandwidth, and disk IO. The slaves fall behind and have stale data. The folks at YouTube had a big problem with replication overhead as they scaled.
Some Problems With Sharding


07 Sep 15

perivr

Sharding

29 Aug 15

dzocco

From Google Chrome My Venture

04 Jun 15

udomsak

From Google Chrome Advance_Zimbra

01 Jun 15

linekin

imported-links

14 Nov 13

kjleng

database sharding scalability

10 Oct 13

Yuvaraj L

sharding database scaling performance shard flickr

08 Jul 13

Pap Tom

scalability sharding database architecture performance mysql scaling design

Data are denormalized. Traditionally we normalize data. Data are splayed out into anomaly-less tables and then joined back together again when they need to be used. In sharding the data are denormalized. You store together data that are used together.
Data are more highly available. Since the shards are independent, a failure in one doesn't cause a failure in another. And if you make each shard operate at 50% capacity, it's much easier to upgrade a shard in place. Keeping multiple data copies within a shard also helps with redundancy and makes the data more parallelized, so more work can be done on the data. You can also set up a shard to have a master-slave or dual-master relationship within the shard to avoid a single point of failure within the shard. If one server goes down, the other can take over.
Rebalancing data. What happens when a shard outgrows your storage and needs to be split? Let's say some user has a particularly large friends list that blows your storage capacity for the shard. You need to move the user to a different shard.
On some platforms I've worked on this is a killer problem. You had to build out the data center correctly from the start because moving data from shard to shard required a lot of downtime.
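One common way to make such moves cheaper is a lookup directory that maps each user to a shard, so that relocating a user means copying their data and changing one directory entry. The following is a minimal in-memory sketch of that idea, not the scheme used by any site mentioned here; `ShardDirectory`, `move_user`, and the dict-backed shards are hypothetical names chosen for illustration.

```python
class ShardDirectory:
    """Maps user ids to shard ids. Each shard is a plain dict standing in
    for a separate database server (an assumption for this sketch)."""

    def __init__(self, shard_count):
        self.shards = {i: {} for i in range(shard_count)}
        self.location = {}  # user_id -> shard_id

    def put(self, user_id, data):
        # New users get a default placement by modulo; known users stay put.
        shard_id = self.location.get(user_id, user_id % len(self.shards))
        self.shards[shard_id][user_id] = data
        self.location[user_id] = shard_id

    def get(self, user_id):
        return self.shards[self.location[user_id]][user_id]

    def move_user(self, user_id, target):
        source = self.location[user_id]
        if source == target:
            return
        # Copy first, then flip the directory entry, then delete the old copy,
        # so a reader always finds the data on the shard the directory names.
        self.shards[target][user_id] = self.shards[source][user_id]
        self.location[user_id] = target
        del self.shards[source][user_id]


directory = ShardDirectory(shard_count=2)
directory.put(7, {"friends": list(range(1000))})  # lands on shard 1 (7 % 2)
directory.move_user(7, target=0)                  # rebalance to shard 0
```

A real system would also have to handle writes that arrive during the copy, which this sketch ignores.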
Joining data from multiple shards. To create a complex friends page, or a user profile page, or a thread discussion page, you usually must pull together lots of different data from many different sources. With sharding you can't just issue a query and get back all the data. You have to make individual requests to your data sources, get all the responses, and then build the page. Thankfully, because of caching and fast networks, this process is usually fast enough that your page load times can be excellent.
Implementing shards is not well supported. Sharding is currently mostly a roll-your-own approach. LiveJournal makes their tool chain available. Hibernate has a library under development. MySQL has added support for partitioning. But in general it's still something you must implement yourself.
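The request-per-source pattern described above can be sketched as a small scatter-gather: issue the lookups concurrently, wait for all of them, then assemble the page. This is only an illustration under stated assumptions; the three dicts stand in for separate shard servers, and `fetch` and `build_profile_page` are hypothetical helpers, not part of any library named in the article.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each dict stands in for a separate database server.
PROFILE_SHARD = {42: {"name": "alice"}}
FRIENDS_SHARD = {42: [7, 9]}
COMMENTS_SHARD = {42: ["nice photo"]}


def fetch(shard, user_id, default):
    """Stand-in for a network request to one shard."""
    return shard.get(user_id, default)


def build_profile_page(user_id):
    # Scatter: send one request per data source in parallel.
    with ThreadPoolExecutor(max_workers=3) as pool:
        profile = pool.submit(fetch, PROFILE_SHARD, user_id, {})
        friends = pool.submit(fetch, FRIENDS_SHARD, user_id, [])
        comments = pool.submit(fetch, COMMENTS_SHARD, user_id, [])
        # Gather: block until every response is in, then join in the application.
        return {
            "profile": profile.result(),
            "friends": friends.result(),
            "comments": comments.result(),
        }


page = build_profile_page(42)
```

The join happens in application code rather than in SQL, which is why caching and network latency matter so much for page load time.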


23 May 13

guocai chen

sharding database scalability mysql

16 Mar 13

idpt 0000

mysql startup

12 Feb 13

Alejandro Alvarez

shard architecture

18 Nov 12

mohsensajjadi

sharding database scalability architecture performance scaling mysql distributed programming shared-nothing shared-disk cloud computing

27 Jun 12

mozhay

sharding database scalability

26 May 12

garfield mypet

sharding database scalability architecture performance scaling mysql

02 Jan 12

Goran Sander

sharding database mysql

18 Aug 11

Denis Guerrero

Article discussing DB sharding. Technique used by the big boys Facebook, Google, etc...

MySQL

13 Aug 11

Petri Tonteri

sharding tietokannat

07 Aug 11

Esfand S

db design sharding

08 Jun 11

Vinay R

Excellent articles on Sharding

database shard whatis resourcecenter

02 May 11

Daniel Bruges

Database Sharding scalability

24 Dec 10

Arrix Z

sharding

28 Jul 10

Harris Sun

high scalability architecture design database

16 May 10

Island Chen

sharding

20 Apr 10

databases db mysql partitioning scaling database design architecture shard sharding flickr

03 Mar 10

Javier Monterrubio

database scalability sharding architecture performance scaling programming

20 Nov 09

rieman bren

database scalability architecture performance scaling design optimization programming mysql delicious

10 Aug 09

Steven Dehandtschutter

database scalability optimization sharding distributed

16 Jun 09

Will Critchlow

scaling shard sharding delicious

18 May 09

zarkdav

mysql scalability

14 May 09

gstathis

database scalability sharding architecture performance scaling shard design optimization

10 May 09

Dante-Gabryell Monson

What is sharding?

While working at Auction Watch, Dathan got the idea to solve their scaling problems by creating a database server for a group of users and running those servers on cheap Linux boxes. In this scheme the data for User A is stored on one s

distributed p2p architecture programming database storage

20 Apr 09

databases database sysadmin webdev development programming scalability sharding shard

20 Mar 09

bmwbzz

dev db sharding

21 Feb 09

Panupan Sriautharawong

database sharding

11 Feb 09

digitalrinaldo

sharding

04 Feb 09

A S

imported Bookmarks_Menu database scalability sharding architecture performance scaling design

27 Jan 09

Rich Hintz

databases sharding

22 Jan 09

Jeff Stewart

RDBMS shards considerations scalability databases architecture partitioning

onceuponapriori

programming scaling database sharding

21 Jan 09

smeier

shard sharding

07 Jan 09

Sven Duzont

sharding database partitionning architecture scalability performance

12 Nov 08

Yushi H

database sharding web scalability architecture

09 Nov 08

Vivus Ignis

performance sharding

16 Oct 08

adrian kalaveshi

database flickr shard sharding scaling scalability to_read optimization shards databases

michelerallo

shard sharding

30 Aug 08

veverkap

web scalability for:gustin del

28 Aug 08

Pantelis Nasikas

SQL dev databases mysql scalability scaling distributed architecture performance cluster database engineering sysadmin data

27 Aug 08

Mark Masterson

shards databases architecture scalability flickr cloudcomputing datacenter cscsag catapult

25 Aug 08

Lee Parker

websoftware

fulvius longhi

Flickr now handles more than 1 billion transactions per day, responds in less than a few seconds, and can scale linearly at a low cost.
You can keep a user's profile data separate from their comments, blogs, email, media, etc, but the user profile data would be stored and retrieved as a whole. This is a very fast approach. You just get a blob and store a blob. No joins are needed and it can be written with one disk write.
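As a rough illustration of the "get a blob and store a blob" idea, the sketch below serializes a whole profile into one value keyed by user id. It assumes JSON as the serialization format and uses a dict named `profile_store` as a stand-in for one shard; both choices, and the function names, are hypothetical.

```python
import json

# Hypothetical key-value shard: user_id -> serialized profile blob.
profile_store = {}


def save_profile(user_id, profile):
    # The whole profile is written as a single value: one write, no joins.
    profile_store[user_id] = json.dumps(profile).encode("utf-8")


def load_profile(user_id):
    # The whole profile comes back in a single read.
    return json.loads(profile_store[user_id].decode("utf-8"))


save_profile(42, {"name": "alice", "location": "SF", "interests": ["photos"]})
profile = load_profile(42)
```

The trade-off is that individual fields cannot be queried or indexed by the database without deserializing the blob.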

09 Aug 08

johnwards

programming scalability database db sql sharding scaling mysql architecture shard

29 Jul 08

Karol T

sharding shard scalability scaling database

08 Jul 08

M G

23 Jun 08

eimaj42jdp

article database design sharding SQL TOREAD delicious

15 Jun 08

jayfkay

database sharding mysql

13 Jun 08

Tristan Rivoallan

database performance documentation clevermarks

dentharg

appengine architecture database scalability sharding performance scaling optimization databases cluster

03 Jun 08

Wen-Chun Ni

scalability clustering database design distributed architecture sharding performance

27 May 08

thesuffixed

database cluster

21 May 08

hungrypipo

database development architecture sharding scalability performance optimization design scaling

06 May 08

03 May 08

John L

architecture article development programming sql

24 Apr 08

Mike Barone

scalability databases performance sharding toread from:del.icio.us

22 Apr 08

Reuben Grinberg

database scalability sharding architecture scaling

Pedro Alves

Good explanation of sharding

database

05 Apr 08

Carl Dunham

database sharding scalability

01 Apr 08

David Czarnecki

database sharding

31 Jan 08

scalability

09 Oct 07

Ken Wei

architecture database scaling sharding

In sharding the data are denormalized. You store together data that are used together.

04 Oct 07

François Charoy

database design sharding scalability architecture performance scaling optimization for:momo54

03 Oct 07

Arvind

_new

23 Sep 07

Fernando Serer

mysql optimization performance escalabilidad

15 Sep 07

Edwin van Ouwerkerk Moria

database scalability sharding architecture performance

08 Sep 07

tvaananen

scalability architecture sharding database performance design

28 Aug 07

Navneet Kumar

Scalability Shards Storage Database-Design Database Architecture

16 Aug 07

Nicolas Perriault

What is sharding and how has it come to be the answer to large website scaling problems?

database scalability sharding architecture performance mysql cleverplanet

15 Aug 07

Steve Willer

Well, I have a name to put to the "mod 12" row-based data scaling technique, at least.

architecture database performance

13 Aug 07

s s

database scalability sharding

Olifante *

"Dathan Pattishall explains his motivation for a revolutionary new database architecture - sharding - that he began thinking about even before he worked at Friendster, and fully implemented at Flickr. Flickr now handles more than 1B transactions per day"

mysql sharding scalability friendster flickr

12 Aug 07

Chu Yeow Cheah

databases deployment scaling

08 Aug 07

Jay Luker

database scalability webapps mysql sfxdev for:sfxdev

07 Aug 07

Chrys R

database architecture scalability design

Mike King

DatabaseDesign Architecture ha DR scalability

tbenbrahim

research primeresearch

TooManySecrets

Excellent document describing the "sharding" technique as an alternative for optimal database scalability.

sharding

sharding database shard highscalabitity altaescalabilidad sysadmin documentacion

sharding

06 Aug 07

Frank Colcord

Database Ruby

05 Aug 07

ousiotic

database

avianto

development database scaling scalability performance

Mark Gardner

database scaling sharding scalability architecture performance design optimization

Jeff Giddens

data interesting howto

04 Aug 07

Miguel Angel Rasero

database scaling sharding architecture

Foo Foo

database scaling sharding scalability architecture performance optimization design distributed

pollotote

design mysql programming diy howto

pollo te

design mysql programming diy howto

jagdip singh

database db2 datawarehouse

ken .

"Solving" performance with horizontal and vertical scaling (bigger, more boxes) has limits - Dathan Pattishall on Sharding, linear costs, avoiding bottleneck of a single master writer and replication, using federated space, grouped/partitioned data

architecture computer database design flickr growth strategy
