How FriendFeed uses MySQL to store schema-less data - Bret Taylor's blog - The Diigo Meta page

bret.appspot.com/...how-friendfeed-uses-mysql

Bookmarking History
Comments (3)

This link has been bookmarked by 433 people. It was first bookmarked on 27 Feb 2009, by Ted Louie.

01 Jun 15

linekin
imported-links mysql
01 Oct 14

Charlie Smith
SU databases
16 Dec 11

liukaiyang
mysql scalability nosql
28 Sep 11

movingahead
mysql hacking scalability delicious
Living Buddha
databases database storage MySQL couchdb web development python performance data scalability scaling architecture programming SQL schema-less from-delicious
27 Sep 11

alexgadea
entity can be stored on different shards than the entities themselves, consistency is an issue. What if the process crashes before it has written to all the index tables?

Building a transaction protocol was appealing to the most ambitious of FriendFeed e
Ed Lucas
mysql database scalability friendfeed architecture
florentin s
design performance database architecture
26 Aug 11

Mike King
mysql
18 Aug 11

Matthew York
mysql key-value
10 Aug 11

Claude Falguière
07 Jul 11

Vincent Tsao
mysql performance database scaling architecture
04 Jul 11

karthik katooru
mysql blog delicious
22 May 11

mysql friendfeed
21 May 11

C K
Web_Development MySQL NoSQL FriendFeed Scalability
27 Apr 11

quartzo
architecture database Development friendfeed hash JSON mysql performance PROGRAMMING scalability scaling schemaless Imported from del.icio.us
Laurentiu Ilie
mysql friendfed
26 Apr 11

mkalika
technology examined case-study architecture
06 Apr 11

Jeremy Frazzizle
mysql scalability delicious
20 Mar 11

timkeller
A slightly older article about mysql scaling. Worth reading as we scale UNITI Fireweb and Disasternet.

mysql database scalability performance architecture scaling
08 Mar 11

Jon Phipps
mysql database performance scalability architecture programming scaling
16 Feb 11

usingsystem
mysql schemaless database key-value-store tutorial
10 Feb 11

Christian Winkler
mysql nosql database schema
08 Feb 11

Seonhyu Kim
database
03 Feb 11

kidbombay
08 Jan 11

Max Cutler
mysql scaling friendfeed uuid
06 Jan 11

mysql database performance scalability architecture schema-less blob object
22 Nov 10

Paul Barry
mysql scalability database schemaless nosql
05 Nov 10
05 Oct 10

Matthew Boatman
architecture design programming software sql python development storage article blog scalability toread howto couchdb json scaling web data mysql database nosql schemaless performance databases sharding db friendfeed rdbms schema schema-less
26 Sep 10

Alex Ko
mysql scaling database friendfeed performance
15 Sep 10

yarmiky shilla
- As our database has grown, we have tried to iteratively deal with the scaling issues that come with rapid growth. We did the typical things, like using read slaves and memcache to increase read throughput and sharding our database to improve write throughput. However, as we grew, scaling our existing features to accommodate more traffic turned out to be much less of an issue than adding new features.
Aloysius
mysql database scalability performance sql tuning
- We index data in these entities by storing indexes in separate MySQL tables
- one for each index
10 Sep 10

Ronan Amicel
friendfeed mysql
08 Sep 10

earkivar
15 Aug 10

Jascha Dub
architecture data databases db design development mysql programming scalability sharding storage
19 Jul 10

Lathe Moriarty
mysql database scalability architecture
06 Jul 10

Mairbek Khadikov
highload performance scaling development
04 Jul 10

marcbertone
database sql scalability performance optimization
01 Jul 10

Xavier Gorse
mysql database scalability architecture
- eeds. For instance there are different key/value stores with varying characteristics and the document-based ones should get more stable over time. One option I think will get more and more interesting in the future is using a graph database engine like http://neo4j.org/ (which BTW is the re
victortrac
software architecture database mysql performance python data design
26 Jun 10

de Villamil Frédéric
friendfeed performance mysql database architecture scalability programming
Scott Hendrickson
21 Jun 10

znarfor
imported mysql NoSQL friendfeed
15 Jun 10

Yury Yurevich
mysql python performance nosql
30 Apr 10

mat tat
database mysql couchdb cassandra
21 Apr 10

chew barkla
toread friendfeed scalability mysql performance couchdb
10 Apr 10

oriolmari
sharded

database performance
31 Mar 10

Stefano T
howto
15 Mar 10

Karol T
scalability mysql sharding friendfeed
11 Mar 10

Yura Yatsuk
mysql tips programming database
beattakeshi
mysql performance scalability
jason prins
database design friendfeed
03 Mar 10

anonymous anonymous
Ok, here's the thing - most other databases do not have the same maintenance limitations that MySQL has that are the root cause of all of this.

nosql mysql
16 Feb 10

Imaginary Robots
sql scaling mysql nosql database performance
15 Feb 10

mahfuz rahman
dy MEDIUMBLOB, UNIQUE KEY (id),

mysql python friendfeed
design web software programming development blog howto mysql toread data article database performance python sql storage db architecture databases json scalability scaling schema friendfeed schemaless couchdb sharding nosql rdbms schema-less
10 Feb 10

Mark Heimann
mysql database scalability performance architecture
06 Feb 10

pinco pallo
imported mysql friendfeed scaling schema performance architecture database
R Chang
scalability sqlserveradministration performance
04 Feb 10

Sanghyun Park
mysql database architecture friendfeed scaling scalability performance
22 Jan 10

Arrix Z
friendfeed mysql schema nosql
- Lots of projects exist designed to tackle the problem of storing data with flexible schemas and building new indexes on the fly (e.g., CouchDB). However, none of them seemed widely-used enough by large sites to inspire confidence.
- MySQL works. It doesn't corrupt data. Replication works. We understand its limitations already. We like MySQL for storage, just not RDBMS usage patterns.
- Our datastore stores schema-less bags of properties (e.g., JSON objects or Python dictionaries). The only required property of stored entities is id, a 16-byte UUID. The rest of the entity is opaque as far as the datastore is concerned.
- We index data in these entities by storing indexes in separate MySQL tables.
- Indexes are stored in separate tables. To create a new index, we create a new table storing the attributes we want to index on all of our database shards.
- Our datastore automatically maintains indexes on your behalf.
- And we could populate the index asynchronously (even while serving live traffic) with:
  
  ./rundatastorecleaner.py --index=index_link
- Since our database is sharded, and indexes for an entity can be stored on different shards than the entities themselves, consistency is an issue. What if the process crashes before it has written to all the index tables?
  
  Building a transaction protocol was appealing to the most ambitious of FriendFeed engineers, but we wanted to keep the system as simple as possible.
- When we read from the index tables, we know they may not be accurate (i.e., they may reflect old property values if writing has not finished step 2). To ensure we don't return invalid entities based on the constraints above, we use the index tables to determine which entities to read, but we re-apply the query filters on the entities themselves rather than trusting the integrity of the indexes:
  
  Read the entity_id from all of the index tables based on the query
  
  Read the entities from the entities table from the given entity IDs
  
  Filter (in Python) all of the entities that do not match the query conditions based on the actual property values
- To ensure that indexes are not missing perpetually and inconsistencies are eventually fixed, the "Cleaner" process I mentioned above runs continuously over the entities table, writing missing indexes and cleaning up old and invalid indexes. It cleans recently updated entities first, so inconsistencies in the indexes get fixed fairly quickly (within a couple of seconds) in practice.
- We do shard our indexes. We query all the relevant index shards in parallel and over-fetch. The indexes are stored in sort order, so sorting is not an issue. To paginate, we fetch start + num and truncate in Python.
- For our last re-shard, we basically set up a parallel instance of our DB and wrote to both in parallel while we copied data over, then switched off the old system. Not optimal, certainly, but it worked for us.
10 more annotations...
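
The three read steps quoted above (read the entity_id values from the index tables, read the entities, filter in Python) can be sketched roughly as follows. This is a minimal illustration, not FriendFeed's code: sqlite3 stands in for MySQL so the example runs anywhere, JSON stands in for whatever serialization the datastore actually uses, and the function names `put_entity` and `find_by_link` and the `link` and `title` properties are assumptions. Only `index_link`, `entity_id`, and the entities table are named in the annotations.

```python
import json
import sqlite3
import uuid

# Assumption: sqlite3 is only a stand-in for MySQL so the sketch is runnable.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entities (id BLOB PRIMARY KEY, body TEXT)")
db.execute("CREATE TABLE index_link (link TEXT, entity_id BLOB)")


def put_entity(entity_id, properties):
    """Store the properties as an opaque serialized body keyed by the id."""
    db.execute(
        "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
        (entity_id, json.dumps(properties)),
    )


def find_by_link(link):
    """Look up entities by link without trusting the index rows."""
    # Step 1: read candidate entity ids from the index table.
    rows = db.execute(
        "SELECT entity_id FROM index_link WHERE link = ?", (link,)
    ).fetchall()
    matches = []
    for (entity_id,) in rows:
        # Step 2: read the entity itself.
        row = db.execute(
            "SELECT body FROM entities WHERE id = ?", (entity_id,)
        ).fetchone()
        if row is None:
            continue  # index row points at an entity that does not exist
        properties = json.loads(row[0])
        # Step 3: re-apply the query condition to the actual property value.
        if properties.get("link") == link:
            matches.append((entity_id, properties))
    return matches


# One consistent entity, and one whose index row is stale.
fresh_id = uuid.uuid4().bytes  # 16-byte UUID, as in the annotation
stale_id = uuid.uuid4().bytes
put_entity(fresh_id, {"link": "http://example.com/a", "title": "A"})
put_entity(stale_id, {"link": "http://example.com/b", "title": "B"})
db.execute("INSERT INTO index_link VALUES (?, ?)", ("http://example.com/a", fresh_id))
# Stale row: still claims stale_id has link /a, as if an update crashed midway.
db.execute("INSERT INTO index_link VALUES (?, ?)", ("http://example.com/a", stale_id))

found = find_by_link("http://example.com/a")
print([props["title"] for _, props in found])
```

Here the stale index row is read in step 1 but discarded in step 3, so only entity "A" is returned. A lookup for `http://example.com/b` returns nothing until its missing index row is written, which is the gap the "Cleaner" process described above is meant to close.
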
Dennis F
Since our databases are all heavily sharded, the relational features of MySQL like JOIN have never been useful to us, so we decided to look outside of the realm of RDBMS.

database software_architecture
30 Dec 09

Khanh Le
25 Dec 09

naim kazi
mysql scalability
23 Dec 09

kmng73
mysql schema design performance
21 Dec 09

Rodrigo de Oliveira
Our datastore stores schema-less bags of properties (e.g., JSON objects or Python dictionaries). The only required property of stored entities is id, a 16-byte UUID. The rest of the entity is opaque as far as the datastore is concerned. We can change the

blog performance mysql nosql schemaless banco_de_dados how-to dica índice from_delicious
18 Dec 09

Shiki Shiji
Bookmarks development design software performance architecture mysql database scalability scaling sql friendfeed schemaless schema storage
09 Dec 09

fulvius longhi
mysql strategy scalability
20 Nov 09

Jang Eui Jin
Storing schema-less data using MySQL
16 Nov 09

adelein rodriguez
todo
15 Nov 09

Paulo Gaspar
database nosql mysql case-study _wrk
08 Nov 09

Andrey Petrov
mysql database scalability peformance architecture friendfeed scaling schemaless
jdtbphc
mysql friendfeed database performance scaling programming scalability python
30 Oct 09

varkas
database web python forum nosql
kobusb
mysql friendfeed schemaless storage db sharding database sql performance scalability
29 Oct 09

Erhardt Graeff
database programming python friendfeed webecology
21 Oct 09

simon pasquier
mysql database scalability performance architecture
14 Oct 09

Ryan Baldwin
scalability
11 Oct 09

Seymour Cakes
mysql couchdb scalability architecture performance
04 Oct 09

S K
friendfeed mysql db schemaless scalability
25 Sep 09

osantana
database article architecture
11 Sep 09

rieman bren
mysql database scalability performance architecture friendfeed scaling programming delicious
Marc Carlucci
mysql database scalability friendfeed web development article howto sql python scaling schemaless storage db
Marcin Kasperski
schema-less mysql scalability architecture scaling friendfeed database model
citrin
mysql database friendfeed
10 Sep 09

Javier Neira
mysql database performance scalability architecture nosql
- making schema changes or adding indexes to a database with more than 10 - 20 million rows completely locks the database for hours at a time. Removing old indexes takes just as much time, and not removing them hurts performance because the database will continue to read and write to those unused blocks on every INSERT, pushing important blocks out of memory
- We can change the "schema" simply by storing new properties.
- We index data in these entities by storing indexes in separate MySQL tables. If we want to index three properties in each entity, we will have three MySQL tables - one for each index.
- We can store new properties and index them in a day's time rather than a week's time, and we don't need to swap MySQL masters and slaves or do any other scary operational work to make it happen.
- Since our database is sharded, and indexes for an entity can be stored on different shards than the entities themselves, consistency is an issue
3 more annotations...
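
The "one MySQL table per index" idea in the annotations above might look something like the sketch below. It is an illustration under stated assumptions rather than the author's implementation: sqlite3 stands in for MySQL, entity bodies are serialized as JSON, and `save`, `create_index`, `clean_index`, and the `user_id` property are hypothetical names. The `index_<property>` table naming follows the `index_link` name that appears elsewhere on this page.

```python
import json
import sqlite3
import uuid

# Assumption: sqlite3 is only a stand-in for MySQL so the sketch is runnable.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entities (id BLOB PRIMARY KEY, body TEXT)")


def save(entity_id, properties):
    """Store a schema-less bag of properties; new properties need no ALTER."""
    db.execute(
        "INSERT OR REPLACE INTO entities VALUES (?, ?)",
        (entity_id, json.dumps(properties)),
    )


def create_index(prop):
    """Create a separate table indexing one property."""
    # prop is interpolated into SQL, so accept only plain identifier names.
    if not prop.isidentifier():
        raise ValueError(f"bad property name: {prop!r}")
    db.execute(
        f"CREATE TABLE IF NOT EXISTS index_{prop} "
        f"({prop} TEXT, entity_id BLOB, PRIMARY KEY ({prop}, entity_id))"
    )


def clean_index(prop):
    """For every stored entity, replace its index rows with the current value."""
    table = f"index_{prop}"
    for entity_id, body in db.execute("SELECT id, body FROM entities").fetchall():
        value = json.loads(body).get(prop)
        db.execute(f"DELETE FROM {table} WHERE entity_id = ?", (entity_id,))
        if value is not None:
            db.execute(f"INSERT INTO {table} VALUES (?, ?)", (value, entity_id))


a, b = uuid.uuid4().bytes, uuid.uuid4().bytes
save(a, {"user_id": "u1", "link": "http://example.com/a"})
save(b, {"user_id": "u1"})  # no link property, so no row in index_link

# Two indexed properties, two index tables.
for prop in ("user_id", "link"):
    create_index(prop)
    clean_index(prop)

# Change a property, then let the cleaning pass repair the index.
save(a, {"user_id": "u1", "link": "http://example.com/new"})
clean_index("link")

links = [row[0] for row in db.execute("SELECT link FROM index_link")]
user_rows = db.execute("SELECT COUNT(*) FROM index_user_id").fetchone()[0]
print(links, user_rows)
```

Adding an index here never touches the entities table, which is the property the annotations credit with avoiding long locks. The sketch runs on a single database; it does not show sharding or the asynchronous population mentioned in the quotes.
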
02 Sep 09

dan
Fascinating hybrid of schemaless and schema'd high-performance datastore at FriendFeed: stashing indexed free(ish)-form serialised Python objects in the fields.

nosql mysql db python scalability
25 Aug 09

mysql database friendfeed architecture schemaless nosql
19 Aug 09

David Corking
- Our datastore stores schema-less bags of properties (e.g., JSON objects or Python dictionaries).
10 Aug 09

icy leaf
mysql
07 Aug 09

andrewclegg
Heh. Perhaps MySQL was a better schema-less hash store than RDBMS, all along.

database performance cath_watchlist
05 Aug 09

Kevin Ridgway
mysql couchdb scaling friendfeed howto data database toread
04 Aug 09

tapiokulmala
architecture sql
30 Jul 09

Reid Beckett
Bookmarks Bar LINKS
29 Jul 09

koen_h
mysql scalability toprint delicious
26 Jul 09

Hee Won Kim
mysql database scalability performance architecture FriendFeed scaling Programming
24 Jul 09

Navneet Kumar
a "schema-less" storage system on top of MySQL

Database MySQL Key-Value-Store Document-Database Architecture
- making schema changes or adding indexes to a database with more than 10 - 20 million rows completely locks the database for hours at a time. Removing old indexes takes just as much time, and not removing them hurts performance because the database will continue to read and write to those unused blocks on every INSERT, pushing important blocks out of memory
- InnoDB stores data rows physically in primary key order. The AUTO_INCREMENT primary key ensures new entities are written sequentially on disk after old entities, which helps for both read and write locality (new entities tend to be read more frequently than old entities since FriendFeed pages are ordered reverse-chronologically).
15 Jul 09

kaketoe kaketoe
programming development friendfeed mysql database
05 Jul 09

Don Do

Public Sticky Notes

Ken Wei on 2009-02-27

11111111111

Page Comments

Joel Liu on 2009-02-27

hi
Ken Wei on 2009-02-27

222222222

Top Tags

mysql
database
performance
