This link has been bookmarked by 433 people. It was first bookmarked on 27 Feb 2009, by Ted Louie.
-
01 Jun 15
-
01 Oct 14
-
16 Dec 11
-
28 Sep 11
-
27 Sep 11
alexgadea
... entity can be stored on different shards than the entities themselves, consistency is an issue. What if the process crashes before it has written to all the index tables?
Building a transaction protocol was appealing to the most ambitious of FriendFeed e...
-
26 Aug 11
-
18 Aug 11
-
10 Aug 11
-
07 Jul 11
-
04 Jul 11
-
22 May 11
-
21 May 11
-
27 Apr 11
-
26 Apr 11
-
06 Apr 11
-
20 Mar 11
timkeller
A slightly older article about MySQL scaling. Worth reading as we scale UNITI Fireweb and Disasternet.
-
08 Mar 11
-
16 Feb 11
-
10 Feb 11
-
08 Feb 11
-
03 Feb 11
-
08 Jan 11
-
06 Jan 11
-
22 Nov 10
-
05 Nov 10
-
05 Oct 10
-
26 Sep 10
-
15 Sep 10
-
As our database has grown, we have tried to iteratively deal with the scaling issues that come with rapid growth. We did the typical things, like using read slaves and memcache to increase read throughput and sharding our database to improve write throughput. However, as we grew, scaling our existing features to accommodate more traffic turned out to be much less of an issue than adding new features.
-
-
-
We index data in these entities by storing indexes in separate MySQL tables
-
one for each index
-
-
10 Sep 10
-
08 Sep 10
-
15 Aug 10
-
19 Jul 10
-
06 Jul 10
-
04 Jul 10
-
01 Jul 10
-
eeds. For instance there are different key/value stores with varying characteristics and the document-based ones should get more stable over time. One option I think will get more and more interesting in the future is using a graph database engine like http://neo4j.org/ (which BTW is the re
-
-
26 Jun 10
-
21 Jun 10
-
15 Jun 10
-
30 Apr 10
-
21 Apr 10
-
10 Apr 10
-
31 Mar 10
-
15 Mar 10
-
11 Mar 10
-
03 Mar 10
anonymous anonymous
Ok, here's the thing - most other databases do not have the same maintenance limitations that MySQL has that are the root cause of all of this.
-
16 Feb 10
-
15 Feb 10
-
10 Feb 10
-
06 Feb 10
-
04 Feb 10
-
22 Jan 10
-
Lots of projects exist designed to tackle the problem of storing data with flexible schemas and building new indexes on the fly (e.g., CouchDB). However, none of them seemed widely-used enough by large sites to inspire confidence.
-
MySQL works. It doesn't corrupt data. Replication works. We understand its limitations already. We like MySQL for storage, just not RDBMS usage patterns.
-
Our datastore stores schema-less bags of properties (e.g., JSON objects or Python dictionaries). The only required property of stored entities is id, a 16-byte UUID. The rest of the entity is opaque as far as the datastore is concerned.
-
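As a minimal sketch of what such an entity might look like in Python: the helper names `new_entity` and `serialize`, the example properties, and the choice of JSON as the stored encoding are all assumptions made for illustration, not details taken from the highlighted text.

```python
import json
import uuid


def new_entity(**properties):
    """Return a schema-less entity: a plain dict whose only required key is id."""
    entity = dict(properties)
    entity["id"] = uuid.uuid4().bytes  # 16 raw bytes
    return entity


def serialize(entity):
    """Hypothetical encoding: keep the id separate and store the rest as JSON text."""
    body = {k: v for k, v in entity.items() if k != "id"}
    return entity["id"], json.dumps(body, sort_keys=True)


# Example properties are invented; the datastore would treat them as opaque.
entity = new_entity(user_id="u1", title="We just launched a new backend system")
entity_id, body = serialize(entity)
```

Because the body is opaque to the store, adding a property later only means putting another key in the dictionary.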
We index data in these entities by storing indexes in separate MySQL tables.
-
Indexes are stored in separate tables. To create a new index, we create a new table storing the attributes we want to index on all of our database shards
-
Our datastore automatically maintains indexes on your behalf
-
And we could populate the index asynchronously (even while serving live traffic) with:
./rundatastorecleaner.py --index=index_link
-
Since our database is sharded, and indexes for an entity can be stored on different shards than the entities themselves, consistency is an issue. What if the process crashes before it has written to all the index tables?
Building a transaction protocol was appealing to the most ambitious of FriendFeed engineers, but we wanted to keep the system as simple as possible.
-
When we read from the index tables, we know they may not be accurate (i.e., they may reflect old property values if writing has not finished step 2). To ensure we don't return invalid entities based on the constraints above, we use the index tables to determine which entities to read, but we re-apply the query filters on the entities themselves rather than trusting the integrity of the indexes:
- Read the entity_id from all of the index tables based on the query
- Read the entities from the entities table from the given entity IDs
- Filter (in Python) all of the entities that do not match the query conditions based on the actual property values
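The three read steps can be sketched roughly as follows. This is an in-memory illustration only: the function name `query_by_user`, the `user_id` property, and the list/dict stand-ins for the index and entities tables are assumptions, not the FriendFeed implementation.

```python
def query_by_user(index_user_id, entities, user_id):
    """Look up entities by user_id without trusting the index rows."""
    # Step 1: read candidate entity IDs from the (possibly stale) index.
    candidate_ids = [eid for (uid, eid) in index_user_id if uid == user_id]
    # Step 2: read the entities for those IDs.
    candidates = [entities[eid] for eid in candidate_ids if eid in entities]
    # Step 3: re-apply the query condition to the actual property values.
    return [e for e in candidates if e.get("user_id") == user_id]


entities = {
    "a": {"id": "a", "user_id": "u1"},
    "b": {"id": "b", "user_id": "u2"},  # used to belong to u1
}
# The ("u1", "b") row is stale: it reflects the old property value.
index_user_id = [("u1", "a"), ("u1", "b"), ("u2", "b")]
result = query_by_user(index_user_id, entities, "u1")
```

Here the stale row makes entity "b" a candidate, but step 3 drops it, so only "a" is returned.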
-
To ensure that indexes are not missing perpetually and inconsistencies are eventually fixed, the "Cleaner" process I mentioned above runs continuously over the entities table, writing missing indexes and cleaning up old and invalid indexes. It cleans recently updated entities first, so inconsistencies in the indexes get fixed fairly quickly (within a couple of seconds) in practice.
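One pass of such a cleaner could look something like the sketch below. It is a simplification under stated assumptions: the index is modelled as an in-memory set of (value, entity_id) pairs, `clean` and `user_id` are hypothetical names, and the "recently updated first" ordering is left out.

```python
def clean(entities, index_user_id):
    """One pass: make the index set agree with the entities' actual user_id values."""
    expected = {
        (e["user_id"], eid) for eid, e in entities.items() if "user_id" in e
    }
    missing = expected - index_user_id  # rows that should exist but do not
    stale = index_user_id - expected    # rows left over from old property values
    index_user_id |= missing            # write missing index rows (in place)
    index_user_id -= stale              # remove old and invalid rows (in place)
    return missing, stale


entities = {"a": {"user_id": "u1"}, "b": {"user_id": "u2"}}
index_user_id = {("u1", "a"), ("u1", "b")}  # "b" is indexed under its old value
missing, stale = clean(entities, index_user_id)
```

Running the pass repeatedly is harmless: once the index matches the entities, both returned sets are empty.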
-
We do shard our indexes. We query all the relevant index shards in parallel and over-fetch. The indexes are stored in sort order, so sorting is not an issue. To paginate, we fetch start + num and truncate in Python.
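A rough sketch of that pagination scheme, assuming each shard returns rows already sorted in the same (here ascending) order. `fetch_from_shard` and `paginate` are hypothetical names, and the shards are plain lists standing in for index tables.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def fetch_from_shard(shard, limit):
    """Hypothetical per-shard read: the first `limit` rows, already in sort order."""
    return shard[:limit]


def paginate(shards, start, num):
    limit = start + num  # over-fetch: any single shard could hold the whole page
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = list(pool.map(lambda s: fetch_from_shard(s, limit), shards))
    merged = heapq.merge(*per_shard)  # inputs are sorted, so no full re-sort
    return list(islice(merged, start, limit))  # truncate to the requested page


shards = [[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]
page = paginate(shards, start=3, num=4)
```

The cost of over-fetching grows with `start`, since every shard is asked for start + num rows regardless of how deep the page is.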
-
For our last re-shard, we basically set up a parallel instance of our DB and wrote to both in parallel while we copied data over, then switched off the old system. Not optimal, certainly, but it worked for us.
-
-
Dennis F
Since our databases are all heavily sharded, the relational features of MySQL like JOIN have never been useful to us, so we decided to look outside of the realm of RDBMS.
-
30 Dec 09
-
25 Dec 09
-
23 Dec 09
-
21 Dec 09
Rodrigo de Oliveira
Our datastore stores schema-less bags of properties (e.g., JSON objects or Python dictionaries). The only required property of stored entities is id, a 16-byte UUID. The rest of the entity is opaque as far as the datastore is concerned. We can change the
blog performance mysql nosql schemaless banco_de_dados how-to dica índice from_delicious
-
18 Dec 09
-
09 Dec 09
-
20 Nov 09
-
16 Nov 09
-
15 Nov 09
-
08 Nov 09
-
30 Oct 09
-
29 Oct 09
-
21 Oct 09
-
14 Oct 09
-
11 Oct 09
-
04 Oct 09
-
25 Sep 09
-
11 Sep 09
-
10 Sep 09
-
making schema changes or adding indexes to a database with more than 10 - 20 million rows completely locks the database for hours at a time. Removing old indexes takes just as much time, and not removing them hurts performance because the database will continue to read and write to those unused blocks on every INSERT, pushing important blocks out of memory
-
We can change the "schema" simply by storing new properties.
-
We index data in these entities by storing indexes in separate MySQL tables. If we want to index three properties in each entity, we will have three MySQL tables - one for each index.
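To make the one-table-per-index idea concrete, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a single MySQL shard. The `index_user_id` table, its columns, the JSON body, and the write order (entity first, then index row) are assumptions for illustration; only the names `entities` and `entity_id` appear in the highlighted text.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")  # stand-in for one MySQL shard
conn.execute("CREATE TABLE entities (id BLOB PRIMARY KEY, body TEXT NOT NULL)")
# One table per indexed property; this one indexes a hypothetical user_id property.
conn.execute(
    "CREATE TABLE index_user_id ("
    " user_id TEXT NOT NULL,"
    " entity_id BLOB NOT NULL,"
    " PRIMARY KEY (user_id, entity_id))"
)


def put(entity):
    """Write the entity, then the index row for its user_id (if it has one)."""
    body = json.dumps({k: v for k, v in entity.items() if k != "id"})
    conn.execute("INSERT OR REPLACE INTO entities VALUES (?, ?)", (entity["id"], body))
    if "user_id" in entity:
        conn.execute(
            "INSERT OR REPLACE INTO index_user_id VALUES (?, ?)",
            (entity["user_id"], entity["id"]),
        )
    conn.commit()


e = {"id": uuid.uuid4().bytes, "user_id": "u1", "link": "http://example.com/"}
put(e)
rows = conn.execute(
    "SELECT entity_id FROM index_user_id WHERE user_id = ?", ("u1",)
).fetchall()
```

Indexing a second property would mean creating another table of the same shape, without altering the entities table. In a sharded deployment the two writes could land on different machines, which this single-connection sketch does not model.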
-
We can store new properties and index them in a day's time rather than a week's time, and we don't need to swap MySQL masters and slaves or do any other scary operational work to make it happen.
-
Since our database is sharded, and indexes for an entity can be stored on different shards than the entities themselves, consistency is an issue
-
-
02 Sep 09
-
25 Aug 09
-
19 Aug 09
-
Our datastore stores schema-less bags of properties (e.g., JSON objects or Python dictionaries).
-
-
10 Aug 09
-
07 Aug 09
andrewclegg
Heh. Perhaps MySQL was a better schema-less hash store than RDBMS, all along.
-
05 Aug 09
-
04 Aug 09
-
30 Jul 09
-
29 Jul 09
-
26 Jul 09
-
24 Jul 09
Navneet Kumar
a "schema-less" storage system on top of MySQL
Database MySQL Key-Value-Store Document-Database Architecture
-
making schema changes or adding indexes to a database with more than 10 - 20 million rows completely locks the database for hours at a time. Removing old indexes takes just as much time, and not removing them hurts performance because the database will continue to read and write to those unused blocks on every INSERT, pushing important blocks out of memory
-
InnoDB stores data rows physically in primary key order. The AUTO_INCREMENT primary key ensures new entities are written sequentially on disk after old entities, which helps for both read and write locality (new entities tend to be read more frequently than old entities since FriendFeed pages are ordered reverse-chronologically).
-
-
15 Jul 09
-
05 Jul 09