This link has been bookmarked by 103 people. It was first bookmarked on 15 Sep 2007, by Marko Anastasov.
-
07 Sep 15
-
04 Jun 15
-
27 Apr 15
-
Squid in reverse-proxy for HTML and images.
-
Use dedicated servers for static content.
-
The central database includes data like the 'users' table, which includes primary user keys (a few different IDs) and a pointer to which shard a user's data can be found on.
-
Create a search farm by replicating the portion of the database they want to search.
-
Earlier they suffered from Master-Slave lag.
-
Lots of excellent material on capacity planning. Take a look in the Information Sources for more details.
-
If a host is down, go to the next host in the list; if all hosts are down, display an error page.
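The failover rule in this highlight can be sketched roughly as follows. This is a minimal illustration, not Flickr's code: `fetch_with_failover`, `render_page`, `AllHostsDown` and the `fetch` callback are hypothetical names, and a raised `ConnectionError` is assumed to be how a down host shows up.

```python
from typing import Callable, Sequence


class AllHostsDown(Exception):
    """Raised when every host in the list has failed (hypothetical name)."""


def fetch_with_failover(hosts: Sequence[str], fetch: Callable[[str], str]) -> str:
    """Try each host in order and return the first successful result."""
    for host in hosts:
        try:
            return fetch(host)
        except ConnectionError:
            # Assumption: a down host surfaces as ConnectionError.
            continue
    raise AllHostsDown("all hosts are down")


def render_page(hosts: Sequence[str], fetch: Callable[[str], str]) -> str:
    """Return the page body, or an error page if no host answered."""
    try:
        return fetch_with_failover(hosts, fetch)
    except AllHostsDown:
        return "error page"
```

Keeping the loop separate from `render_page` means the "display an error page" decision stays in the page layer, while the host list can be reordered or extended without touching it.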
-
Each server in a shard is 50% loaded.
-
Each page averages 27-35 SQL queries.
-
- A lot of data is stored twice. For example, a comment is part of the relation between the commentor and the commentee.
-
Data size is at 12 TB of user metadata (these are not photos, this is just InnoDB ibdata files - the photos are a lot larger).
-
Keeping staggered backups is good for when you discover something gone wrong a few days later.
-
something like 1, 2, 10 and 30 day backups.
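One way to read the staggered schedule is "for each target age, keep the newest backup that is at least that old". The sketch below assumes exactly that rule and assumes backups are identified by calendar date; `backups_to_keep` is a hypothetical helper, not something described in the article.

```python
from datetime import date, timedelta
from typing import Iterable, Set, Tuple


def backups_to_keep(
    backup_dates: Iterable[date],
    today: date,
    ages: Tuple[int, ...] = (1, 2, 10, 30),
) -> Set[date]:
    """Pick, for each target age in days, the newest backup at least that old."""
    dates = list(backup_dates)
    keep: Set[date] = set()
    for age in ages:
        cutoff = today - timedelta(days=age)
        candidates = [d for d in dates if d <= cutoff]
        if candidates:
            # The newest backup that is still old enough for this slot.
            keep.add(max(candidates))
    return keep
```

With daily backups this keeps four copies; anything not in the returned set is a candidate for deletion, so a problem discovered a few days late can still be rolled back to an older copy.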
-
Upon upload, it processes the photos, gives you different sizes, then it's complete.
-
Tags do not fit well with traditional normalized RDBMS schema design.
-
Denormalization or heavy caching
-
Some of their data views are calculated offline by dedicated processing clusters which save the results into MySQL
-
REST APIs, SOAP APIs, RSS feeds, Atom feeds
-
Statelessness makes for a simpler, more robust system
-
Bring capacity planning into the product discussion EARLY.
-
Create clear levels of abstraction between database work, business logic, page logic, page mark-up and the presentation layer.
-
Forget about small efficiencies, about 97% of the time.
-
Test in production.
-
Find ceilings.
-
- Do you have event-related growth?
-
-
12 Apr 15
-
'users' table
-
a pointer to which shard a user's data
-
a share nothing architecture
-
Statelessness
-
Shards: My data gets stored on my shard
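The highlights above (a central 'users' table holding a pointer to each user's shard, and "my data gets stored on my shard") can be illustrated with in-memory dictionaries standing in for MySQL. Everything named here (`USER_TO_SHARD`, `SHARDS`, `shard_for`, `add_photo`, `add_comment`) is hypothetical; the double write in `add_comment` follows the earlier highlight that a comment is stored with both the commenter and the commentee.

```python
from collections import defaultdict

# Stand-in for the central 'users' table: user_id -> shard_id (assumed layout).
USER_TO_SHARD = {1: 0, 2: 1}

# Stand-in for the shards themselves: shard_id -> {(kind, user_id): [rows]}.
SHARDS = {0: defaultdict(list), 1: defaultdict(list)}


def shard_for(user_id: int) -> defaultdict:
    """Look up the user's shard through the central mapping."""
    return SHARDS[USER_TO_SHARD[user_id]]


def add_photo(user_id: int, title: str) -> None:
    """A user's photos live only on that user's shard."""
    shard_for(user_id)[("photos", user_id)].append(title)


def add_comment(commenter_id: int, owner_id: int, text: str) -> None:
    """Store the comment twice: once per participant's shard."""
    row = (commenter_id, owner_id, text)
    shard_for(commenter_id)[("comments_made", commenter_id)].append(row)
    shard_for(owner_id)[("comments_received", owner_id)].append(row)
```

The trade-off in this sketch is the usual one for denormalized shards: reads for either user touch a single shard, at the cost of two writes per comment.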
-
-
10 Jun 14
-
14 Dec 12
-
04 Aug 12
-
27 Jun 12
-
02 Apr 12
-
20 Mar 12
-
Platform
- PHP
- MySQL
- Shards
- Memcached for a caching layer.
- Squid in reverse-proxy for HTML and images.
- Linux (RedHat)
- Smarty for templating
- Perl
- PEAR for XML and Email parsing
- ImageMagick, for image processing
- Java, for the node service
- Apache
- SystemImager for deployment
- Ganglia for distributed system monitoring
- Subcon stores essential system configuration files in a subversion repository for easy deployment to machines in a cluster.
- Cvsup for distributing and updating collections of files across a network.
-
Lessons Learned
- Think of your application as more than just a web application. You'll have REST APIs, SOAP APIs, RSS feeds, Atom feeds, etc.
- Go stateless. Statelessness makes for a simpler, more robust system that can handle upgrades without flinching.
- Re-architecting your database sucks.
- Capacity plan. Bring capacity planning into the product discussion EARLY. Get buy-in from the $$$ people (and engineering management) that it’s something to watch.
- Start slow. Don’t buy too much equipment just because you’re scared/happy that your site will explode.
- Measure reality. Capacity planning math should be based on real things, not abstract ones.
- Build in logging and metrics. Usage stats are just as important as server stats. Build in custom metrics that relate real-world usage to server-based stats.
- Cache. Caching and RAM is the answer to everything.
- Abstract. Create clear levels of abstraction between database work, business logic, page logic, page mark-up and the presentation layer. This supports quick turn around iterative development.
- Layer. Layering allows developers to create page level logic which designers can use to build the user experience. Designers can ask for page logic as needed. It's a negotiation between the two parties.
- Release frequently. Even every 30 minutes.
- Forget about small efficiencies, about 97% of the time. Premature optimization is the root of all evil.
- Test in production. Build into the architecture mechanisms (config flags, load balancing, etc.) with which you can deploy new hardware easily into (and out of) production.
- Forget benchmarks. Benchmarks are fine for getting a general idea of capabilities, but not for planning. Artificial tests give artificial results, and the time is better used with testing for real.
- Find ceilings.
- What is the maximum something that every server can do?
- How close are you to that maximum, and how is it trending?
- MySQL (disk IO?)
- SQUID (disk IO? or CPU?)
- memcached (CPU? or network?)
- Be sensitive to the usage patterns for your type of application.
- Do you have event-related growth? For example: disaster, news event.
- Flickr gets 20-40% more uploads on the first work day of the year than any previous peak the previous year.
- 40-50% more uploads on Sundays than the rest of the week, on average.
- Be sensitive to the demands of exponential growth. More users means more content, more content means more connections, more connections mean more usage.
- Plan for peaks. Be able to handle peak loads up and down the stack.
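The "find ceilings" questions (how close are you to the maximum, and how is it trending) can be turned into a small calculation. The sketch below assumes you already have a measured per-server ceiling and a series of daily peak values, and it assumes a straight-line trend, which is a simplification; `utilisation` and `days_until_ceiling` are hypothetical helpers.

```python
from typing import Optional, Sequence


def utilisation(current_peak: float, ceiling: float) -> float:
    """Fraction of the measured ceiling currently in use."""
    return current_peak / ceiling


def days_until_ceiling(daily_peaks: Sequence[float], ceiling: float) -> Optional[float]:
    """Estimate days until the ceiling is hit, using a least-squares line.

    Returns None when there are too few samples or the trend is flat or falling.
    """
    n = len(daily_peaks)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(daily_peaks) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_peaks))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    slope = numerator / denominator  # growth per day
    if slope <= 0:
        return None
    return (ceiling - daily_peaks[-1]) / slope
```

For example, daily peaks of 100, 110, 120 and 130 against a ceiling of 200 give 65% utilisation and roughly 7 days of headroom. Because the input is measured production data, this fits the "measure reality" advice; a linear fit will underestimate how soon the ceiling arrives if growth is exponential.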
-
-
17 Mar 12
-
14 Feb 12
-
01 Feb 12
-
27 Sep 11
-
03 Nov 10
-
The Stats
- More than 4 billion queries per day.
- ~35M photos in squid cache (total)
- ~2M photos in squid’s RAM
- ~470M photos, 4 or 5 sizes of each
- 38k req/sec to memcached (12M objects)
- 2 PB raw storage (consumed about 1.5 TB on Sunday)
- Over 400,000 photos being added every day
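To put the daily figures above on a per-second scale, the arithmetic can be worked through as below. The inputs are the numbers from the list; the results are flat averages over a 24-hour day, which is an assumption, since real traffic has peaks well above the mean.

```python
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = 4_000_000_000  # "more than 4 billion", so treat as a lower bound
photos_total = 470_000_000       # approximate, from the stats list
photos_added_per_day = 400_000   # "over 400,000"

# Flat averages; peak rates will be higher.
avg_queries_per_sec = queries_per_day / SECONDS_PER_DAY
avg_uploads_per_sec = photos_added_per_day / SECONDS_PER_DAY

# Each photo is stored in 4 or 5 sizes.
stored_files_low = photos_total * 4
stored_files_high = photos_total * 5

print(f"average queries/sec: {avg_queries_per_sec:,.0f}")
print(f"average uploads/sec: {avg_uploads_per_sec:.1f}")
print(f"stored image files: {stored_files_low:,} to {stored_files_high:,}")
```

That works out to roughly 46,000 SQL queries per second on average, under 5 uploads per second, and about 1.9 to 2.35 billion stored image files.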
-
Hardware:
- EM64T w/RHEL4, 16GB RAM
- 6-disk 15K RPM RAID-10.
- Data size is at 12 TB of user metadata (these are not photos, this is just InnoDB ibdata files - the photos are a lot larger).
- 2U boxes. Each shard has ~120GB of data.
-
-
20 Apr 10
-
14 Apr 10
-
03 Apr 10
-
28 Mar 10
-
21 Jan 10
-
10 Dec 09
-
03 Dec 09
-
25 Nov 09
-
18 Nov 09
-
02 Sep 09
-
21 Jul 09
-
16 Jul 09
-
11 Jul 09
-
Dual Tree Central Database
-
-
02 Jul 09
-
19 Jun 09
-
04 May 09
Rodrigo de Oliveira: Article about Flickr's architecture; the use of MySQL shards is interesting.
arquitetura artigo banco_de_dados escalabilidade flickr mysql performance php servidor from_delicious
-
01 May 09
-
29 Mar 09
-
17 Dec 08
-
01 Dec 08
-
14 Sep 08
-
24 Aug 08
-
01 Aug 08
-
28 Jul 08
-
22 Jul 08
-
17 Jul 08
-
13 Jul 08
-
12 Jul 08
Antoine Bertier: Puppet is a system for automating system administration tasks.
-
11 Jul 08
-
09 Jul 08
-
25 Jun 08
-
18 Jun 08
-
10 Jun 08
-
30 May 08
-
12 Mar 08
-
02 Mar 08
-
13 Feb 08
-
10 Feb 08
-
08 Feb 08
-
06 Nov 07
-
26 Oct 07
-
26 Sep 07
-
22 Sep 07
J Bian: Flickr is a famous photo gallery website, built almost entirely on free software. This article shows the architecture of Flickr.
-
16 Sep 07
-
06 Sep 07
-
20 Aug 07
-
17 Aug 07
-
16 Aug 07
-
04 Aug 07