Tagged Architecture - Scaling to 100 Million Users, 1000 Servers, and 5 Billion Page Views

Monday, August 8, 2011 at 9:20AM

This is a guest post by Johann Schleier-Smith, CTO & co-founder, Tagged.

Five snapshots on how Tagged scaled to more than 1,000 servers

Since 2004, Tagged has grown from a tiny social experiment to one of the largest social networks, delivering five billion pages per month to many millions of members who visit to meet and socialize with new people. One step at a time, this evolution forced us to evolve our architecture, eventually arriving at an enormously capable platform.

V1: PHP webapp, 100k users, 15 servers, 2004

Tagged was born in the rapid-prototyping culture of an incubator that usually launched two new concepts each year in search of the big winner. LAMP was the natural choice for this style of work, which emphasized flexibility and quick turnaround at a time when Java development was mostly oriented towards large enterprises, Python attracted too few programmers, and Perl brought the wrong sort. Also, we knew that Yahoo was a big proponent of PHP, so it would be possible to scale the business when the need arose.

Significant experience running MySQL on previous projects had left me with a love-hate relationship with the technology. In the spirit of experimentation we purchased a few entry-level Oracle licenses for Tagged to see whether that would work better.

Remarkably, many smaller web sites are still built just like the original Tagged. There is beauty in simplicity, and the two-way division between stateless PHP and stateful Oracle concentrates the trickiest bits in a single server, while extra page-rendering compute power is easy to add.

V2: Cached PHP webapp, 1m users, 20 servers, 2005

Even at eight servers Tagged had more web traffic than most of us had known. Fortunately, memcached brought dual advantages, removing over 90% of database reads and ensuring that social networking pages packed with diverse information would render quickly.

From the start, our object caching emphasized explicit cache updates over simpler techniques such as deleting invalid keys or expiring stale data based on timers. At the cost of more complex code, this reduces database load substantially and keeps the site fast, particularly when frequently updated objects are involved.
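As a rough illustration of the explicit-update pattern (a sketch, not Tagged's actual code), the write path rewrites the cached object immediately after the database write instead of deleting the key or relying on a timer. The cache and data-access interfaces below are hypothetical stand-ins for a memcached client and our data layer, written in Java for brevity:

    import java.util.Optional;

    // Hypothetical stand-ins for a memcached client and the database access layer.
    interface ObjectCache {
        Optional<Profile> get(String key);
        void set(String key, Profile value);
    }

    interface ProfileDatabase {
        Profile select(long userId);
        Profile updateStatus(long userId, String status);
    }

    record Profile(long userId, String status) {}

    class ProfileStore {
        private final ObjectCache cache;
        private final ProfileDatabase db;

        ProfileStore(ObjectCache cache, ProfileDatabase db) {
            this.cache = cache;
            this.db = db;
        }

        // Read path: the large majority of requests should be served from the cache.
        Profile load(long userId) {
            String key = "profile:" + userId;
            return cache.get(key).orElseGet(() -> {
                Profile fresh = db.select(userId);
                cache.set(key, fresh);          // populate on a miss
                return fresh;
            });
        }

        // Write path: update the database, then explicitly rewrite the cached
        // object rather than deleting the key or waiting for a timer to expire it.
        void updateStatus(long userId, String status) {
            Profile updated = db.updateStatus(userId, status);
            cache.set("profile:" + userId, updated);
        }
    }

The trade-off is the one noted above: every write path must know how to rebuild the cached object, but in exchange reads almost never fall through to the database.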

Our site continued to evolve in complexity beyond standard social networking features (friends, profiles, messages) with the addition of search and social discovery functions. My team talked me into using Java to build search so that we could benefit from the Lucene libraries. I was relieved when we learned to run it well, and my reluctance born of early experiences with JDK 1.0 was transformed to enthusiasm for the platform.

V3: Databases scaling, 10m users, 100 servers, 2006

With 10 million registered users and thousands online at any moment we approached the challenge that I had been dreading. We had just raised capital and were working hard on growth, but the database was running out of capacity. We scrambled to release one caching or SQL tuning optimization after another, but the CPU load on our servers would time and again trend towards the 100% mark.

The idea of scaling up offered a quick fix, but the multi-socket server hardware could cost millions, so we opted for Oracle RAC, which let us use standard networking to hook up lots of commodity Linux hosts to build one big database. When joined with the advantages of the latest CPUs, Oracle RAC delivered a crucial 20-fold capacity increase over our first database server, and allowed application developers to stay focused on building new features.

Java edged further into the environment when Tagged began to offer personalized people-matching recommendations by sewing together statistics from a large in-memory data set, something entirely impractical to do with PHP.

V4: Database sharding, 50m users, 500 servers, 2007

Sharding the database was without a doubt the most challenging, but also the most rewarding episode in scaling Tagged. By splitting up users among multiple databases we finally had a design that allowed us to scale everywhere just by adding hardware.

Our rule at Tagged is to shard each table across 64 partitions, and we hold firm to this default unless there is a very compelling reason to make an exception. Only certain games that benefit from high-performance protected transactions between players are vertically partitioned in a separate database.
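To make that default concrete, routing from a user id to one of the 64 partitions might look like the sketch below. The 64-way split is the rule described above; the modulo mapping, host layout, and table-suffix naming are illustrative assumptions only:

    // Illustrative shard routing: each sharded table is split into 64 partitions
    // keyed by user id, and partitions are spread across a smaller number of
    // physical database hosts. The host list and naming are made up.
    final class ShardRouter {
        static final int PARTITIONS = 64;

        private final String[] hosts;   // e.g. {"db01", "db02", "db03", "db04"}

        ShardRouter(String[] hosts) {
            this.hosts = hosts;
        }

        // The partition is a pure function of the user id, so any web server
        // can compute it without consulting a directory service.
        int partitionFor(long userId) {
            return (int) Long.remainderUnsigned(userId, PARTITIONS);
        }

        // Contiguous ranges of partitions map to hosts; adding capacity means
        // moving whole partitions onto new hardware and updating this mapping.
        String hostFor(long userId) {
            return hosts[partitionFor(userId) * hosts.length / PARTITIONS];
        }

        // Each logical table exists once per partition, e.g. messages_17.
        String tableFor(String baseTable, long userId) {
            return baseTable + "_" + partitionFor(userId);
        }
    }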

Sharding existing data represented a complex transformation across several terabytes. At first we attacked features one at a time, relying on application code to replace joins, but eventually we encountered a bundle of tables at the core of the application too closely linked for this approach. Writing migration software to generate SQL, we exported, transformed, and reloaded hundreds of millions of rows, using triggers to track changes on the source system and updating targets incrementally, so that the final sync involved an outage of less than 30 minutes.

Having many databases means having many database connections. Especially as we added more "social discovery" functions like Meet Me, our first dating feature, sharding would have overwhelmed PHP, which lacked Oracle connection pooling. To cope, we built a Java application that exposes a web service for running queries, one which also continues to provide a very convenient monitoring point and allows graceful handling of database failures.
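A bare-bones sketch of such a query gateway is shown below, using the JDK's built-in HTTP server and a tiny fixed-size JDBC pool. The endpoint, wire format, credentials, and pool size are placeholder assumptions; a production service would add per-shard pools, parameter binding, authentication, and real error handling:

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Minimal query gateway sketch: PHP web servers POST a query, the gateway
    // runs it on a pooled connection and returns rows as tab-separated text.
    // Requires the Oracle JDBC driver on the classpath; the URL is a placeholder.
    public class QueryService {
        static final BlockingQueue<Connection> pool = new ArrayBlockingQueue<>(8);

        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 8; i++) {
                pool.add(DriverManager.getConnection(
                    "jdbc:oracle:thin:@db-shard-00:1521/tagged", "app", "secret"));
            }

            HttpServer server = HttpServer.create(new InetSocketAddress(8090), 0);
            server.createContext("/query", exchange -> {
                String sql = new String(exchange.getRequestBody().readAllBytes(),
                                        StandardCharsets.UTF_8);
                StringBuilder out = new StringBuilder();
                try {
                    Connection conn = pool.take();     // borrow a pooled connection
                    try (PreparedStatement stmt = conn.prepareStatement(sql);
                         ResultSet rs = stmt.executeQuery()) {
                        int cols = rs.getMetaData().getColumnCount();
                        while (rs.next()) {
                            for (int c = 1; c <= cols; c++) {
                                out.append(rs.getString(c)).append('\t');
                            }
                            out.append('\n');
                        }
                    } finally {
                        pool.add(conn);                // return it for reuse
                    }
                } catch (Exception e) {
                    out.setLength(0);
                    out.append("ERROR: ").append(e.getMessage());
                }
                byte[] body = out.toString().getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
        }
    }

Because every PHP request now passes through one place, a gateway like this is also a natural spot to time queries and to fail fast when a database is down, which is what makes it such a convenient monitoring point.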

V5: Refinements and extensions, 80m users, 1,000 servers, 2010

Here we jump ahead several years. With the crux database scalability problems solved, we found it straightforward to support expansion by adding hardware. PHP and memcached continued to serve us well, supporting rapid feature development.

During this time, scalability considerations shifted towards mitigating failures, addressing the threat of an increasing number of breakable parts. Protecting the web layer from problems at its dependencies was achieved through load balancer health checks and automatic shutdown of unresponsive services. We also engineered core components for resilience, e.g., if memcached becomes overloaded with connections it must recover immediately once that burden is removed.
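As a sketch of the health-check half of that, each node can expose an endpoint for the load balancer to poll, returning 503 as soon as a critical dependency stops answering so that traffic shifts away automatically. The host name, port, and timeout below are assumptions:

    import com.sun.net.httpserver.HttpServer;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Hypothetical health endpoint for the load balancer to poll. If a critical
    // dependency (here, a memcached host) stops answering, the node reports 503
    // and receives no traffic until the check passes again.
    public class HealthCheck {
        static boolean dependencyReachable(String host, int port) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 200);  // 200 ms budget
                return true;
            } catch (IOException e) {
                return false;
            }
        }

        public static void main(String[] args) throws IOException {
            HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
            server.createContext("/health", exchange -> {
                boolean healthy = dependencyReachable("memcached-01", 11211);
                byte[] body = (healthy ? "OK" : "UNAVAILABLE").getBytes();
                exchange.sendResponseHeaders(healthy ? 200 : 503, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
        }
    }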

Java achieved a much more prominent role, in part due to increasing acceptance and expertise, but also because of increasing challenges. To combat spam and other abuses, our algorithms take advantage of large shared memory spaces, as well as compute-intensive techniques. Social games also benefited from the performance and concurrency control of Java, but there has been a cost in complexity; we now need to manage many more distinct pools of applications than before.

The future

Today, Tagged delivers five billion page views each month to its millions of members. Since we've arrived at a scalable design, we can spend most of our energy on creating features that serve users better. We have effective tools for creating scalable software, but we can imagine much better ones, so current investments focus on software libraries, improving programmer effectiveness and productivity, and Stig, our upcoming open-source, graph-based database project designed for large-scale social networks, real-time services, and cloud applications.

Related Articles

  • Johann Schleier-Smith Interview with theCube