Todd Hoff's picture

Welcome to High Scalability

We started High Scalability to help you build successful scalable websites. This site tries to bring together all the lore, art, science, practice, and experience of building scalable websites into one place so you can learn how to build your system with confidence. Hopefully this site will move you further and faster along the learning curve of success. Please Start Here.

Todd Hoff's picture

Useful Cloud Computing Blogs

Can't get enough cloud computing? Then you must really be a glutton for punishment! But just in case, here are some cloud computing resources, collected from various sources, that will help you transform into a Tesla silently flying solo down the diamond lane.

Meta Sources

  • Cloud Computing Email List: An often lively email list discussing cloud computing.
  • Cloud Computing Blogs & Resources. An excellent and big list of cloud resources.
  • Cloud Computing Portal: A community edited database for making the vendor selection process easier.
  • List of Cloud Platforms, Providers, and Enablers.
  • datacenterknowledge.com's Recap: More than 70 Industry Blogs : A nice set of blog's for: Data Center, Web Hosting, Content Delivery Network (CDN), Cloud Computing

    Specific Blogs

  • James Urquhart's The Wisdom of Clouds : Cloud Computing and Utility Computing for the Enterprise and the Individual. James writes great articles and has a regular can't miss links style post summarizing much of what you need need to know in cloud world.

    Many more below the fold.

  • Todd Hoff's picture

    37signals Architecture

    Update 4: O'Rielly's Tim O'Brien interviews David Hansson, Rails creator and 37signals partner. Says BaseCamp scales horizontally on the application and web tier. Scales up for the database, using one "big ass" 128GB machine. Says: As technology moves on, hardware gets cheaper and cheaper. In my mind, you don't want to shard unless you positively have to, sort of a last resort approach.
    Update 3: The need for speed: Making Basecamp faster. Pages now load twice as fast, cut CPU usage by a third and database time by about half. Results achieved by: Analysis, Caching, MySQL optimizations, Hardware upgrades.
    Update 2: customer support is handled in real-time using Campfire.
    Update: highly useful information on creating a customer billing system.

    In the giving spirit of Christmas the folks at 37signals have shared a bit about how their system works. 37signals is most famous for loosing Ruby on Rails into the world and they've use RoR to make their very popular Basecamp, Highrise, Backpack, and Campfire products. RoR takes a lot of heat for being a performance dog, but 37signals seems to handle a lot of traffic with relatively normal sounding resources. This is just an initial data dump, they promise to add more details later. As they add more I'll update it here.

    Todd Hoff's picture

    An Unorthodox Approach to Database Design : The Coming of the Shard

    Update: Dan Pritchett shares some excellent Sharding Lessons: Size Your Shards, Use Math on Shard Counts, Carefully Consider the Spread, Plan for Exceeding Your Shards

    Once upon a time we scaled databases by buying ever bigger, faster, and more expensive machines. While this arrangement is great for big iron profit margins, it doesn't work so well for the bank accounts of our heroic system builders who need to scale well past what they can afford to spend on giant database servers. In a extraordinary two article series, Dathan Pattishall, explains his motivation for a revolutionary new database architecture--sharding--that he began thinking about even before he worked at Friendster, and fully implemented at Flickr. Flickr now handles more than 1 billion transactions per day, responding in less then a few seconds and can scale linearly at a low cost.

    What is sharding and how has it come to be the answer to large website scaling problems?

    Todd Hoff's picture

    A Scalable, Commodity Data Center Network Architecture

    Looks interesting...

    Abstract:
    Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance.
    In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.

    Todd Hoff's picture

    Latency is Everywhere and it Costs You Sales - How to Crush it

    Latency matters. Amazon found every 100ms of latency cost them 1% in sales. Google found an extra .5 seconds in search page generation time dropped traffic by 20%. A broker could lose $4 million in revenues per millisecond if their electronic trading platform is 5 milliseconds behind the competition.

    The Amazon results were reported by Greg Linden in his presentation Make Data Useful. In one of Greg's slides Google VP Marissa Mayer, in reference to the Google results, is quoted as saying "Users really respond to speed." And everyone wants responsive users. Ka-ching! People hate waiting and they're repulsed by seemingly small delays.

    The less interactive a site becomes the more likely users are to click away and do something else. Latency is the mother of interactivity. Though it's possible through various UI techniques to make pages subjectively feel faster, slow sites generally lead to higher customer defection rates, which lead to lower conversation rates, which results in lower sales. Yet for some reason latency isn't a topic talked a lot about for web apps. We talk a lot about about building high-capacity sites, but very little about how to build low-latency sites. We apparently do so at the expense of our immortal bottom line.

    I wondered if latency went to zero if sales would be infinite? But alas, as Dan Pritchett says, Latency Exists, Cope!. So we can't hide the "latency problem" by appointing a Latency Czar to conduct a nice little war on latency. Instead, we need to learn how to minimize and manage latency. It turns out a lot of problems are better solved that way.

    How do we recover that which is most meaningful--sales--and build low-latency systems?

    Wuala - P2P Online Storage Cloud

    How do you design a reliable distributed file system when the expected availability of the individual nodes are only ~1/5? That is the case for P2P systems. Dominik Grolimund, the founder of a Swiss startup Caleido will show you how! They have launched Wuala, the social online storage service which scales as new nodes join the P2P network.

    The goal of Wua.la is to provide distributed online storage that is:

    • large
    • scalable
    • reliable
    • secure

    by harnessing the idle resources of participating computers.

    This challenge is an old dream of computer science. In fact as Andrew Tanenbaum wrote in 1995:
    "The design of a world-wide, fully transparent distributed filesystem fot simultaneous use by millions of mobile and frequently disconnected users is left as an exercise for the reader"

    After three years of research and development at at ETH Zurich, the Swiss Federal Institute of Technology on a distributed storage system, Caleido is ready to unveil the result: Wuala. Wuala is a new way of storing, sharing, and publishing files on the internet. It enables its users to trade parts of their local storage for online storage and it allows us to provide a better service for free. In this Google Tech Talk, Dominik will explain what Wuala is and how it works, and he will also show a demo.

    Todd Hoff's picture

    Strategy: Drop Memcached, Add More MySQL Servers

    Update 2: Michael Galpin in Cache Money and Cache Discussions likes memcached for it's expiry policy, complex graph data, process data, but says MySQL has many advantages: SQL, Uniform Data Access, Write-through, Read-through, Replication, Management, Cold starts, LRU eviction.
    Update: Dormando asks Should you use memcached? Should you just shard mysql more?. The idea of caching is the most important part of caching as it transports you beyond a simple CRUD worldview. Plan for caching and sharding by properly abstracting data access methods. Brace for change. Be ready to shard, be ready to cache. React and change to what you push out which is actually popular, vs over planning and wasting valuable time.

    Feedster's François Schiettecatte wonders if Fotolog's 21 memcached servers wouldn't be better used to further shard data by adding more MySQL servers? He mentions Feedster was able to drop memcached once they partitioned their data across more servers. The algorithm: partition until all data resides in memory and then you may not need an additional memcached layer.

    Parvesh Garg goes a step further and asks why people think they should be using MySQL at all?

    Related Articles


  • The Death of Read Replication by Brian Aker. Caching layers have replaced read replication. Cache can't fix a broken database layer. Partition the data that feeds the cache tier: "Keep your front end working through the cache. Keep all of your data generation behind it."
  • Read replication with MySQL by François Schiettecatte. Read replication is dead and it should be used only for backup purposes. Take the memory used for caching and give it to your database servers.
  • Replication++, Replication 2.0, Replication.Next by Ronald Bradford. What should read replication be used for?
  • Replication, caching, and partitioning by Greg Linden. Caching overdone because it adds complexity, latency on a cache miss, and inefficiently uses cluster resources. Hitting disk is the problem. Shard more and get your data in memory.

  • Todd Hoff's picture

    Strategy: Serve Pre-generated Static Files Instead Of Dynamic Pages

    Pre-generating static files is an oldy but a goody, and as Thomas Brox Røst says, it's probably an underused strategy today. At one time this was the dominate technique for structuring a web site. Then the age of dynamic web sites arrived and we spent all our time worrying how to make the database faster and add more caching to recover the speed we had lost in the transition from static to dynamic.

    Static files have the advantage of being very fast to serve. Read from disk and display. Simple and fast. Especially when caching proxies are used. The issue is how do you bulk generate the initial files, how do you serve the files, and how do you keep the changed files up to date? This is the process Thomas covers in his excellent article Serving static files with Django and AWS - going fast on a budget", where he explains how he converted 600K thousand previously dynamic pages to static pages for his site Eventseer.net, a service for tracking academic events.

    Eventseer.net was experiencing performance problems as search engines crawled their 600K dynamic pages. As a solution you could imagine scaling up, adding more servers, adding sharding, etc etc, all somewhat complicated approaches. Their solution was to convert the dynamic pages to static pages in order to keep search engines from killing the site. As an added bonus non logged-in users experienced a much faster site and were more likely to sign up for the service.

    The article does a good job explaining what they did, so I won't regurgitate it all here, but I will cover the highlights and comment on some additional potential features and alternate implementations...

    Product: Terracotta - Open Source Network-Attached Memory

    Terracotta is Network Attached Memory (NAM) for Java VMs. It provides up to a terabyte of virtual heap for Java applications that spans hundreds of connected JVMs.

    NAM is best suited for storing what they call scratch data. Scratch data is defined as object oriented data that is critical to the execution of a series of Java operations inside the JVM, but may not be critical once a business transaction is complete.

    The Terracotta Architecture has three components:

    1. Client Nodes - Each client node corresponds to a client node in the cluster which runs on a standard JVM
    2. Server Cluster - java process that provides the clustering intelligence. The current Terracotta implementation operates in an Active/Passive mode
    3. Storage used as
      • Virtual Heap storage - as objects are paged out of the client nodes, into the server, if the server heap fills up, objects are paged onto disk
      • Lock Arbiter - To ensure that there is no possibility of the classic "split-brain" problem, Terracotta relies on the disk infrastructure to provide a lock.
      • Shared Storage - to transmit the object state from the active to passive, objects are persisted to disk, which then shares the state to the passive server(s).

    JVM-level clustering can turn single-node, multi-threaded apps into distributed, multi-node apps, often with no code changes. This is possible by plugging in to the Java Memory Model in order to maintain key Java semantics of pass-by-reference, thread coordination and garbage collection across the cluster. Terracotta enables this using only declarative configuration with minimal impact to existing code and provides fine-grained field-level replication which means your objects no longer need to implement Java serialization.

    Ari Zilka, the founder and CTO of Terracotta had a
    video session
    organized by Skills Matter. He will show you how it works and how you can start clustering your POJO-based Web applications (based on Spring, Struts, Wicket, RIFE, EHCache, Quartz, Lucene, DWR, Tomcat, JBoss, Jetty or Geronimo etc.).

    Todd Hoff's picture

    Strategy: Limit The New, Not The Old

    One of the most popular and effective scalability strategies is to impose limits (GAE Quotas, Fotolog, Facebook) as a means of protecting a website against service destroying traffic spikes. Twitter will reportedly limit the number followers to 2,000 in order to thwart follow spam. This may also allow Twitter to make some bank by going freemium and charging for adding more followers.

    Agree or disagree with Twitter's strategies, the more interesting aspect for me is how do you introduce new policies into an already established ecosystem?

    One approach is the big bang. Introduce all changes at once and let everyone adjust. If users don't like it they can move on. The hope is, however, most users won't be impacted by the changes and that those who are will understand it's all for the greater good of their beloved service. Casualties are assumed, but the damage will probably be minor.

    Now in Twitter's case the people with the most followers tend to be opinion leaders who shape much of the blognet echo chamber. Pissing these people off may not be your best bet.

    What to do? Shegeeks.net makes a great proposal: Limit The New, Not The Old. The idea is to only impose the limits on new accounts, not the old. Old people are happy and new people understand what they are getting into.

    The reason I like this suggestion so much is that it has deep historical roots, all the way back to the fall of the Roman republic and the rise of the empire due to the agrarian reforms laws passed in 133BC. In ancient Rome property and power, as they tend to do, became concentrated in the hands of a few wealthy land owners. Let's call them the nobility. The greatness that was Rome was founded on a agrarian society. People made modest livings on small farms. As power concentrated small farmers were kicked of the land and forced to move to the city. Slaves worked the land while citizens remained unemployed. And cities were no place to make a life. Civil strife broke out. Pliny said that "it was the large estates which destroyed Italy."

    Marcelb's picture

    Distributed Computing & Google Infrastructure

    A couple of videos about distributed computing with direct reference on Google infrastructure.
    You will get acquainted with:

    --MapReduce the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware
    --GFS and the way it stores it's data into 64mb chunks
    --Bigtable which is the simple implementation of a non-relational database at Google

    Cluster Computing and MapReduce Lectures 1-5.

    Todd Hoff's picture

    Scaling Traffic: People Pod Pool of On Demand Self Driving Robotic Cars who Automatically Refuel from Cheap Solar

    Update 8: Great article in Wired on Better Place's proposal for a new electric car distribution system. The idea is to blanket the country with "smart" charge spots. You buy your car from them and purchase a recharge plan. Profit come from selling electricity.
    Update 7: Capturing solar energy from asphalt pavements. An interesting way to make the system self-sufficient.
    Update 6: Why We Drive the Way We Do Unlocks How to Unclog Traffic. Vanderbilt says: The fundamental problem is that you've got drivers who make user-optimal rather than system-optimal decisions. Josh McHugh replies: Make the packets (cars) dumb and able to take marching orders from traffic routing nodes.
    Update 5: Traffic jams are not caused by flaws in road design but by flaws in human nature. Nearly 80 percent of crashes involve drivers not paying attention for up to three seconds. The both good and scary thing about computers is they always pay attention.
    Update 4: Volvo Says It Will Have An Injury Proof Car By 2020.
    Update 3: Map Reading For Dummies. Europe (again) is developing a system that will read satellite navigation maps and warn the driver of upcoming hazards – sharp bends, dips and accident black spots – which may be invisible to the driver. Even better, the system can update the geographic database. Another key capability of the People Pod system.
    Update 2: Road Safety: The Uncrashable Car?. A European research project basic could lead to a car that is virtually uncrashable. An uncrashable car would definitely ease people's concerns over computerized navigation.
    Update: Shockwave traffic jam recreated for first time - "Pinpointing the causes of shockwave jams is an exercise in psychology more than anything else. 'If they had set up an experiment with robots driving in a perfect circle, flow breakdown would not have occurred. Human error is needed to cause the fluctuations in behaviour.'"

    Traffic in the San Francisco Bay area is like Dolly Parton, 10 pounds in a 5 pound sack. Mass transit has been our unseen traffic woe savior for a while. But the ring of political fire circling the bay has prevented any meaningful region wide transportation solution. As everyone scrambles to live anywhere they can afford, we really need a region wide solution rather than the local fixes that can never go quite far enough. The solution: create a People Pod Pool of On Demand Self Driving Robotic Cars who Automatically Refuel from Cheap Solar.

    Todd Hoff's picture

    Strategy: Let Google and Yahoo Host Your Ajax Library - For Free

    Don't have a CDN? Why not let Google and Yahoo be your CDN? At least for Ajax libraries. No charge. Google runs a content distribution network and loading architecture for the most popular open source JavaScript libraries, which include: jQuery, prototype, script.aculo.us, MooTools, and dojo. The idea is web pages directly include your library of choice from Google's global, fast, and highly available network. Some have found much better performance and others experienced slower performance. My guess is the performance may be slower if your data center is close to you, but far away users will be much happier. Some negatives: not all libraries are included, you'll load more than you need because all functionality is included. Yahoo has had a similar service for YUI for a while. Remember to have a backup plan for serving your libraries, just in case.

    The Art of Capacity Planning: Scaling Web Resources

    Update 2: Maybe the iPhone can use a little capacity planning? What's Behind the iPhone 3G Glitches: One source says Apple programmed the Infineon chip to demand a more powerful 3G signal than the iPhone really requires. So if too many people try to make a call or go on the Internet in a given area, some of the devices will decide there's insufficient power and switch to the slower network—even if there is enough 3G bandwidth available.
    Update: To get a taste of what will be served, mySQL DBA has a nice post titled Capacity Planning, Architecture, Scaling, Response time, Throughput. You learn how to figure out when your application will break by building a 3rd order polynomial. Cool stuff!

    John Allspaw who is the Operations Engineering Manager at Flickr is about to publish a book with O'Reilly.

    There are not much details so far but it seems interesting and relevant to High Scalability.

    Allspaw combines personal anecdotes from many phases of Flickr's growth with insights from his colleagues in many other industries to give you solid guidelines for measuring your growth, predicting trends, and making cost-effective preparations.

    Topics include:

    • Evaluating tools for measurement and deployment
    • Capacity analysis and prediction for storage, database, and application servers
    • Designing architectures to easily add and measure capacity
    • Handling sudden spikes
    • Predicting exponential and explosive growth
    • How cloud services such as EC2 can fit into a capacity strategy

    The Art of Capacity Planning: Scaling Web Resources is available for pre-order on amazon.com

    Todd Hoff's picture

    A Bunch of Great Strategies for Using Memcached and MySQL Better Together

    The primero recommendation for speeding up a website is almost always to add cache and more cache. And after that add a little more cache just in case. Memcached is almost always given as the recommended cache to use. What we don't often hear is how to effectively use a cache in our own products. MySQL hosted two excellent webinars (referenced below) on the subject of how to deploy and use memcached. The star of the show, other than MySQL of course, is Farhan Mashraqi of Fotolog. You may recall we did an earlier article on Fotolog in Secrets to Fotolog's Scaling Success, which was one of my personal favorites.

    Fotolog, as they themselves point out, is probably the largest site nobody has ever heard of, pulling in more page views than even Flickr. Fotolog has 51 instances of memcached on 21 servers with 175G in use and 254G available. As a large successful photo-blogging site they have very demanding performance and scaling requirements. To meet those requirements they've developed a sophisticated approach to using memcached that others can learn from and emulate. We'll cover some of the highlightable strategies from the webinar down below the fold.

    Todd Hoff's picture

    Ehcache - A Java Distributed Cache

    Ehcache is a pure Java cache with the following features: fast, simple, small foot print, minimal dependencies, provides memory and disk stores for scalability into gigabytes, scalable to hundreds of caches
    is a pluggable cache for Hibernate, tuned for high concurrent load on large multi-cpu servers, provides LRU, LFU and FIFO cache eviction policies, and is production tested. Ehcache is used by LinkedIn to cache member profiles. The user guide says it's possible to get at 2.5 times system speedup for persistent Object Relational Caching, a 1000 times system speedup for Web Page Caching, and a 1.6 times system speedup Web Page Fragment Caching.
    From the website:

    Todd Hoff's picture

    Product: Amazon's SimpleDB

    Update 31: Amazon fixes a major hole in SimpleDB by adding the ability to sort query results. Previously developers had to sort results by hand which was a non-starter for many. Now you can do basic top 10 type queries with ease.
    Update 30: Amazon SimpleDB - A distributed, highly-scalable, light-weight, query-able, attribute store by Sebastian Stadil. It introduces the CAP theorem and the basics of SimpleDB. Sebastian does a lot of great work in the AWS world and in what must be his limited free time, runs the AWS Meetup group.

    Todd Hoff's picture

    Sharding the Hibernate Way

    Update: A very nice JavaWorld podcast interview with Google engineer Max Ross on Hibernate Shards. Max defines Hibernate Shards (horizontal partitioning), how it works (pretty well), virtual shards (don't ask), what they need to do in the future (query, replication, operational tools), and how it relates to Google AppEngine (not much).

    To scale you are supposed to partition your data. Sounds good, but how do you do it? When you actually sit down to work out all the details it’s not that easy. Hibernate Shards to the rescue! Hibernate shards is: an extension to the core Hibernate product that adds facilities for horizontal partitioning. If you know the core Hibernate API you know the shards API. No learning curve at all. Here is what a few members of the core group had to say about the Hibernate Shards open source project. Although there are some limitations, from the sound of it they are doing useful stuff in the right way and it’s very much worth looking at, especially if you use Hibernate or some other ORM layer.

    Todd Hoff's picture

    Google's Paxos Made Live – An Engineering Perspective

    This is an unusually well written and useful paper. It talks in detail about experiences implementing a complex project, something we don't see very often. They shockingly even admit that creating a working implementation of Paxos was more difficult than just translating the pseudo code. Imagine that, programmers aren't merely typists! I particularly like the explanation of the Paxos algorithm and why anyone would care about it, working with disk corruption, using leases to support simultaneous reads, using epoch numbers to indicate a new master election, using snapshots to prevent unbounded logs, using MultiOp to implement database transactions, how they tested the system, and their openness with the various problems they had. A lot to learn here.

    From the paper:
    We describe our experience building a fault-tolerant data-base using the Paxos consensus algorithm.
    Despite the existing literature in the field, building such a database proved to be non-trivial. We describe
    selected algorithmic and engineering problems encountered, and the solutions we found for them. Our
    measurements indicate that we have built a competitive system.