We started High Scalability to help you build successful scalable websites. This site tries to bring together all the lore, art, science, practice, and experience of building scalable websites into one place so you can learn how to build your system with confidence. Hopefully this site will move you further and faster along the learning curve of success. Please Start Here.
I've been interested in sharding concepts since first hearing the term "shard" a few years back. My interest had been piqued earlier, the first time I read about Google's original approach to distributed search. It was described as a hashtable-like system in which independent physical machines play the role of the buckets. More recently, I needed the capacity and performance of a Sharded system, but did not find helpful libraries or toolkits which would assist with the configuration for my language of preference these days, which is Python. And, since I had a few weeks on my hands, I decided I would begin the work of creating these tools.
The result of my initial work the Pyshards project, a still-incomplete python and MySQL based horizontal partitioning and sharding toolkit. HighScalability.com readers will already know that horizontal partitioning is a data segmenting pattern in which distinct groups of physical row-based datasets are distributed across multiple partitions. When the partitions exist as independent databases and when they exist within a shared-nothing architecture they are known as shards. (Google apparently coined the term shard for such database partitions, and pyshards has adopted it.) The goal is to provide big opportunities for database scalability while maintaining good performance. Sharded datasets can be queried individually (one shard) or collectively (aggregate of all shards). In the spirit of The Zen of Python, Pyshards focuses on one obvious way to accomplish horizontal partitioning, and that is by using a hash/modulo based algorithm.
Pyshards provides the ability to reasonably add polynomial capacity (number of original shards squared) without re-balancing (re-sharding). Pyshards is designed with re-sharding in mind (because the time will come when you must re-balance) and provides re-sharding algorithms and tools. Finally, Pyshards aspires to provide a web-based shard monitoring tool so that you can keep an eye on resource capacity.
So why publish an incomplete open source project? I'd really prefer to work with others who are interested in this topic instead of working in a vacuum. If you are curious, or think you might want to get involved, come visit the project page, join a mailing list, or add a comment on the WIKI.
http://code.google.com/p/pyshards/wiki/Pyshards
Devin
Eucalyptus is generating some excitement on the Cloud Computing group as a potential vendor neutral EC2 compatible cloud platform. Two reasons why Eucalyptus is potentially important: private clouds and cloud portability.
Private clouds. Let's say you want a cloud like infrastructure for architectural purposes but you want it to run on your own hardware in your own secure environment. How would you do this today? Hm....
Cloud portability. With the number of cloud offerings increasing how can you maintain some level of vendor neutrality among this "swarm" of different options? Portability is a key capability for cloud customers as the only real power customers have is in where they take their business and the only way you can change suppliers is if there's a ready market of fungible services. And the only way their can be a market is if there's a high degree of standardization.
What should you standardize on? The options are usually to form a great committee and take many years to spec out something that doesn't exist, nobody will build, and will never really work. Or have each application create a high enough layer interface that portability is potentially difficult, but possible. Or you can take a popular existing API, make it the general API, and everyone else is accommodated using an adapter layer and the necessary special glue to take advantage of value add features for each cloud.
With great foresight Eucalyptus has chosen to create a cloud platform based on Amazon's EC2. As this is the most successful cloud platform it makes a lot of sense to use it as a model. We see something similar with the attempts to port Google AppEngine to EC2 thus making GAE a standard framework for web apps. So developers would see GAE on top of EC2. A lot of code would be portable between clouds using this approach. Even better would be to add ideas in from RightScale, 3Tera, and Mosso to get a higher level view of the cloud, but that's getting ahead of the game.
Just what is Eucalyptus?
Update 2: Hank Williams says iPhone Background Processing: Not Fixed But Halfway There. Excellent analysis of all the reasons you need real background processing. Hey, you can't even build an alarm clock! Hard to believe some commenters say it's not so..
Update: Josh Lowensohn of Webware tells us Why users should be scared of Apple's new notification system.
A big item on the iPhone developer iWishlist has been background processing. If you can't write an app to poll for new data in the background how will you keep your even more important non-foreground app in sync? Live from the Apple developer conference we learn the solution is a centralized push based architecture.
Here's the relevant MacRumorsLive transcript:
A poll based architecture is good when system capabilities are relatively symmetric. Clearly mobile phones are restricted along a number of dimensions, the most important being battery power. Having a large number of apps constantly polling for updates sucks down battery power faster than vampires at phlebotomist convention.
So Apple's logic is sound. Keep a single connection over which data is pushed and work on the phone is minimized. You also maximize battery life and maximize bandwidth usage because data can be aggregated on the server side and be sent in large chunks rather than a random distribution of small packets.
The mechanics of how this works isn't clear. Must all apps needing to push data to a phone become part of Apple's private iPhone cloud? Smart for Apple as it gives them complete control. For sculptors of the ultimate user experience you want total control. Not so good for developers as it's just another garden with a very high wall protecting it.

Lukas Biewald shares a fascinating slam by slam recount of how his FaceStat (upload your picture and be judged by the masses) site was battered by a link on Yahoo's main page that caused an almost instantaneous 650,000 page view jump on their site. Yahoo spends considerable effort making sure its own properties can handle the truly massive flow from the main page. Turning the Great Eye of the Internet towards an unsuspecting newborn site must be quite the diaper ready experience. Theo Schlossnagle eerily prophesized about such events in The Implications of Punctuated Scalabilium for Website Architecture: massive, unexpected and sudden traffic spikes will become more common as a fickle internet seeks ever for new entertainments (my summary). Exactly FaceStat's situation.
This is also one of our first exposures to an application written on Merb, a popular Ruby on Rails competitor. For those who think Ruby is the problem, their architecture now serves 100 times the original load.
How did our fine FaceStat fellowship fair against Yahoo’s onslaught?
If you just can't get enough high scalability talk you might want to take a look GigaOm's Structure 08 conference. The slate of speakers looks appropriately interesting and San Francisco is truly magical this time of year. High Scalability readers even get a price break is you use the HIGHSCALE discount code! I'll be on vacation so I won't see you there, but it looks like a good time. For a nice change of pace consider visiting MoMA next door. Here's a blurb on the conference:
A reminder to our readers about Structure 08, GigaOm's upcoming conference dedicated to web infrastructure. In addition to keynotes from leaders like Jim Crowe, chairman and CEO of Level 3 Communications and Werner Vogels, CTO of Amazon, the event will feature workshops from Google App Engine, Microsoft and a special workshop from Fenwick and West who will cover how to raise money for an infrastructure start up. Learn from the guru's at Amazon, Google, Microsoft, Sun, VMWare and more about what the future of internet infrastructure is at Structure 08, which will be held on June 25 in San Francisco.
Scalability forces us to think differently. What worked on a small scale doesn't always work on a large scale -- and costs are no different. If 90% of our application is free of contention, and only 10% is spent on a shared resources, we will need to grow our compute resources by a factor of 100 to scale by a factor of 10! Another important thing to note is that 10x, in this case, is the limit of our ability to scale, even if more resources are added.
1. The cost of non-linearly scalable applications grows exponentially with the demand for more scale.
2. Non-linearly scalable applications have an absolute limit of scalability. According to Amdhal's Law, with 10% contention, the maximum scaling limit is 10. With 40% contention, our maximum scaling limit is 2.5 - no matter how many hardware resources we will throw at the problem.
This post discuss in further details how to measure the true cost of non linearly scalable systems and suggest a model for reducing that cost significantly.
LinkedIn is the largest professional networking site in the world. LinkedIn employees presented two sessions about their server architecture at JavaOne 2008. This post contains a summary of these presentations.
Key topics include:
Why isn't Google's aggressive new database pricing strategy getting more pub? That's what Bill Katz, instigator of the GAE Meetup and prize winning science fiction author is wondering:
It's surprising that the blogosphere hasn't picked up the biggest difference in pricing: Google's datastore is less than a tenth of the price of Amazon's SimpleDB while offering a better API.
If money matters to you then the burn rate under GAE could be convincingly lower. Let's compare the numbers:
Justin.tv is looking to hire a Scaling Engineer to help scale their video cluster, IRC server, web app, monitoring and search services. I've never seen this job title before. A quick search that showed only a few previous instances of it being used. Has anyone else seen Scaling Engineer as a job title before?
It's a great idea. Scaling is certainly a worthy specialty of it's own. Why there's a difficult lingo, obscure tools, endlessly subtle concepts, a massive body of knowledge to master, and many competing religious factions. All a good start. Next I see a chain of Scalability Universities. Maybe use all those Starbucks that are closing down. Contact me for franchise opportunities :-)
Update 4: David Cancel runs a great test to see if you should be Using Amazon S3 as a CDN?. Conclusion: "CacheFly performed the best but only slightly better than EdgeCast. The S3 option was the worst with the Nginx/DIY option performing just over 100 ms faster." Also take look at Part 2 - Cacheability?
Update 3: Mr. Rayburn takes A Detailed Look At Akamai's Application Delivery Product . They create a "bi-nodal overlay network" where users and servers are always within 5 to 10 milliseconds of each other. Your data center hosted app can't compete. The problem is that people (that is, me) can understand the data center model. I don't yet understand how applications as a CDN will work.
Update 2: Dan Rayburn starts an interesting series of articles on Highlights Of My Day In Cambridge With Akamai. Akamai is moving strong into the application distribution business. That would make an interesting cloud alternative..
Update: Streamingmedia links to new CDN DF Splash that specializes in instant-on TV-quality video streaming.
A question was raised on the forum asking for a CDN recommendation. As usual there are no definitive answers, but here are three useful articles that may help your deliberations.
Lots and lots of good stuff to learn, even if you didn't roll out of bed this morning pondering the deeper mysteries of content delivery networks and the Canadian dollar.
Update: Aaron Worsham Interview with James Lindenbaum, CEO of Heroku. Aaron nicely sums up their goal: Heroku is looking to eliminate all the reasons companies have for not doing software projects.
Adam Wiggins of Heroku presented at the lollapalooza that was theCloud Computing Demo Night. The idea behind Heroku is that you upload a Rails application into Heroku and it automatically deploys into EC2 and it automatically scales using behind the scenes magic. They call this "liquid scaling." You just dump your code and go. You don't have to think about SVN, databases, mongrels, load balancing, or hosting. You just concentrate on building your application. Heroku's unique feature is their web based development environment that lets you develop applications completely from their control panel. Or you can stick with your own development environment and use their API and Git to move code in and out of their system.
For website developers this is as high up the stack as it gets. With Heroku we lose that "build your first lightsaber" moment marking the transition out of apprenticeship and into mastery. Upload your code and go isn't exactly a heroes journey, but it is damn effective...
Now you can buy more cores on EC2 without adding more machines:
Diagonal Scaling is making a site faster by removing machines. More on this intriguing idea in Diagonal Scaling - Don't Forget to Scale Out AND Up.
The following technical Webinar could be of interest to the community.
WHO:
WHAT:
WHEN:
Check out the details here!

Update 3: ReadWriteWeb says Google App Engine Announces New Pricing Plans, APIs, Open Access. Pricing is specified but I'm not sure what to make of it yet. An image manipulation library is added (thus the need to pay for more CPU :-) and memcached support has been added. Memcached will help resolve the can't write for every read problem that pops up when keeping counters.
Update 2: onGWT.com threw a GAE load party and a lot of people came. The results at Load test : Google App Engine = 1, Community = 0. GAE handled a peak of 35 requests/second and a sustained 10 requests/second. Some think performance was good, others not so good. My GMT watch broke and I was late to arrive. Maybe next time. Also added a few new design rules from the post.
Update: Added a few new rules gleaned from the GAE Meetup: Design By Explicit Cost Model and Puts are Precious.
How do you structure your database using a distributed hash table like BigTable? The answer isn't what you might expect. If you were thinking of translating relational models directly to BigTable then think again. The best way to implement joins with BigTable is: don't. You--pause for dramatic effect--duplicate data instead of normalize it. *shudder*
Flickr anticipated this design in their architecture when they chose to duplicate comments in both the commentor and the commentee user shards rather than create a separate comment relation. I don't know how that decision was made, but it must have gone against every fiber in their relational bones...
Update 2: EBay's Randy Shoup spills the secrets of how to service hundreds of millions of users and over two billion page views a day in Scalability Best Practices: Lessons from eBay on InfoQ. The practices: Partition by Function, Split Horizontally, Avoid Distributed Transactions, Decouple Functions Asynchronously, Move Processing To Asynchronous Flows, Virtualize At All Levels, Cache Appropriately.
Update: eBay Serves 5 Billion API Calls Each Month. Aren't we seeing more and more traffic driven by mashups composed on top of open APIs? APIs are no longer a bolt on, they are your application. Architecturally that argues for implementing your own application around the same APIs developers and users employ.
Who hasn't wondered how eBay does their business? As one of the largest most loaded websites in the world, it can't be easy. And the subtitle of the presentation hints at how creating such a monster system requires true engineering: Striking a balance between site stability, feature velocity, performance, and cost.
You may not be able to emulate how eBay scales their system, but the issues and possible solutions are worth learning from.
Om proposes one solution to the Twitter Problem is to limit followers to three square meals a day. The reasonable idea being that lower limits should mean fewer scaling problems. And as a kicker raising those limits is a good way to raise much needed revenue.
Scoble thinks users should consume without limit and will drive to another buffet if all-you-can-eat privileges are revoked. The reasonable idea being that if an internet service can't solve internet scale problems then there's not much use for it.
Dave says comp power users a top floor suite and shower them with free passes to the buffet. Let the good times roll! The reasonable idea being that power users help create popular restaurants, er, services in the first place and limiting them starves users and starved users won't come back.
So, should web services like Twitter be a buffet, a fixed eight course fine dining experience, a small plate restaurant, a family style joint, or a vending machine? Or something else entirely?
In a distant barely remembered past I actually worked at an all-you-can-eat buffet. The food was very good and most customers didn't over over indulge. If they did the place wouldn't stay in business long. But some customers did. They were called stackers.
Stackers were so named because a large stack of plates would pile up on their table throughout the meal. Stackers followed a power law distribution. Few customers at any one time were stackers, but their effect could be devastating. How devastating depended on their favorite foods...
From their website:
Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor, Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.
While providing functionality similar to that of a more traditional batch queueing system, Condor's novel architecture allows it to succeed in areas where traditional scheduling systems fail. Condor can be used to manage a cluster of dedicated compute nodes (such as a "Beowulf" cluster). In addition, unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. For instance, Condor can be configured to only use desktop machines where the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (such as a key press detected), in many circumstances Condor is able to transparently produce a checkpoint and migrate a job to a different machine which would otherwise be idle. Condor does not require a shared file system across machines - if no shared file system is available, Condor can transfer the job's data files on behalf of the user, or Condor may be able to transparently redirect all the job's I/O requests back to the submit machine. As a result, Condor can be used to seamlessly combine all of an organization's computational power into one resource.
Update 2: Nice introductory New York Time's article Cloud Computing: So You Don’t Have to Stand Still. Good example of how Animoto used RightScale and Amazon to meet a Facebook driven demand of 25,000 test drives an hour.
Update: Peter Laird in Understanding the Cloud Computing/SaaS/PaaS markets: a Map of the Players in the Industry paints a very cool visual map of all the cloud service players. It's a larger industry than you might think.
Once upon a time I worked at an Asynchronous Transfer Mode (ATM) switch startup. Over a delicious Christmas punch my grandma asked me what I did for a living that I could afford such extravagantly inexpensive gifts. Always so subtle. I explained I worked on an ATM switch. Mistake. She sniffed, said that's nice, and asked me why the Automated Teller Machine ate her bank card that morning. No matter how hard I tried I couldn't convince her I didn't work on bank ATMs. To all future job interrogations I waxed off, protesting I do boring software stuff that nobody cares about.
Not put off in the least, grandma asked me last night to explain this cloud computing thing she keeps hearing about at her church club. Afraid of being another victim of the distortion field surrounding cloud computing, I instead referred her to Kent Langley's excellent overview of the subject in Cloud Computing: Get Your Head in the Clouds. It does a good job demystifying the very confusing concept of cloud computing. It has nice diagrams, definitions, examples and is a great place to start.
She agreed that she had learned a lot, but one thing still troubled her: what's the difference between cloud computing and utility computing? They seem to be the same to her. Always so perceptive. She felt sure if she could drive this point home she would score big points with her church group. Oh the pressure.
In the GAE Meetup yesterday a very interesting design rule came up: Design By Explicit Cost Model. A clumsy name I know, but it is explained like this:
If you are going to be charged for an operation GAE wants you to explicitly ask for it. This is why some automatic navigation between objects isn't provided because that will force an explicit query to be written. Writing an explicit query is a sort of EULA for being charged. Click OK in the form of a query and you've indicated that you are prepared to pay for a database operation.
Usually in programming the costs we talk about are time, space, latency, bandwidth, storage, person hours, etc. Listening to the Google folks talk about how one of their explicit design goals was to require programmers to be mindful of operations that will cost money made me realize in cloud programming cost will be another aspect of design we'll have to factor in.
Instead of asking for the Big O complexity of an algorithm we'll also have to ask for the Big $ (or Big Euro) notation so we can judge an algorithm by its cost against a particular cloud profile. Maybe something like $(CPU=1.3,DISK=3,IN-BANDWIDTH=2,OUT=BANDWIDTH=3, DB=10). You could look at the Big $ notation for algorithm and shake your head saying that approach will never work for GAE, but it could work for Amazon. Can we find a cheaper Big $? ...