Todd Hoff's picture

Welcome to High Scalability

We started High Scalability to help you build successful scalable websites. This site tries to bring together all the lore, art, science, practice, and experience of building scalable websites into one place so you can learn how to build your system with confidence. Hopefully this site will move you further and faster along the learning curve of success. Please Start Here.

Scalability Perspectives #2: Van Jacobson – Content-Centric Networking

Scalability Perspectives is a series of posts that highlights the ideas that will shape the next decade of IT architecture. Each post is dedicated to a thought leader of the information age and his vision of the future. Be warned though – the journey into the minds and perspectives of these people requires an open mind.

Van Jacobson

Van Jacobson is a Research Fellow at PARC. Prior to that he was Chief Scientist and co-founder of Packet Design. Prior to that he was Chief Scientist at Cisco. Prior to that he was head of the Network Research group at Lawrence Berkeley National Laboratory. He's been studying networking since 1969. He still hopes that someday something will start to make sense.

Scaling the Internet – Does the Net needs an upgrade?

As the Internet is being overrun with video traffic, many wonder if it can survive. With challenges being thrown down over the imbalances that have been created and their impact on the viability of monopolistic business models, the Internet is under constant scrutiny. Will it survive? Or will it succumb to the burden of the billion plus community that is constantly demanding more and more?

Does the Net Need an Upgrade? To answer this question a distinguished panel of Van Jacobson, Rick Hutley, Norman Lewis, David S. Isenberg has discussed the issue on the Supernova conference. In this compelling debate available on IT Conversations, the panel addresses the question and provides some differing perspectives. One of the perspectives is Content-based networking described by Van Jacobson.

Todd Hoff's picture

What CDN would you recommend?

Update 7: Where Amazon’s Data Centers Are Located, Expanding the Cloud: Amazon CloudFront. Why Amazon's CDN Offering Is No Threat To Akamai, Limelight or CDN Pricing. Amazon has launched their CDN with "“low latency, high data transfer speeds, and no commitments.” The perfect relationship for many. The majority of the locations are in North America, but some are in Europe and Asia.
Update 6: Amazon Launching New Content Delivery Network: No Threat To Major CDNs, Yet. All the Amazon will kill all other CDNs is a bit overblown. As usual Dan Rayburn sets us straight: The offering won't support streaming, live broadcasting, or provide many of the other products and services that video content owners need...the real story here is that Amazon is going to offer a high performance method of distributing content with low latency and high data transfer rates.
Update 5: When It Comes To Content Delivery Networks, What Is The "Edge"?. Dan Rayburn is on edge about the misuse of the term edge: closest location to the user does not guarantee quality, often content is not delivered from the closest location, all content is not replicated at every "edge" location. Lots of other essential information.
Update 4: David Cancel runs a great test to see if you should be Using Amazon S3 as a CDN?. Conclusion: "CacheFly performed the best but only slightly better than EdgeCast. The S3 option was the worst with the Nginx/DIY option performing just over 100 ms faster." Also take look at Part 2 - Cacheability?
Update 3: Mr. Rayburn takes A Detailed Look At Akamai's Application Delivery Product . They create a "bi-nodal overlay network" where users and servers are always within 5 to 10 milliseconds of each other. Your data center hosted app can't compete. The problem is that people (that is, me) can understand the data center model. I don't yet understand how applications as a CDN will work.
Update 2: Dan Rayburn starts an interesting series of articles on Highlights Of My Day In Cambridge With Akamai. Akamai is moving strong into the application distribution business. That would make an interesting cloud alternative..
Update: Streamingmedia links to new CDN DF Splash that specializes in instant-on TV-quality video streaming.

A question was raised on the forum asking for a CDN recommendation. As usual there are no definitive answers, but here are three useful articles that may help your deliberations.

  • First, Tony Chang shows how to drive down response times using edge acceleration strategies.
  • Then Pingdom gives a nice overview and introduction to CDNs.
  • And last but not least, Dan Rayburn from StreamingMedia.com gives a master class in how much you should pay for your CDN, what you should be getting for your money, and how to find the right provider for your needs.

    Lots and lots of good stuff to learn, even if you didn't roll out of bed this morning pondering the deeper mysteries of content delivery networks and the Canadian dollar.

  • Todd Hoff's picture

    Is Eucalyptus ready to be your private cloud?


    Rich Wolski, professor of Computer Science at the University of California, Santa Barbara, gave a spirited talk on Eucalyptus to a large group of very interested cloudsters at the Eucalyptus Cloud Meetup. If Rich could teach computer science at every school the state of the computer science industry would be stratospheric. Rich is dynamic, smart, passionate, and visionary. It's that vision that prompted him to create Eucalyptus in the first place. Rich and his group are experts in grid and distributed computing, having a long and glorious history in that space. When he saw cloud computing on the rise he decided the best way to explore it was to implement what everyone accepted as a real cloud, Amazon's API. In a remarkably short time they implement Eucalyptus and have been improving it and tracking Amazon's changes ever since.

    The question I had going into the meetup was: should Eucalyptus be used to make an organization's private cloud? The short answer is no.

    The project is of high quality, the people are of the highest quality, but in the end Eucalyptus is a research project from a university. As an academic project Eucalyptus is subject to changes in funding and the research interests of the team. When funding sources dry up so does the project. If the team finds another research area more interesting, or if they get tired of chasing a continuous stream of new Amazon features, or no new grad students sign on, which will happen in a few years, then the project goes dark.

    Fears over continuity have at least two solutions: community support and commercial support. Eucalyptus could become community supported open source project. This is unlikely to happen though as it conflicts with the research intent of Eucalyptus. The Eucalyptus team plans to control the core for research purposes and encourage external development of add-on service like SQS. Eucalyptus won't go commercial as University projects must stay clear from commercial pretensions. Amazon is "no comment" on Eucalyptus so it's not clear what they would think of commercial development should it occur.

    Taken together these concerns imply Eucalyptus is not a good base for an enterprise quality private cloud. Which they readily admit. It's not enterprise ready Rich repeats. It's not that the quality isn't there. It is and will be. And some will certainly base their private cloud on Eucalyptus, but when making a decision of this type you have to be sure your cloud infrastructure will be around for the long haul. With Eucalyptus that is not necessarily the case. Eucalyptus is still a good choice for it's original research purpose, or as cheap staging platform for Amazon, or as base for temporary clouds, but as your rock solid private cloud infrastructure of the future Eucalyptus isn't the answer.

    The long answer is a little more nuanced and interesting.

    Private/Public Cloud

    Data centers are reshaping themselves by taking ideas from public cloud providers, such as Amazon and Google. The idea is to make the data center more cost-effective by enabling on-demand utility-based computing rather than dedicated machines. At the same time, it is clear that to make IT operations more effective, it doesn't make sense to run all the applications that are currently hosted in a company's data center in the private cloud. This calls for an integration between private and public cloud. In this post i discuss some of the challenges involved in making that happen:
    1. How do we design applications to be cloud-agnostic?
    2. How do we enable seamless fail-over to a public cloud?
    3. Future-proofing: There are many cases in which we can't make a clear decision as to where our application should be running at the time of writing or developing the application. We would like to be in a position to change the decision as to where our application will be running even after our application has been completely developed.

    Todd Hoff's picture

    Useful Cloud Computing Blogs

    Update 2: Overcast: Conversations on Cloud Computing. Listened to the first two podcasts and they're doing a great job. Worth a look. The singing and dance routines are way over the top however :-)
    Update: 9 Sources of Cloud Computing News You May Not Know About by James Urquhart. I folded in these recommendations.

    Can't get enough cloud computing? Then you must really be a glutton for punishment! But just in case, here are some cloud computing resources, collected from various sources, that will help you transform into a Tesla silently flying solo down the diamond lane.

    Meta Sources

  • Cloud Computing Email List: An often lively email list discussing cloud computing.
  • Cloud Computing Blogs & Resources. An excellent and big list of cloud resources.
  • Cloud Computing Portal: A community edited database for making the vendor selection process easier.
  • List of Cloud Platforms, Providers, and Enablers.
  • datacenterknowledge.com's Recap: More than 70 Industry Blogs : A nice set of blog's for: Data Center, Web Hosting, Content Delivery Network (CDN), Cloud Computing
  • Cloud Computing Wiki: A cloud computing wiki started by participants of the cloud email list.

    Specific Blogs

  • Cloud Computing on Twitter : Geva Perry's Big List of People Who Twitter About Cloud Computing
  • Overcast: Conversations on Cloud Computing : Podcast series on cloud computing by James Urquhart and Geva Perry.
  • James Urquhart's The Wisdom of Clouds : Cloud Computing and Utility Computing for the Enterprise and the Individual. James writes great articles and has a regular can't miss links style post summarizing much of what you need need to know in cloud world.

  • Paper: Pig Latin: A Not-So-Foreign Language for Data Processing

    Yahoo has developed a new language called Pig Latin that fit in a sweet spot between high-level declarative querying in the spirit of SQL, and low-level, procedural programming `a la map-reduce and combines best of both worlds.

    The accompanying system, Pig, is fully implemented, and compiles Pig Latin into physical plans that are executed over Hadoop, an open-source, map-reduce implementation. Pig has just graduated from the Apache Incubator and joined Hadoop as a subproject.

    The paper has a few examples of how engineers at Yahoo! are using Pig to dramatically reduce the time required for the development and execution of their data analysis tasks, compared to
    using Hadoop directly.

    References: Apache Pig Wiki

    CloudCamp London 2: private clouds and standardisation

    CloudCamp returned to London yesterday, organised with the help of Skills Matter at the Crypt on the Clarkenwell green. The main topics of this cloud/grid computing community meeting were service-level agreements, connecting private and public clouds and standardisation issues.

    Todd Hoff's picture

    Plenty of Fish Says Scaling for Free Doesn't Pay

    Plenty of FishCEO Markus Frind, famous nerd hero for making over $10 million a year from Google ads on a free dating site he made and ran all by himself, now sees a problem with the free model:


    The problem with free is that every time you double the size of your database the cost of maintaining the site grows 6 fold. I really underestimated how much resources it would take, I have one database table now that exceeds 3 billion records. The bigger you get as a free site the less money you make per visit and the more it costs to service a visit...There is really no money in being free and we have to start experimenting with other models now or we won’t be able to compete in 3 or 4 years.

    As one commenter succinctly put it: the “golden time” of AdSense is over. Time to look at costs. The POF architecture is to run scarily huge tables on single machines. They also buy and maintain their own SAN. So it seems scaling up is what is increasing costs and decreasing profits. I wonder if the economics of cloud storage and cloud architectures might have a more linear cost curve?

    Scalability Perspectives #1: Nicholas Carr – The Big Switch

    Scalability Perspectives is a series of posts that highlights the ideas that will shape the next decade of IT architecture. Each post is dedicated to a thought leader of the information age and his vision of the future. Be warned though – the journey into the minds and perspectives of these people requires an open mind.

    Nicholas Carr

    A former executive editor of the Harvard Business Review, Nicholas Carr writes and speaks on technology, business, and culture. His provocative 2004 book Does IT Matter? set off a worldwide debate about the role of computers in business.

    The Big Switch – Rewiring the World, From Edison to Google

    Carr's core insight is that the development of the computer and the Internet remarkably parallels that of the last radically disruptive technology, electricity. He traces the rapid morphing of electrification from an in-house competitive advantage to a ubiquitous utility, and how the business advantage rapidly shifted from the innovators and early adopters to corporate titans who made their fortune from controlling a commodity essential to everyday life. He envisions similar future for the IT utility in his new book

    Managing application on the cloud using a JMX Fabric

    This post describes how you can create a federated management model using JMX standard API. Applications that are already using a standard JMX interface can plug-in the new federated implementation without changing the application code and without introducing additional performance overhead.

    Todd Hoff's picture

    How Sites are Scaling Up for the Election Night Crush

    Election night is a big traffic boost for news and social sites. Yahoo expects up to 400 million page views on Election Day. Data Center Knowledge has an excellent article how various sites are preparing to handle spikes in election night traffic. Some interesting bits:

  • Prepare ahead. Don't wait to handle spikes, plan and prepare before the blessed event.
  • Use a CDN. Daily Kos puts images on a CDN, but the dynamic nature of their site means the can't use CDN for their other content.
  • Scale up. Daily Kos "to handle the traffic better, we moved to a cluster of six quad core Xeons with 8GB RAM for webheads that all boot off a central NFS (Network File System) root, with the capability of adding more webheads as needed,” . They also "added two 16GB eight-core Xeons and a 6×73GB RAID-10 array for database files running a MySQL master/slave setup."
  • Add Cache. Daily Kos added 1GB instances memcached to each webhead.
  • Change Caching Strategy. Daily Kos puts fully rendered pages into memcached.
  • Change Serving Strategy. Daily Kos directly serves cached pages from memcached directly to anonymous users from lighttpd running as the front end proxy. The moves a lot of work off the backend and distributes work on the new hefty webheads. Site performance has improved greatly.
  • Add Capacity. Limelight expanded its network capacity to over 2 Terabytes per second.

    Tonight is a big night for a lot of sites. It's interesting to see how some are responding to the challenge. A lot of what they are doing will work for you too.

  • Todd Hoff's picture

    Strategy: How to Manage Sessions Using Memcached

    Dormando shows an enlightened middle way for storing sessions in cache and the database. Sessions are a perfect cache candidate because they are transient, smallish, and since they are usually accessed on every page access removing all that load from the database is a good thing. But as Dormando points out session caches have problems. If you remove expiration times from the cache and you run out of memory then no more logins. If a cache server fails or needs to be upgrade then you just logged out a bunch of potentially angry users.

    The middle ground Dormando proposes is using both the cache and the database:

  • Reads: read from the cache first, then the database. Typical cache logic.
  • Writes: write to memcached every time, write to the database every N seconds (assuming the data has changed).

    There's a small chance of data loss, but you've still greatly reduced the database load while providing reliability. Nice solution.

  • Olio Web2.0 Toolkit - Evaluate Web Technologies and Tools

    How do you evaluate and decide which web technologies (and there are myriads out there) to use for your new web application, which one potentially gives you the best performance, which one will likely give you the shortest time-to-market? The Apache incubator project Olio might help.

    Olio is a is an open source web 2.0 toolkit to help evaluate the suitability, functionality and performance of web technologies. Olio defines an example web2.0 application (an events site somewhat like yahoo.com/upcoming) and provides three initial implementations : PHP, Java EE and RubyOnRails (ROR). The toolkit also defines ways to drive load against the application in order to measure performance.

    Apache Olio could be used to

    • Understand how to use various web 2.0 technologies such as AJAX, memcached, mogileFS etc. Use the code in the application to understand the subtle complexities involved and how to get around issues with these technologies.
    • Evaluate the differences in the three implementations: php, ruby and java to understand which might best work for your situation.
    • Within each implementation, evaluate different infrastructure technologies by changing the servers used (e.g: apache vs lighttpd, mysql vs postgre, ruby vs Jruby etc.)
    • Drive load against the application to evaluate the performance and scalability of the chosen platform.
    • Experiment with different algorithms (e.g. memcache locking, a different DB access API) by replacing portions of code in the application.

    Olio started it's life as the web2.0kit developed by Sun Microsystems in colloboration with U.C. Berkeley RAD Lab and was presented on Velocity2008.

    Todd Hoff's picture

    CTL - Distributed Control Dispatching Framework

    CTL is a flexible distributed control dispatching framework that enables you to break management processes into reusable control modules and execute them in distributed fashion over the network.

    From their website:
    CTL is a flexible distributed control dispatching framework that enables you to break management processes into reusable control modules and execute them in distributed fashion over the network.

    What does CTL do?
    CTL helps you leverage your current scripts and tools to easily automate any kind of distributed systems management or application provisioning task. Its good for simplifiying large-scale scripting efforts or as another tool in your toolbox that helps you speed through your daily mix of ad-hoc administration tasks.

    What are CTL's features?
    CTL has many features, but the general highlights are:

    * Execute sophisticated procedures in distributed environments - Aren't you tired of writing and then endlessly modifying scripts that loop over nodes and invoke remote actions? CTL dispatches actions to remote controllers with network transparency (over SSH), parallelism, and error handling already built in.
    * Comes with pre-built utilities - CTL comes with pre-built utilities so you don't have to script actions like file distribution or process and port checking.
    * Define your own automation using the tools/languages you already know - New controller modules are defined in XML and your scripting can be done in multiple scripting languages (Perl, Python, etc.), *nix shell, Windows batch, and/or Ant.
    * Cross platform administration - CTL is Java-based, works on *nix and Windows.

    Three steps for turning a tier-based/Spring-application into dynamically scalable services (video)

    Summary
    In this presentation, a three steps approach for turning your existing stateful tier-based/Spring-application into a dynamically scalable services application using OpenSpaces is demonstrated. The existing programming model is kept the same while focusing on abstracting and replacing the underlying implementations of the middleware stack in a way that will fit the scale-out model.

    Bio
    Nati Shalom is the CTO and Founder of GigaSpaces and responsible for the technology roadmap. He has 10 years of experience with distributed technology and architecture namely CORBA, Jini, J2EE, Grid and SOA. Nati is the Head of the Israeli Grid consortium and an evangelist of Space Based Architecture and Data Grid patterns. Blog: Gigaspaces Blog

    Read the rest of the article here on InfoQ.

    Todd Hoff's picture

    Notify.me Architecture - Synchronicity Kills

    What's cool about starting a new project is you finally have a chance to do it right. You of course eventually mess everything up in your own way, but for that one moment the world has a perfect order, a rightness that feels satisfying and good. Arne Claassen, the CTO of notify.me, a brand new real time notification delivery service, is in this honeymoon period now.

    Arne has been gracious enough to share with us his philosophy of how to build a notification service. I think you'll find it fascinating because Arne goes into a lot of useful detail about how his system works.

    His main design philosophy is to minimize the bottlenecks that form around synchronous access, that is when some resource is requested and the requestor ties up more resources, waiting for a response. If the requested resource can’t be delivered in a timely manner, more and more requests pile up until the server can’t accept any new ones. Nobody gets what they want and you have an outage. Breaking synchronous operations into asynchronous operations by separating request and response into separate message passing actions, stops the resource overload. Instead of a system going down from too many parallel requests, it can works its way through a backlog of requests as fast as it can. And in most cases the request/response cycles are so fast that they appear like a linear sequence of events.

    Notify.me is taking the innovative and risky strategy of using ejabberd, an XMPP based system, as their internal messaging and routing layer. Will Erlang and Mnesia (Erlang's database) be able to keep up with traffic and keep low latencies as traffic scales? It will be interesting to find out.

    If you are interested in notify.me they've kindly offered 500 beta accounts for HS readers: http://notify.me/user/account/create/highscale

    Who are you?

    My name is Arne Claassen, the CTO of notify.me. I've been working on highly scalable web
    based applications and services for the past decade. These sites have employed various
    combinations of traditional scaling techniques such as server farms, caching, content pregeneration
    and highly available databases using replication and clustering. All of these
    techniques are ways to mitigate scarce resources (generally the database) being in
    contention by many users. Knowing the benefits and pitfalls of these techniques, it has
    become my focus to architect systems that circumvent scarce resource scenarios.

    What is notify.me, why did you make it, and why is it a good thing?

    Todd Hoff's picture

    Should you use a SAN to scale your architecture?

    This is a question everyone must struggle with when building out their datacenter. Storage choices are always the ones I have the least confidence in. David Marks in his blog You Can Change It Later! asks the question Should I get a SAN to scale my site architecture? and answers no. A better solution is to use commodity hardware, directly attach storage on servers, and partition across servers to scale and for greater availability.

    David's reasoning is interesting:

  • A SAN creates a SPOF (single point of failure) that is dependent on a vendor to fly and fix when there's a problem. This can lead to long down times during this outage you have no access to your data at all.
  • Using easily available commodity hardware minimizes risks to your company, it's not just about saving money. Zooming over to Fry's to buy emergency equipment provides the kind of agility startups need in order to respond quickly to ever changing situations.

    It's hard to beat the power and flexibility (backups, easy to add storage, mirroring, etc) of a good SAN, but Mark makes a good case.

  • Todd Hoff's picture

    Product: Puppet the Automated Administration System

    Update: Digg on their choice and use of Puppet. They chose puppet over cfengine, and bcfg2 because they liked Puppet's resource abstraction layer (RAL), the ability to implement configuration management incrementally, support for bundles, and the overall design philosophy.

    Puppet implements a declarative (what not how) configuration language for automating common administration tasks. It's the system every large site writes for themselves and it's already made for you! Ilike was able to "easily" scale from 0 to hundreds of servers using Puppet. I can't believe I've never seen this before. It looks really cool. What is Puppet and how can it help you scale your website operations?

    Todd Hoff's picture

    11 Secrets of a Cloud Scale Consultant That They Dont' Want You to Know

    OK, there is no "they" and "they" wouldn't care if you knew anyway. After all, this isn't a blog about really important stuff like investing, acne cures, or cheap natural cleansing products.

    But the secrets are real. Super cloud scaling consultant Kent Langley has put together a comprehensive checklist to consider when developing for the cloud:

  • ORM for Data Partitioning and Query Splitting - Split queries between updates and deletes from the start
  • Monitoring process, resources, and uptime - Process Monitoring, Resource Monitoring, UpTime Monitoring
  • Performance Testing and Capacity Planning - Can't make good decisions without doing some degree of Performance Testing and Capacity planning.
  • Static vs. Dynamic Content splitting / CDN -