As Jon Udell pointed out, Amazon’s S3 service is filled with potential. But I’m looking for an enhancement which, if implemented, would add instant scalability and reliability to hundreds of thousands of applications. I didn’t invent caching or CDNs, but I’ve been a huge fan of this architecture for many years, and I wish it were more common in the web-hosting industry. Here’s a copy of a post I just left on the Amazon Web Services Developer Connection forum:
I wonder if there’s a way to use S3 as a cache or content-delivery network (CDN)?
We, like others, have an application containing a large number of large objects. The challenge is that while each of them may be modified each day, relatively few of them are downloaded by the public on any given day. Pushing new versions of every object to S3 each day would waste a lot of bandwidth, since most of the updated versions won’t be accessed.
This is why we like caching/CDN architectures, and it’s something I’d love to see S3 support. It’s an extraordinarily cool architecture that painlessly gives small-server apps large-server scalability. Here’s how I imagine it working (with a rough code sketch after the list):
- We (the S3 customer) upload an object using the APIs.
- Along with this upload, we specify an “Origin Server URL” on our own servers where we have stored the original copy of the object.
- We publish the S3 public URL of the object for external access by the public.
- When S3 receives a request for the registered object, it first sends an HTTP HEAD request to our origin server to see whether the object has changed.
- If the object has not changed since the most recent upload, or if the origin server doesn’t respond promptly for whatever reason, S3 delivers its cached copy to the public requester.
- However, if the object on the origin server is newer than S3’s copy, S3 fetches the new copy from the origin server and, while doing so, delivers that version to the requester.
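None of this exists today, of course, so treat what follows as a thought experiment: a minimal Python sketch of the revalidation logic I’m imagining S3 running on each request. The in-memory `cache` dictionary and the `serve()` function are purely my own illustration, not any real S3 interface:

```python
import email.utils
import requests

# Toy in-memory cache: public key -> (origin_url, last_modified, body).
# A real service would use durable storage; this just shows the flow.
cache = {}

def serve(key):
    """Serve an object, revalidating it against its registered origin server."""
    origin_url, cached_modified, cached_body = cache[key]
    try:
        # Cheap freshness check: HEAD the origin and compare Last-Modified.
        head = requests.head(origin_url, timeout=2)
        head.raise_for_status()
        origin_modified = email.utils.parsedate_to_datetime(
            head.headers["Last-Modified"])
        if origin_modified > cached_modified:
            # The origin has a newer copy: fetch it, cache it, serve it.
            fresh = requests.get(origin_url, timeout=10)
            fresh.raise_for_status()
            cache[key] = (origin_url, origin_modified, fresh.content)
            return fresh.content
    except (requests.RequestException, KeyError, TypeError, ValueError):
        # Origin down, slow, or missing a usable Last-Modified header:
        # fall through and serve the cached copy, exactly as described above.
        pass
    return cached_body
```

A production version would presumably use a conditional GET with an If-Modified-Since header rather than a separate HEAD followed by a GET, but the effect is the same: one cheap round trip to the origin, and a full fetch only when something actually changed.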
If you’ve ever used a CDN or even a standard cache (like Squid) you know how brilliant this architecture can be. As I mentioned above, it *instantly* adds scalability and reliability to a small-server application. (If S3’s HEAD request fails for whatever reason, it returns its most-recent version of the object to the requester.)
An app developer can then simply write new or modified objects to his local low-capacity, low-cost server, then use the APIs to upload to S3. That’s it. Done. Got an update or a new version? Just write it to the origin server. Your local server goes down? No problem. The S3 infrastructure keeps on ticking.
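Here’s what the publishing side might look like. The “origin-url” metadata key is hypothetical (it’s the hook I’m asking Amazon for); the rest is ordinary S3 PUT semantics, shown here in Python with the boto3 library:

```python
import boto3

s3 = boto3.client("s3")

def publish(key, body, origin_url):
    """Seed S3 with an object and register where its master copy lives."""
    s3.put_object(
        Bucket="my-bucket",                   # assumed bucket name
        Key=key,
        Body=body,
        ACL="public-read",                    # the public fetches it from S3
        Metadata={"origin-url": origin_url},  # hypothetical: where S3 revalidates
    )

# One call seeds the cache; after this, updates go only to the origin server.
publish("reports/daily.pdf",
        open("daily.pdf", "rb").read(),
        "http://www.example.com/reports/daily.pdf")
```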
S3’s pricing of $0.20 (USD) per GB of traffic is actually very good, and extremely good compared with commercial CDNs. If you have to upload all your objects every day, however, whether or not your visitors download them, the economics rapidly deteriorate. Caching solves all of that.
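To put rough numbers on that (these are made-up figures, just to show the shape of the problem):

```python
# Hypothetical workload: 100,000 objects of 1 MB each, all rewritten
# daily, but only 2% of them requested by the public on a given day.
# Delivery traffic to visitors is identical in both models, so it's
# omitted; only the S3-inbound traffic differs.
objects = 100_000
size_gb = 1 / 1024      # 1 MB expressed in GB
hit_rate = 0.02         # fraction of objects actually requested daily
price = 0.20            # USD per GB of traffic

push_everything = objects * size_gb * price * 30             # daily full upload
cache_on_demand = objects * hit_rate * size_gb * price * 30  # pull only what's asked for

print(f"push everything: ${push_everything:,.2f}/month")  # about $586/month
print(f"cache on demand: ${cache_on_demand:,.2f}/month")  # about $12/month
```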