Today we posted the final program on IT Conversations and The Conversations Network: an interview by Executive Producer Phil Windley with me. As I previously blogged, The Conversations Network will be shuttered at the end of 2012. If you’ve been a listener to IT Conversations or other channels on The Conversations Network, I think you’ll find it interesting. I hope so.
(The following letter was sent to all members of The Conversations Network earlier today, Sunday, August 16, 2012.)
Hello, Members of The Conversations Network!
It’s been a long time since I’ve sent out a Newsletter, but we’ve been working behind the scenes on some important changes here at The Conversations Network. We’ve been discussing these plans for the past two months with our Board of Directors, Executive Producers and Senior Managers. Channel-by-channel and site-by-site, here’s what we’re going to do.
- Social Innovations Conversations will continue at the Center for Social Innovation at the Stanford Graduate School of Business. All existing programs will be migrated to CSI’s website, which is where all new episodes will appear.
- CHI Conversations will return to its original home on the BayCHI web site.
- IT Conversations production of new programs will cease around December 1.
- SpokenWord.org will be shut down around December 1.
The remaining assets of the Conversations Network (cash and intellectual property) will be acquired by the Internet Archive, another U.S. 501(c)(3) non-profit organization. All existing programs will be moved to the Internet Archive where the world will be able to continue to listen to them for free.
Trying to anticipate some of the questions you may have:
- We hope to preserve all existing URLs by running a “redirection server” for many years if not permanently.
- We will stop accepting new and renewal membership dues and donations within 48 hours.
- All monthly subscriptions will be canceled via PayPal within the next 48 hours.
- Dues and donations already received will be used to help preserve the content (audio files and webpages) we’ve published over the past 9+ years.
So why are we doing this? A bit of history will help explain.
Our flagship channel, IT Conversations, was the second podcast ever published and today is still the longest running of all podcasts. In The Conversations Network’s nearly ten years we’ve published more than 3,300 programs on our three primary channels.
When we started this project, no one else was publishing free audio from conferences or other events. We were the first to stream live tech-conference audio and the first to offer recordings of conference sessions as free podcasts.
We created the Levelator software to standardize audio levels. It’s now in common use by podcasters and broadcasters worldwide and has been downloaded more than 350,000 times.
SpokenWord.org, our metadata/search site for all audio and video recordings of spoken-word content, has cataloged more than 1.5 million audio and video programs.
Most significantly, we pioneered the concept of a worldwide distributed team of part-time (essentially volunteer) writers, audio engineers and producers to publish broadcast-quality programs. Since 2003, 215 people in all corners of the planet have been members of TeamITC. They are the real force behind what you see and hear on The Conversations Network.
And we’ve done it all on a shoestring budget thanks to our contributing members, content providers, underwriters and Limelight Networks, our long-time content-delivery partner.
We’re proud of what we’ve accomplished. Much of what we’ve pioneered in the past ten years is now commonplace. Our goal was to make it easy for others to produce audio recordings of events and make them available to the world for free. That’s now the norm. We have succeeded.
We’ve helped event producers and podcasters to create and publish programs themselves, and increasingly that’s what they’re doing. There simply isn’t as great a need for a service like The Conversations Network. So we’ve decided to complete our mission by helping our remaining partners continue their podcasts on their own websites.
If you have any questions about these changes, feel free to reply publicly or privately. The best place for your public comments is here on my personal blog.
Thanks again for listening and for your support of The Conversations Network.
Like many other sites hosted on AWS, all of The Conversations Network’s websites went down at 1:41am PDT on April 22, 2011. It would be 64.5 hours until our sites and other servers would be fully restored. A lot has been written about this outage, and I’m sure there’s more to come. Don MacAskill, another early adopter of AWS, has posted a good explanation of SmugMug’s experiences during the outage. Phil Windley and I are hoping to interview our friend Jeff Barr from AWS for Phil’s Technometria podcast once the dust has settled at Amazon.
Many pundits have suggested this event highlights a fundamental flaw in the concept of cloud computing. Others have forecast doom and gloom for AWS in particular. I disagree with both arguments. While it certainly was the most significant failure of cloud computing to date, I predict this event will become not much more than a course correction and a “teachable moment” for Amazon, their competitors, all cloud architects and of course us here at The Conversations Network. For the geeks in the audience, I’m going to describe our architecture, the AWS services we utilize, and give a bit of an explanation about what happened and what we learned.
The Conversations Network utilizes three basic AWS services, plus a few more that aren’t really pertinent to this episode. Our servers are actually instances of AWS Elastic Compute Cloud (EC2) servers. The root filesystem for each server is stored in a small (15GB) AWS Elastic Block Store (EBS) volume. Not only are these volumes faster than local storage, they’re also persistent. So if/when an EC2 instance stops, the root filesystem for that instance remains intact and will continue to be usable if the instance is re-started. [EC2 instances are booted from Amazon Machine Images (AMIs). In our case, these are based on Fedora 8 (Linux) customized to our standards. The AMIs are identical for all our servers, but the EBS root filesystems, which change dynamically once a server is booted, are unique to each server.]
We also use EBS volumes for non-relational storage. For example, we have one large EBS volume for IT Conversations and other podcast filesystems. This holds all the audio files and images used on the website. We have another for SpokenWord.org, and so on. These EBS volumes are each mounted to one EC2 instance, which in turn shares them with the other servers via NFS. Finally, we use the Relational Database Service (RDS) for our MySQL databases. Like EBS, this is a true service as opposed to a “box” or physical server.
One very important feature of EBS is that you can take snapshots at any time. For example, we make a snapshot each night of each EBS volume. We keep all snapshots of all volumes (other than the EC2 root filesystems) for the past seven days, plus the weekly snapshots for the past four weeks and the monthly snapshots for the past year. The cost of keeping a snapshot is based only upon the incremental differences since the previous snapshot, so it’s quite a reasonable backup strategy even for large volumes so long as they don’t have changes that are both major and frequent.
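The retention schedule described above can be sketched as a small pruning rule. This is only an illustration, not the script we actually run: the function name `snapshots_to_keep` is made up, and the choice of Sunday for the "weekly" snapshot and the first of the month for the "monthly" snapshot is an assumption, since the schedule above doesn't say which day counts.

```python
from datetime import date, timedelta

def snapshots_to_keep(snapshot_dates, today):
    """Return the snapshot dates retained under a daily/weekly/monthly policy."""
    keep = set()
    for d in snapshot_dates:
        age = (today - d).days
        if age < 0:
            continue  # ignore anything dated in the future
        is_daily = age < 7                            # every snapshot from the past seven days
        is_weekly = age < 28 and d.weekday() == 6     # Sundays from the past four weeks (assumed day)
        is_monthly = age < 365 and d.day == 1         # first-of-month from the past year (assumed day)
        if is_daily or is_weekly or is_monthly:
            keep.add(d)
    return keep

# Example: one snapshot per night for the past 400 nights.
today = date(2011, 4, 22)
nightly = [today - timedelta(days=n) for n in range(400)]
kept = snapshots_to_keep(nightly, today)
```

Everything not in the returned set is a candidate for deletion. Because snapshot cost depends on incremental differences, the real savings from pruning depend on how much each volume changes between snapshots.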
Designing any server architecture, cloud-based or otherwise, requires that you consider the failure modes. What can fail? What will you lose when that happens? How will you recover? Automatically or manually? How long will recovery take for each failure mode? It’s not about eliminating failures — you can’t really do that. Rather, it’s about planning to deal with them. And like traditional architectures, the cost of the configuration increases geometrically as you increase the reliability (ie, decrease the amount of time it will take to recover from a failure).
We’ve been using AWS for more than four years. During the period when IT Conversations was part of GigaVox Media, we were the basis of one of the first case studies published by Amazon. [Here’s a diagram of one of our AWS-based configurations.] Because The Conversations Network (a non-profit) runs on a shoestring budget and can’t afford the level of redundancy deployed by some commercial enterprises (eg, SmugMug), we’re not looking for a particularly high-reliability architecture. Until last week, we’ve had EC2 instances that haven’t stopped in well over a year. We can’t tolerate any significant loss of data so we need the redundant storage of EBS, but a 99.9% uptime is good enough for us, and that’s what we’ve had from AWS until now. Because of our experience with the high reliability of AWS, we have never gotten around to automating the re-launching of EC2 instances in case of failure. We do use two separate monitoring services, and there are two of us (me and Senior Sysadmin Tim) who are capable of restarting servers, etc., if something does go wrong.
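To put the 99.9% figure in perspective, here is a rough back-of-the-envelope calculation. It assumes uptime is measured over a 365-day year, which is my assumption for illustration (no measurement window is stated above), and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a one-year measurement window

def downtime_budget_hours(uptime_target, period_hours=HOURS_PER_YEAR):
    """Hours of downtime allowed in the period for a given uptime target."""
    return (1.0 - uptime_target) * period_hours

def achieved_uptime(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Fraction of the period the service was up, given total downtime."""
    return 1.0 - downtime_hours / period_hours

budget = downtime_budget_hours(0.999)   # roughly 8.76 hours per year
after_outage = achieved_uptime(64.5)    # roughly 0.9926, if this were the year's only outage

print(f"99.9% uptime allows about {budget:.2f} hours of downtime per year")
print(f"A single 64.5-hour outage leaves about {after_outage:.2%} uptime for the year")
```

In other words, one 64.5-hour outage uses up more than seven years' worth of a 99.9% downtime budget.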
AWS operates in five regions around the world. We happened to pick US East in Virginia instead of US West (northern California) for no particular reason. Within each region there are multiple physical locations called availability zones. These are probably separate data centers within a metropolitan area. The availability zones within a region are connected by very high-speed fiber. This means you can have some degree of geographic redundancy by deploying servers in multiple availability zones, or achieve even greater protection by also deploying duplicate systems in multiple regions. The latter is far more complex, since the connectivity between regions is not as good as between availability zones. Our needs are humble, so all of The Conversations Network EC2 instances, EBS volumes and RDS databases are located in the us-east-1a availability zone. And of course, that’s where last week’s failures occurred.
Amazon hasn’t yet said what the original failure was. All of our EC2 instances were running and they could communicate with the RDS databases. I think the problem might have been the association between the EC2 instances and the EBS volumes. The volumes used as root filesystems were reachable, but not the others that contained our site-specific files.
After a few hours of downtime, I decided to re-boot our EC2 instances and that’s when things went from bad to worse. All of our EC2 instances entered the Twilight Zone. They were stuck in the “stopping” state. The operating system halted (no SSH access) but the servers didn’t release their EBS volumes. I could have launched all-new EC2 instances, but I wouldn’t be able to connect them to the volumes and hence, no websites.
Because of our backup strategy, however, we did have one more option: We had snapshots of our EBS volumes. I could have created all-new EBS volumes from the daily snapshots, and I could have done so in a different availability zone to get away from the problems. But there was one gotcha. We make the backup snapshots at 2am Pacific time each night. The failure occurred 19 minutes before that, which means our snapshots lacked the most-recent 24 hours of activity: new programs, audio and image files, logs, etc. As with the few previous problems we’ve had with AWS (mostly of our own causing) we thought this outage would be fixed quickly. It was a tradeoff: It seemed better to wait an hour or two rather than to re-launch with day-old data.
Of course “an hour or two” dragged on. Soon the outage was 24 hours old; then 48. It always seemed that the fix was imminent, so we delayed the restart process. Eventually, we decided to go ahead, and that’s when we discovered our one real mistake. Remember that we make snapshots of our EBS volumes every night? Well it turned out that we weren’t making those snapshots of all of our volumes. There was one volume that we somehow missed. The only snapshot we had of that volume was from the date it was created, more than a year ago. That means we would have had to launch our sites with some very old data. In this case, when we finally got access to the most-recent data (on the in-limbo EBS volumes) it would be difficult to reconcile it all. In the end, we decided just to wait it out. Finally, after 64.5 hours, the one EC2 instance that was holding hostage our last EBS volume stopped. We were then able to re-attach that volume to a newly-launched instance. We brought up all-new EC2 instances, attached all the then-current volumes and we were up and running, still in availability zone us-east-1a.
So what did we learn from all this? We re-learned that you have to think through these architectures carefully and understand the failure modes. But most importantly, we learned that once you have a good plan, you have to follow through with it. If we had been making nightly snapshots of that one remaining EBS volume all along, we would have been able to re-start the websites with day-old data at any time, regardless of the problems AWS was having disconnecting EBS volumes from running EC2 instances.
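A check like the following would have caught the volume we missed. It is a minimal sketch that works on plain Python data rather than calling AWS: the volume IDs are invented, and in practice the two inputs would come from whatever tool lists your volumes and their snapshots. The 26-hour threshold is also an assumption, chosen to give a nightly job a little slack.

```python
from datetime import datetime, timedelta

def volumes_missing_recent_snapshot(volume_ids, latest_snapshot, now,
                                    max_age=timedelta(hours=26)):
    """Return volumes with no snapshot at all, or none newer than max_age."""
    stale = []
    for vol in volume_ids:
        taken = latest_snapshot.get(vol)  # None if the volume was never snapshotted
        if taken is None or now - taken > max_age:
            stale.append(vol)
    return stale

# Hypothetical inventory: three volumes, one of which was left out of the nightly job.
now = datetime(2011, 4, 21, 9, 0)
volumes = ["vol-podcasts", "vol-spokenword", "vol-forgotten"]
latest = {
    "vol-podcasts": datetime(2011, 4, 21, 2, 0),
    "vol-spokenword": datetime(2011, 4, 21, 2, 0),
    "vol-forgotten": datetime(2010, 3, 1, 12, 0),  # only the creation-day snapshot
}
print(volumes_missing_recent_snapshot(volumes, latest, now))
```

The important design choice is to start from the list of volumes, not the list of snapshots, so that a volume nobody remembered to back up still shows up in the report.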
I also have a new strategy for deciding when to stop waiting for AWS to recover and instead switch to the snapshots: Once the length of the outage exceeds the age of the backups, it makes more sense to switch to the backups. If the backups are six hours old, then after six hours of downtime, it makes sense to restart from backups. In this case, we should have done that after the first 24 hours.
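That rule of thumb is simple enough to write down. The sketch below uses the times from this outage (failure at 1:41am, previous snapshot at 2am the night before); the function names are hypothetical, and "age of the backups" is taken to mean their age at the moment of failure, which is how I read the six-hour example.

```python
from datetime import datetime, timedelta

def backup_age_at_failure(last_snapshot, failure):
    """How stale the newest snapshot was when the outage began."""
    return failure - last_snapshot

def should_switch_to_backup(failure, last_snapshot, now):
    """True once the outage has lasted longer than the backup was old."""
    outage_length = now - failure
    return outage_length > backup_age_at_failure(last_snapshot, failure)

failure = datetime(2011, 4, 22, 1, 41)
last_snapshot = datetime(2011, 4, 21, 2, 0)

age = backup_age_at_failure(last_snapshot, failure)  # 23 hours 41 minutes
six_hours_in = should_switch_to_backup(failure, last_snapshot, failure + timedelta(hours=6))
one_day_in = should_switch_to_backup(failure, last_snapshot, failure + timedelta(hours=24))
```

With these inputs the rule says to keep waiting six hours into the outage and to switch by the 24-hour mark, which matches the conclusion above.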
But we still know we don’t have ultimate redundancy: We still have to re-start things manually. So long as we accept the downtime, we can survive the total failure of the us-east-1a availability zone and even the entire US East region. That’s because all EBS volumes are first replicated to multiple availability zones within the region, and our nightly snapshots are stored in Amazon’s Simple Storage Service (S3), which is replicated across multiple regions. So our current data can survive a failure within a region and our day-old data can survive a failure of our entire region.
We still have a few things to clean up and repair from this experience, but all in all we remain fairly happy with how things turned out. We didn’t, after all, lose any data. And while we aren’t proud that our sites were down for nearly three days, the world as we knew it did not come to an end. Maybe our team is even glad to have a few days off. (Too bad we couldn’t have told them in advance.) We still have one EC2 instance that refuses to stop, but it’s one of those that used NFS to reach EBS volumes attached to another server. Amazon says “We’re working on it.” Other than that, we’re now better prepared for the next failure, so long as it’s just like this one. Actually, I think we’re in pretty good shape for most events I can foresee. As for AWS, it continues to be a great platform for us.
Over the past two months we’ve been discussing the future of SpokenWord.org with our advisors, directors and members. We now have a new plan for SpokenWord.org and we need your help.
The web is awash with audio and video. There are great programs out there, but they’re just too hard to separate from the noise. We created SpokenWord.org because we wanted to help people locate the best podcasts, videos and slideshows. We got the basics right — topics and collections — but our homepage in particular isn’t discriminating enough. Literally every five minutes we display the latest programs in each topic, but they’re not filtered. There’s little sense of what’s worth watching or listening to as opposed to just being “new”.
What’s missing is the human touch. For example, I’ve recently become obsessed with photography, and I’ve been looking everywhere for the best podcasts and videos to help me learn more. Along the way I’ve had to work my way through all sorts of junk in order to find the good stuff. If only there were a photography guru who would take the time to find the best podcasts and individual episodes for me. That would be awesome.
So that’s what we’re doing in SpokenWord.org 2.0. We’re building a team of expert curators, each with his or her own specialty. These curators will find the very best audio and video programs and use SpokenWord.org to present them to you. These curators and their collections will be the primary feature of our website.
Is there a topic you’re particularly passionate and knowledgeable about? Would you be willing to share your expertise by maintaining a curated list of feeds and episodes for SpokenWord.org? Would you like to become one of our curators?
There’s no monetary compensation for your effort, but I think you’ll be rewarded by the appreciation you receive and the credibility you’ll gain within your niche. We’re going to work hard to spread the word about SpokenWord.org and our curators, and I think being the SpokenWord.org curator for a particular topic will eventually carry some real weight.
We’re still early in the process of implementing the website features to support this new concept. In fact, the concept itself is still evolving. If you’re interested either in becoming a curator or just participating in the discussion of how our curation system will function, please join the brand-new Google Group dedicated to SpokenWord.org curation.
We’ll soon have a way for you to formally apply to become a curator, but for now, joining the discussion is the best way to get involved.
IT Conversations will be seven years old in three weeks, and as often happens at this time of year I find myself taking a step back from the day-to-day issues surrounding The Conversations Network to try and see the big picture. Where are we and where are we going?
I’ve published the Annual Report and assimilated the results from our annual survey of members as I do every year, but those only address the mostly tactical issues (How well are we doing what we’re already doing?) as opposed to the more strategic ones (What should we be doing?).
This time around I’m going to go through the process more publicly than usual, partly because blogging about it helps me organize my thoughts, but mostly because I want to get input from as many people as possible.
When I started IT Conversations in 2003 virtually no one else was posting free audio recordings of conferences, events and interviews. It was relatively hard to do, so I had to invent many of the tools, processes and even a suitable content-management system for high-volume audio post-production. Over the years this became known as podcasting and hundreds of thousands of people learned how to do it.
Two years ago with help from our Boards of Advisors and Directors I realized that podcasting and video had become so easy and ubiquitous that the needs of the larger community had shifted from “How do you do it?” to “How do you find it?” The discussions that followed led to the creation of SpokenWord.org, our site for finding and sharing audio and video podcasts.
But while SpokenWord.org now has metadata for over 640,000 audio and video programs from nearly 7,500 RSS feeds, it hasn’t really caught on in the way that IT Conversations did in those early years. Ask most geeks, and they’ve probably heard of IT Conversations. But aside from our 4,000+ registered members, virtually no one has ever heard of SpokenWord.org. Sure, we haven’t done much to promote it, but neither did we do so for IT Conversations. SpokenWord.org just isn’t solving a big enough problem for enough people to make it worth our users’ time and effort to tell someone else about it.
Taking stock, what are our assets and our strengths?
- We have an excellent team of 35 (active) part-time writers, producers and audio engineers who create IT Conversations, Social Innovation Conversations and CHI Conversations, and good processes for recruiting, training and management.
- We have excellent processes and technology for audio post-production, task allocation, content management and automated show assembly.
- We have a good metadata directory for audio/video programs and feeds with personal-collection features (SpokenWord.org).
- We have an archive of 2,500 of our own programs.
- We do this all for less than $35,000 per year.
And what are our weaknesses?
- The growth of podcasting (not just ours) is flat.
- SpokenWord.org has a very small user base and in its current form isn’t solving any big problems.
Don’t get me wrong. The Conversations Network’s channels are the best podcasts on their topics and SpokenWord.org is a terrific resource for those who do use it. But I believe we can (and should) do a lot more with what we have.
The Conversations Network is a 501(c)(3) non-profit, which implies a mission to benefit the public. So the question to you (staff, listeners, members and readers) is: What should we do next to continue that mission? I’ve got my own ideas, but I want to hear from you first.
Our annual survey of SpokenWord.org members included five essay-style questions. Here are some of the answers that don’t necessarily correlate with any consensus; they’re just the most interesting.
“How can we improve SpokenWord.org? (What’s the one thing you wish we did that we don’t already do?)” (53 answers)
- “Most popular” (today, this week, this month, ever) by category is a plus. [There was some consensus on this idea of per-category most-popular lists.]
- I find it confusing for reasons I can’t articulate. It’s not crystal clear exactly what I’m supposed to do. [I sense that’s true for many first-time visitors.]
- Ogg Vorbis content encoding option [We don’t control the encoding; that’s up to the publishers.]
- It would be great if SpokenWord.org could offer files from the Internet Archive. [Yes, we need to re-visit that idea.]
- Sync with any mp3 player. [We’ve published extensive APIs with the hope that others will pick up this ball and run with it.]
- Make discovery easier. I would also like a feed or a page that shows all new programs. [From many questions like this I get the feeling that people don’t realize that we get thousands of new programs every day.]
“What do you like most about SpokenWord.org?” (62 answers)
- The variety of content. [By far the most common response.]
- That it provides an open, public place to archive ratings data on podcasts.
- The ability to simplify the process of managing podcasts and subscribe to only a few collections in iTunes.
- One stop shopping and not iTunes-centric. [“Not iTunes” shows up frequently.]
“If you were running SpokenWord.org, what would you do to increase the number of people who use it?” (49 answers)
- Advertise [No budget!]
- Joint programs with schools, college and other educational institutions (younger people have larger social networks)
- Redesign the homepage.
- Try and get some influential technologists using it, such as Leo Laporte, Patrick Norton, Dave Winer, etc.
- Make it easy to post programs and collections on Facebook and Twitter.
“The Conversations Network is a U.S. 501(c)(3) non-profit public-benefit corporation. How can we make SpokenWord.org better fulfill its mission of service to the community?” (30 answers)
- Maybe some collaboration with PBS and NPR.
- Introduce it to other non-profit organizations that are doing a Podcast.
- Create an educational hub similar to iTunes U.
“Anything else you want to tell us?” (35 answers)
- Keep up the great work, and thanks for all you do.