The Conversations Network: Mission Accomplished

(The following letter was sent to all members of The Conversations Network earlier today, Sunday, August 16, 2012.)

Hello, Members of The Conversations Network!

It’s been a long time since I’ve sent out a Newsletter, but we’ve been working behind the scenes on some important changes here at The Conversations Network. We’ve been discussing these plans for the past two months with our Board of Directors, Executive Producers and Senior Managers. Channel-by-channel and site-by-site, here’s what we’re going to do.

  • Social Innovations Conversations will continue at the Center for Social Innovation at the Stanford Graduate School of Business. All existing programs will be migrated to CSI’s website, which is where all new episodes will appear.
  • CHI Conversations will return to its original home on the BayCHI web site.
  • IT Conversations production of new programs will cease around December 1.
  • SpokenWord.org will be shut down around December 1.

The remaining assets of the Conversations Network (cash and intellectual property) will be acquired by the Internet Archive, another U.S. 501(c)(3) non-profit organization. All existing programs will be moved to the Internet Archive where the world will be able to continue to listen to them for free.

Trying to anticipate some of the questions you may have:

  • We hope to preserve all existing URLs by running a “redirection server” for many years if not permanently.
  • We will stop accepting new and renewal membership dues and donations within 48 hours.
  • All monthly subscriptions will be canceled via PayPal within the next 48 hours.
  • Dues and donations already received will be used to help preserve the content (audio files and webpages) we’ve published over the past 9+ years.

So why are we doing this? A bit of history will help explain.

Our flagship channel, IT Conversations, was the second podcast ever published and today is still the longest running of all podcasts. In The Conversations Network’s nearly ten years we’ve published more than 3,300 programs on our three primary channels.

When we started this project, no one else was publishing free audio from conferences or other events. We were the first to stream live tech-conference audio and the first to offer recordings of conference sessions as free podcasts.

We created the Levelator software to standardize audio levels. It’s now in common use by podcasters and broadcasters worldwide and has been downloaded more than 350,000 times.

SpokenWord.org, our metadata/search site for all audio and video recordings of spoken-word content, has cataloged more than 1.5 million audio and video programs.

Most significantly, we pioneered the concept of a worldwide distributed team of part-time (essentially volunteer) writers, audio engineers and producers to publish broadcast-quality programs. Since 2003, 215 people in all corners of the planet have been members of TeamITC. They are the real force behind what you see and hear on The Conversations Network.

And we’ve done it all on a shoestring budget thanks to our contributing members, content providers, underwriters and Limelight Networks, our long-time content-delivery partner.

We’re proud of what we’ve accomplished. Much of what we’ve pioneered in the past ten years is now commonplace. Our goal was to make it easy for others to produce audio recordings of events and make them available to the world for free. That’s now the norm. We have succeeded.

We’ve helped event producers and podcasters to create and publish programs themselves, and increasingly that’s what they’re doing. There simply isn’t as great a need for a service like The Conversations Network. So we’ve decided to complete our mission by helping our remaining partners continue their podcasts on their own websites.

If you have any questions about these changes, feel free to reply publicly or privately. The best place for your public comments is here on my personal blog.

Thanks again for listening and for your support of The Conversations Network.

…doug

Doug Kaye, Executive Director
The Conversations Network
A 501(c)(3) Non-Profit
doug@rds.com
twitter (DougKaye)
facebook (doug.kaye)
google plus

Happy Birthday, The Conversations Network

Yesterday was the 8th anniversary of IT Conversations, the longest running podcast in existence and the flagship channel of The Conversations Network. Since its founding, The Conversations Network has published 2,918 audio programs for an average of one every day for these past eight years.

Thanks to our members, major supporters and TeamITC, the wonderful folks you never hear about who bring you those new programs every day.

The Amazon Web Services (AWS) Outage

Like many other sites hosted on AWS, all of The Conversations Network’s websites went down at 1:41am PDT on April 22, 2011. It would be 64.5 hours until our sites and other servers would be fully restored. A lot has been written about this outage, and I’m sure there’s more to come. Don MacAskill, another early adopter of AWS, has posted a good explanation of SmugMug’s experiences during the outage.  Phil Windley and I are hoping to interview our friend Jeff Barr from AWS for Phil’s Technometria podcast once the dust has settled at Amazon.

Many pundits have suggested this event highlights a fundamental flaw in the concept of cloud computing. Others have forecast doom and gloom for AWS in particular. I disagree with both arguments. While it certainly was the most significant failure of cloud computing to date, I predict this event will become not much more than a course correction and a “teachable moment” for Amazon, their competitors, all cloud architects and of course us here at The Conversations Network. For the geeks in the audience, I’m going to describe our architecture, the AWS services we utilize, and give a bit of an explanation about what happened and what we learned.

The Conversations Network utilizes three basic AWS services, plus a few more that aren’t really pertinent to this episode. Our servers are actually instances of AWS Elastic Compute Cloud (EC2) servers. The root filesystem for each server is stored in a small (15GB) AWS Elastic Block Storage (EBS) volume. Not only are these volumes faster than local storage, they’re also persistent. So if/when an EC2 instance stops, the root filesystem for that instance remains intact and will continue to be usable if the instance is re-started. [EC2 instances are booted from Amazon Machine Images (AMIs). In our case, these are based on Fedora 8 (Linux) customized to our standards. The AMIs are identical for all our servers, but the EBS root filesystems, which change dynamically once a server is booted, are unique to each server.]

We also use EBS volumes for non-relational storage. For example, we have one large EBS volume for IT Conversations and other podcast filesystems. This holds all the audio files and images used on the website. We have another for SpokenWord.org, and so on. These EBS volumes are each mounted to one EC2 instance, which in turn shares them with the other servers via NFS. Finally, we use the Relational Database Service (RDS) for our MySQL databases. Like EBS, this is a true service as opposed to a “box” or physical server.

One very important feature of EBS is that you can take snapshots at any time. For example, we make a snapshot each night of each EBS volume. We keep all snapshots of all volumes (other than the EC2 root filesystems) for the past seven days, plus the weekly snapshots for the past four weeks and the monthly snapshots for the past year. The cost of keeping a snapshot is based only upon the incremental differences since the previous snapshot, so it’s quite a reasonable backup strategy even for large volumes so long as they don’t have changes that are both major and frequent.
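The retention schedule described above (seven dailies, four weeklies, a year of monthlies) can be sketched as a small pruning rule. This is only an illustration, not our actual script: the function name is hypothetical, and the choice of Sunday as the "weekly" snapshot and the first of the month as the "monthly" snapshot are assumptions made for the example.

```python
from datetime import date, timedelta

def snapshots_to_keep(snapshot_dates, today):
    """Return the set of snapshot dates retained under the sketch policy.

    Assumptions (not from the original setup): the weekly snapshot is the
    one taken on a Sunday, and the monthly snapshot is the one taken on
    the first day of the month.
    """
    keep = set()
    for d in snapshot_dates:
        age = (today - d).days
        if age < 0:
            continue  # ignore anything dated in the future
        if age < 7:
            keep.add(d)  # every nightly snapshot from the past seven days
        if d.weekday() == 6 and age < 28:
            keep.add(d)  # Sunday snapshots from the past four weeks
        if d.day == 1 and age < 365:
            keep.add(d)  # first-of-month snapshots from the past year
    return keep

# Example: nightly snapshots for 400 days ending on the day of the outage.
today = date(2011, 4, 22)
nightly = [today - timedelta(days=n) for n in range(400)]
kept = snapshots_to_keep(nightly, today)
print(len(kept), "of", len(nightly), "snapshots retained")
```

Everything not in the returned set would be a candidate for deletion.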

Designing any server architecture, cloud-based or otherwise, requires that you consider the failure modes. What can fail? What will you lose when that happens? How will you recover? Automatically or manually? How long will recovery take for each failure mode? It’s not about eliminating failures — you can’t really do that. Rather, it’s about planning to deal with them. And like traditional architectures, the cost of the configuration increases geometrically as you increase the reliability (ie, decrease the amount of time it will take to recover from a failure).

We’ve been using AWS for more than four years. During the period when IT Conversations was part of GigaVox Media, we were the basis of one of the first case studies published by Amazon. [Here’s a diagram of one of our AWS-based configurations.] Because The Conversations Network (a non-profit) runs on a shoestring budget and can’t afford the level of redundancy deployed by some commercial enterprises (eg, SmugMug), we’re not looking for a particularly high-reliability architecture. Until last week, we’ve had EC2 instances that haven’t stopped in well over a year. We can’t tolerate any significant loss of data so we need the redundant storage of EBS, but a 99.9% uptime is good enough for us, and that’s what we’ve had from AWS until now. Because of our experience with the high reliability of AWS, we have never gotten around to automating the re-launching of EC2 instances in case of failure. We do use two separate monitoring services, and there are two of us (me and Senior Sysadmin Tim) who are capable of restarting servers, etc., if something does go wrong.
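To put the 99.9% figure in perspective, here is a rough bit of arithmetic, assuming the uptime is measured over a single 365-day year (that measurement window is an assumption for the example, as are the function names).

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; the one-year window is an assumption

def allowed_downtime_hours(uptime_target, window_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted by an uptime target over the window."""
    return (1 - uptime_target) * window_hours

def achieved_uptime(downtime_hours, window_hours=HOURS_PER_YEAR):
    """Fraction of the window during which the sites were up."""
    return 1 - downtime_hours / window_hours

# A 99.9% target leaves under nine hours of downtime per year.
print(f"{allowed_downtime_hours(0.999):.2f} hours allowed")

# A single 64.5-hour outage, with no other downtime in the year.
print(f"{achieved_uptime(64.5) * 100:.2f}% uptime")
```

Under those assumptions, one outage of this length uses up several years' worth of a 99.9% downtime allowance.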

AWS operates in five regions around the world. We happened to pick US East in Virginia instead of US West (northern California) for no particular reason. Within each region there are multiple physical locations called availability zones. These are probably separate data centers within a metropolitan area. The availability zones within a region are connected by very high-speed fiber. This means you can have some degree of geographic redundancy by deploying servers in multiple availability zones, or achieve even greater protection by also deploying duplicate systems in multiple regions. The latter is far more complex, since the connectivity between regions is not as good as between availability zones. Our needs are humble, so all of The Conversations Network EC2 instances, EBS volumes and RDS databases are located in the us-east-1a availability zone. And of course, that’s where last week’s failures occurred.

Amazon hasn’t yet said what the original failure was. All of our EC2 instances were running and they could communicate with the RDS databases. I think the problem might have been the association between the EC2 instances and the EBS volumes. The volumes used as root filesystems were reachable, but not the others that contained our site-specific files.

After a few hours of downtime, I decided to re-boot our EC2 instances and that’s when things went from bad to worse. All of our EC2 instances entered the Twilight Zone. They were stuck in the “stopping” state. The operating system halted (no SSH access) but the servers didn’t release their EBS volumes. I could have launched all-new EC2 instances, but I wouldn’t be able to connect them to the volumes and hence, no websites.

Because of our backup strategy, however, we did have one more option: We had snapshots of our EBS volumes. I could have created all-new EBS volumes from the daily snapshots, and I could have done so in a different availability zone to get away from the problems. But there was one gotcha. We make the backup snapshots at 2am Pacific time each night. The failure occurred 19 minutes before that, which means our snapshots lacked the most-recent 24 hours of activity: new programs, audio and image files, logs, etc. As with the few previous problems we’ve had with AWS (mostly of our own causing) we thought this outage would be fixed quickly. It was a tradeoff: It seemed better to wait an hour or two rather than to re-launch with day-old data.

Of course “an hour or two” dragged on. Soon the outage was 24 hours old; then 48. It always seemed that the fix was imminent, so we delayed the restart process. Eventually, we decided to go ahead, and that’s when we discovered our one real mistake. Remember that we make snapshots of our EBS volumes every night? Well it turned out that we weren’t making those snapshots of all of our volumes. There was one volume that we somehow missed. The only snapshot we had of that volume was from the date it was created, more than a year ago. That means we would have had to launch our sites with some very old data. In this case, when we finally got access to the most-recent data (on the in-limbo EBS volumes) it would be difficult to reconcile it all. In the end, we decided just to wait it out. Finally, after 64.5 hours, the one EC2 instance that was holding hostage our last EBS volume stopped. We were then able to re-attach that volume to a newly-launched instance. We brought up all-new EC2 instances, attached all the then-current volumes and we were up and running, still in availability zone us-east-1a.

So what did we learn from all this? We re-learned that you have to think through these architectures carefully and understand the failure modes. But most importantly, we learned that once you have a good plan, you have to follow through with it. If we had been making nightly snapshots of that one remaining EBS volume all along, we would have been able to re-start the websites with day-old data at any time, regardless of the problems AWS was having disconnecting EBS volumes from running EC2 instances.
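One way to catch a volume that has silently dropped out of the nightly routine is to audit snapshot coverage on a schedule. The sketch below is a minimal illustration over plain data, not a call into any AWS API: the function name, the volume names and the 26-hour threshold are all hypothetical, and in practice the two inputs would come from whatever inventory of volumes and snapshots you maintain.

```python
from datetime import datetime, timedelta

def stale_volumes(volume_ids, snapshots, now, max_age=timedelta(hours=26)):
    """Return the volumes whose newest snapshot is missing or too old.

    volume_ids: every volume that is supposed to be backed up.
    snapshots:  iterable of (volume_id, time_taken) pairs.
    max_age:    hypothetical threshold; a little over one day so that a
                nightly job that runs slightly late is not flagged.
    """
    newest = {}
    for vol, taken in snapshots:
        if vol not in newest or taken > newest[vol]:
            newest[vol] = taken
    return sorted(
        vol for vol in volume_ids
        if vol not in newest or now - newest[vol] > max_age
    )

# Hypothetical inventory: one volume was never added to the nightly job.
now = datetime(2011, 4, 22, 1, 41)
volumes = ["vol-itc", "vol-spokenword", "vol-missed"]
snaps = [
    ("vol-itc", datetime(2011, 4, 21, 2, 0)),
    ("vol-spokenword", datetime(2011, 4, 21, 2, 0)),
    ("vol-missed", datetime(2010, 3, 1, 2, 0)),  # only the creation-day snapshot
]
print(stale_volumes(volumes, snaps, now))
```

Run daily, a check like this turns "we somehow missed one" into an alert the morning after the first missed snapshot.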

I also have a new strategy for deciding when to stop waiting for AWS to recover and instead switch to the snapshots: Once the length of the outage exceeds the age of the backups, it makes more sense to switch to the backups. If the backups are six hours old, then after six hours of downtime, it makes sense to restart from backups. In this case, we should have done that after the first 24 hours.
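That rule of thumb is simple enough to write down. In this sketch the function name is hypothetical, and "age of the backups" is taken to mean the age of the newest backup at the moment the outage began, which is one reasonable reading of the rule rather than the only one.

```python
from datetime import datetime

def should_restore_from_backup(outage_started, last_backup, now):
    """Apply the rule: switch once the outage is longer than the backup is old.

    Assumption: backup age is measured at the moment the outage began,
    i.e. it is the amount of activity a restore would discard.
    """
    backup_age = outage_started - last_backup
    outage_length = now - outage_started
    return outage_length > backup_age

# The timings from this outage: snapshots at 2am, failure at 1:41am the next day.
last_backup = datetime(2011, 4, 21, 2, 0)
outage_started = datetime(2011, 4, 22, 1, 41)

# Two hours in, waiting still looks like the better bet.
print(should_restore_from_backup(outage_started, last_backup,
                                 datetime(2011, 4, 22, 3, 41)))

# A day in, the outage has outlasted the backup's age.
print(should_restore_from_backup(outage_started, last_backup,
                                 datetime(2011, 4, 23, 1, 41)))
```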

But we still know we don’t have ultimate redundancy: We still have to re-start things manually. So long as we accept the downtime, we can survive the total failure of the us-east-1a availability zone and even the entire US East region. That’s because all EBS volumes are first replicated to multiple availability zones within the region, and our nightly snapshots are stored in Amazon’s Simple Storage Service (S3), which is replicated across multiple regions. So our current data can survive a failure within a region and our day-old data can survive a failure of our entire region.

We still have a few things to clean up and repair from this experience, but all-in-all we remain fairly happy with how things turned out. We didn’t, after all, lose any data. And while we aren’t proud that our sites were down for nearly three days, the world as we knew it did not come to an end. Maybe our team is even glad to have a few days off. (Too bad we couldn’t have told them in advance.) We still have one EC2 instance that refuses to stop, but it’s one of those that used NFS to reach EBS volumes attached to another server. Amazon says “We’re working on it.” Other than that, we’re now better prepared for the next failure, so long as it’s just like this one. Actually, I think we’re in pretty good shape for most events I can foresee. AWS continues to be a great platform for us.

PodCorps.org is Closing

Three years ago The Conversations Network launched the PodCorps.org website, a place to match producers with audio and video stringers around the world. Nearly 1,000 stringers have joined PodCorps.org, but the website has not achieved the kind of critical mass required to make it a success in anyone’s book. We have therefore decided to close the PodCorps.org website as of July 5, 2010.

The reason we failed to reach that critical mass is rather straightforward: We are spread too thin among multiple projects and didn’t commit the resources required for PodCorps.org’s success. The Conversations Network has a very small budget and depends entirely on volunteers. And while many people supported the concept by registering on the website, we were not able to recruit a volunteer team to manage and promote PodCorps.org.

I want to personally thank everyone who registered for their participation and support of the PodCorps.org concept. I only wish we had the resources to fulfill our side of the bargain. The Conversations Network’s other projects (SpokenWord.org and our proprietary podcast channels) get all of our attention and are doing quite well, but we need to accept our limitations in order to ensure our successful projects continue without distraction.

MEDIAmobz: An Introduction

For those of you in the video world, I want to use this opportunity to introduce a somewhat different alternative to PodCorps.org. We have a long-standing friendship with a for-profit company called MEDIAmobz. They have a network of producers that provide video production services for the business market via partners such as Business Wire and Cisco. As PodCorps.org is closing, we thought you might want to sign up with MEDIAmobz as a way to find video production jobs around the world.

Dave Toole, founder and CEO of MEDIAmobz passed along this note:

“Thanks for considering joining our producer community at MEDIAmobz. We provide you free tools to post your video reels and links to your work to help market your capabilities to the business market. We have provided dozens of clients turn key video solutions for business story telling. We do not charge clients to post jobs and only charge a small fee when they have agreed to hire a production resource. We hope that we are able to help provide an easier way for clients to connect with creative resources to help them tell their story. Please have a look around and let us know what we can do to help you in providing your services.”

Public Media Opportunities

For those of you interested in public radio or TV in the U.S., here are some additional related sites you should check out:

Taking a Step Back

IT Conversations will be seven years old in three weeks, and as often happens at this time of year I find myself taking a step back from the day-to-day issues surrounding The Conversations Network to try and see the big picture. Where are we and where are we going?

I’ve published the Annual Report and assimilated the results from our annual survey of members as I do every year, but those only address the mostly tactical issues (How well are we doing what we’re already doing?) as opposed to the more strategic ones (What should we be doing?).

This time around I’m going to go through the process more publicly than usual, partly because blogging about it helps me organize my thoughts, but mostly because I want to get input from as many people as possible.

When I started IT Conversations in 2003 virtually no one else was posting free audio recordings of conferences, events and interviews. It was relatively hard to do, so I had to invent many of the tools, processes and even a suitable content-management system for high-volume audio post-production. Over the years this became known as podcasting and hundreds of thousands of people learned how to do it.

Two years ago with help from our Boards of Advisors and Directors I realized that podcasting and video had become so easy and ubiquitous that the needs of the larger community had shifted from “How do you do it?” to “How do you find it?” The discussions that followed led to the creation of SpokenWord.org, our site for finding and sharing audio and video podcasts.

But while SpokenWord.org now has metadata for over 640,000 audio and video programs from nearly 7,500 RSS feeds, it hasn’t really caught on in the way that IT Conversations did in those early years. Ask most geeks, and they’ve probably heard of IT Conversations. But aside from our 4,000+ registered members, virtually no one has ever heard of SpokenWord.org. Sure, we haven’t done much to promote it, but neither did we do so for IT Conversations. SpokenWord.org just isn’t solving a big enough problem for enough people to make it worth our users’ time and effort to tell someone else about it.

Taking stock, what are our assets and our strengths?

  1. We have an excellent team of 35 (active) part-time writers, producers and audio engineers who create IT Conversations, Social Innovation Conversations and CHI Conversations, and good processes for recruiting, training and management.
  2. We have excellent processes and technology for audio post-production, task allocation, content management and automated show assembly.
  3. We have a good metadata directory for audio/video programs and feeds with personal-collection features (SpokenWord.org).
  4. We have an archive of 2,500 of our own programs.
  5. We do this all for less than $35,000 per year.

And weaknesses?

  1. The growth of podcasting (not just ours) is flat.
  2. SpokenWord.org has a very small user base and in its current form isn’t solving any big problems.

Don’t get me wrong. The Conversations Network’s channels are the best podcasts on their topics and SpokenWord.org is a terrific resource for those who do use it. But I believe we can (and should) do a lot more with what we have.

The Conversations Network is a 501(c)(3) non-profit, which implies a mission to benefit the public. So the question to you (staff, listeners, members and readers) is: What should we do next to continue that mission? I’ve got my own ideas, but I want to hear from you first.

The Conversations Network — Annual Survey Results

A few weeks ago we published the results of the annual survey of SpokenWord.org members. Beginning with this post, I’ll be blogging the results of the larger survey of all registered members of The Conversations Network. While SpokenWord.org is our 14-month old site for finding and sharing all spoken-word audio and video, The Conversations Network survey covers our proprietary channels: IT Conversations, Social Innovation Conversations and CHI Conversations. As of this date, the new survey has been completed by 302 members versus 461 last year. (The new survey is still open.) Note that last year’s data are shown in [square brackets].

How do you listen to The Conversations Network?

How do you listen to podcasts?

  • Computer: 56% [57%]
  • Portable device: 76% [75%]
  • Burn to CD/DVD: 7% [6%]

Do you subscribe to one or more of our RSS feeds?

  • Yes: 60% [83%] This question was asked in a somewhat different manner last year.

For those who subscribe to RSS feeds, what software or service do you use?

  • Google Reader/FeedFetcher: 25% [25%]
  • iTunes/Mac: 24% [26%]
  • iTunes/Windows: 23% [22%]
  • other: 53% [52%]

Are you an Audible.com customer?

  • Current customer: 15% [14%]
  • Never been a customer: 66% [72%]
  • Used to be a customer: 19% [14%]

How many programs on The Conversations Network have you heard in the past month?

  • None: 17% [11%]
  • 1: 15% [7%]
  • 2-5: 40% [37%]
  • 6-25: 25% [37%]
  • 26 or more: 2% [8%]

Tomorrow: Social Innovation Conversations