Bad RSS

The greatest challenge in keeping SpokenWord.org running on a daily basis is dealing with rogue RSS feeds. We’ve got a bit over 3,000 feeds at the moment, most of which are being scanned every hour. But I just checked the admin report, and 27 feeds (nearly 1%) have been disabled for one reason or another.

For those of you in control of your feeds, here are some of the problems we encounter on a regular basis.

  • HTTP 404 errors. If your feed's URL returns 404 (Not Found), or your server isn't accessible at all, we can't read your feed.
  • Invalid characters. One bad character in your feed keeps our parser from reading the whole thing.
  • Missing GUIDs. Globally Unique IDs (GUIDs) are very important.
  • Duplicate GUIDs. (They’re supposed to be Unique!)
  • Incorrect MIME types. Should be:
    • application/rss+xml
    • application/atom+xml
    • application/xml
  • Common but wrong MIME types:
    • text/xml
    • text/plain
    • text/html
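
The first few problems can be caught mechanically on every fetch. Here is a minimal sketch, assuming the fetcher hands us the HTTP status, the Content-Type header and the raw body. The function name check_feed and the two MIME-type sets are hypothetical, built from the lists above; this is not SpokenWord.org's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sets built from the lists above; a real checker might differ.
ACCEPTED_TYPES = {"application/rss+xml", "application/atom+xml", "application/xml"}
COMMON_BUT_WRONG = {"text/xml", "text/plain", "text/html"}


def check_feed(status: int, content_type: str, body: bytes) -> list[str]:
    """Return the problems found in one fetch of a feed; an empty list means none."""
    if status >= 400:
        # No usable body (e.g. HTTP 404), so there is nothing else to check.
        return [f"HTTP {status}"]
    problems = []
    # Ignore parameters such as "; charset=utf-8" when comparing MIME types.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in COMMON_BUT_WRONG:
        problems.append(f"wrong MIME type: {mime}")
    elif mime not in ACCEPTED_TYPES:
        problems.append(f"unexpected MIME type: {mime or '(none)'}")
    try:
        # One illegal character (e.g. a stray control byte) makes the whole
        # document unparseable, so the parser rejects the entire feed.
        ET.fromstring(body)
    except ET.ParseError as err:
        problems.append(f"XML parse error: {err}")
    return problems
```

For example, a feed served as text/xml whose body contains a stray \x01 byte would come back with two problems: the wrong MIME type and an XML parse error.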

The GUID issues deserve more discussion. When you rescan feeds every hour, one of the biggest challenges is figuring out whether an <item> is old, new or modified. Here’s our logic:

  • If we’ve never seen this GUID before, we assume it’s a new <item>.
  • If we’ve already ingested an <item> with this GUID, we check all the pertinent elements and attributes for changes.

The GUID allows you to make changes like correcting a spelling error in a title. We see the unchanged GUID, notice that the title has changed, and just replace the title. Without the GUID, we have a helluva time trying to figure out whether an <item> with a one-character change in its title is just that or a whole new program. We want you to be able to correct your titles, descriptions and media URLs without our system creating a duplicate program. Only your proper use of GUIDs makes that possible.
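
The new/modified/unchanged logic fits in a few lines once items are keyed by GUID. This is a minimal sketch under stated assumptions: an in-memory dict stands in for the database, and the three fields in PERTINENT (title, description, media URL, the ones mentioned above) stand in for "all the pertinent elements and attributes." The names Item, PERTINENT and ingest are hypothetical.

```python
from dataclasses import dataclass

# Assumption: these three fields stand in for "all the pertinent elements
# and attributes"; a real feed item has many more.
PERTINENT = ("title", "description", "media_url")


@dataclass
class Item:
    guid: str
    title: str
    description: str
    media_url: str


def ingest(item: Item, store: dict[str, Item]) -> str:
    """Classify an item from a rescan and update the store, which is keyed by GUID."""
    old = store.get(item.guid)
    if old is None:
        # Never seen this GUID before: treat it as a new program.
        store[item.guid] = item
        return "new"
    changed = [name for name in PERTINENT if getattr(old, name) != getattr(item, name)]
    if changed:
        # Same GUID, different content: update in place, no duplicate program.
        store[item.guid] = item
        return "modified: " + ", ".join(changed)
    return "unchanged"


store: dict[str, Item] = {}
guid = "http://example.com/programs/42"
print(ingest(Item(guid, "Teh Future of Radio", "A talk.", "http://example.com/42.mp3"), store))  # new
print(ingest(Item(guid, "The Future of Radio", "A talk.", "http://example.com/42.mp3"), store))  # modified: title
```

Because the second item carries the same GUID, fixing the typo in its title updates the existing program instead of creating a second one.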

Once you assign a GUID to a program, never change it. That means never. And make sure your GUIDs are truly globally unique. Using a unique URL from your site as a GUID is a good way to do this. No other site is likely to include http://yourdomain in their GUIDs. And never, never, never reuse a GUID for another program. You’d be amazed at the number of feeds that include the same GUID for more than one <item>. I’ve designed our system to immediately disable any feed in which a duplicate GUID is detected.
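
Detecting missing and duplicate GUIDs in a fetched feed is straightforward. Here is a minimal sketch, assuming an RSS 2.0 feed (channel/item/guid); Atom's entry/id is not handled. The function names and the should_disable policy are hypothetical illustrations of the rule described above, not our production code. The sample GUIDs follow the advice to use URLs from your own site.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def guid_problems(rss_xml: bytes) -> tuple[list[int], list[str]]:
    """Return (indexes of <item>s with no GUID, GUIDs used more than once).

    Assumes RSS 2.0 (<channel><item><guid>); Atom's <entry><id> is not handled.
    """
    root = ET.fromstring(rss_xml)
    missing: list[int] = []
    guids: list[str] = []
    for index, item in enumerate(root.findall("./channel/item")):
        # findtext returns None if <guid> is absent and "" if it is empty.
        guid = (item.findtext("guid") or "").strip()
        if guid:
            guids.append(guid)
        else:
            missing.append(index)
    duplicates = sorted(g for g, count in Counter(guids).items() if count > 1)
    return missing, duplicates


def should_disable(rss_xml: bytes) -> bool:
    """Hypothetical policy mirroring the rule above: any duplicate GUID disables the feed."""
    _, duplicates = guid_problems(rss_xml)
    return bool(duplicates)
```

A feed whose first two items both use http://example.com/p/1 as their GUID would be flagged by should_disable, while an item with no GUID at all is reported separately so it can be handled without disabling the whole feed.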

As a somewhat defensive move, but also to help those who submit RSS/Atom feeds to SpokenWord.org, I’ve added code that runs submitted feeds through the W3C RSS Validator. We’ll accept Warnings, but if your feed generates Errors from the validator, we will reject it. My next step is to call the W3C validator whenever we encounter a problem with an existing feed, and to disable, after the fact, any feed that doesn’t validate.

Fresh Hot Radio Replaces Muzak

Okay, so Lucas is gonna kill me for the comparison, but is it just coincidence that only three weeks after Muzak filed for bankruptcy, Lucas Gonze launched his new site, Fresh Hot Radio? I think not. Lucas is one of those true web pioneers who’s always a few steps ahead of the rest of us, so I’m not sure I yet understand all the implications of his new site. It’s subversively simple. Open it in a web browser tab or new window and listen to the music. That’s it. Options? Nope. It’s not a stream (all the files are played using Flash from their original locations), but it plays like one, except that you can pause and skip. It’s Lucas’ personally curated music. There’s no social-networking facility. It’s just for you. You can, however, click through to the source of each song to explore further. Think of it as a way to discover new up-tempo music and the artists who create it.

I look forward to the day when all elevators are playing Fresh Hot Radio instead of that Muzak stuff.

Using Kampyle.com

A few weeks ago we started using an online service, Kampyle.com, for all The Conversations Network’s web sites, including SpokenWord.org. Kampyle.com is one of those services like Google Analytics and ShareThis.com: They do one relatively small thing and they do it very well. In the case of Kampyle.com, it’s website or application-software feedback. On SpokenWord.org, for example, you’ll notice the yellow triangle that always floats in the lower-right corner of your browser. Click it, and you get a convenient form for sending us feedback, reporting a bug, etc. From the user’s perspective, it couldn’t be much easier. But the real magic is on our side. For example, here are just some of the metadata we get from Kampyle.com when you report a problem:

For debugging a web site, this is invaluable. It typically saves us at least one complete email exchange with someone reporting a problem. No longer do we have to ask, “What OS and browser are you using?” Given that we’re still at the stage where we have a fair number of JavaScript and CSS problems, this alone has made deploying Kampyle.com worthwhile. In fact, I was initially concerned that adding a floating widget to our pages would itself create CSS nightmares, but my fears have proven unfounded. We’re not getting any complaints about it. And ever since we added the Kampyle.com widget, our website feedback has increased by about 400%. I only wish we’d had it available during our alpha test. Very cool.

Podiobooks

We just added the entire audiobook catalog from Podiobooks.com to SpokenWord.org. That picked up 6,087 chapters from 284 books, with more being added every day. You’ll find one of the most recent Podiobooks on our home page, or you can browse the entire collection. Special thanks to Ray, Evo, Chris and Tee for creating a great site and for making it so easy to pull in their catalog.

Preliminary Survey Results

We’ve only been running our annual survey for a few days, but we’ve already had 389 responses. Some early highlights:

  • 47% use iTunes on OS X or Windows.
  • 63% subscribe to one or more of our RSS feeds.
  • 89% are male.
  • 41% have a Master’s degree or higher. (This has been consistent year after year and still surprises me.)
  • 55% are in the U.S.
  • 38% didn’t realize The Conversations Network was a 501(c)(3) nonprofit.