LibriVox.org

About an hour from now, the SpokenWord.org servers will have ingested 37,897 new programs from 2,011 RSS feeds. They’re all from LibriVox.org, an awesome and fast-growing collection of volunteer-read public-domain books. More than 2,000 of them! Special thanks to Hugh McGuire and Chris Goringe for their help with this wholesale addition to our database and for creating and operating LibriVox.org.

Importing LibriVox.org is a great pre-beta test of the scalability of many components of SpokenWord.org. One MySQL oddity has already been triggered by this process. I use MySQL’s fulltext search to find programs by iTunes category, and all of the LibriVox.org programs are in the category Arts:Literature. As of fifteen minutes ago (the ingestion is still running), the LibriVox.org programs exceeded 50% of all programs in our database. The quirk is that MySQL’s natural-language fulltext search won’t match a word that appears in more than half of the rows in the table; it treats such a word as too common to be useful. That’s why the top-level category Arts currently shows (0) programs. In fact, there are nearly 40,000. I’ve got a kludgy workaround in mind that I may implement until such time as LibriVox.org once again accounts for less than half of the programs in the system. That’s going to take a while, since new chapters are being added to LibriVox.org every day.
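To make the quirk concrete, here is a toy Python model of the threshold. It is not MySQL and not the SpokenWord.org code: the function names, the sample categories other than Arts:Literature, and the exact cutoff (this sketch assumes "50% or more") are all assumptions for illustration. The second function sketches one possible workaround, counting by the parsed top-level category instead of by fulltext match; it is not necessarily the workaround I have in mind above.

```python
def fulltext_search(rows, word, threshold=0.5):
    """Toy model of a natural-language fulltext search with a 50% threshold.

    Returns the indexes of rows containing `word`, or an empty list when the
    word appears in `threshold` or more of the rows (assumed cutoff).
    """
    word = word.lower()
    matches = [
        i for i, text in enumerate(rows)
        # Split "Arts:Literature" into the words "arts" and "literature".
        if word in text.lower().replace(":", " ").split()
    ]
    if rows and len(matches) / len(rows) >= threshold:
        return []  # the word is "too common", so nothing matches
    return matches


def count_by_top_level(rows, category):
    """Hypothetical workaround: compare the parsed top-level category directly."""
    return sum(1 for text in rows if text.split(":")[0] == category)


# Three of five programs are Arts:Literature, i.e. 60% of the table.
programs = ["Arts:Literature"] * 3 + ["Technology:Podcasting", "News:Politics"]

arts_hits = fulltext_search(programs, "Arts")        # empty: over the threshold
tech_hits = fulltext_search(programs, "Technology")  # finds the one matching row
arts_count = count_by_top_level(programs, "Arts")    # counts all three rows
```

In this model the Arts search comes back empty even though most of the rows match, which is exactly the (0) the category page shows, while the direct comparison still counts every row.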
