Taxonomic Challenges

Over on SpokenWord.org we started with a set of “source” categories such as Conference, Interview, Lecture, Sermon and so on. These categories turned out to be rather useless since very few visitors really cared whether a recording was from a conference or a lecture, for example. What they cared about was whether it was about chemistry or China, which this taxonomy didn’t address.

Next we decided to go with a free-form tagging folksonomy as do many other content sites. For better or worse, we have a semi-automated source of tags: the <keyword> elements of the RSS feeds that supply most of our new programs. Tagging has worked quite well as a search mechanism: a way to actively find content. You can now search for chemistry or China and get reasonable results.
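Here's a rough sketch of what that semi-automated tagging could look like. It assumes each feed item carries plain <keyword> elements, as described above, and that a keyword may hold either one tag or a comma-separated list. Real feeds may use a namespaced element instead, and the function name tags_from_feed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def tags_from_feed(rss_text):
    """Map each item's title to a sorted list of normalized tags."""
    root = ET.fromstring(rss_text)
    result = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        tags = set()
        # Assumption: tags live in plain <keyword> elements inside the item
        for kw in item.iter("keyword"):
            # A keyword element may hold one tag or a comma-separated list
            for part in (kw.text or "").split(","):
                tag = part.strip().lower()
                if tag:
                    tags.add(tag)
        result[title] = sorted(tags)
    return result
```

Lowercasing and de-duplicating the tags is a design choice in this sketch so that a search for chemistry matches "Chemistry" from one feed and "chemistry" from another.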

But we also want to present content in a more traditional manner. We want to proactively feature programs (particularly on the home page) in ways that will encourage first-time visitors to listen and view. So we’re thinking of re-instituting a taxonomy of categories in addition to our tags. Now comes the challenge of defining the categories. Here’s the taxonomy we have so far. We want to keep the count to no more than fifteen, so we need to combine where possible, but we want to make sure any spoken-word content fits into at least one category appropriately.

  • business and finance
  • science and technology
  • health and medicine
  • education
  • arts, entertainment, media and literature
  • energy/environment
  • food and drink
  • religion
  • government and politics (current affairs?)
  • sports, recreation & hobbies
  • travel/history
  • comedy (humor)

Anything missing? Remember, these are topical categories, not sources, media, etc.

Update: Here’s another option. We could simply adopt the categories used by iTunes for podcasts. It’s not perfect, but it has the advantage that all of our collections and feeds would be guaranteed compatible with iTunes’ taxonomy. Here’s the list from Apple:

  • Arts
    • Design
    • Fashion & Beauty
    • Food
    • Literature
    • Performing Arts
    • Visual Arts
  • Business
    • Business News
    • Careers
    • Investing
    • Management & Marketing
    • Shopping
  • Comedy
  • Education
    • Education Technology
    • Higher Education
    • K-12
    • Language Courses
    • Training
  • Games & Hobbies
    • Automotive
    • Aviation
    • Hobbies
    • Other Games
    • Video Games
  • Government & Organizations
    • Local
    • National
    • Non-Profit
    • Regional
  • Health
    • Alternative Health
    • Fitness & Nutrition
    • Self-Help
    • Sexuality
  • Kids & Family
  • Music
  • News & Politics
  • Religion & Spirituality
    • Buddhism
    • Christianity
    • Hinduism
    • Islam
    • Judaism
    • Other
    • Spirituality
  • Science & Medicine
    • Medicine
    • Natural Sciences
    • Social Sciences
  • Society & Culture
    • History
    • Personal Journals
    • Philosophy
    • Places & Travel
  • Sports & Recreation
    • Amateur
    • College & High School
    • Outdoor
    • Professional
  • Technology
    • Gadgets
    • Tech News
    • Podcasting
    • Software How-To
  • TV & Film

A Liberal Against a Detroit Bailout

I find myself siding with the Republicans on this one. Sorta weird. Tom Friedman has it right. I can’t see a good reason why we should put taxpayers’ dollars into a dying industry. GM, Ford and Chrysler’s management have done a miserable job, and unless they go through a serious shakeup such as a Chapter 11 bankruptcy, they shouldn’t continue to exist. The writing is on the wall for them. The Emperor has no clothes. As for investors, lenders (I am one, through funds) and management, they deserve to suffer the consequences of how these companies have been run. The only constituents who may be entitled to taxpayer assistance are the autoworkers and employees (not executives) of the small suppliers.

This brings up the union, healthcare and pension issues. Messy, to say the least. Personally, I’ve had a love/hate relationship with unions. I believe in the basic concepts of collective bargaining and I recognize that without the ability of employees to organize, employers will exploit them unfairly. But while the major U.S. unions have done an admirable job of growing benefits for their members, there now exist inequities in the benefits and pensions between union and non-union workers in this country. True, the UAW has accepted some concessions in recent years, but the fact is that GM and others are under a tremendous burden in supporting their former employees. This, by the way, is what the Republicans are thinking but not saying. By withholding from Detroit another $25 billion, they’re fostering union-busting through the bankruptcy process.

Although I’m not anti-union, this could ultimately be a good thing. Rather than spending billions on propping up the corporations, I’d like to see Obama and the Congress take this as an opportunity to start providing universal healthcare for all (not just out-of-work auto workers) and beefing up the Social Security System. I think we’re the only country in the world that ties healthcare to employment, which is nuts. And we’ve all seen what will happen if we continue down the Republican path of increased privatization of retirement benefits.

Let the Big Three go into bankruptcy. That’s what it’s for. There’s a process that has been tweaked for decades, as opposed to the Paulson/Bernanke methodology of writing checks without adequate conditions and then seeing what works and what doesn’t. Let the old and broken institutions crumble. Only then can we get to the bottom and build a more honest and sustainable world. Avoiding the inevitable never works, by definition.

Amazon CloudFront

For the past three months we’ve been beta-testing a new Amazon web service now named CloudFront. The best way to think of CloudFront is a high-performance front end for Amazon’s S3, based upon edge servers located closer to your web site’s visitors.

I’ve been favorably impressed with the new service. To try it out, I went for the low-hanging fruit by simply changing delivery of our CSS and JavaScript files to CloudFront. Performance-wise, these are our most-critical files because browsers run single-threaded while fetching and processing CSS/JS files. After the change, the download speeds of these files fluctuated between 3x and 4x faster than when delivered from our dedicated servers at The Planet in Texas. The key, in looking at the network histograms, is the all-important ‘first-byte delivery time.’ Net improvement: ~750 milliseconds for the load of any of our pages, based on measurements here, 12 miles north of San Francisco. The entire change took only about 15 minutes of effort, including creating a new S3 bucket, copying the files, modifying our code — all the changes were in one file — and establishing a new CNAME, which is optional.

Amazon calls CloudFront a “web service for content delivery,” which isn’t quite the same thing as a content-delivery network (CDN). The difference (for us) is that CloudFront doesn’t (yet?) operate as a pure cache, running off our “origin server” in the same way as we deliver our media files via Limelight Networks, a true CDN. In the case of Limelight, we just maintain the files on our own server, set up a CNAME that refers to Limelight’s edge servers and that’s it. When we add or modify a file on the origin server, that’s all we have to do. Limelight instantly (and I mean that literally) begins to deliver the new version worldwide. We don’t have to do anything, manual or otherwise, to keep the CDN copies of our files fresh. In the case of CloudFront, you still have to take certain actions (which could be automated, of course) to get new and updated assets from your primary servers pushed to their edge servers.
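As a rough idea of what that automation might look like, here's a minimal sketch that works out which local asset files have changed since the last push and hands only those to an uploader. Everything here is an assumption for illustration: the manifest dictionary, the function names, and especially the upload callable, which stands in for whatever S3 client you use. It isn't our actual deployment script.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_assets(asset_dir, manifest):
    """Return sorted relative paths whose contents differ from the manifest.

    manifest maps relative path -> digest recorded at the last push.
    """
    root = Path(asset_dir)
    changed = []
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            if manifest.get(rel) != file_digest(path):
                changed.append(rel)
    return sorted(changed)


def push_assets(asset_dir, manifest, upload):
    """Call upload(rel_path, local_path) for each changed file, then record it."""
    root = Path(asset_dir)
    pushed = changed_assets(asset_dir, manifest)
    for rel in pushed:
        # upload is a hypothetical callable supplied by the caller,
        # e.g. a thin wrapper around an S3 client
        upload(rel, root / rel)
        manifest[rel] = file_digest(root / rel)
    return pushed
```

Comparing content digests rather than timestamps means a file that was merely touched, but not changed, isn't pushed again.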

But while CloudFront may not be a pure CDN at this time, it’s extraordinarily cost-effective. It’s a no-brainer way to speed up almost any web site. For those assets like CSS, JavaScript files, frequently used images, icons, etc., the performance is as good as any CDN I’ve used but at a fraction of the cost. Pricing has two components. For assets served from U.S. edge locations:

  1. $0.170/GB data transfer out
  2. $0.010 per 1,000 GET requests

Charges are lower as volume increases, but higher for delivery from their European and Asian edge locations.
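To get a feel for what those two components add up to, here's a back-of-the-envelope estimator. It uses only the U.S. base rates listed above and deliberately ignores the volume discounts, the higher European and Asian rates, and any S3 storage charges. The traffic numbers in the example are hypothetical, not our actual usage.

```python
US_TRANSFER_PER_GB = 0.170   # dollars per GB transferred out (U.S. edge locations)
US_PER_1000_GETS = 0.010     # dollars per 1,000 GET requests (U.S. edge locations)


def estimate_monthly_cost(gb_out, get_requests):
    """Estimate a monthly bill in dollars at the U.S. base rates only."""
    transfer = gb_out * US_TRANSFER_PER_GB
    requests = get_requests / 1000 * US_PER_1000_GETS
    return round(transfer + requests, 2)


# Hypothetical month: 50 GB out and 2 million GET requests
print(estimate_monthly_cost(50, 2_000_000))  # prints 28.5
```

Note that for small files like CSS and icons, the per-request charge can outweigh the data-transfer charge, as it does in this made-up example.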

(Aside: One thing I love about all of the AWS services is that by publishing their prices so clearly, they set a very public bar against which all other providers are instantly measured. This happened with S3, and it’s going to happen with CloudFront. Pricing of storage, hosting, servers and now content delivery was previously mysterious and highly negotiable — like by an order of magnitude. AWS has brought transparency to the world of web-service pricing.)

Consider, too, that CloudFront is a completely self-service offering with no minimums, setup costs or hassles once you’re into the whole AWS world. As for reliability, we never had a single failure or outage that I’m aware of during the entire three-month test period.

Highest-Rated Programs

Over the weekend I added a Highest Rated tab to the SpokenWord.org homepage. This is something I’ve done before on sites like IT Conversations, and it’s always a challenge. On one hand you want the feature to honestly display the highest-rated programs, but on the other hand you don’t want the list to get stale. You want to avoid the situation in which the most-popular items become increasingly popular and lock themselves into the top slots.

Working with my personal on-call mathematician, Bruce Sharpe, I’ve implemented an algorithm that is at least a good first cut. There are a number of tweakable parameters that have yet to be tweaked. The concept is to discount ratings by two factors: (1) discount each individual rating by the age of that rating; and (2) discount the adjusted average rating by the inverse of the number of ratings the object has received. Highest Rated is therefore influenced by (but not the same as) a popularity index.

At Bruce’s urging, I’m using the tanh() (hyperbolic tangent) function to determine the curves for both discounting formulas. In about 34 years of writing code I can honestly say that’s a first for me. I once wrote an entire floating-point runtime library in assembler language — yeah, that’s a challenge! — but I’ve never had much need for those trig functions myself.
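For the curious, here's a minimal sketch of how two tanh()-based discounts of this kind could be combined. To be clear, this is not the code running on the site: the exact formulas, the two scale constants and all of the function names are assumptions made up for illustration. In this version each rating's value fades with its age, and the resulting average is then scaled down when there are only a few ratings.

```python
import math

# Hypothetical tuning parameters (the real ones have yet to be tweaked)
AGE_SCALE_DAYS = 30.0   # ratings fade over roughly this many days
COUNT_SCALE = 5.0       # number of ratings needed for near-full confidence


def age_weight(age_days):
    """Weight of one rating: 1.0 when brand new, approaching 0.0 as it ages."""
    return 1.0 - math.tanh(age_days / AGE_SCALE_DAYS)


def count_confidence(num_ratings):
    """Confidence in the average: 0.0 with no ratings, approaching 1.0 with many."""
    return math.tanh(num_ratings / COUNT_SCALE)


def score(ratings):
    """ratings is a list of (stars, age_days) pairs; returns a sortable score."""
    if not ratings:
        return 0.0
    # Discount (1): each rating contributes less the older it is
    adjusted_avg = sum(stars * age_weight(age) for stars, age in ratings) / len(ratings)
    # Discount (2): an average built on few ratings is trusted less
    return adjusted_avg * count_confidence(len(ratings))
```

With these made-up parameters, ten fresh five-star ratings outrank a single fresh five-star rating, and a program whose ratings are all a year old sinks toward zero no matter how good they were. That second effect is what keeps the list from going stale.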

The Highest Rated tab on the homepage currently shows too many programs from IT Conversations because of the recovery from a recent database coding error (mine), but over the next few days, as the ratings age, the fairness of the algorithm should kick in, yielding more valuable data.

What About the Rental Option?

All the plans we’ve heard to date for solving the housing portion of the financial crisis are focused on keeping people in their homes by reducing the costs of mortgages until even the unemployed can afford one. That’s the kind of populist thinking that got us into this mess in the first place.

Let’s be honest: Not everyone should be a homeowner. Regardless of whom you want to blame for how we got here, some of us are facing mortgage payments we’ll never be able to make even under renegotiated terms and reduced interest rates. Even in what conservatives call an Ownership Society, those without the cash flow necessary to build equity are better off as tenants than burdened with the debt of ownership.

Instead of the government purchasing bad loans, as Senator McCain once suggested, or buying up the loan derivatives, as the Paulson plan originally intended, or just handing money to financial institutions for them to use for “whatever,” let’s create a program similar to the Depression-era Reconstruction Finance Corporation (RFC) to federally fund state and local governments to acquire the underlying properties of defaulted loans at a steep discount and then turn around and rent those homes to the current occupants. Besides, it looks as though it’s going to be impossible to refinance any of the securitized loans. They’ve been bundled, chopped into tranches, then bundled again and there’s no way to figure out who holds which mortgages. (I’ll bet that’s something that won’t be permitted once the dust settles from all of this.) Regardless of who they are, the mortgage holders (lenders, insurers and hedge funds) will feel some of the pain for their indiscretions, but the program will stabilize and put a stop to their losses, allowing the credit markets to finally move forward.

People who stand to lose their homes and who would otherwise be out on the street become renters, which of course they should have been all along. This eliminates the problem of those homeowners who continue to pay their mortgages feeling like their neighbors are receiving an unfair bailout. And setting a value on the real estate is far simpler than trying to find the fair price to pay for credit-default swaps on securitized loans. The federal government would set standards for the program and provide oversight.

With an average pre-slump U.S. home price of $215,000 and a 25% discount, $700 billion allows us to acquire nearly four million homes, even including a 10% cost to administer the program. Why fund state and local governments? Because the closer you get to the properties, the better a landlord you can be. Yes, as landlords the cities and states will have to manage and maintain these homes, but that’s much easier at the local level than from Washington. We can learn a lot from both the strengths and weaknesses of the RFC, created in 1932 and rolled into Treasury in 1953.
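Here's the arithmetic behind that "nearly four million" figure, as a quick sketch. The one assumption is how the 10% administrative cost is applied: this version takes it off the top of the $700 billion before buying any homes. The function name is just for illustration.

```python
BUDGET = 700e9          # dollars available
AVG_PRICE = 215_000     # average pre-slump U.S. home price
DISCOUNT = 0.25         # homes acquired at 25% below that price
ADMIN_SHARE = 0.10      # assumption: 10% of the budget goes to administration


def homes_acquirable(budget=BUDGET, avg_price=AVG_PRICE,
                     discount=DISCOUNT, admin_share=ADMIN_SHARE):
    """Return how many whole homes the budget covers after administration."""
    purchase_price = avg_price * (1 - discount)   # $161,250 per home
    available = budget * (1 - admin_share)        # $630 billion for purchases
    return int(available // purchase_price)


print(homes_acquirable())  # prints 3906976
```

If you instead treat the 10% as a surcharge on each purchase, the count comes out slightly higher but still just under four million.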

To make sure our governments don’t stay in the real-estate business for the long term, after a two-year cooling-off period, the homes would first be offered for sale to the then-current occupants, then auctioned randomly over a five-year period to avoid further depressing the market with a sudden glut of even more homes for sale.

We won’t solve the housing crisis so long as we pretend that families who can barely make ends meet can afford the increased burden of building equity. Those “affordable” but unrealistic zero-down principal-only loans are what got us into this situation. So long as we pretend that we can make home ownership inexpensive enough for everyone, we’ll never dig our way out of this hole. Allowing people to rent the homes they currently occupy not only keeps roofs over their heads, it’s also simple (as compared to other options) and solves our housing-crisis problems directly. It removes the bad-loan problem from the books of financial institutions without rewarding them for their misconduct.

(Seven weeks ago I blogged a draft of this idea, which I followed up with an op-ed submission to the NY Times. Of course, lacking a Nobel Prize, I wasn’t likely to be successful, but I had to try, right? The above is an updated version of the article I submitted to The Times.)

The First Bit of Magic

I think I just added the first piece of magic to SpokenWord.org. If you click on the Recently Collected tab, you’ll see a list of the programs most recently added by members to their collections. But click on the numbers under the images and you’ll see *which* collections those are. Why is this magical? Because that’s the way you’ll find “more like this”: other programs explicitly collected by other members. There’s a lot more of this to come, but this is the first step.
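The idea can be sketched in a few lines. This is only an illustration, not the site's code: it assumes collections are available as a simple mapping from a collection name to the set of program IDs it contains, and both function names are invented here.

```python
from collections import Counter


def collections_containing(program, collections):
    """Return the names of the collections that include the given program."""
    return sorted(name for name, programs in collections.items()
                  if program in programs)


def more_like_this(program, collections, limit=5):
    """Rank other programs by how many collections they share with this one."""
    counts = Counter()
    for programs in collections.values():
        if program in programs:
            counts.update(p for p in programs if p != program)
    # Most shared collections first; ties broken by ID for stable output
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked][:limit]
```

The first function corresponds to clicking the number under an image. The second is one possible next step: programs that keep showing up in the same collections are probably about the same things.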

Homepage Experiment

Spending an hour on a Skype call with Bruce Sharpe last night gave me some good ideas about the SpokenWord.org homepage. We looked at a few sites together, and with Bruce’s encouragement I decided that SoundFlavor has a lot of good ideas on their homepage. So in the past 24 hours I’ve completely re-done the SpokenWord.org homepage, lifting many ideas from SoundFlavor and other sites. Yeah, the colors are still awful and I can’t design (or use Photoshop) worth a damn, but I think it’s a whole lot better for first-time visitors without sacrificing usability for our experts.