Spam Filtering

I have a very visible email address that must be in every possible spam database, and I receive literally thousands of spam messages each day. A few people have asked me to document how I deal with that. After years of experimenting, I’ve finally settled on a three-stage solution.

  1. All email for my domain goes to a $19/month web-hosting account. I don’t use the hosting part, just the email. This service does three important things: (a) it ignores email to unknown addressees (we get a lot of that); (b) it supports an explicit whitelist and blacklist; and (c) messages addressed to legitimate addressees are run through SpamAssassin. SpamAssassin isn’t great, and it’s configured fairly conservatively to avoid false positives, but it’s a decent first line of defense. One important configuration option was to disable message bouncing. When 99% of your email is spam, you don’t want to be automatically replying to it. Let the bogus messages fall on the floor.
  2. From the web-hosting account, everything is forwarded to Gmail accounts. The quality of Gmail’s spam filtering has varied greatly, although not so much recently. It used to be highly accurate, but I think that because filtering is such a CPU-intensive activity, Google has had to settle for less-accurate filtering. I actually think they have the ability to throttle the quality of their spam filtering based upon their load and available processing power, but that’s purely speculative on my part. A side benefit of running everything through Gmail is that it provides automatic archiving and searching of email and allows for remote access. When I’m not on the road, I retrieve mail from Gmail using POP3 and the standard OS X Mail program.
  3. The final step is a $30 utility called SpamSieve. It’s one of those Bayesian filter applications, and works very well. After training with a few hundred messages, it is quite accurate, and it’s also quite easy to use.
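
For readers curious what “Bayesian filter” means in step 3, here is a minimal sketch of the general idea in Python. It is not SpamSieve’s actual algorithm, which I haven’t seen. The class name, the tokenizer, and the smoothing constants are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes spam scorer (illustrative only, not SpamSieve)."""

    def __init__(self):
        # Number of training messages per label that contain each word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase words, each counted once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        """Record one hand-classified message; label is 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        """Return an estimate between 0 and 1 that the message is spam."""
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        # Start from the smoothed prior odds, then add evidence per word.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for word in self.tokenize(text):
            # Add-one smoothing so unseen words never give a zero probability.
            p_word_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_word_ham = (self.word_counts["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_word_spam / p_word_ham)
        # Numerically stable conversion from log-odds to probability.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        return math.exp(log_odds) / (1 + math.exp(log_odds))


spam_filter = TinyBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("meeting notes for tomorrow", "ham")
spam_filter.train("lunch tomorrow with the team", "ham")

print(spam_filter.spam_probability("cheap pills now"))
print(spam_filter.spam_probability("meeting tomorrow"))
```

The first message scores well above 0.5 and the second well below it. With only four training messages the numbers mean little, which is why a real filter wants a few hundred before it becomes trustworthy.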

Not related to spam per se, I also use MailSteward Lite to archive old messages in a searchable database. I’ve kept every non-spam email message for the past 11 years or so, and it’s all in there — even those messages that survived the migration from Outlook on Windows to Mac Mail a few years ago, which I did with a marvelous $10 program called O2M. The only reason I use MailSteward Lite is that OS X Mail gets slow when the mailboxes contain many thousands of messages. The big disadvantage is that it’s not externally searchable, most notably not by Spotlight.

Yes, it was a lot of work to get to this point. I haven’t mentioned the many tools I’ve tried and abandoned. But I now have a configuration that works well and is easy to use. I recommend it to anyone who, like me, has a very visible email address and who gets a lot of spam.

Amazon Goes Wholesale

Amazon’s latest change in its pricing for Amazon Web Services (AWS) is an indication of their plans for the future. Specifically, it suggests that they’re focusing on customers who will aggregate or resell AWS services. First is the new tiered pricing for bandwidth, reduced from a flat rate of $0.20/GB downloaded to as little as $0.13/GB over 50TB. For the first time this offers a margin for aggregators. Second is the separate charge for HTTP requests: $0.01 per 10,000 GETs, for example. (10x as much for PUT or LIST requests.) The implication is that the per-request overhead is Amazon’s real issue when the payloads are small. But just breaking out these costs reminds us that AWS isn’t an end-user retail offering. Can you imagine the average web-site owner trying to understand what this is all about? Amazon understands that their market is developers and sophisticated resellers.
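
To see why small payloads matter, here is a rough back-of-the-envelope sketch in Python using only the rates quoted above. The function names are mine, a gigabyte is assumed to be 10^9 bytes, and a single bandwidth rate is applied to all traffic, which ignores the tier boundaries (I’ve only quoted the top tier here).

```python
# Rates quoted in the post.
GET_PRICE_PER_REQUEST = 0.01 / 10_000                  # $0.01 per 10,000 GETs
PUT_LIST_PRICE_PER_REQUEST = 10 * GET_PRICE_PER_REQUEST  # 10x as much

# Assumption: 1 GB = 10**9 bytes for billing purposes.
BYTES_PER_GB = 10**9


def download_cost(num_gets, object_bytes, bandwidth_per_gb):
    """Return (request_cost, bandwidth_cost) in dollars for num_gets downloads."""
    request_cost = num_gets * GET_PRICE_PER_REQUEST
    bandwidth_cost = num_gets * object_bytes / BYTES_PER_GB * bandwidth_per_gb
    return request_cost, bandwidth_cost


def break_even_bytes(bandwidth_per_gb):
    """Object size at which the GET charge equals the bandwidth charge."""
    return GET_PRICE_PER_REQUEST * BYTES_PER_GB / bandwidth_per_gb


for rate in (0.20, 0.13):
    print(f"${rate:.2f}/GB: request fee dominates below "
          f"{break_even_bytes(rate):,.0f} bytes per object")

# One million downloads of a 1,000-byte object at the old flat rate.
requests, bandwidth = download_cost(1_000_000, 1_000, 0.20)
print(f"requests ${requests:.2f} vs bandwidth ${bandwidth:.2f}")
```

Under these assumptions the break-even object size is only a few kilobytes, so for anything smaller the request fee outweighs the bandwidth fee.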

Writers and Audio Engineers Wanted

Like (our grassroots project), the “curated” side of The Conversations Network is also run by volunteer producers. From time to time we invite new audio engineers and writer/editors to join the team, and this is one of those times.

This is a great way to get more involved in the non-profit world of podcasting, and because we use the same platform, you’ll also have a chance to do some paid work for our for-profit sister company, GigaVox Media, producers of IT Conversations and the Podcast Academy channel.

Wants You is off to a great start with 143 stringers registered so far. But we need everyone’s help to achieve our goal of 1,000. Please help us spread the word in your blog, podcast, videocast, bar conversations, etc.

We’ve got a cool map showing where all our stringers are located. We’re particularly in need of more video folks.