Spam Filtering

I have a very visible email address that must be in every possible spam database, and I receive literally thousands of spam messages each day. A few people have asked me to document how I deal with that. After years of experimenting, I’ve finally settled on a three-stage solution.

  1. All email for rds.com goes to a $19/month web-hosting account. I don’t use the hosting part, just the email. This service does three important things: (a) it ignores email to unknown addressees such as somebody@rds.com (we get a lot of that); (b) it supports an explicit whitelist and blacklist; and (c) messages addressed to legitimate addressees are run through SpamAssassin. SpamAssassin isn’t great, and it’s configured fairly non-aggressively to avoid false positives, but it’s a decent first line of defense. One important configuration option was to disable message bouncing. When 99% of your email is spam, you don’t want to be automatically replying to it. Let the bogus messages fall on the floor.
  2. From the web-hosting account, everything is forwarded to Gmail accounts. The quality of Gmail’s spam filtering has varied greatly, although not so much recently. It used to be highly accurate, but I think that because filtering is such a CPU-intensive activity, Google has had to settle for less-accurate filtering. I suspect they can throttle the quality of their spam filtering based upon their load and available processing power, but that’s purely speculative on my part. A side benefit of running everything through Gmail is that it provides automatic archiving and searching of email and allows for remote access. When I’m not on the road, I retrieve mail from Gmail using POP3 and the standard OS X Mail program.
  3. The final step is a $30 utility called SpamSieve. It’s one of those Bayesian filter applications, and it works very well. After training on a few hundred messages, it is quite accurate, and it’s also easy to use.
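
For readers curious what “Bayesian filter” means in step 3, here is a minimal sketch of the general idea in Python. It is not SpamSieve’s actual algorithm, which I haven’t seen. The class name `TinyBayesFilter`, the tokenizer, and the add-one smoothing are all assumptions made for illustration. The filter counts how often each word appears in messages you’ve marked as spam or not spam, then uses those counts to estimate how likely a new message is to be spam.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class TinyBayesFilter:
    """Toy naive Bayes classifier (hypothetical; not SpamSieve's code)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record one message the user has marked as 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that the message is spam (0.0 to 1.0)."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        tokens = tokenize(text)
        log_scores = {}
        for label in self.LABELS:
            # Prior: the share of training messages with this label (smoothed).
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            log_scores[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = max(-700.0, min(700.0, log_scores["ham"] - log_scores["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = TinyBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("lunch meeting tomorrow", "ham")
print(spam_filter.spam_probability("buy cheap pills"))
```

Each correction you make is simply another call to `train`, which is why reviewing the filter’s mistakes makes it more accurate over time. An aggressive setting corresponds to treating a message as spam at a lower probability threshold.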

Not related to spam per se, I also use MailSteward Lite to archive old messages in a searchable database. I’ve kept every non-spam email message for the past 11 years or so, and it’s all in there — even those messages that survived the migration from Outlook on Windows to Mac Mail a few years ago, which I did with a marvelous $10 program called O2M. The only reason I use MailSteward Lite is that OS X Mail gets slow when the mailboxes contain many thousands of messages. The big disadvantage is that it’s not externally searchable, most notably not by Spotlight.

Yes, it was a lot of work to get to this point. I haven’t mentioned the many tools I’ve tried and abandoned. But I now have a configuration that works well and is easy to use. I recommend it to anyone who, like me, has a very visible email address and who gets a lot of spam.

2 thoughts on “Spam Filtering”

  1. I can understand that one must do what one needs to do, but with so many levels of defense where (or how) do you look for false positives? Or maybe you don’t even bother since your inbound volume is so high?

  2. Frank, SpamAssassin is configured rather weakly, so I don’t check there. About once a month I have to fish something out of the Gmail Spam folder. Most of the false positives are from SpamSieve, but that’s just due to my aggressive settings. I still (quickly) review SpamSieve’s positives and use them to further train the algorithm.
