SpamAssassin is a frequently used companion for Exim. However, most people set it up in a synchronous manner: the message is scanned for spam directly during the SMTP session. While this is certainly a valid technique, it has its drawbacks. It leaves the server vulnerable to DoS attacks, because spam filtering is a big resource hog. SpamAssassin headers already present in mail coming from remote servers are also an issue, because the $h_X-Spam-* variables will suddenly start misbehaving.
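To illustrate the header problem outside of Exim, here is a minimal Python sketch, not the Exim configuration itself. It assumes the fix is simply to drop any X-Spam-* headers a remote server has already added before your own scanner writes its results; the function name `strip_spam_headers` and the sample message are made up for the example.

```python
from email import message_from_string


def strip_spam_headers(msg):
    """Remove every X-Spam-* header from msg and return the removed names."""
    # Header names are matched case-insensitively, as mail headers are.
    removed = [name for name in msg.keys() if name.lower().startswith("x-spam-")]
    for name in set(removed):
        del msg[name]  # deletes all occurrences of this header name
    return removed


# Hypothetical message that arrives with a forged, suspiciously good score.
raw = (
    "From: sender@example.org\n"
    "X-Spam-Status: No, score=-1.0\n"
    "X-Spam-Score: -10\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)
removed = strip_spam_headers(msg)
print(removed)
```

After this step only headers written by your own filter can end up in the message, so anything that later reads X-Spam-* values is not fooled by what the sender put there.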
For the purpose of this article I am going to assume you are fairly familiar with writing your own Exim configuration and you are also able to set up your SpamAssassin configuration. If you lack either of these abilities, please read up on both topics first.
After filtering spam with Exim, I wanted to add SpamAssassin to do content-based filtering. While testing the spam filtering, I ran into a bit of an issue: the same rule was hit in every single e-mail, RDNS_NONE with a score of 1.3.
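A quick way to spot a rule that fires on every message is to pull the rule names out of the X-Spam-Status header that SpamAssassin adds. The following is a rough Python sketch under the assumption that the header has the usual `score=... tests=...` layout; the helper name `parse_spam_status` and the sample header value are invented for illustration.

```python
import re


def parse_spam_status(value):
    """Return (score, [rule names]) from an X-Spam-Status header value."""
    # Long rule lists are folded across lines; join them back together.
    value = re.sub(r",\s+", ",", value)
    score = re.search(r"score=(-?\d+(?:\.\d+)?)", value)
    tests = re.search(r"tests=(\S*)", value)
    names = []
    if tests:
        # "tests=none" means no rule was hit.
        names = [t for t in tests.group(1).split(",") if t and t != "none"]
    return (float(score.group(1)) if score else None), names


# Made-up sample value, modelled on the issue described above.
sample = "No, score=1.3 required=5.0 tests=RDNS_NONE autolearn=no"
score, tests = parse_spam_status(sample)
print(score, tests)
```

Running this over a batch of test messages and counting the rule names makes a rule like RDNS_NONE stand out immediately if it appears in all of them.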
A quick Google search turns up some less-than-useful documentation pages and a lot of people with the same problem, yet no solution. So let’s go hunting…
Defense against spam has always been a hassle. Statistical filters only get you so far and they consume a LOT of resources. For exactly that reason I like to employ basic checking policies before accepting e-mail at all. These policies have gotten me pretty far and my false positive rate is pretty low.
During my time as a sysop and later as CTO, I had quite a few e-mail servers under me: more than 50, in fact. These servers were not standalone machines; they passed e-mails on to each other. We designed the system to avoid bottlenecks and to be easily extendable. It fully met expectations in this respect: it was very easy to plug additional nodes into the system.
There was, however, another aspect where the system was not so great. As a matter of fact, I haven’t seen any e-mail system that was great at this. What I mean is tracking e-mails and debugging problems. Having to SSH into even a fraction of this many servers, or reading logs from this many servers, is a really, really painful way to do it.