For years I’ve fought spam with all sorts of techniques. On the server side I tried a few limited tricks, such as setting my Postfix rules to very strict adherence and using RBLs, but I ultimately settled on whitelist filtering in my trusty Eudora client. That meant POPping all that spam over whatever international airport dialup I happened to be on, and cursing it even as it disappeared into the UBC folder for bulk deletion.
And I dreamed of the day when I would switch to IMAP and set up all those cool anti-spam server-side techniques I’d been reading about, primarily SpamAssassin. The problem with spam filtering is that it often catches your friends.
Then I found this great procmail filter that whitelists on the server side and sends confirmation requests to unlisted addresses. I installed procmail on my server, then SpamAssassin, and rewrote the filter below to do just what I wanted:
Walking through the filter:
First there is some setup, most of it from Nic’s original filter. I use Maildir-style mailboxes, so I had to adjust for that, and I wanted server-side filtering into those mailboxes.
Section one sets up the variables, including the locations of the mailboxes and of list files such as “accept-list”, which hold the email addresses that should be sorted into particular mailboxes, whitelisted, or blacklisted.
# Don’t change anything else unless you know why you’re doing it!
I did rather a lot of messing about.
First I load the Sender address, Sender Domain, and Subject of the mail into variables (the first two for checking against lists, the last for logging).
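To show what that step amounts to, here is a minimal Python sketch of pulling the same three values out of a raw message. It is only an illustration, not the procmail code from the filter: the function name `sender_fields` and the sample message are made up, and it assumes the sender is taken from the From header.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_fields(raw_message):
    """Return (address, domain, subject) for a raw RFC 822 message string."""
    msg = message_from_string(raw_message)
    # parseaddr splits 'Display Name <user@host>' into (name, address).
    _name, address = parseaddr(msg.get("From", ""))
    address = address.lower()  # list lookups should not depend on case
    domain = address.rpartition("@")[2] if "@" in address else ""
    subject = msg.get("Subject", "")  # only kept for logging
    return address, domain, subject


raw = "From: Nic Example <Nic@Example.org>\nSubject: Hello there\n\nbody\n"
print(sender_fields(raw))  # ('nic@example.org', 'example.org', 'Hello there')
```

Lowercasing the address up front keeps the later list checks simple, since the list files can then be compared as plain lowercase strings.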
Nic also wrote some cool filters that let you add whitelist or blacklist addresses by email. I added a test to make sure those requests came from me.
I moved some of the checks around because SpamAssassin is fairly expensive in time and computation, so I wanted it to run last, and only on unsorted mail.
- I check first against the whitelist and (optionally) log the result. If the sender is listed, the mail is delivered to my inbox.
- I then check against various mailing lists and sort the remaining mail into the right mailbox. This is mostly because web and WAP clients can’t do this sorting at the client level and I don’t want my inbox cluttered. If the sender is listed in one of the address lists, the mail is delivered directly to the corresponding mailbox.
- I then check against a blacklist (note that the whitelist comes first, in case someone ends up on both) and /dev/null the offenders.
- Everything left gets run through SpamAssassin, set to call a message spam at a score of 4. I think I could move that down to 3, as no ham has scored above a 3, and to even be checked a message has to come from an unsolicited sender anyway.
- If the sender is unknown but the message isn’t spam, I check again against sender domains (not sender addresses) for some of the lists, since a number of listservs use morphing From addresses.
- If the sender’s domain isn’t listed and the message isn’t spam, it is delivered to my inbox. A few spam messages (<1%) have gotten through, but they are being used to train SpamAssassin’s Bayesian filter, and I think I can turn the threshold down to 3.
- “Spam” (per SpamAssassin) from unknown senders gets what Nic calls a Chicken: a request to confirm they really want to send me email. Spammers do not respond; real people usually do.
- All the sender has to do is reply to the chicken request and their original message is released and they are whitelisted.
- If the sender doesn’t respond, a script deletes their message from the holding pen after about a week.
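The order of the checks matters more than any single check, so here is a rough Python sketch of the decision chain described in the list above. It is not the procmail filter itself. The names `Lists` and `route`, the mailbox labels, and the idea of passing the spam score in as a function are all assumptions made for illustration; the threshold of 4 comes from the text, and treating a score equal to the threshold as spam is also an assumption.

```python
from dataclasses import dataclass

SPAM_THRESHOLD = 4.0  # from the text; a score at or above this counts as spam here


@dataclass
class Lists:
    whitelist: set        # addresses that go straight to the inbox
    blacklist: set        # addresses that are silently dropped
    list_addresses: dict  # exact sender address -> mailbox
    list_domains: dict    # sender domain -> mailbox


def route(sender, spam_score, lists):
    """Return the destination for one message.

    spam_score is a zero-argument function so the expensive check
    only runs for mail that no list has already sorted.
    """
    sender = sender.lower()
    domain = sender.rpartition("@")[2] if "@" in sender else ""

    if sender in lists.whitelist:        # whitelist wins, even over the blacklist
        return "inbox"
    if sender in lists.list_addresses:   # known mailing-list address
        return lists.list_addresses[sender]
    if sender in lists.blacklist:
        return "/dev/null"

    if spam_score() >= SPAM_THRESHOLD:   # unknown sender and spammy
        return "holding-pen"             # held while a chicken request goes out
    if domain in lists.list_domains:     # listservs with morphing From addresses
        return lists.list_domains[domain]
    return "inbox"                       # unknown sender, not spam
```

Because the whitelist, mailing-list, and blacklist checks all return early, the scorer is never called for senders that are already known, which is the point of putting SpamAssassin last.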
The script, as written, logs the sender address of each non-spam delivery the way the script sees it (the “from” address looks different than it does in Thunderbird), which makes adding the address to the appropriate list easier. It also logs the From address and subject of messages that get a chicken request, for easier review.
My server is rejecting a hundred or more spam messages a day, and I’m now getting one on average. Thanks, Nic.