
[OCLUG-Tech] Re: [oclug] SpamAssassin

On February 22, 2005 10:02 am, Jon Earle wrote:
>
> Questions:

How to get linux/unix mailservers to catch spam . . .

Let me start by saying that I have been using the same email address for many 
years, so it's probably on every spammer's list.  I get a lot of spam . . . 

My mail server runs sendmail on RH 7.2. (I'm busy upgrading to FC3 and have
just finished installing SpamAssassin, MailScanner and ClamAV.)

As a first line of defense, I have sendmail reject mail from sources listed on
DNS black-hole lists, as well as from hosts listed in my "access" database.
Those methods reject several thousand e-mails per day (about 5600 yesterday,
based on my log files).  Most of those are sent here by people who don't know
the difference between somedomain.com and somedomain.com.xx, and who can blame
them?
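Both first-line checks are simple lookups. A DNS black-hole list is queried by
reversing the octets of the connecting IPv4 address and looking the result up
as a hostname under the list's zone; any answer (conventionally an address in
127.0.0.0/8) means the host is listed, and a failed lookup means it is not.
sendmail does this itself through its `dnsbl` FEATURE, so the Python below is
only a sketch of the mechanism. The zone name is a placeholder, and
`access_lookup` is a hypothetical, simplified model of how an access-database
entry can match a whole domain or a network prefix, not sendmail's exact
lookup order.

```python
import ipaddress
import socket
from typing import Callable, Optional

# Placeholder zone for illustration only; substitute the DNSBL you actually use.
EXAMPLE_ZONE = "dnsbl.example.org"


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the address
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if the query name resolves; listed hosts usually answer 127.0.0.x."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) subclasses OSError
        return False  # any lookup failure is treated as "not listed"


def access_lookup(access: dict, host: str, ip: str) -> Optional[str]:
    """Hypothetical, simplified access-map match: try the host name and each
    parent domain, then the full IP and each shorter whole-octet prefix."""
    labels = host.lower().split(".")
    for i in range(len(labels)):
        action = access.get(".".join(labels[i:]))
        if action is not None:
            return action
    octets = ip.split(".")
    for n in range(len(octets), 0, -1):
        action = access.get(".".join(octets[:n]))
        if action is not None:
            return action
    return None


if __name__ == "__main__":
    access = {"spammer.example": "REJECT", "10.1.2": "REJECT"}  # made-up entries
    print(dnsbl_query_name("192.0.2.1", EXAMPLE_ZONE))  # 1.2.0.192.dnsbl.example.org
    print(access_lookup(access, "mail.spammer.example", "192.0.2.1"))  # REJECT
    print(access_lookup(access, "relay.example.net", "10.1.2.77"))     # REJECT
```

Injecting the resolver as a parameter keeps the lookup logic testable without
touching the network.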

For the mail that gets past that hurdle, the SpamAssassin threshold is set at
2 (yes, two!), and that filters out about 150-250 additional e-mails per day.
However, at a setting of 2, I know I'm going to get false positives (mail
marked as spam that isn't), usually about 1 or 2 per day, even though I keep
adding addresses to the whitelist.  Today my father asked me if I was able to
"work at home".  That, combined with a "hello" in the subject line, shot him
well over the threshold.  I added him to the whitelist: another performance
hit on the system.
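SpamAssassin adds up the scores of every rule that matches a message and marks
it as spam once the total reaches the threshold (`required_score` in
`local.cf`; older versions called it `required_hits`). A `whitelist_from`
entry works by firing the `USER_IN_WHITELIST` rule, which carries a large
negative score. The toy model below uses invented rule names and scores, and
an assumed whitelist adjustment of -100, purely to show why two mild hits are
enough to trip a threshold of 2.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Message = Dict[str, str]  # keys: "from", "subject", "body"


@dataclass
class Rule:
    name: str
    score: float
    test: Callable[[Message], bool]


# Invented rules and scores for illustration; they are NOT SpamAssassin's.
RULES: List[Rule] = [
    Rule("HYPOTHETICAL_WORK_AT_HOME", 1.5,
         lambda m: "work at home" in m["body"].lower()),
    Rule("HYPOTHETICAL_HELLO_SUBJECT", 1.0,
         lambda m: m["subject"].strip().lower() == "hello"),
]

# Assumed whitelist adjustment, modelled on SpamAssassin's USER_IN_WHITELIST.
WHITELIST_SCORE = -100.0


def score_message(msg: Message, whitelist: Set[str]) -> Tuple[float, List[str]]:
    """Add up the scores of every matching rule, plus the whitelist bonus."""
    hits = [r for r in RULES if r.test(msg)]
    total = sum(r.score for r in hits)
    names = [r.name for r in hits]
    if msg["from"].lower() in whitelist:
        total += WHITELIST_SCORE
        names.append("USER_IN_WHITELIST")
    return total, names


def is_spam(msg: Message, threshold: float, whitelist: Set[str]) -> bool:
    total, _ = score_message(msg, whitelist)
    return total >= threshold


if __name__ == "__main__":
    dad = {"from": "dad@example.com", "subject": "Hello",
           "body": "Are you able to work at home?"}
    print(score_message(dad, set()))                # total 2.5 from two invented rules
    print(is_spam(dad, 2.0, set()))                 # True at a threshold of 2
    print(is_spam(dad, 5.0, set()))                 # False at the commonly suggested 5
    print(is_spam(dad, 2.0, {"dad@example.com"}))   # False once whitelisted
```

With these made-up scores the message totals 2.5: spam at a threshold of 2,
ham at 5, and far below either once the sender is whitelisted.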

At the same time, I'm still getting up to 50 messages per day that are spam
but are not caught by the filter.  I don't even have to read them to see that
they're junk, but try doing that with an algorithm.  I arrived at the setting
of 2 by trial and error, and it is much more aggressive than any
recommendation I have found; most say to start with 5.  If I set it much
higher than 2, I might as well not bother, since most of the spam would get
through.
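The trial-and-error tuning described above amounts to a threshold sweep: for
each candidate threshold, count the ham it would flag and the spam it would
let through. A minimal sketch, assuming you have a set of messages labelled by
hand together with the score SpamAssassin gave each one (the sample numbers
below are made up, not the author's data):

```python
from typing import List, Tuple

# Made-up (score, actually_spam) pairs; in practice you would pair the scores
# SpamAssassin assigned with your own hand-labelling of each message.
SAMPLE: List[Tuple[float, bool]] = [
    (0.3, False), (1.1, False), (2.4, False),
    (1.8, True), (2.6, True), (4.2, True), (7.9, True),
]


def evaluate(threshold: float, samples: List[Tuple[float, bool]]) -> Tuple[int, int]:
    """Return (false positives, missed spam) for one threshold."""
    false_pos = sum(1 for score, spam in samples if not spam and score >= threshold)
    missed = sum(1 for score, spam in samples if spam and score < threshold)
    return false_pos, missed


if __name__ == "__main__":
    for t in (2.0, 3.0, 5.0):
        fp, missed = evaluate(t, SAMPLE)
        print(f"threshold {t}: false positives={fp}, missed spam={missed}")
```

Lowering the threshold trades missed spam for false positives; a sweep over
your own labelled mail just makes that trade-off visible instead of guessed.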

I've never got the Bayes filter to work properly; all it did was increase the
number of false positives.  Obviously it wasn't being used correctly by me
and my system, but rather than try to figure it out, I turned it off.  So if
you go the SpamAssassin route, find all this info useful, and figure out how
to use the Bayes filter, keep me in mind.
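For anyone picking up the Bayes question: SpamAssassin's Bayes classifier is
trained with the `sa-learn` tool (`sa-learn --spam` and `sa-learn --ham` on
folders of already-sorted mail). It learns from both sides, and by default it
stays inactive until it has seen a minimum number of each (the
`bayes_min_spam_num` and `bayes_min_ham_num` settings). The sketch below is
NOT SpamAssassin's algorithm, which combines token probabilities differently;
it is a plain multinomial naive Bayes classifier with a tiny made-up training
set, shown only to illustrate learning token statistics from labelled spam
and ham.

```python
import math
import re
from collections import Counter
from typing import Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; a real filter also looks at headers, URLs, etc."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.
    A generic textbook classifier, not SpamAssassin's Bayes implementation."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def learn(self, text: str, label: str) -> None:
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        """P(spam | text); needs at least one spam and one ham learned."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            n_tokens = sum(self.counts[label].values())
            score = math.log(self.docs[label] / total_docs)  # log prior
            for tok in tokenize(text):
                # Smoothing keeps unseen tokens from zeroing out the product.
                score += math.log((self.counts[label][tok] + 1) / (n_tokens + vocab))
            log_score[label] = score
        # Convert the two log scores into a probability, guarding against overflow.
        diff = min(log_score["ham"] - log_score["spam"], 700.0)
        return 1.0 / (1.0 + math.exp(diff))


def train(examples: Iterable[Tuple[str, str]]) -> NaiveBayes:
    """Build a classifier from (text, label) pairs."""
    nb = NaiveBayes()
    for text, label in examples:
        nb.learn(text, label)
    return nb


# Tiny made-up training set; note that it includes ham as well as spam.
DEMO = [
    ("cheap pills work at home earn money", "spam"),
    ("earn money fast work at home", "spam"),
    ("dinner on sunday with dad", "ham"),
    ("can you work on the report sunday", "ham"),
]

if __name__ == "__main__":
    nb = train(DEMO)
    print(nb.spam_probability("earn money at home"))  # above 0.5
    print(nb.spam_probability("sunday dinner"))       # below 0.5
```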

The spam sources keep getting smarter about how they get past the filters,
too.  And of course the worst culprit is HTML-based e-mail.  I'm one of those
who think it's evil, but I refuse to rant about it since I can understand why
lots of people use it.

So the way I see it, with all these methods I get a very low percentage of
false positives (but enough that I have to scour through the spam mail daily),
although who knows how many legitimate messages are rejected long before
SpamAssassin gets hold of them.  Even though about 50 spams per day get through
the filters, that's a very low percentage of the total junk trying to get in.
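To put a number on "a very low percentage", the post's own figures can be
plugged in: about 5600 rejections at the SMTP level, 150-250 caught by
SpamAssassin, and about 50 missed. The sketch assumes, as the post cautions it
may not, that every SMTP-level rejection really was spam, and that the day
with 5600 rejections is representative.

```python
def leak_rate(rejected_at_smtp: int, caught_by_filter: int, missed: int) -> float:
    """Fraction of incoming junk that reaches the inbox.
    Assumes every SMTP-level rejection really was spam."""
    total_junk = rejected_at_smtp + caught_by_filter + missed
    return missed / total_junk


if __name__ == "__main__":
    # 5600 rejected and 50 missed are the post's figures; the SpamAssassin
    # count is taken at both ends of the quoted 150-250 range.
    for caught in (150, 250):
        print(f"{caught} caught: {100 * leak_rate(5600, caught, 50):.2f}% leaked")
```

Either way, well under 1% of the junk gets through.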

Based on these results, though, it's still a major pain, since the spam
getting through is large compared to the amount of valid mail, which is
probably similar to what most IT folks get per day (barring, of course,
subscriptions to lists like this one).  So for me, the measure of a good spam
filter is the ratio:  

( missed spam / valid mail ) 

and on most days I still get more spam than valid mail. YMMV
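Unlike the leak rate, this ratio ignores how much spam was blocked: it
measures only what lands in front of the reader relative to the mail they
actually want, which is why a filter that stops over 99% of the junk can still
feel like it's failing. A minimal sketch; the valid-mail count is a
hypothetical figure, since the post does not give one.

```python
def missed_to_valid(missed_spam: int, valid_mail: int) -> float:
    """Missed spam per valid message delivered; lower is better.
    A value above 1.0 means more spam than real mail reached the inbox."""
    if valid_mail <= 0:
        raise ValueError("valid_mail must be positive")
    return missed_spam / valid_mail


if __name__ == "__main__":
    # 50 missed spams is the post's figure; 40 valid messages is a made-up
    # day's volume chosen only to illustrate a ratio above 1.
    print(missed_to_valid(50, 40))  # 1.25
```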

Buried under spam . . . . 

Alex
====

