On Thu, Jul 20, 2006 at 10:08:31AM -0400, Hugh Campbell wrote:
> The spam filtering definitely _does_ work but it doesn't seem
> particularly effective. At a minimum, it seems much less trainable
> than my cat (that's not a lot). I seem to be training the spam
> filter repetitively against the same or very similar types of mail.

You need huge amounts of spam to train spamassassin, but it does
indeed work. After a bout of spam making it through, I trained it
against the 4000 spams I had collected over time (most of them
categorised as such by spamassassin itself), and then received no
uncaught spam for a while.

Note that even with "autolearn" turned on, I believe it will learn
from "ham" (non-spam), but not automatically learn from spam, since
that could be dangerous (a la "contagious") if it starts to classify
good messages as spam. With autolearning on, when you tell
spamassassin to learn a message as spam, it also undoes any "ham"
learning it might have done for that message. So it's still a good
idea to learn spams as spam, lest similar spam messages start to be
classified as ham instead of spam.

Finally, even a 100% Bayesian match can only contribute 3.5 points to
a message in the default configuration. Unless you raise the value of
a Bayesian match and/or lower your spam threshold, you need other
characteristics to classify a message as spam. Thankfully, those are
usually present in professional spams.

As a side note, it's been my experience that the easiest spams to
catch are (ironically) the professional ones, since their huge reader
base means tons of people bugged enough to add the appropriate rules
to anti-spam software. Local businesses, on the other hand, can add
you to their "mailing list" and, as long as their language isn't
excessively pushy, their mails may get through easily.
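To put rough numbers on the Bayesian scoring point above, here is a small Python sketch of additive scoring. It is only an illustration, not how spamassassin is implemented: the rule names and the two non-Bayes scores are made up, and the threshold of 5.0 is assumed to be the stock setting.

```python
# Illustration only: rule names and the non-Bayes scores are hypothetical.
BAYES_MAX_SCORE = 3.5     # contribution of a 100% Bayesian match (from the text)
ASSUMED_THRESHOLD = 5.0   # assumed default spam threshold


def is_spam(rule_scores, threshold=ASSUMED_THRESHOLD):
    """Return True if the summed rule scores reach the threshold."""
    return sum(rule_scores.values()) >= threshold


# A message that only the Bayesian classifier flags.
bayes_only = {"BAYES_FULL_MATCH": BAYES_MAX_SCORE}

# The same message with two extra (made-up) cues firing.
bayes_plus_cues = {
    "BAYES_FULL_MATCH": BAYES_MAX_SCORE,
    "HTML_ONLY_MESSAGE": 1.0,
    "KNOWN_SPAM_HOST": 1.5,
}

print(is_spam(bayes_only))                 # 3.5 < 5.0, not caught
print(is_spam(bayes_plus_cues))            # 6.0 >= 5.0, caught
print(is_spam(bayes_only, threshold=3.0))  # caught once the threshold is lowered
```

Either lowering the threshold or having other rules fire gets the message over the line; a perfect Bayesian match alone does not.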
> In frustration, I finally decided to set up a quick filter in KMail
> to get rid of the e-mail "What is OEM Software" that I keep getting,
> and remembered that I shouldn't have to hand write a filter to get
> rid of such a trivially easy-to-spot spam.

Easy to spot for a human. Spamassassin needs other cues. The default
rules are aimed more at "typical" spams: penis enlargement /
"performance enhancement", Nigerian letters, etc. Aside from those, it
also checks for known spamming hosts (via dynamic online lists) and
for characteristics of spam in general (HTML-only messages, failure to
adhere to mail protocol, etc.).

If spamassassin has been working but similar spams then start to get
through, the best option is generally to upgrade spamassassin to get
the latest set of rules. If you have enough of a spam backlog, though,
training can provide an additional anti-spam edge.
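For the training step, a minimal Python sketch that assembles sa-learn command lines for a spam backlog and a ham mailbox. The folder paths are hypothetical, and the helper only builds the command without running it; it assumes sa-learn is on your PATH when you do run it.

```python
def build_sa_learn_command(kind, path, mbox=False):
    """Build (but do not run) an sa-learn command line.

    kind: "spam" or "ham"
    path: a message file, a directory of messages, or an mbox file
    mbox: set True when path is a single mbox file
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = ["sa-learn", "--" + kind]
    if mbox:
        cmd.append("--mbox")
    cmd.append(path)
    return cmd


# Hypothetical locations; substitute your own folders.
spam_cmd = build_sa_learn_command("spam", "/home/user/Mail/caught-spam")
ham_cmd = build_sa_learn_command("ham", "/home/user/Mail/inbox.mbox", mbox=True)

print(" ".join(spam_cmd))
print(" ".join(ham_cmd))
# To actually train, hand a command to subprocess.run(cmd, check=True).
```

Feeding it ham as well as spam matters, since the Bayesian classifier needs examples of both.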