
Re: [OCLUG-Tech] Spam Assassin question

On Fri, Jul 21, 2006 at 12:25:40AM -0400, Milan Budimirovic wrote:

> I would think that it would be better to feed spamassassin the
> "false negatives", i.e. the spam messages it *doesn't* catch, and
> run "sa-learn --spam" on that.

Yes, though that is mainly an advantage when autolearn is on: a false
negative may already have been autolearned as "ham", so you're making
it "un-learn" that.  In any case, feeding it *all* your spam (true
positives and false negatives alike) still helps plenty.

Before it has learned any spam, it classifies messages on their other
characteristics alone (minus however "hammy" Bayesian autolearning
thinks they look).  False negatives are just messages that didn't have
enough spammy characteristics to qualify on their own.

Once it's learned what your average spam looks like, that little extra
point boost from Bayesian classification helps push false negatives
over the edge, and they become true positives.
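
To make that arithmetic concrete, here is a rough sketch of the
threshold check.  The numbers are made up for illustration, and the
classify function is a hypothetical helper of mine, not anything
SpamAssassin ships; it only mimics the idea of adding a Bayes score to
the rule score and comparing the total against the required score.

```shell
# Hypothetical helper: decide spam/ham from a rule score, a Bayes
# adjustment, and the required score.  awk handles the decimal math.
classify() {
    awk -v rules="$1" -v bayes="$2" -v required="$3" \
        'BEGIN { if (rules + bayes >= required) print "spam"; else print "ham" }'
}

# A message whose rules alone fall short of a required score of 5.0 ...
classify 4.2 0 5.0
# ... and the same message once Bayes contributes a few points.
classify 4.2 3.5 5.0
```

The first call prints "ham" and the second prints "spam": same
message, same rules, but the learned Bayes points carry it over the
line.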

Using true positives is really just as effective as false negatives
for learning what spam looks like.  The only difference is that false
negatives have the added "need to un-learn this" issue, assuming
autolearn is on.
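
As a rough sketch of what that looks like on the command line, the
helpers below wrap sa-learn's --spam, --ham and --mbox options.  The
function names and the SA_LEARN override are my own additions so the
commands can be dry-run with echo, and any folder paths you pass in
are up to you.  Re-running sa-learn on a message with the other
classification is what takes care of the "un-learn" step.

```shell
# Which sa-learn to run; set SA_LEARN=echo to see the command instead
# of executing it (a convenience of this sketch, not an sa-learn feature).
SA_LEARN=${SA_LEARN:-sa-learn}

# Learn every message in an mbox folder as spam.  True positives and
# false negatives can go through the same call.
learn_spam_mbox() {
    $SA_LEARN --spam --mbox "$1"
}

# Learn every message in an mbox folder as ham.
learn_ham_mbox() {
    $SA_LEARN --ham --mbox "$1"
}
```

For example, learn_spam_mbox "$HOME/mail/caught-spam" followed by
learn_spam_mbox "$HOME/mail/missed-spam" would cover both kinds (both
folder names are hypothetical).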

> Fortunately I have on my main server several old accounts that
> collect nothing but tons of spam. I just alias all that mail to the
> same account, filter all the spam that is caught into a separate
> folder, and I end up with hundreds of false negatives.

I would just learn using all of them.  Automatically via
procmail, even.
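
A minimal sketch of the procmail side, assuming the recipe sits in the
.procmailrc of an account that receives nothing but spam, and that
sa-learn is on procmail's PATH.  This is a config fragment rather than
a tested setup, so treat it as a starting point.

```
# Pipe a copy of every incoming message to sa-learn as spam.
# The "c" flag works on a carbon copy, so later recipes (or normal
# delivery) still see the message.
:0 c
| sa-learn --spam
```

This runs sa-learn once per message, which is fine for a trickle of
mail; for a very busy spam trap, learning from the whole folder in one
batch may be gentler on the machine.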
