On Fri, Jul 21, 2006 at 12:25:40AM -0400, Milan Budimirovic wrote:
> I would think that it would be better to feed spamassassin the
> "false negatives", i.e. the spam messages it *doesn't* catch, and
> run "sa-learn --spam" on that.

Yes, but generally only if autolearn is on, since you're making it
"un-learn" that message as "ham". And in any case, feeding it *all*
spams (true positives and false negatives alike) still helps plenty.

Before learning, it classifies messages based on their characteristics
(minus how "hammy" they look, via Bayesian autolearning). False
negatives are just messages that didn't have enough spammy
characteristics to qualify on their own. Once it's learned what your
average spam looks like, that little extra point boost from Bayesian
classification helps push false negatives over the edge, and they
become true positives.

Using true positives is really just as effective as using false
negatives for learning what spam looks like. The only difference is
that false negatives have the added "need to un-learn this" issue,
assuming autolearn is on.

> Fortunately I have on my main server several old accounts that
> collect nothing but tons of spam. I just alias all that mail to the
> same account, filter all the spam that is caught into a separate
> folder, and I end up with hundreds of false negatives.

I would just learn using all of them. Automatically via procmail, even.
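
Something along these lines is what I have in mind for the procmail
part. Treat it as a rough sketch, not a tested config: it assumes the
recipe lives in the .procmailrc of the spam-trap account only (so
everything arriving there really is spam), that sa-learn is on
procmail's PATH, and that sa-learn reads a single message from stdin
when given no file argument. The lockfile name is arbitrary.

```
# Hypothetical ~/.procmailrc for the spam-trap account ONLY.
# Assumption: every message delivered to this account is spam.

# "c" hands sa-learn a copy, so normal delivery still happens afterwards.
# The named lockfile (arbitrary name) keeps sa-learn runs from overlapping.
:0 c: sa-learn.lock
| sa-learn --spam
```

Don't put this in a regular user's .procmailrc, or all of their real
mail gets learned as spam too.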