Erik Enge writes:
Anthony Ventimiglia <anthony@ventimiglia.org> writes:
I don't know what kind of filtering rate SpamAssassin gets, but the Bayesian filter I use (POPFile) has a success rate of over 98%.
The biggest problem I have is that I get false positives when I set the this-is-spam score lower than four, yet most of the spam I get scores between three and four.
So SpamAssassin still has those silly scores, and that's the main problem. A pure Bayesian filter gives a score between 0 and 1 (0 < score < 1), which is a probability, so in the simplest case anything over 0.5 is spam and anything under it is not.
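To make the 0-to-1 score concrete, here is a short Python sketch of one common way to produce it: the naive-Bayes combining rule from Paul Graham's "A Plan for Spam", which merges per-token spam probabilities p1..pn into P = prod(p) / (prod(p) + prod(1 - p)). The function names and the per-token probabilities fed to it are illustrative assumptions; this is not POPFile's or bogofilter's actual code, and the 0.5 cutoff is just the simple threshold described above.

```python
import math


def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities into one score in [0, 1].

    Naive-Bayes combining rule (as described in Graham's "A Plan for Spam"):
        P = prod(p) / (prod(p) + prod(1 - p))
    Computed via log-odds so that many tokens do not underflow the products.
    Each p must lie strictly between 0 and 1. An empty list gives 0.5.
    """
    log_odds = sum(math.log(p) - math.log(1.0 - p) for p in token_probs)
    # P = 1 / (1 + exp(-log_odds)); branch on the sign so exp() never overflows.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def is_spam(score, threshold=0.5):
    """Simple cutoff (assumed default 0.5): above the threshold is spam."""
    return score > threshold
```

For example, tokens with probabilities 0.99, 0.99 and 0.2 combine to roughly 0.9996, well over the cutoff. Working with log-odds instead of raw products is a design choice: multiplying hundreds of small probabilities directly would underflow to zero on a long message.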
I recommend trying POPFile or bogofilter for your personal mail; you'll see how quickly it "learns". False positives are the biggest problem at first, but after a while you'll see that they become quite rare (under 0.5%). If you want a good argument for the approach, read Paul Graham's website.
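The "learning" comes from counting how often each word shows up in mail you have marked as spam versus mail you have marked as good. Below is a minimal Python sketch of that training step. It assumes a crude, hypothetical tokenizer and a simple frequency ratio clamped to [0.01, 0.99]; `TokenStats` and `tokenize` are illustrative names, and this is a simplification in the spirit of Graham's essay, not the exact method POPFile or bogofilter uses.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters, digits and a few
# punctuation characters. Real filters tokenize far more carefully.
TOKEN_RE = re.compile(r"[a-z0-9$'!-]+")


def tokenize(text):
    """Return the set of distinct tokens in a message."""
    return set(TOKEN_RE.findall(text.lower()))


class TokenStats:
    """Counts how many spam and ham (good) messages each token appeared in."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Record one message the user has labeled as spam or ham."""
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_probability(self, token, unknown=0.5):
        """Estimate how spammy a token is (an assumed, simplified formula).

        Compares the fraction of spam messages containing the token with the
        fraction of ham messages containing it, clamped to [0.01, 0.99] so no
        single word is ever treated as absolute proof either way.
        """
        spam_freq = self.spam_counts[token] / self.n_spam if self.n_spam else 0.0
        ham_freq = self.ham_counts[token] / self.n_ham if self.n_ham else 0.0
        if spam_freq + ham_freq == 0:
            return unknown  # never seen: no evidence either way
        p = spam_freq / (spam_freq + ham_freq)
        return min(0.99, max(0.01, p))
```

Each message you train on shifts these counts, which is one reason mistakes become rarer as the filter sees more of your own mail. The per-token probabilities are what a combining rule then merges into a single score for the whole message.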
Like I said, I used SpamAssassin a while ago, before Bayesian filters came to the forefront. When I learned about Bayesian filters (thanks to Graham and ESR), I ended up writing my own library in C++ and building my own filter on it. They aren't perfect, but the spam that makes it through is very un-spamlike: usually it looks like a spammer trying to beat the filter, and by that point it's not very effective spam.
I have been slowly converting my C++ library to Lisp, which I'll eventually bring here.