Over the last six months, on and off, I’ve built myself a Bayesian spam filter. In this article I’ll give an overview of the filter, the theory behind it, and what I’ve learned while building it.
Intro

Bayesian spam filters classify email messages into basically two categories:

“Spam”, i.e. unwanted messages such as advertisements
“Ham”, i.e. wanted messages such as correspondence from friends

There is a gray area between these two, where a filter does not know how to classify a message.
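To make the two categories and the gray area concrete, here is a minimal Python sketch. It assumes the filter has already reduced a message to a single spam probability between 0 and 1. The function name `classify`, the label "unsure", and the cutoffs 0.1 and 0.9 are illustrative assumptions, not values taken from my filter.

```python
def classify(spam_probability, ham_max=0.1, spam_min=0.9):
    """Map a spam probability to "spam", "ham", or "unsure".

    ham_max and spam_min are hypothetical cutoffs chosen for
    illustration; a real filter would tune them on its own mail.
    """
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    if spam_probability >= spam_min:
        return "spam"    # confidently unwanted
    if spam_probability <= ham_max:
        return "ham"     # confidently wanted
    return "unsure"      # gray area: the filter cannot decide


for p in (0.02, 0.5, 0.97):
    print(p, classify(p))
```

Widening the gap between the two cutoffs trades fewer misclassified messages for more messages that land in the gray area and need a human decision.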