A common problem with traditional filters is that they are a one-size-fits-all solution
to SPAM. Their rules are fixed and change only when the anti-spam service issues
an update.
SPAM changes too quickly for that method to be effective. Additionally, what is
SPAM to you may not be to someone else. That is where Bayesian filters come in.
They are very effective at eliminating SPAM and have very low false-positive
rates for their users.
Bayesian filters are based on Bayesian logic, a branch of logic named for Thomas
Bayes, an eighteenth-century mathematician.
This type of logic applies to decision making by determining the probability
of a certain event based on the history of past events.
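As a rough illustration of that idea, the sketch below applies Bayes' rule to a single word. The function name and the counts are hypothetical and chosen only for the example; a real filter would take its figures from your own mail history.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam=0.5):
    """Return P(spam | word) using Bayes' rule.

    p_spam is the prior probability that any message is SPAM; the
    default of 0.5 is an assumption that treats new mail as neutral.
    """
    numerator = p_word_given_spam * p_spam
    denominator = numerator + p_word_given_ham * (1.0 - p_spam)
    return numerator / denominator


# Hypothetical history: "free" appeared in 60 of 100 past SPAM messages
# and in 5 of 100 past legitimate messages.
p = posterior_spam(60 / 100, 5 / 100)
print(f"P(spam | 'free') = {p:.2f}")
```

With these made-up counts the result is about 0.92: a word seen far more often in past SPAM than in past good mail is strong evidence that a new message containing it is SPAM.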
Using this as a model seemed a logical step for SPAM filtering. If you can predict
what SPAM will look like now based on what it has looked like in the past, you are
halfway to the solution.
To finish solving the problem, Bayesian filters were developed to be dynamic
and to remain effective as SPAM changes.
Bayesian filters are content based. They look for characteristics in each email
that you receive and calculate the probability of it actually being SPAM.
These characteristics are generally words in the content and in the header
information that each email contains. They can also include common SPAM HTML code,
word pairs, phrases, and the location of a phrase in the body of the email.
Typical words in SPAM would be "Free" and "Win", while "humility" would probably
not appear. The filter begins with a neutral score of 50% for the email, and then
adds points for SPAM characteristics.
Likewise, deductions are made for non-SPAM characteristics present. The total
score is calculated and then action is taken based on its likelihood of being SPAM.
The filter does not assume that all arriving email is bad; rather, it assumes that
all email is neutral and should be considered equally.
Bayesian filters are better than traditional content scoring filters in that
they are trained by you to recognize your email.
A doctor, for example, might legitimately receive many emails that use the word "Viagra".
A traditional content-scoring filter would probably send those emails to the SPAM
folder, or delete them.
This would result in a high false-positive rate for the doctor, even though you might
not want Viagra emails yourself. A Bayesian filter instead builds its list from the
doctor's own email use and from the doctor's corrections to incorrectly marked email.
The initial training period may be a little time-consuming, but once it is complete
the filter offers a SPAM-control solution tailored to each user.
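To make the per-user training concrete, here is a small sketch in which two users train separate filters and end up with different opinions of the same word. The class name UserFilter, its methods, and the sample messages are all hypothetical, and the add-one smoothing is an assumption made so that unseen words stay neutral.

```python
from collections import Counter


class UserFilter:
    """Per-user word statistics, learned from that user's own mail."""

    def __init__(self):
        self.spam_counts = Counter()  # SPAM messages containing each word
        self.ham_counts = Counter()   # good messages containing each word
        self.spam_messages = 0
        self.ham_messages = 0

    def train(self, text, is_spam):
        """Record one message that the user has marked as SPAM or as good."""
        words = set(text.lower().split())  # count each word once per message
        if is_spam:
            self.spam_counts.update(words)
            self.spam_messages += 1
        else:
            self.ham_counts.update(words)
            self.ham_messages += 1

    def word_spam_prob(self, word):
        """Estimate how strongly a word suggests SPAM for this user."""
        word = word.lower()
        # Add-one smoothing: a word never seen before scores exactly 0.5.
        spam_rate = (self.spam_counts[word] + 1) / (self.spam_messages + 2)
        ham_rate = (self.ham_counts[word] + 1) / (self.ham_messages + 2)
        return spam_rate / (spam_rate + ham_rate)


doctor = UserFilter()
for _ in range(3):
    doctor.train("patient asks about viagra dosage", is_spam=False)
doctor.train("win a free prize", is_spam=True)

other = UserFilter()
for _ in range(3):
    other.train("cheap viagra free shipping", is_spam=True)
other.train("lunch at noon", is_spam=False)

print(doctor.word_spam_prob("viagra"), other.word_spam_prob("viagra"))
```

For the doctor the word ends up below the neutral 0.5, and for the other user it ends up above it, even though both filters run the same code. Only the training history differs.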
In addition to protecting the good email, the filter is difficult for Spammers
to trick, as every user's filter will have its own individual requirements.
That being said, Spammers do have a few weapons in their arsenal to attempt to
circumvent Bayesian filters. The easiest would be to create SPAM that looks like
an everyday letter.
This would remove their ability to use typical marketing techniques and so is
not as likely with normal commercial email. For the purveyors of fraud, however,
this would be easier.
Spammers could also load a message with common good words, or distort the bad
ones, so heavily that it is scored as neutral or lower and gets through.
Once you correctly mark such a message as SPAM, though, the filter will adjust and
not be fooled again. This automation, and the ability of the software to grow as you
and SPAM change over time, is key to the significance of these types of filters.
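The good-word trick can be shown with a few invented numbers. The sketch below assumes each word already has a SPAM probability strictly between 0 and 1 and combines them by multiplying the evidence for and against; the function name and the values are hypothetical.

```python
import math


def combined_spam_probability(word_probs):
    """Combine per-word SPAM probabilities, treating words as independent."""
    spam_evidence = math.prod(word_probs)
    ham_evidence = math.prod(1.0 - p for p in word_probs)
    return spam_evidence / (spam_evidence + ham_evidence)


# Two strongly SPAM-like words on their own.
plain = [0.95, 0.90]

# The same words padded with four words this user normally sees in good mail.
padded = plain + [0.10, 0.10, 0.10, 0.10]

print(combined_spam_probability(plain))
print(combined_spam_probability(padded))
```

The padded message scores far lower than the plain one, which is how it slips through. When the user marks it as SPAM, the counts behind those per-word values change, so the same padding carries less weight the next time.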
Widespread use of good Bayesian filters would not only eliminate SPAM on your
end, but would also reduce the practice of Spamming altogether. If Spammers cannot
get the mail through, they are just wasting their time.