Feeding SpamAssassin

agsteele · September 24, 2007

I'm wondering if anyone knows how the SpamAssassin data for Bayesian filtering is fed and updated, if at all?

I'm finding that particular types of email which are patently spam, and should be pretty easy to identify, are not being caught by the SpamAssassin filter in the email service.

I report fairly quickly and, no doubt, the specific source gets identified and blocked soon enough. But the spammers move home pretty quickly and a means of updating SpamAssassin would hopefully catch the messages which are typically quite brief in content, contain a single URL for a spamvertised site and speak plain language with little hyperbole.

Any thoughts?

Andrew

Wazoo · September 24, 2007

I used the "Help Search ..." link, which defaults to the Advanced search screen, plugged in the keywords "spamassassin train" with author "jefft", and selected "All Forums". I recall ancient discussions about this, and others dealing with adding in some other 'packages'. Anyway, the most direct Topic/Discussion seems to start with "Bayesian filtering, Also why spam reporting is slower".

agsteele · September 25, 2007

the most direct Topic/Discussion seems to start with Bayesian filtering, Also why spam reporting is slower

Hi Wazoo!

I'm not sure how I missed that thread in my attempts to locate information.

It seems that the Bayesian filtering was probably turned off (see post) as a failed experiment, so that part of SpamAssassin isn't available, which probably accounts for why the obvious spam continues to slip through the net...

Andrew

DavidT · September 25, 2007

It seems that the bayesian filtering was probably turned off as a failed experiment so that part of SpamAssassin isn't available which probably accounts for why the obvious spam continues to slip through the net...

Not "probably" -- it was turned off. The SA system handling our email isn't being trained. In the absence of training, I think that SA's effectiveness can be improved by regularly evaluating and updating the "custom rulesets" that are applied:

http://wiki.apache.org/spamassassin/CustomRulesets

I've not yet tweaked a SA installation myself, so I speak from very limited knowledge on this issue.

DT

Wazoo · September 25, 2007

I think that SA's effectiveness can be improved by regularly evaluating and updating the "custom rulesets" that are applied:

One of those discussions can be found at New Spamassassin Rules

DavidT · September 25, 2007

One of those discussions can be found at New Spamassassin Rules

Yes, thanks, but...that discussion is very old (over three years ago, just after the "failed Bayes experiment"), and the last message in that topic was from JT, where he wrote:

No, we're not using the scripting to update the SA rules. I'm a little uncomfortable with stuff like that happening automatically without any intervention here.

So, it appears that he decided not to utilize automated updating of the rules. Therefore, the rules need regular manual tweaks to be very effective, and I'm not so sure that's been going on. Perhaps JT or Trevor will see this and reassure us with some details.

DT

trevorb · September 25, 2007

Yes, thanks, but...that discussion is very old (over three years ago, just after the "failed Bayes experiment"), and the last message in that topic was from JT, where he wrote:
So, it appears that he decided not to utilize automated updating of the rules. Therefore, the rules need regular manual tweaks to be very effective, and I'm not so sure that's been going on. Perhaps JT or Trevor will see this and reassure us with some details.

Bayes learning is still disabled, but we update the rules periodically. We run the updater every Monday, but the SpamAssassin rules don't usually update that often.

DavidT · September 26, 2007

Bayes learning is still disabled, but we update the rules periodically. We run the updater every Monday, but the SpamAssassin rules don't usually update that often.

Sounds good to me...thanks for the speedy response.

DT

happy to be a SC email account customer :-)

agsteele · September 26, 2007

Bayes learning is still disabled, but we update the rules periodically. We run the updater every Monday, but the SpamAssassin rules don't usually update that often.

Hi Trevor!

I suppose the question that follows your helpful and speedy contribution is what is needed to tweak the rules so that these large volumes of evident spam that slip through the current rules are trapped by SpamAssassin?

I'm sure you don't want quantities of examples, although there will be many here happy to dump their folders into your mailbox.

Andrew

trevorb · September 26, 2007

I suppose the question that follows your helpful and speedy contribution is what is needed to tweak the rules so that these large volumes of evident spam that slip through the current rules are trapped by SpamAssassin?
I'm sure you don't want quantities of examples, although there will be many here happy to dump their folders into your mailbox

I know, we get them too... the spams that SpamAssassin considers harmless and whose source changes faster than the blacklists can catch them. Bayesian filtering probably is the way to stop those messages, and we want to get it running again, but past experiences show it's difficult to train. One of the biggest problems is that Bayesian filtering is worthless unless people are training both spams AND good mail, but convincing users to report all their good mail is difficult.
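To illustrate the point about needing both kinds of mail, here is a minimal sketch of a token-based Bayesian classifier in Python. It is a toy, not SpamAssassin's actual Bayes implementation; the class name `TinyBayes`, its methods, and the sample messages are all made up for illustration. With only spam learned there is nothing to compare a token against, so the sketch declines to give a verdict until it has seen at least one message of each kind.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy token-based Bayesian classifier (hypothetical, not SpamAssassin's code)."""

    def __init__(self):
        # Per class: how many learned messages contained each token.
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        """Record one message as 'spam' or 'ham' (good mail)."""
        self.messages[label] += 1
        self.tokens[label].update(set(text.lower().split()))

    def spam_probability(self, text):
        """Return P(spam | text), or None if either class is untrained."""
        if not self.messages["spam"] or not self.messages["ham"]:
            return None  # one-sided training gives no basis for comparison
        log_odds = math.log(self.messages["spam"] / self.messages["ham"])
        for tok in set(text.lower().split()):
            # Add-one smoothing so unseen tokens never produce a zero probability.
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


bayes = TinyBayes()
bayes.learn("cheap pills visit http://10.0.0.1/", "spam")
bayes.learn("pills cheap here", "spam")
print(bayes.spam_probability("cheap pills"))      # None: no good mail learned yet

bayes.learn("meeting agenda attached", "ham")
bayes.learn("lunch on friday", "ham")
print(bayes.spam_probability("cheap pills") > 0.5)     # True
print(bayes.spam_probability("meeting agenda") < 0.5)  # True
```

The short, plain-language spams described earlier in the thread are exactly the case where the good-mail side matters: their tokens only look suspicious relative to what legitimate mail for the same users looks like.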

DavidT · September 26, 2007

Maybe a bit tangential, but I just looked at the headers on a spam message that just made it to my inbox, and its score was:

X-spam-Status: hits=0.0 tests=NORMAL_HTTP_TO_IP version=3.2.3

However, the scoring on the same message from the other server through which the message passed was:

X-Barracuda-spam-Score: 2.71
X-Barracuda-spam-Status: Yes, SCORE=2.71 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=3.0 KILL_LEVEL=4.0 tests=HELO_DYNAMIC_IPADDR, NORMAL_HTTP_TO_IP, NO_REAL_NAME, RCVD_IN_PBL

X-Barracuda-spam-Report: Code version 3.1, rules version 3.1.23667
    Rule breakdown below
     pts rule name              description
    ---- ---------------------- --------------------------------------------------
    0.80 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
                                [86.204.165.228 listed in zen.spamhaus.org]
    0.55 NO_REAL_NAME           From: does not include a real name
    1.36 HELO_DYNAMIC_IPADDR    Relay HELO'd using suspicious hostname (IP addr 1)
    0.00 NORMAL_HTTP_TO_IP      URI: Uses a dotted-decimal IP address in URL

I realize that our BL tests are done *after* the SA analysis (and it would sure be nice to either add the SpamHaus PBL or Zen to our options!), but I wonder why SpamCop's SA didn't pick up on the NO_REAL_NAME or the HELO_DYNAMIC_IPADDR, both of which should have increased the score?

Tracking URL:

http://www.spamcop.net/sc?id=z1445311437ze...134506bc23d898z

(some manual munging to protect my info...I didn't actually submit this version, however)

DT
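For anyone wanting to check how the Barracuda figure above is built up, here is a small Python sketch that parses the rule breakdown, sums the points, and compares the total with the per-user levels from the X-Barracuda-spam-Status header. The rule lines and levels are copied from the headers in the post; the helper names `parse_rules` and `actions`, and the assumption that a level applies once the score is greater than or equal to it, are mine and not taken from Barracuda or SpamAssassin documentation.

```python
import re

# Rule lines copied from the X-Barracuda-spam-Report header quoted above.
REPORT = """\
0.80 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
0.55 NO_REAL_NAME From: does not include a real name
1.36 HELO_DYNAMIC_IPADDR Relay HELO'd using suspicious hostname (IP addr 1)
0.00 NORMAL_HTTP_TO_IP URI: Uses a dotted-decimal IP address in URL
"""

# Per-user levels copied from the X-Barracuda-spam-Status header.
LEVELS = {"TAG_LEVEL": 2.0, "QUARANTINE_LEVEL": 3.0, "KILL_LEVEL": 4.0}


def parse_rules(report):
    """Map each rule name to its points (hypothetical helper)."""
    rules = {}
    for line in report.splitlines():
        match = re.match(r"\s*(-?\d+\.\d+)\s+([A-Z0-9_]+)\s", line)
        if match:
            rules[match.group(2)] = float(match.group(1))
    return rules


def actions(score, levels):
    """List the levels reached, assuming 'score >= level' triggers a level."""
    return [name for name, limit in levels.items() if score >= limit]


rules = parse_rules(REPORT)
total = round(sum(rules.values()), 2)
print(total, actions(total, LEVELS))  # 2.71 ['TAG_LEVEL']
```

The sum matches the reported SCORE=2.71. Under the stated assumption that is enough to tag the message but not to quarantine it, whereas the SpamCop SA header shows only the zero-point NORMAL_HTTP_TO_IP test firing, hence hits=0.0.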
