Feeding SpamAssassin


agsteele

I'm wondering if anyone knows how the SpamAssassin data for Bayesian filtering is fed and updated, if at all?

I'm finding that particular types of email which are patently spam, and should be pretty easy to identify, are not being caught by the SpamAssassin filter in the email service.

I report fairly quickly and, no doubt, the specific source gets identified and blocked soon enough. But the spammers move home pretty quickly, so I'm hoping there is some means of updating SpamAssassin to catch these messages. They are typically quite brief, contain a single URL for a spamvertised site, and use plain language with little hyperbole.

Any thoughts?

Andrew


I used the "Help Search ..." link, which defaults to the Advanced search screen, plugged in the keywords "spamassassin train" with author "jefft", and selected "All Forums". I recall ancient discussions about this, and others dealing with adding in some other 'packages'. Anyway, the most direct Topic/Discussion seems to start with "Bayesian filtering, Also why spam reporting is slower".


the most direct Topic/Discussion seems to start with Bayesian filtering, Also why spam reporting is slower

Hi Wazoo!

I'm not sure how I missed that thread in my attempts to locate information :)

It seems that the Bayesian filtering was probably turned off (see post) as a failed experiment, so that part of SpamAssassin isn't available, which probably accounts for why the obvious spam continues to slip through the net... :(

Andrew


It seems that the Bayesian filtering was probably turned off as a failed experiment, so that part of SpamAssassin isn't available, which probably accounts for why the obvious spam continues to slip through the net... :(
Not "probably" -- it was turned off. The SA system handling our email isn't being trained. In the absence of training, I think that SA's effectiveness can be improved by regularly evaluating and updating the "custom rulesets" that are applied:

http://wiki.apache.org/spamassassin/CustomRulesets

I've not yet tweaked an SA installation myself, so I speak from very limited knowledge on this issue.
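For anyone wondering what "tweaking the rules" amounts to, here is a minimal Python sketch of the additive scoring idea behind rule-based filters: each rule that fires adds points, and the total is compared with a threshold. It is not SpamAssassin's rule syntax or engine. The rule names (LOCAL_SINGLE_URL, LOCAL_VERY_SHORT), their scores, the 30-word cutoff and the 2.0 threshold are all hypothetical, chosen only to mirror the brief, single-URL messages described at the top of this thread.

```python
import re

# Hypothetical pattern and rules for illustration only; none of these
# names or scores come from a real SpamAssassin ruleset.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def single_url(body: str) -> bool:
    """True when the body contains exactly one URL."""
    return len(URL_RE.findall(body)) == 1


def very_short(body: str) -> bool:
    """True when the body has fewer than 30 whitespace-separated words."""
    return len(body.split()) < 30


# (rule name, points added when the rule fires, test function)
RULES = [
    ("LOCAL_SINGLE_URL", 1.5, single_url),
    ("LOCAL_VERY_SHORT", 1.0, very_short),
]


def score_message(body: str, rules=RULES):
    """Return (total score, names of the rules that fired)."""
    hits = [(name, pts) for name, pts, test in rules if test(body)]
    total = sum(pts for _, pts in hits)
    return total, [name for name, _ in hits]


def is_spam(body: str, threshold: float = 2.0) -> bool:
    """Flag the message when the summed rule scores reach the threshold."""
    total, _ = score_message(body)
    return total >= threshold


if __name__ == "__main__":
    sample = "Great prices here http://example.com/shop today"
    print(score_message(sample), is_spam(sample))
```

Rules this crude would also fire on plenty of legitimate short messages, which is presumably why any rule change needs checking against good mail before its score is raised.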

DT


One of those discussions can be found at New Spamassassin Rules

Yes, thanks, but...that discussion is very old (over three years ago, just after the "failed Bayes experiment"), and the last message in that topic was from JT, where he wrote:

No, we're not using the scripting to update the SA rules. I'm a little uncomfortable with stuff like that happening automatically without any intervention here.

So, it appears that he decided not to utilize automated updating of the rules. Therefore, the rules need regular manual tweaks to be very effective, and I'm not so sure that's been going on. Perhaps JT or Trevor will see this and reassure us with some details.

DT

Edited by DavidT

Yes, thanks, but...that discussion is very old (over three years ago, just after the "failed Bayes experiment"), and the last message in that topic was from JT, where he wrote:

So, it appears that he decided not to utilize automated updating of the rules. Therefore, the rules need regular manual tweaks to be very effective, and I'm not so sure that's been going on. Perhaps JT or Trevor will see this and reassure us with some details.

Bayes learning is still disabled, but we update the rules periodically. We run the updater every Monday, but the SpamAssassin rules don't usually update that often.


Bayes learning is still disabled, but we update the rules periodically. We run the updater every Monday, but the SpamAssassin rules don't usually update that often.

Sounds good to me...thanks for the speedy response.

DT

happy to be a SC email account customer :-)

Edited by DavidT

Bayes learning is still disabled, but we update the rules periodically. We run the updater every Monday, but the SpamAssassin rules don't usually update that often.

Hi Trevor!

I suppose the question that follows your helpful and speedy contribution is what is needed to tweak the rules so that these large volumes of evident spam that slip through the current rules are trapped by SpamAssassin?

I'm sure you don't want quantities of examples, although there will be many here happy to dump their folders into your mailbox :)

Andrew


I suppose the question that follows your helpful and speedy contribution is what is needed to tweak the rules so that these large volumes of evident spam that slip through the current rules are trapped by SpamAssassin?

I'm sure you don't want quantities of examples, although there will be many here happy to dump their folders into your mailbox :)

I know, we get them too... the spams that SpamAssassin considers harmless and whose sources change faster than the blacklists can catch them. Bayesian filtering probably is the way to stop those messages, and we want to get it running again, but past experience shows it's difficult to train. One of the biggest problems is that Bayesian filtering is worthless unless people train it on both spam AND good mail, and convincing users to report all their good mail is difficult.
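To illustrate why both kinds of mail are needed, here is a toy word-count naive Bayes classifier in Python. It is a sketch only and not how SpamAssassin's Bayes module is implemented; the class name TinyBayes, the whitespace tokenizer and the add-one smoothing are assumptions made for the example. The point it shows is that the score is a ratio of spam evidence to good-mail evidence, so with only one side trained there is nothing to compare against.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes text classifier (hypothetical, for illustration)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.msgs = {"spam": 0, "ham": 0}

    def train(self, label: str, text: str) -> None:
        """Record the words of one message under 'spam' or 'ham'."""
        self.words[label].update(text.lower().split())
        self.msgs[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | text); requires training on both classes."""
        if 0 in self.msgs.values():
            raise ValueError("need at least one spam and one ham message")
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        spam_total = sum(self.words["spam"].values())
        ham_total = sum(self.words["ham"].values())
        # Start from the prior odds, then add per-word log likelihood ratios.
        log_odds = math.log(self.msgs["spam"] / self.msgs["ham"])
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            p_spam = (self.words["spam"][word] + 1) / (spam_total + len(vocab))
            p_ham = (self.words["ham"][word] + 1) / (ham_total + len(vocab))
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    nb = TinyBayes()
    nb.train("spam", "cheap pills visit http://example.com")
    nb.train("ham", "meeting notes attached for monday")
    print(nb.spam_probability("cheap pills"))
    print(nb.spam_probability("meeting notes"))
```

With only spam reports fed in, the sketch refuses to score at all. A real filter in that position would have to guess, which fits the difficulty described above.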


Maybe a bit tangential, but I just looked at the headers of a spam message that made it to my inbox, and its score was:

X-spam-Status: hits=0.0 tests=NORMAL_HTTP_TO_IP version=3.2.3

However, the scoring on the same message from the other server through which the message passed was:

X-Barracuda-spam-Score: 2.71
X-Barracuda-spam-Status: Yes, SCORE=2.71 using per-user scores of TAG_LEVEL=2.0 QUARANTINE_LEVEL=3.0 KILL_LEVEL=4.0 tests=HELO_DYNAMIC_IPADDR, NORMAL_HTTP_TO_IP, NO_REAL_NAME, RCVD_IN_PBL
X-Barracuda-spam-Report: Code version 3.1, rules version 3.1.23667
    Rule breakdown below
     pts rule name              description
    ---- ---------------------- --------------------------------------------------
    0.80 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
                                [86.204.165.228 listed in zen.spamhaus.org]
    0.55 NO_REAL_NAME           From: does not include a real name
    1.36 HELO_DYNAMIC_IPADDR    Relay HELO'd using suspicious hostname (IP addr 1)
    0.00 NORMAL_HTTP_TO_IP      URI: Uses a dotted-decimal IP address in URL

I realize that our BL tests are done *after* the SA analysis (and it would sure be nice to add either the Spamhaus PBL or Zen to our options!), but I wonder why SpamCop's SA didn't pick up on NO_REAL_NAME or HELO_DYNAMIC_IPADDR, both of which should have increased the score?
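For comparing the two sets of headers without eyeballing them, a small Python sketch along these lines could list which tests one filter hit and the other missed. The helper name parse_tests is hypothetical, and it assumes the rule names follow "tests=" and are separated by commas or spaces, as in the two status lines quoted in this post; other header layouts may need different parsing.

```python
import re

# Status header values copied from the two headers quoted in this post.
SPAMCOP_STATUS = "hits=0.0 tests=NORMAL_HTTP_TO_IP version=3.2.3"
BARRACUDA_STATUS = (
    "Yes, SCORE=2.71 using per-user scores of TAG_LEVEL=2.0 "
    "QUARANTINE_LEVEL=3.0 KILL_LEVEL=4.0 tests=HELO_DYNAMIC_IPADDR, "
    "NORMAL_HTTP_TO_IP, NO_REAL_NAME, RCVD_IN_PBL"
)

# Points per rule, taken from the Barracuda rule breakdown.
BARRACUDA_POINTS = {
    "RCVD_IN_PBL": 0.80,
    "NO_REAL_NAME": 0.55,
    "HELO_DYNAMIC_IPADDR": 1.36,
    "NORMAL_HTTP_TO_IP": 0.00,
}


def parse_tests(status: str) -> set:
    """Return the rule names listed after 'tests=' in a status header value."""
    _, _, rest = status.partition("tests=")
    names = set()
    for token in re.split(r"[,\s]+", rest):
        if "=" in token:
            break  # reached the next key=value field, e.g. version=3.2.3
        if token:
            names.add(token)
    return names


def missed_tests(ours: str, theirs: str) -> set:
    """Rules that fired in 'theirs' but not in 'ours'."""
    return parse_tests(theirs) - parse_tests(ours)


if __name__ == "__main__":
    missed = missed_tests(SPAMCOP_STATUS, BARRACUDA_STATUS)
    print(sorted(missed))
    # Sanity check: the per-rule points should add up to the reported score.
    print(round(sum(BARRACUDA_POINTS.values()), 2))
```

The difference includes RCVD_IN_PBL as well, which is expected given that the blocklist checks here run after the SA pass; the other two are the ones in question.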

Tracking URL:

http://www.spamcop.net/sc?id=z1445311437ze...134506bc23d898z

(some manual munging to protect my info...I didn't actually submit this version, however)

DT

Edited by DavidT
