
Another idiot direction to "Unsubscribe" from the spam


axxx007


Gmail I can accept. They offer a free service. While spammers take advantage of them, they provide a free service to a lot of regular people.

Except that I'm starting to see spam messages coming to my inbox that contain Gmail response addresses, and I've read that Gmail isn't being very responsive to this growing problem. I hope that's incorrect, or their hat color is quickly darkening. :angry:

DT


Forgive me for the excessively long rant ...

I think that having negative scoring tests is a GOOD idea. Just amassing positive scores would make false positives too common.

But including a 3rd party whitelist? By default. That's just stupid. A spammer could (although not many would take the time) invest a little work in getting onto that whitelist and then slowly corrupt the Bayes-based filtering system of anyone using it. Provided that they also use a Bayesian system.

Since you know you'll have 4 points knocked off of your spam score, keep the spammishness down for the first few messages. They'll be learned by the Bayesian system as "okay". Then you can ratchet up the spam content.

Do this late at night (the messages from MessageLabs.com arrive at 3am my time) and you can have someone's Bayesian filter learning "v1agr4" as legit content by the time they get in at 7am.

Spam is dynamic and so are SA rules. The rules and the scores associated with them don't only change with version releases; they are updated every two or three weeks. Run "sa-update" to get the latest ruleset.

Spam also varies from site to site, and the SA rules are a compromise designed to catch the largest amount of spam with the fewest false positives, based on a large sample of spam and ham. You can change the scores for any rules that you find problematic by editing SA's local.cf file. The following will disable the rules causing you problems:

score RCVD_IN_DNSWL_LOW 0
score RCVD_IN_DNSWL_MED 0
score RCVD_IN_DNSWL_HI 0

Also, you should not rely on the Bayes algorithm auto-learning where the score has produced an error. Tell it that it got it wrong. Copy false negatives into a bad-spam box and false positives into a bad-ham box. Run "sa-learn --spam --mbox /path/to/bad-spam" and "sa-learn --ham --mbox /path/to/bad-ham". (Replace --mbox with --mbx if you're using mbx instead of mbox. If you're using maildir, drop --mbox and pass the maildir directory path instead; --folders takes a file that lists folders, not a folder.) Finally, clear out the mailboxes ready for another run. It's fairly simple to write something to automate all of that.
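For what it's worth, here is a minimal sketch of that automation in Python. It assumes mbox-format correction boxes and uses only the two sa-learn invocations quoted above. The function names and the `run` parameter are made up for illustration, and the paths are the same placeholders as in the post.

```python
import subprocess
from pathlib import Path


def build_commands(spam_box, ham_box):
    """Return the two sa-learn invocations from the post as argument lists."""
    return [
        ["sa-learn", "--spam", "--mbox", str(spam_box)],
        ["sa-learn", "--ham", "--mbox", str(ham_box)],
    ]


def relearn(spam_box, ham_box, run=subprocess.run):
    """Feed both correction mailboxes to sa-learn, then empty them.

    `run` is injectable (hypothetical convenience) so the logic can be
    exercised without SpamAssassin installed.
    """
    boxes = (Path(spam_box), Path(ham_box))
    for cmd, box in zip(build_commands(spam_box, ham_box), boxes):
        if not box.exists() or box.stat().st_size == 0:
            continue  # nothing was filed here since the last run
        run(cmd, check=True)  # raises if sa-learn exits non-zero
        box.write_text("")  # truncate only after a successful learn
```

Calling `relearn("/path/to/bad-spam", "/path/to/bad-ham")` from cron would cover the scheduled part. It does no locking, so a mail client writing to a box at the same moment could lose a message; a human still has to file the mistakes into the boxes first.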


Spam is dynamic and so are SA rules. The rules and the scores associated with them don't only change with version releases; they are updated every two or three weeks. Run "sa-update" to get the latest ruleset.

Yep. Got that. I might as well avail myself of the experience of everyone else. It saves me a lot of time.

You can change the scores for any rules that you find problematic by editing SA's local.cf file. The following will disable the rules causing you problems:

I changed those rules as soon as I saw that they had a 3rd party whitelist. I still think that that is a completely idiotic idea.

Also, you should not rely on the Bayes algorithm auto-learning where the score has produced an error. Tell it that it got it wrong. Copy false negatives into a bad-spam box and false positives into a bad-ham box. Run "sa-learn --spam --mbox /path/to/bad-spam" and "sa-learn --ham --mbox /path/to/bad-ham". (Replace --mbox with --mbx if you're using mbx instead of mbox. If you're using maildir, drop --mbox and pass the maildir directory path instead; --folders takes a file that lists folders, not a folder.)

That's the problem. If it happens in the middle of the night, there's no way for you to undo the damage before your users see the flood of Bayes-approved spam in the morning.

And if it was done intelligently, the spammers would be able to exploit the whitelist multiple times.

Simply send some spam from the whitelisted site with some strings that you want to try getting accepted by the Bayesian system.

Then send non-spammy email with those strings from the zombies and see if it gets rejected. If not, ratchet up the spam bit by bit. Bayes should learn it as okay because it will be very similar to the stuff that it accepted before.

Since only the initial strings are from the whitelist site, it should be very time consuming to find the problem. Particularly since it will be buried in a flood of spam.

3rd party blacklists are acceptable because it's easier to tell if some site is bad than to tell if some never before seen site is good.


<snip>

That's the problem. If it happens in the middle of the night, there's no way for you to undo the damage before your users see the flood of Bayes-approved spam in the morning.

<snip>

...Your machine can't run command-line requests ("sa-learn --spam --mbox /path/to/bad-spam" and "sa-learn --ham --mbox /path/to/bad-ham") at designated times? Heck, even Windows can do that! :) <g>

...Obviously, I don't understand SA! :) <g>


...Your machine can't run command-line requests ("sa-learn --spam --mbox /path/to/bad-spam" and "sa-learn --ham --mbox /path/to/bad-ham") at designated times? Heck, even Windows can do that! :) <g>

...Obviously, I don't understand SA! :) <g>

The problem is that SA has misidentified the mails in the first place. He has spam that SA identified as ham, and possibly fed to the Bayes engine as well (see below). While the Bayes engine can be told that a mistake was made, it needs a real human to identify that mistake (whether the rest of the process is automated or not).

Brandioch, you obviously already know a lot about the workings of SpamAssassin, so forgive me if you know this part too. The SA Bayes autolearn function doesn't work on all mail that has been scanned; it will ignore mail that is around the scoring borders (I haven't done the research to identify what the actual rules are). If you have terse reporting turned off, you can see what the Bayes engine actually did with the mail. The three results shown in the mail headers are:

  • autolearn=spam - The engine learned from the mail treating it as spam.

  • autolearn=ham - The engine learned from the mail treating it as ham.

  • autolearn=no - The score was close to the border, so no Bayes learning was performed.

So even if SA identified the mail as ham, it doesn't necessarily follow that the Bayes engine modified its tokens based on the mail content.
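To check this on a real message, the autolearn value can be pulled out of the X-Spam-Status header. A minimal sketch, assuming the usual "autolearn=..." field inside that header; the sample message and the function name are invented for illustration.

```python
import re
from email import message_from_string

# Matches the autolearn field wherever it sits in the (possibly folded) header.
AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")


def autolearn_result(raw_message):
    """Return the autolearn value from X-Spam-Status, or None if absent."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    match = AUTOLEARN_RE.search(status)
    return match.group(1) if match else None


# Invented sample for illustration; real headers list the rules that fired.
sample = (
    "X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED\n"
    "\tautolearn=ham version=3.2.5\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
print(autolearn_result(sample))
```

A result of "ham" on a message you know is spam is the case worth feeding back through sa-learn.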

It should also be pointed out that a small amount of poisoning will not have a significant effect on the reliability of classification within a large corpus of ham and spam. Where the training corpus is small, it could have a large effect. This is why SA will not score on the Bayes engine results until tokens have been built from a significant corpus of both spam and ham. Read up on Bayesian Belief Networks if you want to dig into the details. :P
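The corpus-size point can be illustrated with a toy per-token estimate. This is a simplified Graham-style ratio chosen for illustration, not SpamAssassin's actual Bayes implementation, and all the counts are made up.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Toy estimate of P(spam | token) from per-class message frequencies."""
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


def after_poisoning(spam_hits, ham_hits, n_spam, n_ham, poison):
    """Same estimate after `poison` extra messages with the token are learned as ham."""
    return token_spam_probability(spam_hits, ham_hits + poison, n_spam, n_ham + poison)


# Made-up counts: a token seen in 9% of spam and never in ham.
large = after_poisoning(900, 0, 10_000, 10_000, poison=10)
small = after_poisoning(18, 0, 200, 200, poison=10)
# A brand-new token that only ever arrives in poisoned "ham".
fresh = after_poisoning(0, 0, 10_000, 10_000, poison=1)
print(large, small, fresh)
```

With these numbers, ten poisoned messages leave the large corpus at roughly 0.99 for that token but drag the small corpus down to roughly 0.65. The `fresh` case is the other side of the argument: a token the database has never seen is defined entirely by whoever teaches it first, however large the corpus is.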


...Your machine can't run command-line requests (<snip>) at designated times? Heck, even Windows can do that! :) <g>

...Obviously, I don't understand SA! :) <g>

The problem is that SA

<snip>

needs a real human to identify that mistake (whether the rest of the process is automated or not).

<snip>

...Gotcha! I thought it might be something like that (so my disclaimer about my ignorance was quite relevant!). Thanks. :) <g>

Brandioch, you obviously already know a lot about the workings of SpamAssassin, so forgive me if you know this part too. The SA Bayes autolearn function doesn't work on all mail that has been scanned; it will ignore mail that is around the scoring borders (I haven't done the research to identify what the actual rules are). If you have terse reporting turned off, you can see what the Bayes engine actually did with the mail. The three results shown in the mail headers are:

Yep. The problem is that with the whitelist included in the default rules (and with a -4 score), sending a random word string (bouncy_bunny) in an otherwise clean message COULD get the word string learned as "ham". As long as nothing else was included.

That's the first message in the first minute.

Now that Bayes has learned that string, you can send messages from the zombies and include that word string.

That gives you the next 10 messages in the second minute.

SpamAssassin (and most other Bayesian-based systems) try to limit the possible damage by NOT including the Bayes score in the spam score to determine whether the message should be learned (either as spam or as ham).
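A rough sketch of that decision, to show where the whitelist score comes in. It is deliberately simplified: the two thresholds are what I recall as the stock values of bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam (treat them as assumptions), the real logic has further conditions, and SOME_BODY_RULE is a hypothetical rule name.

```python
HAM_THRESHOLD = 0.1    # assumed stock bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0  # assumed stock bayes_auto_learn_threshold_spam


def autolearn_decision(hits):
    """Decide ham/spam/no from rule hits, leaving the Bayes rules out of the sum.

    `hits` maps rule name -> score for the rules that fired on one message.
    """
    score = sum(s for rule, s in hits.items() if not rule.startswith("BAYES_"))
    if score < HAM_THRESHOLD:
        return "ham"
    if score >= SPAM_THRESHOLD:
        return "spam"
    return "no"


# Hypothetical hits: mildly spammy body, and Bayes already suspicious.
plain = {"SOME_BODY_RULE": 2.5, "BAYES_99": 3.5}
# Same message relayed through a whitelisted host (-4, as discussed above).
whitelisted = dict(plain, RCVD_IN_DNSWL_MED=-4.0)

print(autolearn_decision(plain), autolearn_decision(whitelisted))
```

In this sketch the first message sits in the border zone and is not learned, while the same message with the -4 whitelist hit falls below the ham threshold and would be learned as ham, whatever Bayes itself thought of it.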

The problem is that the spammers could still get your Bayes database to learn "v1agra" as a legit ham word if they wanted to.

It should also be pointed out that a small amount of poisoning will not have a significant effect on the reliability of classification within a large corpus of ham and spam. Where the training corpus is small, it could have a large effect. This is why SA will not score on the Bayes engine results until tokens have been built from a significant corpus of both spam and ham. Read up on Bayesian Belief Networks if you want to dig into the details. :P

It doesn't have to be much poisoning for this to succeed. It just has to be very focused.

The spammer knows what messages he wants to get through. That means he knows what strings he'll have to get past Bayes. He can already run the regular SpamAssassin tests against his own messages to see what that score would be.

That will tell him what strings to poison in your Bayesian system in order to get through.

Without the whitelists, he could not do that. The best he could do would be to test against the stock SpamAssassin rules and rely upon chance to get past your Bayes system. And given my experience, that is very unlikely. Bayes is so idiosyncratic ... it's almost magic.

Which is why I get so annoyed about this.


Another day, another spam run from MessageLabs.com.

The idiot I talked to yesterday said that they were investigating it and that none would come through today.

So, I'm back on the phone with MessageLabs.com. Waiting for the tech people to pick up the line. It's a rather long wait. I wonder if they have ANI and are avoiding me now.

:)

Remember, it's not spam if someone paid you to send it.

And even though it can be shown to have come from your servers, it wasn't sent by you.


No spam from them today. That's good. But some very interesting items came up yesterday in my phone conversation with them. It is possible that I may not have understood what he was saying, but ...

#1. MessageLabs.com does not check their logs for rejected messages. Well, neither do spammers.

#2. MessageLabs.com does not send the rejected messages back to the original sender. I don't believe that zombies do, either.

#3. MessageLabs.com had not seen a need for any type of communication with their clients regarding specific email messages. Neither do spammers for the most part.

I have been told that MessageLabs.com will probably be keeping the customer who was sending the spam (big surprise there, it's about the money) but they claim that they'll be implementing new processes for handling issues such as mine.

Now it may just be me .... but if I were setting up a relay business, the SECOND thing I'd do would be to make sure that I had a way of getting rejection notices and such to my customers and validating that they were cleaning their lists.

The first thing would be setting up the abuse[at] and such so that it would be easy for other people to complain through official channels.

What kind of business does not care if their emails are being rejected or delivered?


<snip>

What kind of business does not care if their emails are being rejected or delivered?

...The kind that send 1 trillion e-mails, hoping a few arrive in the inboxes of the gullible.

...Contrary to implications from others, above, greed leading to profit does not always equate to bad. Greedy smart people who make a profit tend to use the less valuable to make and sell the more valuable. It's when they knowingly take advantage of "free" resources (for which others actually pay, as in the present case) or people who for some valid reason are not able to collect information on things they purchase, or otherwise break the law, that it might be bad.


Archived

This topic is now archived and is closed to further replies.
