Why does this spam get past spamcop

elind · February 3, 2009

I'm not sure if I have asked this before, but can anyone explain how this spam got past spamcop?

http://www.spamcop.net/sc?id=z2583318867za...9d3f82ba6186eaz

I regularly receive a handful of spam that are not held by spamcop, but I also see the same basic spam in my held mail which may be in the hundreds per day (always to my spamcop address). There must be something in this one and others like it (addressed in this case to "asl" at spamcop, but many say "my name" at spamcop) that prevents it from being recognized as spam, even though everything in the mail is obviously spam.

I do have one email address that my cleared mail is forwarded to and which is not filtered, but would not that be shown in the TO field if it was the one being used?

rconner · February 3, 2009

I'm not sure if I have asked this before, but can anyone explain how this spam got past spamcop?
http://www.spamcop.net/sc?id=z2583318867za...9d3f82ba6186eaz

As you may or may not know, SpamCop filters mail primarily by checking its source IP address, specifically whether this address is on the SpamCop blocking list for very recent spam activity (past 24 hours). It appears that the source of this message was 189.106.161.246. I pasted it into the SpamCop web page to check on it. Apparently it went on the SCBL about 3 hours before this post (about 1900 GMT). Perhaps this was just after you reported it; maybe it was your report that put it over the top. So, give yourself a pat on the back.

I regularly receive a handful of spam that are not held by spamcop, but I also see the same basic spam in my held mail which may be in the hundreds per day (always to my spamcop address). There must be something in this one and others like it (addressed in this case to "asl" at spamcop, but many say "my name" at spamcop) that prevents it from being recognized as spam, even though everything in the mail is obviously spam.

Again, it is the origin IP address of the mailing that counts most heavily to SpamCop. Since spammers will send from many, many different machines and addresses in the course of a single run, any given address is not going to be trapped until its spam activity reaches a certain volume (known only to the SCBL operators). SpamCop does examine other parts of the message via a SpamAssassin filter, but according to your tracker this message only scored 2.5, below the typical threshold for declaring it to be spam. If there were more "spamminess" in the body of the message, it might have pushed this score up to the point where SpamCop would have detained it on this basis, even if the source IP were not blocklisted.
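For readers who want to see what "checking the source IP against a blocking list" looks like in practice, here is a minimal Python sketch of the usual DNSBL convention: reverse the IPv4 octets, append the list's zone, and look the name up in DNS. The zone name bl.spamcop.net and the helper names are assumptions for illustration, and this is not a description of how SpamCop's own mail service does the check internally.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "bl.spamcop.net") -> str:
    """Build the DNS name a DNSBL client would look up for an IPv4 address.

    The zone default is an assumption for illustration.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "bl.spamcop.net") -> bool:
    """Return True if the lookup resolves (needs network access).

    By DNSBL convention any answer means "listed" and a failed
    lookup means "not listed at the moment".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Only build the name here; the actual lookup is left to the reader.
print(dnsbl_query_name("189.106.161.246"))
```

Because listings are based on recent activity, the same address can give a different answer a few hours apart, which fits the timing described above.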

I do have one email address that my cleared mail is forwarded to and which is not filtered, but would not that be shown in the TO field if it was the one being used?

No. The "To:" field is NOT trustworthy. It is not used in the transmission of the mail and so can be set to any value. I see on the first "Received" line that there is a "for" clause; this is probably where your address would appear (I can't see it, SpamCop has munged it).
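To illustrate the difference, here is a small Python sketch that reads both the "To:" header and the "for" clause of the topmost "Received" line. The sample message, its addresses and the helper name are all invented for the example, and the regular expression is a rough assumption: not every mail server writes a "for" clause, and its format varies.

```python
import re
from email import message_from_string

# Invented sample: the To: header names one address, while the receiving
# server recorded a different recipient in its Received line.
RAW = """\
Received: from unknown (192.0.2.10)
  by mx.example.net with SMTP for <real.recipient@example.net>; 3 Feb 2009 18:50:13 -0000
From: "Watches" <sender@example.org>
To: asl@example.net
Subject: test

body
"""


def header_to_and_received_for(raw: str):
    """Return (To: header value, address from the first Received 'for' clause)."""
    msg = message_from_string(raw)
    received = msg.get_all("Received") or []
    received_for = None
    if received:
        match = re.search(r"\bfor\s+<?([^\s<>;]+)>?", received[0])
        if match:
            received_for = match.group(1)
    return msg["To"], received_for


print(header_to_and_received_for(RAW))
```

The second value is the one the receiving server actually delivered for; the first is whatever the sender chose to type.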

-- rick

elind · February 4, 2009

Thanks. Regarding the last point above, I was not sure about the TO address, but where in the source is the actual TO address? Doesn't it have to be there?

In this case a few lines say:

Received: (qmail 19448 invoked from network); 3 Feb 2009 18:50:13 -0000

X-spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on filter7

X-spam-Level: **

X-spam-Status: hits=2.5 tests=URIBL_BLACK version=3.2.4

Which is what you pointed out about spamassassin and I presume this means it had to go through spamcop. But what I don't understand is the 2.5 by spamassassin.

Surely just the subject "Best Quality Swiss & Japanese Rep1icaWatches" flags spam to a newborn? On top of that it has the trademark host sites (several) as in http://code.client.cn

How can spamassassin miss that?

Another standard one that gets through to me is the fake diploma ploy, and they are always identical.

I don't get it.

rconner · February 4, 2009

Thanks. Regarding the last point above, I was not sure about the TO address, but where in the source is the actual TO address? Doesn't it have to be there?

Reread the last para of my post, your answer is there.

Which is what you pointed out about spamassassin and I presume this means it had to go through spamcop. But what I don't understand is the 2.5 by spamassassin.
Surely just the subject "Best Quality Swiss & Japanese Rep1icaWatches" flags spam to a newborn? On top of that it has the trademark host sites (several) as in http://code.client.cn

How can spamassassin miss that?

Seems to me it did NOT miss it. It evidently consulted a URI blocking list and found a hit, then assigned that result a score of 2.5. If you are asking why SpamAssassin did not peg the meter when it detected this, you will have to direct this question either to the developers of SpamAssassin or the people at SpamCop who are responsible for having configured their own copy of SA.
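As a rough illustration of reading that result, the Python sketch below pulls the score and the list of tests out of the X-spam-Status value quoted in this thread and compares the score with a threshold. The function names and the thresholds 4 and 2 are assumptions for the example, as is the "greater than or equal" comparison.

```python
def parse_spam_status(value: str):
    """Split a 'key=value key=value' status line into (score, tests)."""
    fields = dict(part.split("=", 1) for part in value.split() if "=" in part)
    tests = [t for t in fields.get("tests", "").split(",") if t]
    return float(fields["hits"]), tests


def would_hold(score: float, threshold: float) -> bool:
    """Assumed policy: hold the message when the score reaches the threshold."""
    return score >= threshold


score, tests = parse_spam_status("hits=2.5 tests=URIBL_BLACK version=3.2.4")
print(score, tests)
print(would_hold(score, 4.0), would_hold(score, 2.0))
```

One test fired, it was worth 2.5 points, and whether that is enough depends entirely on where the threshold sits.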

As for the subject line, yes it looks spammy to a human, but it is very hard to get a machine to figure this out. Just trust me on this.

Once again, SpamCop is designed first and foremost to detect spam outbreaks based on the origin IP address. Dealing with spam URLs, and detailed analysis of the contents of the messages, are not as high a priority.

-- rick

Miss Betsy · February 4, 2009

I don't have the email service, but I thought that email customers could choose among several blocklists as well as the scbl and sa to filter their email.

If there is a 'signature' URL, then wouldn't the bl that uses spamvertised URLs catch it?

If I am wrong that one can use more than those two filters, then it seems to me there was a really long topic on how to tweak filters at one point. It might be worth searching for it.

Replica watch spammers seem to work really hard at evading content filters (as well as probably evading IP addresses that are pegged as spammy). Hotmail has extremely aggressive filters, it seems to me, yet replica watch spam regularly shows up for a couple of days until the filters catch up.

Good Luck!

Miss Betsy

rconner · February 4, 2009

I don't have the email service, but I thought that email customers could choose among several blocklists as well as the scbl and sa to filter their email.
If there is a 'signature' URL, then wouldn't the bl that uses spamvertised URLs catch it?

If I am wrong that one can use more than those two filters, then it seems to me there was a really long topic on how to tweak filters at one point. It might be worth searching for it.

I had just had a look at the available blocklists for that post, and while you could choose more than one, none were for URLs as far as I could tell. The filters are selected via a setup page in the webmail app.

Apparently you do get some benefit if you use SpamAssassin. If elind had lowered his minimum SpamAssassin score to 2, then this spam might have been held by SpamCop. However, this risks a lot of false positives (non-spam messages identified as spam).

Replica watch spammers seem to work really hard at evading content filters (as well as probably evading IP addresses that are pegged as spammy). Hotmail has extremely aggressive filters, it seems to me, yet replica watch spam regularly shows up for a couple of days until the filters catch up.

That is my impression as well.

-- rick

elind · February 4, 2009

I had just had a look at the available blocklists for that post, and while you could choose more than one, none were for URLs as far as I could tell. The filters are selected via a setup page in the webmail app.
Apparently you do get some benefit if you use SpamAssassin. If elind had lowered his minimum SpamAssassin score to 2, then this spam might have been held by SpamCop. However, this risks a lot of false positives (non-spam messages identified as spam).

That is my impression as well.

Sorry for speedreading your first answer above. I'll try to go through the source in more detail next time, before spamcop munges it.

I found I had spamassassin set to 4, I've lowered it to 3 and maybe that will reduce these that get through, although the few that do get through are kind of educational since they are the only ones I typically look at.

Nevertheless, I don't know how spamassassin works, but I thought that part of it was algorithms for parsing messages for spam characteristics. I'm pretty sure I could write one that would catch 99.99% of these common spam, which is why I find it puzzling; after all many of them are identical in every way, except the URLs, not to mention misspellings of viagra or rolex.

Anyway, thanks for the replies.

BTW, geocities is becoming common in some of these again. Doesn't Yahoo know how to recognise these things by now?

rconner · February 4, 2009

Nevertheless, I don't know how spamassassin works, but I thought that part of it was algorithms for parsing messages for spam characteristics.

SpamAssassin is a "tree" on which you can hang a variety of tests (which can be selected or even created by the individual SpamAssassin operator). Each test that flunks contributes a small numerical score. If the total score exceeds a given threshold (e.g., 5), then the message is considered spam.
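A toy Python version of that scheme is sketched below. The rule names, patterns and scores are invented for illustration; they are not SpamAssassin's actual rules or weights, and real SA rules are written in its own configuration syntax rather than Python.

```python
import re
from typing import Callable, List, NamedTuple, Tuple


class Rule(NamedTuple):
    name: str
    score: float
    test: Callable[[str], bool]


# Hypothetical rules with made-up scores.
RULES = [
    Rule("REPLICA_OBFUSCATED", 2.0,
         lambda msg: bool(re.search(r"rep[l1]ica", msg, re.I))),
    Rule("URL_IN_CN", 1.0,
         lambda msg: bool(re.search(r"https?://\S+\.cn\b", msg, re.I))),
    Rule("FAKE_DIPLOMA", 2.5,
         lambda msg: "diploma" in msg.lower()),
]


def score_message(message: str, rules: List[Rule] = RULES) -> Tuple[float, List[str]]:
    """Run every rule and add up the scores of the ones that fire."""
    hits = [rule for rule in rules if rule.test(message)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]


def is_spam(message: str, threshold: float = 5.0) -> bool:
    """Declare spam when the total reaches the threshold."""
    total, _ = score_message(message)
    return total >= threshold


sample = ("Subject: Best Quality Swiss & Japanese Rep1icaWatches\n\n"
          "Visit http://code.client.cn")
print(score_message(sample))
print(is_spam(sample), is_spam(sample, threshold=3.0))
```

With these invented weights the sample trips two rules but still falls short of a threshold of 5, which is the same kind of outcome as the message discussed in this thread.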

BTW, now that you have lowered your SA score, you are going to want to scrutinize your held mail pretty carefully to make sure it isn't picking up non-spam mail that trips the now-lower SA score.

I'm pretty sure I could write one that would catch 99.99% of these common spam, which is why I find it puzzling; after all many of them are identical in every way, except the URLs, not to mention misspellings of viagra or rolex.

Maybe you can, why not give it a try? Just make sure you test it on a pretty big corpus of spam and non-spam messages so you can make sure it works reliably with low false-positive and false-negative rates.
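For anyone who does try, a minimal Python sketch of that kind of test is shown here. The function name, the toy filter and the four-message corpus are all made up for the example; a meaningful test needs thousands of real messages of both kinds.

```python
from typing import Callable, Iterable, Tuple


def error_rates(classify: Callable[[str], bool],
                corpus: Iterable[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (false-positive rate, false-negative rate).

    corpus is a sequence of (message, is_spam) pairs labelled by hand.
    """
    false_pos = false_neg = ham = spam = 0
    for message, really_spam in corpus:
        flagged = bool(classify(message))
        if really_spam:
            spam += 1
            false_neg += not flagged   # spam that got through
        else:
            ham += 1
            false_pos += flagged       # real mail that was held
    fp_rate = false_pos / ham if ham else 0.0
    fn_rate = false_neg / spam if spam else 0.0
    return fp_rate, fn_rate


def naive_filter(message: str) -> bool:
    """Toy filter: flag anything containing the word 'replica'."""
    return "replica" in message.lower()


corpus = [
    ("Best replica watches", True),
    ("Cheap rep1ica watches", True),       # obfuscated spelling slips past
    ("Museum replica exhibit opens", False),  # legitimate mail gets caught
    ("Lunch tomorrow?", False),
]
print(error_rates(naive_filter, corpus))
```

Even on this tiny corpus the obvious keyword rule both misses a spam and catches a legitimate message, which is the trade-off any content filter has to manage.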

-- rick

Miss Betsy · February 4, 2009

Sometimes I think some spam is just written by someone trying to 'trick' the filters, as a hobby!

While it may seem obvious to you, I think that when you start to write it, you will find that the spammer has found a 'hole' - probably something that you have had to put in to avoid catching real email. I haven't really played around with filters that much, but enough so that I can see that if you have someone trying to beat your content filters, that you will always be a few spam behind in updating your own. (That's why I think blocking is a better solution. If one of your correspondents is using an ISP or mail service, that doesn't pay attention to outgoing spam, it should not be your problem. Let him or her find another way to contact you.)

The Bayesian filters were supposed to be able to learn what was spam and what you wanted better than a pre-coded filter. So I suppose that one could write a filter to catch something that was coming regularly to you that wouldn't be caught by the filter designed for many people. I think, nowadays, most people just have an extensive 'whitelist' - particularly if they don't add correspondents frequently - and then filter aggressively.

However, it would be a fun hobby - like doing crosswords - to see how often you could 'catch' the ones who are trying to get past common filters without catching your real email.

Miss Betsy

elind · February 6, 2009

When I said I could, I believed it but wasn't really intending to take the time to do so, although I might. I've actually never looked into spamassassin as a customizable tool. Can anyone direct me to sources for that?

However, conceptually I think a good syntax analysis system could easily catch virtually all spam, except perhaps those that are clones of legitimate businesses.

At least of the stuff I get, 75% or more are identical or minor variations of prior spam, same spelling and grammar mistakes included.

However since most of it is caught by the methods mentioned above this is probably redundant anyway, but I would be surprised if some of the filters out there don't already do it. I seem to recall that when sending x rated pictures in the spam was more common, that filters looked for unusually large expanses of pink in a message......

It wouldn't surprise me if someone called that a racist ploy

Anyway, on the plus side spam has some amusing and entertaining aspects to it, if only to verify that there are many people out there more stupid than oneself.

BTW, without counting I am under the impression that volume is increasing significantly in the past month or two.

Also, since lowering the score to 3, I haven't had any of the spam mentioned above get through, but I have started to get a few Julia ones with yahoo email addresses.

Since the email addresses are all similar but with number variations, I suspect they have hundreds or more of yahoo email addresses for contact. I've informed abuse at yahoo, but I wonder if they act on something if it is not hosted or sent from them. I wonder if they read spamcop reports, being busy and all that?

rconner · February 6, 2009

When I said I could, I believed it but wasn't really intending to take the time to do so, although I might. I've actually never looked into spamassassin as a customizable tool. Can anyone direct me to sources for that?

Google "spamassassin". Lots to read. Find the main site for a starter.

However, conceptually I think a good syntax analysis system could easily catch virtually all spam, except perhaps those that are clones of legitimate businesses.

Having had some experience in the area, I'm inclined to disagree. If content analysis were that effective (and efficient) we would all be using it rather than DNSBLs like SpamCop or Spamhaus or CBL.

Also, content analysis is all but useless to an MX host, which generally has to make an accept/reject decision before the body is even available for inspection.

But maybe you know something I don't, do your homework and come back and let us know.

Since the email addresses are all similar but with number variations, I suspect they have hundreds or more of yahoo email addresses for contact. I've informed abuse at yahoo,

I assume you mean the "From" addresses that appear at the top of the e-mail. It is most unwise to assume that these are valid, deliverable addresses belonging to the spammer, unless you are specifically asked within the spam to reply to the addresses.

-- rick

elind · February 6, 2009

Well, on content analysis I agree that a completely new piece of spam, originally written, would need to be verified by other methods before being branded, but given that so much of it is duplicated, either identically or very similarly, many of the subsequent ones could be recognized easily.

As for the yahoo emails, no, it was a contact address in the body. The only contact address. Either fishing for emails or an invitation to a sex site, and as usual many identical ones except the sending source and a few number variations in the email name.

I didn't think there was any point in providing the tracking number.

On reading your above comments about building up a score on a particular spam; how is a piece of spam tagged in that system? It can't be by the sending source since that is not unique and it can't be by subject alone.

What is left except by analyzing the body text?

rconner · February 6, 2009

On reading your above comments about building up a score on a particular spam; how is a piece of spam tagged in that system? It can't be by the sending source since that is not unique and it can't be by subject alone.
What is left except by analyzing the body text?

Not clear on your question, but I assume you are wondering about how SpamAssassin works. A trip to the SA website should help, but in brief: SpamAssassin would be run on messages that have already been accepted, so we have the complete packet including the header and body. Therefore, it would be run on an MDA or some other "back office" machine that is out of the firehose of mail relaying and has some time to spend on content filtering.

SpamAssassin has access to all parts of the message for tests. The tests vary in nature, from keyword or key-phrase detection, to detection of malformed MIME or illegal characters, problems in the header, etc. Someone (don't know who or how) decides how many points each of these tests costs if it is failed. You run all the tests, then add up all the scores you get, and check against the threshold score. Voila.

No point running SA on an MX host, because the MX must accept or reject the offered message BEFORE it sees the body. It CANNOT read in the body, take time to analyze it, and then throw it back onto the sender. This is explained in the SMTP RFC.

You might take a look at this Wiki page to review the different varieties of mail hosts, including MXs.

-- rick

elind · February 6, 2009

OK. Thanks.

That gives me plenty to play with.
