SpamCop failing? 5 of 10 in Inbox were spam this

ob1db · March 15, 2004

This AM, 50% of what was in my Inbox was spam! Another 60+ made their way to my held mail, and only one of those was not spam. That side is fine; the Inbox is not acceptable. Only 1-2 months ago I could just check my email from my email client and expect under 5% spam. Recently it crept up to 20%, and now it's 50%. Now I feel I HAVE to check my online Inbox BEFORE I can use Eudora to fetch it. That's a lot of wasted time.

Yeah, I know JT loves to say that SC is blocking 80%. However, I increasingly find blatant spam in my Inbox. Lately it has averaged 2 per session (I get a LOT of email, so I check as many as 8-10 times a day), or 20% on average. 1-2 months ago it was under 5%. To me, even 20% getting through is way too high. Am I alone in thinking this?

I know the whole Bayesian filter experiment failed miserably. Still, I am not convinced that we weren't doing something wrong with Bayes, given that it behaved SO differently from every other SA (SpamAssassin) story I have read or heard.

Any ideas what is causing this apparent failure/breakdown, or why? These are not particularly clever spams; they are very generic, with piles of gibberish. Anyone else seeing this?

I am too tired and discouraged to post them to .spam. If I hit 50% again, I will post the batch.

To me, it is getting more and more obvious: Spamcop mail filtering is failing. I am back to spending as much time filtering and forwarding as I was before I began this sorry saga. If there is not some major improvement, I think I will look into other filtering services, sigh.

David

(posted to NG and forum intentionally)

StevenUnderwood · March 15, 2004

The only spam in my Inbox in the past few days was a PayPal phishing expedition that used a whitelisted paypal.com address. I don't get as much valid email as you (5-10 per day), and I regularly have 100-125 held messages per day. My incorrect category percentage is WAY above 80%.

I use all of the blocklists (BLs), SA set to 2, and I whitelist any sender that gets wrongly caught even once (about 55 entries right now).
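To make the interplay of these settings concrete, here is a minimal Python sketch of how a per-user held-mail decision could combine a whitelist, a sender blacklist, blocklist hits and an SA threshold. The precedence order (whitelist first), the names `FilterSettings` and `route`, and the data shapes are all hypothetical; SpamCop's actual server-side logic is not described in this thread.

```python
from dataclasses import dataclass, field

@dataclass
class FilterSettings:
    """Hypothetical per-user settings; field names are illustrative, not SpamCop's."""
    sa_threshold: float = 2.0                        # e.g. "SA set to 2"
    whitelist: set[str] = field(default_factory=set)
    blacklist: set[str] = field(default_factory=set)

def route(sender: str, blocklist_hits: list[str], sa_score: float,
          settings: FilterSettings) -> str:
    """Return "Inbox" or "Held Mail" under an assumed precedence (whitelist first)."""
    addr = sender.lower()
    if addr in {a.lower() for a in settings.whitelist}:
        return "Inbox"       # a whitelisted sender skips every other check
    if addr in {a.lower() for a in settings.blacklist}:
        return "Held Mail"
    if blocklist_hits:
        return "Held Mail"   # at least one enabled blocklist listed a relay IP
    if sa_score >= settings.sa_threshold:
        return "Held Mail"   # SpamAssassin counts score >= threshold as spam
    return "Inbox"
```

Under these assumptions a whitelisted address bypasses every other check, which is consistent with the whitelisted paypal.com phishing message described above landing in the Inbox.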

ob1db · March 16, 2004

I use ALL the blocklists as well, and I whitelist every incorrectly held sender...

See the NG, I am NOT alone, I am happy to say!

;-)

kre · March 27, 2004

As with all things AI, Bayesian classifiers suffer from overlearning too. Usually, the faster an engine learns, the higher the risk of overlearning. In mundane terms, the AI goes bonkers if you overfeed it.

The description of CRM114 (1), for instance, says to train only on incorrectly categorized email. There will be the most of these at the start, and they get fewer with each added correction. Admittedly, that's harder to do on the ISP side than when it is integrated into a mail reader.

Another thing that is probably not a good idea is training it on the output of SA's heuristic rules. At best, this creates an engine that perceives mail almost exactly the way the rest of the SA rules already do.

Let's see, I'll try to sketch how I think it can be done. Be aware that I cannot try it this way myself, because I don't run an ISP that does webmail. It depends on being able to feed emails moved between folders on the webmail backend into the learning pipeline.

1. Set it up as usual within SA, adding the BAYES_XX attributes, but do *not* assign them any scores in either direction.

2. On any learning attempt, learn a message as spam only when it has a low Bayes score, and as ham only when it has a high Bayes score.

3. Learn as spam only messages moved from the Inbox to held mail.

4. Learn as ham any message moved from the Inbox to any folder other than held mail.

5. Again, apply 3. and 4. only when the Bayes score would have said otherwise.
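The learning rules in the list above reduce to a small decision function. This Python sketch assumes the webmail backend can report each move as a (source folder, destination folder) pair together with the message's Bayes probability; the folder names, the 0.5 cut-off and the function names are illustrative assumptions, not part of SA or SpamCop.

```python
from enum import Enum
from typing import Optional

class Label(Enum):
    SPAM = "spam"
    HAM = "ham"

INBOX = "INBOX"        # IMAP's standard name for the inbox
HELD = "Held Mail"     # SpamCop's held-mail folder, as named in this thread
SPAM_CUTOFF = 0.5      # hypothetical: a Bayes probability at or above this "says spam"

def label_for_move(src: str, dst: str) -> Optional[Label]:
    """Only moves out of the Inbox carry a training signal in this scheme."""
    if src != INBOX or dst == INBOX:
        return None
    return Label.SPAM if dst == HELD else Label.HAM

def train_decision(src: str, dst: str, bayes_prob: float) -> Optional[Label]:
    """Train only errors: learn a move only when Bayes would have got it wrong."""
    label = label_for_move(src, dst)
    if label is None:
        return None
    bayes_says_spam = bayes_prob >= SPAM_CUTOFF
    if label is Label.SPAM and not bayes_says_spam:
        return Label.SPAM
    if label is Label.HAM and bayes_says_spam:
        return Label.HAM
    return None  # Bayes already agreed with the user, so skip learning
```

Spam caught by the other rules never sits in the Inbox, so it never reaches this function, which is what keeps the training set small in this design.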

This will probably give a better distribution of learned spam vs. ham, because spam already caught by the other SA rules is never learned. And why should it be? The other methods work fine on those.

It will also minimize the load caused by the learning process, and that load should drop over time because, if all goes well, the Bayes attribute will get more accurate.

The amount of email which gets through to the learning process is also a good indicator as to when to start assigning scores to the bayes attribute.

I think it would be good to exclude the SA header lines when learning, along with any lines added by the webmail software. Basically, the more authentic you can make the input, the better the results will be.

I don't believe it's necessary to filter the spam report copies out of the ham learning; at least my personal Bayes database had no problems with them. It might still be worthwhile, though, since there's no real point in learning them, and learning costs both CPU time and space in the hash maps.
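Stripping filter-added headers before training could look like the following Python sketch, using the standard library's `email` package. The header prefixes to drop (`X-Spam-` from SpamAssassin and `X-SpamCop-` as seen in the sample headers later in this thread) are assumptions; headers added by a particular webmail system would need to be added to the tuple.

```python
from email import message_from_string, policy

# Assumed prefixes of filter-added headers; extend with anything your webmail adds.
STRIP_PREFIXES = ("x-spam-", "x-spamcop-")

def strip_filter_headers(raw: str) -> str:
    """Return the message with filter-added headers removed, ready for training."""
    msg = message_from_string(raw, policy=policy.default)
    for name in set(msg.keys()):
        if name.lower().startswith(STRIP_PREFIXES):
            del msg[name]  # removes every occurrence of this header
    return msg.as_string()
```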

(1) CRM114: http://crm114.sourceforge.net/ (look for "TOE strategy - that is, Train Only Errors" in the FAQ)

jefft · March 30, 2004

> This AM, 50% of what was in my Inbox was spam! Another 60+ made their way to my held mail, and only one of those was not spam. That side is fine; the Inbox is not acceptable. Only 1-2 months ago I could just check my email from my email client and expect under 5% spam. Recently it crept up to 20%, and now it's 50%. Now I feel I HAVE to check my online Inbox BEFORE I can use Eudora to fetch it. That's a lot of wasted time.

We have no control over what percentage of your inbox is spam. Clearly, the less legit mail you receive, the higher the spam percentage would be. Taken at its extreme, if you got no legit mail at all overnight, then your inbox would be 100% spam. If, on the other hand, you got 10,000 legit messages and 100 spams, of which we didn't block a single one, your inbox would be only 1% spam. Would that be acceptable?

It's the blocking percentage that matters, not the percent of your mail which is spam. So, what percent are we blocking?

JT
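JT's distinction between the two percentages can be shown with a little arithmetic. This Python sketch defines both measures; the ob1db figures plugged in assume exactly 60 held messages (59 spam, 1 legitimate) and the "5 of 10" Inbox from the thread title, since "60+" is only approximate.

```python
def block_rate(spam_blocked: int, spam_leaked: int) -> float:
    """Share of all incoming spam that the filters caught (the figure JT says matters)."""
    total_spam = spam_blocked + spam_leaked
    return spam_blocked / total_spam if total_spam else 0.0

def inbox_spam_share(spam_leaked: int, legit_in_inbox: int) -> float:
    """Share of the Inbox that is spam (the figure the user sees)."""
    inbox_total = spam_leaked + legit_in_inbox
    return spam_leaked / inbox_total if inbox_total else 0.0

# JT's extreme case: 10,000 legit messages, 100 spams, none blocked.
print(f"{block_rate(0, 100):.0%} blocked, {inbox_spam_share(100, 10_000):.1%} of Inbox is spam")
# prints: 0% blocked, 1.0% of Inbox is spam

# ob1db's morning under the stated assumptions: 59 spam held, 5 spam + 5 legit in the Inbox.
print(f"{block_rate(59, 5):.1%} blocked, {inbox_spam_share(5, 5):.0%} of Inbox is spam")
# prints: 92.2% blocked, 50% of Inbox is spam
```

The same filter performance can look terrible or excellent depending on how much legitimate mail arrives, which is why the block rate is the fairer measure of the filter itself.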

NBlasgen · March 31, 2004

JT,

First off, I love SC and could not live without it. It blocks 120 or so spams per day. But I do agree with OB1DB: I have noticed an increase of spam in my inbox starting about 3 months ago. At first I thought it was a fluke, but after watching for a while, it's true that I now get double the spam I used to get. It's still small, only about 10 spams per day, but you used to be able to block 97% and now it's somewhere more like 93%. It's all very acceptable. I think my false positives have dropped to such a low number (much less than 1 in 1,000) that I don't even bother looking for false positives anymore. So that's an improvement, but the additional spam is annoying.

But whatever, I love your service. It's just worth pointing out that something did change to increase everyone's spam count (I have 3 accounts with you).

Wazoo · March 31, 2004

Just in defense of JT ... the increase in spam is everywhere. The infamous spam spew stoppage at Hotmail a few months back seems to have weakened. As a matter of fact, this morning I got one spam, did the complaint, and the scumsucking lowlife nailed me with another 4 of essentially the same spam to the same account within twenty minutes, all coming through different open proxies .... I was further ticked that another 4 of them arrived at other addresses within that same timeframe .... Point being, it's not just JT, SpamCop, etc ... it's happening everywhere ... and as noted before, most of these spammers also use the anti-spam tools to see how to get around them; that's just their way of doing business.

ob1db · June 1, 2004

Something is TRULY failing in the filtering: I got an ad for Child Porno, blatantly labeled, that scored only 2.7 in SpamAssassin. This week has been TERRIBLE: yesterday it was 8 of 16 in my Inbox (including www.fu**megay, which scored 1.0), and today 14 of 53, including the kiddy porn ad. I have entered it in full here; it is very short. I am also posting to the forums, because this is getting serious, and I have posted it on the newsgroups too. I am very curious about Mr. Easter's suggestion of changing from SpamAssassin to Dspam.

Return-path: <jcahn[at]hcsattys.com>
Delivered-To: x
Received: (qmail 16779 invoked from network); 1 Jun 2004 10:05:13 -0000
Received: from unknown (192.168.1.101)
    by blade1.cesmail.net with QMQP; 1 Jun 2004 10:05:13 -0000
Received: from fowl.mail.pas.earthlink.net (207.217.121.50)
    by mailgate.cesmail.net with SMTP; 1 Jun 2004 10:05:12 -0000
Received: from robin-120.pocket ([10.4.120.65] helo=robin)
    by fowl.mail.pas.earthlink.net with smtp (Exim 3.36 #1)
    id 1BV68u-0002W2-00
    for x; Tue, 01 Jun 2004 03:05:12 -0700
X-MindSpring-Loop: x
Received: from cae88-6-229.sc.rr.com ([24.88.6.229])
    by robin (EarthLink SMTP Server) with SMTP id 1bv68S5J43NZFjX0
    for <x>; Tue, 1 Jun 2004 03:05:09 -0700 (PDT)
Received: from hcsattys.com (hcsnotes.hcsattys.com [65.201.231.100])
    by cae88-6-229.sc.rr.com (Postfix) with ESMTP id AFF0E12C8C
    for <ob1db[at]earthlink.net>; Tue, 01 Jun 2004 05:38:21 -0400
Message-ID: <001101c447bc$30790fd4$83d97e3c[at]hcsattys.com>
From: "Tramping H. Milieu" <jcahn[at]hcsattys.com>
To: x
Subject: super new ! Child porno sites !
Date: Tue, 01 Jun 2004 05:38:21 -0400
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Priority: 1
X-MSMail-Priority: High
X-Mailer: Microsoft Outlook Express 6.00.2800.4682
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1082
X-ELNK-AV: 0
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on blade1
X-Spam-Level: **
X-Spam-Status: hits=2.7 tests=MANY_EXCLAMATIONS,PORN_4,X_MSMAIL_PRIORITY_HIGH,
    X_PRIORITY_HIGH version=2.63
X-SpamCop-Checked: 192.168.1.101 207.217.121.50 10.4.120.65 24.88.6.229 65.201.231.100

CHILD PORNO!!!!
http://www.photo-angels.info/
http://www.forever-models.com/
http://www.ultra-models.com/
http://www.sweetest-teens.com/
http://www.tiny-models.com/

PeterJ · June 1, 2004

Two possible ways of catching this spam are Bayesian filtering and the detection of known bad URIs (see my post here).

Since Bayesian filtering has already been hashed out with JT and we cannot expect a repeat soon, we are left with a few alternatives.

One is to implement Bayesian filtering of your own to complement (or replace) SpamCop's filtering. Another is to encourage JT to increase SpamCop's SA filtering ability by improving it in every way possible except Bayesian filtering. The most effective addition to SpamCop's SA that I can think of at this time would be URI RBL checking. In my other post, if you follow the link, one individual states that with 3 URI RBLs running, they hit 50% of the spam received. Extremely low FP rates are being seen with these RBLs as well.
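URI blocklists work like ordinary DNS blocklists, except that the lookup key is a domain taken from a link in the message body rather than a relay IP. The Python sketch below extracts domains and builds the lookup name. The zone `uribl.example.net` is a placeholder rather than a real list, keeping only the last two labels of a hostname is a deliberate simplification (it is wrong for names under e.g. co.uk), and treating a 127.x.x.x answer as "listed" follows the usual DNSBL convention but should be checked against each list's own documentation.

```python
import re
import socket
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
IPV4_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

def uri_domains(body: str) -> list[str]:
    """Collect distinct domains from URLs in a message body, in order of appearance."""
    seen: list[str] = []
    for url in URL_RE.findall(body):
        host = (urlparse(url).hostname or "").rstrip(".")
        if not host or IPV4_RE.fullmatch(host):
            continue  # skip empty hosts and bare IPv4 literals in this sketch
        # Naive reduction to the last two labels; real URI lists use a TLD table.
        domain = ".".join(host.split(".")[-2:])
        if domain not in seen:
            seen.append(domain)
    return seen

def query_name(domain: str, zone: str) -> str:
    """DNSBL-style lookup name: the domain prepended to the list's zone."""
    return f"{domain}.{zone}"

def is_listed(domain: str, zone: str) -> bool:
    """True if the zone answers with an A record in 127.0.0.0/8 (the usual DNSBL convention)."""
    try:
        addr = socket.gethostbyname(query_name(domain, zone))
    except OSError:
        return False  # NXDOMAIN or lookup failure: treat as not listed
    return addr.startswith("127.")
```

Run against the body of the message quoted earlier in this thread, `uri_domains` returns the five advertised domains, each of which could then be checked against a URI list with `is_listed`.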

Several free products will run Bayesian filtering on IMAP folders. Two that come to mind are SpamBayes and POPFile (IMAP support is young in the latter).

On the possibility of a switch to Dspam...

If I understand this product correctly, it is a Bayesian-based filtering system, so my guess is that JT would be wary.

I am currently exploring the possibility of not relying on SpamCop's filtering and simply running my own Bayesian IMAP filter on a server that runs 24x7. Training then simply occurs when I move incorrectly classified messages with either my client or SpamCop webmail.
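For readers who want to see the core idea behind the Bayesian filters discussed here, this is a toy multinomial naive Bayes classifier in Python with add-one smoothing. It is deliberately simpler than what SpamBayes or POPFile actually do, and the tokenizer and class names are illustrative assumptions.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9$'-]+")  # illustrative tokenizer: lowercase word-like runs

def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())

class NaiveBayes:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text: str, label: str) -> None:
        """Learn one message; in a train-on-error setup, call this only for misfiled mail."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens: list[str], label: str, vocab_size: int) -> float:
        total_docs = sum(self.doc_counts.values())
        # Smoothed log prior so an untrained class never gives log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))
        counts = self.word_counts[label]
        denom = sum(counts.values()) + vocab_size
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        return score

    def spam_probability(self, text: str) -> float:
        tokens = tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen tokens
        s = self._log_score(tokens, "spam", vocab_size)
        h = self._log_score(tokens, "ham", vocab_size)
        m = max(s, h)  # subtract the max before exp() to avoid underflow
        ps, ph = math.exp(s - m), math.exp(h - m)
        return ps / (ps + ph)
```

With an IMAP filter of the kind described above, `train` would be called when a message is moved out of the wrong folder, and `spam_probability` when new mail arrives.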

ob1db · June 15, 2004

From the newsgroup for JT's attention:

And people tell me our version of SpamAssassin is NOT broken??

http://www.spamcop.net/sc?id=z519234739ze5...27d3d92cba050fz

also posting to web for Julian...

Title is " Children Porno " and includes body text of " child porno " and "hardcore".

This cannot be working properly; there is not even serious Bayesian text to explain this.

David

Dave_L · June 15, 2004

I don't think the presence of a few words such as those, by itself, should cause SpamAssassin to block an email.

Suppose you were asking someone about email abuse, such as in this case, and included those words. Then your email would be tagged as spam.

cliffski · June 16, 2004

SpamCop has failed badly for me in the last week. I now get about 100-150 German spams a day, even though all my settings are at maximum. I used to get maybe 1 or 2 a day.

What's with the German thing? Can't SpamAssassin understand German or something? I get the same subjects and the same spoofed addresses again and again.

This is useless.

StevenUnderwood · June 16, 2004

I know you posted these questions elsewhere as well so I will probably repeat some of the things from there.

> Can't SpamAssassin understand German or something?

IIUC, the SpamAssassin configuration SpamCop is using is not very flexible, in that there is no effective way to train it for that number of users. It uses a fixed set of rules, and those rules may not cover this type of spam.

> I get the same subjects and the same spoofed addresses again and again.

By spoofed addresses, do you mean IP addresses or email addresses? If email addresses, you could blacklist those "same spoofed addresses" so they are directed to your held mail folder.

Are you reporting them to SpamCop? Are they coming from the same IP addresses, or from all over the place? If there are only a few IPs and you are reporting them, perhaps it is a directed attack against you and others are not getting these messages. That would explain why they are not getting blacklisted, since one reporter cannot get an IP blacklisted. You may be forced to contact the ISP directly and explain what is going on.

mplungjan@spamcop.net · June 20, 2004

SpamCop should silently delete all mail that has the following qmail identification:

header GERMANSPAM MESSAGEID =~ /^<.*[a-z].*\.qmail\[at].*>/

describe GERMANSPAM Contains German spam

score GERMANSPAM 100

from

http://marc.theaimsgroup.com/?l=spamassass...95957623701&w=2
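The posted rule's regular expression can be tried out with Python's `re` module, whose syntax agrees with Perl's for this pattern. The sketch assumes `[at]` in the post is the forum's obfuscation of `@`; the sample Message-IDs are made up for illustration. Read literally, the pattern flags an ID only if a lowercase letter appears somewhere before `.qmail@`, which suggests (an inference, not something the post states) that it targets forged IDs imitating qmail's usual numeric `<timestamp.pid.qmail@host>` form.

```python
import re

# Assumption: "[at]" in the posted rule is the forum's obfuscation of "@",
# so the pattern is taken to be /^<.*[a-z].*\.qmail@.*>/ here.
GERMANSPAM_RE = re.compile(r"^<.*[a-z].*\.qmail@.*>")

def is_germanspam_id(message_id: str) -> bool:
    """True if a Message-ID matches the posted GERMANSPAM pattern (case-sensitive, as in Perl by default)."""
    return GERMANSPAM_RE.search(message_id.strip()) is not None

# A genuine-looking qmail ID (digits and dots before ".qmail@") versus a hypothetical forged one.
print(is_germanspam_id("<20040601100513.16779.qmail@mail.example.com>"))  # False
print(is_germanspam_id("<hxq7tz.qmail@mail.example.com>"))                # True
```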

cliffski · June 20, 2004

I take it you mean they should but don't?

I still get about 500 a day. A friend of mine gets 10,000 a day.

This is rubbish.

StevenUnderwood · June 20, 2004

> SpamCop should silently delete all mail that has the following qmail identification

This is a helpful suggestion that should be sent to support<at>spamcop.net.

SpamCop, however, as you probably know, does not delete spam; it moves it to the Held Mail folder.

Tim P · June 21, 2004

With spam increasing and SA scores decreasing, I filter on the test names that SA adds to the headers rather than on the SA score itself, e.g.:

FORGED_YAHOO_...

FAKE_HELO...

HTTP_TO_IP...

PORN_...

LINES_OF_YELLING

etc.

That way, I am able to filter out all that junk based on specific words that SA inserts in the header.

It works quite nicely for me.
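Filtering on test names rather than on the total can be sketched in Python by parsing the `X-Spam-Status` header. The watch list below treats the name fragments from the post as prefixes and is a hypothetical personal list; the regular expressions assume the "hits=... tests=A,B,C version=..." layout of the SA 2.63 header quoted earlier in this thread, and the score pattern also accepts the `score=` spelling used by SpamAssassin 3.x.

```python
import re
from typing import Optional

# Name fragments from the post, used as prefixes (hypothetical personal watch list).
WATCH_PREFIXES = ("FORGED_YAHOO", "FAKE_HELO", "HTTP_TO_IP", "PORN_", "LINES_OF_YELLING")

def unfold(header_value: str) -> str:
    """Join a folded (multi-line) header value into one line."""
    return re.sub(r"\r?\n[ \t]+", " ", header_value)

def sa_tests(status: str) -> list[str]:
    """Test names from an X-Spam-Status value such as 'hits=2.7 tests=A,B version=2.63'."""
    m = re.search(r"\btests=(\w+(?:,\s*\w+)*)", unfold(status))
    if not m:
        return []
    names = [t.strip() for t in m.group(1).split(",")]
    return [t for t in names if t and t != "none"]

def sa_score(status: str) -> Optional[float]:
    """Total score; accepts 'hits=' (SA 2.x) or 'score=' (SA 3.x)."""
    m = re.search(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)", status)
    return float(m.group(1)) if m else None

def should_hold(status: str, prefixes: tuple[str, ...] = WATCH_PREFIXES) -> bool:
    """Hold the message if any SA test name starts with a watched prefix, whatever the score."""
    return any(name.startswith(prefixes) for name in sa_tests(status))
```

Applied to the header quoted earlier, the total is only 2.7, but `PORN_4` is among the tests, so `should_hold` returns True.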

StevenUnderwood · June 21, 2004

As stated previously, however, the SpamCop webmail filters only work while you are logged into webmail. That is why people want custom filtering to be done on the server.

PeterJ · June 24, 2004

I think this topic is complicated and want to take some time to outline why. First we need to remember that people get mail accounts with SpamCop for a variety of reasons. The three reasons off the top of my head are:

a) To have an IMAP mail account

b) For ease of reporting spam

c) To utilize the spam filtering options provided with the account

SpamCop mail users likely weigh these three benefits (and others) of SpamCop mail differently. This means that some individuals may not care as much about filtering that SpamCop mail provides as others.

Most of the spam slipping through filters has been attributed to SpamCop's SpamAssassin implementation. Here are some points regarding SpamAssassin and SpamCop:

1) We are currently using SpamAssassin version 2.63 with no Bayesian filtering. (Some posts in this thread talked about Bayesian filtering in a way that may have implied its use. It is not currently turned on.)

2) SpamAssassin requires tweaking over time to maintain its effectiveness by the incorporation of updated or additional rules. (One can argue that this is less important or even perhaps not necessary if you are also using SpamAssassin's bayesian filtering.)

3) JT has to ensure that any changes he makes to SpamCop's SpamAssassin implementation do not adversely affect the stability of the mail service. He also has to balance "false positives" and "false negatives." (That said, it is my opinion that on average he is conservative about tweaking SpamCop's production install of SpamAssassin.)

4) When a new version of SpamAssassin is released, it will obviously include many of the tried-and-true tweaks that SA administrators have been using with the previous version. SpamAssassin 3.0 is close to release; a 3.0 pre-release is available now. See my post in the lounge here: http://forum.spamcop.net/forums/index.php?showtopic=1892

5) Assuming JT continues to use SpamAssassin, he will no doubt move to version 3.0 when it is officially released. SpamCop users of SA will no doubt find that version 3.0 scores spam more effectively than version 2.63, and they might need to adjust the SA threshold in their settings. Remember that over time, even version 3.0 of SpamAssassin will begin to lose its effectiveness if it is not tweaked.

I am an optimist, however. I think SpamAssassin 3.0 will make a lot of SpamCop mail users happy if JT implements it. I think JT does a great job managing the mail system, and I have no doubt that it is hard to balance the concerns I have written about above. Everyone is going to have a different opinion as to whether they are getting what they paid for; as for me, I have no doubt that I am getting a good deal.

The best thing we can do as mail users is provide feedback like this thread where multiple people are expressing similar concerns. Should JT add a custom rule to SC's SA so that the German spam is caught? I do not know, but he could.

One last thought: maybe we need a FAQ entry for SpamCop Mail users, titled "Why are these messages slipping past SpamCop's filters?" Granted, the answer is complicated, but it could cover the basics of how to look at the headers to see which IPs were examined by SpamCop, what score SA gave the message, and whether or not the user had it whitelisted.

PeterJ
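The header-reading part of such a FAQ could be illustrated with a short Python sketch that pulls the IPs SpamCop checked (`X-SpamCop-Checked`) and the SA score out of a raw message, flagging which of the checked IPs are public. Whitelist status is not recorded in the headers, so the sketch cannot show it; the function name and the layout of the returned dictionary are hypothetical.

```python
import ipaddress
import re
from email import message_from_string

def explain_verdict(raw: str) -> dict:
    """Summarize what SpamCop checked and how SpamAssassin scored a raw message."""
    msg = message_from_string(raw)
    checked = (msg.get("X-SpamCop-Checked") or "").split()
    public = []
    for token in checked:
        try:
            ip = ipaddress.ip_address(token)
        except ValueError:
            continue  # ignore anything that is not a bare IP address
        if not ip.is_private:
            public.append(token)  # private hops (e.g. 10.x, 192.168.x) cannot be blocklisted publicly
    status = msg.get("X-Spam-Status") or ""
    m = re.search(r"\bhits=(-?\d+(?:\.\d+)?)", status)
    return {
        "checked_ips": checked,
        "public_ips": public,
        "sa_hits": float(m.group(1)) if m else None,
    }
```

For the message quoted earlier in the thread, the two private hops (192.168.1.101 and 10.4.120.65) drop out, leaving 207.217.121.50, 24.88.6.229 and 65.201.231.100 as the addresses a public blocklist could realistically list, alongside an SA score of 2.7.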
