Spamcop can't parse date - I think it's the tz abbr


jtrigg

In this attempted spam report: http://www.spamcop.net/sc?id=z1026004121zb...094a962512b2fcz

Spamcop gives the following error:

Can't parse date of spam for age detection: Wed, 09 Aug 2006 17:13:58 CEST

Message is old

That date appears to come from a legitimate Received header (and the spam appears to have been submitted by webmail via Yahoo):

Received: from [196.201.68.86] by web27104.mail.ukl.yahoo.com via HTTP; Wed, 09 Aug 2006 17:13:58 CEST

and web27104.mail.ukl.yahoo.com is a valid Yahoo server.
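To illustrate why a trailing abbreviation like CEST trips up a date parser, here is a minimal Python sketch. It is not SpamCop's code; `parse_trace_date` and `ASSUMED_TZ_OFFSETS` are hypothetical names, and the abbreviation table is an assumption, since zone abbreviations are not unique worldwide. A numeric offset such as +0200 resolves unambiguously, while an abbreviation only resolves if someone has decided in advance what it should mean.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table (hours east of UTC). Abbreviations are
# ambiguous in general, so any such mapping is a guess, not a standard.
ASSUMED_TZ_OFFSETS = {"CEST": 2, "UYT": -3}

_MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches "09 Aug 2006 17:13:58 ZONE" at the end of a traceline.
_STAMP = re.compile(
    r"(\d{1,2}) (\w{3}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) (\S+)$")


def parse_trace_date(text):
    """Return a timezone-aware datetime, or None if the stamp or its
    zone cannot be resolved."""
    match = _STAMP.search(text.strip())
    if not match:
        return None
    day, mon, year, hh, mm, ss, zone = match.groups()
    if mon not in _MONTHS:
        return None
    if re.fullmatch(r"[+-]\d{4}", zone):
        # Numeric offset: unambiguous.
        sign = 1 if zone[0] == "+" else -1
        offset = sign * timedelta(hours=int(zone[1:3]),
                                  minutes=int(zone[3:5]))
    elif zone in ASSUMED_TZ_OFFSETS:
        # Abbreviation: only works because of the assumed table above.
        offset = timedelta(hours=ASSUMED_TZ_OFFSETS[zone])
    else:
        return None
    return datetime(int(year), _MONTHS[mon], int(day),
                    int(hh), int(mm), int(ss), tzinfo=timezone(offset))
```

With the table as written, the Yahoo line above parses to the same instant as the same stamp written with +0200. Remove the "CEST" entry and the function returns None, which is roughly the situation the error message describes.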

I'm just going to copy over Mike Easter's response to a similar 'complaint' from the newsgroups.

Path: news.spamcop.net!not-for-mail

From: "Mike Easter"

Newsgroups: spamcop

Subject: Re: More spam with unparseable dates

Date: Wed, 9 Aug 2006 09:53:45 -0700

Message-ID: <ebd3tv$ph7$1[at]news.spamcop.net>

References: <ebd2dm$o08$1[at]news.spamcop.net>

Bert Hyman wrote:

> OK, I'm starting to receive more spam with unparseable dates.

> It's either a strange coincidence, or a ploy.

> Either way, will the scanner be fixed?

>

> Here's one:

>

> http://www.spamcop.net/sc?id=z1025695131z0...b67d8549dcee93z

>

> Can't parse date of spam for age detection: Wed Aug 09 09:40:32 UYT 2006

> Message is old

It never has been clear to me [before, maybe more so now] why the

parsing algorithm for mailhosted accounts was configured to determine

the timestamp differently than it does for non-mailhosted accounts, but

that is a major factor in this problem. So one workaround is to parse

such items with a non-mailhosted account.

The business about 'fixing' the algorithm to be able to recognize the

non-compliant timestamp in the format of the obsolete and 'unacceptable'

local ambiguously expressed timezone seems to me to be an unrealistic

expectation.

A server has an 'obligation' to stamp its tracelines properly. There

are many many many servers out there in the world which might be

non-compliant in a myriad of ways. The algorithm should be configured

to derive its timestamp from 'up at the top' like it does for

non-mailhosted accounts. Here's my guess as to why it doesn't.

If one were to guess at the 'logic' of the mailhosted account's

determination of the timestamp, considering that the logic of the

non-mailhosted account's timestamp is based on 'the first good Received

traceline' (where 'good' means a traceline which the algorithm is not

configured to ignore; it ignores, for example, one with a non-routing IP),

then it would appear that the algorithm is 'ignoring' all of the

lines of the mailhost before it starts to look for the first good

Received traceline.

That logic would have to be changed, which is problematic. I suspect

the logic is based on first determining whether a submitting

account is mailhosted, and then proceeding down the mailhosted

algorithm if it is, and down the nonmailhosted algorithm if it is not.

The nonmailhosted's algorithm came first, so the mailhosted algo is

'based on', i.e. derived from, the first. So it appears that the first

thing that happens is that the mailhosted algo 'subtracts' the

mailhosted part and then it begins the kind of operation which was

originally programmed for the nonmailhosted.

As a result of that sequence, all of those 'upper' timestamps [and

tracelines] are chopped off before the parsing and age determination

logic begins.

Or, at least that's my guess.
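The guessed sequence can be sketched in Python as follows. This only restates the guess above and is not SpamCop's actual code; `timestamp_line`, `has_private_ip` and the host `mx.example.net` are hypothetical, and matching a mailhost by the substring "by <host>" is a simplifying assumption.

```python
import ipaddress
import re


def has_private_ip(line):
    """True if the first bracketed IPv4 address in the line is non-routable."""
    match = re.search(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", line)
    if not match:
        return False
    try:
        return ipaddress.ip_address(match.group(1)).is_private
    except ValueError:
        return False


def timestamp_line(received, mailhosts=()):
    """Pick the traceline whose date would feed age detection.

    received:  Received header values, topmost (newest) first.
    mailhosts: hostnames configured for the account; leave empty to
               model a non-mailhosted account.
    """
    lines = list(received)
    # Guessed mailhosted step: chop off the upper lines stamped by the
    # account's own mailhosts before anything else happens.
    while lines and any(f"by {host}" in lines[0] for host in mailhosts):
        lines.pop(0)
    # Original (non-mailhosted) step: first traceline not being ignored.
    for line in lines:
        if not has_private_ip(line):
            return line
    return None


# Hypothetical two-hop example: the user's own server on top, then Yahoo.
received = [
    "from web27104.mail.ukl.yahoo.com by mx.example.net; "
    "Wed, 09 Aug 2006 08:14:02 -0700",
    "from [196.201.68.86] by web27104.mail.ukl.yahoo.com via HTTP; "
    "Wed, 09 Aug 2006 17:13:58 CEST",
]
```

Calling `timestamp_line(received)` picks the top line with its numeric offset, while `timestamp_line(received, mailhosts=("mx.example.net",))` discards that line first and is left with the CEST stamp, which matches the difference between the two account types described above.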

--

Mike Easter

kibitzer, not SC admin
