
SpamCop Can't Find Links in This HTML


cciu


I've noticed that SpamCop doesn't correctly parse the links in the HTML below. The dashes and long number following the </html> tag seem to trigger the problem. If I delete that line before I submit the message, SpamCop properly parses the HTML and finds the link. With the dashes and number there, SpamCop claims that there are no links.

<html>

<head>

</head>

<body>

<p><font size="4">We carry all the prescription and non prescription drugs you

could want. From Valium, cialis, viagra and everything else. No prior

doctor prescription is required and we can save you up to 50% over your local

pharmacy. Give us a try by <a href="http://www.allnewdeals.com">

going here</a> and just have a look around.  You will get your medication

the next day and we can always refill them as you need. So

<a href="http://www.allnewdeals.com">come on in</a> and have a look

around. You will find what you are looking for.</font></p>

<p> </p>

<p> </p>

<p><font size="4">nostalgic allow cult screen cornet sundial china caesar baylor triplicate paternoster sleuth pharmaceutic gem bellman involuntary grieve percussive johannesburg henderson crotch assemblage twain valois veal poultry o'leary about scot chastity apathy balmy depose decedent baden bolshevist sanction beret bootstrapping hastings droopy catholic brook ricochet fisherman repression bombay refer steppe docile nil honda stir flintlock drizzle groom shepard asphalt decide whoever convulsion mist mt crt alps klaxon applied buttonhole cargill camelopard cheesy cysteine quartz minneapolis rodgers winkle judicature songbook yardstick desolate shaky contrive hyperboloidal fifteen commissariat </font></p>

remove http://www.offerspages.com/qog345/pr/rf.html

</body>

</html>

----911846668031164--

Link to comment
Share on other sites

Your sample was provided without the message headers, which would have contained the Content-Type: that the body was allegedly generated in. The line you point to is normally a boundary line for a message. With Content-Type: multipart/alternative, for example, these boundary lines separate the HTML, plain text, attached files, etc. Again, without the actual headers to go along with this spam body sample, there's no way to even begin to guess at a solution.
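To illustrate how boundary lines work in general, here is a minimal sketch using Python's standard `email` package. It is not SpamCop's parser; the message is a hypothetical cut-down one, and the HTML body and the example.com URL are placeholders. Only the boundary value is taken from this thread. Each part starts with `--` plus the boundary value, and the final line adds a trailing `--`, which is why a line like `----911846668031164--` can appear after `</html>`.

```python
from email import message_from_string

# Boundary value as declared in the Content-Type header quoted in this thread.
BOUNDARY = "--911846668031164"

# Hypothetical minimal message; the HTML body and example.com URL are placeholders.
raw = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/alternative;\n"
    f'\tboundary="{BOUNDARY}"\n'
    "\n"                      # blank line: end of the headers
    f"--{BOUNDARY}\n"         # opening delimiter: "--" + boundary
    "Content-Type: text/html\n"
    "\n"
    '<html><body><a href="http://www.example.com">link</a></body></html>\n'
    f"--{BOUNDARY}--\n"       # closing delimiter: "--" + boundary + "--"
)

msg = message_from_string(raw)
parts = msg.get_payload()     # a list of sub-messages when the message is multipart

print(msg.is_multipart())
print(msg.get_boundary())
print([p.get_content_type() for p in parts])
```

Because the declared boundary itself begins with two dashes, the delimiter lines in the message begin with four.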

Then we move on to your e-mail apps and handling. Outlook and Eudora don't handle these boundary lines well, which has been much discussed over the last year. So are you using either of these products?

Link to comment
Share on other sites

Sorry, here's the header. I'm not using Outlook or Eudora, I use FirstClass from Centrinity.

Return-Path: <qpnzdsmwnvioxf[at]msn.com>

Received: from blue.cciu.org (204.108.129.102)

by mail.cciu.org (FirstClass Mail Server v7.1) with ESMTP

(Sender: qpnzdsmwnvioxf[at]msn.com)

transient id 16359; Sat, 15 May 2004 00:59:29 -0400

Received: from 201.7.79.4.ibest.com.br (201.7.79.4) by blue.cciu.org with

SMTP (Eudora Internet Mail Server 3.1.3) for <sales[at]demillion.com>;

Sat, 15 May 2004 00:59:22 -0400

Received: from 173.175.42.184 by 201.7.79.4; Sat, 15 May 2004 05:00:45 -0100

Message-ID: <DBGZJJGHKHWHEZHDSOOU[at]yahoo.com>

From: "Gale Byers" <qpnzdsmwnvioxf[at]msn.com>

Reply-To: "Gale Byers" <qpnzdsmwnvioxf[at]msn.com>

To: sales[at]demillion.com

Subject: nobody is here anymore

Date: Sat, 15 May 2004 09:00:45 +0300

X-Mailer: AOL 4.0 for Windows US sub 161

MIME-Version: 1.0

Content-Type: multipart/alternative;

boundary="--911846668031164"

X-Priority: 1

X-MSMail-Priority: High

X-IP:178.40.150.96

----911846668031164

Content-Type: text/html;

Content-Transfer-Encoding: quoted-printable

Link to comment
Share on other sites

X-IP:178.40.150.96

----911846668031164

This looks exactly like a spam reporting issue already discussed today over in the newsgroups. The issue is the lack of a blank line between the header and the body of the supplied spam sample. Are you the same poster, is this the same spam, are you needing something other than what's already been discussed, or are we supposed to assume some kind of miracle in the timing of the same specific issue (and off the top of my head, the same spam)?
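One way to check a sample for this problem before submitting it is sketched below with Python's standard `email` package, which records a `MissingHeaderBodySeparatorDefect` when the header block runs straight into a non-header line. This shows how Python's parser behaves, not how SpamCop's does. The function name `lacks_header_body_separator` and the placeholder HTML are assumptions made for illustration; the header lines are abbreviated from the sample posted above.

```python
from email import message_from_string
from email.errors import MissingHeaderBodySeparatorDefect

def lacks_header_body_separator(raw: str) -> bool:
    """Return True if the parser saw no blank line between headers and body."""
    msg = message_from_string(raw)
    return any(isinstance(d, MissingHeaderBodySeparatorDefect) for d in msg.defects)

# Abbreviated headers from the posted sample (continuation line indented with a tab).
HEADERS = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/alternative;\n"
    '\tboundary="--911846668031164"\n'
    "X-IP:178.40.150.96\n"
)
# Placeholder body; only the boundary lines come from the thread.
BODY = (
    "----911846668031164\n"
    "Content-Type: text/html\n"
    "\n"
    "<html><body>placeholder</body></html>\n"
    "----911846668031164--\n"
)

as_posted = HEADERS + BODY                # boundary line directly after X-IP:
with_blank_line = HEADERS + "\n" + BODY   # separator restored

print(lacks_header_body_separator(as_posted))        # True
print(lacks_header_body_separator(with_blank_line))  # False
```

Python's parser is lenient here: it flags the defect and carries on. A stricter parser may give up at this point instead.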

Link to comment
Share on other sites

Sorry, not me. I'll head over there and check it out though. It's not surprising: I've gotten over 20 of these types of messages, so it's likely that other people are experiencing the same bug in SpamCop's HTML parser.

FWIW, it's not a lack of a line between header and body, as I'm using the separate header and body submission form.

Also, removing the line with the dashes and number in it at the end resolves the problem, without changing anything with blank lines between header and body.

I still say the parsing engine should see a link in there, regardless of any silly things that the spammers do to try to make it non-standard.
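As a rough illustration of the lenient behaviour being asked for, the sketch below collects `href` values from anchor tags with Python's standard `html.parser`, which treats the stray boundary line after `</html>` as ordinary text. It says nothing about how SpamCop's parser is actually built. The names `LinkCollector` and `find_links` are hypothetical, and the sample is a shortened copy of the HTML posted above.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, ignoring everything else."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_links(html_text: str) -> list:
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links

# Shortened version of the posted body, including the trailing boundary line.
sample = (
    "<html><body>\n"
    '<p>Give us a try by <a href="http://www.allnewdeals.com">\n'
    "going here</a> and just have a look around. So\n"
    '<a href="http://www.allnewdeals.com">come on in</a></p>\n'
    "remove http://www.offerspages.com/qog345/pr/rf.html\n"
    "</body>\n"
    "</html>\n"
    "\n"
    "----911846668031164--\n"
)

print(find_links(sample))
```

This only picks up anchor tags; the bare "remove" URL in the text is not an `<a>` element and would need separate handling.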

Link to comment
Share on other sites

FWIW, it's not a lack of a line between header and body,

I can only go with what you posted, and that's the very first thing I see that's a problem. Those two lines of your sample are where the blank line should be.

as I'm using the separate header and body submission form.

Also, removing the line with the dashes and number in it at the end resolves the problem, without changing anything with blank lines between header and body.

Though it may work, it violates the letter of the "don't manipulate the spam" rule and could possibly get you into some hot water. It would be better to figure out why the separating blank line is being dropped. A better method may be to cut and paste so that the header and body are separated correctly, but then there'd be the question of whether the two-part form would still work, as it would then appear that the boundary line data exists correctly, and it was the lack of boundary lines in the Outlook/Eudora apps that prompted the two-part form work-around.

Link to comment
Share on other sites

I wasn't understanding what you meant originally. Some reading of the postings in the NNTP resources, as you suggested, brought the solution.

I didn't realize that the "header" (as reported by my FirstClass email system) was including what is considered the body of the message.

I got some new versions of similar messages today, and by putting a blank line between the following lines:

X-Mailer: AOL 4.0 for Windows US sub 161

MIME-Version: 1.0

...it works like a charm, and doesn't require changing the body in any way.

Looks like I've got a feature request for FirstClass to work on: getting that blank line right.

Thanks for pointing me in the right direction.

FWIW, I didn't read the NewsGroups first because the "SpamCop forum" page (http://members.spamcop.net/help.shtml) suggests that the new web-based forums are the new, improved, and preferred way to get help. I'll look at Newsgroups anyway the next time!

Link to comment
Share on other sites

OK, glad you got it working, but technically, you're picking the wrong lines ..

your sample:

X-Mailer: AOL 4.0 for Windows US sub 161
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="--911846668031164"
X-Priority: 1
X-MSMail-Priority: High
X-IP:178.40.150.96
----911846668031164
Content-Type: text/html;

should look like this, with the blank line between the last header and the first boundary line:

X-Mailer: AOL 4.0 for Windows US sub 161
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="--911846668031164"
X-Priority: 1
X-MSMail-Priority: High
X-IP:178.40.150.96

----911846668031164
Content-Type: text/html;

Adding the blank line where you stated does in fact make it look like a jacked up Outlook/Eudora header, i.e. missing the boundary line specification ... so, yes, it works (per your statement) ... it's just not technically correct ... and let's hope that FirstClass knows this <g>
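The difference between the two placements can be seen with Python's standard `email` package. This is a sketch of how Python reads the message, not a statement about SpamCop's parser. The helper `build`, the tab-indented continuation line, and the placeholder HTML with its example.com URL are assumptions; the header lines are the ones quoted above.

```python
from email import message_from_string

HEADER_LINES = [
    "X-Mailer: AOL 4.0 for Windows US sub 161",
    "MIME-Version: 1.0",
    "Content-Type: multipart/alternative;",
    '\tboundary="--911846668031164"',   # continuation line, indentation assumed
    "X-Priority: 1",
    "X-MSMail-Priority: High",
    "X-IP:178.40.150.96",
]
BODY_LINES = [
    "----911846668031164",
    "Content-Type: text/html",
    "",
    '<html><body><a href="http://www.example.com">link</a></body></html>',
    "----911846668031164--",
]

def build(blank_after: int) -> str:
    """Join all lines, inserting one blank line after HEADER_LINES[blank_after]."""
    lines = (
        HEADER_LINES[: blank_after + 1]
        + [""]
        + HEADER_LINES[blank_after + 1 :]
        + BODY_LINES
    )
    return "\n".join(lines) + "\n"

# Blank line after X-Mailer: everything from MIME-Version on becomes body text.
early = message_from_string(build(0))
# Blank line after X-IP: the Content-Type header and its boundary stay in the headers.
correct = message_from_string(build(len(HEADER_LINES) - 1))

print(early.is_multipart(), early.get_content_type())
print(correct.is_multipart(), correct.get_payload()[0].get_content_type())
```

With the early blank line, the parser never sees a Content-Type header, so the message falls back to plain, single-part text. With the blank line after X-IP:, the boundary is known and the HTML part is found as a proper MIME part.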

Link to comment
Share on other sites

Adding the blank line where you stated does in fact make it look like a jacked up Outlook/Eudora header, i.e. missing the boundary line specification ... so, yes, it works (per your statement) ... it's just not technically correct ... and let's hope that FirstClass knows this <g>

Oops, ok, got it now. I haven't yet reported it to FirstClass, so I'll be able to give them the correct info.

Thanks again for your expertise and help in straightening out what's truly a header and what's not.

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.
