lb6

Parsing HTML part error

Steven,

That URL you're giving is at "members.spamcop.net" instead of the usual "www.spamcop.net" -- I, for one, can't get to it.

Just change "members" to "www" and it will work just fine.

/john

I have modified my messages above to fix that problem.

You are correct that I did not have the information that you were using forward/noheader.

As I mentioned, my configuration cannot send messages to the internet, so I could not submit as you did. I was approximating as best I could to try to help you, so I did a copy/paste. Unfortunately, I have no way to create a MIME message here. I was only asking about forwarding other MIME-based messages because I don't know how many MIME-based messages you receive. I do not receive many here at work, but plenty at home.

John, you had not differentiated yourself from Adam prior to these messages. I do not have the experience with VMS mail that you do, as I do not use it as my primary application. All my VMS accounts simply forward to my primary email application and, as we know, VMS just works; I don't need to fiddle with it.

Groups like this can be a tremendous asset if enough information is presented. My first experience with the comp.os.vms newsgroup made me want to help people with their computer problems, and that eventually led me here.

Speaking to your problem here, the spammer has generated a non-RFC-compliant message, which SpamCop currently will not process. You can report this to deputies<at>spamcop.net, and if this technique becomes widespread, modifications can and will be made to the parser.

I'm a bit skeptical about what's going on here. Two out of those three examples were indeed missing the crucial blank line between headers and body, and I'm pretty sure that none of my spam comes to me formatted that way (and I get a lot, at a lot of different addresses). I'd still suspect something with the VMS system, despite the OP's insistence to the contrary.

DT

The blank line is not "disappearing." Nor is the Priority line (Priority is a valid MIME header) having the "[" added to it.

The spammers are SENDING the mail exactly as shown.

Though many have said the same thing here and in the newsgroups over the years, I must admit that I have apparently been lucky enough never to have received one of these myself and had to struggle with it.

I do NOT report email from this application, as it is internal only at my site and does not receive spam. It also cannot send messages to the internet.

Apologies for reading into this. I was also noting that the different versions involved may lead to some confusion or differences in behavior.

This is why I usually don't bother with these kinds of forums. You guys don't know me from Adam, and you assume that I'm wrong. It's a bit insulting, but it's the way these forums work, so I bear no malice towards you for your skepticism.

From my perspective, I was only looking at the data provided, so I'm not sure where your conjectured assumptions are coming from. I don't even recall anyone saying "you are wrong." But, again, interpretation and perception of the written word are always biased by expectations.

Now that you've explained your background, the in-depth knowledge of VMS and e-mail, and have explained that there is absolutely no way that the spam could have been mis-handled, manipulated, screwed-up, or adjusted .... I am still going to state that the reason the SpamCop parser choked on your sample spam submittals is that there is a missing blank line between the header and the body of the spam. On how it happened, I will bow to your expertise. I'm just pointing out the problem.

Would someone care to educate a dummy like me?

My understanding of an email message is that it is made up of two parts.

1) the headers

2) the body

What is it that separates the header from the body? (Forget about SpamCop for the moment.)

How do email programs know where the headers end and the body begins?

Getting back to SpamCop, the answer is simple: there has to be a blank line.

Anything above that first blank line is considered headers; anything below it is considered body.

If other systems use the same method, how would they ever process a message where the spammer had removed that line?

Since our expert is saying that the message is exactly as sent (with no blank line between the header and body), how did his system separate the two?

Probably a stupid question, but since I never see the raw data before it is actually received by the mail program, I have no way of knowing, not that I have any need to know. So feel free to ignore the question.

What is it that separates the header from the body ...

... that answer is simple, there has to be a blank line.

...

If other systems use the same method, how would they ever process a message where the spammer had removed that line.

...

Since our expert is saying that the message is exactly as sent (no blank line) between the header and body how did his system separate the two?

The first blank line ends the headers and begins the body. Indeed, if there had been no blank line at all, any mail system would have considered the whole message to be headers.
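
As a minimal illustration of that rule, here is a Python sketch. The function name `split_headers_body` is hypothetical and is not taken from SpamCop, VMS, or any mail client. It splits at the first empty line, and a message with no empty line at all comes back as all headers and no body:

```python
from __future__ import annotations


def split_headers_body(raw: str) -> tuple[list[str], list[str]]:
    """Split a raw message at the first empty line (hypothetical helper).

    Lines before the first empty line are headers; lines after it are body.
    If there is no empty line, the whole message is treated as headers.
    """
    lines = raw.splitlines()
    for i, line in enumerate(lines):
        if line == "":
            return lines[:i], lines[i + 1:]
    return lines, []


good = "From: a@example.com\nSubject: hi\n\nHello there.\n"
bad = "From: a@example.com\nSubject: hi\nHello there.\n"

print(split_headers_body(good))
# (['From: a@example.com', 'Subject: hi'], ['Hello there.'])
print(split_headers_body(bad))
# (['From: a@example.com', 'Subject: hi', 'Hello there.'], [])
```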

MIME makes this a little more interesting. There are a lot of mail systems out there, and they may try to do various different things to be forgiving of badly formatted mail. Or not.

The VMS mail system has no clue at all about MIME. In fact, it cares very little about most headers. It wants to find the From:, To:, Subject:, and CC: headers, and maybe the Reply-To: header, but only to put that information into its own internal-only headers. Otherwise, as far as it is concerned, the message is just a message, and it really doesn't give a hoot about what are headers and what is body. With one exception: if you have selected the option of causing the headers to be displayed at the bottom of the message, it will search for the first blank line, display everything after that blank line (including any MIME headers in the body) first, and then insert a line at the bottom:

================== RFC 822 Headers ==================

Then come the headers from before the first blank line. But that's only if you select the "put headers at bottom" option.
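
The "headers at bottom" display described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this post, not VMS Mail code; the function name is hypothetical and the separator text is copied from the post. Note that MIME headers sitting in the body are shown first, just like ordinary text:

```python
SEPARATOR = "================== RFC 822 Headers =================="


def headers_at_bottom(raw: str) -> str:
    """Model of the described display: body first, then the separator line,
    then the headers that preceded the first empty line (hypothetical)."""
    lines = raw.splitlines()
    cut = lines.index("") if "" in lines else len(lines)  # first empty line
    headers, body = lines[:cut], lines[cut + 1:]
    return "\n".join(body + [SEPARATOR] + headers)


raw = (
    "From: a@example.com\n"
    "Subject: hi\n"
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
)
print(headers_at_bottom(raw))
```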

Let's first discuss the two messages where the blank line before the first set of MIME headers is missing. Since VMS doesn't care about MIME, it has no problem with the message, and it just displays the text. I imported the message into Netscape Mail to see what Netscape would do with it. Netscape displays only the headers. The message body is completely blank. Why? Well, think about what many MIME messages look like. There are headers at the beginning (among the RFC822 mail headers) that indicate that we've got MIME. One of these might say that the message is a multipart message. If so, then the text which appears after the headers and before the first set of MIME headers in the body is ignored by most mail readers that understand MIME. There is often some text there which advises you to get a mail reader that supports MIME.

So it's quite possible that you've received a message like this, and just thought you had a blank message. Or maybe not. After all, although I've received three like this in the past two weeks, this has not been a big problem among the other 2700 messages since I started reporting spam through SpamCop.
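
The preamble behaviour can be seen with Python's standard `email` package, used here purely as a convenient MIME-aware parser (it is not what Netscape or VMS Mail use), on a made-up sample message. The text between the top-level headers and the first boundary is set aside as the preamble, and only the parts after the boundary count as content:

```python
from email import message_from_string

RAW = (
    "From: sender@example.com\n"
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "This is a multi-part message in MIME format.\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello</p>\n"
    "--XYZ--\n"
)

msg = message_from_string(RAW)
print(msg.is_multipart())  # True
print(repr(msg.preamble))  # 'This is a multi-part message in MIME format.'
print([part.get_content_type() for part in msg.get_payload()])  # ['text/html']
```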

Now for the other message, the one with the brackets and spaces around the "Priority:" header.

Again, VMS doesn't care. The first blank line is the end of the headers and the beginning of the body, but for VMSmail that's pretty much a "so what?" If I import the message into Netscape mail we find that the funky bad header is simply ignored, as though it wasn't there at all, and the message is displayed "properly". However, since the message is declared to be "multipart/alternative", if you select "View Message as PlainText", you get a blank message, since the alternative part is missing.
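
A reader's "view as plain text" choice can be approximated as "show the text/plain alternative if there is one." The Python sketch below uses a hypothetical helper name, made-up sample messages, and standard-library `email` parsing. It returns an empty string for an HTML-only multipart/alternative message, which matches the blank display described above; real mail clients differ in the details:

```python
from email import message_from_string
from email.message import Message


def plain_text_view(msg: Message) -> str:
    """Return the first text/plain part's payload, or "" if there is none."""
    for part in msg.walk():  # walk() visits the message itself, then its parts
        if part.get_content_type() == "text/plain":
            return part.get_payload()
    return ""


HTML_ONLY = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello</p>\n"
    "--XYZ--\n"
)

BOTH = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello</p>\n"
    "--XYZ--\n"
)

print(repr(plain_text_view(message_from_string(HTML_ONLY))))  # ''
print(repr(plain_text_view(message_from_string(BOTH))))
```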

Hope this has been useful.

/john

Edited by jrcovert

John, Thank you.

It will take a while for all of that to sink in.

Hello--

Is there an equivalent of "RTFM" or "RTFFAQ" for previous posts to a forum? :) Anyway, I did read all the recent posts I could find on the subject, so I hope this is not redundant. Most posts referring to this problem appear to involve spam that has been downloaded into users' email programs, then reported using forwarding and/or cut and paste into SpamCop web forms.

I am getting the same problem using only the SpamCop webmail system; that is, I log in to webmail.spamcop.net, then click the "Report spam" button, log in to mailsc.spamcop.net, click the "Held Email" tab, select messages and "Queue for reporting (and move to trash)" and click "Release / Delete selected messages," then click the "Report spam" tab and start grinding through the stack. No email clients or non-SpamCop servers are involved; for the record, I'm using Safari 1.2.3 on Mac OS X 10.3.5.

As an example, here is the tracking URL for one recent spam. I estimate that three to five percent of the spam I get is subject to this problem; I hope this helps the gurus here reconfigure the parser to cause more pain to spammers!

Best regards,

Mark Looper

... noting that there isn't anything stopping you from manually sending your own complaints.

Thanks. I am not 100% certain how this is to be done; searching previous posts, I found directions as to how to get SpamCop to tell me the email addresses for a URL that I want to report manually. I gather that I can then put these addresses in the "Re:User Notification" box of the reporting page? Comma separated? Is there a limit on how many I can put in that box?

I didn't find this in any of the FAQ or other help documents. Perhaps it would be useful to add such instructions to the FAQ page that shows up as the "More information on this error.." link when I get one of these "error: couldn't parse head" messages (e.g., in this tracking URL). (By the way, toward the end of that FAQ page is the statement "I cannot emphasize enough that this is not a trick by spammers to 'fool spamcop'. It is an error introduced by the recipient (you) when copying or submitting email to spamcop." From the post to which you [Wazoo] linked in your message, I gather that this is no longer the case.)

Many thanks!

... and yes, I'm aware it's easier for me to suggest changes in the FAQ and documentation than it is for all y'all to actually make those changes!

I'm obviously too tired to be typing stuff in here ... I got lost a couple of times trying to follow your flow ... but I'm going to try anyway ...

Thanks.  I am not 100% certain how this is to be done; searching previous posts, I found directions as to how to get SpamCop to tell me the email addresses for a URL that I want to report manually.  I gather that I can then put these addresses in the "Re:User Notification" box of the reporting page?  Comma separated?  Is there a limit on how many I can put in that box?

Once upon a time, this was pretty much unlimited. Then it was squashed down to only 200 characters. The most recent complaints I've seen seem to agree that it's now limited to five addresses.
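
If the five-address limit reported above is right, a longer list of abuse addresses would have to be split across several reports. Here is a minimal Python sketch. It assumes, as the question above guesses rather than as anything documented, that the box takes comma-separated addresses; the helper name and the default limit are likewise assumptions:

```python
from __future__ import annotations


def batch_addresses(addresses: str, limit: int = 5) -> list[str]:
    """Split a comma-separated address list into comma-separated batches.

    The default limit of 5 only reflects the reports in this thread, and the
    comma separator is an assumption; neither is a documented SpamCop value.
    """
    cleaned = [a.strip() for a in addresses.split(",") if a.strip()]
    return [", ".join(cleaned[i:i + limit]) for i in range(0, len(cleaned), limit)]


addrs = ", ".join(f"abuse{n}@example.com" for n in range(1, 8))
for batch in batch_addresses(addrs):
    print(batch)
```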

I didn't find this in any of the FAQ or other help documents.  Perhaps it might be useful to add some such instructions to the FAQ page that shows up as a link for

Got me there, as I'm not a Member ... I switched the "mailsc" to "www" .... That FAQ was originally written in the good old days; unfortunately, all the attributions seem to have disappeared when the web pages were updated. Anyway, it could have been written by a small handful of folks (not me), and then recently Courtney (IronPort staffer) went through and updated a number of the www.spamcop.net FAQ pages (in addition to doing the web page coding itself). The catch is that this FAQ doesn't address the "missing blank line" issue; it talks about badly wrapped long lines and the Outlook/Eudora MIME-Boundary line problems. To make an addition to the FAQ on the "missing blank line" problem, someone first has to identify what's causing it and where it's coming from, and that hasn't happened yet.

More information on this error.. when I get one of these "error: couldn't parse head" messages (e.g., in this tracking URL: http://www.spamcop.net/sc?id=z690740840zff06a300ca556306a3c88c33b28b62acz).

Your spam sample has a couple of oddities / problems. The first is a wayward line "Wed, 10 Nov 2004 04:44:19 +0300" stuck in the middle of things ... I don't even see anything close that it could have been accidentally stripped/copied/dropped from .. it's just "there" ....

The next issue is that the header states Content-Type: text/html; charset="us-ascii" ..... but the spam is offered up in plain text. From the parser's viewpoint, it's not known whether the spam was made that way, something happened in the handling, Outlook/Eudora was involved, or the user just screwed up ... so the decision is to make the safe call and not "play" with the "bad" data. Though I don't believe that there is "one" existing SpamCop FAQ that totally dissects e-mail construction, a lot of the specific stuff can be found through them and through the many existing discussions, and I'm sure that I added a number of "other than SpamCop" web pages for more data in the FAQ found here in the web Forum.
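
The mismatch described above (a body declared as text/html that contains no HTML at all) is easy to detect, even though repairing it safely is not. Here is a rough Python sketch. The regular expression is a crude heuristic chosen for illustration, the function name is hypothetical, and nothing here describes what the SpamCop parser actually does:

```python
import re

# Crude heuristic: anything shaped like an opening or closing HTML tag.
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")


def declared_html_but_plain(content_type: str, body: str) -> bool:
    """True when the Content-Type says text/html but the body has no tags."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html" and TAG_RE.search(body) is None


print(declared_html_but_plain('text/html; charset="us-ascii"', "Cheap pills here"))  # True
print(declared_html_but_plain('text/html; charset="us-ascii"', "<p>Cheap pills</p>"))  # False
print(declared_html_but_plain("text/plain", "Cheap pills here"))  # False
```

Flagging rather than rewriting fits the "safe call" described above: the check reports the inconsistency and leaves the data alone.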

(By the way, toward the end of that FAQ page is the statement "I cannot emphasize enough that this is not a trick by spammers to 'fool spamcop'. It is an error introduced by the recipient (you) when copying or submitting email to spamcop." From the post (http://forum.spamcop.net/forums/index.php?showtopic=2927&st=0&p=19215entry19215) to which you [Wazoo] linked in your message, I gather that this is no longer the case.)

Ummm, again, that specific FAQ deals with the Outlook/Eudora experience; the specific commentary you are pointing to describes badly handled long-line wrapping, and thus the statements are still accurate. The catch is, I don't see "my reference" to that FAQ in the Topic/Discussion you point to, and that discussion was about the "missing blank line" problem.

While researching something else dealing with the IMP/Horde application, I did come across some interesting "items" that I thought might have had some influence on this missing-line thing, but JT tells me that they weren't the problem or solution. So we're still back to this: only a few people seem to collect these missing-blank-line spams, and the common vector is still unknown, other than the conjecture that a certain spammer or spamming software application is involved.

I'm hoping there's an answer in there somewhere <g>

I just received an email with the added space, which triggered the parsing error:

http://www.spamcop.net/sc?id=z690907301z6e...c1af85b8fde186z

MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="--6940975102924587"                                                                                                                                     
[                                                             Priority: Normal                                                                                                       ]
X-spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on blade4
X-spam-Level: ************************
X-spam-Status: hits=24.5 tests=FORGED_MUA_OIMO,FORGED_OUTLOOK_HTML,
	FORGED_OUTLOOK_TAGS,FROM_ENDS_IN_NUMS,HTML_40_50,HTML_FONT_INVISIBLE,
	HTML_IMAGE_ONLY_08,HTML_MESSAGE,LONGWORDS,MIME_BOUND_DD_DIGITS,
	MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MPART_ALT_DIFF,RCVD_BY_IP,
	RCVD_DOUBLE_IP_LOOSE,SARE_URI_PILLS,URIBL_AB_SURBL,URIBL_OB_SURBL,
	URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL version=3.0.0
X-SpamCop-Checked: 192.168.1.103 216.148.227.84 68.73.148.74
X-SpamCop-Disposition: Blocked bl.spamcop.net

I took out the spaces, resubmitted and got this:

http://www.spamcop.net/sc?id=z690916526z38...33e7017ee637e5z

This email came from my Comcast account and was forwarded directly to my cesmail account, so I don't know whether the space was caused by Comcast or by the spammer.

bp

Edited by btech
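
The bracketed "Priority" line in the sample above is neither a header field nor a folded continuation line, and the same is true of the stray date line mentioned earlier in the thread. RFC 5322 requires a field name made of printable ASCII characters other than the colon, followed immediately by a colon, and a continuation line must start with a space or tab. A minimal Python check along those lines finds the first line a strict parser would stop at. The function name is hypothetical, and this is not SpamCop's code:

```python
from __future__ import annotations

import re

# RFC 5322 field name: one or more printable ASCII characters other than
# ":" (codes 33-57 and 59-126), followed immediately by ":".
FIELD_RE = re.compile(r"[\x21-\x39\x3b-\x7e]+:")


def first_bad_header_line(header_lines: list[str]) -> str | None:
    """Return the first line that is neither a field nor a continuation."""
    for line in header_lines:
        if line.startswith((" ", "\t")):
            continue  # folded continuation of the previous field
        if not FIELD_RE.match(line):
            return line
    return None


sample = [
    "MIME-Version: 1.0",
    "Content-Type: multipart/alternative;",
    '\tboundary="--6940975102924587"',
    "[        Priority: Normal        ]",
    "X-spam-Level: ************************",
]
print(first_bad_header_line(sample))  # [        Priority: Normal        ]
print(first_bad_header_line(["Date: x", "Wed, 10 Nov 2004 04:44:19 +0300"]))
# Wed, 10 Nov 2004 04:44:19 +0300
```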

It appears that Julian has finally moved this item to the top of the "to-do list" .... Noting the results of someone else's query over in the newsgroups, I see this "new and interesting" line in the parse results:

spam Header

No blank line deliniating headers from body - abort

Problem is now spelled out .....
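
For contrast, some parsers do not abort. Python's standard `email` package (shown only as an example of a lenient parser, not as anything SpamCop uses) records a defect at the first line that is not a header and treats everything from there on as body. Any headers after that line, such as the X-spam lines in the sample above, silently drop out of the header set. Aborting with an explicit message, as the new parse result does, avoids that kind of quiet misreading. The sample message is made up:

```python
from email import errors, message_from_string

RAW = (
    "From: a@example.com\n"
    "Subject: hi\n"
    "[        Priority: Normal        ]\n"
    "X-spam-Level: ****\n"
    "\n"
    "Body text\n"
)

msg = message_from_string(RAW)
print(msg["Subject"])       # hi
print(msg["X-spam-Level"])  # None: this line ended up in the body
print([type(d).__name__ for d in msg.defects])
# ['MissingHeaderBodySeparatorDefect']
print(msg.get_payload().splitlines()[0])  # [        Priority: Normal        ]
```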
