mMerlin

couldn't parse head but really caused by multipart in body

5 posts in this topic

I have had a few spams recently (Winward Casino) for which SpamCop gave this error message:

 
Quote

error: unexpected end of header
error: couldn't parse head
Message body parser requires full, accurate copy of message


at the end of "Finding links in message body"

I did some exploration and discovered that the problem seems to be the way the spam emails are constructed. They are multipart/alternative, and the header block of each alternative part ends with a line containing a single space instead of an empty line. Deleting that space in the body of the content lets SpamCop parse the body correctly.

--Section.«guid»
Content-Type: text/plain
 
Winward Casino: US Players, …

That *blank* line is really a space. Deleting the space character allows the "Finding links" to work again.

Example: https://www.spamcop.net/sc?id=z6408256844z01c6f10262d93f7fd6bc56e589ea4e33z

I understand that the message was originally meant for cases where the user's copy/paste/forward handling of an email left the email header no longer accurate. In this case, however, the "couldn't parse head" really refers to the multipart header in the body of the message. That was not clear from the context.

Edited by mMerlin
add reporting link


Moved from "Reporting help" => "Routing/Report address issues"

Other posts in that subforum identify corrected addresses for given IPs/IP blocks.


Maybe the initial post was not clear enough. The post was because SC was not parsing the body of the spam email, not because it was getting the wrong reporting address. Unless "cannot find any links" counts the same as a wrong address.

What I see is that the post has been moved **to** "Spamcop Reporting Help", while I thought I had initially put it in "Routing/Report address issues", which is the reverse of what @Lking seems to say. Neither seems to be quite correct, but I could not find a place to report that the parser is (partially) unable to handle an email that was submitted with full content, and that there is a trivial solution.

Once the start of an alternative block is detected by the parser, trim trailing white space until the end of the header (for the block) is found.  That should allow this type of email content to be parsed, without breaking anything else.  Given the messages I got, I assume that the parser currently does not see that line with a space as the end of the header, and keeps reading more lines expecting more header content.  Which then fails.  But it seems to work just fine in gmail.
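A minimal sketch of that trimming rule in Python, to make the idea concrete. This is not SpamCop's code. The function name `trim_part_header_whitespace`, the placeholder boundary `Section.EXAMPLE`, and the assumption that the boundary string is already known from the top-level Content-Type header are all illustrative.

```python
def trim_part_header_whitespace(lines, boundary):
    """Trim trailing whitespace only inside MIME part headers.

    `lines` is the message body split into lines without line endings.
    `boundary` is the multipart boundary string (assumed to be known already).
    """
    delimiter = "--" + boundary
    in_part_header = False
    cleaned = []
    for line in lines:
        if in_part_header:
            # Inside a part header: drop trailing whitespace.
            line = line.rstrip()
            if line == "":
                # A line that is blank after trimming ends the part header.
                in_part_header = False
        elif line.rstrip() == delimiter:
            # An opening delimiter line starts a new part header.
            in_part_header = True
        cleaned.append(line)
    return cleaned


# Hypothetical sample shaped like the spam described above.
sample = [
    "--Section.EXAMPLE",
    "Content-Type: text/plain",
    " ",  # the "blank" line that is really a single space
    "Body text with trailing spaces  ",
    "--Section.EXAMPLE--",
]
print(trim_part_header_whitespace(sample, "Section.EXAMPLE"))
```

Only the lines between a part delimiter and the first blank-after-trimming line are changed. Body lines, including their trailing spaces, pass through untouched.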


Did you see/follow the link "More information on this error.."? It explains some of the SC philosophy on handling malformed/misformatted email/spam.

What you suggest may seem to be trivial, but if you read many old posts, you will see this forum is full of "trivial" fixes suggested for one or another intentional trick or dumb error that trips up the parser. The question becomes 'how much time should be spent tracking errors made by spammers or their software tools?'

Gmail's objective is to generate a display for each email a client receives, no matter how god-awful (people would scream if 'trailing white space' caused no display). If on occasion they get one wrong, nothing is lost and they can claim 'The email is wrong but we tried.' On the other hand, if SC tries to correct an error and the effort results in an incorrect accusation, the credibility of SC is lost.

I might also suggest "New Feature Request"

 


I already saw and looked at the more information page. As far as I can tell, that is talking ONLY about mangled email headers, usually because of dropped leading whitespace. It does not mention anything about section headers inside the body, and here the content is not being mangled.

The 'trivial' solution was explicitly qualified: only trim the trailing spaces once the start of an alternative block header is detected, and only until the end of the block header is found, i.e., until the first blank line (after trimming). That was intended to cover the "don't break other things" concern.

To be even safer, that could be done only as a second pass, when the first pass fails to parse (per that error message).
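A rough sketch of the second-pass idea, again in Python. Everything here is hypothetical: `strict_part_headers` is only a stand-in for a strict parser that accepts nothing but a truly empty line as the end of a part header, and `Section.EXAMPLE` is a placeholder boundary. It says nothing about how SpamCop's parser is actually built.

```python
class HeaderParseError(ValueError):
    """Raised by the stand-in strict parser when a part header never ends cleanly."""


def strict_part_headers(lines, boundary):
    """Stand-in strict parser: return the header lines of each part."""
    delimiter = "--" + boundary
    headers = []
    current = None  # header lines of the part being read, or None
    for line in lines:
        if current is not None:
            if line == "":
                headers.append(current)
                current = None
            elif ":" in line and not line[0].isspace():
                current.append(line)
            else:
                raise HeaderParseError("unexpected end of header")
        elif line == delimiter:
            current = []
    if current is not None:
        raise HeaderParseError("unexpected end of header")
    return headers


def trim_part_header_whitespace(lines, boundary):
    """Trim trailing whitespace only between a delimiter and the next blank line."""
    delimiter = "--" + boundary
    in_part_header = False
    cleaned = []
    for line in lines:
        if in_part_header:
            line = line.rstrip()
            if line == "":
                in_part_header = False
        elif line.rstrip() == delimiter:
            in_part_header = True
        cleaned.append(line)
    return cleaned


def parse_with_second_pass(lines, boundary):
    """Try the strict parse first; repair and retry only if it fails.

    Returns (headers, repaired), where `repaired` says whether the
    second pass was needed.
    """
    try:
        return strict_part_headers(lines, boundary), False
    except HeaderParseError:
        repaired_lines = trim_part_header_whitespace(lines, boundary)
        return strict_part_headers(repaired_lines, boundary), True


spam = ["--Section.EXAMPLE", "Content-Type: text/plain", " ",
        "Body text", "--Section.EXAMPLE--"]
print(parse_with_second_pass(spam, "Section.EXAMPLE"))
```

Because the repair runs only after the strict pass has already failed, a message that parses today takes exactly the same path as before.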

Tracking *errors* made by spammers or their tools may not be a useful effort, but handling the parsing of things that show as valid, usable content in an email client does seem appropriate. Otherwise those tools will eventually evolve (survival of the fittest) into versions that our tools fail to parse and report. That is already happening, judging from the number of links I have checked out that simply point to a redirector, sometimes through several levels of redirection. Only the first gets reported, unless the abuse handling team does the extra work to follow up on the chain instead of just killing the reported link page.

As far as a feature request goes, I would start with making the error message clearer for this case, and pointing to a more information page that has some relevance to the actual error. The suggested trivial fix would be separate, since that change might not fix every *error* that might cause the link finding to fail.

 
