
Multipart parsing


ankman


It seems that Spamcop fails to parse URLs in multipart spam.

For about a week now, whenever the headers declare a multipart content type and the body contains a text/html part with URLs, Spamcop fails to see those URLs.

Is there a problem?


Yes, I'm seeing exactly the same thing, and like you I noticed that the problem is quite recent. The ones I'm getting are all multipart/alternative and appear to be properly structured: the text/plain part has the URL in square brackets, and the text/html part has the same URL in an anchor tag. Spamcop can't see either one!

Latest example: https://www.spamcop.net/sc?id=z6226526475zb4f37537206bffda67da5ca9c9f6bae9z

I'm getting 50 or so of these per day. The host in the URL is always different, and it looks to me as if in every case the host has been hacked and a php script planted on it (model.php in this example), to which a complex argument is passed. Because of this, I think it is extremely important that the spam be reported to those responsible for the IP range, including the hacked host. This is a serious bug.
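To make the structure concrete, here is a minimal sketch of what these messages look like, built with Python's standard email library. The boundary, addresses and URL are made-up placeholders, not taken from an actual spam:

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Made-up placeholder standing in for the spamvertised link
url = "http://hacked-example-host.invalid/model.php?arg=abc123"

msg = MIMEMultipart("alternative", boundary="b1_example0123456789abcdef")
msg["Subject"] = "Test"
msg["From"] = "sender@example.invalid"
msg["To"] = "recipient@example.invalid"

# text/plain part: the URL in square brackets
msg.attach(MIMEText("See [" + url + "]", "plain"))
# text/html part: the same URL inside an anchor tag
msg.attach(MIMEText('<html><body><a href="' + url + '">here</a></body></html>', "html"))

print(msg.as_string())

Both parts carry the same link, yet the parser currently reports neither.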


You are correct. We've seen the same problem -- where "spamvertised" sites have been included in the reporting for over a decade, Spamcop has stopped recognizing the links.

I believe it's a glitch or bug in the parser.

They couldn't have done it intentionally because they know it's just useless to report only the sending emailer.

If they did, in fact, do it intentionally, it's probably due to political pressure, because some of the "big boys" are annoyed at getting so many reports. Enom is a ringleader.

Now that AMAZON is implicated in so much cybercrime activity, perhaps they brought their huge hammer down on Spamcop. We just don't know. But you can bet both Google and Amazon are making a pile of money off the spam, cybercrime and cyber terrorism industry; they don't dare hinder it. Now Microsoft is in the stew too, and a percentage of the spammers are using the Microsoft Cloud. It's a hit with cyber criminals because Spamcop and similar automated reporting services recognize the 'cloud' as "No Master" ... so there's nobody to report to.

If you manually dissect the headers and links, you'll find out exactly where it's coming from, right down to the machine's IP address. Unfortunately, the "cloud" just resolves to "No Master", or to large spam-friendly hosts like www.softlayer.com, or to true cybercrime cartels like Enom.

You should check out www.Knujon.com and begin reporting spam to them. They accept a mailbox text file upload, so it's not a matter of reporting each spam individually. They parse and database "bad" web sites, hosts and registrars, and then petition through ICANN by identifying rogue registrars and criminal cartels. This is not necessarily shutting down spammers in the trenches -- but in the bigger picture, they're getting the huge cartels identified and squashed. That's http://www.Knujon.com/

thanks for reading.

Fred


Guys, I traced the bug down to the quotation marks in the Content-Type header. If the boundary (or any other optional parameter of the Content-Type header, such as charset) has its value enclosed in double quotes, Spamcop fails to parse it correctly and hence doesn't find the boundaries in the mail's body (probably the quotes are included in the boundary string, which is wrong).

This is a bug that someone fiddling with the Spamcop code must have introduced recently.

Use of quotation marks in the Content-Type header is allowed per RFC 2045, section 5.1, "Syntax of the Content-Type Header Field":

https://tools.ietf.org/html/rfc2045#section-5.1

If the boundary string does not contain special characters like spaces, brackets or colons (called tspecials in the RFC), the double quotes can be omitted; just remove them before submitting the spam and the parser finds the links in the body again...
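To illustrate the point (this is just my own sketch, not Spamcop's actual code, and the boundary value below is a made-up placeholder): a conforming parser strips the optional quotes around parameter values, which is exactly what Python's standard email parser does for both forms:

from email.parser import Parser

quoted = 'Content-Type: multipart/alternative; boundary="b1_0123456789abcdef"\n\nbody'
unquoted = 'Content-Type: multipart/alternative; boundary=b1_0123456789abcdef\n\nbody'

for raw in (quoted, unquoted):
    msg = Parser().parsestr(raw)
    # get_param() unquotes the value, so both headers yield the same boundary
    print(msg.get_param("boundary"))
# prints b1_0123456789abcdef twice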

--

Johannes


In case you're not following this other thread http://forum.spamcop.net/forums/topic/16624-all-spams-lately-get-no-links-found, j-f/Johannes has just found the problem: in ALL the spams I'm getting that have this issue, the initial Content-Type: multipart/alternative; header gives the boundary string with double quotes around it, like this:

boundary="b1_b421a616f2a7415e9edb2c535efad9b4"

These are unnecessary (though legal) in the string above, and are causing the Spamcop parser to screw up. Remove those and Spamcop finds the links!
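In other words, the edit I'm making before submitting looks like this; the boundary delimiter lines in the body (--b1_...) never carry the quotes, so only this header line needs touching:

Before:
Content-Type: multipart/alternative; boundary="b1_b421a616f2a7415e9edb2c535efad9b4"

After:
Content-Type: multipart/alternative; boundary=b1_b421a616f2a7415e9edb2c535efad9b4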



Indeed. Removing double quotes around boundaries makes multipart elements visible and parseable again (thanks for that, Johannes).

By the way, IMHO the bug goes a little deeper.

There are also problems with double quotes when parsing links in quoted-printable encoded bodies.

Something like

<A href=3D"http://spam.site.com">here</A>

gets parsed as:

Resolving link obfuscation
   http://spam.site.com&quot;&gt;here&lt;/A&gt
   chopping username "spam.site.com">here&" from URL: http://lt;/A&gt
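For comparison, decoding the quoted-printable body before extracting links gives a clean URL. A minimal sketch in Python (my own test, not Spamcop's code; the naive regex is just for illustration):

import quopri
import re

raw = b'<A href=3D"http://spam.site.com">here</A>'

# =3D is the quoted-printable encoding of "=", so decoding restores href="..."
decoded = quopri.decodestring(raw).decode("ascii")
# decoded == '<A href="http://spam.site.com">here</A>'

# naive href extraction, for illustration only
links = re.findall(r'href="([^"]+)"', decoded, flags=re.IGNORECASE)
print(links)  # ['http://spam.site.com']

Decoded that way, there is nothing left for the parser to mistake for a username or to mangle into HTML entities.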

Maybe a library upgrade has broken something in the parser?


Is removing the quotes permitted, even if it's an obvious bug in the parser?

SpamCop does what it does and doesn't do for a reason. Do not make any material changes to spam before submitting or parsing which may cause SpamCop to find a link, address or URL it normally would not, by design, find.

https://www.spamcop.net/fom-serve/cache/283.html

Dave, I think the answer to your question is 'No'.


  • 2 weeks later...

Just in case anyone following this had not heard or noticed: this bug has been fixed. Parsing of these multipart messages is now succeeding consistently, and boundary strings defined with double quotes no longer cause any problems.

