Mixed-case "http" results in bogus error


There has been spam coming in with HTML bodies like this:

New Mortgag�‹ R�€�tes <A HrEF="hTtP://BIt.LY/10ghkBa?z">«heÃk This �€œ�št<Img sRc="hTTp://bIt.lY/146ZhAw?yfnqx/npcmy/mv/l/lj"><span size=""></span><style style=""></style><style lang="zkoeytvlqtkt"><font></style></font><font></font><big></big>

The parser looks at the <A> tag and for some reason throws an error about "Remove email parameters" and refuses to report the link:

Finding links in message body

Parsing HTML part

Resolving link obfuscation

hTtP://BIt.LY/10ghkBa?z

Remove email parameters: hTtP://BIt.LY/10ghkBa

However, simply changing the scheme from the mixed-case "hTtP", to consistent case "http" or "HTTP" allows the parser to correctly parse and report the link.

This appears to be a bug in the parser: it should be able to handle any text case in which the scheme might appear.
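For illustration, here is a minimal sketch of case-insensitive scheme handling. It is not SpamCop's parser code, which is not shown in this thread. The helper names `normalize_url` and `is_web_link` and the example.com address are made up for the example. It relies on Python's standard `urllib.parse.urlsplit`, which lowercases the scheme, and on the `hostname` attribute, which lowercases the host.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase the scheme and host; leave path, query and fragment untouched.

    Hypothetical helper for illustration. Simplification: any userinfo
    ("user:pass@") is dropped and IPv6 literals are not handled.
    """
    parts = urlsplit(url)
    # urlsplit() already returns the scheme in lower case;
    # .hostname returns the host in lower case (or None if absent).
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def is_web_link(url: str) -> bool:
    """Return True for http/https links regardless of how the scheme is cased."""
    return urlsplit(url).scheme in ("http", "https")


if __name__ == "__main__":
    for candidate in ("hTtP://ExAmPlE.CoM/10ghkBa?z", "HTTP://example.com/", "mailto:a@example.com"):
        print(candidate, is_web_link(candidate), normalize_url(candidate))
```

With this approach "hTtP", "HTTP" and "http" are all treated the same way, while the case-sensitive path and query are preserved.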


<snip>

However, simply changing the scheme from the mixed-case "hTtP", to consistent case "http" or "HTTP" allows the parser to correctly parse and report the link.

<snip>

...You may wish to look at the following items in the SpamCop FAQ (links to which appear near the top left of each SpamCop Forum page):
  • "Material changes to spam" first paragraph
  • "Material changes to spam - Updated!" first paragraph (basically a copy of the last bullet item but contains additional content that may be of interest)
  • "What if I break the rule(s)?"


Thanks John. Live bit.ly links broken. Please don't post spammer's links in public (not even re-director ones).

My apologies, I should have munged it.

...You may wish to look at the following items in the SpamCop FAQ (links to which appear near the top left of each SpamCop Forum page):

  • "Material changes to spam" first paragraph
  • "Material changes to spam - Updated!" first paragraph (basically a copy of the last bullet item but contains additional content that may be of interest)
  • "What if I break the rule(s)?"

You may want to look up the definition of "material change". Changing the case of "hTtP" to "http" is certainly not such a change by any definition of that phrase.

<_<


<snip>

You may want to look up the definition of "material change". Changing the case of "hTtP" to "http" is certainly not such a change by any definition of that phrase.

<snip>

...This is SpamCop, which has its own definition of "material change" which may be different than what you or I might mean by "material change." :) <g> That definition is given, piecemeal, in various places in the SpamCop FAQ entry "Material changes to spam - Updated!" to which I referred earlier. And given the fact that the various parts of the definition we have came out in pieces over the course of time, I would not be surprised to learn that we do not have the full definition.

...As far as changing the case of "hTtP," that might or might not run afoul of:

Do not make any material changes to spam before submitting or parsing which may cause the SpamCop parser to find a link, address or URL it normally would not, by design, find.
and
It is OK to delete content in order to reduce the size of the spam, as long as you don't alter what is left.
...Bottom line: where there's the least question, it might be best to get a ruling from the SpamCop Deputies at deputies[at]admin.spamcop.net or "SpamCopAdmin" Don D'Minion at Service[at]Admin.SpamCop.net.

Do not make any material changes to spam before submitting or parsing which may cause the SpamCop parser to find a link, address or URL it normally would not, by design, find.
Thanks for the advice and concern, but the salient part of that quote, "by design", sufficiently clarifies it for this case: there's no way that it's "by design" that the mixed-case "hTtP" isn't recognized as a valid scheme (and in fact is misinterpreted as some kind of email parameter), whereas "HTTP" or "http" is recognized. It's very clearly a parser bug and therefore by definition not "by design". :)

  • 2 weeks later...

...Looks that way to me. I just tried parsing the following spam I invented (with some invented headers I am not including here):

and the parser found only the last one.

Archived

This topic is now archived and is closed to further replies.
