Another parser challenge, but easier - hex IPs

Cedders · September 9, 2008

I've sent this to deputies[at] but thought it's worth reporting here too.

SpamCop's not dealing with hex-obfuscated URIs (found in the current IRS phishing) as well as might be expected, e.g.:

Tracking link: http://0x7C.0xA.0x7F.0xA4/Internal.Revenue...refund-form.php

No recent reports, no history available

Cannot resolve http://0x7C.0xA.0x7F.0xA4/Internal.Revenue...refund-form.php

It's attempting to resolve something that is clearly not a domain name, because the TLD begins with a digit.

Really a report should go to the reporting address for 124.10.127.164, i.e. spam[at]anet.net.tw

Note that 2081062820 and 0x7C0A7FA4 (different ways of writing the same IP address) are parsed correctly.
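For illustration, the three spellings above can be reduced to the same dotted-decimal address with a few lines of Python. This is only a sketch of the classic inet_aton-style reading (hex with a 0x prefix, octal with a leading 0, and a final field that fills whatever bytes remain). The function names are made up for this example, and it says nothing about how SpamCop's parser is actually written.

```python
import re

# One field: hex with a 0x prefix, or a run of digits (octal if it has a leading 0).
_FIELD = re.compile(r"(0[xX][0-9a-fA-F]+|[0-9]+)")


def _parse_field(text):
    """Return the integer value of one dot-separated field, or raise ValueError."""
    if not _FIELD.fullmatch(text):
        raise ValueError(f"not a numeric field: {text!r}")
    if text[:2].lower() == "0x":
        return int(text, 16)
    if len(text) > 1 and text[0] == "0":
        return int(text, 8)  # raises ValueError on the digits 8 or 9
    return int(text, 10)


def to_dotted_decimal(host):
    """Return the dotted-decimal form of a numeric IPv4 host, or None."""
    fields = host.split(".")
    if not 1 <= len(fields) <= 4:
        return None
    try:
        *head, last = [_parse_field(f) for f in fields]
    except ValueError:
        return None
    # Leading fields are single bytes; the last field fills whatever is left.
    if any(n > 0xFF for n in head) or last >= 256 ** (4 - len(head)):
        return None
    value = last
    for position, n in enumerate(head):
        value |= n << (8 * (3 - position))
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


for spelling in ("0x7C.0xA.0x7F.0xA4", "0x7C0A7FA4", "2081062820"):
    print(spelling, "->", to_dotted_decimal(spelling))
```

All three spellings come out as 124.10.127.164, and anything that is not purely numeric (an ordinary domain name, say) returns None.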

Wazoo · September 9, 2008

Tracking link: http://0x7C.0xA.0x7F.0xA4/Internal.Revenue...refund-form.php
No recent reports, no history available

Cannot resolve http://0x7C.0xA.0x7F.0xA4/Internal.Revenue...refund-form.php

It's attempting to resolve something that is clearly not a domain name, because the TLD begins with a digit.

No Tracking URL provided, so all that can be said 'here' is that the data you present is out of context. The most glaring question would be what the headers and/or MIME-Boundary definition lines define this embedded stuff to be.

Cedders · April 10, 2009

For info, the problem still exists. If there is any SpamCop development going on, it is worth the time reporting bugs or feature requests if we know where they should go - but if there is very little happening, maybe we should know. I'm not interested in discussing stuff just for the sake of it, unless it helps the SC developers.

No Tracking URL provided, so all that can be said 'here' is that the data you present is out of context. The most glaring question would be what the headers and/or MIME-Boundary definition lines define this embedded stuff to be.

I think you misunderstood. Perhaps I presented the issue too tersely. It's not "embedded stuff" - the quote was SC output.

Actually what could be said from my evidence is that there is a way of presenting a link that browsers understand but SpamCop doesn't, and that is something it would be good to fix. But I suppose the context might make it clearer:

Return-Path: <refund[at]irs.tax.gov>
Delivered-To: somewhere[at]munged.org
Received: from qbsrv011.QBASCO.COM (mail.qbasco.com [61.219.55.141])
by munged.org (Postfix) with ESMTP id 136A814C7F9
for <somewhere[at]munged.org>; Tue, 9 Sep 2008 17:13:30 +0100 (BST)
Received: from service ([67.59.48.131] unverified) by qbsrv011.QBASCO.COM with Microsoft SMTPSVC(5.0.2195.6713);
Tue, 9 Sep 2008 23:50:28 +0800
Reply-To: <noreply[at]irs.tax.gov>
From: "Internal Revenue Service"<refund[at]irs.tax.gov>
Subject: IRS Tax Refund
Date: Tue, 9 Sep 2008 11:44:38 -0400
MIME-Version: 1.0
Content-Type: text/plain;
charset="Windows-1251"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2600.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Message-ID: <QBSRV0118AVIBsOPd8s00003098[at]qbsrv011.QBASCO.COM>
X-OriginalArrivalTime: 09 Sep 2008 15:50:41.0312 (UTC) FILETIME=[CFD55A00:01C91393]
To: undisclosed-recipients:;

Internal Revenue Service (IRS)
United States Department of the Treasury

Dear Applicant,

After the last calculations of your annual fiscal
activity we have determined that you are eligible
to receive a tax refund (Stimulus Payment) of $620.50.
Please submit the tax refund request and allow us
6-9 days in order to process it.

A refund can be delayed for a variety of reasons.
For example submitting invalid records or applying
after the deadline.

To access the form for your tax refund, please use the following link:
http://0x7C.0xA.0x7F.0xA4/Internal.Revenue...refund-form.php

Applicant ID: (0x7C.0xA.0x7F.0xA4).

Regards,
Internal Revenue Service

When this is reported, SpamCop does not correctly parse the spamvertised URL, so you need to convert the hex bytes to decimal yourself to find the most effective reporting address. This is the SpamCop output:

Finding links in message body
Parsing text part
Resolving link obfuscation
http://0x7C.0xA.0x7F.0xA4/Internal.Revenue...refund-form.php
Host 0x7c.0xa.0x7f.0xa4 (checking ip) IP not found ; 0x7c.0xa.0x7f.0xa4 discarded as fake.
Tracking link: http://0x7C.0xA.0x7F.0xA4/Internal.Revenue...refund-form.php
No recent reports, no history available
Cannot resolve http://0x7C.0xA.0x7F.0xA4/Internal.Revenue...refund-form.php

Remains a bug, but I've not seen it exploited much, probably because such a way of representing a host is inherently suspicious.
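The output above suggests the host was handed to a name lookup and then dropped. As a rough sketch of the check described earlier (a host whose last label starts with a digit cannot be an ordinary domain name), something like the following could flag such hosts before any DNS lookup is attempted. It uses Python's standard `urllib.parse.urlsplit`; the function name `looks_like_numeric_host` is invented here, and none of this reflects SpamCop's real code.

```python
from urllib.parse import urlsplit


def looks_like_numeric_host(url):
    """True if the URL's host has a last label that begins with a digit."""
    host = urlsplit(url).hostname  # lower-cased host, or None if absent
    if not host:
        return False
    last_label = host.rstrip(".").rsplit(".", 1)[-1]
    return last_label[:1].isdigit()


# The link from the spam, with its path elided exactly as quoted in this thread.
spam_link = "http://0x7C.0xA.0x7F.0xA4/Internal.Revenue...refund-form.php"
print(urlsplit(spam_link).hostname)
print(looks_like_numeric_host(spam_link))
print(looks_like_numeric_host("http://munged.org/"))
```

A host flagged this way would still need converting to a conventional address before a reporting address could be looked up; the sketch only shows where a parser might branch.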

Miss Betsy · April 10, 2009

You might be interested in reading this topic, which discusses, to some extent, why spamcop does not fix these bugs: "blog spot spam".

Basically, spamcop is the wrong tool for spamvertised websites.

Miss Betsy

Wazoo · April 11, 2009

For info, the problem still exists. If there is any SpamCop development going on, it is worth the time reporting bugs or feature requests if we know where they should go - but if there is very little happening, maybe we should know. I'm not interested in discussing stuff just for the sake of it, unless it helps the SC developers.
I think you misunderstood. Perhaps I presented the issue too tersely. It's not "embedded stuff" - the quote was SC output.

Remains a bug, but I've not seen it exploited much, probably because such a way of representing a host is inherently suspicious.

Tracking URL still needed for discussion on the actual parsing output. I'm not interested in trying to recreate your 'posted evidence' into a parsable form, as I still have no idea what the 'actual/real' spam looked like .. and totally not meaning to skip over the fact that no one really wants to 'read' your spam here .... most have enough of their own to handle in one way or another.

rconner · April 11, 2009

Cannot resolve ht tp:/ /0x7C. 0xA.0x7F.0xA4/ Internal.Revenue.Service/refund-form.php

The perp is giving the IP address as a "dotted hex quad" rather than the more conventional decimal form. Here is a place where you can break down this URL into a more usual form.

You mention converting this to a decimal quad by hand. I would be very careful about doing this if I were you, because it runs right smack into the strict rule that we are not supposed to alter the spam we submit to get SpamCop to find things that it would not otherwise find. People get kicked off SpamCop for failing to follow this rule. On the other hand, there's no problem reporting this outside SpamCop if you wish to.

It seems to me that browsers used to support hex quads (and octal quads and other forms as well), but I cannot get my browsers (on the Mac) to recognize this link. Neither whois nor nslookup nor curl command lines will support it either. It's as though everyone has decided to stop permitting these things. Maybe they still work in IE, I'm not sure. Certainly the only plausible use for such a trick is deception, and perhaps that is why browsers now "boycott" it. If they do, then this guy isn't going to get much business.

Wazoo is correct that you need to post this sort of thing under a tracking URL (see here for how to do this). In your post, not only do we see the whole spam (and help the spammer promote it via search-engine crawling), you've also left the links "live" so that anyone can click on them (I munged the link in the quote above to stop the board software from "linkifying" them).

-- rick

On edit, I note that the "fixed" version of this link will not load (timeout), so there may be no justification for reporting it.

StevenUnderwood · April 11, 2009

For info, the problem still exists. If there is any SpamCop development going on, it is worth the time reporting bugs or feature requests if we know where they should go - but if there is very little happening, maybe we should know. I'm not interested in discussing stuff just for the sake of it, unless it helps the SC developers.

There is development going on, but usually not in the area of finding links in the body of the spam messages. It has been this way for quite a while and, IMO, is unlikely to change at any time in the near future.

Cedders · April 12, 2009

You might be interested in reading this topic, which discusses, to some extent, why spamcop does not fix these bugs: "blog spot spam".
Basically, spamcop is the wrong tool for spamvertised websites.

I disagree. SpamCop clearly is used a lot for reporting spamvertised websites and does the task pretty well - I've even received such reports myself when a site we host is used as an authority for a news story (effectively a "joe job" rather than the spammer's intended landing page). Also there is a URIBL based on reports of spamvertised websites reported to SC: SpamAssassin has a rule URIBL_SC_SURBL that hits about 20% of incoming spam with a nice low rate of false positives.

I think this inconsistency in parsing is just an omission. A low-priority one, admittedly, but fixing it would enable SC to behave more consistently from the perspective of both user and abuse desk.

Tracking URL still needed for discussion on the actual parsing output. I'm not interested in trying to recreate your 'posted evidence' into a parsable form, as I still have no idea what the 'actual/real' spam looked like .. and totally not meaning to skip over the fact that no one really wants to 'read' your spam here .... most have enough of their own to handle in one way or another.

:huh: Er... what? A big advantage of reporting via SpamCop is that it anonymises the report. If I don't want the ISP's abuse desk to see the recipient address, I'm not going to post it (or the tracking URL which might still include the receiving server name, although as you can see the report was months old) to a public webpage. I don't want "help" in any case - I wanted to report a bug. And, besides, I don't think I can find the original Tracking URL any more.

It was you who I noticed said it lacked context, so I supplied the context (cutting and pasting it into SC will be parsable, so long as you indent the second line of headers appropriately. It's text/plain and there are no other MIME parts, if that is really relevant.) I cannot understand at all why you say you don't have any idea what it looked like. There it is in post number 3 with only the recipient address and receiving server omitted.

BTW as I do abuse work (even at the weekend), I don't just have my own spam to handle, but that of thousands of users.

The perp is giving the IP address as a "dotted hex quad" rather than the more conventional decimal form. Here is a place where you can break down this URL into a more usual form.

Yes, the dotted hex quad is what SpamCop doesn't parse correctly. The deobfuscator link is indeed a useful one for users, but I know my sixteen-times table anyway.

You mention converting this to a decimal quad by hand. I would be very careful about doing this if I were you, because it runs right smack into the strict rule that we are not supposed to alter the spam we submit to get SpamCop to find things that it would not otherwise find. People get kicked off SpamCop for failing to follow this rule. On the other hand, there's no problem reporting this outside SpamCop if you wish to.

To clarify, I wasn't suggesting altering the headers submitted to SpamCop, and did indeed (I think I recall) report it outside SpamCop manually.

It seems to me that browsers used to support hex quads (and octal quads and other forms as well), but I cannot get my browsers (on the Mac) to recognize this link. Neither whois nor nslookup nor curl command lines will support it either. It's as though everyone has decided to stop permitting these things. Maybe they still work in IE, I'm not sure. Certainly the only plausible use for such a trick is deception, and perhaps that is why browsers now "boycott" it. If they do, then this guy isn't going to get much business.

Firefox 2 and Opera 9 both recognise it. So does wget (which I use in place of curl). IE6 does not, and IE users are usually the more naive users. I agree that the spammer is therefore artificially restricting the number of people who they might defraud. Actually I can't find any references in RFCs to hex quads, but I'd imagine with moves towards IPv6, it's a newer trend rather than one being phased out. Certainly RFC 1630 (1994, formalising URIs) doesn't seem to be conscious of hex quads - I'll test it on Amaya.
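One possible reason some clients accept these forms is that the C library's `inet_aton` traditionally does, and programs that pass the host string to it inherit that behaviour. That is a guess about the clients named above, not something tested here. Python's `socket.inet_aton` defers to the platform routine, so a quick check of what a given system accepts might look like this (the helper name `platform_parse` is invented, and results can vary by platform):

```python
import socket


def platform_parse(host):
    """Ask the platform's inet_aton to read host; return dotted decimal or None."""
    try:
        packed = socket.inet_aton(host)
    except OSError:
        return None
    return socket.inet_ntoa(packed)


for spelling in ("0x7C.0xA.0x7F.0xA4", "0x7C0A7FA4", "2081062820"):
    print(spelling, "->", platform_parse(spelling))
```

Where the platform routine accepts a spelling, the printed value is the conventional dotted-decimal address; where it does not, the helper prints None.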

Wazoo is correct that you need to post this sort of thing under a tracking URL (see here for how to do this). In your post, not only do we see the whole spam (and help the spammer promote it via search-engine crawling), you've also left the links "live" so that anyone can click on them (I munged the link in the quote above to stop the board software from "linkifying" them).
On edit, I note that the "fixed" version of this link will not load (timeout), so there may be no justification for reporting it.

I'm seeing a 404 page in Chinese (Big5), not getting a timeout. Maybe your upstream provider is firewalling it? I don't think there's any risk in posting the link since the page was presumably taken down months ago, and phishing only works in context (if it weren't a 404, it would now be more likely to come up in searches for "spam" than for "IxRxS"). I was reporting the bug in the hope that it's fixed in future - the spam was only an example.

There is development going on, but usually not in the area of finding links in the body of the spam messages. It has been this way for quite a while and, IMO, is unlikely to change at any time in the near future.

Thanks for the info. It would be nice if there were some sourceforge-style bugtracker to keep hold of minor issues like this, but I can understand that SC is not open source and there's no public access to this.

Wazoo · April 12, 2009

Also there is a URIBL based on reports of spamvertised websites reported to SC: SpamAssassin has a rule URIBL_SC_SURBL that hits about 20% of incoming spam with a nice low rate of false positives.

This data exists in the FAQ found here.

besides, I don't think I can find the original Tracking URL any more

Also in the FAQ here, How to get a Tracking URL from a Report ID

It was you who I noticed said it lacked context,

No, I said it was "out of context" .... huge difference. My words include the actual entire spam as submitted ... not snippets or something copied/posted 'here' .... it's what the parser saw and how it was reacted to that's in question.

since the page was presumably taken down months ago,

gotta love playing with things that don't exist anymore, data being months old.

rconner · April 12, 2009

Yes, the dotted hex quad is what SpamCop doesn't parse correctly. The deobfuscator link is indeed a useful one for users, but I know my sixteen-times table anyway.

No insult to your ability intended, I merely wanted to make things explicit for other people who might be reading this.

To clarify, I wasn't suggesting altering the headers submitted to SpamCop, and did indeed (I think I recall) report it outside SpamCop manually.

Actually, the rule explicitly applies to the entire spam (including URLs, etc.), not just the header.

-- rick

Miss Betsy · April 12, 2009

Yes, spamcop does parse some spamvertised URLs correctly. However, they stopped development when spammers started accelerating obfuscation - mostly, IIRC, because Julian, the original developer, thought it was more important to concentrate on the source. At one time, again, IIRC, they were going to drop the spamvertised part altogether, but did not because of the outcry from server admins who use spamvertised URLs for filtering. I don't remember at all whether there is a connection between spamcop and the blocklist that lists spamvertised sites, or what it is if there is one. I seem to think that spamcop helps feed it.

In the beginning, reports to spamvertised sites helped educate legitimate merchants and web hosts that unsolicited email was not a good thing. There were many small merchants who hated spamcop. However, nowadays, very few, if any, reports go to responsible merchants and webhosts, unless the URLs are put there to make the spam look legitimate (not usually a joe-job which has a malicious connotation to deliberately get a site in trouble). Innocent bystander URLs are another reason why spamcop decided to discontinue development, I bet. The rest of the reports either go directly to the spammer or to spam-friendly web hosts. Keeping a list of spamvertised URLs can be valuable for tracking, but it is practically useless to send spamcop reports.

Now, there are other reports that can be sent and spamcop can be a tool for the first step in deciding where the reports should go, but the rest of it is pretty much manual - though Complainterator and Knujon seem to be other tools that may be helpful. I also think that spamcop.routing might still be active for some fine tuning, but not deobfuscation.

Spamcop's developers seem to have decided to concentrate on the source and on feeding the spamcop blocklist which does not involve spamvertised websites, but that is not an 'official' statement of policy. No 'official' spamcop source ever posts in this forum so it is conjecture whether anyone 'official' even reads it.

Miss Betsy

Lking · April 12, 2009

gotta love playing with things that don't exist anymore, data being months old.

Good point. The original example (at least the OP) is from Sep 08. Don't I remember reading that URLs are only good for 6 months?

Er... what? A big advantage of reporting via SpamCop is that it anonymises the report. If I don't want the ISP's abuse desk to see the recipient address, I'm not going to post it (or the tracking URL which might still include the receiving server name, although as you can see the report was months old) to a public webpage.

The tracking URL displays the obscured (munged) email/spam. The same URL is included in the report to the ISP's abuse desk. So no additional information would be exposed by posting the URL on this public webpage. And yes, the receiving server name(s) are included.

You may want to send yourself a spam report so you can see what is sent (for you) by SC. You may want to adjust your reporting options.

Wazoo · April 12, 2009

Deputies contacted with a request for their specific input.
