FAQ Entry: The Link Analysis Process


Jeff G.

Link analysis is performed by the SpamCop Parser, part of the SpamCop Parsing and Reporting Service.

Finding links in the message body is the first step of the process. The Parser steps through the body (if any) and each attachment that could contain a link (if any). It skips attachments that contain images and reduces redundant links as necessary. It doesn't actually display the links it finds in this step. It sometimes fails to find links that are really there - refreshing usually helps.

Resolving link obfuscation is the middle step of the process. The Parser displays each link it found, followed by any deobfuscation that is necessary, followed by the IP Address of the link's host (a lookup of the A DNS Record), followed by the canonical name of that IP Address (a lookup of the PTR DNS Record). It frequently fails to start looking up the IP Address - refreshing usually helps. It also sometimes fails to resolve the IP Address, especially with the domains of spammers who are playing fast and loose with the Domain Name System, producing "ip not found" and "discarded as fake." messages - refreshing usually helps, and parsing the URL only in a separate browser window usually helps in stubborn cases when refreshing hasn't been helping.
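
The two lookups named in this step (A record, then PTR record) can be sketched in Python. This is only an illustration of the general technique, not the Parser's actual code; the function name `resolve_link_host` and its injectable `forward`/`reverse` parameters are hypothetical, chosen so the logic can be exercised without touching the network.

```python
import socket
from typing import Callable, Optional, Tuple


def resolve_link_host(
    host: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (ip, canonical_name); either may be None if its lookup fails.

    Hypothetical helper: `forward` does the A record lookup and
    `reverse` does the PTR lookup.
    """
    try:
        ip = forward(host)  # A record: host name -> IP address
    except OSError:
        # socket.gaierror is an OSError; roughly the "ip not found" case
        return None, None
    try:
        name = reverse(ip)[0]  # PTR record: IP address -> canonical name
    except OSError:
        # socket.herror is an OSError; many hosts simply have no PTR record
        name = None
    return ip, name
```

Called as `resolve_link_host("www.example.com")` it uses the system resolver, so the result depends on the DNS servers the machine happens to use, which is one plausible reason the same host can resolve for a poster but not for the Parser.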

Tracking links is the final step of the process. The Parser again displays each link it found and was able to resolve (deobfuscated if necessary), again followed by the IP Address, and then the email addresses in the whois lookups of that IP Address from cache or (if the cached entry is stale or nonexistent) from ARIN and other appropriate Registries (there is currently a known issue with lookups of contacts at APNIC), followed by the abuse.net lookups of those email addresses (if those addresses are for role accounts), and finally a list of best contacts. It sometimes fails to start this step - refreshing usually helps. If it fails to resolve the IP Address, it displays a "Cannot resolve" message.

Please make sure this email IS spam: indicates the end of the link analysis process.

If you get tired of refreshing, please send a Manual Report for the URL(s).

I believe all the failures described above are known issues; I just wanted to document them in one Topic.

See also: SpamCop reporting of spamvertized URLs and a contribution from Don in that Topic.

Edit: 2005/07/01 23:13 EDT -0400 Jeff G. added messages and Manual Report. Also added APNIC, toned down the rhetoric, and added " (if those addresses are for role accounts)".

Edit: 2005/10/29 18:44 EDT -0400 Jeff G. added references to SpamCop reporting of spamvertized URLs and a contribution from Don in that Topic.

Edited by Jeff G.
The factual stuff - valid.

The "usually refresh ..." thing is a bit touchy ...

On a spam source that's 'normal' ... say your typical compromised high-speed U.S. cable connected zombified system ... the parser works just fine, all the way through.

On a spam source that's spammer controlled ... DNS, web-site, etc. .... there seem to be more failures than not. The first 'easy' reason is the timeouts in the look-ups, though the reasons for these range from simply being badly configured servers to the outright blocking of certain/specific IP addresses.

On a spam source from our favorite ISPs, the above applies in addition to some ignorant/explicit mis-configuration of server and server data. As seen in another Topic, what was the 'admin' person on when dreaming up an "abuse" address of 1385902234[at]someISP ??????

The "refresh until it works" bit also tends to consume more resources at the SpamCop end. Once upon a time, there was a cache thing involved, and the "refresh" function seemed to take advantage of the fact that, although the look-up thread usually timed out during a specific spam parse, the look-up may actually have eventually worked .. such that refreshing the parse then caused the next look-up to see and use the cached data. Although it is seen that some data does actually get cached (another Topic/Discussion about the Refresh link in the middle of a parse result) .... there is something else going on with some of the spam over the last few months to a year ... just what that is appears to fall under the "Julian isn't going to talk about it" scenario. Truth be told, most of these "failed to parse" results end up going to ISPs that don't give a hoot about receiving a complaint/report anyway, so it's hard to point to anything being "lost" when these URLs fail to come up as targets.
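
The time-out-then-cache behaviour described above can be sketched roughly as follows. This is a guess at the general mechanism for illustration only, not SpamCop's code; `CachingResolver`, `slow_lookup`, the address `192.0.2.1`, and the time-out values are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as LookupTimeout
from typing import Callable, Dict, Optional


class CachingResolver:
    """Hypothetical resolver: a timed-out look-up may still fill the cache."""

    def __init__(self, lookup: Callable[[str], str], timeout: float) -> None:
        self._lookup = lookup
        self._timeout = timeout
        self._cache: Dict[str, str] = {}
        self._pool = ThreadPoolExecutor(max_workers=4)

    def _lookup_and_store(self, host: str) -> str:
        ip = self._lookup(host)
        self._cache[host] = ip  # stored even if the caller has already given up
        return ip

    def resolve(self, host: str) -> Optional[str]:
        if host in self._cache:
            return self._cache[host]  # a later "refresh" lands here
        future = self._pool.submit(self._lookup_and_store, host)
        try:
            return future.result(timeout=self._timeout)
        except LookupTimeout:
            # This parse reports a failure; the look-up keeps running.
            return None


def slow_lookup(host: str) -> str:
    """Stand-in for a slow DNS server (assumed, for demonstration)."""
    time.sleep(0.2)
    return "192.0.2.1"
```

With `CachingResolver(slow_lookup, timeout=0.05)`, the first `resolve()` call gives up and returns `None`, while a call made after the background look-up has finished is answered from the cache. Note that in this sketch every refresh that misses the cache starts another look-up, which matches the point about refreshing consuming more resources.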

Personally, I'd rather suggest that if these failed items 'demand' reporting, then the "Manual Reporting" FAQ item needs to be pointed to / used.

Dropped a request off in the spamcop newsgroup asking for input to this item, a thread that has been going on for a while ... http://news.spamcop.net/pipermail/spamcop-...ead.html#101292


  • 2 weeks later...

I seem to keep getting the same spam at work, where I have Outlook 2000 and use SpamDeputy to help report. I noticed that some of the same links that were NOT being found and offered for reporting as recently as yesterday are now being found!

Just an FYI and a HUGE thanks! :D


It is not surprising at all that many of us have expressed frustration when analysis failed. If we take our time to do the analysis, it is because we also send reports to government agencies in the hope that some action will be taken down the line, regardless of the refractoriness of the ISPs involved, and in spite of that.

Edited by dra007

  • 2 weeks later...
  • 2 weeks later...
Resolving link obfuscation is the middle step of the process.  The Parser displays each link it found, followed by any deobfuscation that is necessary, followed by the IP Address of the link's host (a lookup of the A DNS Record), followed by the canonical name of that IP Address (a lookup of the PTR DNS Record).  It frequently fails to start looking up the IP Address - refreshing usually helps.  It also sometimes fails to resolve the IP Address, especially with the domains of spammers who are playing fast and loose with the Domain Name System, producing "ip not found" and "discarded as fake." messages - refreshing usually helps, and parsing the URL only in a separate browser window usually helps in stubborn cases when refreshing hasn't been helping.

Hm.. refreshing helps? Perhaps an explanation why this helps.

I just reported a spam with the url http://jw.1UW.affordablekinginventions.com/4h/

The SpamCop system repeatedly said "host jw.1uw.affordablekinginventions.com (checking ip) ip not found" - refreshing did not help at all. And the address resolved here all the time.


snaller, I just used the Parser to parse just that URL twice. The first time, it couldn't resolve the IP Address, and the second time, it could resolve it and offered shengjun.zheng<at>fibrlink.net and wei.deng<at>fibrlink.com as reporting email addresses for IP Address 210.72.224.49. In my experience troubleshooting this particular issue, parsing just the URL independently until the Parser resolves the IP Address helps to increase the likelihood that parsing a spam that includes the URL will include resolution of the IP Address. Perhaps this is because parsing of individual URLs uses a longer timeout or a different algorithm or source for DNS resolution, and parsing of spams relies in part on the cached results of the parsing of individual emails. It may also depend on which servers in the farm you hit.


  • 7 months later...

I have been getting a lot of spam lately advertising sites hosted on geocities in various countries. For some reason spamcop's parser usually doesn't pick up on these. I've been reporting them manually, since Yahoo will address the problem and discontinue the sites when it gets a report, but it would be nice to be able to do it via one step with Spamcop. The extra 24 hours it might take for a report to be received is money in the bank for the spammer.

The other issue is when instead of a site being advertised, the email advises people to contact an email address. The parser doesn't find those, but again, geocities will terminate their accounts if they get the report.


Hi AlphaCentauri, it's been a long time! Do you have a tracking URL of a case where the parser doesn't pick up the geocities hosted site? Many may be familiar with these, but I (for one) am not.

The email addresses, as used in 419 scams etc, were discussed in http://forum.spamcop.net/forums/index.php?...indpost&p=35473 maybe other threads as well. Did you look at that one?


Here's one from yesterday:

http://www.spamcop.net/sc?id=z884407155zab...5528d30bd61082z

Edit: 2006/02/24 14:49 EST -0500 Jeff G. replaced the posted spam email message (against the rules here) with a Cancelled Tracking URL.

Edited by Jeff G.

I have been getting a lot of spam lately advertising sites hosted on geocities in various countries. For some reason spamcop's parser usually doesn't pick up on these. I've been reporting them manually, since Yahoo will address the problem and discontinue the sites when it gets a report, but it would be nice to be able to do it via one step with Spamcop.

Yes, it would be nice. I have found the formula for the addresses to be cc-geo-abuse[at]yahoo-inc.com, where cc is the country code; in this case, for country code es, the address would be es-geo-abuse[at]yahoo-inc.com.

Edited by Jeff G.
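
The address formula in this post can be written out as a small helper. This is only a sketch of the pattern as reported here; the function name `geocities_abuse_address` is hypothetical, and treating a two-letter leading label of the host name (as in es.geocities.com) as the country code is an assumption, not something the post states. The "[at]" obfuscation used in this Topic is kept as-is.

```python
from typing import Optional


def geocities_abuse_address(host: str) -> Optional[str]:
    """Build cc-geo-abuse[at]yahoo-inc.com for a cc.geocities.com host.

    Hypothetical helper. Assumes the two-letter leading label is the
    country code; returns None for any host that does not fit.
    """
    labels = host.lower().rstrip(".").split(".")
    if len(labels) != 3 or labels[1:] != ["geocities", "com"]:
        return None
    cc = labels[0]
    if len(cc) != 2 or not cc.isalpha():
        return None  # e.g. "www" is not a country code
    return f"{cc}-geo-abuse[at]yahoo-inc.com"
```

For example, `geocities_abuse_address("es.geocities.com")` yields the es address given above, while a host such as www.geocities.com yields `None` because no country code can be read from it.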

I also had the problem of links not being resolved at all, refreshing doesn't help. It's not that the domain translation fails, simply nothing happens at all:

Example of no resolving

The example shows this happening with geocities.com, but I also had this problem with other domains.

Edit: 2006/02/24 18:30 EST -0500 Jeff G. changed the Tracking URL to one usable by all.

Edited by Jeff G.

ca.geocities.com/jaynell21539jason35054/ is resolving now.

Just for fun, I put that address in the parsing window and got:

Parsing input: ca.geocities.com/jaynell21539jason35054/

Host ca.geocities.com/jaynell21539jason35054/ (checking ip) IP not found ; ca.geocities.com/jaynell21539jason35054/ discarded as fake.

ALL Geocities URLs have extreme trouble parsing, i.e. I haven't seen one work first time yet, and the average number of reloads (it varies) before SpamCop actually does something with the URL seems to be going up. This is very unfortunate, because it would seem that at least one prolific spammer is using Geocities as his host of choice.

HOWEVER, "ca.geocities.com" parses first time both times I tried it. If anyone is looking into this bug, that might be a clue.


I put "http://" in front of it ("ca.geocities.com/jaynell21539jason35054/"), as in http://www.spamcop.net/sc?track=http%3A%2F...39jason35054%2F.
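
Why the "http://" prefix can matter is easy to show with a standard URL parser. The sketch below uses Python's `urllib.parse.urlsplit`; it says nothing about how the SpamCop Parser is actually written, and the helper name `host_for_lookup` is hypothetical. Without a scheme, a standard parser treats the whole string as a path and finds no host at all, which is at least consistent with the "Host ca.geocities.com/jaynell21539jason35054/ ... discarded as fake" output above.

```python
from urllib.parse import urlsplit


def host_for_lookup(link: str) -> str:
    """Return the lower-cased host name to look up, or "" if none is found.

    Hypothetical helper: adds "http://" when the link has no scheme,
    because urlsplit() only recognises a host after "//".
    """
    if "://" not in link:
        link = "http://" + link
    return urlsplit(link).hostname or ""


# With no scheme, urlsplit() sees only a path and no host:
bare = urlsplit("ca.geocities.com/jaynell21539jason35054/")
print(repr(bare.hostname), repr(bare.path))

# With the scheme added, the host is separated from the path:
print(host_for_lookup("ca.geocities.com/jaynell21539jason35054/"))
```

The `hostname` attribute is also lower-cased by `urlsplit`, much as the Parser reported jw.1UW.affordablekinginventions.com in lower case earlier in this Topic.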

This bug in the parser even with common domains like xx.geocities.com is still there and seems to be getting worse. Entering a geocities subdomain into a separate window results in an immediate response while a full spam submitted still results in inconsistent failures. I have seen the same spam, including geocities spam, on different refreshes result in:

- no links found

- links found and simply dropped with no error message

- the first link properly identified with the rest dropped

Normally, geocities links in spam are simply dropped, with even dozens of refreshes failing to resolve them. In the past, refreshes worked more often.

At least one spammer is exploiting this with repeated use of geocities links. Some now include several such links.

The inconsistent behavior and poor error messages detract from the credibility of all url parsing -- one never knows if all the links have been found, if "no links found" is believable, or if the reporting addresses are complete. This bug has been well known for a very long time. It and the fact that it has evidently been ignored are extremely frustrating and annoying, it has caused much wasted time, and spam reporting is being delayed or dropped because of it.


If you expect that will change in the future you might have to wait for a very long time...most of us don't even bother reporting the URLs in the spam and resort to quick reporting instead.. Without delving into it too much it seems that you are facing an upstream swim...It is simply not a priority of SpamCop..like it or leave it..


  • 2 weeks later...
most of us don't even bother reporting the URLs in the spam and resort to quick reporting instead.. Without delving into it too much it seems that you are facing an upstream swim...

That's very unfortunate, since I believe that Yahoo! is responsive to complaints about GeoCities-hosted spam sites.

It is simply not a priority of SpamCop..

You're not the first to say that but that, too, is very unfortunate, since IIRC back in the days of prehistory SpamCop's philosophy was exactly the opposite: that reporting a compromised system after it has been hijacked to relay spam was too late, not to mention a neverending battle to exhaust a practically infinite resource. The real value was in taking down the spammer's web sites, interfering with their revenue. In some cases this is no longer true due to 'bulletproof' spam hosting but, where it's likely or even possible that the spammer may be shut down and may have to move on, I'll take a moment to contribute to that.

like it or leave it..

You are not the first person to say that, either, and you may well be correct, but if everyone in history had just decided to accept what was there or forget about it instead of trying to help and improve things, we'd be living in a pretty crappy world. None of us would be here if Julian had ever accepted that e-mail was e-mail, like it or leave it. Maybe nothing that I will ever do will result in significant improvement, but I refuse to sit back and not try.

I do like SpamCop, and shame on all you "like it or leave it" people for failing to recognize others' efforts to try to improve a good thing.

On a more positive note, I see lots of info on how to contact JT, the deputies, and other fine folk who, unfortunately, have no control over the reporting mechanism. Does anyone have contact information for 'the powers that be'?

Thanks to all who contribute, in whatever capacity.


I see lots of info on how to contact JT, the deputies, and other fine folk who, unfortunately, have no control over the reporting mechanism.  Does anyone have contact information for 'the powers that be'?

You may contact 'the powers that be' via the SpamCop Deputies and SpamCop Admin.