
Google redirect parsing


Jank1887


A related topic has been discussed (see Website redirectors, How to trace?) under Reporting Help, but that thread dealt specifically with an obfuscated URL embedded in the Google redirect. Recently I've been getting a lot of spam using this Google redirect. See the following example spam parse:

http://www.spamcop.net/sc?id=z867573303z0d...88634396882315z

Now, obviously Google doesn't want to hear about this, and (arguably) they shouldn't need to for an automated redirect system (see the discussion here: tricking with google, hiding spamvertised site).

Personally, I hardly see any reason for these to exist in the first place (well... laziness...), but since they do, it seems to me that for certain redirects it would take little effort for the parser to jump to the target when identifying the spamvertised site. The parser already takes the time to deobfuscate the target link. How hard would it be to check whether the initial string is "http://www.google.com/url?q=" and, if so, to examine everything after the = ? Whether or not the parser still reports to Google, it could additionally send a non- (less?) useless report to the actual site host.
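Just to sketch the idea (the example.com target is made up, and this is only an illustration, not SpamCop code):

PREFIX = "http://www.google.com/url?q="

link = "http://www.google.com/url?q=http://example.com/landing"   # hypothetical spam link
if link.startswith(PREFIX):
    target = link[len(PREFIX):]   # everything after the "q=" -> "http://example.com/landing"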

I realize this could get back to the whole argument about the usefulness of identifying/reporting spamvertised sites. Let's not go THERE again. The parser currently makes a (nontrivial) effort at identifying the site; this seems like it would improve that functionality at minimal effort, redeem the currently wasted parse time, and have little (?) chance of causing additional errors or misreports.

Thoughts?


Now, the issue of extra overhead, and the value thereof, always comes up.

The system already processes the whole link... no extra there.

Then, it always tries to send a report to Google, always marked as "isp does not wish to receive reports"... making the whole process a waste of time. Might as well put in

if (startswith(text, "http://www.google.com/url?q=")) ignorelink();

with a change:

if (startswith(text, "http://www.google.com/url?q=")) reparse(stripfirst28chars(link));

So, it would cost about the overhead of parsing a second link, but it would get a (possibly) functional result.
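Spelled out a bit more, something along these lines; reparse() here is just a stand-in for handing the stripped URL back to the normal parse, and none of this is SpamCop's actual code:

PREFIX = "http://www.google.com/url?q="   # exactly 28 characters

def reparse(url):
    # Stand-in for feeding the embedded URL back through the parser.
    print("would re-parse:", url)

def handle_link(link):
    if link.startswith(PREFIX):
        # Current behaviour: effectively ignorelink(), since Google is marked
        # "isp does not wish to receive reports".
        # Proposed behaviour: strip the first 28 characters and parse the target.
        reparse(link[len(PREFIX):])
    # anything else continues through the normal parse unchanged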


  • 2 weeks later...

Not that it means a whole lot, but I agree also. I actually came to post the same suggestion after seeing an obvious Google redirect to a spam site.

I don't expect SpamCop to decode all URLs with a 100% success rate or anything, but it would be nice if blatant autoredirects from big names like Google and Yahoo could be parsed down to the URL they redirect to. While those companies may be contributing to the spam problem with their autoredirects, shutting down the targets will teach spammers that redirects aren't an easy way of getting their URLs in (and hopefully they'll eventually stop using them).


  • 2 months later...
That's the issue. It still can't pick "http://hengheng.ath.cx//pub/colappmgr/colportal/index.htm"

out of "http://www.google.com/url?sa=u&start=4&q=http://hengheng.ath.cx//pub/colappmgr/colportal/index.htm".

I tend to manually submit those links.


Difference of opinion...get Google to stop redirecting, link is useless.


It picks up google.com for me.  What is it getting for you?


I parsed it a few times to see if it would pick up the correct URL. Most of the time it found Google; at least once it didn't parse at all and just spat the URL back with no host resolution or any other info about it.

Difference of opinion...get Google to stop redirecting, link is useless.


I agree, but that falls outside of what SpamCop does. Getting Google to change their policies or the way their sites function is not what this service is about. For the purposes of the parser, it needs to be able to find the correct URL to be effective. Pursuing Google to get them to change should be a separate campaign, outside of reporting spam and spamvertised websites.
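For what it's worth, pulling the real target out of a redirect like the hengheng.ath.cx example above is mechanically simple even when q is not the first parameter; a rough sketch of the extraction (plain standard-library Python, purely illustrative, not SpamCop's implementation):

from urllib.parse import urlsplit, parse_qs

def google_target(link):
    parts = urlsplit(link)
    if parts.netloc.endswith("google.com") and parts.path == "/url":
        q = parse_qs(parts.query).get("q")
        if q:
            return q[0]   # the embedded spamvertised URL
    return None           # not a recognised Google redirect

# google_target("http://www.google.com/url?sa=u&start=4&q=http://hengheng.ath.cx//pub/colappmgr/colportal/index.htm")
#   -> "http://hengheng.ath.cx//pub/colappmgr/colportal/index.htm"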


I agree, but that falls outside of what SpamCop does. 


I again disagree...Google is the "host" of that link (your connection goes to Google to resolve it), and SpamCop reports the hosts of the spamvertized links it finds. This is a very old argument, with all of it already hashed out many times in this thread; no sense going through it again.

Sure, there are different ways of looking at it; no need to hash it out again. But alas, nothing is being done. I'd like to hear an official SpamCop position on it, because the parser is still only finding Google:


You can certainly ask ... links have been provided at several points within the Forum, the FAQs, etc. If for some reason you get an answer that has never been provided to anyone else in the years of asking "why does this ...? why doesn't this ..." it would be appreciated if you'd post that response.


And while catching up on other things, I came across this link .... Spammers hitch a free ride on car site ....

There's a legitimate use for the re-direct feature at the Autotrader.com home page. The site sports a bunch of banner ads, which, if clicked, send you to the advertisers' sites (while a ca-ching sounds in AutoTrader.com's accounts receivable department).

...

I imagine some spammers have a script that scours the Internet looking for sites with open re-directors. Others probably just use Google. Either way, re-directs are just another item in the devious spam king's bag of tricks.


  • 1 month later...

Could it be that the powers that be are paying attention?

Just ran a parse yesterday that had a Google redirect in it. Here's the tracking URL: http://www.spamcop.net/sc?id=z977207459zbb...08dcf9d2b91508z

Cut and paste:

Resolving link obfuscation

http://www.google.com/url?q=%68%74%74%70%3...%6fm/hn/?a=1243

Percent unescape: http://www.google.com/url?q=http://junonglois.com/hn/?a=1243

Removing Google re-director: http://junonglois.com/hn/?a=1243

Host junonglois.com (checking ip) IP not found ; junonglois.com discarded as fake.

Host junonglois.com (checking ip) IP not found ; junonglois.com discarded as fake.

Tracking link: http://junonglois.com/hn/?a=1243

...

The other day it found an IP for the actual target site and offered a LART address. Today it gave the "cannot resolve" error. BUT... that means they did add code to strip some obvious redirectors, and the actual URLs can then run through the standard ID'ing process.
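For anyone curious, the steps in that log boil down to a percent-unescape, pulling the target out of the redirector, and a DNS lookup that finds no IP for junonglois.com; a rough approximation of the last two checks (standard-library Python, definitely not SpamCop's actual code):

import socket
from urllib.parse import unquote, urlsplit

def percent_unescape(link):
    # The "Percent unescape" step: %68%74%74%70... -> http...
    return unquote(link)

def host_resolves(url):
    # Rough equivalent of "Host ... (checking ip)": try a DNS lookup on the hostname.
    host = urlsplit(url).hostname
    try:
        socket.gethostbyname(host)
        return True
    except (socket.gaierror, TypeError):
        return False   # "IP not found ... discarded as fake"

# After the re-director is removed the parser is left with
# "http://junonglois.com/hn/?a=1243"; if the lookup fails, the host is
# discarded as fake, exactly as the log above shows.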

Edit: not sure if anyone who cares wants to merge this info into any other relevant threads (link analysis process, etc.). In the interest of not littering the forum with redundant posts, I'll leave that decision up to people brighter than I. :)


Archived

This topic is now archived and is closed to further replies.
