
Preventing spam on URL shorteners


samrudge

I run a URL shortener. We're doing pretty well (~1M clicks a month at the moment), but I'm pulling my hair out trying to deal with spammers.

Right now, we only allow a link to be shortened if:

  • It is not listed in DBL, SURBL or Google Badware
  • The IP the request came from is not in the SBL, XBL or SpamCop
  • The target domain has a Web of Trust 'Trustworthiness' rating of >40 (only if confidence is >=12)
  • The target URL has a MIME type of text/* or image/*
  • The IP of the server hosting the target domain is not listed in hpHosts
  • The target domain is not in our internal list of manually blocked domains
  • Regular human moderation of recently created links
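The automated checks in that list can be combined into a single gate. This is a minimal sketch in Python, assuming the blocklist, hpHosts and WOT lookups have already been done elsewhere and their results are passed in. The `TargetInfo` shape and all function names are hypothetical, and the WOT rule is read as "apply the >40 threshold only when confidence is at least 12".

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class TargetInfo:
    """Facts gathered about a target before shortening (hypothetical shape)."""
    domain: str
    mime_type: str                 # e.g. "text/html; charset=utf-8"
    wot_trust: Optional[int]       # WOT 'Trustworthiness' rating, 0-100
    wot_confidence: Optional[int]  # WOT confidence in that rating
    listed_in_domain_bls: bool     # result of the DBL / SURBL / Google Badware checks
    creator_ip_listed: bool        # result of the SBL / XBL / SpamCop checks
    host_ip_in_hphosts: bool       # result of the hpHosts check on the hosting IP


def wot_ok(trust: Optional[int], confidence: Optional[int]) -> bool:
    # Assumption: with no rating, or confidence below 12, the WOT rule is skipped.
    if trust is None or confidence is None or confidence < 12:
        return True
    return trust > 40


def mime_ok(mime_type: str) -> bool:
    # Compare only the top-level type, ignoring parameters such as charset.
    top_level = mime_type.split(";", 1)[0].strip().lower().split("/", 1)[0]
    return top_level in ("text", "image")


def may_shorten(t: TargetInfo, manual_blocklist: Set[str]) -> bool:
    """True only if every automated check passes; human moderation happens later."""
    return (
        not t.listed_in_domain_bls
        and not t.creator_ip_listed
        and wot_ok(t.wot_trust, t.wot_confidence)
        and mime_ok(t.mime_type)
        and not t.host_ip_in_hphosts
        and t.domain.lower() not in manual_blocklist
    )
```

Keeping the gate a pure function of already-fetched facts makes it easy to re-run the same decision during the periodic re-checks.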

Before a link is shortened, we send a single GET request and detect any 30x redirects or JavaScript redirects (we use Ghost.py for this). We then run the spam detection on the URL being redirected to, and the short link points directly at that final target, so the redirection can't be changed after the link is saved.
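The HTTP part of that redirect handling can be sketched as a loop that follows 30x responses up to a hop limit and returns the final URL, which is what gets checked and stored. This is a hypothetical sketch: `fetch` is an injected function standing in for a real HTTP client, and JavaScript redirects (which Ghost.py handles in the setup described above) are not covered.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# fetch(url) -> (status_code, Location header or None). Hypothetical hook that
# would wrap a real HTTP client in production.
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_final_url(url: str, fetch: Fetch, max_hops: int = 10) -> str:
    """Follow HTTP 30x redirects and return the final URL to check and store."""
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            raise ValueError("redirect loop at " + url)
        seen.add(url)
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise ValueError("too many redirects")
```

Raising on loops and long chains (rather than returning the last URL seen) means a suspicious redirect chain is rejected instead of half-checked.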

We re-run all these checks every 6 hours and return `410 Gone` for any links that become spam after they're created.
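A minimal sketch of the six-hourly re-check and the `410 Gone` response, assuming each link stores when it was last checked plus a flag set by the re-check. The `ShortLink` record and function names are hypothetical, and using `301` for healthy links is an assumption rather than something stated in the post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

RECHECK_INTERVAL = timedelta(hours=6)


@dataclass
class ShortLink:
    code: str
    target: str
    last_checked: datetime
    flagged: bool = False  # set when a re-check finds the target has become spam


def due_for_recheck(links: List[ShortLink], now: datetime) -> List[ShortLink]:
    """Links whose last check is at least six hours old."""
    return [link for link in links if now - link.last_checked >= RECHECK_INTERVAL]


def redirect_response(link: ShortLink) -> Tuple[int, dict]:
    """Status code and headers to send when someone follows the short link."""
    if link.flagged:
        # 410 tells clients and crawlers the link is gone for good.
        return 410, {}
    return 301, {"Location": link.target}
```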

We do all that, and it's still not enough.

Today we've had quite a large influx of spam links being created: 417 links, each created by a different IP address, with the targets split over 112 domains. All seem to be compromised sites that have had a single HTML file uploaded. I won't link to it, but it's branded as a Facebook page with the text "Save the file and run! It is lol :)" and then tries to download an .exe file (obviously something not nice).

I've now blocked and removed all 417 links, so they're all safe. The way I ended up doing it: after our spam checker has done its MIME type and redirect checks, it now MD5-sums the content, and I added the hash of the spam page to a manual blocklist. I haven't yet figured out what's linking to these, but they've had about 20-30 hits per domain so far. (The first link was created at 15:37 UTC; I'd put this system in place and re-scanned all links created since then by 16:28 UTC, so hopefully not many people have been affected.)
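The MD5 content blocklist described above could look something like this. It is a sketch assuming the page body has already been fetched as bytes (after the MIME type and redirect checks); the class and function names are hypothetical.

```python
import hashlib


def content_md5(body: bytes) -> str:
    """Hex MD5 digest of the fetched page body."""
    return hashlib.md5(body).hexdigest()


class ContentBlocklist:
    """Manual blocklist of known-bad page bodies, keyed by MD5 digest."""

    def __init__(self) -> None:
        self._hashes = set()

    def add(self, body: bytes) -> None:
        self._hashes.add(content_md5(body))

    def is_blocked(self, body: bytes) -> bool:
        return content_md5(body) in self._hashes
```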

So here's my question: how do I stop things like this? It would only take adding

`<!-- <?php echo time(); ?> -->`

to the page to get past the system I'm using to block the links now, and even now none of the domains or source IPs are in any of the lists we check against. Had I been out today, it could have been 24 hours before I caught these links and managed to stop them.
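To illustrate the weakness, and one partial mitigation: an exact MD5 changes as soon as a timestamp comment is added, while hashing a normalized copy (HTML comments removed, whitespace collapsed) does not. This normalization is a swapped-in technique, not something from the post, and it only defeats this specific trick; any change to visible text or markup still produces a new hash.

```python
import hashlib
import re

# Non-greedy match so each comment is removed separately; DOTALL spans newlines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def normalized_md5(html: str) -> str:
    """MD5 of the page with HTML comments dropped and whitespace collapsed."""
    stripped = COMMENT_RE.sub("", html)
    collapsed = " ".join(stripped.split())
    return hashlib.md5(collapsed.encode("utf-8")).hexdigest()


page = "<html><body>Save the file and run!</body></html>"
# The same page with a per-request timestamp comment injected.
variant = page.replace("</body>", "<!-- 1325419200 --></body>")
```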

I've tested about 20 of the links and none of them are blocked by bit.ly or TinyURL, so the spammer just has to switch to one of those to keep on spamming for a while.

I hate being a spam "enabler", but I'm pretty much out of ideas for stopping people from using our service for bad stuff.

P.S. I'm talking to our ISP at the moment about the best way to contact the website owners and inform them about the spam page. All the sites I checked seemed to be perfectly normal sites with this one bad HTML file in the root, so I'd guess the site owners don't know they've been compromised.


As Geek said, thanks for actually trying to stop the spammers from abusing your system. While others may also go to the same lengths that you do, it sure doesn't seem like it.

Project Honey Pot is another one you could check. They have a large focus on scrapers and comment spammers, which seems like it might overlap with the types who would use URL shorteners to obfuscate their spam links. I'm not sure how much extra it would block above what you're already doing, but if it's not too much work, might as well check there too.
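Project Honey Pot's IP data is exposed through its http:BL DNS service. As I understand its documentation, you query `<access key>.<reversed IP>.dnsbl.httpbl.org`, and a listed IP answers with `127.<days since last activity>.<threat score>.<visitor type>`, where the visitor type is a bitmask (1 suspicious, 2 harvester, 4 comment spammer; 0 means search engine). A sketch of building the query and decoding the answer, assuming you have registered for an access key; check the current http:BL docs before relying on these values.

```python
import ipaddress
import socket
from typing import NamedTuple, Optional


class HttpBLResult(NamedTuple):
    days_since_last_activity: int
    threat_score: int
    visitor_type: int  # bitmask: 1 suspicious, 2 harvester, 4 comment spammer (0 = search engine)


def httpbl_query_name(access_key: str, ip: str) -> str:
    """<key>.<reversed IPv4 octets>.dnsbl.httpbl.org"""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return f"{access_key}.{'.'.join(reversed(octets))}.dnsbl.httpbl.org"


def decode_httpbl_answer(answer: str) -> Optional[HttpBLResult]:
    octets = [int(part) for part in answer.split(".")]
    if len(octets) != 4 or octets[0] != 127:
        return None  # not a well-formed http:BL answer
    return HttpBLResult(octets[1], octets[2], octets[3])


def httpbl_lookup(access_key: str, ip: str) -> Optional[HttpBLResult]:
    try:
        answer = socket.gethostbyname(httpbl_query_name(access_key, ip))
    except socket.gaierror:
        return None  # NXDOMAIN: the IP is not listed (or the lookup failed)
    return decode_httpbl_answer(answer)


def is_comment_spammer(result: HttpBLResult) -> bool:
    return bool(result.visitor_type & 4)
```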

From the other end of things, please make it easy to investigate and report your shortened links. For example, adding a "+" to the end of a bit.ly link will take you to info about the target URL, rather than immediately forwarding you to the other site. You could even put an option right on that page to report it as an inappropriate target (spam, malware, illegal content, etc.). It might run the risk of getting abused or turning into a very complex system, but you could even have some sort of automated blocking/warning system if a certain forward gets enough reports from users.
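Both ideas can be sketched in a few lines, assuming a hypothetical router that receives the request path and a report form that records the reporter's IP: a trailing `+` switches from redirect to info page, and a link is held for review once enough distinct IPs have reported it. The threshold of 5 is an arbitrary placeholder; counting distinct IPs rather than raw reports is one way to make a single person hammering the report button less effective.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

REPORT_THRESHOLD = 5  # hypothetical: tune against real abuse of the report form


def route(path: str) -> Tuple[str, str]:
    """'/abc' redirects; '/abc+' shows an info page about the target instead."""
    code = path.lstrip("/")
    if code.endswith("+"):
        return ("info", code[:-1])
    return ("redirect", code)


class ReportTracker:
    def __init__(self, threshold: int = REPORT_THRESHOLD) -> None:
        self.threshold = threshold
        # link code -> distinct reporter IPs
        self._reporters: DefaultDict[str, Set[str]] = defaultdict(set)

    def report(self, code: str, reporter_ip: str) -> bool:
        """Record a report; return True once the link should be held for review."""
        self._reporters[code].add(reporter_ip)
        return len(self._reporters[code]) >= self.threshold
```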


...And I add my thanks, as well! :) <g>

...Another possible resource for you is McAfee Site Advisor, which I have installed on my personal PC. At work, the client to which I'm assigned uses Symantec Endpoint Protection, which does not seem to have an analogous proactive feature.

...In addition to checking the web page being linked to, perhaps you could look up the IP address of the requestor of the link shortening in one or more BLs like the SpamCopBL. This, however, could result in too many false positives and wind up costing you users but it might be worthwhile if it helps keep you "cleaner."
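Looking up an IP in a DNS-based blocklist such as the SpamCop BL works by reversing the IPv4 octets and prepending them to the list's zone: `192.0.2.1` becomes `1.2.0.192.bl.spamcop.net`, and any A record in the answer means the IP is listed. A minimal sketch; the zone tuple is only an example matching the lists named in this thread, each list's usage policy should be checked before querying it at volume, and treating a failed lookup as "not listed" is a fail-open choice you may not want.

```python
import ipaddress
import socket
from typing import List, Sequence

# Example zones for the IP lists mentioned in this thread.
IP_ZONES = ("sbl.spamhaus.org", "xbl.spamhaus.org", "bl.spamcop.net")


def dnsbl_query_name(ip: str, zone: str) -> str:
    """192.0.2.1 + bl.spamcop.net -> 1.2.0.192.bl.spamcop.net"""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def listed_in(ip: str, zone: str) -> bool:
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN means "not listed"; other lookup failures also land here,
        # which makes this check fail open.
        return False
    return True  # any A record (normally 127.0.0.x) means "listed"


def listing_zones(ip: str, zones: Sequence[str] = IP_ZONES) -> List[str]:
    """Zones that currently list this IP."""
    return [zone for zone in zones if listed_in(ip, zone)]
```

Some lists return special answer codes for errors or query-policy violations, so production code should inspect the returned address rather than treating every answer as a hit.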


Thanks for your efforts samrudge.

Too many possible types of spam to address except through the common focus of the "bad" URI. But it might be an idea to look at stopforumspam.com as well. And someone there might be able to help with the mechanics of blocking. There seem to be some capable programmers hanging out there.

And, as well as DBL and SURBL, have you looked at URIBL? Not sure if they could increase your assurance of detection: it sounds like those compromised URIs might be pretty much "made to order", and your processing overheads must already be pretty horrendous. But it's all I can think of offhand...
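Domain-based lists like DBL, SURBL and URIBL are queried differently from IP lists: the domain is prepended to the zone as-is, with no reversing, e.g. `example.com.multi.uribl.com`. A sketch, assuming the caller passes the registered domain (these lists generally expect the base domain rather than an arbitrary subdomain, and finding it properly needs a public-suffix list, which is omitted here). The zones are examples, and as with the IP lists their usage terms apply.

```python
import socket
from typing import List, Optional, Sequence
from urllib.parse import urlsplit

DOMAIN_ZONES = ("dbl.spamhaus.org", "multi.surbl.org", "multi.uribl.com")


def target_host(url: str) -> Optional[str]:
    # urlsplit().hostname drops any port and lowercases the host.
    return urlsplit(url).hostname


def domain_query_name(domain: str, zone: str) -> str:
    # No reversing for domain lists: example.com -> example.com.<zone>
    return domain.rstrip(".").lower() + "." + zone


def domain_listing_zones(domain: str, zones: Sequence[str] = DOMAIN_ZONES) -> List[str]:
    hits = []
    for zone in zones:
        try:
            socket.gethostbyname(domain_query_name(domain, zone))
            hits.append(zone)
        except socket.gaierror:
            pass  # not listed (or the lookup failed)
    return hits
```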

Good luck!


> Project Honey Pot is another one you could check. They have a large focus on scrapers and comment spammers, which seems like it might overlap with the types who would use URL shorteners to obfuscate their spam links. I'm not sure how much extra it would block above what you're already doing, but if it's not too much work, might as well check there too.

Will add Project Honey Pot. I'm not sure how similar its results are to the Spamhaus XBL, but I'll test it out and see how much it's catching.

> From the other end of things, please make it easy to investigate and report your shortened links. For example, adding a "+" to the end of a bit.ly link will take you to info about the target URL, rather than immediately forwarding you to the other site. You could even put an option right on that page to report it as an inappropriate target (spam, malware, illegal content, etc.). It might run the risk of getting abused or turning into a very complex system, but you could even have some sort of automated blocking/warning system if a certain forward gets enough reports from users.

I could add a 'link info' page; the problem is letting people know about it. If someone doesn't know about our URL shortener (as a generalization, people who visit spam/bad links aren't too tech-savvy), they wouldn't know about or understand this. I'll have a think about a way to integrate it without having a negative impact on the safe links that make up most of our system.

I like the idea of a manual reporting system, though. I could have a page on the site for reporting bad links and link to it from the main site, so it's easy to report links for various reasons.

> Another possible resource for you is McAfee Site Advisor, which I have installed on my personal PC. At work, the client to which I'm assigned uses Symantec Endpoint Protection, which does not seem to have an analogous proactive feature.

We looked at Site Advisor; unfortunately, they don't (publicly, at least) seem to offer any sort of lookup API, so I'm not sure how we could check links against them.

> In addition to checking the web page being linked to, perhaps you could look up the IP address of the requestor of the link shortening in one or more BLs like the SpamCopBL. This, however, could result in too many false positives and wind up costing you users but it might be worthwhile if it helps keep you "cleaner."

We already check the IPs of link creators against the Spamhaus XBL/SBL and the SpamCop BL, and from an earlier suggestion I'm also going to try out Project Honey Pot. At the moment we just block bad IPs, but it might be better to require a CAPTCHA before creating a link from a bad IP, as well as doing all the usual spam checks.
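Requiring a CAPTCHA for listed IPs instead of blocking them outright could be expressed as a small decision function. This is a hypothetical sketch, assuming the IP lookups and the target checks have already run; the key design point is that a solved CAPTCHA only relaxes the check on the creator, never the checks on the target.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_CAPTCHA = "require_captcha"
    REJECT = "reject"


def creation_decision(creator_ip_listed: bool, captcha_passed: bool, target_ok: bool) -> Decision:
    # Target checks always apply: a solved CAPTCHA never overrides a bad target.
    if not target_ok:
        return Decision.REJECT
    # A listed creator IP gets a challenge instead of an outright block.
    if creator_ip_listed and not captcha_passed:
        return Decision.REQUIRE_CAPTCHA
    return Decision.ALLOW
```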

> Too many possible types of spam to address except through the common focus of the "bad" URI. But it might be an idea to look at stopforumspam.com as well. And someone there might be able to help with the mechanics of blocking. There seem to be some capable programmers hanging out there.

OK, I'll make a post over there and see what they think. I'll also look at integrating their bad IP list into the blocking.

> And, as well as DBL and SURBL, have you looked at URIBL? Not sure if they could increase your assurance of detection: it sounds like those compromised URIs might be pretty much "made to order", and your processing overheads must already be pretty horrendous. But it's all I can think of offhand...

We don't use URIBL yet, but I'll look at adding it now. The overhead isn't too much: all the spam checks are done asynchronously, so they don't add much load and don't take that long.
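Running the checks asynchronously mostly means issuing them concurrently, so the total time is roughly that of the slowest check rather than the sum of all of them. A minimal `asyncio` sketch with stand-in checks (`fake_dbl` and `fake_wot` are hypothetical placeholders, not real lookups). Blocking calls such as `socket.gethostbyname` would need to be wrapped with `asyncio.to_thread` or an executor to avoid stalling the event loop.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# A check takes the target URL and returns True if it looks bad.
Check = Callable[[str], Awaitable[bool]]


async def run_checks(url: str, checks: Dict[str, Check], timeout: float = 10.0) -> Dict[str, bool]:
    """Run every check concurrently; raises asyncio.TimeoutError if the batch is too slow."""
    names = list(checks)
    results = await asyncio.wait_for(
        asyncio.gather(*(checks[name](url) for name in names)), timeout
    )
    return dict(zip(names, results))


# Hypothetical stand-ins for real blocklist / reputation lookups.
async def fake_dbl(url: str) -> bool:
    await asyncio.sleep(0.01)
    return False


async def fake_wot(url: str) -> bool:
    await asyncio.sleep(0.01)
    return "bad" in url
```

Usage: `asyncio.run(run_checks("http://bad.example/", {"dbl": fake_dbl, "wot": fake_wot}))` runs both stand-ins at the same time and returns a dict keyed by check name.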

> Have you looked at cbl.abuseat.org?

No, but I'll look at adding it. It looks like it could be some help.

Thanks, everyone, for your help so far.
