
Detect URLs with whitespaces


BoMbY


Hello,

Lately I've been getting lots of mails containing URLs that are not detected by SpamCop; there are simply some spaces added around the dots.

For example something like this: www . domain . invalid

Is it possible to create a workaround for this?

Thanks and Regards,

BoMbY


...lots of mails containing URLs that are not detected by SpamCop; there are simply some spaces added around the dots. For example something like this: www . domain . invalid ... Is it possible to create a workaround for this? ...

It boils down to the fact that what you present does not meet the definition of a URI/URL; it is simply a string of text. Not only is it undetectable by the SpamCop.net parser, it should not be 'handled' by any known browser either. That even a cut/paste operation wouldn't work directly ought to be a large clue that something is wrong with the provided link.
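To illustrate that point, here is a quick Python sketch (purely for demonstration; this is not anything SpamCop itself runs) showing why the spaced-out string fails URL parsing entirely:

from urllib.parse import urlparse

# A real URL yields a scheme and a host; the spaced-out string yields
# neither, which is why no parser or browser will act on it.
for candidate in ("http://www.domain.invalid/", "www . domain . invalid"):
    parts = urlparse(candidate)
    print(repr(candidate), "-> scheme:", parts.scheme or None,
          "host:", parts.netloc or None)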

Paid users can add the appropriate additional target to send a Report about this URL to (after you do the appropriate research), and/or free-account users can fall back to generating their own Complaints/Reports to the appropriate source.

What you're basically asking for is the magic pill that would understand the 19,000 different ways to spell Viagra such that it would 'always' be recognized ... quite impossible with any thought of accuracy involved. Note a recent Topic in the Reporting Help Forum section that dealt with a screw-up in the parser that turned 'best4www' into 'www' and therefore sent notification out to the wrong party ... and that's ignoring that this data was found only within an e-mail address, so it shouldn't have been parsed at all.


IMHO, it is a waste of time to report spamvertized sites via SpamCop. The part of the parser that does that is left over from when webhosts needed to be 'educated' by reports. Nowadays, spammers evade being shut down by their webhost by creating as many websites as they need, and legitimate webhosts are careful about hosting such sites. There are other ways to report spamvertized websites: the Complainterator (in the Software forum here) and KnujOn, a site many swear by but which, IMHO, doesn't quite measure up to my standards. In addition, many spamvertized reports go directly to the spammer.

The only reason SpamCop continues to parse them, IMHO, is that some server admins do filter on spamvertized sites (one estimating that his filter catches 25% of his spam), and the SpamCop parser helps to feed such lists.

IMHO, no one at SpamCop is going to spend any time at all 'fixing' the parser to find broken URLs; as Wazoo says, no browser would find them anyway. Like the empty spam people sometimes get, it is probably 'operator error': someone who bought a spammer package doesn't understand how to use it and has forgotten to put in the % or whatever in the code that hides the URL from the parser, but not from the browser.

As Wazoo told you, if you still think it is important to report the website, you can do it manually. It doesn't make much difference anyway, since spamvertized website addresses are not added to the SpamCop blocklist, which is what many use to identify the sources of spam. IMHO, it is more important to block or filter the source (the IP address the spam is coming from) than to shut down the website.

Miss Betsy


...The only reason SpamCop continues to parse them, IMHO, is that some server admins do filter on spamvertized sites (one estimating that his filter catches 25% of his spam), and the SpamCop parser helps to feed such lists. ...
At the risk of wandering a little O/T, I note the paste-in window at uribl.com appears to include the facility to trim URI/URL text strings, which is evidently the sort of functionality tantalizing SC users. As others have said, that's not going to happen with SC (based on what we know/guess). While SC is not dedicated to the pursuit of spamvertized sites, others are: Miss Betsy has mentioned two, and I guess the URIBL people and the SURBL people play their parts too (the latter having one list which is fed by SC reporting).

These URIBLs are made for filtering. I guess the 'internet address' (dotted quad) and associated ISP, which are what SC finds (resources permitting), are mostly pretty useless for that purpose these days, due to the wide use of fast-flux botnet hosting. But the domain name is not really affected by such (illegal) chicanery:

H:\>nslookup hlatx.cn
Non-authoritative answer:
Name: hlatx.cn
Addresses: 217.52.247.246, 71.239.68.234, 77.238.233.185, 86.106.92.212, 98.197.5.134, 190.17.104.21, 190.191.12.138, 193.17.213.14
(A botnet: revolving addresses and, for additional variability, the list might be changed at any time.)

H:\>nslookup hlatx.cn.multi.uribl.com (checking one of the URIBL lists)
Name: hlatx.cn.multi.uribl.com
Address: 127.0.0.2 (a hit)

H:\>nslookup hlatx.cn.multi.surbl.org (checking one of the SURBL lists)
Name: hlatx.cn.multi.surbl.org
Address: 127.0.0.80 (a hit)

(An internal address, 127.0.0.x, returned by the list lookup means the queried domain is listed.)
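For anyone wanting to script the same check, here is a rough Python equivalent (an illustrative sketch only, not an official client; note that these lists may refuse queries from large public resolvers, so results depend on your DNS setup, and each list's usage policy applies):

import socket

def bl_listed(domain, zone):
    # Listed domains resolve to an internal 127.0.0.x address;
    # an NXDOMAIN answer means the domain is not on the list.
    try:
        return socket.gethostbyname(domain + "." + zone)
    except socket.gaierror:
        return None

for zone in ("multi.uribl.com", "multi.surbl.org"):
    print(zone, "->", bl_listed("hlatx.cn", zone))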


Thanks, Farelf, for showing why it is difficult to identify spamvertized sites (I must remember "fast-flux botnet hosting") and why it is not very effective to report them via SpamCop.

Apparently, though, using spamvertized websites as filters is effective, which is why SpamCop continues to include that part of the parser. I don't know exactly how the spamvertized websites are 'fed' to the SURBL. Perhaps they have a way of extracting all these problem URLs; perhaps, since they have other sources, the SpamCop list is used either to confirm or to add to their list. Even without the URLs that SpamCop has a problem with, there are plenty of sites identified.

Miss Betsy


... the URLs that SpamCop has a problem with, ...
SC often resolves the FF botnet cases, in fact, but just to the internet address (IP) that happens to be on top of the stack at the instant it looks. That's how come Comcast, to name but one, features so frequently :D.

As consolation to them, if I have time, I add the current list of IPs (the botnet) to my report notes to the spamvertized 'host', just to show they're not alone. I would like to imagine that the horrified provider might generally act just like a suburban mom having little Johnny sent home from school with head lice.

Alas, but a fey whimsy on my part no doubt - but that's what I like to imagine, on my more optimistic days, albeit the more pointless ones, and before the urge to gafiate overwhelms. That's not so often, when I come to think about it. But sometimes. I need my teddy bear now.


4 weeks later...

Moderator Action: Merged this new Topic opened in the Reporting Help Forum section into this existing Topic.

I keep getting junk mails where the sender has deliberately made the URL non-clickable by writing it as follows: www.urliana dot com. SpamCop won't report that URL because it's not a "clickable" URL. There ought to be a way to catch these programmatically.

Sample spam report: http://www.spamcop.net/sc?id=z2409535104z3...0baea412d2ae95z

It's frustrating to have to parse these URLs manually, and I can't "fix" them before reporting, as that would materially alter the UCE.


This is very common stuff. Yes, there is a fix for it (e.g., "s/\sdot\s/\./" jumps to mind for RegEx fans), but the problem is that there is an infinitude of ways to hide links in this fashion. Apart from the question of whether we could get SpamCop to change their code (far from certain), you could actually keep a programmer employed full-time just tracking these down and dealing with them.
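To make that concrete, here is a minimal Python sketch of that kind of normalization (hypothetical rules covering just two obfuscation styles; every new trick spammers invent would need another pattern, which is exactly the maintenance problem described above):

import re

RULES = [
    (re.compile(r"\s+dot\s+", re.IGNORECASE), "."),  # "urliana dot com"
    (re.compile(r"\s*\.\s*"), "."),                  # "www . domain . invalid"
]

def deobfuscate(text):
    # Note the second rule would also mangle ordinary sentence
    # punctuation ("end. Next"), one reason a general fix is hard.
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(deobfuscate("www . urliana dot com"))  # -> www.urliana.com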

As you may have read here and elsewhere, SpamCop is about the sources of spam, and to a much lesser extent about spam websites.

You are free to deal with these URLs yourself if you can. In the past, I have done my own research on these and piggybacked them onto the SC reports as user-defined reports, but I tend not to do this so much anymore because so many of these sites are botnet hosted, and an individual report at one instant in time is almost futile. I generally copy the spams to KnujOn, which seems more oriented toward spam website reporting.

I will warn you that collecting info on spam websites is not for the novice nor the impatient. If you are interested, you can read this Wiki page that I put together some time ago: Reporting spam Websites.

-- rick

