
Two out of three URLs using t.co skipped


DRSpalding


The following spam report, http://www.spamcop.net/sc?id=z5471444841z7...73565a0fd8a2efz shows only one of the three unique t.co URLs in the spam message. This format of spam always seems to befuddle the parser and it only shows one, in this case,

http://t.co/OvSpFPrg

while missing the other two. It does not seem to be consistently the first, second, or third link in the spam that is pulled out for reporting either. Is this something known or can it be solved?
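For comparison, collecting every distinct link from a message body is a small job. The following is only a sketch in Python and not how the SpamCop parser works; the regular expression and the function name `unique_urls` are assumptions made for illustration.

```python
import re
from typing import List

# Assumed pattern: an http/https URL runs until whitespace or a common delimiter.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def unique_urls(body: str) -> List[str]:
    """Return every distinct URL found in body, in first-seen order."""
    seen = set()
    found = []
    for match in URL_RE.finditer(body):
        # Drop sentence punctuation that tends to trail a pasted URL.
        url = match.group(0).rstrip(".,;:!?")
        if url not in seen:
            seen.add(url)
            found.append(url)
    return found
```

Run against the decoded body of a message with three different t.co links, a function like this should return all three, whatever their position in the message.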

As an aside, I know that SpamCop doesn't want to go near detecting redirected URLs, but I would gladly pay the price in time to have SpamCop do what I now do manually: use "wget --spider" to follow the redirects on sites such as t.co, lnkd.in, goo.gl, and bit.ly. I then have a listing of the redirect chain like this:

http://t.co/OvSpFPrg 
--> http://lnkd.in/mZyUMs
--> http://www.linkedin.com/slink?code=mZyUMs
--> http://naift.com/mls/index.php
--> /stealth/

I can then include that chain in additional reports, to LinkedIn in this case. I generally don't bother with the final target, but if SpamCop could do this as an option, it would be helpful. At least 50% of the URLs in the spam I handle now use t.co, lnkd.in, goo.gl, or bit.ly, separately or in tandem, and I get the feeling that Twitter doesn't care enough to nuke them: I have checked previously reported shortened URLs and they still successfully redirect to their targets. LinkedIn has been more proactive, but they don't get notified by default if the spam contains a t.co shortened URL that redirects to them.
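The manual procedure described above could be sketched in Python roughly as follows. This is not the wget-based script from the post, only an illustration using the standard library; the names `head_once` and `redirect_chain`, the HEAD-request approach, and the hop limit of 10 are all assumptions.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

# HTTP status codes that point at another URL via the Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be recorded."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def head_once(url, timeout=10):
    """Send one HEAD request; return (status, Location header or None)."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


def redirect_chain(url, fetch=head_once, max_hops=10):
    """Return the list of URLs visited, starting with url itself."""
    chain = [url]
    for _ in range(max_hops):
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            break
        # A relative Location such as /stealth/ is resolved against the current URL.
        chain.append(urljoin(chain[-1], location))
    return chain
```

The `fetch` parameter exists so the chain logic can be exercised without touching the network. Some servers answer HEAD differently from GET, so a real tool might need to fall back to GET.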


<snip>

As an aside, I know that SpamCop doesn't want to go near detecting redirected URLs, but I would gladly pay the price in time to have SpamCop actually do what I do manually

<snip>

...But the rest of us don't! :) <g> Not that it's a bad idea, it just probably isn't worth the time to most of us and in any event spamvertized URLs, redirected or not, aren't SpamCop's main concern.

The following spam report, http://www.spamcop.net/sc?id=z5471444841z7...73565a0fd8a2efz shows only one of the three unique t.co URLs in the spam message. This format of spam always seems to befuddle the parser and it only shows one, in this case,

http://t.co/OvSpFPrg

while missing the other two. It does not seem to be consistently the first, second, or third link in the spam that is pulled out for reporting either. Is this something known or can it be solved? ...

That definitely sounds buggy - not that it matters in this case, since all three are Twitter links and, if the parser found the other two, it would just thrash about resolving them and looking up their reporting addresses all over again, only to end up sending a single report to Twitter anyway. But, as you point out further, the same skipping apparently happens when the message body contains links to multiple different services/hosts.

Unfortunately, as Steve T points out, spamvertized links are not a priority for SC and the rate of development of the parser, even within the "prioritized" areas of source mail server identification followed by reporting address discovery is rather slow (using the RIPE lookup flag concerning the latter still seems to be outstanding). Nonetheless, it may be something easily fixed (who knows?) and whatever causes it might create other parsing/processing problems less apparent - so a tracking URL showing two or more services/hosts with only one of them identified just might catch the attention of the developers, if the SC staff care to support a fix. Without that illustration it (almost) looks more like a "feature" than a "bug", to trot out that hoary old excuse for inaction.


...But the rest of us don't! :) <g> Not that it's a bad idea, it just probably isn't worth the time to most of us and in any event spamvertized URLs, redirected or not, aren't SpamCop's main concern.

The too-subtle emphasis was on the "I" of my statement. I would like to see it as an advanced option (off by default) to do automatically what I generally end up doing manually anyway. That's a good use for computers and automation after all. Having it integrated into SpamCop proper means I have to do less on my end, and since SC already reports web sites, having it dive through 301, 302, 303, or 307 redirects the same way a browser would, to see through the subterfuge, would be awesome.

It could take the form of a different submit email address (a la the "quick" reporting) for instance, and also include other "slow" or "computationally more expensive" features such as a better lookup on those pesky websites that always appear fine to users but for whatever reason SC fails to get an IP address on their lookups.

More than anything, if the spammers are using t.co or lnkd.in as a way to avoid getting their website taken down, I would like to see something done about it. It is clear that it is far too easy to hide behind a Twitter t.co URL and that Twitter is not really on top of whacking them or (probably) the users that create them. LinkedIn seems to do a better job, and Google too. My script for doing this runs for a second or less, and that is without any programmatic access to the HTTP protocol. I just use wget --spider and filter/munge its output to spit out the redirects.
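The poster's script is not shown, so the following Python sketch is only a guess at the general shape of such a filter. It assumes that wget writes its log to stderr and reports each hop on a line of the form "Location: URL [following]"; the function names are made up for illustration.

```python
import re
import subprocess

# Assumed log format: wget prints "Location: <url> [following]" for each redirect.
LOCATION_RE = re.compile(r"^Location:\s+(\S+)", re.MULTILINE)


def redirects_from_wget_log(log):
    """Return the redirect targets found in a wget log, in order."""
    return LOCATION_RE.findall(log)


def spider(url):
    """Run wget --spider on url and return the redirect targets it logged."""
    result = subprocess.run(
        ["wget", "--spider", url],
        capture_output=True,
        text=True,
    )
    # wget sends its progress log to stderr rather than stdout.
    return redirects_from_wget_log(result.stderr)
```

Keeping the log parsing separate from the wget call means the filter can be checked against a saved log without making any requests.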

Thanks for listening, even if I am tilting at windmills. :)


You would also have to spoof who you are while sniffing. Some of these redirects send you in one direction if you're using Firefox and in another if you're using Internet Explorer. And they can do further redirection based on your country and/or IP address.
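One way to probe for browser-dependent redirects is to send the same request under several User-Agent strings and compare where each one is sent. The Python sketch below covers only that part; it does nothing about redirection keyed on country or IP address. The User-Agent strings are placeholders, and `build_request` and `targets_by_agent` are hypothetical names.

```python
import urllib.request

# Placeholder User-Agent strings; real browsers send longer, version-specific values.
USER_AGENTS = {
    "firefox": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "msie": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
}


def build_request(url, agent_key):
    """Build a HEAD request that identifies itself as the chosen browser."""
    return urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": USER_AGENTS[agent_key]},
    )


def targets_by_agent(url, fetch):
    """Map each agent name to the redirect target that fetch reports for it.

    fetch is any callable that takes a Request and returns the Location
    header (or None); it is passed in so the comparison can be tested offline.
    """
    return {key: fetch(build_request(url, key)) for key in USER_AGENTS}
```

If the returned targets differ between agents, the shortener or landing site is tailoring its redirects, and a report based on a single fetch may name the wrong destination.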


Interesting - though this part of the discussion probably belongs in "New Feature Request" - I think there is a topic on punching through the redirects there already but this might be additional data.


The too-subtle emphasis was on the "I" of my statement. I would like to see it as an advanced option (off by default) to do automatically what I generally end up doing manually anyway. That's a good use for computers and automation after all.

<snip>

...Yes, it is. The problem here is that there is only one SpamCop parser, so any SpamCop system resources taken to perform advanced features for you mean that many fewer system resources that can be devoted to high-volume parsing for the rest of us. Perhaps you could convince SpamCop to provide you with a copy of the parser that you could enhance to include the features you wish and run on your own system so as to avoid such resource contention (but I would not expect such a request to be accepted :) <g>).

...Yes, it is. The problem here is that there is only one SpamCop parser, so any SpamCop system resources taken to perform advanced features for you mean that many fewer system resources that can be devoted to high-volume parsing for the rest of us. Perhaps you could convince SpamCop to provide you with a copy of the parser that you could enhance to include the features you wish and run on your own system so as to avoid such resource contention (but I would not expect such a request to be accepted :) <g>).

The parser is already handling "quick" vs. "normal" at a level not discernible to us users, since the spam traps don't parse the body at all, IIRC. I agree that there would be design and work involved, but if there is work being done in the parser, then dealing with user settings as to which parser to use (fast header only, traditional body, slow and complete body) would be high on my list.

And again, I am fully willing to pay the price in both time spent waiting for it and my payment for SC fuel to do it, if that is what it takes.


The parser is already handling "quick" vs. "normal" at a level not discernible to us users, since the spam traps don't parse the body at all, IIRC.

<snip>

...AFAIK, the parser is not processing e-mails sent to spam traps at all. There is "quick reporting" which IIRC does not parse spamvertized URLs at all. As far as I am concerned, the parsing of spamvertized URLs can be removed entirely and left to tools designed for that purpose, such as Knujon and Complainterator, cited elsewhere in these Forums. Whatever the SpamCop parser does do for us in terms of spamvertized links is gravy and I would not favor "improving" it in any way, even as an "extra" available only to those willing to pay the extra fuel, that adds any time to the parse for anyone because that's not SpamCop's mission. I am pretty sure I am not alone in this view.... :) <g>

...AFAIK, the parser is not processing e-mails sent to spam traps at all. There is "quick reporting" which IIRC does not parse spamvertized URLs at all. As far as I am concerned, the parsing of spamvertized URLs can be removed entirely and left to tools designed for that purpose, such as Knujon and Complainterator, cited elsewhere in these Forums. Whatever the SpamCop parser does do for us in terms of spamvertized links is gravy and I would not favor "improving" it in any way, even as an "extra" available only to those willing to pay the extra fuel, that adds any time to the parse for anyone because that's not SpamCop's mission. I am pretty sure I am not alone in this view.... :) <g>

I disagree a bit on this, since the bot networks most spammers use are vast and have lots of nodes. The servers being spamvertized are either hacked sites or near-bulletproof hosting and are far fewer in number. Getting those servers off the net probably hurts the spammers more than the disconnection or cleansing of a few of the bot nodes that sent the spam. In addition, I already do the two steps myself and would rather have them done in one shot. I don't like spending time doing it, but I also despise spammers and scammers. So, I do it for the greater good. :)

Regardless, this is an interesting conversation to have.


Archived

This topic is now archived and is closed to further replies.
