Verifying the reportability of spam websites


by RconneR -- 29 May 2007


I am not affiliated with SpamCop in any way except as a user and paid customer. The remarks on this page reflect my own opinions, and not those of SpamCop's owners or staff.

NOTE: This page is linked from my general topic of how spam-related websites can be reported. If you got here first by a web or Wiki search, you might like to visit the main page to read the topic "from the top."

In order to send an accurate and factual report regarding a spam website, you must use some human judgment to determine whether the website is, in fact, directly related to the spam. This kind of scrutiny isn't required when reporting spam mail sources via SpamCop because SpamCop can generally find these sources without much problem or ambiguity. However, spammers (and others) can (and do) drop random, unrelated, and innocent website links into spam mailings, and it is not cricket to report such sites. In addition, you may not want to waste your time reporting a website that is already offline.

NOTE: I'm being very detailed here, and beginners may find this somewhat daunting. However, as you deal with spams, you'll quickly gain experience with spam pitches (particularly since you'll probably be repeatedly spammed by the same offenders), and you'll be able to "internalize" a lot of this work without going into rigorous detail.

As I mentioned in the main page of this topic, you can report a website linked from a spam message if you can verify that the web host is online and that the website matches the spam pitch.

We'll tackle each of these below, and I'll end by suggesting some alternatives to using insecure web browsers to deal with spam websites.


Verifying that the web host is online


NOTE: For this operation, I assume that we'll use a web browser to do the testing; however, there is some risk with such activity. While you (probably) won't encounter anything dangerous here with most spams, it is possible that the link you are investigating could verify your e-mail address to the spammer, or (worse) expose your computer to web-based malware exploits. If this bothers you, read below to find some alternatives to the use of web browsers.

The first thing to do is to enter the spam website's URL into your browser's URL window, as you would for visiting any other website. We're going to see whether we get an error, and what kind of error it might be (and how we would interpret it for spam-reporting purposes).

Every browser handles routine network errors differently, so I am generalizing a bit in my descriptions below. The bottom line is that if you get "trapped" in this section with some sort of error (that displays a possibly cryptic error message rather than a "normal website"), you may be looking at a spam website that is already offline or that cannot be reached for further investigation, so you are really off the hook for reporting this particular website.

Incidentally, if you have trouble getting spam URLs to work, it might help to make sure that you can reach known-good sites (like Yahoo! or Google) with your browser to rule out any problems with your computer or your internet connection. I also advise using copy-and-paste to enter URLs rather than typing at the keyboard, since you are less likely to make an error.

Can the host name be resolved to an IP address?


If your browser returns almost immediately, complaining about "no such host," "can't find host," "cannot load address," or some such, this probably means that your browser could not resolve the host name to an IP address. In this case, the spam domain may have been removed from DNS, or else its DNS service is not operating. There isn't much more to do here, since if you can't get an address for the site you have no one to complain to. Count this as a kill, unless you want to try again in an hour or so to make sure that a transient DNS problem wasn't at fault.
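If you would rather test name resolution directly than interpret a browser's error message, a small script can ask the system resolver the same question. This is only a minimal sketch using Python's standard library; the function name resolve_host is my own invention for illustration, and the result depends on whatever resolver your computer is configured to use.

```python
import socket

def resolve_host(hostname):
    """Return a sorted list of IP addresses for hostname, or None if it cannot be resolved."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The resolver could not map the name to any address.
        return None
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_host() with the host name from the spam URL returns None when the name cannot be resolved, which corresponds to the "no such host" case described above. As with the browser test, a second try an hour later helps rule out a transient DNS problem.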

Is the host running and reachable via the network?


If your browser goes away for a very long time, and then comes back with an error, this could mean more problems with DNS (e.g., the authoritative name server for the spam domain was unreachable), but it could also mean that the machine hosting the website was offline. It might also indicate in some cases that there's no web server software running on this host. The error message returned by your browser may give you clues as to the exact problem. Whatever the case here, unless you want to try again later, rack up a kill and move on to the next spam.

Can you connect to web server software at this address?


If your browser comes back fairly quickly, but complains about "connection refused," "could not connect to host," or the like, this means that the spam website host is probably resolvable to an address, and is online, but there is no web server software running to accept your connection. To use an analogy, you drove to the mall and found the store, but the store was closed (or there was no one on duty in the store to serve you). This is a bit different from the cases above (in which you wouldn't even have been able to find the mall). Yes, you can now find the address of the host, since it was successfully resolved, but it doesn't make much sense here to report it because it isn't serving the website. Again, rack this up as a kill unless you want to try again later.
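The difference between "takes forever and then fails" and "fails quickly with connection refused" can also be seen with a plain TCP connection attempt. The following is a rough sketch, not a definitive tool: the function name probe_port and the labels it returns are hypothetical, and I am assuming the website is on the usual HTTP port 80.

```python
import socket

def probe_port(host, port=80, timeout=10.0):
    """Try one TCP connection and classify the outcome as a short label."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.gaierror:
        # The host name could not be resolved to an address at all.
        return "unresolvable"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: host offline or unreachable.
        return "timed out"
    except OSError:
        # Other network-level failures (e.g., no route to host).
        return "unreachable"
    conn.close()
    return "connected"
```

In terms of the mall analogy, "timed out" or "unreachable" means you couldn't find the mall, "refused" means the store was closed, and "connected" means someone is at the counter, so you can go on to ask for the page itself.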

Does the URL exist on the server?


If your browser displays a page with an HTTP error code (like 403, 404, 500, or the like), then your browser found the host, and talked with a web server listening at this address; however, the URL you are asking for could not be found, was "forbidden" to you, or could not be served. This usually means either (1) the URL is incorrect (check your typing), or (2) someone (hard to know who) has removed the page you are looking for or has told the web server software not to serve it to you. You might like to check to make sure you entered the correct URL, but otherwise there probably isn't anything to report here (i.e., what you see in your browser is not related to the spam).

If your spam website link points to some large hosting service (Geocities is a popular choice with some spammers), and you get a response from this service indicating that the page couldn't be found or couldn't be served, this is essentially the same thing as I described above (i.e., the hosting service sent your browser an HTTP error, plus a fancy "error page" to explain the problem).
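To make the reasoning about HTTP status codes concrete, here is a minimal sketch that sorts a numeric status code into rough verdicts for our purposes. The function name interpret_status and its labels are hypothetical, and the "redirect" branch is my own addition (it isn't discussed above); treat the groupings as a rule of thumb rather than a complete reading of HTTP.

```python
def interpret_status(code):
    """Map an HTTP status code to a rough verdict for spam-reporting purposes."""
    if 200 <= code < 300:
        # A page was served; it still has to be compared with the spam pitch.
        return "served"
    if 300 <= code < 400:
        # Assumption: a redirect needs a look at where it points.
        return "redirect"
    if code in (403, 404, 410):
        # Forbidden, not found, or gone: the page is not being served to you.
        return "missing-or-forbidden"
    if 400 <= code < 600:
        # Any other client or server error (500 and friends).
        return "error"
    return "unknown"
```

For example, interpret_status(404) gives "missing-or-forbidden" and interpret_status(500) gives "error"; in either case there is probably nothing to report.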

Did something break on the website?


Occasionally your browser will display cryptic-looking error messages from sources other than HTTP -- for example: CGI errors, PHP errors, database errors (that might mention MySQL or the like), script traceback stacks, and so forth. This might mean that the website has been disabled (accidentally or on purpose), but is most likely to happen if you intentionally "munge" out portions of the URL that you find suspicious (such as the long "query strings" often found at the end of spam URLs). Unless you want to expend further effort, you may as well mark this down as a broken website and move on.
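If you do decide to "munge" a spam URL by cutting off its query string, it is easy to slip when editing by hand. A short sketch with Python's standard urllib.parse module does the trimming mechanically; the function name strip_query and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Return url with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; blank out the query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Hypothetical spam link with a long query string on the end.
print(strip_query("http://spam-site.example/offer/index.php?id=abc123&u=42"))
```

This prints http://spam-site.example/offer/index.php. Keep in mind that the trimmed URL may well produce the kind of script or database error described above, since the site's code may expect those parameters.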


Matching the website to the spam pitch


Assuming you have not dropped out due to one of the problems listed above, you are now looking at what may or may not be a spam website. This is where that "human judgment" stuff comes into the picture.

If the website matches what you expect based on the spam (i.e., a drug spam points to a website selling drugs), then you may be finished with your check, and you can proceed to track down the reporting info for the site. This will be the case most of the time when you investigate a genuine spam website link, and after a while you will be able to spot these sites pretty quickly.

On the other hand, this may not be a reportable spam website even if it loads properly, and makes some reference to the topic of the spam. Here are some things to look for:







What if you don't want to load spam websites with your browser?


Can't say I blame anyone for feeling this way. Many browsers and operating systems are famously vulnerable to tricks and exploits crafted by crackers for subversion, theft of service, or theft of information. Of course, the average spammer is interested only in taking your money, and probably does not want to break your computer in the process (or at any rate he doesn't want to obviously break it). If you still want to track down spam websites, but would rather not risk these problems, there are some measures you can take.

What makes a browser (or, more precisely, a browsing method) secure for our purposes here is that it will do nothing more than read data from a web server and simply print it out in raw form; it will not run scripts, read or write cookies, nor call up plug-ins or other external code. Of course, these behaviors would make for a pretty useless web browser for most normal purposes, but for checking suspicious websites these behaviors become very important indeed.

Instead of your browser, you might make use of a "command-line web fetch tool" like curl or wget. Both of these are free, open-source software, and both are available in versions for most home-computer operating systems (including Windows). I'll leave it to you to look into these tools and decide whether or not they would be useful to you; I'll just note that these tools simply fetch specific files from web servers and print them in raw form (i.e., HTML markup) on your screen. These tools can also display the HTTP headers for the transaction (which typically aren't visible in web browsers). You can then scan the markup yourself for the info you need. These tools don't write or transmit cookie data, nor do they run scripts, so they are very safe to run.
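To show what "fetch and print in raw form" amounts to, here is a minimal sketch of the same idea in Python's standard library. It is not how curl or wget are implemented; the function name raw_fetch, the User-Agent string, and the max_bytes limit are my own assumptions for illustration. It makes a single request, follows no redirects, stores and sends no cookies, and hands back the headers and the markup as plain text.

```python
import http.client
from urllib.parse import urlsplit

def raw_fetch(url, timeout=15.0, max_bytes=20000):
    """Fetch one URL and return (status, reason, headers, body_text).

    Nothing is rendered or executed, no cookies are stored or sent,
    and redirects are not followed.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    # Rebuild the request target from the path and (if present) the query string.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path, headers={"User-Agent": "raw-fetch-sketch"})
        resp = conn.getresponse()
        # latin-1 maps every byte to a character, so decoding never fails.
        body = resp.read(max_bytes).decode("latin-1")
        return resp.status, resp.reason, resp.getheaders(), body
    finally:
        conn.close()
```

You would call raw_fetch() with the spam URL, print the status and headers first, and then read through the returned markup for the information you need. Note that fetching the full URL still tells the server that someone requested it, so the earlier caution about links that can verify your e-mail address applies here as well.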

Another alternative is to find a "safe browser" on the web; this would be a web-based utility that lets you enter a URL, then returns the raw content of the file to you in a form that is "massaged" so your own browser won't try to render it as HTML. In this way, the website won't be able to play any dirty tricks on you. For many years, the reliable choice was found at Sam Spade, but this site has since greatly curtailed its operations, and I haven't found a good web-based replacement (let me know via page comment if you've found one). Note that "anonymizing" web proxies are not precisely the same thing, since they will not mangle the HTML before delivering it to your browser.
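The "massaging" step that such a safe browser performs can be illustrated in a few lines. This is only a sketch of the general idea (escaping the markup so that a browser shows it as text), not a description of how Sam Spade or any particular service worked; the function name as_plain_text is hypothetical.

```python
import html

def as_plain_text(raw_markup):
    """Escape markup so a browser displays it as text instead of rendering it."""
    # html.escape turns &, <, >, and quote characters into harmless entities.
    return "<pre>" + html.escape(raw_markup) + "</pre>"

print(as_plain_text('<script>alert("x")</script>'))
```

The output is &lt;pre&gt;-wrapped text in which the script tag appears as &amp;lt;script&amp;gt; and so on, so your browser prints the tag on the screen rather than running it.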


