Verifying the reportability of spam websites
I am not affiliated with SpamCop in any way except as a user and paid customer. The remarks on this page reflect my own opinions, and not those of SpamCop's owners or staff.
| NOTE: This page is linked from my general topic of how spam-related websites can be reported. If you got here first by a web or Wiki search, you might like to visit the main page to read the topic "from the top." |
In order to send an accurate and factual report regarding a spam website, you must use some human judgment to determine whether the website is, in fact, directly related to the spam. This kind of scrutiny isn't required when reporting spam mail sources via SpamCop because SpamCop can generally find these sources without much problem or ambiguity. However, spammers (and others) can (and do) drop random, unrelated, and innocent website links into spam mailings, and it is not cricket to report such sites. In addition, you may not want to waste your time reporting a website that is already offline.
| NOTE: I'm being very detailed here, and beginners may find this somewhat daunting. However, as you deal with spams, you'll quickly gain experience with spam pitches (particularly since you'll probably be repeatedly spammed by the same offenders), and you'll be able to "internalize" a lot of this work without going into rigorous detail. |
As I mentioned in the main page of this topic, you can report a website linked from a spam message if you can verify
- that the website is up and running, and,
- that it is directly related to the spam.
We'll tackle each of these below, and I'll end by suggesting some alternatives to using insecure web browsers to deal with spam websites.
Verifying that the web host is online
| NOTE: For this operation, I assume that we'll use a web browser to do the testing; however, there is some risk with such activity. While you (probably) won't encounter anything dangerous here with most spams, it is possible that the link you are investigating could verify your e-mail address to the spammer, or (worse) expose your computer to web-based malware exploits. If this bothers you, read below to find some alternatives to the use of web browsers. |
The first thing to do is to enter the spam website's URL into your browser's URL window, as you would for visiting any other website. We're going to see whether we get an error, and what kind of error it might be (and how we would interpret it for spam-reporting purposes).
Every browser handles routine network errors differently, so I am generalizing a bit in my descriptions below. The bottom line is that if you get "trapped" in this section with some sort of error (that displays a possibly cryptic error message rather than a "normal website"), you may be looking at a spam website that is already offline or that cannot be reached for further investigation, so you are really off the hook for reporting this particular website.
Incidentally, if you have trouble getting spam URLs to work, it might help to make sure that you can reach known-good sites (like Yahoo! or Google) with your browser to rule out any problems with your computer or your internet connection. I also advise using copy-and-paste to enter URLs rather than typing at the keyboard, since you are less likely to make an error.
Can the host name be resolved to an IP address?
If your browser returns almost immediately, complaining about "no such host," "can't find host," "cannot load address," or some such, this probably means that your browser could not resolve the host name to an IP address. In this case, the spam domain may have been removed from DNS, or its DNS service may not be operating. There isn't much more to do here, since if you can't get an address for the site you have no one to complain to. Count this as a kill, unless you want to try again in an hour or so to make sure that a transient DNS problem wasn't at fault.
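The name-resolution step can also be checked without a browser. Here is a minimal Python sketch (my own illustration, not a required part of the procedure); the function name `resolve_host` is hypothetical, and it uses only the standard library's `socket.getaddrinfo`. An empty result corresponds to the "no such host" case described above.

```python
import socket


def resolve_host(hostname):
    """Return the sorted IP addresses for hostname, or [] if it cannot be resolved."""
    try:
        # Port 80 is only a placeholder here; we just want the address lookup.
        infos = socket.getaddrinfo(hostname, 80, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Resolver failure: the name does not exist, or its DNS is not answering.
        return []
    # Each entry ends with a sockaddr tuple whose first item is the IP address.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_host` with the host name from the spam link gives you either a list of addresses (go on to the next check) or an empty list (count it as a kill, or retry later in case the DNS problem was transient).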
Is the host running and reachable via the network?
If your browser goes away for a very long time, and then comes back with an error, this could mean more problems with DNS (e.g., the authoritative name server for the spam domain was unreachable), but it could also mean that the machine hosting the website was offline. It might also indicate in some cases that there's no web server software running on this host. The error message returned by your browser may give you clues as to the exact problem. Whatever the case here, unless you want to try again later, rack up a kill and move on to the next spam.
Can you connect to web server software at this address?
If your browser comes back fairly quickly, but complains about "connection refused," "could not connect to host," or the like, this means that the spam website host is probably resolvable to an address, and is online, but there is no web server software running to accept your connection. To use an analogy, you drove to the mall and found the store, but the store was closed (or there was no one on duty in the store to serve you). This is a bit different from the cases above (in which you wouldn't even have been able to find the mall). Yes, you can now find the address of the host, since it was successfully resolved, but it doesn't make much sense here to report it because it isn't serving the website. Again, rack this up as a kill unless you want to try again later.
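The difference between a host that never answers and a host that refuses the connection can be seen with a plain TCP connection attempt. The following Python sketch is only an illustration under my own assumptions: the function name `probe_port`, the result labels, and the ten-second default timeout are all made up for this example.

```python
import socket


def probe_port(address, port=80, timeout=10.0):
    """Try one TCP connection and return a rough label for what happened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return "open"         # something is listening (the store is open)
    except ConnectionRefusedError:
        return "refused"          # host is up, but nothing listens on this port
    except socket.timeout:
        return "timeout"          # no answer at all: host offline or filtered
    except socket.gaierror:
        return "unresolved"       # the name could not be turned into an address
    except OSError:
        return "unreachable"      # some other network-level failure
```

In terms of the mall analogy, "refused" is the closed store, while "timeout" and "unreachable" are closer to not finding the mall at all. Only "open" means it is worth going on to look at what the web server actually returns.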
Does the URL exist on the server?
If your browser displays a page with an HTTP error code (such as 404 or 403), then your browser found the host and talked with a web server listening at this address; however, the URL you are asking for could not be found or was "forbidden" to you. This usually means either (1) the URL is incorrect (check your typing), or (2) someone (hard to know who) has removed the page you are looking for or has told the web server software not to serve it to you. You might like to check to make sure you entered the correct URL, but otherwise there probably isn't anything to report here (i.e., what you see in your browser is not related to the spam).
If your spam website link points to some large hosting service (Geocities is a popular choice with some spammers), and you get a response from this service indicating that the page couldn't be found or couldn't be served, this is essentially the same thing as I described above (i.e., the hosting service sent your browser an HTTP error, plus a fancy "error page" to explain the problem).
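If you would rather see the HTTP status code itself than a hosting service's fancy error page, a short script can ask for the URL once and report the code. This is a rough Python sketch using the standard `http.client` module; the names `fetch_status` and `interpret_status`, the User-Agent string, and the wording of the labels are my own assumptions, not anything defined by SpamCop.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10.0):
    """Request url once (redirects are not followed) and return (status, reason)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("expected an http:// or https:// URL")
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path, headers={"User-Agent": "status-check-sketch"})
        response = conn.getresponse()
        return response.status, response.reason
    finally:
        conn.close()


def interpret_status(status):
    """Map a status code onto the rough cases discussed in this section."""
    if 200 <= status < 300:
        return "served"                  # a page came back; go on to judge it
    if 300 <= status < 400:
        return "redirected"              # look at where it points before deciding
    if 400 <= status < 500:
        return "not found or refused"    # e.g. 404 or 403: probably nothing to report
    if status >= 500:
        return "server error"            # something broke on the website itself
    return "other"
```

A result such as 404 from `fetch_status` means the same thing whether it arrives as a bare error or wrapped in a hosting service's branded error page.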
Did something break on the website?
Occasionally your browser will display cryptic-looking error messages from sources other than HTTP -- for example: CGI errors, PHP errors, database errors (that might mention MySQL or the like), script traceback stacks, and so forth. This might mean that the website has been disabled (accidentally or on purpose), but is most likely to happen if you intentionally "munge" out portions of the URL that you find suspicious (such as the long "query strings" often found at the end of spam URLs). Unless you want to expend further effort, you may as well mark this down as a broken website and move on.
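For what it's worth, "munging" out a query string is easy to do consistently with a few lines of Python. This is just a sketch; the function name `strip_query` is hypothetical, and the example URL in the note below is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return url with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; blank out the query and fragment fields.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Given something like `http://example.com/offer/index.php?id=abc123#top`, this returns `http://example.com/offer/index.php`. As noted above, a site that depends on the query string may respond with a script or database error once it is gone, so treat such an error as a side effect of your own munging before concluding that the site is broken.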
Matching the website to the spam pitch
Assuming you have not dropped out due to one of the problems listed above, you are now looking at what may or may not be a spam website. This is where that "human judgment" stuff comes into the picture.
If the website matches what you expect based on the spam (i.e., a drug spam points to a website selling drugs), then you may be finished with your check, and you can proceed to track down the reporting info for the site. This will be the case most of the time when you investigate a genuine spam website link, and after a while you will be able to spot these sites pretty quickly.
On the other hand, this may not be a reportable spam website even if it loads properly, and makes some reference to the topic of the spam. Here are some things to look for:
- Beware of links placed by others (besides the spammer). For example, many 419 scam letters are sent via free e-mail services, which often put advertisements for themselves at the end of all outgoing messages. Likewise, some anti-virus or filtering software may also place advertising links in messages that they have processed. These links should not be reported.
- If the link doesn't seem to have anything at all to do with the topic of the spam, it is most likely a camouflage link, placed for the purpose of bamboozling spam-reporting software or getting spam investigators into trouble by tempting them to file false reports. Needless to say, these links should not be reported either.
- If the website seems to be related to the spam, but has some sort of warning or notice indicating that spam has been sent in the name of the website operators, but without their consent, this is a signal that you may be looking at a "Joe job" (i.e., an attempt to frame an innocent party as a spammer). I tend to believe these messages wherever I see them, and refrain from reporting the links. On the other hand, some webmasters would rather not post such disclaimers on their sites as it would perhaps be bad P.R., so you won't always be able to depend upon such disclaimers. I would suggest that you use your best judgment, and err on the side of caution by not reporting such links unless you have reason to believe that they are indeed directly related to the spam.
- Stock spammers used to provide links to unrelated stock information sites (such as Yahoo! Finance) to give their touts an air of authority. I call these "further-reading" links, and I do not consider them to be reportable (since the operators of these websites almost certainly did not give permission for them to be cited in spam messages). In general, if a website doesn't offer you some means to contact someone directly in order to buy something or accept some sort of personal offer, it may not be a reportable site. Again, use your best judgment and be conservative; likely these sites are victims of the spammer, just like you.
What if you don't want to load spam websites with your browser?
Can't say I blame anyone for feeling this way. Many browsers and operating systems are famously vulnerable to tricks and exploits crafted by crackers for subversion, theft of service, or theft of information. Of course, the average spammer is interested only in taking your money, and probably does not want to break your computer in the process (or at any rate he doesn't want to obviously break it). If you still want to track down spam websites, but would rather not risk these problems, there are some measures you can take.
What makes a browser (or, more precisely, a browsing method) secure for our purposes here is that it will do nothing more than read data from a web server and simply print it out in raw form; it will not run scripts, read or write cookies, nor call up plug-ins or other external code. Of course, these behaviors would make for a pretty useless web browser for most normal purposes, but for checking suspicious websites these behaviors become very important indeed.
Instead of your browser, you might make use of a "command-line web fetch tool" like curl or wget. Both of these are free, open-source software, and both are available in versions for most home-computer operating systems (including Windows). I'll leave it to you to investigate these tools and decide whether or not they would be useful to you; I'll just note that they simply fetch specific files from web servers and print them in raw form (i.e., HTML markup) on your screen. These tools can also display the HTTP headers for the transaction (which typically aren't visible in web browsers). You can then scan the markup yourself for the info you need. By default these tools don't keep cookies between runs, and they don't run scripts or plug-ins, so they are far safer than a full browser.
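If you have Python installed, its standard library can do much the same job as these fetch tools. The sketch below is my own illustration, not a substitute recommended by anyone official: the names `NoRedirect`, `fetch_raw`, and `dump`, the User-Agent string, and the 20,000-byte cap are all assumptions. It fetches one URL, refuses to follow redirects, keeps no cookies, and prints the status, headers, and raw markup as plain text.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so each hop can be inspected by hand."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def fetch_raw(url, max_bytes=20000, timeout=10.0):
    """Return (status, header_list, body_bytes) for a single GET of url."""
    if urlsplit(url).scheme not in ("http", "https"):
        raise ValueError("expected an http:// or https:// URL")
    opener = urllib.request.build_opener(NoRedirect)  # no cookie handler is added
    request = urllib.request.Request(url, headers={"User-Agent": "raw-fetch-sketch"})
    try:
        response = opener.open(request, timeout=timeout)
        status = response.status
    except urllib.error.HTTPError as err:
        # Error and redirect responses still carry headers and a body.
        response, status = err, err.code
    try:
        return status, response.headers.items(), response.read(max_bytes)
    finally:
        response.close()


def dump(url):
    """Print the status, headers, and raw (unrendered) body of url."""
    status, headers, body = fetch_raw(url)
    print("HTTP status:", status)
    for name, value in headers:
        print(f"{name}: {value}")
    print()
    print(body.decode("utf-8", errors="replace"))
```

Calling `dump` with the URL from the spam prints everything to your terminal for reading. Nothing in the page is executed or rendered, and a redirect shows up as a 3xx status with a `Location` header that you can choose to follow (or not) yourself. Connection failures are not caught here and will surface as `urllib.error.URLError`.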
Another alternative is to find a "safe browser" on the web; this would be a web-based utility that lets you enter a URL, then returns the raw content of the file to you in a form that is "massaged" so your own browser won't try to render it as HTML. In this way, the website won't be able to play any dirty tricks on you. For many years, the reliable choice was found at Sam Spade, but this site has since greatly curtailed its operations, and I haven't found a good web-based replacement (let me know via page comment if you've found one). Note that "anonymizing" web proxies are not precisely the same thing, since they will not mangle the HTML before delivering it to your browser.
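The "massaging" that such a safe browser performs is, at its core, just escaping the markup so that it displays as text. Here is a tiny Python sketch of the idea using the standard `html.escape` function; the function name `as_inert_text` is hypothetical, and I am not claiming this is how Sam Spade or any particular service did it.

```python
import html


def as_inert_text(markup):
    """Escape markup so a browser shows it as literal text instead of rendering it."""
    # html.escape turns &, <, > (and quotes) into character entities.
    return "<pre>" + html.escape(markup, quote=True) + "</pre>"
```

Once the angle brackets are escaped, a `<script>` tag in the fetched page is displayed as the characters `<script>` rather than being executed, which is exactly what an ordinary anonymizing proxy does not do for you.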