Polluting a spam email list

curlytail · July 13, 2010

Hi There,

Working on the basis that the margin on which spammers work is very slim and wishing to make that margin even slimmer, I encountered a fake email list which could be copied and pasted to a site using a discreet link on the homepage. This email list contained about 150 fake emails and was specifically designed to be harvested and added to email lists used in spamming. This would make a certain percentage of emails in any spam list ineffective and reduce the margin of success further; polluting a spam list.

As a programmer, I added to a site a similar page which generates a fake list of addresses on the fly, meaning that each time the emails are harvested, a new set will appear, different from the last time the page was crawled.

I am wondering if there's any method the people harvesting these addresses try to validate these emails and if this is a waste of time.

Cheers

Curlytail

Farelf · July 13, 2010

...I am wondering if there's any method the people harvesting these addresses try to validate these emails and if this is a waste of time.

Hi curlytail,

No doubt there will be variations (there are spammers of all sorts) but on the occasions we get misdirected bounces the sheer volumes of undeliverable spam back to our addresses, spoofed by the spammers as "Reply-to:" or "From:" addresses, suggest the current mainstream spammers rely purely on volume with no discernible attempt at validation.

Validating email addresses is not a 100% reliable proposition in any event - not all service providers confirm invalid addresses anyway but invalid domains at least are easily confirmed.

Generally, fake addresses are not recommended under the RFCs, simply because they might coincide with real addresses. That's why there are specified addresses, domains and top-level domains for test and discussion purposes, like user[at]domain.invalid. I suppose an automated system would generate some pretty improbable addresses, but you wouldn't want to accidentally publish someone's actual address for scraping.

Apart from that, I couldn't really say if it is a waste of time or not. Certainly any address on a well-visited and well-linked webpage will get picked up by the spammers fairly quickly and 'once on a list, always on a list' it seems. I'm sure many of us could talk about addresses eventually abandoned or relegated to personal spamtrap duties on account of them apparently being on a whole lot of spammers' lists.
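To illustrate the point about reserved names, here is a minimal sketch of a generator that only ever emits addresses under the ".invalid" top-level domain, which RFC 2606 reserves so that it can never belong to a real mailbox. The function name, the label lengths and the character set are assumptions made for the example; nothing in this thread says how the actual page builds its addresses.

```python
import random
import string

# ".invalid" is reserved by RFC 2606, so no real domain can ever match it.
RESERVED_TLD = "invalid"


def fake_addresses(count, rng=None):
    """Return `count` random addresses of the form local@label.invalid.

    The 10-character local part and 8-letter domain label are arbitrary
    choices for this sketch, not something taken from the thread.
    """
    rng = rng or random.Random()
    local_chars = string.ascii_lowercase + string.digits
    addresses = []
    for _ in range(count):
        local = "".join(rng.choice(local_chars) for _ in range(10))
        label = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        addresses.append(f"{local}@{label}.{RESERVED_TLD}")
    return addresses


if __name__ == "__main__":
    for address in fake_addresses(5):
        print(address)
```

Because every address ends in a reserved top-level domain, a coincidental match with somebody's real address is ruled out by construction rather than by improbability.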

turetzsr · July 13, 2010

Hi, curlytail!

<snip>
As a programmer, I added to a site a similar page which generates a fake list of addresses on the fly....

<snip>

...Please consider stopping your practice! I understand the intent but an unintended byproduct is to add to one of the big problems spammers cause, which is to add a huge volume of junk to the internet, causing more time and resources to be used than warranted.

Lking · July 14, 2010

Not to pile on, but do the math.

One good email address generates one speck of noise on the web. There is a reasonable probability of getting stopped in a filter.

One bad email address generates 2 specks of noise on the web: the original spam and the bounce back to whoever the return address is. Because it is a bad address, no spam filter will stop it.

Both cost the spammer the same. One costs the web twice as much bandwidth as the other.

I think we, the web, lose by adding bad addresses and thereby adding to the wasted bandwidth.
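The arithmetic above can be written out as a small sketch. The function name and the `bounce_rate` parameter are assumptions added for illustration: the post counts exactly one bounce per bad address, and the parameter just makes that assumption adjustable, since not every receiving server sends a bounce.

```python
def messages_on_the_wire(good, bad, bounce_rate=1.0):
    """Count messages generated by one spam run.

    Each address, good or bad, costs one outgoing spam message.
    Each bad address additionally triggers a bounce with probability
    `bounce_rate` (1.0 reproduces the "2 specks" figure in the post).
    """
    spam = good + bad
    bounces = bad * bounce_rate
    return spam + bounces


if __name__ == "__main__":
    print(messages_on_the_wire(good=1, bad=0))  # one good address
    print(messages_on_the_wire(good=0, bad=1))  # one bad address
```

With the default setting a bad address produces twice the traffic of a good one, which is the whole of the "do the math" argument; the result changes only if the bounce never happens.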

I would think blocking the sender, as in spamcop, or identifying the harvester, as in projecthoneypot.org, or identifying the spamvertised websites, as in knujon.com, or identifying the phishing sites, as in phishtank.com, would be more productive.

curlytail · July 14, 2010

Thank you for your comments and feedback, though they were not what I expected.

I'm sorry to say that much of the argument about internet noise doesn't take into account these questions:

turetzsr says:

"I understand the intent but an unintended byproduct is to add to one of the big problems spammers cause, which is to add a huge volume of junk to the internet, causing more time and resources to be used than warranted."

If the MX record (and trust me these domains I generate DO NOT exist) cannot be found by the spammer's SMTP server then the junk will get no further than their own server. Isn't it true then that not even one "speck" of email noise is generated on the internet except between the spammer's client machine and his mail server?

LKing says:

"but do the math."

If what I mentioned above is correct and no email goes further than the spammer's mail server (it won't try to send out to unresolved domains) then is there any math to do here?

As a rule of thumb the half life of an email list is calculated to be three months. If this email list is refreshed with dirty data which won't get past the spammer's mail server, then the critically small margin that these guys work on becomes perilously thinner. This will effectively reduce the noise as those "specks" which are valid addresses which are resolved and get past the mail server become fewer and fewer until the spammer's operation becomes unviable.

When comparing the 3 KB size of an email against, say, the traffic on a single Facebook page, Google, Twitter or porn site page, the "specks" being considered here are infinitesimal and not a consideration at all (for the record the Google logo is twice the size of an average spam email).

One consideration is the space on mail servers taken up by junk mail. If dirty data means fewer emails reaching mail servers, then here is a real and tangible benefit.

I appreciate what you guys are saying, but in summary the issues of accidentally generating a bona fide email address and the bandwidth taken by an email which won't get past the mail server, are not issues at all.

Thanks for the feedback guys. Much appreciated.

Curlytail.

agsteele · July 14, 2010

I am wondering if there's any method the people harvesting these addresses try to validate these emails and if this is a waste of time.

I think the anecdotal evidence is that spammers do not attempt to validate but without speaking to these people it would be difficult to prove.

I would say that generating these fake addresses is a waste of time in that the spammers are already processing millions of Email addresses that do not exist (I have a client whose domain receives thousands of spam messages each day for destinations which do not exist and are simply dropped by our mail server) so I don't see that adding a few extra from your list will make the slightest difference.

But if it makes you feel good then I guess you'll be happy to continue.

Andrew

curlytail · July 14, 2010

(I have a client whose domain receives thousands of spam messages each day for destinations which do not exist and are simply dropped by our mail server)

For this to happen the domain would have to exist. In my case the domains don't exist. The destinations you mention would be accounts that don't exist on that domain.

But good point. Cheers

Curlytail.

turetzsr · July 15, 2010

<snip>
If the MX record (and trust me these domains I generate DO NOT exist) cannot be found by the spammer's SMTP server then the junk will get no further than their own server. Isn't it true then that not even one "speck" of email noise is generated on the internet except between the spammer's client machine and his mail server?

<snip>

...How would the spammer's SMTP server know to stop the junk? Doesn't it have to send messages out to at least a DNS to find out that there is no such domain? Okay, admittedly the DNS may be within the spammer's domain. How about the bandwidth taken to pull the fake e-mail addresses from the server on which they are listed to the spammer?
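As a rough sketch of the step being asked about: before attempting delivery, a sending host has to ask DNS whether the recipient's domain exists, and that query is itself network traffic. The function names below are hypothetical. A real mail server would look up MX records first; Python's standard library has no MX lookup, so this sketch falls back on plain address resolution via `socket.getaddrinfo`, and the `lookup` parameter exists only so the resolver can be swapped out.

```python
import socket


def domain_resolves(domain, lookup=socket.getaddrinfo):
    """Return True if `lookup` finds any address for `domain`.

    Assumption: address resolution stands in for the MX query a real
    mail server would make. Either way, a DNS request leaves the host.
    """
    try:
        lookup(domain, 25)
        return True
    except socket.gaierror:
        return False


def split_queue(recipients, lookup=socket.getaddrinfo):
    """Partition recipients into (attempt, drop) by whether the domain resolves."""
    attempt, drop = [], []
    for address in recipients:
        domain = address.rsplit("@", 1)[1]
        if domain_resolves(domain, lookup):
            attempt.append(address)
        else:
            drop.append(address)
    return attempt, drop
```

Addresses in the `drop` list never reach a receiving mail server, but each one still cost at least one DNS query to find that out.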

...Given that spamming is theft of internet resources, it's hard to see how increasing the "cost" of those using those resources in the manner you suggest is likely to have the impact you seek. If I'm a spammer and I have 100 million e-mail addresses that never result in revenue and 2 addresses that do, I am not hurt at all if the only change I see is that I suddenly have 101 million e-mail addresses that never result in revenue (because your scheme has added 1 million non-existent addresses) and 2 addresses that do provide me revenue.

curlytail · July 15, 2010

...How would the spammer's SMTP server know to stop the junk? Doesn't it have to send messages out to at least a DNS to find out that there is no such domain? Okay, admittedly the DNS may be within the spammer's domain. How about the bandwidth taken to pull the fake e-mail addresses from the server on which they are listed to the spammer?

...Given that spamming is theft of internet resources, it's hard to see how increasing the "cost" of those using those resources in the manner you suggest is likely to have the impact you seek. If I'm a spammer and I have 100 million e-mail addresses that never result in revenue and 2 addresses that do, I am not hurt at all if the only change I see is that I suddenly have 101 million e-mail addresses that never result in revenue (because your scheme has added 1 million non-existent addresses) and 2 addresses that do provide me revenue.

The spammer is unlikely to have his own mail server, as these are easily blacklisted, bringing their entire enterprise to a halt. Spammers either seize control of an entire server by brute-force hacking or use a whole host of bona fide ISP servers, sending emails at spaced intervals to prevent suspicion. Using hundreds of these infected ISP servers, they would use the servers' own DNS service to find the domain.

The bandwidth is, as mentioned previously, minuscule compared to almost any internet page pulled from a server, which happens continuously.

If you are a spammer with 100 million email addresses then it's likely that you have had to refresh this, as many addresses will have become defunct, for example. Fresh email addresses are the holy grail of spammer scraping, and to add another thousand new, replacing a thousand defunct, is a good trade. If those thousand fail by being false then that's a thousand bona fide addresses which are saved from spam.

Each time my page is visited, the addresses are new and randomly generated so dupes are highly unlikely. If crawled once a month then each month that page throws in a thousand useless addresses which replace a thousand possible good addresses. If I put this onto twenty of my web sites then the enterprise becomes more worthwhile. If every webmaster put a single page like mine onto a site then spammers would replenish their lists with completely worthless addresses and reap not a penny reward, still facing the expenses involved in running bots and harvesters.

I agree that this is a drop in the ocean but I know of others who are doing this and hopefully this will erode margins for spammers, making their existence more tenuous. A spammer's margin of success is razor-blade thin as it is. For them to invest time, money and effort in setting up a spam run with tens of thousands of hopeless addresses from the start cannot hurt, surely.

SpamCop 98 · July 15, 2010

For this to happen the domain would have to exist. In my case the domains don't exist.

They don't exist today, but there is no guarantee that they will not exist tomorrow. And there are people who will register as3rtrgav6vwer8has2df.com if they come across it enough.

The spammer is unlikely to have his own mail server

Well now that's some old school thinking. There are large IP blocks that are either flimsy cover "ISPs," where the spammer games the system as you mention by rotating through different IP#s, or hi-jacked unused IP#s. Believe me there are millions of dedicated mail servers controlled by spammers. And I'm not talking about compromised users' machines.

If every webmaster put a single page like mine onto a site then spammers would replenish their lists with completely worthless addresses and reap not a penny reward, still facing the expenses involved in running bots and harvesters.

What expenses?

I agree that this is a drop in the ocean but I know of others who are doing this

That's not a good reason for you to do it.

In fact, there's no good reason to do it. If you believe 150 or even 150,000 invalid email addresses are going to slow even the smallest spam operation you have more research to do. Spammers need help generating invalid addresses about as much as the Gulf of Mexico needs more oil.

turetzsr · July 15, 2010

The spammer is unlikely to have his own mail server, as these are easily blacklisted, bringing their entire enterprise to a halt. Spammers either seize control of an entire server by brute-force hacking or use a whole host of bona fide ISP servers, sending emails at spaced intervals to prevent suspicion. Using hundreds of these infected ISP servers, they would use the servers' own DNS service to find the domain.

...And I think this line of reasoning (plus what appears to be more often the case these days, which is infected individual PCs that are "trojans" that get included in "botnets") supports my position. Your scheme will cause network traffic that would not otherwise happen in packet flow from the server with your web page to the seized systems, at the very least.

<snip>
If you are a spammer with 100 million email addresses then it's likely that you have had to refresh this, as many addresses will have become defunct, for example. Fresh email addresses are the holy grail of spammer scraping, and to add another thousand new, replacing a thousand defunct, is a good trade. If those thousand fail by being false then that's a thousand bona fide addresses which are saved from spam.

...Wow, I consider that a significant number of hypotheses for which I've seen no supporting data. Why would spammers care how many worthless addresses they have? If I have 2 good addresses, I don't think I care how many bad addresses I have because I don't pay for the resources required to have those bad addresses and therefore I have no incentive to clean out the bad addresses.

<snip>
... hopefully this will erode margins for spammers, making their existence more tenuous. A spammer's margin of success is razor-blade thin as it is.

<snip>

...This is a repeat of your hypothesis which so far does not seem to be supported. I just don't see any penalty to a spammer from invalid e-mail addresses (see my arguments, above). I'm not trying to convince you to give up your argument, I'm just saying I'm not yet convinced. I still believe you're causing additional network traffic to little or no benefit.

Lking · July 15, 2010

If you are a spammer with 100 million email addresses then it's likely that you have had to refresh this, as many addresses will have become defunct, for example. Fresh email addresses are the holy grail of spammer scraping, and to add another thousand new, replacing a thousand defunct, is a good trade. If those thousand fail by being false then that's a thousand bona fide addresses which are saved from spam.

As Steve said, not sure these assumptions are supportable. Experience shows that "once on a list, always on the list." Email addresses I used in a business that I closed 8 years ago are still receiving spam. No outgoing traffic, just incoming spam. In fact more than before. If spammers bothered to scrub bad/defunct addresses, you would think dropping addresses that report spam would be advantageous too. There is no evidence of that either.

Additional counter evidence is obtained from poorly configured mail servers. When the server uses the Reply or From lines to bounce bad addresses and I am on the receiving end, I get addresses that are just random strings before {AT} or bounces in alphabetical order by time stamp. The spammer doesn't care. It would be more productive if you spent your time educating the public NOT to respond to spam.

Your assumption is faulty that the spammer has 100 million addresses and that if you "pollute" his list that leaves only 99 million addresses to spam. If he finds your addresses why doesn't his list become 101 million addresses long? As stated there is no evidence of a refresh cycle.
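The disagreement here comes down to two models of the spammer's list, which a short sketch makes explicit. The function name and the `fixed_length` flag are made up for the illustration; the figures in the usage lines are the hypothetical ones used earlier in this thread, not measured data.

```python
def pollute(good, bad, fakes, fixed_length):
    """Return (good, bad) counts after `fakes` bogus addresses are harvested.

    fixed_length=True  models a list of constant size, where each fake
                       pushes out a good address.
    fixed_length=False models a list that simply grows, so the good
                       addresses are untouched.
    """
    if fixed_length:
        displaced = min(fakes, good)
        return good - displaced, bad + displaced
    return good, bad + fakes


if __name__ == "__main__":
    # Fixed-length list: 100 of 1,000 good addresses are displaced.
    print(pollute(good=1_000, bad=0, fakes=100, fixed_length=True))
    # Growing list: 2 paying addresses stay, the junk pile gets bigger.
    print(pollute(good=2, bad=100_000_000, fakes=1_000_000, fixed_length=False))
```

Only the fixed-length model removes any deliverable addresses. If the list just grows, the number of addresses that can earn the spammer money does not change at all.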

Each time my page is visited, the addresses are new and randomly generated so dupes are highly unlikely.

As a programmer you should know better. If truly random, one sequence (address) is just as likely as any other to follow another. So curlytail is just as likely to follow curlytail as any other name. If that's not true then they are sequentially dependent, which is not random, or even close.
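Both statements can be true at once: independent random draws can repeat, and how likely a repeat is depends on the size of the address space. A minimal sketch using the standard birthday-problem approximation follows. The address space used in the usage line (a 10-character local part drawn from 36 symbols) is purely an assumption, since the thread never says how the addresses are generated.

```python
import math


def duplicate_probability(draws, space):
    """Approximate chance that at least two of `draws` independent,
    uniform picks from `space` possible values are identical.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2N)).
    """
    exponent = -draws * (draws - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # Assumed space: 10 characters from a-z and 0-9, one thousand draws.
    print(duplicate_probability(1_000, 36 ** 10))
```

Under that assumed address space the chance of a repeat in a thousand draws is tiny, but it is never exactly zero, and it climbs quickly if the generator draws from a small pool of names.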

If crawled once a month then each month that page throws in a thousand useless addresses which replace a thousand possible good addresses. If I put this onto twenty of my web sites then the enterprise becomes more worthwhile. If every webmaster put a single page like mine onto a site then spammers would replenish their lists with completely worthless addresses and reap not a penny reward, still facing the expenses involved in running bots and harvesters.

Again you're assuming the spammer has a fixed length list. If they find more addresses they use more addresses. Remember memory is cheap, they just build a longer list.

A spammer's margin of success is razor-blade thin as it is.

Not sure your assumption about a thin margin is valid. The occasional news stories about spammers or botnets being closed down don't sound like they're working "on the margin" to me. In fact the news stories about millions of dollars make it sound profitable. It does seem odd that news stories about all the people who get swindled out of their savings responding to some email deal don't slow it down. But they don't.

rconner · July 15, 2010

People used to do this sort of thing quite a bit, and it was usually called "spambait." I don't run into it as much anymore these days, perhaps for the reasons suggested by the other posters.

Ironically many spammers use essentially the same trick you are proposing -- picking a known-good domain, and then generating random e-mail addresses and attempting delivery. They have huge, diffused, and untraceable resources for sending mails to MX hosts, so they probably don't mind the noise arising from undeliverable addresses.

-- rick

turetzsr · July 15, 2010

<snip>
Remember memory is cheap, they just build a longer list.

<snip>

...Just noting that memory (as well as disc space) being cheap is indirectly related to spammers building longer lists: it reduces the incentive for the service providers, whose services the spammers are stealing, to take them down. The spammers aren't paying for the memory (or disc) or the resources to create and send the packets all about!

Lking · July 15, 2010

Just processed some more spam and in the process it occurred to me that this whole thread is based on the assumption that spammers' rule #3 is false.

If there were a real thought process involved why would I receive:

20-40 emails in Russian every day? What is the probability that someone with an email account at a domain registered in the U.S. understands Russian?

10-20 emails at a time with the same subject? If there was a possibility that someone would open one spam, the probability must vanish exponentially as the number of identical emails received increases.

a large number of spam (any topic) addressed to webmaster or postmaster? If there were anyone with an email address that would NOT be likely to open spam/click on zipped attachments you would think the *master would be the most knowledgeable and the least likely.

Because of rule #0 and James' Axioms to rule #3 they don't care. Their costs are nil. I doubt a few bad email addresses (even a few thousand) affect them even on the margin.

If cost or throughput were an issue, there are lots of ways they could reduce overhead/increase ROI. But it obviously is not worth the effort. They must be doing just fine. <_<

enigma1 · August 6, 2010

IMO There is more to it about the mysterious emails that seem to have no purpose.

As reported in the forums and FAQs, spammers can integrate signatures into the body of mails. Therefore, as these reports are sent back to ISPs and host owners, the likelihood of spammers knowing the email address in the report is very high.

Configuring the server with a catch-all and filing reports for every spam email on every account (valid mail accounts or not) may help, because it can hide the valid e-mail accounts to an extent: signatures will now indicate reports filed from every account that received spam, not just valid ones. The spamcop process removes the localhost forward references from the emails as far as I can tell, so the headers sent with the report should look identical whether the mail was received via a valid account or not.

Bouncing emails to the sender is a goldmine for spammers. They do utilize the bounce mechanism of a host to safely send mail to others, so never bounce e-mails without proper verification (something I found very hard to set up and automate, so for me it is out of the question).

About the malware attachments sent even to host admins, I think the idea behind it is human error and confusion. If the admin opens the attachment (and there is a good possibility of that) they can check whether the malware was identified or not (the admin may not directly click to open the attachment, but usually the AV will, or whatever tool is in place that scans for it). This may work like fraudulent orders: place an order for $1 to see if the card works (reported as fraud or not), then sometime later place the real order. There is a window of opportunity for them and they can have tools in place for detection.

I would prefer to see the reporting mechanism of spamcop made safer with respect to signatures, but I don't know if that is possible or not. Basically, send the host a link to the reported spam instead of the mail headers/body. That would add an extra step to the identification that leaks info to spammers, which right now can be fully automated and processed. Or perhaps use report formats that change often for the reports sent to hosts/ISPs, which would make automation harder.

From e-mail reports I have access to, it seems some of the spammers are very systematic in identifying valid e-mail accounts and in how they attempt to compromise systems.
