
Project Worth Supporting


Mikey


Don't know if this has been discussed here before or not but it is certainly worth looking at this:

http://www.projecthoneypot.org

If you have a web page under your control you can really help out with this. If you have an MX DNS entry to donate, you can help even more.

Studies have shown (can't find the link I had....) that the majority of E-mail addresses are added to spammers' lists by harvesting websites and USENET posts. This project attempts to analyze these harvesting activities and provide near-real-time data to the spam-fighting community.

No, I don't work on the project. I just thought it was an excellent example of the kind of folks working for the white hats out here. Their site is very well done. If you have a web page, spend 5 minutes and register. Then you can check back in any time and see exactly how many harvesters have crossed your path (and exactly WHO they are).


  • 2 weeks later...

I first discovered Project Honeypot on this thread, on 24 November. Kudos, Mikey... I checked it out very carefully and decided to join up. As I mentioned in another thread on this board, I think it's a brilliant concept. It puts spammers on the defensive, instead of end users. It cuts spam email address harvesters off at the knees.

And unlike Lycos Europe's nasty "Make Love Not Spam" garbage, it does so without harming the Internet we're all trying to save.

I'm adding their honeypots to all domains I own (and several I administer - with my clients' permission, of course). I especially like their PHP version. It seems much simpler and less error-prone during installation than the various other flavors they offer. The instructions are clear and easy to follow.

The people running Project Honeypot are surprisingly responsive to your inquiries - and very ethical. What a stunning surprise. I encourage everyone to check it out and join. And pass it on. They need lots of honeypots and MX record donations in order to compile stats that will make a dent.

Check out: http://www.projecthoneypot.org


  • 2 weeks later...

I stumbled upon www.projecthoneypot.org recently, and installed a hidden 'honeypot' page on my website, to catch people who are skimming my site for email addresses.

Amazing.

So far, in just a week, this hidden page (so hidden that a normal user won’t ever see it) has had 101 visits.

Five pieces of email have been sent to unique email addresses which only appear on these pages.

In particular, 80.46.67.1 has visited eleven times: this is an IP address in the UK owned by Tiscali. An IP address in China has sent email to the unique email addresses that were displayed to the Tiscali-owned IP address - leading me to suspect that a trojan program is installed in some unsuspecting user’s PC, which is doing the dirty work for this Chinese company.

Other visits have been from computers in Peru and Korea - including arch-spamming company Kornet.

This is a *really* good idea: and moderately painless, too.

If you can, please do add a honeypot to your site: it looks a great extra way to combat spam.

Should these people and SpamCop.net sit down and have a chat?


I finally (re)found the link I mentioned above. Great reading.

http://www.cdt.org/speech/spam/030319spamreport.shtml

Their data is a couple years old but I bet it is completely relevant today and likely very educational for the next few years. About the only change I see in spammers' habits is the increasing use of worms and viruses to harvest addresses from local hard drives.

Project Honeypot is just starting out. I'm proud to say I jumped on in their first few months and I'm looking forward to a fun ride! They are expecting to turn on the statistics pages in the next few months as the results become meaningful. I would expect things to move exponentially in the next couple years.

Certainly the scumbags out there will figure it out and find a way to detect the dynamic pages -- and just not harvest them -- but we have some tricks up our sleeves too.



Mikey, I don't run a server or website or anything like that, but I'm curious how this works. I take it that SendMeMoreInfo[at]mysite.com is still visible to a normal user?

Are the e-mail addresses like GUIDs, one for every IP address and one for every second (or tenth of a second, or whatever it is)?

Couldn't a clever harvester query repeatedly and see that the page gives it a different e-mail each time?

Is this information secret? :)



I signed up for this the other night. What you do is take the code provided on the Honeypot site, insert it on your site somewhere, and the Honeypot site does the rest. It uses fake email addresses to track harvesters and bots, then records them in the master database. I haven't monkeyed around with it too much, so I don't know all the technical specifics, but it looks like a great tool to track who's sniffing around my sites.


I'll try to give a little snapshot here of how project honeypot works, as far as I know it. I am by no means an expert and am not officially associated with the project. They have a really good FAQ which might make for some nice reading some cold winter night.

The important thing to remember is that we often overestimate the intelligence of the spammer and, more importantly, of his harvesting tools. There is no doubt that a crafty programmer could perform delicate surgery on all the addresses that a spammer collects. But the truth is, they don't care and don't spend that much time on it. Most are little more than script kiddies running 5-to-10-year-old software. If they get 1 out of 10 addresses that are valid, they are happy.

Another important thing to remember: as the spammer sends E-mail to the address supplied by the honeypot page, he has no reason to believe that anything is amiss. His spam is happily accepted by the recipient. Here's the deal.

1. I install a dynamic page on my webserver (either Perl or PHP) that is, like all dynamic pages, generated by my server every time it is called by a GET request from a remote client. The link to this page (say, from my front page) is hidden so that the average user can't see it -- it's invisible to everything except robots and spiders. The name of this hidden page is different for every server in the project. I can name it anything I want.

2. The script produces the HTML for your browser or the spammer's spambot. Tucked away in this generated page is an E-mail address, complete with the mailto: HTML tag. However, this is not visible to someone with a browser; you could only see the E-mail address if you did a "view/source" of the page or if you were a spambot looking only at the raw HTML.

3. Here's the important part. That E-mail address is actually created by the Project Honey Pot servers a split second before my page is sent to the requesting host. So the project knows: a) when the page was requested, b) who requested it (IP address), and c) via which server (mine) it was handed out. It logs these along with the E-mail address that was generated.

4. Just as important: the E-mail address generated is completely valid, with a valid domain that is (secretly) owned by the project. Any mail sent to this address will be delivered and (secretly) dissected by the project's servers.

5. In the future, if spam is sent to the address that was handed out, we not only know who the spammer is, but exactly when, where and by whom that address was collected for the spammer. If nobody ever sends E-mail to the address, no harm, no foul.
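
The hand-out steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Project Honey Pot's actual code or API: the hypothetical `HandOutLog` class stands in for the project's central servers, the hypothetical `render_trap_page` function stands in for the script installed on a participating site, `trap.example` is a placeholder trap domain, and hiding the mailto: link with `display:none` is just one way to keep it out of a browser user's view.

```python
from __future__ import annotations

import html
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional

TRAP_DOMAIN = "trap.example"  # hypothetical; the real trap domains are kept secret


@dataclass(frozen=True)
class HandOut:
    address: str
    server: str        # which participating site handed the address out
    requester_ip: str  # who asked for the page
    issued_at: datetime


class HandOutLog:
    """Hypothetical stand-in for the central servers: issues each address once and logs it."""

    def __init__(self) -> None:
        self._by_address: Dict[str, HandOut] = {}

    def issue(self, server: str, requester_ip: str) -> str:
        # A fresh random local part per request, so no two requests share an address.
        while True:
            address = f"{secrets.token_hex(6)}@{TRAP_DOMAIN}"
            if address not in self._by_address:
                break
        self._by_address[address] = HandOut(
            address, server, requester_ip, datetime.now(timezone.utc)
        )
        return address

    def lookup(self, address: str) -> Optional[HandOut]:
        return self._by_address.get(address)


def render_trap_page(log: HandOutLog, server: str, requester_ip: str) -> str:
    """Generate the hidden page; the address is not rendered, only present in the raw HTML."""
    address = html.escape(log.issue(server, requester_ip))
    return (
        "<html><body>\n"
        f'<a href="mailto:{address}" style="display:none">{address}</a>\n'
        "</body></html>\n"
    )
```

Each call to `render_trap_page` issues a new address, so the log alone can answer "who received this address, from which site, and when."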

A specific example:

0. I get a personalized script from the project and install it on my server at http://mysite.com/pizza.php

1. Sally Spammer turns her spambot loose on the world from her home DSL (4.3.10.10)

2. The spambot hits my home page and finds the hidden link to http://mysite.com/pizza.php

3. The spambot goes to /pizza.php and issues a GET for that page.

4. My server starts to generate the HTML code for the requested page. (spambot waiting)

5. Part of the internal page generation includes a call to project honey pot and requests an e-mail address. (spambot still waiting)

6. Project Honey Pot logs my server, the requesting IP (4.3.10.10) and the date/time, and makes up an arbitrary but valid E-mail address: john[at]jankyho.com

7. My server receives john[at]jankyho.com back from the project and sticks it into the HTML code for my page. (spambot waiting)

8. I serve the page up to the spambot at 4.3.10.10. I don't keep track of anything, I'm done.

9. Sally's spambot slurps up the page and greedily finds the html code for the email address of john[at]jankyho.com and stuffs it away.

10. Sally sells her list of E-mail addresses to a spammer in New York.

11. Three months from now the New York Spammer wants to sell Viagra. He digs into his list of suckers and comes up with 1000 addresses, including john[at]jankyho.com.

12. New York turns his spam machine on (220.20.20.1) and sends out spam. When his server goes to send something to the jankyho.com domain, he does a DNS lookup for the MX record of jankyho.com. He is given the mail server address for one of the Project Honey Pot mail servers (of course this is not obvious to him or his spam-server).

13. The Project's mail servers get a spam sent to john[at]jankyho.com coming from 220.20.20.1. It takes it in, says thanks, and lets the spammer go on his way.

14. The cycle of scum is complete. We now know that Sally Spammer is the root of all evil. We can positively identify her by her address (4.3.10.10) and know exactly where and how she got the E-mail address. The john[at]jankyho.com address was never handed out to anyone but her (and never will be handed out again).
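
Steps 13 and 14 boil down to a lookup: the recipient of an incoming spam is matched against the hand-out log, which ties the harvester's IP to the sender's IP. A minimal sketch, assuming the log is a plain dictionary keyed by the handed-out address and that trap addresses are issued in lowercase; the names `HandOut`, `Attribution` and `attribute_spam` are hypothetical, and the addresses and IPs are the ones from the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass(frozen=True)
class HandOut:
    harvester_ip: str  # who fetched the hidden page (Sally, 4.3.10.10)
    server: str        # which participating site served it (mysite.com)
    issued_at: datetime


@dataclass(frozen=True)
class Attribution:
    trap_address: str
    harvester_ip: str
    server: str
    issued_at: datetime
    sender_ip: str     # who actually delivered the spam (New York, 220.20.20.1)
    received_at: datetime


def attribute_spam(
    handouts: Dict[str, HandOut], rcpt: str, sender_ip: str, received_at: datetime
) -> Optional[Attribution]:
    """Match an incoming message's recipient against the hand-out log.

    Returns None for an address that was never handed out.
    """
    key = rcpt.strip().lower()  # assumes trap addresses are issued in lowercase
    record = handouts.get(key)
    if record is None:
        return None
    return Attribution(
        key, record.harvester_ip, record.server, record.issued_at, sender_ip, received_at
    )
```

An address that was never handed out simply returns `None`, which matches the "no harm, no foul" point in step 5.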

Some things to note.

Sally and New York never know this is going on.

We make no attempt to actually stop the spam as it is being sent.

The only thing required to be secret is the list of domains owned by the Project (the ones that actually receive the spam.)

An average surfer would never even see these pages let alone get to the hidden E-mail address.

Even if a casual user or search engine DID get the E-mail address, nothing would ever happen unless they actually sent E-mail to that address.

There are several ways a spammer could defeat this approach. None of them is likely in the near future.


That's about what I thought. I see quite a few ways a spammer could defeat this.

Honeypot does not have Sally Spammer's name. They have her IP address. However, Sally runs her spider on compromised machines, public and semi-public machines, through hacked and stolen accounts, and on unethical providers.

Anyone who can write a spider can write a spider to detect a dynamic page and touch it twice to see if the address has altered.

Nothing good actually happens until Sally is sued. Then you have to explain it to a judge, explain why it's not entrapment, and convince him/her that Sally agreed to the contract to pay $500 per address, when in fact she never even saw the thing.


That's about what I thought.  I see quite a few ways a spammer could defeat this.

Again, I stand by my statement that the HUGE majority of spammers will not go to ANY effort to defeat this for at least several years.

Honeypot does not have Sally Spammer's name.  They have her IP address.  However, Sally runs her spider on compromised machines,  public and semi-public machines, through hacked and stolen accounts, and on unethical providers.

Which is also true of many of the SOURCES of spam. So we can do the same thing to Sally that we do to the sources. Compromised machines can be treated just like any other source of problems. See http://www.spamhaus.org/xbl/index.lasso Yes, I know Sally is not directly sending mail but people are looking at compromised machines in ways other than just DNSBLs.
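
For reference, a DNS blacklist such as the XBL is queried by reversing the IPv4 octets and prepending them to the list's zone; an A record (normally in 127.0.0.0/8) means the address is listed, and a name that does not resolve means it is not. A minimal sketch using only the Python standard library; `xbl.spamhaus.org` is the XBL zone, but Spamhaus has usage terms for its public service, so check their documentation before querying it for real. The function names are hypothetical.

```python
import ipaddress
import socket
from typing import Optional


def dnsbl_query_name(ip: str, zone: str = "xbl.spamhaus.org") -> str:
    """Reverse the IPv4 octets and append the blacklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def dnsbl_lookup(ip: str, zone: str = "xbl.spamhaus.org") -> Optional[str]:
    """Return the A-record answer (normally 127.0.0.x) if listed, or None if the name does not resolve."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return None
```

For Sally's address, `dnsbl_query_name("4.3.10.10")` gives `10.10.3.4.xbl.spamhaus.org`.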

And I can tell you that losers do conduct harvesting from personal accounts on occasion. I know of one rabid anti-spammer who used to implement a similar dynamic email hand-out scheme on his site, handing out encoded addresses that he could trace back to his logs. (There have been little PHP routines to do this for years). He worked with several ISPs and people did lose accounts.
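
The "encoded addresses that he could trace back to his logs" idea can be sketched as follows. It is a hypothetical illustration, not that anti-spammer's routine and not anything Project Honey Pot is known to use: the visitor's IPv4 address and the request time are packed into the local part, and a short HMAC tag (keyed with a secret only the site owner knows) lets the owner reject addresses it never issued. `SECRET` and `mysite.example` are placeholders.

```python
import hashlib
import hmac
import ipaddress
from typing import Optional, Tuple

SECRET = b"replace-with-a-long-random-secret"  # placeholder site secret
DOMAIN = "mysite.example"                      # placeholder domain you receive mail for


def _tag(payload: str) -> str:
    # 8 hex chars (32 bits) of HMAC-SHA256: enough to reject casual forgeries.
    return hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).hexdigest()[:8]


def encode_address(ip: str, timestamp: int) -> str:
    """Pack the visitor's IPv4 address and the request time (Unix seconds) into the local part.

    The 8-hex-digit timestamp field holds values up to 2**32 - 1 (year 2106).
    """
    payload = f"{int(ipaddress.IPv4Address(ip)):08x}{timestamp:08x}"
    return f"u{payload}{_tag(payload)}@{DOMAIN}"


def decode_address(address: str) -> Optional[Tuple[str, int]]:
    """Recover (ip, timestamp) from an address we handed out; None if we never issued it."""
    local, _, domain = address.partition("@")
    local = local.lower()
    if domain.lower() != DOMAIN or len(local) != 25 or not local.isascii() or local[0] != "u":
        return None
    payload, tag = local[1:17], local[17:]
    if not hmac.compare_digest(tag, _tag(payload)):
        return None
    ip = str(ipaddress.IPv4Address(int(payload[:8], 16)))
    return ip, int(payload[8:], 16)
```

Unlike a central hand-out log, this needs no database: the address itself carries the IP and timestamp to look up in the web server's logs.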

Anyone who can write a spider can write a spider to detect a dynamic page and touch it twice to see if the address has altered.

Spiders will not stay away from dynamic pages, simply because there is a huge base of things like PHPNuke and PHPWebsite and other CMSs that offer them a wealth of addresses. So although simply staying away from dynamic pages would keep them clear of Project pages, they won't do it. The rewards outweigh the risks. As for touching it twice.....let's just say.....that won't do them much good. ;) I understand what you are saying, but....somebody's already thought of that.
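
Nobody in this thread says what the Project actually does about the "touch it twice" check, so the following is only one possible approach, sketched under stated assumptions. If the address is derived from an HMAC of the requester's IP and the current UTC day, a harvester that fetches the page twice from the same IP on the same day sees the same address, while a different IP or a different day still gets a different one. `SECRET`, `trap.example` and `stable_trap_address` are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timezone

SECRET = b"hypothetical-server-secret"
TRAP_DOMAIN = "trap.example"  # hypothetical trap domain


def stable_trap_address(requester_ip: str, when: datetime) -> str:
    """Same IP on the same UTC day -> same address; another IP or day -> another address."""
    day = when.astimezone(timezone.utc).strftime("%Y-%m-%d")
    message = f"{requester_ip}|{day}".encode("utf-8")
    digest = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"{digest[:12]}@{TRAP_DOMAIN}"
```

The trade-off is precision: attribution narrows to an IP and a day rather than a single request, and a harvester comparing fetches from two different IPs would still see two different addresses.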

Nothing good actually happens until Sally is sued.  Then you have to explain it to a judge, explain why it's not entrapment, and convince him/her that Sally agreed to the contract to pay $500 per address, when in fact she never even saw the thing.

Again, you could say this to some degree about any spam abatement measure -- blacklists, poisoning, spamtraps, tarpits, address obfuscation.... How many people has SpamCop sued? How many people do spam victims normally sue? Nothing good happens? NOTHING?

As I said in the little example a few posts above, the Project is not showing all their cards. Perhaps this is just an academic research project. Perhaps they are going to do nothing more than post the statistics and let the observer draw their own conclusions. Perhaps they expect an earth-shattering revelation about how spammers operate. Perhaps they are going to start a dynamic blacklist for websites. We'll just have to wait and see.

I think we all agree that it can't hurt and it offers a unique view that few others have systematically explored before.


That's about what I thought.  I see quite a few ways a spammer could defeat this.


I agree. But I have one of their honeypots running anyway. At the end of the day, the New York Spammer sent spam to an address that was demonstrably harvested off a website, after the CAN-SPAM Act came into force, and that never "opted in" to anything. ISTM there's got to be a way to make that hurt...

(When I was installing the honeypot last month, I got polite, proactive, human email from one of the guys at the Project offering to help with a configuration problem they identified at their end. I'm very impressed!)

Cheers, Nick


  • 5 years later...
Interesting, 98, but after plowing through most of your reference I don't see too much relevance to my understanding of Project Honey Pot.

There's no info anywhere as to why it's down.

Perhaps this would help:

1. Project Honey Pot goes along swimmingly for six years

2. Spammer group exploits public details of PHP (Project Honey Pot) so spammers can avoid honey pots

3. PHP goes into deep re-tooling, as the author/hacker at the posted link said they probably would do

HTP, HAND


Saw this on Honey Pot's maintenance page today.

On April 23, 2010 Project Honey Pot encountered a major hardware failure on the primary database server. A solid state storage device containing all of the database indexes and transaction logs failed. We have been working with the vendor and expect to have the replacement hardware on site today. After the replacement hardware is installed the database will need to be rebuilt and reindexed. The rebuild process will take a couple of days to complete. We currently expect to have the site back online by May 5, 2010.

Archived

This topic is now archived and is closed to further replies.
