Spiders are allowed to harvest here!


DavidT

I tried searching Google for some random text strings (as quoted phrase searches) found in these forum messages and am alarmed that I'm getting hits. The reason this is alarming is that people often post real email addresses in these forums, and if Google is allowed to harvest them, perhaps all spiders/robots are being allowed to do so, including those belonging to spammers.

I would strongly suggest that a universal "robots.txt" exclusion be installed on this server if it isn't already. Also, the admins should be able to remove existing Google database hits to these forums by going to:

http://www.google.com/remove.html

scrolling to the bottom and clicking on the Automatic URL removal system at:

http://services.google.com/urlconsole/controller

Once there, you create an account (which will be verified) and then you can deal with Google to remove URLs from the Google index. I can't remember if wildcards are allowed, which would be the only practical solution in this case.

Haven't we all been harvested enough elsewhere?

DT

I tried searching Google for some random text strings (as quoted phrase searches) found in these forum messages and am alarmed that I'm getting hits. The reason this is alarming is that people often post real email addresses in these forums, and if Google is allowed to harvest them, perhaps all spiders/robots are being allowed to do so, including those belonging to spammers.

I would strongly suggest that a universal "robots.txt" exclusion be installed on this server if it isn't already.<snip>

...But SpamCop search is powered by Google, so I'm not sure if what you suggest is possible. I think the real answer is for those of us posting to these fora to be aware that they are publicly available and, as such, anyone can see anything posted here. In fact, many participants in these fora have taken it upon themselves (thank you, those of you who do this!) to warn people who make the mistake of including information that can identify them, such as unmunged e-mail addresses.
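For anyone unsure what "munging" means in practice, here is a minimal Python sketch that replaces anything that looks like an e-mail address before the text is pasted into a post. The function name `munge_addresses`, the placeholder address, and the regular expression are all assumptions made for illustration; the pattern is deliberately simple and is not a full address parser, so check the result by eye before posting.

```python
import re

# Simple pattern for e-mail-like strings: local part, "@", dotted domain.
# This is an illustrative assumption, not a complete RFC 5322 parser.
EMAIL_RE = re.compile(
    r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+\b"
)


def munge_addresses(text: str, placeholder: str = "user@example.invalid") -> str:
    """Return text with every e-mail-like string replaced by placeholder."""
    return EMAIL_RE.sub(placeholder, text)


if __name__ == "__main__":
    header = "Received: from mail.example.net for <jane.doe@example.com>; Tue"
    print(munge_addresses(header))
```

Only the address itself is replaced; the rest of the header line is left alone so that it stays useful to whoever reads the post.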

...But SpamCop search is powered by Google so I'm not sure if what you suggest is possible.

I noticed that there's a customized Google site search in the Help area. Initially, if you enter something in the box and use either of the dropdown options ("SpamCop and FAQ" and "Old discussion archives"), you will not get hits from these Forums. That's what I based my "exclude all robots" suggestion on. However, once you run a search in one of those areas and are viewing the results on the Google site, there's a third option to "Search forum.spamcop.net" which, interestingly enough (and probably yet another example of needed updates, bug fixes, etc.), is not offered by the search function on the primary Help page.

Given that, a decision would have to be made as to whether to rely on the Search function of Invision Powerboard to search these forums and let Google index and search everything else. That would offer the users of this forum the best protection against harvesting, but other options are also possible. Leaving the users unprotected shouldn't be one of the options.
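To make the "exclude all robots from the forum host" option concrete, here is a rough Python sketch that checks what a blanket exclusion file would tell a well-behaved crawler. The file contents in `FORUM_ROBOTS_TXT` are a hypothetical example, not anything currently served by forum.spamcop.net, and `is_allowed` is a helper name invented for this illustration. The parsing is done by the standard library's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the forum host only (an assumption for
# illustration): every user agent is asked to stay out of every path.
FORUM_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://forum.spamcop.net/forums/index.php"
    print(is_allowed(FORUM_ROBOTS_TXT, "Googlebot", url))
```

Keep in mind that robots.txt is a voluntary convention. Mainstream search engines honor it, but a harvester written by a spammer is free to ignore it, so this would limit search-engine indexing of the forums without guaranteeing protection against every robot.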

DT

Why does it matter if they are harvested?

No one here has anything to hide..............

Most people don't....except for the users who post their actual email addresses in spam headers, etc.

I just checked, and although these URLs produce exclusion lists:

http://mailsc.spamcop.net/robots.txt

http://www.spamcop.net/robots.txt

this one doesn't exist:

http://forum.spamcop.net/robots.txt

and so it seems that the forums aren't protected from harvesting, which is my point.
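The same check can be repeated with a short script instead of a browser. The following is a minimal sketch, assuming plain HTTP and treating a 200 response as "a robots.txt exists"; the helper names `http_status` and `robots_report` are made up for this example, and the result will reflect whatever the servers return when it is run, which may differ from what was seen when this was posted.

```python
from typing import Callable, Dict, Iterable, Optional
from urllib.error import HTTPError
from urllib.request import urlopen

# The three hosts mentioned in the post above.
HOSTS = ["mailsc.spamcop.net", "www.spamcop.net", "forum.spamcop.net"]


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for url, or None if the host is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar problems.
        return None


def robots_report(
    hosts: Iterable[str],
    fetch_status: Callable[[str], Optional[int]] = http_status,
) -> Dict[str, bool]:
    """Map each host to True when its /robots.txt answers with status 200."""
    return {
        host: fetch_status(f"http://{host}/robots.txt") == 200
        for host in hosts
    }


if __name__ == "__main__":
    for host, present in robots_report(HOSTS).items():
        print(host, "has robots.txt" if present else "no robots.txt found")
```

The `fetch_status` parameter exists only so the report logic can be tried with a stand-in function and no network access.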

David T.

I have no idea what David is talking about, but it sounds as if there is a way for forums such as this to prevent spiders from gathering addresses. However, it would prevent one from searching the forums for answers by using Google. I can't really offer an opinion on the technical aspects.

However, since the help forum was established for the newbie, who doesn't know that publishing one's address in a forum such as this can result in a spammer harvesting the address, possibly he has a point.

Or possibly, it should be part of the instructions on how to post. (Good luck on that!)

And that's probably the best answer to suggestions for improvements - especially those aimed at the technically non-fluent. Whatever the merit of the idea is, there will probably be no change.

Miss Betsy

Like I said, who cares?

Did you post your address?

No, I didn't, but I care, and the people who run this system should care because IMO, it's not necessary to allow all the robots to harvest everything posted in these forums.

Miss Betsy "gets it" also and seems to care....quoth she:

...the help forum was established for the newbie, who doesn't know that publishing one's address in a forum such as this can result in a spammer harvesting the address...

Exactly, and SilverFalcon also gets it...here's what he had to say just now:

http://forum.spamcop.net/forums/index.php?...indpost&p=14116

(oh goodie....this post just helped me become an "Advanced Member" -- all you have to do is make 100 posts..)

DT

The only thing we don't know is how the "Powered by Google" thing works. That may be the reason no limit is placed on the robots, but I would think that it is more of an oversight.

Have you sent an email to JT on this subject? That is the only way anything can be done about it.

Some further speculation on the role of spiders in these forums and the lack of a "robots.txt" file on this server....

I searched the Invision Powerboard site and came up with this:

http://www.invisionpower.com/documentation...hp?page=31&p=19

It's possible that when the spiders/bots are harvesting messages from these forums, they're showing up as "anonymous members."

In any case, users should be strongly warned NOT to post anything sensitive here, or anything they don't want archived in the search engines forever, such as email addresses. There should be a primary pinned item (maybe "Read this before using this forum") at the very top of each forum, IMO.

DT

There should be a primary pinned item (maybe "Read this before using this forum") at the very top of each forum, IMO.

This would be a good idea, but people don't read the pinned items we already have.

There probably would need to be a "read this and accept it" type of pinned post before people are even allowed to reach the forums.

Archived

This topic is now archived and is closed to further replies.