
Spider, Bots, HTML - oh my!


Wazoo


Does anyone know if spiders read the underlying code or the displayed copy?

HTML code is used to write the commands and insert the content that your web browser then uses to paint the pretty pictures and text on your screen. If you want to scare yourself silly, right-click on this page and select "View Source" (Windows, IE ... other software may require you to select Edit, then View Source) ... all that wildly wicked stuff is the code your browser asked for (and received) when you clicked on the Topic Title to come here and see what was typed in. Somewhere in all that garbage, if you look hard and long enough, you'll eventually see these words mixed in there <g>

So, to answer the question asked: a spider looking for certain data, say e-mail addresses, has not a care in the world what the web page actually looks like when displayed on a screen ... it's simply going to look for certain text in this data stream. In the example of an e-mail address, it'll be looking for the @ symbol, then grab the data surrounding it, up to the space characters before and after that string of text. Recent virus / trojan tools will simply sit off to one side, monitoring all this code as it streams into an infected computer, and pluck that same data out of the stream for later use, be it for the To:, Reply-To:, and/or From: address in the next batch of spew that gets sent out from that computer.
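To make the point concrete, here is a minimal sketch, in Python, of the kind of naive harvester described above. It is an assumption for illustration, not any real spider's code: the pattern grabs the non-space text on either side of an "@", treats quotes and angle brackets as extra boundaries so markup is not swept up, and drops a leading mailto: and trailing punctuation. The sample page and its addresses are made up.

```python
import re

# Hypothetical harvester pattern: non-space text, an "@", more non-space text.
# Quotes and angle brackets also end a match so HTML markup is not swept up.
ADDRESS_RE = re.compile(r"""[^\s"'<>@]+@[^\s"'<>@]+""")


def harvest(raw_page: str) -> list[str]:
    """Return every @-containing token found in the raw page source.

    The page is never rendered: the scan runs over the same text a browser
    receives, markup and all, so how the page looks is irrelevant.
    """
    found = []
    for match in ADDRESS_RE.finditer(raw_page):
        token = match.group(0)
        if token.startswith("mailto:"):
            token = token[len("mailto:"):]
        found.append(token.rstrip(".,;:"))
    return found


# Made-up page: one address hidden behind a link label, one in plain text.
page = (
    '<p>Write to <a href="mailto:bob@example.com">Bob</a> '
    "or alice@example.org.</p>"
)
print(harvest(page))  # ['bob@example.com', 'alice@example.org']
```

Note that the first address is never displayed on screen (the reader only sees "Bob"), yet it is harvested all the same.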

Most of the suggested methods for "hiding" an e-mail address on a web site are based on keeping the @ sign out of that data stream, be it by hiding the address in a graphic, using a bit of JavaScript to 'build' something that 'looks and acts' like a displayed e-mail address, or using a routine / form that handles user input and converts it to an e-mail in the background ....
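The common thread in those hiding methods can be shown with a small sketch: if the raw source never contains an "@", a scan for that symbol has nothing to latch onto. The script fragment, the contact.png image name and the helper function are all hypothetical. String.fromCharCode(64) is the standard JavaScript way to produce "@" (character code 64) at run time, which the last lines imitate in Python with chr(64).

```python
# Hypothetical page fragments, as a spider would receive them.
plain_page = "<p>Mail bob@example.com</p>"

# The browser assembles the address only when this script runs:
# String.fromCharCode(64) evaluates to "@" inside the browser.
script_page = (
    "<script>"
    'var user = "bob"; var host = "example.com";'
    "document.write(user + String.fromCharCode(64) + host);"
    "</script>"
)

# The address exists only as pixels inside an (imaginary) image file.
image_page = '<img src="contact.png" alt="our e-mail address, as a picture">'


def source_exposes_at_sign(raw_page: str) -> bool:
    """True if a naive scan of the raw source would find an "@" at all."""
    return "@" in raw_page


for name, raw in [("plain", plain_page), ("script", script_page), ("image", image_page)]:
    print(name, source_exposes_at_sign(raw))
# plain True / script False / image False

# What the script produces once a browser (or a rendering spider) runs it:
rendered = "bob" + chr(64) + "example.com"
print(rendered)  # bob@example.com
```

The last two lines are the catch: anything that actually runs the script, whether a browser or a spider built on a browser engine, ends up with the address anyway.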


Which is not to say a spider might not also render what a browser sees, regardless of the code tricks in the page code. Last time I looked, off-page processing was the recommended method of securing an address (or else obvious-to-human bits in a posted address, to be manually removed or altered, remembering the munging advice so as not to inadvertently create someone else's actual address). Worth bearing in mind that "survival of the fittest" applies to spiders too, and their "generations" can be very short; odds are "they" are ahead of "us".
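On the munging point, one way to add an obvious-to-human bit without risking someone else's real address is to append a reserved top-level domain. The sketch below assumes the ".invalid" suffix, which RFC 2606 reserves so that it can never be a real domain; the function names are hypothetical. A harvester may still collect the munged form, but mail sent to it cannot reach anyone, and a human reader simply deletes the suffix.

```python
# ".invalid" is reserved (RFC 2606), so a munged address can never belong to
# a real mailbox. Ad hoc edits, such as dropping a letter from the domain,
# may instead produce a domain that someone actually owns.
MUNGE_SUFFIX = ".invalid"


def munge(address: str) -> str:
    """Append the reserved suffix to the domain part of an address."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an e-mail address: {address!r}")
    return f"{local}@{domain}{MUNGE_SUFFIX}"


def unmunge(munged: str) -> str:
    """The step a human reader does by hand: drop the obvious extra bit."""
    if not munged.endswith(MUNGE_SUFFIX):
        raise ValueError(f"no munge suffix in: {munged!r}")
    return munged[: -len(MUNGE_SUFFIX)]


posted = munge("bob@example.com")
print(posted)           # bob@example.com.invalid
print(unmunge(posted))  # bob@example.com
```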

Loads of programming to render a page the way people see it? Not so; consider a programmer's opinion (not an Internet specialist), one of my sons, actually:

... I know for sure I can write a little app that uses the Internet Explorer OCX control to render a web page for me (just set the Navigation URL property, and it'll automagically parse the HTML / script and render the page).

From there it's just another step to harvest the plain text already rendered by the browser control. I can't recall exactly whether there is a way to neatly expose the plain readable rendered page programmatically, but even if there isn't (though I'd bet there is), you can always talk to the IE control as though it were a window of text at the OS API level and get it that way (think copy and paste, but done programmatically). ...
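The rendered-text idea can be sketched with nothing more than Python's standard html.parser, under one big assumption: it only decodes markup and character references and does not run JavaScript, unlike the IE control described above. Even so, it defeats the trick of writing "@" as the character reference &#64;, which leaves no literal "@" in the source but paints one on screen. The class and function names are made up for illustration.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the text a browser would paint, skipping <script> and <style>.

    convert_charrefs is True by default, so references such as &#64; reach
    handle_data already decoded (here, as "@").
    """

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def rendered_text(raw_page: str) -> str:
    """Return the visible text of a page, with character references decoded."""
    parser = VisibleText()
    parser.feed(raw_page)
    parser.close()
    return "".join(parser.parts)


def harvest_rendered(raw_page: str) -> list[str]:
    """Scan the rendered text, not the source, for @-containing tokens."""
    return re.findall(r"\S+@\S+", rendered_text(raw_page))


entity_page = "<p>Mail bob&#64;example.com today</p>"
print("@" in entity_page)             # False: nothing for a source scan to find
print(rendered_text(entity_page))     # Mail bob@example.com today
print(harvest_rendered(entity_page))  # ['bob@example.com']
```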

As they say, it's all very well to be paranoid - but are we paranoid enough?

[Added - Gad's Wazoo, some bounder's half-inched your "Admin." shingle at the time of this edit. I trust the replacement to be appropriately imposing!]


Just for the record, the "@" to "[at]" replacement Wazoo finally implemented in our forums is actually done at the code level, so no translation is required when the page is rendered. The code and the display both contain the same "[at]" data.

Note: his first version kept the "@" at the code level but displayed it as "[at]".

So if you want to save keystrokes, it is OK to type e-mail addresses using the "@", as the software will write the code and display it as "[at]"; if you wish to confirm it, just edit your post.

If you preview your post before posting it, the display will read "[at]" but the code will remain "@". The code gets translated only when you actually post (submit) it.
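The difference between the two versions comes down to when the substitution happens. Here is a minimal sketch, assuming the forum simply swaps every "@" for "[at]" in the post text at submit time; the function names are hypothetical and this is not the forum's actual code. The stored post, and therefore the page source generated from it, then holds "[at]", while a preview, which stores nothing, still carries the untranslated text.

```python
def submit_post(text: str) -> str:
    """Current behaviour, as described: translate when the post is submitted.

    The stored text (what you see if you edit the post later) and the page
    source generated from it both contain "[at]", so no "@" is ever served.
    """
    return text.replace("@", "[at]")


def preview_source(text: str) -> str:
    """Preview: nothing has been submitted yet, so the code still holds "@"."""
    return text


draft = "Mail me at bob@example.com"
print(preview_source(draft))  # Mail me at bob@example.com
print(submit_post(draft))     # Mail me at bob[at]example.com
```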


[Added - Gad's Wazoo, some bounder's half-inched your "Admin." shingle at the time of this edit.  I trust the replacement to be appropriately imposing!]


Wazoo's account is having some problems at present, and I have verified that the "bounder" is the right person.

[Added - Gad's Wazoo, some bounder's half-inched your "Admin." shingle at the time of this edit.  I trust the replacement to be appropriately imposing!]


The story is truly heart-rending <g> On the other hand, I beat the thing senseless and walked away grinning .... I'll be honest, I'm still looking for some answers <g> ... http://forums.invisionpower.com/index.php?showtopic=193443

Wazoo's account is having some problems at present, and I have verified that the "bounder" is the right person.


That he did <g> The connection was absolutely horrible at first; I couldn't understand what was being said and didn't recognize the voice, so I started in with the 'hell of a time of night for a wrong number' routine ... Once we got that straightened out (he was concerned about waking me <g>), I pointed out that I was on IPB looking for answers, whipping up on the Forum server. He offered up some suggestions on an easy way around it (admitting that those thoughts / actions had not even crossed my mind). As it turned out, catching a bit of data sliding up the screen revealed something a bit odd, and that odd bit of text plugged right into a fix ... well enough that I could log in again as myself <g>

Thanks for watching out and noticing the strange activity ...


Archived

This topic is now archived and is closed to further replies.
