
[Resolved] Google site search results


Wazoo


Posted

Brought here from a PM;

Wazoo --

I wanted to bring to your attention a possible issue w/ the Google site search function for the main SpamCop site. I don't know if this would be something you would look into personally, but I know you maintain the forum, so I thought you might be the proper person to contact - and if not, could pass along the message to the proper person(s).

The other day I was searching for a page within the main SpamCop FAQ, looking for a link I remember suggesting and which was subsequently added to the original FAQ. I was using the Google search function to look up the term "formmail" (w/o quotes). I knew that the term existed on a page in the SpamCop FAQ, but I just didn't remember where. However, it appears that, for some reason, the site search didn't turn up the result. I tried this with several other words that I knew should come back with a result, but they didn't either. I just tried it again, as I was reading your reply to my post regarding the PGP question, and I noticed it still wasn't functioning properly.

Here is what is happening:

On the SpamCop.net Help page I entered the term formmail into the search field and used the default search entry for SpamCop & FAQ. The result (which sent me here) reported that my query did not return any matches. Yet, I know the word exists in the FAQ on the page SpamCop FAQ : Help for abuse-desks and administrators : Formmail. Just to verify the problem, I searched for the simple term abuse and received only a single result, which is a link to a reported email. I can also search Google directly for the term "spamcop formmail" (w/o quotes) and it comes back with the above page as the first result.

Out of curiosity, I checked the META tags and robots.txt for spamcop.net and didn't see any problem there that might cause it. The robots.txt even has an entry to disallow the lookup of SC reports (although it appears that line's syntax might be incorrect). I tried the same search for the word abuse on the search box at the top of the forum under SpamCop.net & Original FAQ and got the same results. Strangely though, searching in the forum and the newsgroup both return 10+ pages of results for each.

Anyway, I just wanted to mention this to you, since I would presume it might be causing some users to not find the answers they're looking for on the site and end up posting them in the forum, which I know causes aggravation on your end. So, I hope that might help out. I apologize if this has been brought up previously.

Jon
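For anyone wanting to repeat the robots.txt check Jon describes, Python's standard `urllib.robotparser` can answer "may this crawler fetch this URL?" offline. A minimal sketch follows; the rules in it are hypothetical, since neither the real spamcop.net robots.txt nor the exact path of the SC report pages is quoted in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: the real spamcop.net file and the exact
# path of the SC report pages are NOT quoted in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sc
"""

def allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the (hypothetical) rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for url in ("http://www.spamcop.net/fom-serve/cache/270.html",
                "http://www.spamcop.net/sc?id=12345"):
        print(url, "->", allowed(url))
```

With these made-up rules the FAQ page comes back allowed and the report-style URL comes back disallowed. Note that `Disallow` matches by path prefix, so a rule like `/sc` would also cover any other path beginning with those characters.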

Posted
I don't know if this would be something you would look into personally, but I know you maintain the forum, so I thought you might be the proper person to contact

Way back when IronPort's involvement became public, IronPort assigned an additional task to a girl to go through the (official) FAQs. DavidT and I (perhaps others??) were beating on her InBox quite a bit. It was during that timeframe that the search tool/function on the www.spamcop.net page was updated, basically expanded to match what I had put into place 'here' ....

... search function to lookup the term "formmail" (w/o quotes). .... FAQ: FormMail .... to verify the problem, I searched for the simple term abuse ..

Like you, I tossed in several other words found on that FAQ page, searching the www.spamcop.net site, and like you ... zero results. Toss those queries against forum.spamcop.net and one gets results from posts, discussions, Wiki entries, on and on. What this means is that the original/official FAQ page has for some reason not been indexed by Google's search engines (or dropped from the index pool for some reason??)

MSN search using "formmail site:www.spamcop.net" returns three pages, to include your referenced FAQ.

Yahoo search using the same string returns three hits, including the referenced FAQ.

Anyway, I just wanted to mention this to you, since I would presume it might be causing some users to not find the answers they're looking for on the site and end up posting them in the forum, which I know causes aggravation on your end. So, I hope that might help out.

Getting good search engine results is a bit of a black art. This is especially true when dealing with a technical subject/area with non-technical participants involved. I know it's hard for some folks to find things 'here' because they have no idea what words are used to describe something .... so they end up using their own, or using a wrongly defined term that they saw used somewhere else. This in turn means the next person won't find this post, as the next person has yet another way of describing the same issue. Trying to resolve the mystery in words was why I started the Glossary 'here' .. then added in the Dictionary tool .... Dbiel performed the work of transferring all of that to the Wiki .... but as noted, this hasn't seemed to have helped that much in the overall scheme.

What aggravates me the most is that so many folks don't do any research (attempts) at all. For example, it's readily apparent to me that you worked on this for quite a while before composing your question. I have spent over an hour working through various tests, doing some research, re-reading old discussions here, re-reading some ancient e-mails .. and only now getting to the point of posting a response. This is in sharp contrast to those who get excited about something and whose only goal is getting their remarks posted.

According to http://forum.spamcop.net/forums/index.php?...ost&p=60255 I'm supposed to come up with something sarcastic here, but .... can't quite come up with anything for some reason ??

and if not, could pass along the message to the proper person(s)

Search engine results are nebulous at best ... note all the companies (?) and even spammers involved in attempts at getting web-sites bumped up in the search results. That a single page (and that's going to be a hard one to prove also) hasn't been picked up yet is a bit of an oddity. The 'path' to that page seems OK, pages around it seem to be indexed .....

R.W. was the last primary action guy on the official FAQ. His last words (seemingly years ago) were that he would only do simple updates to that tool, as everything now had to go up the corporate chain, including approval by the legal staff, before actually being added. But that a specific page has not been indexed by Google isn't something that he/Deputies could actually do something about, short of contacting Google staff directly and asking for that additional page to be added to the index (which would probably get a well-deserved laugh ???)

The end point for now .... Google does find that term just fine when bounced against the Forum ... finding the various Forum posts, the Forum-single-page-access-expanded-version of that FAQ (which does point back to the original/official FAQ [which we also note sucks when www.spamcop.net is down]), and the Wiki entry .. so the data will be found 'here' .... and recall that almost all of the entries mentioned above were tools and content placed 'here' due to folks not being able to find it in the original/official FAQ in the first place.

I placed your PM here so it could be found by those that might be involved, and to offer some sort of explanation to others. Search engines are great, but ..... too much faith in any one tool can lead to some strange results or, as noted here, results not found.

Posted

Simple answer: someone from IronPort has most likely gone into the options Google provides to webmasters to deny indexing of their site...or to remove content that's already been indexed. My Google site searches of www.spamcop.net were similarly empty, and you're correct that their robots.txt isn't the cause.

However, if you go to Yahoo and do a search on:

spamcop formmail

you'll wind up with hits to the site, as you also will with other searches, so this supports my contention that someone has gotten Google to de-index the site. Very strange, unless they want all traffic to come here? I suppose someone who cares should try contacting IronPort. :-)

DT

Posted
Simple answer: someone from IronPort has most likely gone into the options Google provides to webmasters to deny indexing of their site...or to remove content that's already been indexed. My Google site searches of www.spamcop.net were similarly empty, and you're correct that their robots.txt isn't the cause.

Wow! Just spent way too much time trying to bring up any of the FAQ pages there (via Google) .... all kinds of 'result' type listings, but nothing actually looked for gets returned.

Looking at the 'cached' pages, 23 July 2007 seems to be an interesting date .....

However, if you go to Yahoo and do a search .... you'll wind up with hits

as I said above, also with MSN ....

so this supports my contention that someone has gotten Google to de-index the site. Very strange, unless they want all traffic to come here?

Possibly something to do with duplicate content .. the original/official FAQ being stagnant for so long, Forum and Wiki content reproducing a lot of that stuff ... but not sure that would have resulted in 'all' www.spamcop.net FAQ pages being removed .... agreed, very odd .....

I suppose someone who cares should try contacting IronPort. :-)

Actually, one would probably now have to ask Cisco. (announced acquiring IronPort on 25 June 2007)

I'll try another one;

From: "Wazoo"

To: "SpamCop Deputies"

Subject: Google indexing

Date: Sun, 14 Oct 2007 17:24:00 -0500

Who might be responsible for the apparent request for Google to vaporize the www.spamcop.net FAQ page content?

Posted

Turns out that the scripting in both places has been broken for years now. Forum is fixed. I have passed on data to Ellen/Deputies to fix the www.spamcop.net page (for both the 'spamcop' pages and the newsgroup archives .. forum search always worked)

My Google site searches of www.spamcop.net were similarly empty

Therein lies the problem ..... Google was indexing this stuff way back when ..... therefore these old pages are actually found under 'spamcop.net' ... recall that in the beginning, Julian felt that the "www." crap was for kids <g> ..... IronPort basically 'forced' the change to have everything under www.spamcop.net .....

Posted
these old pages are actually found under 'spamcop.net' ... recall that in the beginning, Julian felt that the "www." crap was for kids

Of course the "www" in URLs is for morons....who the heck doesn't know that we're on the "www"? I never enter www when manually typing a URL, and when I find a site that won't appear because of that, I generally notify the webmaster of their stupidity. Most of the commercials on radio and TV have finally dropped the "www" and voiceover artists are rejoicing, because it's a bit of a tongue-twister. The "www" host name arose mostly when the Internet was dominated by "gopher" and FTP sites, so "www" was initially used to differentiate, but that was in the dark ages.

As for searching the SpamCop.net site, you can construct Google searches like this:

site:spamcop.net -site:forum.spamcop.net -site:news.spamcop.net searchterm

but Google "site" searches with the "www." in front of "spamcop.net" produce very few hits...and that's ridiculous.

DT
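The query DT gives above can also be turned into a link or bookmark. A small sketch in Python, assuming only Google's standard `q` parameter on `https://www.google.com/search`; the function name and defaults are illustrative, not anything used by the SpamCop pages:

```python
from urllib.parse import urlencode

def site_search_url(term: str,
                    include: str = "spamcop.net",
                    exclude: tuple = ("forum.spamcop.net", "news.spamcop.net")) -> str:
    """Build a Google search URL with all site: operators inside the q parameter."""
    # e.g. "site:spamcop.net -site:forum.spamcop.net -site:news.spamcop.net formmail"
    parts = [f"site:{include}"] + [f"-site:{host}" for host in exclude] + [term]
    # urlencode() percent-encodes ':' and turns spaces into '+'
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})

print(site_search_url("formmail"))
```

Because the exclusions live in `q` itself, they reach Google exactly as if they had been typed into Google's own search box.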

Posted

While working on the code, came across some other things (also fixed/added 'here')

When did google.beta.groups.com disappear? No wonder I never got returns from that search option .....

Radio buttons returned on the Google result page included news.spamcop.net .. but, of course, when 'corrected' .. all the results come from zeta.cesmail.net .... I've changed that here, but not sure what's going to happen on the 'official' page.

Added the 'last 90 days' option that points to the actual newsgroup server .... not sure what words to use to define the 'archive' link as containing 'all' the existing newsgroup traffic .... can't really use "complete" due to the July-December 2006 dropout .... and that the HTML for the 'old' archives needs to be re-written so as to match reality (missing files, URLs, etc.)

Posted
Of course the "www" in URLs is for morons....who the heck doesn't know that we're on the "www"? I never enter www when manually typing a URL, .....
Just curious, how do you get around spamcop.net which uses www. forum. webmail. mailsc. members. just to name a few.

There are many reasons not to auto resolve to www; but for those sites that are looking for the general public to find them, to auto resolve to www should be a general practice.

Posted

I have been advised that I have confused people. Here's my last attempt .. someone else want to take a crack at explaining it better?

From: "Wazoo"

To: "SpamCop/Ellen"

References: <027601c80eb0$ec6823e0$6701a8c0[at]HPorGateway> <4712B486.6060901[at]admin.spamcop.net> <02d001c80ec6$50f758c0$6701a8c0[at]HPorGateway> <47136EE3.8050306[at]admin.spamcop.net> <053a01c80f5d$1b7692f0$6401a8c0[at]HPorGateway> <4713BF75.8070006[at]admin.spamcop.net> <055601c80f69$8295b090$6401a8c0[at]HPorGateway> <4713FF83.5010703[at]admin.spamcop.net> <060501c80f93$2a685970$6401a8c0[at]HPorGateway> <47148FF4.3080608[at]admin.spamcop.net>

Subject: Re: Google indexing

Date: Tue, 16 Oct 2007 08:24:23 -0500

It boils down to the "www." .... to clear your confusion ....

go to http://www.spamcop.net/help.shtml

type in > formmail < into the search box, leave "SpamCop and FAQ" selected

click the Search button

see the zero results

go to http://forum.spamcop.net

type in > formmail < into the search box, select "SpamCop.net & Original FAQ" from the dropdown menu

click the "Go" button

Note that http://spamcop.net/fom-serve/cache/270.html is the first return of many ... pay extra note to the lack of the leading "www." in the URL ....

The difference: the Forum code now searches for site content on "spamcop.net"

The SpamCop.net page code still looks for site content on "www.spamcop.net"

Google's own 'internal' code seems to 'inject' the www. also

The issue is that all these old pages are indexed under 'spamcop.net' ... not 'www.spamcop.net'

This and more is addressed by folks other than myself in the Forum Discussion at http://forum.spamcop.net/forums/index.php?showtopic=8839 ..... I continued to carry on with myself in the Topic at <elided> .... which led to some more changes on the Forum code

> sigh -- now you have me really confused. When I asked you if the
> problem was google search on the website you said no, you are were
> asking about a regular google search. And yet it seems that the
> regular google search does work ok. So we are done with that
> problem?
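The "www." problem described in that e-mail boils down to a host comparison. The sketch below models a `site:` restriction as "same host, or any sub-domain of it"; that is an assumption consistent with what this thread observed, not a documented description of Google's matching.

```python
from urllib.parse import urlsplit

def matches_site(url: str, site: str) -> bool:
    """Approximate site: matching: exact host or any sub-domain of it."""
    host = (urlsplit(url).hostname or "").lower()
    site = site.lower()
    return host == site or host.endswith("." + site)

old_faq = "http://spamcop.net/fom-serve/cache/270.html"
forum_topic = "http://forum.spamcop.net/forums/index.php?showtopic=8839"

print(matches_site(old_faq, "www.spamcop.net"))  # False: what the Help page searched
print(matches_site(old_faq, "spamcop.net"))      # True: what the Forum code now searches
print(matches_site(forum_topic, "spamcop.net"))  # True: forum. is a sub-domain too
```

The third case is the catch that turns up later in this Topic: restricting to the bare domain also pulls in forum., news., and every other sub-domain.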

Posted
Just curious, how do you get around spamcop.net which uses www. forum. webmail. mailsc. members. just to name a few.

There's nothing inherently wrong with function-related URLs. My point is that websites, by default, should *not* need "www" in front to function. However, websites should *also* work with the "www." in front, as a convenience to those who insist on typing those superfluous characters. Now, this Forums sub-site *only* works with "forum." and *not* with "www.forum." which is fine, because people can differentiate whether they want to go to the forums or not by either typing "forum" or not, although some "www-brainwashed" people might wind up typing "www.forum.spamcop.net."

For more on the extermination of the use of "www," here are some links:

http://no-www.org/

http://dmiessler.com/writing/its_time_to_drop_the_www/

http://computerworld.com/blogs/node/5895

http://unitstep.net/blog/2006/12/27/taking...out-of-the-web/

http://blogs.guardian.co.uk/askjack/2006/0...to_trebley.html

http://wiki.dreamhost.com/index.php/Removi...rom_your_domain

The primary "yes www" site:

http://www.yes-www.org/

is offline/down at the moment...it no longer appears to be hosted anywhere (perhaps the owner got tired of all the comments that disagreed with his premise that were on his site when it was last online).

DT
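For what it's worth, the usual way to have a site answer on both names while giving search engines a single one to index (the bare-vs-www split is what tripped up the search box in this Topic) is a permanent (301) redirect from the alias to a canonical host. A minimal sketch of just the decision logic; the alias set and the choice of canonical host are illustrative assumptions, not a description of how spamcop.net is actually configured:

```python
from typing import Optional

# Hostnames treated as the same site. Which one is canonical is a policy
# choice: DT argues for the bare name, dbiel for www.
ALIASES = {"spamcop.net", "www.spamcop.net"}

def redirect_target(host: str, path: str,
                    canonical: str = "www.spamcop.net") -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    host = host.split(":")[0].lower()  # drop any :port and normalise case
    if host in ALIASES and host != canonical:
        return f"http://{canonical}{path}"
    return None  # already canonical, or a separate sub-site such as forum.

print(redirect_target("spamcop.net", "/help.shtml"))
print(redirect_target("forum.spamcop.net", "/"))
```

Sub-sites such as forum. are deliberately left alone, in line with DT's point that function-specific host names are fine.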

Posted
... The primary "yes www" site:

http://www.yes-www.org/

is offline/down at the moment...it no longer appears to be hosted anywhere (perhaps the owner got tired of all the comments that disagreed with his premise that were on his site when it was last online).

Interesting, though it's O/T to take it further here. I will indulge myself to the extent of noting that http://web.archive.org/web/*/http://www.yes-www.org/ indicates it ceased being maintained on May 16, 2007; however, clues from the archived pages lead to http://www.hm2k.com/articles/yes-www
Posted
clues from the archived pages lead to http://www.hm2k.com/articles/yes-www

I don't think that article is by the same author as the owner of the "yes-www.org" website. The owner of "hm2k.com" is apparently in Staffordshire, England, while the other (Michael Hampton) is in Iowa. Furthermore, the article you cited refers to the original "yes-www.org" site as "An odd website for the www prefix."

DT

  • 3 weeks later...
Posted

As noted in yet another discussion elsewhere, my alleged 'fix' doesn't actually work as intended. Yes, it does now return hits from the "Spamcop.net & FAQ" web-pages .... but, of course, the construct of "site:spamcop.net" will bring back results from "forum.spamcop.net" also .. in addition to news.spamcop.net ...

Tried the quick and dirty of adding -forum.spamcop.net to the string .. odd, resulting parse sent out then includes "sitesearch=spamcop.net+-forum.spamcop.net+-news.spamcop.net" but the results are from all over ..... constructed a search query directly into the Google input box .... appears that the "site:" construct doesn't work the same as the encoded "sitesearch=" string .... hmm, off to do some research on this small issue ...
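The encoded string quoted above is just what a browser form submission produces: spaces become `+` and the hyphens pass through untouched, so Google receives the whole thing as one `sitesearch` value. The sketch reproduces that, then shows one alternative that keeps `sitesearch` to a single domain and moves the exclusions into `q`. Whether Google honours `-site:` terms combined with `sitesearch` that way is an untested assumption; the function names are hypothetical.

```python
from urllib.parse import urlencode

def as_sent(term: str, sitesearch: str) -> str:
    """Encode the form fields the way a GET form submission would."""
    return urlencode({"q": term, "sitesearch": sitesearch})

def exclusions_in_query(term: str, site: str, exclude: list) -> str:
    """Hypothetical alternative: one domain in sitesearch, -site: terms in q."""
    q = " ".join([term] + [f"-site:{host}" for host in exclude])
    return urlencode({"q": q, "sitesearch": site})

# Reproduces the parse seen in the post: "...sitesearch=spamcop.net+-forum.spamcop.net+-news.spamcop.net"
print(as_sent("formmail", "spamcop.net -forum.spamcop.net -news.spamcop.net"))
print(exclusions_in_query("formmail", "spamcop.net",
                          ["forum.spamcop.net", "news.spamcop.net"]))
```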

Posted
You may want to try the Google Webmaster Center and tools section, if you haven't already.

Target for that page is different, though noting once again that I still need to clean up some ancient posts to 'fix' some referenced links there that were changed during one of the Forum app updates ...

Spent a bit of time looking at the Custom Search Engine .. but ... not sure about spending the time on developing one of those just now ....

Posted

Looks like (if I was going to do it) I'd have to go with a completely customized "Linked Custom Search Engine" and build all the XML scripts myself. Until a year or so ago, I'd have jumped on the challenge without a stumble. Current status and conditions don't have me in quite that mood at present (though the 'challenge' does eat at me <g>). And even then, the primary issue is that I could only fix the Forum's search code .. the "Official" page is simply a whole 'nother issue .. and I'm more than a bit tired of that exercise.

  • 3 weeks later...
Posted

but Google "site" searches with the "www." in front of "spamcop.net" produce very few hits...and that's ridiculous.

Thank you DavidT ... that post was informative!

  • 1 month later...
Posted

Newsgroup post caused me to do a search 'here' .. still unhappy with the results, but, as I was looking for something in the Official FAQ (one of those pages that had content that folks above didn't want to share, so none of that made it to the embedded FAQ I had once, it is linked to/from the single-page-access-monster FAQ 'here', but I digress) ... I decided to take a look at the Help page on www.spamcop.net ...

OK, as usual, no one bothered to mention anything about work being finished, but ... code there has been modified. However, as stated above, the fix there ends up with the same issue as here, making part of the dropdown-menu selection (and radio-buttons on the following screen) a bit non-functional. Per the Topic-starting post query, a search for 'formmail' with SpamCop & FAQ selected returns not only the official/original FAQ entry, but also links to Forum entries, newsgroup archives, on and on ... the issue being that www, forum, news, etc. are all sub-domains of spamcop.net, so they all match that part of the search query.

I'm going to tag this Topic as Resolved, as the primary problem is in fact now fixed. That I'm not happy with the code at either site is my problem, I guess <g>
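Since the remaining complaint is that every sub-domain matches the "spamcop.net" part of the query, one workaround would be to post-filter the result links by host. A sketch, assuming the result URLs are already in hand (fetching them from Google is not shown) and using an allowlist of the two hosts the main site appears under in this thread:

```python
from urllib.parse import urlsplit

# Hosts treated as "SpamCop & FAQ": the bare name the old pages were indexed
# under, plus www. Everything else (forum., news., webmail., ...) is dropped.
KEEP_HOSTS = {"spamcop.net", "www.spamcop.net"}

def official_only(urls: list) -> list:
    """Return only the URLs whose host is in KEEP_HOSTS, preserving order."""
    return [u for u in urls if (urlsplit(u).hostname or "").lower() in KEEP_HOSTS]

results = [
    "http://spamcop.net/fom-serve/cache/270.html",
    "http://forum.spamcop.net/forums/index.php?showtopic=8839",
    "http://news.spamcop.net/",
    "http://www.spamcop.net/help.shtml",
]
print(official_only(results))
```

An allowlist is used rather than a list of sub-domains to exclude, so hosts such as webmail. or members. (mentioned earlier in this Topic) are dropped without having to be named.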

Archived

This topic is now archived and is closed to further replies.
