Topic: google results (Read 6451 times)

kalos

  • Member
  • Joined in 2006
  • Posts: 1,823
google results
« on: November 23, 2009, 10:18 AM »
Hello,

In Google search results, the text preview for some webpages shows more text than you can see if you visit the webpage itself (where viewing the whole text is restricted).

I am not talking about webpages that are archived/cached; at least, the webpages I mean don't have a 'Cached' link when they appear in Google results.

Why does this happen? How can the Google robot retrieve all the text when a regular user can't?
How can one see what the Google robot 'sees'? Maybe by disguising oneself as a Google robot?

thanks
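One way to test the "disguise as a Google robot" idea is to request the same page twice, once with an ordinary browser User-Agent and once with Googlebot's, and compare what comes back. The sketch below does that with Python's standard library only, comparing raw body sizes as a crude signal. The Googlebot string is the one Google has documented for its crawler; the browser string, the function names and the injectable `fetch` parameter are illustrative assumptions. Sites that deliberately serve crawlers more text (for example under Google's former First Click Free program) often verify Googlebot by reverse DNS lookup, so a spoofed User-Agent alone may simply get the normal page.

```python
import urllib.request

# User-Agent string Google documents for its desktop crawler.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Any ordinary browser string will do; this one is only an example.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:1.9.2) Gecko/20100101 Firefox/3.6"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that announces the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def urllib_fetch(url: str, user_agent: str) -> str:
    """Download the page body as text, identifying as `user_agent`."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def compare_views(url: str, fetch=urllib_fetch) -> dict:
    """Fetch the same URL as a browser and as 'Googlebot' and compare body sizes.

    `fetch(url, user_agent) -> str` is injectable (a hypothetical hook) so the
    comparison can be exercised without touching the network.
    """
    as_browser = fetch(url, BROWSER_UA)
    as_bot = fetch(url, GOOGLEBOT_UA)
    return {
        "browser_chars": len(as_browser),
        "bot_chars": len(as_bot),
        "bot_sees_more": len(as_bot) > len(as_browser),
    }
```

Usage: `compare_views("http://example.com/some-article")` returns both sizes and whether the "Googlebot" view was larger.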


4wd

  • Supporting Member
  • Joined in 2006
  • Posts: 5,641
Re: google results
« Reply #1 on: November 23, 2009, 06:05 PM »
It's normally because the website has inserted all sorts of search terms into the page's metadata (or some other non-visible portion of the HTML code). If you view the page source, you'll probably find all the terms you searched for enclosed in meta tags, e.g.:

<meta name="keywords" content="sex rock roll naked wolfman ad-infinitum">

I wish all search engines would only index the visible content of websites.

Some websites also set the text colour to the same as the background colour to hide all the bogus search terms just to get you to their site to view the ads.

I've seen sites where the terms in the meta data filled 5 screens, whereas the entire site fitted into one screen or less.
« Last Edit: November 23, 2009, 06:09 PM by 4wd »
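Checking the page source for planted terms can be automated. The sketch below (Python standard library only) pulls the terms out of a keywords meta tag and reports those that never appear in the text a browser would render. Treating each keyword as a single word and skipping only `<script>`, `<style>` and `<title>` content are simplifying assumptions, and the class and function names are illustrative. It cannot spot text hidden by colouring it like the background, since that is decided by CSS rather than by the HTML text itself.

```python
import re
from html.parser import HTMLParser


class KeywordScan(HTMLParser):
    """Collects meta keywords and the text a browser would actually render."""

    # Element contents that are not shown as page text (a simplification).
    HIDDEN = {"script", "style", "title"}

    def __init__(self) -> None:
        super().__init__()
        self.keywords = []
        self.visible = []
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Accept both name="keywords" and the http-equiv="keywords" variant.
        name = (attrs.get("name") or attrs.get("http-equiv") or "").lower()
        if tag == "meta" and name == "keywords":
            # Treat every word of the keyword list as a separate term.
            self.keywords += re.findall(r"[\w-]+", (attrs.get("content") or "").lower())
        elif tag in self.HIDDEN:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.visible.append(data)


def hidden_keywords(html: str) -> list:
    """Return the meta keywords that never appear in the page's visible text."""
    scan = KeywordScan()
    scan.feed(html)
    scan.close()
    seen = set(re.findall(r"[\w-]+", " ".join(scan.visible).lower()))
    return [k for k in scan.keywords if k not in seen]
```

Feeding it a page whose body only mentions "rock and roll" would flag the remaining keywords from the example tag as invisible.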

Stoic Joker

  • Honorary Member
  • Joined in 2008
  • Posts: 6,646
Re: google results
« Reply #2 on: November 23, 2009, 06:55 PM »
I thought most search engines had started blocking/blacklisting/ranking last sites that did that sort of thing?

4wd

  • Supporting Member
  • Joined in 2006
  • Posts: 5,641
Re: google results
« Reply #3 on: November 24, 2009, 12:19 AM »
Quote from: Stoic Joker
I thought most search engines had started blocking/blacklisting/ranking last sites that did that sort of thing?

I believe you're right, but I still get the occasional site that somehow gets into the results while having nothing to do with what I was looking for (i.e. the search terms aren't anywhere on the visible page, nor is the site related), yet looking in the source will show the terms there.

It's definitely better than it used to be though.

kalos

  • Member
  • Joined in 2006
  • Posts: 1,823
Re: google results
« Reply #4 on: November 24, 2009, 07:21 AM »
Unfortunately, viewing the source does not help.

For example, this webpage does not show the text "indication of early ego development, because instinctual needs are..":

webpage

Any idea how Google can read that text?

4wd

  • Supporting Member
  • Joined in 2006
  • Posts: 5,641
Re: google results
« Reply #5 on: November 24, 2009, 04:10 PM »
Quote from the webpage:
[This is a summary or excerpt from the full text of the book or article. The full text of the document is available to subscribers.]

It could be that Google is displaying an excerpt from the full text, which I cannot see without being a subscriber.

If you are a subscriber, log in to the site and see whether it matches what was found.

If you want to know how Google gets to bypass authentication and index pages you can't access, then I don't know  :-[
« Last Edit: November 24, 2009, 04:21 PM by 4wd »

kalos

  • Member
  • Joined in 2006
  • Posts: 1,823
Re: google results
« Reply #6 on: November 25, 2009, 06:51 AM »
I am not talking about the full text of the article.

I am saying that in Google results the preview text is longer than the preview text offered on the webpage, and I wonder how this can happen.

For example, this webpage does not show the text "indication of early ego development, because instinctual needs are..", which is shown in Google results!

Curt

  • Supporting Member
  • Joined in 2006
  • Posts: 7,566
Re: google results
« Reply #7 on: November 25, 2009, 08:48 AM »
I would expect the extra search result text to come from the full text, created when someone was reading the full text while Google was indexing.

kalos

  • Member
  • Joined in 2006
  • Posts: 1,823
Re: google results
« Reply #8 on: November 25, 2009, 02:31 PM »
It can't be that: it happens with all the articles, and on other, unrelated domains as well.

And how can Google index authenticated webpages?

Also, Google is supposed to index only up-to-date versions of webpages, and since this is not the cached version, I suppose Google still sees the longer text now.

Lastly, please note that Google does not display the full text of the article, only a longer preview. It is not my intention to view the full text without authenticating first; I just wonder how Google can do that, and I want to use it for other websites as well...
« Last Edit: November 26, 2009, 06:21 AM by kalos »

4wd

  • Supporting Member
  • Joined in 2006
  • Posts: 5,641
Re: google results
« Reply #9 on: November 26, 2009, 03:50 PM »
Quote from: kalos
Also, Google is supposed to index only up-to-date versions of webpages, and since this is not the cached version, I suppose Google still sees the longer text now.

Google won't cache it because of this line in the source:

<meta name="robots" content="noarchive">

So what is displayed is always going to be whatever Google indexed, whenever it indexed it.
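Whether a page asks not to be cached can be read straight from its source. This sketch collects the directives in any `<meta name="robots">` tag, or the Google-specific `<meta name="googlebot">` variant, using only Python's standard `html.parser`; the class and function names are illustrative, not part of any library.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self, bot_names=("robots", "googlebot")) -> None:
        super().__init__()
        self.bot_names = {name.lower() for name in bot_names}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() not in self.bot_names:
            return
        for directive in (attrs.get("content") or "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set:
    """Return the lowercase set of robots directives declared in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```

A result containing `noarchive` is the directive that suppresses Google's "Cached" link, which fits the page in question.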

Quote from: kalos
I just wonder how Google can do that, and I want to use it for other websites as well...

A question: Did you initially search for "indication of early ego development, because instinctual needs are", a smaller subset of it or something completely different on that page?

I don't know how Google does it but I'm sure there is probably info on the WWW describing it - just need to search for it :)

A way to test whether Google does this normally is to search for something near the bottom of an article on a site that requires you to authenticate, and see whether Google has picked it up.

Assuming Google has been allowed to index it, the following will stop compliant search engines from indexing the page and from following any links on it:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Or the existence of a robots.txt file in the root with appropriate rules.
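The robots.txt route can be checked with Python's standard `urllib.robotparser`, which applies allow/disallow rules the way a compliant crawler would. The robots.txt contents and URLs below are hypothetical examples. Keep in mind that robots.txt only stops compliant crawlers from fetching pages, whereas the NOINDEX meta tag only takes effect if the crawler is allowed to fetch the page and read it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of /fulltext/, allow the rest.
ROBOTS_TXT = """\
User-agent: *
Disallow: /fulltext/
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler named `user_agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `crawler_may_fetch(ROBOTS_TXT, "Googlebot", "http://example.com/fulltext/article1")` is `False`, while a URL under any other path is allowed.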


Curt

  • Supporting Member
  • Joined in 2006
  • Posts: 7,566
Re: google results
« Reply #10 on: November 27, 2009, 03:30 AM »
Quote from: kalos
Also, Google is supposed to index only up-to-date versions of webpages,

I don't think it works like that; quite the contrary, I believe! To the best of my (lack of?) understanding, Google will NEVER give a genuine here-and-now, up-to-date answer to your queries, but will at any time "merely" search its own servers for what has already been indexed. Otherwise a search would take forever.
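Curt's point can be illustrated with a toy inverted index: a search consults only the snapshot stored at crawl time, so a result (and its preview text) can describe a page that has since changed, until the page is crawled again. This is a deliberately simplified model, not a description of how Google is actually implemented; the class and method names are illustrative.

```python
import re
from collections import defaultdict


class SnapshotIndex:
    """Toy inverted index: word -> set of URLs whose crawled text contained it."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)
        self.snapshots = {}  # URL -> text as it looked at crawl time

    @staticmethod
    def _words(text: str) -> list:
        return re.findall(r"\w+", text.lower())

    def crawl(self, url: str, text: str) -> None:
        """Store the page as it looks now, replacing any earlier crawl of it."""
        for urls in self.postings.values():
            urls.discard(url)
        self.snapshots[url] = text
        for word in set(self._words(text)):
            self.postings[word].add(url)

    def search(self, query: str) -> dict:
        """Answer from stored snapshots only; the live page is never fetched."""
        words = self._words(query)
        if not words:
            return {}
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return {url: self.snapshots[url] for url in sorted(hits)}
```

Only a fresh call to `crawl` changes what `search` returns, which is why how often a page is re-crawled matters.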

kalos

  • Member
  • Joined in 2006
  • Posts: 1,823
Re: google results
« Reply #11 on: November 27, 2009, 05:34 AM »
Quote from: 4wd
A question: Did you initially search for "indication of early ego development, because instinctual needs are", a smaller subset of it or something completely different on that page?

Yes, I actually searched for "indication of early ego development", and then, when I went to that webpage, the text was not there!

As for when Google harvests webpages, I think it varies: some webpages are harvested frequently and others not, depending, I suppose, on the popularity of the webpage and how often it is updated/changed.