Topic: google results
kalos
« on: November 23, 2009, 10:18:25 AM »

hello

in google search results, the text preview for some webpages shows more text than you get if you visit the webpage itself (where viewing the whole text is restricted)

I am not talking about webpages that are archived/cached, or at least the webpages I am talking about don't have a 'cached version' link when they appear in google results

why does this happen? how can the google robot retrieve all the text while a regular user can't?
how can one see what the google robot 'sees'? maybe by disguising oneself as a google robot?

thanks

4wd
« Reply #1 on: November 23, 2009, 06:05:39 PM »

It's normally because the website has inserted all types of search terms into the pages' meta data, (or some other non-visible portion of HTML code).  If you view the page source you'll probably find all the terms you searched for enclosed in meta tags, eg.

<meta name="keywords" content="sex rock roll naked wolfman ad-infinitum">

I wish all search engines would only index the visible content of websites.

Some websites also set the text colour to the same as the background colour to hide all the bogus search terms just to get you to their site to view the ads.

I've seen sites where the terms in the meta data filled 5 screens, whereas the entire site fitted into one screen or less.
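
A rough way to check a page for this is sketched below: it lists the terms that appear in a keywords meta tag but nowhere in the visible text. This is only a sketch using Python's standard `html.parser`; the names `KeywordAudit` and `hidden_terms` are made up for illustration, and it does not detect text hidden by matching the text colour to the background.

```python
import re
from html.parser import HTMLParser


class KeywordAudit(HTMLParser):
    """Collect meta keywords and visible words from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta_keywords = []     # terms listed in a keywords meta tag
        self.visible_words = set()  # lower-cased words from the page text
        self._skip_depth = 0        # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Accept both name="keywords" and http-equiv="keywords".
            key = (attrs.get("name") or attrs.get("http-equiv") or "").lower()
            if key == "keywords":
                content = (attrs.get("content") or "").lower()
                self.meta_keywords.extend(
                    term for term in re.split(r"[,\s]+", content) if term
                )
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.visible_words.update(re.findall(r"[\w-]+", data.lower()))


def hidden_terms(html):
    """Return meta keywords that never occur in the visible text."""
    audit = KeywordAudit()
    audit.feed(html)
    audit.close()
    return sorted(set(audit.meta_keywords) - audit.visible_words)
```

Feeding it the saved page source and comparing the result against the search terms shows whether a hit came from the meta data alone.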
« Last Edit: November 23, 2009, 06:09:13 PM by 4wd »

Four wheel drive: Helping you get stuck faster, harder, further from help...........and it's no different on this forum
Stoic Joker
« Reply #2 on: November 23, 2009, 06:55:01 PM »

I thought most search engines had started blocking/blacklisting/ranking last sites that did that sort of thing?
4wd
« Reply #3 on: November 24, 2009, 12:19:19 AM »

Quote
I thought most search engines had started blocking/blacklisting/ranking last sites that did that sort of thing?

I believe you're right, but I still get the occasional site that somehow gets into the results yet has nothing to do with what I was looking for (i.e. the search terms aren't anywhere on the visible page, nor is the page related), although looking in the source shows the terms there.

It's definitely better than it used to be though.
kalos
« Reply #4 on: November 24, 2009, 07:21:25 AM »

unfortunately, viewing the source does not have the desired effect

for example, this webpage does not show the text "indication of early ego development, because instinctual needs are.."

webpage

any idea how google can read the text?
4wd
« Reply #5 on: November 24, 2009, 04:10:28 PM »

Quote
[This is a summary or excerpt from the full text of the book or article. The full text of the document is available to subscribers.]

It could be that Google is displaying an excerpt from the full text which I cannot see without being a subscriber.

If you are a subscriber, log in to the site and see if it matches what was found.

If you want to know how Google gets to bypass authentication methods and index pages you can't access, then I don't know.
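
The "disguise as a google robot" idea from the first post can at least be tried. One possibility, which nothing in this thread confirms for the site in question, is that a site serves more text to a crawler than to an ordinary visitor. The sketch below requests the same URL with two different User-Agent headers and reports whether a phrase is present in each response. `BROWSER_UA` is a made-up string and the function names are hypothetical. A site may ignore the header or verify crawlers in some other way, so a negative result proves little.

```python
import urllib.request

# User-agent string published for Google's crawler. Sending it only changes
# the header of our request; it does not make the request come from Google.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Hypothetical browser string, used here as the "regular user" baseline.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def build_request(url, user_agent):
    """Create a GET request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url, user_agent, timeout=10):
    """Download the page and decode it using the charset the server reports."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def phrase_visibility(url, phrase):
    """Report whether `phrase` occurs in the HTML served to each user agent."""
    return {
        "browser": phrase in fetch_text(url, BROWSER_UA),
        "googlebot": phrase in fetch_text(url, GOOGLEBOT_UA),
    }
```

Calling `phrase_visibility(url, "indication of early ego development")` on the page in question would show whether the two responses differ for that phrase.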
« Last Edit: November 24, 2009, 04:21:44 PM by 4wd »

kalos
« Reply #6 on: November 25, 2009, 06:51:24 AM »

I am not talking about the full text of the article

I am saying that in google results the preview text is longer than the preview text offered on the webpage, and I wonder how this can happen

for example, this webpage does not show the text "indication of early ego development, because instinctual needs are.." which is shown in google results!
Curt
« Reply #7 on: November 25, 2009, 08:48:07 AM »

I would expect the extra search result text to come from the full text, created when someone was reading the full text while Google was indexing.
kalos
« Reply #8 on: November 25, 2009, 02:31:46 PM »

it cannot be that: it happens with all the articles, and it also happens on other, unrelated domains

and how can google index authenticated webpages?

also, google is supposed to index only up-to-date versions of webpages, and since this is not the cached version, I suppose google still sees the extra text even now

lastly, please note that google does not display the full text of the article, only a longer preview, and it is not my intention to view the full text without authenticating first, I just wonder how google can do that and I want to use it for other websites as well...
« Last Edit: November 26, 2009, 06:21:07 AM by kalos »
4wd
« Reply #9 on: November 26, 2009, 03:50:53 PM »

Quote
also, google is supposed to index only up-to-date versions of webpages, and since this is not the cached version, I suppose google still sees the extra text even now

Google won't cache it because of this line in the source:

<meta name="robots" content="noarchive">

So what is displayed is always going to be whatever Google indexed, whenever it indexed it.
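
To check a page for that line without reading the whole source by eye, something like the following sketch would do. It uses Python's standard `html.parser`; the names `RobotsMeta` and `robots_directives` are made up for illustration, and it only looks at `<meta name="robots">` tags, not at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMeta(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            for item in (attrs.get("content") or "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html):
    """Return the set of robots directives declared in the page source."""
    parser = RobotsMeta()
    parser.feed(html)
    parser.close()
    return parser.directives
```

If `"noarchive"` is in the returned set, the missing 'cached version' link is explained by the page itself.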

Quote
just wonder how google can do that and I want to use it for other websites as well...

A question: Did you initially search for "indication of early ego development, because instinctual needs are", a smaller subset of it or something completely different on that page?

I don't know how Google does it but I'm sure there is probably info on the WWW describing it - just need to search for it

A way to test whether Google does it normally is to search for something near the bottom of an article and see if Google has picked it up on a site you need to authenticate yourself on.

Assuming that Google has been allowed to index it, the following will stop compliant search engines from indexing the page and any links from it:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Or the existence of a robots.txt file in the root with appropriate rules.
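
As an illustration of the robots.txt route, the sketch below uses Python's standard `urllib.robotparser` to evaluate a set of rules for different crawlers. The rules in `ROBOTS_TXT`, the paths and the `allowed` helper are all made up for the example; they are not taken from any site in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /fulltext/,
# every other crawler is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /fulltext/

User-agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `allowed(ROBOTS_TXT, "Googlebot", "http://example.com/abstracts/1")` is true, while the same URL is refused for any other crawler name. Only compliant crawlers honour such rules.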

Curt
« Reply #10 on: November 27, 2009, 03:30:22 AM »

Quote
also, google is supposed to index only up-to-date versions of webpages,

I cannot think it is like that. On the contrary, I believe! To the best of my (lack of?) understanding, Google will NEVER give a genuine here-and-now up-to-date answer to your queries, but will at any time "merely" search its own servers for what already has been indexed. Otherwise a search would take for ever.
kalos
« Reply #11 on: November 27, 2009, 05:34:12 AM »

Quote
A question: Did you initially search for "indication of early ego development, because instinctual needs are", a smaller subset of it or something completely different on that page?

yes, I searched for "indication of early ego development" actually, and then, when I went to that webpage, that text was not there!

as for the date on which google harvests webpages, I think it varies: some webpages are freshly harvested and others are not, I suppose depending on the popularity of the webpage and how often it is updated/changed