kalos
« on: November 23, 2009, 10:18:25 AM »
Hello,
In Google search results, the text preview for some webpages contains more text than you can see when you visit the webpage itself (where viewing the full text is restricted).
I am not talking about webpages that are archived/cached; at least, the webpages I mean don't have a 'cached version' link when they appear in Google results.
Why does this happen? How can the Google robot retrieve all the text when a regular user can't? How can one see what the Google robot 'sees'? Maybe by disguising oneself as a Google robot?
Thanks
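For reference, "disguising as a Google robot" usually means sending a crawler-style User-Agent header with the request. Below is a minimal Python sketch, assuming the site decides what to serve from that header alone. The user-agent string, the function names and any URL passed in are illustrative assumptions, and a site that verifies crawlers by other means (for example by IP address or reverse DNS) would likely not be fooled.

```python
import urllib.request

# Assumption: a user-agent string in the style Googlebot announces itself with.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_crawler_request(url, user_agent=GOOGLEBOT_UA):
    """Return a Request for `url` that announces itself with `user_agent`."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_as(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Download `url` with the given user agent and return the decoded body."""
    request = build_crawler_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Fetching the same page once with `fetch_as(url)` and once with an ordinary browser user agent, then comparing the two results, would show whether the site serves different text to crawlers.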

4wd
« Reply #1 on: November 23, 2009, 06:05:39 PM »
It's normally because the website has inserted all kinds of search terms into the page's metadata (or some other non-visible portion of the HTML code). If you view the page source you'll probably find all the terms you searched for enclosed in meta tags, e.g.
<meta name="keywords" content="sex rock roll naked wolfman ad-infinitum">
I wish all search engines would only index the visible content of websites.
Some websites also set the text colour to the same as the background colour to hide all the bogus search terms, just to get you to their site to view the ads.
I've seen sites where the terms in the metadata filled 5 screens, whereas the entire site fitted into one screen or less.
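The "view the page source" check described above can be roughed out in Python. This is only a sketch under stated assumptions: the class and function names are made up for illustration, it only looks at `<meta name="keywords">`, and it does not detect text hidden by matching the text colour to the background.

```python
from html.parser import HTMLParser


class KeywordAudit(HTMLParser):
    """Collect meta keyword terms and the text found outside script/style tags."""

    def __init__(self):
        super().__init__()
        self.meta_terms = []      # lowercase terms from <meta name="keywords">
        self.visible_text = []    # lowercase text chunks a reader could see
        self._skip_depth = 0      # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = (attrs.get("content") or "").lower()
            # Accept both space-separated and comma-separated keyword lists.
            self.meta_terms.extend(content.replace(",", " ").split())
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.visible_text.append(data.lower())


def hidden_only_terms(html, search_terms):
    """Return the search terms found in the meta keywords but not in the page text."""
    audit = KeywordAudit()
    audit.feed(html)
    audit.close()
    visible = " ".join(audit.visible_text)
    return [
        term for term in search_terms
        if term.lower() in audit.meta_terms and term.lower() not in visible
    ]
```

Any term this returns is one the page advertises in its metadata without actually showing it to the visitor.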
« Last Edit: November 23, 2009, 06:09:13 PM by 4wd »
Four wheel drive: Helping you get stuck faster, harder, further from help...........and it's no different on this forum 

Stoic Joker
« Reply #2 on: November 23, 2009, 06:55:01 PM »
I thought most search engines had started blocking/blacklisting/ranking last sites that did that sort of thing?

4wd
« Reply #3 on: November 24, 2009, 12:19:19 AM »
Quote from Stoic Joker: "I thought most search engines had started blocking/blacklisting/ranking last sites that did that sort of thing?"
I believe you're right, but I still get the occasional site that somehow makes it into the results yet has nothing to do with what I was looking for (i.e. the search terms aren't anywhere on the visible page, nor is the page related), and looking in the source shows the terms there. It's definitely better than it used to be, though.

kalos
« Reply #4 on: November 24, 2009, 07:21:25 AM »
Unfortunately, viewing the source does not have the desired effect. For example, this webpage does not show the text "indication of early ego development, because instinctual needs are..". Any help on how Google can read the text?

4wd
« Reply #5 on: November 24, 2009, 04:10:28 PM »
Quote from the webpage: "[This is a summary or excerpt from the full text of the book or article. The full text of the document is available to subscribers.]"
It could be that Google is displaying an excerpt from the full text, which I cannot see without being a subscriber. If you are a subscriber, log in to the site and see if it matches what was found. If you want to know how Google gets to bypass authentication methods and index pages you can't access, then I don't know.
« Last Edit: November 24, 2009, 04:21:44 PM by 4wd »

kalos
« Reply #6 on: November 25, 2009, 06:51:24 AM »
I am not talking about the full text of the article.
I am saying that in Google results the preview text is longer than the preview text offered on the webpage itself, and I wonder how this can happen.
For example, this webpage does not show the text "indication of early ego development, because instinctual needs are..", which is shown in Google results!

Curt
« Reply #7 on: November 25, 2009, 08:48:07 AM »
I would expect the extra search result text to come from the full text, created when someone was reading the full text while Google was indexing.
Remember what you said, because in a day or two, I'll have a witty and blistering retort! You'll be devastated THEN!

kalos
« Reply #8 on: November 25, 2009, 02:31:46 PM »
That cannot be it: this happens for all articles, and it also happens on other, unrelated domains.
And how can Google index authenticated webpages?
Also, Google is supposed to index only up-to-date versions of webpages, and since this is not the cached version, I suppose Google still sees the extra text even now.
Lastly, please note that Google does not display the full text of the article, only a longer preview, and it is not my intention to view the full text without authenticating first. I just wonder how Google can do that, and I want to use it for other websites as well...
« Last Edit: November 26, 2009, 06:21:07 AM by kalos »

4wd
« Reply #9 on: November 26, 2009, 03:50:53 PM »
Quote from kalos: "Also, Google is supposed to index only up-to-date versions of webpages, and since this is not the cached version, I suppose Google still sees the extra text even now."
Google won't cache it because of this line in the source:
<meta name="robots" content="noarchive">
So what is displayed is always going to be whatever Google indexed, whenever it indexed it.
Quote from kalos: "I just wonder how Google can do that, and I want to use it for other websites as well..."
A question: Did you initially search for "indication of early ego development, because instinctual needs are", a smaller subset of it or something completely different on that page?
I don't know how Google does it but I'm sure there is probably info on the WWW describing it - just need to search for it.
A way to test whether Google does it normally is to search for something near the bottom of an article and see if Google has picked it up on a site you need to authenticate yourself on.
Assuming that Google has been allowed to index it, the following will stop compliant search engines from indexing the page and any links from it:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
Or the existence of a robots.txt file in the root with appropriate rules.
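The two mechanisms mentioned above (a robots meta tag and a robots.txt file) can be checked with a short Python sketch. The helper names, the example.com URLs and the sample rules in the usage note are hypothetical; `urllib.robotparser` is the standard-library parser for robots.txt rules, and how any given search engine actually interprets them is not something this sketch can confirm.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots_txt(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def robots_meta_directives(content):
    """Split the content attribute of a robots meta tag into lowercase directives."""
    return {part.strip().lower() for part in content.split(",") if part.strip()}
```

For example, with the hypothetical rules `User-agent: *` followed by `Disallow: /members/`, a call such as `allowed_by_robots_txt(rules, "Googlebot", "http://example.com/members/article1")` reports whether a compliant crawler may fetch that page, and `"noarchive" in robots_meta_directives("noarchive")` checks for the directive quoted above.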

Curt
« Reply #10 on: November 27, 2009, 03:30:22 AM »
Quote from kalos: "Also, Google is supposed to index only up-to-date versions of webpages,"
I don't think it works like that. On the contrary, I believe! To the best of my (lack of?) understanding, Google will NEVER give a genuine here-and-now up-to-date answer to your queries, but will at any time "merely" search its own servers for what has already been indexed. Otherwise a search would take forever.

kalos
« Reply #11 on: November 27, 2009, 05:34:12 AM »
Quote from 4wd: "A question: Did you initially search for "indication of early ego development, because instinctual needs are", a smaller subset of it or something completely different on that page?"
Yes, I actually searched for "indication of early ego development", and then, when I went to that webpage, that text was not there!
As for when Google harvests webpages, I think it varies: some webpages are freshly harvested and others are not. I suppose it depends on the popularity of the webpage and the frequency with which it is updated/changed.