...
However, I am a bit confuzzled by what I have observed, in an ad hoc manner as and when I have performed searches over an extended period, because sometimes:
- (i) Some text strings of more than adequate length could not be found in local searches of DCF - that is, using the DCF local search function. I had presumed this could be the result of (say) either a local indexing failure, or deliberate site blocking (non-indexing) of certain strings/posts by admins.
- (ii) Some text strings of more than adequate length were not found in any DC Forum posts in DuckDuckGo searches, even though the strings were in one or more DC Forum posts. I had presumed this could be the result of (say) deliberate site blocking - e.g., (say) via robots.txt for certain strings/posts, or other blocking, so that search crawlers were inhibited/blocked.
- (iii) Some discussion threads in the main DCF forum being unavailable in Wayback beyond a certain point due to robots.txt blocking. I had presumed this could be the result of deliberate site blocking of certain strings/posts by admins.
...
Any daylight on this would be helpful.
-IainB
I've seen much of this too, Iain, so here are my guesses:
1. The new search is a disaster. One day when I was working on the Getting Things Done thread, for the life of me I couldn't get it to come up in the *new* DC search. It seems to do "OR" searches, so I'd get every thread containing "getting" or "Things" or "Done".
The old board had a pretty good hand-optimized search written by someone from DC, and Mouser hasn't had the time, or hasn't figured out how, to port it over.
2. The crawlers of different search engines work differently, and with varying success. Unfortunately, I have to report that although I like DuckGo's philosophy, its engine simply gave me mediocre results, a lot like you are describing: you can be staring at the relevant DC post in one window and DuckGo simply refuses to pull it up.
Then change engines and, as much as I *don't* like their philosophy, Google IS basically the best except in weird cases. I've done the same search in about 4 engines (including Binged-Yahoo and Startpage) and they come up with different results of varying quality. So it can't be a block per se if Google finds it and Duck doesn't.
3. Maybe it costs money to run scans at sufficient intervals or something, but Wayback is simply, sadly, not reliable for thoroughness, and I know you are a thorough person. So a page that you know changes every day, for example, only has something like 5-20 snapshots for that month. Again, Google's spider is the best because it runs at lightning speed, and I think I recall that a post I had made an hour earlier once showed up in Google.
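
On the robots.txt guess in (iii), it is possible to check rather than presume. Below is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt text, the example.com URL and the helper name `allowed_for` are all made up for illustration; I have not looked at what DC's real robots.txt says. As far as I know, `ia_archiver` is the crawler name that robots.txt rules aimed at the Wayback Machine have historically used, so treat that as an assumption too.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for illustration, NOT the real DC file.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /forum/

User-agent: *
Disallow:
"""


def allowed_for(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/forum/index.php?topic=1"  # made-up URL
    for agent in ("ia_archiver", "Googlebot", "DuckDuckBot"):
        print(agent, allowed_for(SAMPLE_ROBOTS_TXT, agent, page))
```

To try it against a real site, paste the text of that site's robots.txt in place of the sample and use the URL of a thread you could not find. Since the rules are written per crawler, it is worth testing each crawler name separately. This only tells you what robots.txt asks crawlers to do; it says nothing about how often or how thoroughly any crawler actually visits.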