Topic: For DC Fans: Bring DC Forum to your Desktop for searching/browsing/etc.  (Read 26604 times)

Wordzilla

  • Forum Search Daemon
  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 470
  • Two there should be; no more, no less.
    • View Profile
    • FreeThesaurus.net - The Free Online Synonym Finder
    • Read more about this member.
    • Donate to Member
I recently wrote a DC forum crawler app that reads topics/posts on the DonationCoder.com forum, and saves them as individual html files.

After 10 hours of continuously torturing mouser's server :mrgreen:, we now have 8277 individual html files in a compressed RAR package (split into 4 volumes), which is available from:

http://www.mrcody.com/crawl/032007/

Download the four RAR files (67.9MB in total) and uncompress (458MB) to your desired directory.


This package presumably archives all publicly available (anonymous-level) topics/posts:

From topic #1 (restricted) to topic #7825 (Which programs have you found that do not work with Vista?) as at 03:06:11 AM, March 20, 2007 (Central Time)
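
A rough idea of how such a crawler can work, as a minimal Python sketch rather than the actual app (whose code is not shown in this thread). The URL pattern is an assumption based on the usual SMF `index.php?topic=N.0` scheme, and `safe_filename`, `extract_title` and `crawl` are hypothetical names. The `fetch` argument is any function that returns a page's HTML, or `None` for restricted or missing topics.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, List, Optional

# Assumption: SMF-style topic URLs. The real crawler's URL scheme is not given.
BASE_URL = "https://www.donationcoder.com/forum/index.php?topic={topic_id}.0"


def safe_filename(topic_id: int, title: str) -> str:
    """Build a filesystem-safe name such as '7825 - Some title.html'."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', " ", title)   # drop characters Windows forbids
    cleaned = re.sub(r"\s+", " ", cleaned).strip()   # collapse runs of whitespace
    return f"{topic_id} - {cleaned or 'untitled'}.html"


def extract_title(html: str) -> str:
    """Return the contents of the first <title> tag, or '' if there is none."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def crawl(topic_ids: Iterable[int],
          fetch: Callable[[str], Optional[str]],
          out_dir: Path) -> List[Path]:
    """Save one html file per readable topic and return the saved paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for topic_id in topic_ids:
        html = fetch(BASE_URL.format(topic_id=topic_id))
        if html is None:          # restricted or missing topic: skip it
            continue
        path = out_dir / safe_filename(topic_id, extract_title(html))
        path.write_text(html, encoding="utf-8")
        saved.append(path)
    return saved
```

In real use, `fetch` would wrap something like `urllib.request.urlopen` and pause between requests so the server is not hammered.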


Screenshot - 21_03_2007 , 10_11_51 PM.png



Wordzilla

You will perhaps find it much more useful than it appears (useless and redundant) if you are:

1. A DC hardcore  8)
2. Using one or more desktop search engines/utilities like Yahoo! X1, Google Desktop Search, Copernic and Windows Desktop Search

In my experience, my currently installed desktop search tools (all of the above) return more results, and more accurate ones, than Google does in most cases (given the same search query), and of course they are much faster.

Another advantage of this approach is that search results are often previewable.


We have a member by the username of sable here and she has a huge champion cat which WoWs me, now I need to find the pic of her cat to show you.  ;)


Query: sable cat


My X1 Desktop Search - took it 5 mins to completely index these html files

1.png


Google Search - Google is your friend!

2.png


DC Forum Search - SMF search sucks  :P

3.png
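
For anyone curious what a desktop search tool does with a folder of html files, here is a minimal Python sketch of a full-text index. It is not how X1 or any of the tools above is implemented; it only illustrates the general technique (strip the markup, map each word to the files containing it, then intersect the file sets for a multi-word query like "sable cat"). All function names are hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the text between tags, ignoring the markup itself."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_words(html):
    """Return the set of lowercase words found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))


def build_index(folder):
    """Map each word to the names of the html files that contain it."""
    index = defaultdict(set)
    for path in Path(folder).glob("*.html"):
        html = path.read_text(encoding="utf-8", errors="replace")
        for word in html_to_words(html):
            index[word].add(path.name)
    return index


def search(index, query):
    """Return the files containing every word of the query (AND search)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Usage would be along the lines of `search(build_index("DCForumCrawl_Titled"), "sable cat")`. Real tools add ranking, stemming and incremental updates on top of this.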

jgpaiva

  • Global Moderator
  • Joined in 2006
  • *****
  • Posts: 4,727
    • View Profile
    • Donate to Member
One has to love the amazing oh-so-useful forum search results..  ;D ;D
This actually is a very good idea. Next time i search DC and can't find what i'm looking for, i'll know where to go  :Thmbsup:

urlwolf

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 1,837
    • View Profile
    • Donate to Member
This is very clever and a real timesaver.
Which of the desktop search engines do you recommend?
Thanks!

Wordzilla

Thx!  :)

Which of the desktop search engines do you recommend?

If you would like to get the most out of the DC forum crawl, perhaps it's a good idea to dedicate one search engine to indexing & searching these files, as I did.

In this case, go for Yahoo's X1 Enterprise Client (free). Fast indexing & full-text search with highlighting.

t1.png



Go to Tools -> Option and set up the search engine as shown below (just my suggestions):

No Delay Indexing

t2.png

Index exclusively DC Forum html pages

t3.png

t4.png

Build one-time index (takes about 5 mins)

t5.png


Note: the X1 client may ask you to install the Yahoo toolbar etc. during installation; remember to deselect them.



urlwolf

While decompressing, I'm shopping around for an indexer :)
Looks like the windows one will make my oneNote searches faster too. Can you limit the dirs that windows DS indexes, to just say the DC archive and my onenote folder? If so, this could be one less application I need to install. Is there any reason you have all four in your computer and also recommend X1 dedicated to just the DC archive?
« Last Edit: March 21, 2007, 08:56 AM by urlwolf »

Wordzilla

urlwolf:

We have a few topics here discussing desktop search engines:

desktop search guide page

What is the currently best Desktop Search software?

from X1 8)

Wordzilla

Looks like the windows one will make my oneNote searches faster too. Can you limit the dirs that windows DS indexes, to just say the DC archive and my onenote folder?

Yes of course. I have WDS running and it's completed indexing my DC html pages.

Screenshot - 22_03_2007 , 1_14_56 AM.png

Right click on its tray icon and select "... Options", click "Modify" and add the DCForumCrawl_Titled (rename it if you like) directory. Deselect other unwanted drives/directories. :)


If so, this could be one less application I need to install. Is there any reason you have all four in your computer and also recommend X1 dedicated to just the DC archive?

I've been testing these tools to see which is best for my specific tasks. None of them is best in all areas.

Google Desktop Search is great for indexing everything in general, including visited webpages (yay!); however, I find it not very convenient for specifying how, where, and when the files should be indexed.

With X1, you have fine, manual control over these aspects, which makes it the best tool for the job (for me).



urlwolf

Thanks for that.
Looks like WDS is a pain to configure. I basically want it to ignore my entire HD, and index only the 2-3 folders I tell it to. I see that X1 does that fine, but how difficult is it in WDS?

Also, do any of these applications index music tags? Say I want to search for album="foo", or artist="bar"; is that possible?


Wordzilla

I basically want it to ignore my entire HD, and index only the 2-3 folders I tell it to.

Go to "Windows Desktop Search Options" , click "Modify" and (de)select drive/directories.

Screenshot - 22_03_2007 , 1_24_22 AM.png


Also, do any of these applications index music tags? Say I want to search for album="foo", or artist="bar"; is that possible?

afaik, there're some free plugins for Google Desktop Search (also listed in the official GDS plugins directory) that do:

Audio Files GDS Indexer
Mp3tag Audio Indexer
GDS Real Media Indexer


urlwolf

WDS does music files with no plugins (!):
Likewise, you can search by file extension or file size. But that’s just the beginning. For example, you can search for music files by artist name, or search for photos by the horizontal and vertical resolutions. You can search for text within a file (including PowerPoint presentations). You can search for an Outlook contact by his or her birthday, or search for a meeting request by meeting organizer. You can do Boolean-type searches (where FileName = something and CreationDate >= something else). You can – well, you get the idea.

And it seems it's scriptable... Yummy. Installing it now.

Wordzilla

The package gets an update!  Up-to-date as of 27 Apr 2007.

Download: http://www.mrmouser....ForumCrawl070427.rar

72.90MB RAR package containing 8,776 individual html files (ouch! 8))

Works best with desktop search engines.


Darwin

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 6,984
    • View Profile
    • Donate to Member
Wow! Thanks wordzilla. I don't know how I missed this thread a month or so ago. This is AWESOME. I've posted before about how difficult I find it to find old posts (of course, I can't link to that post because I have no idea how to find it!) so this will come in very very handy. Anyway, am downloading it now to be indexed by Archivarius. When that is complete, I'll let you and our millions of listeners know how they get on together.

nosh

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 1,441
    • View Profile
    • Donate to Member
I had written a somewhat detailed review on Desktop Search clients and put it up on my (now non-existent) blog. Copernic even linked to it in the User Reviews section, probably coz I picked their product over others. I don't think the Desktop Search scene has changed very much since then. Here's the full review:



I have compared the following Desktop Search programs for the purpose of this article: Copernic Desktop Search 2, Google Desktop Search, X1, Yahoo Desktop Search, Microsoft Windows Desktop Search and even the somewhat lesser known but reputed ISYS: desktop. All of these, with the exception of ISYS, are free. All have strengths and weaknesses - there is unfortunately no one program that "does it all" at this point in time. The best app for you will depend on your specific requirements and what ultimately feels right for you, so trying out more than one app would be a good idea. I'll point out some of the main pros and cons of these apps and close by declaring and briefly reviewing my personal favorite from this lot.

ISYS: desktop is more for corporates and its price tag is none too light either. I did not find anything exceptional about its performance on my machine and I hated the fact that its interface (like so many others that try too hard) is non-intuitive. The last thing a user needs is a larger than necessary learning curve for what should be a relatively straightforward utility. But then ISYS doesn't claim to be for the average home user so I guess they can be excused. For the purpose of this article though, ISYS is eliminated.

Microsoft Windows Desktop Search
Let's face it - nobody knows Windows better than Microsoft. WDS is quite possibly the fastest app of the lot. What is not so good about it, is that it's very bare bones and has got a relatively ugly, non-customizable interface. If speed matters and speed is all that matters you should definitely give WDS a try.

Google Desktop Search
People have come to expect good things from Google. Google Desktop Search, however, seems to be the most misguided utility of the lot. They have gone over-the-top in their quest for desktop search glory and it has misfired badly. First off, it pisses me off that Google decided to bundle stuff like Google chat and a couple of other worthless utilities (a note taker? a "web clips" utility?) with their search app. All this junk, including their chat app, gets downloaded and installed and starts up on its own. Shouldn't those tactics be left to Microsoft? Not only does GDS not have previews for picture or sound files; if your search brings back more than a handful of results and you would like to see them all, it'll open them in a window in your web browser rather than a window of its own. Was the design team at Google on crack when they thought out this travesty? Would you like to change your GDS settings? No problem - it'll just open another browser window and keep you on hold till the settings page loads from Google! It's so stupid, it actually makes you laugh! I still can't get over its infuriating reliance on web pages opening up in a browser when this is a client app running right from my desktop. The text file previews had a good look about them (just the right size font, number of lines) but that's ALL it seems to have going for it. Sometimes trying to be different can turn out to be a bad thing and this is a prime example. If you download GDS you get a half-assed desktop search app and a lot of other crap you didn't ask for. This app is a disgrace and the only reason you would use it is coz you didn't know any better.

Yahoo Desktop Search and X1 are quite similar, probably because a lot of what goes on under the hood is handled by the same code. X1 recently went free but in my experience, though the free version is more recent, their paid version works faster and feels lighter. X1 was my choice until a month back. It was not too heavy on resources and did a good job overall. The one drawback I found was, it didn't remember my column settings for different views but that was something one could live with. Yahoo Desktop Search will let you index 300+ file types if you install an optional (free) 4MB expansion pack. If you prefer to index the maximum possible file types for metadata then Yahoo Desktop Search is for you, needless to say you should install the expansion pack too.

Copernic Desktop Search 2
My choice above all others at the time of this writing is Copernic Desktop Search 2 (currently in pre-release). It supported indexing over 150 file types at last count. Here are some of the main types of data it can index; mind you, it can handle several more types than what you see below:

Video: avi, mpg, 3gp, wmv
Music: mp3, wav, ogg, ra, rm, cda, wma, aac, au
Images: gif, jpg, bmp, png, psd, tif
Documents: txt, doc, xls, ppt, pdf, html, rtf, hlp
Email: Microsoft Outlook or Microsoft Outlook Express Data, including contact information
Browser History & Favorites: Internet Explorer, Firefox, Mozilla, Netscape

One big drawback that could be a show stopper for some people is that Copernic is unable to list or index files inside compressed archives like zip or rar. It will index these file types (and any other file types for that matter) by their names but it will not be able to see and index the files that lie inside the archive. If you cannot do without indexing archived files, I'd recommend X1.

I switched to Copernic for one really simple feature more than any other. Copernic shows you a page full of thumbnails in the picture search category and this makes a HUGE difference if you've got lots of pictures on your hard disk. It can even categorize and group the thumbnails by date, folder (as shown in the screenshot below), size, filetype and a few other criteria. When it comes to picture search CDS2 simply blows the competition away! Add to that its stylish looks, customizable interface and decent speed and CDS2 is quite a contender, easily my first choice.

copernicid3[1].jpg


CDS2 lets you index networked computers and shows you the number of hits for each category in the toolbar no matter what category you searched in. Its interface is highly customizable: I don't archive emails or contacts and, as you can see from the screenshot above, it let me completely remove those categories from the toolbar. Less is more!

There's nothing less about CDS2 when it comes to options and features, though. Besides having all the options that one expects (like an option to remember or not remember previous searches) it also adds new features like query completion or query correction. It's completely flexible and lets you selectively archive (or not) any folder for distinct filetypes. You can select which files are simply to be archived for filenames and which should be scanned for their content, letting you add your own custom extensions to be indexed for content too.


Indexing utilities, as a rule, consume a lot of system resources and if badly configured CDS 2 could easily bring the rest of the PC to a virtual standstill. Configure it correctly and you'll hardly notice it's there till you actually use it. Here are some tips on using it efficiently.


 - Disable the "Display all item when no keyword is entered" option. It's a waste of resources and also compromises your privacy.
 - The query completion and query correction options are always running but hardly used. If after a few uses, you realise you're not really utilizing these options, turn them off completely. 
 - Take a little time to figure out which folders you don't need to index. This can result in a huge saving of system resources and the app will function way more efficiently in all respects if you index in a smart manner and avoid redundancy.
 - The same rule applies when it comes to choosing the file types for indexing metadata. Copernic starts with indexing enabled for all supported file types. Disable extensions for those file types you know you'll never need to search inside. Your indexing will get done a lot faster.
 - Add the extension  .*  to the list of files to be indexed. This will make sure Copernic indexes filenames for every file on the PC. You can minimize using Windows inbuilt file search this way, CDS2 will find your files way faster, provided they're indexed, of course.   
 - Indexing is the most important activity and must be configured right - make use of realtime indexing but let it run in passive mode. Make sure you leave the suspend indexing settings enabled! You can reduce the "Suspend indexing while I use my computer" setting to around 10 seconds without any adverse effect.
copernicid3[1].jpg
« Last Edit: April 28, 2007, 12:46 PM by nosh »

Darwin

Sweet! I downloaded the rar file and had Archivarius index it (without extracting it to my hard drive). I wound up with a 78 MB index (in addition to the 73 MB rar file) and then ran a search on "photofiltre". Here's what Archivarius returned:

Archivarius - donationcoder.png

So, you can have your cake and eat it too - all of donationcoder forums at your fingertips, fully searchable for under 200MB. I suspect that X1/YDS and Copernic can do this as well. However, not having any of them loaded up, I can't confirm.
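
To illustrate the idea of indexing an archive without extracting it to disk, here is a small Python sketch. Two loud assumptions: it says nothing about how Archivarius works internally, and it uses a ZIP archive instead of RAR, because Python's standard library can read ZIP but has no RAR reader. `index_archive` is a hypothetical name.

```python
import re
import zipfile


def index_archive(archive):
    """Map each word to the names of the html members that contain it.

    `archive` is a path or file-like object for a ZIP file (a stand-in
    for the RAR package). Members are read in memory, never extracted.
    """
    index = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".html"):
                continue                      # only index the forum pages
            text = zf.read(name).decode("utf-8", errors="replace")
            text = re.sub(r"<[^>]+>", " ", text)   # crude tag stripping
            for word in set(re.findall(r"[a-z0-9]+", text.lower())):
                index.setdefault(word, set()).add(name)
    return index
```

Something like `index_archive("forum.zip").get("photofiltre", set())` would then list the pages mentioning that word. Once the index is stored, the archive itself is only needed to open the full pages.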

Thanks again, Wordzilla!

EDIT: updated screenshot
« Last Edit: April 28, 2007, 12:51 PM by Darwin »

Ampa

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 592
  • I am cute ;)
    • View Profile
    • MonkeyDash - 2 Player strategy boardgame
    • Donate to Member
Hmm... somehow this feels weird. Downloading an archive of a website to improve the searchability doesn't strike me as the 'correct' approach. Surely we need to invest our effort into better searchability on the web?

brett

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 125
  • Australia
    • View Profile
    • Donate to Member
Hmm... somehow this feels weird. Downloading an archive of a website to improve the searchability doesn't strike me as the 'correct' approach. Surely we need to invest our effort into better searchability on the web?

The first thing that leaps to mind is that I can now carry DC on a large thumb drive, all of wonderful DC, every little bit.
Show me a web search that matches Copernic and I will gladly change. Thanks for the effort Wordzilla, truly helpful.

Brett

mrainey

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 439
    • View Profile
    • Website
    • Donate to Member
Does this forum downloading wind up costing Mouser some money?
Software For Metalworking
http://closetolerancesoftware.com

Wordzilla

Thanks guys, glad you like it!  :-*

As Ampa points out, this does not look like one of the best ways to enhance forum searchability - indeed it's just a temporary solution.

We need it coz so far Googlebot seems to have bypassed quite a lot of forum topics and esp. posts in topics that span multiple pages, and SMF forum search is well-known to suck. :(

btw, mouser and I have come up with a beta forum sitemap script that keeps search engines up-to-date on every single public topic and post on the forum, let's c how it works.  :)
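
The beta script itself isn't posted here, but the general shape of a forum sitemap can be sketched in Python. Everything below is an assumption for illustration: the SMF-style `topic=ID.OFFSET` URL pattern, the default of 15 posts per page, and the function names. It lists every page of every topic, which is the part search engines seem to miss on multi-page topics.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: SMF-style URLs where the number after the dot is the post offset.
TOPIC_URL = "https://www.donationcoder.com/forum/index.php?topic={topic_id}.{offset}"


def topic_page_urls(topic_id, post_count, posts_per_page=15):
    """Return one URL per page of a topic (at least one page)."""
    pages = max(1, -(-post_count // posts_per_page))   # ceiling division
    return [TOPIC_URL.format(topic_id=topic_id, offset=page * posts_per_page)
            for page in range(pages)]


def build_sitemap(topics):
    """Build sitemap XML from a dict of {topic_id: post_count}."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for topic_id, post_count in sorted(topics.items()):
        for url in topic_page_urls(topic_id, post_count):
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

For example, `build_sitemap({7825: 31})` would list three pages for that topic (offsets 0, 15 and 30). A real script would read the topic list from the forum database and also skip non-public boards.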

Wordzilla

Does this forum downloading wind up costing Mouser some money?

Yes, he's complaining to me about huge financial loss!  :D

j/k. Every time the crawler is at work, both DC and I lose 1GB of available monthly allocated bandwidth, but that's no big deal at all. 8)

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • *****
  • Posts: 40,914
    • View Profile
    • Mouser's Software Zone on DonationCoder.com
    • Read more about this member.
    • Donate to Member
what wordzilla doesn't know is that i planted a codyvirus in his forum signature.. each time he posts cody steals one big giant gold coin from him :)

one of the best things about asking for donations is that it has enabled us to get a nice server with generous bandwidth so we don't yet have to worry about such things.

superboyac

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 6,347
    • View Profile
    • Donate to Member
This is pretty cool.  If you really wanted to do it right, may I suggest this:
have some kind of program that will update the archive on your computer, similar to how virus definitions are updated every day.  This way, the archive on your computer will always be up to date.

Of course, this is really hardcore.  I've never heard of a forum archiving its content!  Gotta love it here!
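
One simple way such an updater could decide what to download is sketched below in Python. It assumes, purely for illustration, that the local files are named `<topic id> - <title>.html`; the real package's naming is not specified here, and the function names are hypothetical. It only picks up topics newer than the newest local one, so new replies in old topics would still need a separate re-check.

```python
import re
from pathlib import Path


def highest_saved_topic(folder):
    """Return the largest topic id among '<id> - <title>.html' files, or 0."""
    ids = []
    for path in Path(folder).glob("*.html"):
        match = re.match(r"(\d+)\b", path.name)   # leading number = topic id
        if match:
            ids.append(int(match.group(1)))
    return max(ids, default=0)


def topics_to_fetch(folder, newest_topic_id):
    """Return the range of topic ids that are missing from the local archive."""
    start = highest_saved_topic(folder) + 1
    return range(start, newest_topic_id + 1)
```

Run daily, this would hand only the missing topic ids to whatever downloader is in use, instead of re-crawling the whole forum each time.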

nosh

have some kind of program that will update the archive on your computer, similar to how virus definitions are updated every day.  This way, the archive on your computer will always be up to date.

I think there's wayyy too much geeky info constantly bombing my poor brain already. I certainly wouldn't bother replicating anything, but if you're so inclined you could run a site grabber of your own and set it to update as often as you care to. IDM has the most amazing site grabber built in: easy to use (wizard based), _highly_ configurable & light on resources. Software doesn't get much sexier.

Wordzilla

New package. Up-to-date as of 17 May 2007.

Download: http://www.mrmouser....ForumCrawl070517.rar

76.01MB RAR package containing 9,150 individual html files. It is mouser-killing! :D

Darwin

Sweet, thanks Wordzilla! BTW, I neglected to underline, bold and italicise the fact that when you've finished indexing the rar file (using Archivarius but probably others like Google, MS and Copernic search engines as well), you can delete it and still read the indexed search results. Of course, you have to set things up so that the indexer doesn't automatically scan that folder for updates...

This means that the donationcoder forums are fully searchable for under 80 MB (not 200 as I reported above).