Messages - qforce

1
DocFetcher 1.0 released. Comments are very welcome.  :Thmbsup:

New features:
- MS Office 2007 support
- Daemon with low cpu usage that watches indexed folders, but runs independently of DocFetcher
- Creating temporary indexes by rightclicking on folders in the file manager
- Portable version can be moved around complete with documents and indexes (i.e. DocFetcher + indexes + documents). You can even burn it on a CD-ROM!

2
C:\Program Files\Aduna -> 90MB
C:\Documents and Settings\UserName\Application Data\Aduna\AutoFocus 5 -> 366MB (this is where index files are maintained/stored)

As for RAM, that depends heavily on usage: the smarter you build the query, the less RAM is used. The number of files returned by a query matters too: displaying 38,000 files (all the .html files from all my sources) increased RAM usage by 60MB... It can start at 30MB and use as much as 100+MB.

HTH
Their RAM requirements seem to be very similar to those of DocFetcher.
However, I really wonder why they need so much disk space (are you sure there's no hidden office suite in it? :)).

3
100 MB of disk space for the program itself? This looks more like an office suite with built-in desktop search, if you ask me...
And 128 MB RAM is already half of what my Eclipse IDE (a very hungry beast) usually needs.

We're coming at this from very different perspectives: I have a 320GB harddrive and 4GB of RAM. The effect of this being installed on my computer would be minimal - like a flea on an elephant  ;D Most computers purchased in the past three years have at least 512MB (and more like 1GB) of RAM factory installed and probably more than 80GB harddrive capacity, so...
Well, I didn't mean to stop anybody from using this program. If you think you have enough system resources to run this thing in the background permanently, that's okay with me. All I said was that the hardware requirements seemed a bit much for a desktop search program (e.g. Google Desktop's setup file was something like 2 MB, if I remember correctly), and it's certainly too much for my laptop here, which has 1 GB of RAM, minus 40% of that for the OS and the web browser, and when I fire up my IDE or a virtual desktop (Windows XP running inside a Linux machine  8)), there's not much left for a desktop search program of that magnitude.

4
100 MB of disk space for the program itself? This looks more like an office suite with built-in desktop search, if you ask me...

Greetings.

That is the size of the index file and not the program itself. Autofocus.exe (ver. 5.0 for MSWin) is 44,6KB...
Regarding RAM, as far as I can see, Java is the hungry beast...

Best regards.
The DocFetcher.exe is 41,5 KB, but that doesn't mean anything, it's just a launcher. You really have to add up everything that is installed on the machine, and according to the website of Autofocus this sums up to 100 MB.

And what do you mean by "100 MB for the index file"? Maybe I'm missing something here, but the way I see it, if a file isn't the result of indexing, then it's part of the base installation (i.e. the program), right?

5
It looked fairly interesting until I read the hardware requirements section...

Hardware requirements

    * CPU: the absolute minimum is a Pentium II at 400 MHz, a Pentium III at 1 GHz or better is recommended.
    * main memory: minimally 128 MB, 256 MB is recommended.
    * disk space requirements: 100 MB + 2 MB per 1000 scanned items.

They don't seem that bad to me...

100 MB of disk space for the program itself? This looks more like an office suite with built-in desktop search, if you ask me...
And 128 MB RAM is already half of what my Eclipse IDE (a very hungry beast) usually needs.
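
To put the quoted disk-space figure in perspective, the rule "100 MB + 2 MB per 1000 scanned items" can be worked out for a few collection sizes. This is only a small sketch that takes the formula at face value from the requirements list above; the function and parameter names are made up for illustration.

```python
def estimated_disk_mb(scanned_items, base_mb=100, mb_per_1000_items=2):
    """Estimate disk usage in MB from the quoted rule:
    a fixed base plus a per-1000-items cost for the index."""
    return base_mb + mb_per_1000_items * scanned_items / 1000


if __name__ == "__main__":
    # Print the estimate for a few hypothetical collection sizes.
    for items in (1_000, 50_000, 500_000):
        print(f"{items:>7} items -> about {estimated_disk_mb(items):.0f} MB")
```

By this rule the fixed 100 MB dominates for small collections, and the index part only catches up with it at around 50,000 scanned items.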

6
BTW, AutoFocus is also based partially on Lucene so that should make it quite familiar to you...
All the best.
It looked fairly interesting until I read the hardware requirements section...

7
That's exactly what I'm getting at. I don't know of any other tool that can search through, e.g., ACDSee's database. However, any photo album app will allow you to save tags into EXIF or IPTC metadata in the images themselves. And since this is a standard, any desktop search app worth its salt can access it.

Having done that, now I can use my search app to find, say, "Mexico 2008" and get all my related photos, emails exchanged with the travel agent, and the AVI of the time-lapse sunset I made. Sure, all of these things are handled through different apps. But the ability to search like this allows me to have all of the materials related to a given project in front of me at once. (Which is why I also think that the Windows way of organizing files under "My Documents" in app-centric folders is idiotic)
Some people are just too lazy to add half a dozen tags to each and every image they store on their computer. Do you really do that? :o
On a related note I'd like to mention delicious.com, a social bookmarking site. This site allows you to assign multiple tags to a bookmark, which sounds awesome in theory (multiple categorization, yay!), but after a while I stopped bothering with all this tagging. Not sure why, but it felt like "too much work"... It seems there's a subtle, but significant difference between tagging and the capability to put a file in more than one folder.

Open source people sometimes amaze me. You refuse to use any such program (even though, as I noted, there's a standard way for them to store their data in most cases), despite how much good it might do you.
Maybe I should also mention the second reason why I don't use albums and the likes: I don't have too many pictures on my computer (a few hundred or so, rarely updated), and I stopped collecting music a long time ago (last.fm anyone?), so there's not much to organize here. This is not too amazing an explanation, is it?

8
Greetings.

What would be the required syntax if one tries to find documents that contain certain strings?
As much as I knew
"word1 word2"
was supposed to do that job, whereas
word1 word2
is the equivalent of AND, do please correct me if I am wrong.
Just give me a second to check the Lucene documentation... ah, here it is: http://lucene.apache...eryparsersyntax.html
DocFetcher is based on Apache Lucene, and therefore supports all operators described on that page. As for the AND operator, yes, there is one. Example: "some string" AND "some other string"
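
To illustrate the difference between these query forms, here is a small sketch. It is not Lucene code and not how DocFetcher works internally, just a toy model in Python with made-up function names and sample documents. It assumes the behaviour of the Lucene query syntax: a quoted string is a phrase query (the words must appear next to each other, in order), AND requires every term to be present, and plain terms without an operator are combined with OR by default.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def matches_or(doc, terms):
    """word1 word2 -> at least one of the terms occurs in the document."""
    tokens = set(tokenize(doc))
    return any(t.lower() in tokens for t in terms)


def matches_and(doc, terms):
    """word1 AND word2 -> every term occurs somewhere in the document."""
    tokens = set(tokenize(doc))
    return all(t.lower() in tokens for t in terms)


def matches_phrase(doc, phrase):
    """"word1 word2" -> the words occur next to each other, in this order."""
    tokens = tokenize(doc)
    words = tokenize(phrase)
    n = len(words)
    return n > 0 and any(
        tokens[i:i + n] == words for i in range(len(tokens) - n + 1)
    )


# Hypothetical sample documents, for illustration only.
DOCS = {
    "a.txt": "desktop search is handy",
    "b.txt": "search the whole desktop",
    "c.txt": "only the desktop",
}


def hits(predicate):
    """Return the sorted names of all sample documents the predicate accepts."""
    return sorted(name for name, text in DOCS.items() if predicate(text))


if __name__ == "__main__":
    print("phrase:", hits(lambda d: matches_phrase(d, "desktop search")))
    print("AND:   ", hits(lambda d: matches_and(d, ["desktop", "search"])))
    print("OR:    ", hits(lambda d: matches_or(d, ["desktop", "search"])))
```

In this toy model the phrase form is the strictest, AND is looser, and the plain form is the loosest, which is why the quoted and unquoted searches can return different numbers of documents.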

As for the problems with the hierarchical file system, have you guys considered "albums" and similar features which are provided by decent picture managers and media players these days? This is basically a way to put files into multiple categories. I never used that sort of thing, though, because of the potential risk of vendor lock-in (meaning that all that categorization data is lost when I move to another program).

9
I agree with you when e-mails are involved, but their respective attachments are a completely different thing; in this case you do need DtS software.
Now I understand :)

Creating and maintaining hierarchies takes time, that's a fact, and it's probably the most important reason for using DtS. Because I don't want to spend time on that, the rest of my answers follow:

a. Images - can be found in many places, therefore I use DtS to find them all and then view them as thumbnails... and thus the decision is easiest
b. Music - here you can generally search by filename... or metadata/tags. Or there can be leftovers (.ac3 files) from video conversions (DVD->.avi) that, in time, can stack up heavily...
c. Videos - when you use several sources for getting them on your computer they can also get lost in various places, especially when you have more than 1 HDD. For now I have more than 60 movies on my computer...mpeg/avi/iso/vob, you name it.

I also do not think that the typical savvy DtS user is searching mainly for the above but rather for documents with a certain content, at least this is my case. My search ratio is 95%/5% for content/a,v,p.
You seem to be the kind of user who doesn't clean up his folders very often and who then uses full hard drive desktop search to keep that mess under control. Don't get me wrong though, I don't think there's anything wrong with doing it this way. (I'm the kind of user whose folders are highly organized and who only uses desktop search to access stuff where the hierarchical system doesn't help much, i.e. documents.)

I gave it a try for a folder with less than 500 indexable documents and I got 2 messages:
Needed 19 bytes to create the next chunk header, but only found 4 bytes, ignoring rest of data
### Skipped: Not enough memory left in the Java Virtual Machine.
The Java Virtual Machine in which DocFetcher is running has a memory cap, and your file was too big for that. The manual explains how to raise that cap. However, I admit that this error message should've been more helpful.

Also I didn't get what I was expecting from a Boolean search:

search:"word1 word2"
returned a different set (number) of documents compared to
search:word1 word2
but in the preview I saw both search terms highlighted in both cases (???).
The first case is AND, the second one is OR. Well, the preview highlighting wasn't fully implemented... :-[ Thanks for pointing this out.

So, for now I wish you all the best but I stick to Autofocus

To each his own.  :Thmbsup:

10
Considering that you can assign tags (apart from other info) to several media formats and save them within the file, I don't think it does really make more sense. What's more, several apps used to view or manipulate media can parse that data and save it in a local database (only accessible by that app, though). Whether you bother to use those methods is another story.

Both methods are not mutually exclusive, and I use them without problems. Depending on the moment, it makes more sense to use one or another, but I don't think there's an optimal solution.
Okay, good point. Thanks for your answer.

11
@ qforce:
Nice piece of software. Looks like it is based on the Eclipse IDE (which is something I appreciate).
Yes, you are right. DocFetcher and Eclipse are based on the same GUI toolkit (named SWT), and DocFetcher is being developed in the Eclipse IDE.

When you make such software, how long will it then take for the big software players to gobble up your talent? So let them buy you out after some time, and enjoy life from the interest those millions generate. By now you are in the ideal position to not care about who prefers whatever.

Your mind is too many steps ahead at this moment  ;)
I never wanted to work in a big software company. I'm basically a (would-be) scientist who just wrote a utility to better manage his science-related resources and who then decided to share his work with others. And I do care about the needs of my users, because for me, Open Source is basically some sort of "charity".

12
qforce,

What difference does it make to you what others prefer to use? Who made you the arbiter of what is right and what is not WRT search engines? Sounds like a personal problem...

I do not intend to be an "arbiter" or anything, I was just giving my opinion and wanted to see the opinion of others, in the hope that it will give me some important clues about the direction in which I should move with the future releases of my program. No personal issue here...

All these various search facilities in photo apps, music apps, etc. utilize different search methods - some use Regexp, some do not, some search filenames only, some search text within documents. I find it much easier to use a desktop search engine and become very familiar with its search features. For many users, trying to become adept at so many different search methods is a bother that they do not wish to do.

Most users here are a little more savvy than what you seem to think.

Jim
I still don't get it. Let me explain it with this example: Say, I get tired of my current wallpaper and I want to replace it with another, which had this cool sports car on it. So what do I do? Fire up my all-powerful desktop search app and type the name of that file? Well, no. I open my picture browser and click my way down the folder hierarchy to a folder named "Wallpapers", then I browse all the pictures in it until I find the image with the sports car. Why didn't I use a desktop search program? Because I didn't know the filename ("ColinMcRAE_xxx.jpg" or something), and when I saved the file, I didn't bother adding meta data to it (e.g. keywords like "sports car").

It would be very cool if the computer was able to run an image analysis on files like that in order to automatically extract keywords, e.g. "car", "sports car", "mud", "street", "race", etc. If that were possible, I could've typed "sports car" into a desktop search program.

So my point is this: I think (and this is really just an opinion), in the case of images and other media a hierarchical management system makes more sense.

13
Have you tried the built-in search feature of Outlook?  :o  It's completely unusable for anyone who has accumulated more than ... oh ... 100 messages.
Why oh why do people continue to buy and use crap like that even when they know it's complete rubbish. Jesus... Really makes me angry.

For corporations it's even more significant (hello, open-source world: corporations really do exist). Obviously you want to be able to manage your own internal email. The bandwidth cost could be enormous, and more importantly, many organizations must guarantee privacy (HIPAA, Sarbanes-Oxley). And if they've got to have a localized client, then they can't rely on GMail's search.
Well, as an unpaid freelance programmer I do not really care about the enormous needs of big corporations (why would I). However, I do care about this: Do "normal" people really need super-powerful search programs?

Why force people to learn multiple apps? It might be fine for me; I'm well-practiced at such learning, and might benefit from targeted optimizations. But what about for my mom? I think it's fairly typical for people to think that anything they got off the web or via email is all "from the Internet"; how do you explain to such a person when they need to use which tool? (I remember trying to explain to my grandfather, as he was scanning genealogical material, when to save as JPG vs. when to save as PNG. What's an instinctive selection to us is befuddling and nonsensical to "civilians")

More importantly, it's impossible to compartmentalize mail vs documents vs media, etc. A huge portion of my email contains attached documents. And a non-trivial portion of my docs contain embedded images and audio. So if one is to effectively find all email that contain a document that has an embedded image, one needs to be able to handle the whole chain, all the way down.
Yes, but... images, music and video do not contain text (except for the filename and meta data), so the way I see it, it doesn't make much sense to use a desktop search program instead of a picture manager, a media player, etc., to retrieve these files. It would make sense if computers had reliable image recognition capabilities, if they were able to "understand" music, etc., but that's not the case.

14
Hi,

I'm the project admin of DocFetcher, an Open Source desktop search app. I've noticed in several posts in this thread that there seems to be a real need for e-mail indexing and the likes, which puzzles me a bit. More precisely: Why do you guys need an additional program to search your local e-mails when you could use the search feature of your respective e-mail client instead? Moreover, why do you use e-mail clients at all? I, for one, use Google Mail, and am perfectly happy with its search capabilities.

On a more general note, are there any people out there who definitely need a desktop search app to locate images, music, videos, etc.? If so, then why don't you use your picture managers, media players, etc. to do that? Wouldn't that be a much more efficient and appropriate way to organize images, music, etc.?

I'd be thankful for any enlightenment about this issue.

Btw, DocFetcher 1.0 is (probably) about to be released this month and adds support for MS Office 2007 and WordPerfect.

15
I assumed that DocFetcher would need to be placed in Start in order to keep the index updated.
Was I assuming wrongly? If I was, there is of course no need for it to start minimized. But if I was right, and it needs to start with Windows, then it really should start minimized, because I would never use it until later when I have finished reading my mail, etcetera.
Sure, a checkbox will be all it would take.
Thanks for asking! :-)
Nope, DocFetcher doesn't automatically update its indexes when it starts. However, there are plans to implement a non-Java daemon that will take care of the index updates, and that won't display any annoying startup windows (I promise!  :Thmbsup:)

16
I am not going to test DocFetcher any further - unless someone tells me how to make it start minimized - it is too annoying to see it fill up the screen at each startup! The shortcut is of course marked Start Minimized - I also tried if it would accept -tray for an argument - but to no avail.
@Curt: Some questions I'd like to ask you:
1) Why do you want to start DocFetcher in minimized mode?
2) Would it be enough to have a checkbox "start minimized" on the preferences panel?
Thanks in advance!

17
Looks interesting, qforce. Do you support, or intend to support, docx and the other formats from MS Office 2007?
DocFetcher will support MS Office 2007 as soon as the guys from Apache POI are done with implementing support for these formats, which I expect to happen soon. (Apache POI is the library DocFetcher uses to extract text from MS Office files.)

18
Btw are you planning any Thunderbird and contacts support at all?
Actually no. With Gmail and all that web stuff I can easily search in my e-mails and contacts from any computer with internet connection, not just from the computer where my e-mail client is installed. That's why I abandoned Thunderbird a long time ago, and why I never really felt the need to go beyond document indexing.
Furthermore, I think it would somewhat bloat up the user interface and the search results, making the program less usable for what it was originally written, that is, document retrieval. For a more detailed explanation, you can read the "Comparison To Other Desktop Search Applications" section on the DocFetcher website.

You can, of course, try to convince me otherwise :D

Btw, the other reason why I left Thunderbird behind is that I've lost hundreds of e-mails because I forgot to include them in the backup before formatting the disk. FUCK... Lesson learned: Don't use programs that store important data in obscure "profile" folders.

19
Thanks for the friendly welcome :)
I hope Curt won't take my comment as a personal offence or something; I just couldn't help but laugh :D

20
I am not going to test DocFetcher any further - unless someone tells me how to make it start minimized - it is too annoying to see it fill up the screen at each startup! The shortcut is of course marked Start Minimized - I also tried if it would accept -tray for an argument - but to no avail.

Haha, this one gave me a good laugh. Because, you know, I'm the guy who wrote dog... errh, DocFetcher  8), and the reason why I'm laughing is that it feels so unreal to see people making such complaints instead of submitting a feature request.
This is what I like so much about Open Source: Instead of wasting your time being angry, you can either ask the developers to fix it, or fix it yourself.
