Author Topic: What Indexed PDF searcher (Desktop Searcher) has all of the following?  (Read 4710 times)

Yatom

  • Participant
  • Joined in 2020
  • *
  • default avatar
  • Posts: 22
    • View Profile
    • Donate to Member
Please advise, as I'm going dizzy searching, reading, downloading, and testing various Desktop searcher programs, such as Qiqqa, Mendeley, dtSearch, Archivarius, Lookeen, etc.  What program for Windows 7 (my OS) can do all of the following things:

#1) Index and search multiple PDF's quickly (the first and most basic requirement)

#2) Use Boolean search operators, and preferably including a "WITHIN" (proximity) function that can be used in conjunction with AND / OR / NOT functions.  Here's an example search I'd like to do:

(director OR manager) WITHIN 10 Words (promoted OR fired)

#3) Has a real PDF viewer to view search hits, versus just displaying search results in plain text, as Archivarius seems to do.  Qiqqa and Mendeley have some sort of a PDF viewer which highlight the results within the PDF itself--again, not just plain text.  But Archivarius can't do this, can it?

#4) Has a Ctrl+F "Find on Page" locator to further "drill down" and locate information following the initial search results.  (Qiqqa doesn't seem to have this, which surprises me.)

#5) Can search not only English, but also Hebrew and Greek in Unicode (Archivarius does this nicely).

Please advise which software can do *all* or at least most of the above.  So far, Archivarius seems to do #1, #2 (but can it do proximity in conjunction with Boolean??), #4, and #5 (but it has no PDF viewer, does it?)
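
For readers unsure what a combined Boolean and proximity query involves, here is a minimal Python sketch of the example above. It is not how Archivarius, dtSearch, or any other product mentioned here works internally; the function names `tokenize` and `within` are made up for illustration, and each term list is treated as an OR group.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def within(text, left_terms, right_terms, max_distance):
    """Return True if any left term lies within max_distance words of any right term.

    Each list acts as an OR group; the order of the two groups does not matter.
    """
    tokens = tokenize(text)
    left = {term.lower() for term in left_terms}
    right = {term.lower() for term in right_terms}
    left_positions = [i for i, tok in enumerate(tokens) if tok in left]
    right_positions = [i for i, tok in enumerate(tokens) if tok in right]
    return any(abs(i - j) <= max_distance
               for i in left_positions
               for j in right_positions)

sample = "After the merger, the regional manager was quietly promoted to vice president."

# (director OR manager) WITHIN 10 Words (promoted OR fired)
print(within(sample, ["director", "manager"], ["promoted", "fired"], 10))
```

Because the distance is just a parameter here, changing 10 to 25 or 50 is trivial; whether a given product exposes that setting is a separate question.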

Thank you so very much for your advice.
« Last Edit: February 24, 2020, 03:32 AM by Yatom »

Yatom

Regarding my post above, unless anyone points me elsewhere, I've narrowed the decision down to Archivarius 3000 and dtSearch for my PDF library needs, even though they don't have native PDF viewers built into them.  (That's okay, after all).  However, each has at least one problem that I'm trying to figure out.

Archivarius does a terrible job of displaying the PDF text.  I have several books scanned into pdf with opposing pages (left and right), and Archivarius actually merges the text together at random, showing that it can't discern that they are opposite pages in a book.  That's really terrible.  The version I'm using is 4.62.  Can anyone speak to any improvements on this issue?

Note: dtSearch isn't much better in its PDF display (speaking of plain text, just as in Archivarius), but it doesn't seem to have the problem of combining opposing pages of a book together. But now on to the problem with dtSearch.

So far, it doesn't handle my OCR'd Hebrew text.  It can't find words like המים which are certainly in the documents I've indexed (and which Archivarius easily finds).  Does anybody have experience or advice regarding Unicode languages like Hebrew?  A program this powerful (and expensive) should have no problem.  Again, in an apples-to-apples comparison with Archivarius, the latter handles my Hebrew perfectly.
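
One possible (unconfirmed) reason a Hebrew word can be "in the document" yet not match is that the indexed text and the query differ at the code-point level, for example when OCR output carries vowel points or presentation forms. The Python sketch below shows the general idea of comparing on a normalized key. It is an assumption about the kind of mismatch involved, not a description of what dtSearch does, and `search_key` is a made-up helper name.

```python
import unicodedata

def search_key(text):
    """Build a comparison key: decompose, drop combining marks, recompose."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed
                       if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)

# The word from the post, written with escapes so the code points are explicit.
plain = "\u05d4\u05de\u05d9\u05dd"                             # consonants only
pointed = "\u05d4\u05b7\u05de\u05b7\u05bc\u05d9\u05b4\u05dd"   # same word with vowel points

print(plain == pointed)                          # False: the raw strings differ
print(search_key(plain) == search_key(pointed))  # True: the keys agree
```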

Also, can the search results (from PDF's, since that's all I search) be customized to show more or less context around the hit?  Archivarius offers two settings ("Fragments Found" and "Full Text"), and it does this, but I can't find an option to do so in dtSearch.

Also, I see that by using [...] in Archivarius, I can search for terms that are within 10 words of each other.  But is there any option to adjust that distance, greater or less?  Can Archivarius only do 10 words apart, and not 25, 50, etc.?  And what about in dtSearch?  (I do a lot of proximity searches like this).

And as I mentioned in the first post up above, can either program accomplish something like this:

(manager OR director) WITHIN 25 WORDS (promoted OR fired)

Lastly for now, is there a way to search only comments or "sticky notes" that I've placed within my PDF's using either of these programs?  That would be a nice option.

Thank you for your advice on these questions,
Yatom
« Last Edit: February 25, 2020, 02:51 AM by Yatom »

cranioscopical

  • Friend of the Site
  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 4,757
    • View Profile
    • Donate to Member
I don't know of anything that completely meets your needs. I did, however, stumble across DocFetcher which might be worth your attention.

IainB

  • Supporting Member
  • Joined in 2008
  • **
  • Posts: 7,533
  • Slartibartfarst
    • View Profile
    • Read more about this member.
    • Donate to Member
@Yatom:
...What program for Windows 7 (my OS) can do all of the following things: ...
You will probably find all your requirements - and a lot of new requirements (once you discover what else is possible) - being met by the superb Qiqqa.
Qiqqa has gone open source after 10 years of steady and highly successful development and use in the field.
It had always had a $FREE version anyway, for most users.
My review of Qiqqa (dated 2013) is here: Qiqqa - Reference Management System - Mini-Review - there seemed to be nothing else that could quite match it in the marketplace, and I think that's probably even more so the case today, though Elsevier's Mendeley might be quite good, but that's a different breed of cat now that Elsevier own it and it would be subject to Elsevier's apparently notorious rapacious $charging regime.

Regarding the DocFetcher software that @cranioscopical referred to above:
@Contro: I took a look at the details on the DocFetcher website, and it seems to be purely a document Search/Index proggy - could be an alternative/replacement  to (say) Windows Search/Index. Thus apparently not the same thing as Qiqqa at all.
« Last Edit: March 01, 2020, 03:25 PM by IainB »

kfitting

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 593
    • View Profile
    • Donate to Member
Calibre? You've probably stumbled across this already though.

https://calibre-ebook.com

LM7

  • Supporting Member
  • Joined in 2006
  • **
  • default avatar
  • Posts: 21
    • View Profile
    • Donate to Member
Three other options that you might consider (best experimenting with them - I don't know, for example, whether they work with Win 7) -

1. UltraRecall. This enables you to create a repository of PDF files and search them in various ways (Boolean search operators etc.), and that includes searching Hebrew and internal display of PDF files. UR has additional advantages: (a) you can annotate relevant PDFs (not internal PDF annotations, but annotations in UR which are "attached" to the PDF files) and search these annotations along with the original PDFs; (b) you can also search within other types of files, including all Office file types (except for Access), RTF files, and HTML, and all such files can be internally displayed in UR. You can also use UR to search files which are linked to the program, and not just files stored internally in UR.

2. X1. This is a full-fledged desktop search program (not based on a user-generated repository of files), which enables you to search multiple file types, and provides internal display options, though not close to those provided by UR.

I use both UR and X1 extensively (obviously, for different kinds of projects), and I am extremely happy with both of them.

I also have a license for Archivarius, and while I prefer X1 over Archivarius, Archivarius has one potentially significant advantage over X1 (although I find it inferior in other ways, e.g. in terms of its interface): it lets you find words that carry a prefix in front of your search term, whereas X1 does not. Thus, for example, you can search Archivarius for [muse] and find "bemused" and "muses," whereas X1 will only locate the latter form (to find "bemused" you will have to do a double search = bemuse* or muse*).
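
The difference described here is essentially substring matching versus a trailing wildcard. A small Python sketch over a made-up word list illustrates it; this only shows the two matching styles and makes no claim about how either product implements them.

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary standing in for the words in an index.
words = ["muse", "muses", "bemused", "museum", "amuse"]

def substring_hits(term, vocabulary):
    """Words that contain the term anywhere, including after a prefix."""
    return [w for w in vocabulary if term in w]

def wildcard_hits(pattern, vocabulary):
    """Words that match a shell-style pattern such as 'muse*'."""
    return [w for w in vocabulary if fnmatchcase(w, pattern)]

print(substring_hits("muse", words))   # includes "bemused" and "amuse"
print(wildcard_hits("muse*", words))   # only words that start with "muse"
print(wildcard_hits("*muse*", words))  # a leading wildcard recovers the rest
```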

3. Copernic Desktop Search, paid version. (The free version does not let you search PDFs). I don't have too much experience with this one, and I am disinclined towards it, since it works on a subscription model rather than a one-time payment model, as with the other products. Still, you might want to try it out.

Other options, though I am not at all sure how helpful you would find them, are Zotero and the no longer developed, but still downloadable, Smereka Tree Projects (http://personaldatabase.org/).

If you are not satisfied with these options, another place you might want to check out is the excellent Outliner Software forum (outlinersoftware.com).

Good luck! (If you want to PM me about any of this I would be happy to try and help.)

superboyac

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 6,285
    • View Profile
    • Donate to Member
Regarding my post above, unless anyone points me elsewhere, I've narrowed the decision down to Archivarius 3000 and dtSearch for my PDF library needs
i've possibly spent even more time than you looking for the optimal software for this sort of thing over the years.  I also narrowed it down to archivarius and dtsearch.

archivarius' weak point is that it only shows plain text, as you mention.  otherwise it would be almost perfect.

dtsearch is the option that can actually show pdf.  you need to use "dtsearch web", not the plain dtsearch option.  And i believe there is a pdf plugin that you have to install separately.  for me, this is the best solution that i have seen.

x1 search is also a decent one, but i don't prefer it over dtsearch web.

superboyac

@Yatom:
...What program for Windows 7 (my OS) can do all of the following things: ...
You will probably find all your requirements - and a lot of new requirements (once you discover what else is possible) - being met by the superb Qiqqa.
Qiqqa has gone open source after 10 years of steady and highly successful development and use in the field.
It had always had a $FREE version anyway, for most users.
My review of Qiqqa (dated 2013) is here: Qiqqa - Reference Management System - Mini-Review - there seemed to be nothing else that could quite match it in the marketplace, and I think that's probably even more so the case today, though Elsevier's Mendeley might be quite good, but that's a different breed of cat now that Elsevier own it and it would be subject to Elsevier's apparently notorious rapacious $charging regime.

Regarding the DocFetcher software that @cranioscopical referred to above:
@Contro: I took a look at the details on the DocFetcher website, and it seems to be purely a document Search/Index proggy - could be an alternative/replacement  to (say) Windows Search/Index. Thus apparently not the same thing as Qiqqa at all.

hey nice find!  i have to try this it looks so good!

Yatom

Thank you all for your answers above, and I apologize for the delay in response.  (I didn't receive notification about this thread for some reason).

I feel blessed by all of your suggestions and advice.  In the meantime, just a day or so ago I wrote up my experience with Archivarius, dtSearch, and X1 (well, brief with X1).  You can see it in this thread:
https://www.donation....msg437231#msg437231

@SuperboyAC:
archivarius' weak point is that it only shows plain text, as you mention.

Interestingly, I'm actually completely over this issue, and I am pretty thrilled with the Archivarius viewer.  Yes, initially I was set on having a PDF viewer, but here's what I realized: opening all my search hits in a PDF viewer *automatically / initially* would probably take a ton of time.  (I have some big PDF's ~ 200MB - 1GB).

And yet, Archivarius has a simple button which automatically launches the default program for whatever file you are previewing, so getting to a real PDF viewer/editor only takes the click of one button.  The same goes if my search results happen to be in an MS Word doc or a .txt file.

Truth be told, now that I see that Archivarius can handle so many file types, my mind has expanded beyond just building a PDF-only library.  I now have an index with about 8,000 files--everything from .doc, .pdf, .jpg, .txt, .xml, .html.  And Archivarius previews these all quite well.  (Of course, formatting such as paragraphs and line breaks is all stripped out for the most part, but the text is easy enough to read or search in most cases).

My chief complaints with Archivarius, after *finally* figuring out that I can turn the blasted "morphology mode" off, and do a real "exact search," are as follows (note: see this thread regarding "morphology search": https://www.donation....msg437231#msg437231)

#1) Phrase search (e.g. "Run the race") doesn't allow wildcards in it.  It treats every word inside the quotation marks as exact, so (and I kid you not), if your document said "Run the racer", the example I just gave would not find it.  You can't search "Run the race*" with a wildcard.  That's hard for me to believe.

#2) Proximity/vicinity search is limited to only 10 words apart.  What if I want a range of 20, 50, or 100 words?

#3) Wildcards can't be used in a proximity/vicinity search.  (What!?)  So you have to know *exactly* how something is spelled, and if it has even an additional letter on the very end, you will not find it.  That's egregious.  Just let me stick a * or a ? in there!

#4) There is no case-sensitive searching.  The program claims to offer that in one of the search forms, but it doesn't work.

#5) There are a number of serious bugs (which I won't list right now), and the developer hasn't answered any email to multiple addresses for over a month.  Zero support whatsoever.  I've had to figure out the entire program for myself, pretty much.  But, I can help others now if they need it.
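
To make complaints #1 and #4 concrete, here is a minimal Python sketch of a phrase search that does accept wildcards and an optional case-sensitive mode. It only illustrates the requested behavior; `phrase_to_regex` is a made-up helper, and the wildcard meanings (`*` for any run of word characters, `?` for exactly one) are assumptions, not Archivarius syntax.

```python
import re

def phrase_to_regex(phrase, case_sensitive=False):
    """Compile a phrase in which * and ? act as wildcards inside words."""
    parts = []
    for word in phrase.split():
        escaped = re.escape(word)
        # Turn the escaped wildcard characters back into regex wildcards.
        escaped = escaped.replace(r"\*", r"\w*").replace(r"\?", r"\w")
        parts.append(escaped)
    # Words must be adjacent, separated by whitespace or punctuation.
    pattern = r"\b" + r"\W+".join(parts) + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(pattern, flags)

text = "He told them to Run the racer through the course."

print(bool(phrase_to_regex("Run the race").search(text)))   # exact phrase: no match
print(bool(phrase_to_regex("Run the race*").search(text)))  # wildcard: match
print(bool(phrase_to_regex("run the race*", case_sensitive=True).search(text)))  # case differs: no match
```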

Thanks again for all of your helpful replies and advice above.  I just took a quick peek at Ultra Recall's webpage.  Does it allow you to index your library (8,000 files of various types for me), and search them like Archivarius or dtSearch?


David.P

  • Supporting Member
  • Joined in 2008
  • **
  • Posts: 198
  • Ergonomics Junkie
    • View Profile
    • Donate to Member
Count me in for the search for the best file indexer/desktop search tool. (BTW, why has the original thread been locked?)

I'm running into limitations of Archivarius 3000 lately, particularly the lack of development and support by Likasoft, and some persistent bugs.

Docfetcher looked promising, but unfortunately, the find and preview features are somewhat limited and clunky (finds PDF comments but doesn't show them in the preview pane, doesn't find PDF search terms when they are hyphenated at the end of a line, all search terms are highlighted in the same color, slow preview of large PDF files etc.).

The highly praised Lookeen is unusable for me because there is not even a proper preview highlighting all occurrences in the found documents.

I'd love to find something sleek and fast like Archivarius 3000, which is however also actively developed and has a large and active user base.

I have half a million files, mostly *.msg, Office and PDF files, but also other file types like e-mail attachments, archives etc.

Yatom

Count me in for the search for the best file indexer/desktop search tool.

You may want to try "dtSearch."  Of all the programs I tried (about six or seven, I think), dtSearch came the closest in overall functionality to Archivarius 3000.  However, aside from costing about $200 (while Archivarius costs only $36), the following were the truly prohibitive aspects that kept me from continuing with it.

#1) Long time to open large files.  I have multiple PDF files that are 500MB or greater, and one in particular that I frequently use that is ~1GB.  Archivarius displays it (plain text) in the preview pane in about 3 seconds, while dtSearch takes at least 60 seconds to do the same.  Since I deal with this particular file (and others of large size), I cannot wait so long each and every time I run a search.

#2) The Find on Page utility freezes in dtSearch when searching in large files.  For me, Find on Page is absolutely indispensable.  I have to have it.  And the fact that Archivarius and dtSearch both have a "Highlight all" option, so that you can type a word and it instantly gets highlighted anywhere it appears on the page, is indispensable.  Both programs do this; however, dtSearch literally jams up, or freezes when the file is large.  Archivarius never freezes up.

#3) Hebrew text always gets completely distorted (flipped around) in dtSearch.  This is a huge "no-no," as I deal with Hebrew frequently. Archivarius isn't completely perfect with Hebrew text, but it definitely handles it far better than dtSearch.  I wrote to dtSearch's Support about this issue (and they even mention it on their website), and the Support didn't have any helpful solution for it.
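
"Flipped around" right-to-left text is often a sign that the characters were stored in visual (display) order rather than logical (reading) order, which some PDF producers and OCR tools are known to do. Whether that is what happens in this case is an assumption. The Python sketch below shows a crude way to check which order a document uses for a single unpointed word; `find_either_order` is a made-up helper, and real bidirectional text (mixed Hebrew and Latin, vowel points) needs far more care than a plain reversal.

```python
def find_either_order(haystack, word):
    """Report whether the word occurs in logical order, visual order, or not at all."""
    if word in haystack:
        return "logical"
    if word[::-1] in haystack:
        return "visual"
    return None

# The word from the earlier post, with explicit code points.
word = "\u05d4\u05de\u05d9\u05dd"

logical_text = "page 12: " + word          # stored in reading order
visual_text = "page 12: " + word[::-1]     # stored as it appears left to right

print(find_either_order(logical_text, word))
print(find_either_order(visual_text, word))
```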

One advantage of dtSearch over Archivarius, however, is that it *did* find sticky note comments in my PDF files, which for some reason, Archivarius doesn't.  I would really like it if someone here could correct me if I'm wrong, but can Archivarius not find notes in PDF files?

David.P

Thanks Yatom.

The issues #1 and #2 that you noted would mean that dtSearch would also not fit my needs, as compared to Archivarius 3000 which displays those things instantly (at least when the index file is on a SSD).

I can confirm that Archivarius, while finding text box and highlighter comments, doesn't find text in Sticky Note comments in PDF files.

Let's hope that Likasoft will be revived somehow... 

One could also think so, since they allegedly have many customers from large corporations such as Siemens or BP.

Have you guys tried Copernic Desktop Search recently?

I must say that I particularly like those "jump to search term occurrence" buttons and the multi-color highlighting:



It is a mystery to me why not every search tool has this very feature.
« Last Edit: June 01, 2020, 06:39 AM by David.P »

sphere

  • Participant
  • Joined in 2018
  • *
  • default avatar
  • Posts: 91
    • View Profile
    • Donate to Member
Does Copernic Desktop Search find sticky note text?

I believe Copernic Desktop Search went to subscription mode. I have considered trying to find an older version for purchase.

Yatom

I must say that I particularly like those "jump to search term occurrence" buttons and the multi-color highlighting:

Archivarius does both of these things too.  See "Screenshot #1" attached here.  You can even change the colors around.

[Attachments: Screenshot #1.jpg, Screenshot #2.jpg]

The only thing it can't do which would be really helpful for me is to allow the user to distinguish the regular search hit color and the Find on Page hit color.  In other words, if you choose pink for the results / hit color, then both your normal search (in the search bar) and anything you type on the Find on Page locator will be highlighted in pink.  You can't make different colors between the two.  That's a bummer.

But there is a workaround that gets as close to this as I could find--viz., you can designate the regular search hits to be underlined, while the Find on Page locator remains non-underlined.  Then, even though the hit color is the same, you can have some distinction between what your primary search result is, and what you are seeing in the Find on Page.  See "Screenshot #2" (attached) for demonstration.

I can confirm that Archivarius, while finding text box and highlighter comments, doesn't find text in Sticky Note comments in PDF files.

Actually, somewhat frustratingly, I just noticed that *some* sticky notes are beginning to show up in my searches.  How do I know?  I began putting a signature like (YTAM 06-02-20) at the end of every sticky note that I write in a PDF, thinking that someday a program will allow me to search the utterly unique "code" (YTAM), and suddenly all my sticky notes will be searchable.

That being said, since I started doing this, about half (very unpredictably) of my sticky notes *are* being found by Archivarius.  Obviously, this is no good, as it is totally unreliable.  But maybe(?) there is hope somehow.
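
If the note text can be extracted by some other tool, a marker such as (YTAM 06-02-20) is easy to pick out mechanically. The Python sketch below assumes the notes are already available as plain strings (how they get extracted is outside its scope), and `split_signature` is a made-up helper. The date is kept as a string because the post does not say which field is the month.

```python
import re

# Matches the marker described above when it closes a note: "(YTAM dd-dd-dd)".
SIGNATURE = re.compile(r"\(YTAM (\d{2}-\d{2}-\d{2})\)\s*$")

def split_signature(note):
    """Return (note_text, date_string) if the note ends with the marker, else None."""
    match = SIGNATURE.search(note)
    if match is None:
        return None
    return note[:match.start()].rstrip(), match.group(1)

# Hypothetical note texts, as if already pulled out of a PDF.
notes = [
    "Compare this claim with chapter 3. (YTAM 06-02-20)",
    "An untagged highlight comment",
]

for note in notes:
    print(split_signature(note))
```

Running every extracted note through a check like this would also give a list to compare against what the indexer actually finds, which is one way to measure how many sticky notes are being missed.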

What a bummer that the developer / support isn't around.  This is such a great program.  Just has some really nasty and annoying bugs.

David.P

Archivarius [has "jump to search term occurrence" buttons] too.
Yeah sure. That's why I can't switch to any program that doesn't do that, like X1 Search, Lookeen, DocFetcher or FileLocator Pro, none of which have that feature.

you can designate the regular search hits to be underlined, while the Find on Page locator remains non-underlined.
How did you do that? I can't find an option like this in Archivarius.

What a bummer that the developer / support isn't around.  This is such a great program.
So true.
« Last Edit: June 03, 2020, 10:10 AM by David.P »

Yatom

you can designate the regular search hits to be underlined, while the Find on Page locator remains non-underlined.
-Yatom (Today at 12:03 AM)
How did you do that? I can't find an option like this in Archivarius.

See attached screenshot.

David.P

Thanks Yatom, found it.

BTW, I also like the "Found Fragments" view setting very much, which I just discovered today.

sphere

I am becoming very curious about Archivarius... specifically how it compares to dtsearch and others.

I know some people who swear by Recoll...

I have thousands of PDFs, some of which have comments and sticky notes, and I have needed to find something that is able to search that layer.  I also have "highlighted" text that I would love to be able to identify.

I have appreciated all of the insight into Archivarius that these threads have given.

Yatom

I know some people who swear by Recoll...

Thanks, Sphere.  I've never used or heard of Recoll.  However, it says on their website that PDF, WORD, RTF, and other file types require some kind of special add-on (apparently a different one for each one of these).  Take a look at this screenshot.

Archivarius indexes all of these "natively"--actually about 200 file types (most I've never even heard of).  PDF's are indispensable to me.  They make up the majority of my primary library.  I could understand one or two special file types requiring an add-on of some kind, but such basic and ubiquitous files as PDF, WORD, and RTF should be indexable natively (for my needs).


My questions about Recoll would be the following:

1) Does it have a proximity / vicinity search, like: "President (WITHIN 10 Words) conference"?  If so, can the user customize the range (10 words, 25 words, 100 words)?  Archivarius is limited to 10 words apart, but I've jury-rigged a way to get up to 20 words apart.

2) Does it have a Find on Page locator?

3) How does it handle large files (500MB+)?

4) Does it handle Hebrew and other Unicode languages?

5) Does it find sticky notes in documents?

And there are other questions I'm not thinking of right now.  Certainly, it has Archivarius beat on Support, since Archivarius literally has zero.  Archivarius does all of these except for #5, although I said in a recent post that it is very oddly finding *some* of my sticky notes.

Feel free to ask me any questions about Archivarius.  That would make some of the knowledge I've built up by racking my brain useful.