  • Sunday April 5, 2020, 1:58 am
  • Proudly celebrating 15 years online.

Topic: What Indexed PDF searcher (Desktop Searcher) has all of the following? (Read 1486 times)

Yatom

  • Participant
  • Joined in 2020
  • Posts: 12
Please advise, as I'm going dizzy searching, reading, downloading, and testing various desktop search programs, such as Qiqqa, Mendeley, dtSearch, Archivarius, Lookeen, etc. What program for Windows 7 (my OS) can do all of the following things:

#1) Index and search multiple PDFs quickly (the first and most basic requirement).

#2) Use Boolean search operators, preferably including a "WITHIN" (proximity) function that can be combined with AND / OR / NOT. Here's an example search I'd like to do:

(director OR manager) WITHIN 10 Words (promoted OR fired)

#3) Have a real PDF viewer to view search hits, rather than just displaying search results in plain text, as Archivarius seems to do. Qiqqa and Mendeley have some sort of PDF viewer that highlights the results within the PDF itself, again, not just in plain text. But Archivarius can't do this, can it?

#4) Have a Ctrl+F "Find on Page" locator to further "drill down" and locate information after the initial search results. (Qiqqa doesn't seem to have this, which surprises me.)

#5) Search not only English, but also Hebrew and Greek in Unicode (Archivarius does this nicely).

Please advise which software can do *all*, or at least most, of the above. So far, Archivarius seems to do #1, #2 (but can it combine proximity with Boolean operators??), #4, and #5 (but it has no PDF viewer, does it?).

Thank you so very much for your advice.
« Last Edit: February 24, 2020, 03:32 AM by Yatom »

Yatom

  • Participant
  • Joined in 2020
  • Posts: 12
Regarding my post above, unless anyone points me elsewhere, I've narrowed the decision down to Archivarius 3000 and dtSearch for my PDF library needs, even though neither has a native PDF viewer built in. (That's okay, after all.) However, each has at least one problem that I'm trying to figure out.

Archivarius does a terrible job of displaying the PDF text. I have several books scanned into PDF with facing pages (left and right), and Archivarius actually merges the text of the two pages together at random, showing that it can't tell they are separate pages of a book. That's really terrible. The version I'm using is 4.62. Can anyone speak to any improvements on this issue?

Note: dtSearch isn't much better in its PDF display (again, plain text, just as in Archivarius), but it doesn't seem to have the problem of merging facing pages of a book. Now for the problem with dtSearch.

So far, it doesn't handle my OCR'd Hebrew text. It can't find words like המים that are certainly in the documents I've indexed (and that Archivarius finds easily). Does anybody have experience or advice regarding Unicode languages like Hebrew? A program this powerful (and expensive) should have no problem with them. Again, in an apples-to-apples comparison, Archivarius handles my Hebrew perfectly.
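We can't tell from the thread why dtSearch misses המים. One possible cause with OCR'd or typeset Hebrew is that a PDF's text layer may store vowel points (niqqud) or Unicode presentation forms, so the indexed characters differ from the plain letters you type. Under that assumption, here is a Python sketch of the kind of folding a search tool would need to apply before comparing; `fold_hebrew` and `hebrew_contains` are hypothetical names, not part of dtSearch or Archivarius.

```python
import unicodedata


def fold_hebrew(text: str) -> str:
    """Normalize text so pointed or presentation-form Hebrew compares equal to plain letters."""
    # NFKC maps compatibility characters, such as the Hebrew presentation forms
    # (U+FB1D-U+FB4F) that a PDF text layer may contain, to ordinary letters and marks.
    normalized = unicodedata.normalize("NFKC", text)
    # Drop nonspacing marks (category "Mn"): vowel points, dagesh, cantillation marks.
    return "".join(ch for ch in normalized if unicodedata.category(ch) != "Mn")


def hebrew_contains(document_text: str, query: str) -> bool:
    """True if the folded query occurs in the folded document text."""
    return fold_hebrew(query) in fold_hebrew(document_text)


# Usage: the pointed form of the word (written with escapes) still matches a plain query.
pointed = "\u05d4\u05b7\u05de\u05b7\u05bc\u05d9\u05b4\u05dd"
print(hebrew_contains("... " + pointed + " ...", "המים"))  # True
```

Folding on both sides (index and query) is the design point: a plain query then matches pointed text, and vice versa.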

Also, can the search results (from PDFs, since that's all I search) be customized to show more or less context around the hit? Archivarius offers this through two settings ("Fragments Found" and "Full Text"), but I can't find an equivalent option in dtSearch.
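To illustrate what "more or less context around the hit" means, here is a minimal keyword-in-context sketch in Python with an adjustable window. `keyword_in_context` and its `context_words` parameter are hypothetical, not an option in either program; the sketch splits on whitespace and assumes the PDF text has already been extracted.

```python
import re
from typing import List


def keyword_in_context(text: str, term: str, context_words: int = 5) -> List[str]:
    """Return one snippet per hit, showing `context_words` words on each side of it."""
    words = text.split()
    target = term.lower()
    snippets = []
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring punctuation attached to the word.
        if re.sub(r"\W", "", word.lower()) == target:
            start = max(0, i - context_words)
            end = min(len(words), i + context_words + 1)
            snippets.append(" ".join(words[start:end]))
    return snippets


text = "The new manager was promoted last year, just after the merger closed."
print(keyword_in_context(text, "promoted", context_words=2))
# ['manager was promoted last year,']
```

Raising `context_words` gives something closer to a "Full Text" view; lowering it gives short fragments.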

Also, I see that by using [...] in Archivarius, I can search for terms that are within 10 words of each other. But is there any option to make that distance greater or smaller? Can Archivarius only do 10 words apart, and not 25, 50, etc.? And what about dtSearch? (I do a lot of proximity searches like this.)

And as I mentioned in my first post above, can either program do something like this:

(manager OR director) WITHIN 25 WORDS (promoted OR fired)
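The query above combines two OR groups with a proximity test. Here is a minimal Python sketch of those semantics, assuming "within N words" means the two hits are at most N word positions apart (products differ on exactly how they count, and none of these function names come from Archivarius or dtSearch):

```python
import re
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    # Lowercased word tokens; in Python 3, \w also matches Hebrew and Greek letters.
    return re.findall(r"\w+", text.lower())


def hit_positions(tokens: List[str], alternatives: Iterable[str]) -> List[int]:
    """Word positions of tokens matching any of the OR'ed alternatives."""
    wanted = {a.lower() for a in alternatives}
    return [i for i, tok in enumerate(tokens) if tok in wanted]


def within(text: str, left: Iterable[str], right: Iterable[str], max_distance: int) -> bool:
    """Evaluate (l1 OR l2 ...) WITHIN max_distance WORDS (r1 OR r2 ...), in either order."""
    tokens = tokenize(text)
    left_pos = hit_positions(tokens, left)
    right_pos = hit_positions(tokens, right)
    # Distance is the difference in word positions, so adjacent words are 1 apart.
    return any(abs(a - b) <= max_distance for a in left_pos for b in right_pos)


doc = "The director of sales was promoted in March."
print(within(doc, ["manager", "director"], ["promoted", "fired"], 25))  # True
print(within(doc, ["manager", "director"], ["promoted", "fired"], 3))   # False
```

Because the distance is just a parameter, 10, 25, or 50 words is a different argument rather than a different feature. The pairwise comparison is quadratic in the number of hits, which is fine for a sketch; a real index would store word positions and merge sorted position lists instead.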

Lastly for now, is there a way, in either program, to search only the comments or "sticky notes" that I've placed within my PDFs? That would be a nice option.

Thank you for your advice on these questions,
Yatom
« Last Edit: February 25, 2020, 02:51 AM by Yatom »

cranioscopical

  • Friend of the Site
  • Supporting Member
  • Joined in 2006
  • Posts: 4,742
I don't know of anything that completely meets your needs. I did, however, stumble across DocFetcher which might be worth your attention.

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,527
  • Slartibartfarst
@Yatom:
...What program for Windows 7 (my OS) can do all of the following things: ...
You will probably find all your requirements - and a lot of new requirements (once you discover what else is possible) - being met by the superb Qiqqa.
Qiqqa has gone open source after 10 years of steady and highly successful development and use in the field.
It had always had a $FREE version anyway, for most users.
My review of Qiqqa (dated 2013) is here: Qiqqa - Reference Management System - Mini-Review. There seemed to be nothing else that could quite match it in the marketplace, and I think that's probably even more the case today. Elsevier's Mendeley might be quite good, but it's a different breed of cat now that Elsevier owns it, and it would be subject to Elsevier's apparently notorious, rapacious $charging regime.

Regarding the DocFetcher software that @cranioscopical referred to above:
@Contro: I took a look at the details on the DocFetcher website, and it seems to be purely a document search/index proggy; it could be an alternative to, or replacement for, (say) Windows Search/Index. So it's apparently not the same thing as Qiqqa at all.
« Last Edit: March 01, 2020, 03:25 PM by IainB »

kfitting

  • Charter Member
  • Joined in 2005
  • Posts: 593
Calibre? You've probably stumbled across this already though.

https://calibre-ebook.com

LM7

  • Supporting Member
  • Joined in 2006
  • Posts: 21
Three other options that you might consider (best to experiment with them; I don't know, for example, whether they work with Win 7):

1. UltraRecall. This lets you create a repository of PDF files and search them in various ways (Boolean search operators, etc.), including searching Hebrew, and it displays PDF files internally. UR has additional advantages: (a) you can annotate relevant PDFs (not internal PDF annotations, but annotations in UR that are "attached" to the PDF files) and search these annotations along with the original PDFs; (b) you can also search within other types of files, including all Office file types (except Access), RTF files, and HTML, and all such files can be displayed internally in UR. You can also use UR to search files that are linked to the program, not just files stored internally in UR.

2. X1. This is a full-fledged desktop search program (not based on a user-generated repository of files), which lets you search multiple file types and provides internal display options, though these are not close to those provided by UR.

I use both UR and X1 extensively (obviously, for different kinds of projects), and I am extremely happy with both of them.

I also have a license for Archivarius, and while I prefer X1 over Archivarius, Archivarius has one potentially significant advantage over X1 (although I find it inferior in other ways, e.g. in its interface): it also finds words in which your search term is preceded by a prefix, whereas X1 does not. Thus, for example, you can search Archivarius for [muse] and find both "bemused" and "muses," whereas X1 will only locate the latter form (to find "bemused" you have to do a combined search: bemuse* or muse*).

3. Copernic Desktop Search, paid version. (The free version does not let you search PDFs.) I don't have much experience with this one, and I am disinclined towards it, since it works on a subscription model rather than the one-time payment model of the other products. Still, you might want to try it out.
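The Archivarius/X1 difference described above looks like the difference between substring matching and wildcard matching anchored at the start of the word. Here is a small Python illustration, assuming that is how the two behave (we don't know either product's internals); the function names are hypothetical:

```python
import fnmatch
from typing import List


def substring_hits(words: List[str], term: str) -> List[str]:
    """Match the term anywhere inside a word, so 'muse' also finds 'bemused'."""
    t = term.lower()
    return [w for w in words if t in w.lower()]


def wildcard_hits(words: List[str], *patterns: str) -> List[str]:
    """Match words against shell-style patterns anchored at the word start, e.g. 'muse*'."""
    return [w for w in words
            if any(fnmatch.fnmatchcase(w.lower(), p.lower()) for p in patterns)]


vocab = ["bemused", "muses", "amusement", "museum"]
print(substring_hits(vocab, "muse"))             # ['bemused', 'muses', 'amusement', 'museum']
print(wildcard_hits(vocab, "muse*"))             # ['muses', 'museum']
print(wildcard_hits(vocab, "bemuse*", "muse*"))  # ['bemused', 'muses', 'museum']
```

Note the trade-off: substring matching also catches "amusement", which may or may not be wanted.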

Other options, though I'm not at all sure how helpful you would find them, are Zotero and the no-longer-developed but still downloadable Smereka Tree Projects (http://personaldatabase.org/).

If you are not satisfied with these options, another place you might want to check out is the excellent Outliner Software forum (outlinersoftware.com).

Good luck! (If you want to PM me about any of this I would be happy to try and help.)

superboyac

  • Charter Member
  • Joined in 2005
  • Posts: 6,201
Quote from Yatom:
Regarding my post above, unless anyone points me elsewhere, I've narrowed the decision down to Archivarius 3000 and dtSearch for my PDF library needs
I've possibly spent even more time than you over the years looking for the optimal software for this sort of thing. I also narrowed it down to Archivarius and dtSearch.

Archivarius' weak point is that it only shows plain text, as you mention. Otherwise it would be almost perfect.

dtSearch is the option that can actually show PDFs. You need to use "dtSearch Web", not the plain dtSearch option. And I believe there is a PDF plugin that you have to install separately. For me, this is the best solution I have seen.

X1 Search is also a decent one, but I don't prefer it over dtSearch Web.

superboyac

  • Charter Member
  • Joined in 2005
  • Posts: 6,201
Quote from IainB:
@Yatom:
...What program for Windows 7 (my OS) can do all of the following things: ...
You will probably find all your requirements - and a lot of new requirements (once you discover what else is possible) - being met by the superb Qiqqa.
Qiqqa has gone open source after 10 years of steady and highly successful development and use in the field.
It had always had a $FREE version anyway, for most users.
My review of Qiqqa (dated 2013) is here: Qiqqa - Reference Management System - Mini-Review - there seemed to be nothing else that could quite match it in the marketplace, and I think that's probably even more so the case today, though Elsevier's Mendeley might be quite good, but that's a different breed of cat now that Elsevier own it and it would be subject to Elsevier's apparently notorious rapacious $charging regime.

Regarding the DocFetcher software that @cranioscopical referred to above:
@Contro: I took a look at the details on the DocFetcher website, and it seems to be purely a document Search/Index proggy - could be an alternative/replacement  to (say) Windows Search/Index. Thus apparently not the same thing as Qiqqa at all.

Hey, nice find! I have to try this; it looks so good!

Yatom

  • Participant
  • Joined in 2020
  • Posts: 12
Thank you all for your answers above, and I apologize for the delay in responding. (I didn't receive notification about this thread for some reason.)

I feel blessed by all of your suggestions and advice. In the meantime, just a day or so ago I wrote up my experience with Archivarius, dtSearch, and X1 (only briefly with X1). You can see it in this thread:
http://www.donationc....msg437231#msg437231

@SuperboyAC:
archivarius' weak point is that only shows plain text, as you mention.

Interestingly, I'm now completely over this issue, and I'm pretty thrilled with the Archivarius viewer. Yes, initially I was set on having a PDF viewer, but here's what I realized: opening all my search hits in a PDF viewer *automatically / initially* would probably take a ton of time. (I have some big PDFs, roughly 200 MB to 1 GB.)

And yet, Archivarius has a simple button that launches the default program for whatever file you are previewing, so getting to a real PDF viewer/editor takes only one click. The same goes for search results in an MS Word document or a .txt file.

Truth be told, now that I see that Archivarius can handle so many file types, my mind has expanded beyond just building a PDF-only library. I now have an index of about 8,000 files, everything from .doc, .pdf, .jpg, .txt, .xml, and .html. And Archivarius previews these all quite well. (Of course, formatting such as paragraphs and line breaks is mostly stripped out, but the text is easy enough to read or search in most cases.)

My chief complaints with Archivarius, after *finally* figuring out that I can turn the blasted "morphology mode" off and do a real "exact search," are as follows (note: see this thread regarding "morphology search": http://www.donationc....msg437231#msg437231).

#1) Phrase search (e.g. "Run the race") doesn't allow wildcards. It treats every word inside the quotation marks as exact, so (and I kid you not) if your document said "Run the racer", the example I just gave would not find it. You can't search "Run the race*" with a wildcard. That's hard for me to believe.

#2) Proximity/vicinity search is limited to only 10 words apart.  What if I want a range of 20, 50, or 100 words?

#3) Wildcards can't be used in a proximity/vicinity search.  (What!?)  So you have to know *exactly* how something is spelled, and if it has even an additional letter on the very end, you will not find it.  That's egregious.  Just let me stick a * or a ? in there!

#4) There is no case-sensitive searching.  The program claims to offer that in one of the search forms, but it doesn't work.

#5) There are a number of serious bugs (which I won't list right now), and the developer hasn't answered any email to multiple addresses for over a month.  Zero support whatsoever.  I've had to figure out the entire program for myself, pretty much.  But, I can help others now if they need it.
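Complaints #1, #3, and #4 describe features that are simple to express in principle. As a hedged sketch (not how Archivarius works internally; `phrase_match` is a hypothetical name), here is a Python phrase matcher that accepts * and ? wildcards in any phrase word and has a case-sensitivity switch:

```python
import fnmatch
import re


def phrase_match(text: str, phrase: str, case_sensitive: bool = False) -> bool:
    """True if the words of `phrase` occur consecutively in `text`.

    Each phrase word may use shell-style wildcards: * (any run of characters)
    or ? (exactly one character).
    """
    words = re.findall(r"\w+", text)
    pattern = phrase.split()
    if not pattern:
        return False
    if not case_sensitive:
        words = [w.lower() for w in words]
        pattern = [p.lower() for p in pattern]
    n = len(pattern)
    # Slide a window of n words over the text and compare word by word.
    for start in range(len(words) - n + 1):
        if all(fnmatch.fnmatchcase(words[start + k], pattern[k]) for k in range(n)):
            return True
    return False


print(phrase_match("They run the racer's course", "Run the race*"))           # True
print(phrase_match("They run the racer's course", "Run the race"))            # False
print(phrase_match("They run the race", "Run the race", case_sensitive=True))  # False
```

The same `fnmatch` check could replace exact word comparison in a proximity search, which is essentially what complaint #3 asks for.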

Thanks again for all of your helpful replies and advice above. I just took a quick peek at Ultra Recall's webpage. Does it let you index your library (8,000 files of various types, in my case) and search it like Archivarius or dtSearch?