Author Topic: Desktop search; NTFS file numbers

peter.s

  • Participant
  • Joined in 2013
  • Posts: 116
Desktop search; NTFS file numbers
« on: January 11, 2015, 07:55 AM »
This is a spin-off of page 32 (!) of this thread, https://www.donation...x.php?topic=2434.775 ,

since I don't think real info should be buried on page 32 or 33 of a thread that may someday grow to many dozens of pages, of which readers will perhaps read page 1 and then only the very last page(s); on the other hand, even buried on some page 32, wrong and/or incomplete "info" should not be left unattended.
____________________

Re searching:

Read my posts in http://www.outliners...om/topics/viewt/5593

(re searching, and re tagging; the latter of course runs into the 260-character limit for path plus filename if you want to do it within the file name... another possibly good reason to "encode" tags in some short form such as .oac (Organisation(al things) - Assurances - Cars), instead of "writing them out")

Among other things, I say over there that you are probably well advised to use different tools for different search situations, according to the specific strengths of those tools; this is in accordance with what users say over here in the above DC thread.

Also, note that searching within subsets of your data is a very good idea not only for performance reasons (File Locator et al.), but also for getting (much) fewer irrelevant results. If you get 700 "hits", in many instances it's not really a good idea to try to narrow down by adding further "AND" search terms, since that would probably exclude quite a few relevant hits; narrowing down to specific directories is probably the far better ("search in search") strategy. Btw, this is another argument for tagging, especially for additional, specific tagging of everything that sits in the subfolder where it "naturally" belongs but which belongs in alternative contexts, too (ultimately, a better file system should do this trick for us).
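To make the "search in search" idea concrete, here is a minimal sketch (Python; the paths and the hit list are hypothetical, not taken from any of the tools discussed):

```python
# First, a broad single-term search returns too many hits (hypothetical list).
hits = [
    r"C:\data\projects\alpha\notes.txt",
    r"C:\data\archive\2009\old-notes.txt",
    r"C:\data\projects\beta\todo.txt",
]

# Instead of adding an AND term (which may drop relevant hits), narrow by
# location: keep only hits under the subfolder where the answer should live.
subset = r"C:\data\projects"
narrowed = [h for h in hits if h.lower().startswith(subset.lower() + "\\")]
print(narrowed)  # the archive hit drops out; no relevant search term was excluded
```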

(Citations from the above page 32:)

Armando: "That said, I always find it weird when Everything is listed side by side with other software like X1, DTSearch or Archivarius. It's not the same thing at all! Yes, most so called "Desktop search" software will be able to search file names (although not foldernames), but software like Everything won't be able to search file content."

Well said; I run into this irresponsible stew again and again. Let's say that with "Everything" (and with Listary, which just integrates ET for this functionality), the file NAME search problem has definitely been resolved, but that does not resolve our full-text search issues. Btw, I'm sure ET has been mentioned on pages 1 to 31 of that thread over and over again; it's in the nature of such overlong threads that they treat the same issues again and again, giving the same "answers" to the same problems again and again, but of course this will not stop posters who aim for the maximum number of posts instead of staying silent whenever they cannot add something new to the object of discussion.

(I have said this before: traditional forum sw is not the best solution for technical fora (or, for that matter, any forum); some tree-shaped sw (integrating a prominent "new things" subtree, and other "favorites" subtrees) would have been a thousand times better, and yes, such a system would obviously expose such overly redundant, just-stealing-your-time posts. At 40hz: note I never said 100 p.c. of your posts are crap, I just say 95 or more p.c. of them are... well, sometimes they are quite funny at least, e.g. when a bachelor tries to tell fathers of 3 or 4 how to raise children: it's just that some people know it all, but really everything, for every thing in this life and this world they are the ultimate expert; boys of 4 excel in this, too.)

Innuendo on Copernic: stupid bugs, leaves out hits that should be there. I can confirm both observations, so I discarded this crap years ago, and there is no sign things have evolved in the right direction over there in the meantime; quite the contrary (v3 > v4, OMG).

X1: See jity2's instructive link: http://forums.x1.com....php?f=68&t=9638 . My comment, though: X1's special option which finds any accented char (? did you try capitals, too, and "weird" non-German/French accented chars?) by just entering the respective base char is quite ingenious (and new info for me, thank you!), and I think it can be of tremendous help IF it works across all possible file formats (but I very much doubt this!) and without fault; just compare with File Locator's "handling" (i.e. in fact mis-treating) of accented chars even in simple .rtf files (explained in the outliner thread). Thus, if X1 found (sic, I don't dare say "finds") all these hits by simply entering "relevement" in order to find "relèvement" (which, please note, could have been wrongly written "rélèvement" in some third-party source text within your "database" / file-system-based data repository, a detail which would mean you would not find it by entering the correct wording), this would be a very strong argument for using X1, and you clearly should not undervalue this feature, especially since you're a Continental and will thus probably have stored an enormous amount of text containing accented chars, with the original texts rather often containing accent errors.
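For illustration, accent-folding of the kind described is easy to emulate; a minimal sketch (Python, my own construction, not X1's actual implementation):

```python
import unicodedata

def fold(s: str) -> str:
    """Strip accents by decomposing to NFD and dropping combining marks."""
    return "".join(
        c for c in unicodedata.normalize("NFD", s)
        if not unicodedata.combining(c)
    ).lower()

# "relevement" now matches the correct "relèvement" as well as the
# misspelt "rélèvement", capitals included.
for word in ["relèvement", "rélèvement", "RELÈVEMENT"]:
    print(word, fold(word) == fold("relevement"))  # True, True, True
```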

X1 again, a traditional problem of X1 not treated here: what about its handling of OL (Outlook) data? Not only did ancient X1 versions not treat such data well; far worse, X1 was deemed, by some commentators, to damage OL files, which of course would be perfectly unacceptable. What about this? I can't trial (nor buy, which I would otherwise have done) the current X1 version with my XP Win version, and it may be that this obvious X1-vs.-OL problem has been resolved in the meantime (but even then the question would remain which OL versions might still be affected: X1-current vs. OL-current possibly ok, but X1-current vs. ancient OL versions =?!). I understand that few people would be sufficiently motivated to trial this upon their real data, but then, better to trial it with, let's say, a replica of your current data on an alternative pc, instead of running the risk that even X1-current will damage the OL data on your running system, don't you think? (And then, thankfully, share your hopeful all-clear signal, or your warnings, as the case may be - which would of course be a step further, not necessarily included within your first step of verifying...)

Innuendo on X1 vs. the rest, and in particular dtSearch:

"X1 - Far from perfect, but the absolute best if you use the criteria above as a guideline. Sadly, it seems they are very aware of being the best and have priced their product accordingly. Very expensive...just expensive enough to put it over the line of insulting. If you want the best, you and your wallet will be oh so painfully aware that you are paying for the best."

"dtSearch - This is a solution geared towards corporations and the cold UI and barely there acceptable list of features make this an unappetizing choice for home users. I would wager they make their bones by providing lucrative support plans and willingness to accept company purchase orders. There are more capable, less expensive, more efficient options available."

This cannot stay uncommented, since from my own trialling of both, it's obviously wrong in some respects; of course, if X1 has got some advantages (beyond the GUI, which indeed is much better - but then, some macroing for dtSearch could probably prevent a premature decision like jity2's: "In fact after watching some videos about it, I won't try it because I don't use regex for searching keywords, and because the interface seems not very enough user friendly (I don't want to click many times just to do a keyword search !)."), please tell us!

First of all, I can confirm that both developers have (competent) staff (i.e. no comparison with the usual "either it's the developer himself, or some incompetent - since not trained, not informed, not even half-way correctly paid - 'Indian'") that is really and VERY helpful in giving information and in discussing features, or even the lack of features; both X1 and dtSearch people are professional and congenial, and if I say dtSearch staff is even "better" than X1 staff, this, while being true, is not meant to denigrate X1 staff: we're discussing different degrees of excellence here. (Now compare with Copernic.)

This being said, X1 seems to be visually brilliant sw for standard applications, whilst dtSearch FINDS IT ALL. In fact, when trialling, I did not encounter any exotic file format from which I wasn't able to get the relevant hits, whilst in X1, if a format was not in their (quite standard) file format list, it was not indexed and thus nothing was found: it's as simple as that. (Remember the forensic objectives of dtSearch; it's exactly this additional purpose that makes it capable of searching lots of even quite widespread file formats where most other (index-based) desktop search tools fail.)

Also, allow for a brief divagation into askSam country: the reason some people cling to it is the rarity of full-text "db's" able to find numerics. Okay, okay, any search tool can find "386", be it as part of a "string" or even as a "word" (i.e. as a number, or as part of a number), but what about "between 350 and 400"? Okay, okay, you can try (and even succeed, in part) with regex (= again, dtSearch instead of X1; see the sketch below). But askSam does this, and similar, with "pseudo-fields", and normally, for such tasks, you need "real" db's; and as we all know, for most text-heavy data people prefer text-based sw instead of putting it all into relational db's. As you also know, there are some SQLite/other-db-based 2-pane outliners / basic IMS' that have got additional "columns" to hold numeric data, but that's not the same (and even there, searching for numeric data RANGES is far from evident).
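By way of illustration, the "between 350 and 400" case can indeed be done with a regex, which hints at why a regex-capable tool matters here (a sketch of the general technique, not dtSearch's actual syntax):

```python
import re

# Whole numbers from 350 to 400: 3[5-9][0-9] covers 350-399, 400 is explicit.
range_350_400 = re.compile(r"\b(3[5-9][0-9]|400)\b")

text = "Invoices: 340, 386, 399, 400, 401."
print(range_350_400.findall(text))  # ['386', '399', '400']
```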

Now that's for numeric ranges in db's, and now look into dtSearch's possibilities of identifying numeric ranges in pseudo-fields in "full text", similar to askSam, and you will see the incredible (and obviously, again, regex-driven) power of dtSearch.

Thus, dear Innuendo, your calling X1 "the absolute best" is perfectly untenable; but I post this in order to inform you better, and not at all in order to insinuate you knew better whilst writing the above.

____________________

Re ntfs file numbers:

jity2 in the above DC thread: "With CDS V3.6 size of the index was 85 Go [GB] with about 2,000,000 files indexed (Note: In one hdd drive I even hit the NTFS limit : too much files to handle !) . It took about 15 days to complete 24/24 7/7." Note: the last info is good to know... ;-(

It's evident that 2 million (!) files cannot reach any "NTFS limit" unless you do lots of things completely wrong; even if three zeros had persistently been left out, i.e. 2 billion files were meant, the limits would be 8.6 billion (or, with the XP number, 4.3 billion), and 2.0 billion comes nowhere near either:

eVista on https://social.techn...forum=itprovistaapps :

"In short, the absolute limit on the number of files per NTFS volume seems to be 2 at the 32nd power minus 1*, but this would require 512 byte sectors and a maximum file size limit of one file per sector. Therefore, in practice, one has to calculate a realistic average file size and then apply these principles to that file size."

Note: That would be a little less than 4.3 billion files, i.e. 2^32 - 1 (for Continentals: 4.3 Milliarden/milliards/etc.), for XP, whilst it's 2^64 - 1 from Vista on, i.e. slightly less than 8.6 billion files.

EDIT: OF COURSE THAT IS NOT TRUE: the number you get everywhere is 2^32 = slightly less than 4.3 billion files, and I read that's for XP, whilst from Vista on it would be double that, which would indeed make a little less than 8.6 billion (I cannot confirm this, of course); but that would then be 2^33, not 2^64 (I obviously got led astray by Win32/64, which is probably behind that doubling claim though).
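For what it's worth, the arithmetic is quick to check (a two-liner of my own, not from any source):

```python
print(2**32 - 1)  # 4294967295 -> "a little less than 4.3 billion" (the XP figure)
print(2**33)      # 8589934592 -> "slightly less than 8.6 billion";
                  # doubling 2^32 gives 2^33, nowhere near 2^64
```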

No need to list all the google finds, just let me say that with "ntfs file number" you'll get the results you need, incl. wikipedia, MS...

But then, special mention to http://stackoverflow...iles-and-directories

with an absolutely brilliant "best answer", and then also lots of valuable details further down that page.

I think this last link will give you plenty of ideas on how to better organize your stuff; but anyway, no search tool whatsoever should choke on some "2,000,000 limit", NTFS or otherwise.
When the wise points to the moon, the moron just looks at his pointer. China.
« Last Edit: January 11, 2015, 11:49 AM by peter.s »

jity2

  • Charter Member
  • Joined in 2006
  • Posts: 126
Re: Desktop search; NTFS file numbers
« Reply #1 on: January 11, 2015, 08:15 AM »
Hi Peter,
Thank you for your comments and stackoverflow link. ;)

"It's evident that 2 million (!) files cannot reach any "NTFS limit""
I agree; I was obviously not clear enough!
In fact, here is what I did to reach some kind of NTFS limit on one hard drive when I unzipped a monthly archive a few months ago (the solution was to remove the unzipped versions of some previous months and move them to another HDD):
First, I keep two versions of my data:
- one is zipped (no problem here; total size is more than 2 TB);
- the other is the same data, unzipped.
The unzipped version has (from memory!) more than 10 million files in total, but I chose to index the content of only some of them (about 2 million): for instance I index pdf, doc, many small eml files, etc., but I do NOT index the filenames or content of images (.jpg, .gif, ...) or .js files.
Over the last 15 years I have saved a lot of html files (https://www.donation...opic=23782.msg215928).

Thanks again for your link, and especially for this comment, which I think is relevant to me! ;)
http://stackoverflow...tories/683390#683390

Re: X1 and Outlook:
Sorry, I can't help, as I don't use it (I use gmail).

See ya ;)
« Last Edit: January 11, 2015, 08:42 AM by jity2 »

peter.s

  • Participant
  • Joined in 2013
  • Posts: 116
Re: Desktop search; NTFS file numbers
« Reply #2 on: January 11, 2015, 12:42 PM »
Hi jity2,

When I spoke of sharing being a step beyond finding out, I definitely didn't have you in mind; I apologize for not having made my pov clear enough: I very much appreciate your sharing of findings, and in fact it was your details that motivated me to put some more details in. ;-)

Since you mention my mentioning path/file name lengths, let me cite from techrepublic:

http://www.techrepub...n-an-ntfs-directory/

"Something else to look out for
TonytheTiger 8 years ago
is the length limits. Filenames can be up to 255 characters, but path and file combined can only be 260 characters (you can get around the path part by using Subst or 'net use' and setting a drive letter farther down in the directory structure)."

Well, I would try file and folder naming hygiene first; in most cases, that should do it.
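As a small aid to that hygiene, here is a minimal sketch (Python; the root folder is a placeholder of my own) that flags paths creeping toward the 260-character limit before they become a problem:

```python
import os

MAX_PATH = 260     # the classic Windows path + filename limit quoted above
MARGIN = 20        # warn a bit before the hard limit
ROOT = r"C:\data"  # hypothetical root; point it at your own tree

for dirpath, dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        full = os.path.join(dirpath, name)
        if len(full) > MAX_PATH - MARGIN:
            print(f"{len(full):4d}  {full}")  # length, then the offending path
```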

And since you mention html and all those innumerable, worthless "service files", that reminds me of the importance of some "smarter" search tool hopefully being able to leave all these out of its indexing - NOT by suffix alone, but by a combination of suffix and vicinity within the file structure, and possibly, if needed, also content. In fact, you separate your files/folders into an application part and a contents part (i.e., most of us do so), but the html format and similar formats blur this distinction again, shuffling service code into the "contents" area of your data; so it's obvious that a smarter search tool should de-clutter this intrusion again.
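A minimal sketch of such a "suffix plus vicinity" filter (my own construction; a real tool would of course need many more rules):

```python
import os

SKIP_SUFFIXES = {".js", ".css", ".gif", ".jpg", ".png"}  # service-file suffixes

def should_index(path: str) -> bool:
    """Skip web service files, and anything inside the '<page>_files'
    folders that browsers create next to a saved '<page>.html'."""
    ext = os.path.splitext(path)[1].lower()
    if ext in SKIP_SUFFIXES:
        return False
    parent = os.path.dirname(path)
    if parent.endswith("_files") and os.path.exists(parent[:-len("_files")] + ".html"):
        return False  # vicinity rule: service folder accompanying a saved page
    return True

print(should_index(r"C:\notes\report.pdf"))           # True
print(should_index(r"C:\notes\page_files\logo.gif"))  # False (suffix rule)
```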

Also, above, I forgot to mention Armando's

"Why do I mix Archivarius and DtSearch ? Simply because their algorithms for dealing with space and dashes are different and lead to different results. But if I had to choose one (but I woudn't...), I'd probably go with DtSearch : indexing is fairly quick and there are more search options to get what you want. Archivarius is fast too, but its search syntax isn't as sophisticated. Both could have better interface.

I use everything for filename/foldername search as it's so quick and its search syntax is very flexible and powerful (e.g. Regex can be used). (...) [Edit: about X1 : used to be my favorite, many years ago, but had to drop it because of performance reason and inaccuracy : it wouldn't index bigger documents well enough. See my comments earlier in the thread. To me, accuracy and precision are of absolute importance. If I'm looking for something and can't get to it when I know it's there... and then I'm forced to search "by hand"... There's a BIG problem.]"

The second passage bolded by me (Armando's point that X1 "wouldn't index bigger documents well enough") is both very important, in case, and subject to questioning, since they clearly did some work on their tool since then; the question is how deep that work went. In short: problem resolved, or not?

The first bolded passage - the different algorithms for dealing with spaces and dashes - is of the very highest interest and should certainly not stay buried on some page 32 of somewhere, but should be investigated further; all the more so since my observation re French accents applies here analogously: many parallel wordings exist for the same phrase, with or without hyphens (let alone dashes), or "written together", i.e. in one word (or even as abbrevs), and it gets further complicated when the phrase contains more than just two elements: a space between the first two elements but a hyphen between the second and third one, or the other way round... (see the sketch below).
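To illustrate how quickly those parallel wordings multiply, here is a small sketch (my own construction) that expands a phrase into all space/hyphen/run-together variants, which a search tool could OR together:

```python
from itertools import product

def phrase_variants(words):
    """Join the words with every combination of space, hyphen, or nothing."""
    out = set()
    for seps in product([" ", "-", ""], repeat=len(words) - 1):
        s = words[0]
        for sep, w in zip(seps, words[1:]):
            s += sep + w
        out.add(s)
    return out

print(sorted(phrase_variants(["full", "text", "search"])))
# Already 9 variants for a 3-word phrase: 'full text search', 'full-text search',
# 'fulltext-search', 'fulltextsearch', ...
```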

All of which makes me wonder which of these tools might be able to correctly treat British and American English as equal, but without doing so by "fuzzy searching", which would bring masses of unwanted pseudo-hits...

(History's irony: askSam, by its overwhelming success in those ancient times, "killed" another, similar "full text db" program which HAD semantic search, whilst AS, even 30 years later, never got to that (and has now been moribund for some 5, 6 or 8 years)... and semantic search still cannot be found in any of those 2- and low-3-figure desktop search tools (though it seems to exist in 4- and 5-figure corporate tools)... and all this is about market considerations, not about technology: technology-wise, not speaking of (possible) AI, it would all amount to some additional concordance tables, applied especially at indexing time, and less so at search time.)
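Such a concordance table is indeed technically trivial; a toy sketch (the entries are hypothetical) of normalizing spelling variants to one canonical term at indexing time:

```python
CONCORDANCE = {        # hypothetical excerpt of a British -> American table
    "colour": "color",
    "organise": "organize",
    "analyse": "analyze",
}

def index_term(token: str) -> str:
    """Map a token to its canonical form before it enters the index."""
    t = token.lower()
    return CONCORDANCE.get(t, t)

print(index_term("Colour"), index_term("color"))  # both map to 'color'
```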

And no, I'm not trying to talk you into running dtSearch indexing for days: it would just put unnecessary strain on your hardware, and from your findings and what we think we know, we can rather safely assume it would take somewhere between 6 and 8 full days of indexing where X1 needs 10 and Copernic needs 15. (Even though I'm musing about possible surprises; and then, you have run your stuff for 25 consecutive days now, so some 5 days more, percentage-wise... ;-) ) Let's just say, that would have been utterly instructive. ;-)


EDIT:

The 8.3 problem/solution is often mentioned; in

http://stackoverflow...-lots-of-small-files it is explained best:

"NTFS actually will perform fine with many more than 10,000 files in a directory as long as you tell it to stop creating alternative file names compatible with 16 bit Windows platforms. By default NTFS automatically creates an '8 dot 3' file name for every file that is created. This becomes a problem when there are many files in a directory because Windows looks at the files in the directory to make sure the name they are creating isn't already in use. You can disable '8 dot 3' naming by setting the NtfsDisable8dot3NameCreation registry value to 1. The value is found in the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FileSystem registry path. It is safe to make this change as '8 dot 3' name files are only required by programs written for very old versions of Windows.

A reboot is required before this setting will take effect." (Dan Finucane)
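For completeness, the registry tweak from the quoted answer could be scripted like this (a hedged sketch using Python's standard winreg module; run as administrator, Windows only, and reboot afterwards):

```python
import winreg

# Set NtfsDisable8dot3NameCreation = 1, as described in the quoted answer.
key = winreg.OpenKey(
    winreg.HKEY_LOCAL_MACHINE,
    r"SYSTEM\CurrentControlSet\Control\FileSystem",
    0,
    winreg.KEY_SET_VALUE,
)
winreg.SetValueEx(key, "NtfsDisable8dot3NameCreation", 0, winreg.REG_DWORD, 1)
winreg.CloseKey(key)
# A reboot is required before the setting takes effect.
```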

ANOTHER EDIT:

Since Copernic is (again) on Bits (BitsDuJour) today, here's another element relevant to our subject (from over there):

"Jaap Aap Hi, is my understanding from the fine print correct that this includes the upgrade to v5? And will sorting by relevance be included at that point?"

I would have worded this "SOME sorting by relevance", since there are innumerable ways of implementing sorting by relevance in a search tool; but it's clear as day that this functionality, while being of the highest possible importance, has not been treated with due attention by the developers (of the "consumer products" discussed here, at least) up to now, if I'm not mistaken? (This being said, it's obvious that a badly implemented "display by relevance" would need to come with the option to disable it.)
When the wise points to the moon, the moron just looks at his pointer. China.
« Last Edit: January 12, 2015, 05:55 AM by peter.s »