Messages - peter.s

General Software Discussion / Re: Desktop search; NTFS file numbers
« on: January 11, 2015, 12:42 PM »
Hi jity2,

When I spoke of sharing being a step beyond finding out, I definitely didn't have you in mind, and I apologize for not having made my point of view clear enough: I very much appreciate your sharing your findings; in fact, it was your details that motivated me to put some more details in. ;-)

Since you mention my mentioning path/file name lengths, let me cite from techrepublic:

"Something else to look out for
TonytheTiger 8 years ago
is the length limits. Filenames can be up to 255 characters, but path and file combined can only be 260 characters (you can get around the path part by using Subst or 'net use' and setting a drive letter farther down in the directory structure)."

Well, I would try file and folder naming hygiene first, and in most cases, that should do it.
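
Before renaming anything, it may help to see which entries are actually near the limits. The following is only a minimal sketch: it takes the 255 and 260 figures from the quote above at face value, and the function name and the root folder you pass in are purely illustrative.

```python
import os

MAX_PATH = 260  # combined path + file name limit, as cited in the quote above
MAX_NAME = 255  # limit for a single file or folder name, as cited above


def find_long_paths(root, max_path=MAX_PATH, max_name=MAX_NAME):
    """Return (full_path, reason) pairs for entries at or over the limits."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(name) > max_name:
                offenders.append((full, "name"))
            elif len(full) >= max_path:
                offenders.append((full, "path"))
    return offenders
```

Calling `find_long_paths` on a data folder would list the candidates for shortening; passing a smaller `max_path` leaves some headroom for tags added to file names later.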

And since you mention html and all those innumerable, worthless "service files": that reminds me of the importance of some "smarter" search tool hopefully being able to leave all these out of its indexing, NOT by suffix only, but by a combination of suffix and vicinity within the file structure, and possibly, if needed, also content. In fact, you separate your files/folders into an application part and a contents part (i.e., most of us do so), but the html format and similar formats blur this distinction again, shuffling servicing code into the "contents" area of your data, so it's obvious a smarter search tool should de-clutter this intrusion again.
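
To make "suffix plus vicinity" a bit more concrete, here is a rough sketch of such a filter. It assumes that saved web pages keep their service files in a companion folder named like the page plus "_files" (a common browser convention, but an assumption here); the suffix list and all names are illustrative, not taken from any of the tools discussed.

```python
from pathlib import Path

# Assumed, illustrative lists; a real tool would make both configurable.
SERVICE_SUFFIXES = {".js", ".css", ".gif", ".png", ".ico"}
COMPANION_DIR_ENDINGS = ("_files",)


def is_service_file(path):
    """True only if the suffix AND the location both point to page 'service' data."""
    path = Path(path)
    if path.suffix.lower() not in SERVICE_SUFFIXES:
        return False
    folder = path.parent
    for ending in COMPANION_DIR_ENDINGS:
        if folder.name.endswith(ending):
            stem = folder.name[: -len(ending)]
            # Vicinity check: a saved page with the same stem sits next to the folder.
            if any((folder.parent / (stem + ext)).is_file() for ext in (".htm", ".html")):
                return True
    return False
```

With this rule, a .png in an ordinary photo folder stays indexed, while the same suffix inside a saved page's companion folder is skipped.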

Also, above, I forgot to mention Armando's

"Why do I mix Archivarius and DtSearch ? Simply because their algorithms for dealing with space and dashes are different and lead to different results. But if I had to choose one (but I woudn't...), I'd probably go with DtSearch : indexing is fairly quick and there are more search options to get what you want. Archivarius is fast too, but its search syntax isn't as sophisticated. Both could have better interface.

I use everything for filename/foldername search as it's so quick and its search syntax is very flexible and powerful (e.g. Regex can be used). (...) [Edit: about X1 : used to be my favorite, many years ago, but had to drop it because of performance reason and inaccuracy : it wouldn't index bigger documents well enough. See my comments earlier in the thread. To me, accuracy and precision are of absolute importance. If I'm looking for something and can't get to it when I know it's there... and then I'm forced to search "by hand"... There's a BIG problem.]"

The second passage bolded by me is both very important, if it still applies, and open to question, since they clearly did some work on their tool in the meantime - the question is how deep that work went. In short: problem resolved, or not?

The first bolded passage is of the very highest interest, and should certainly not be buried in some page 32 of somewhere and somewhat, but should be investigated further, all the more so since my observation re French accents applies here, analogously: Many parallel wordings for the same phrase, with or without hyphens (let alone dashes), or then, "written together", i.e. in one word (or even in abbrevs), and further complicated when the phrase contains more than just two elements: a space between the first two elements, but then a hyphen between the second and third one, or the other way round...
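
One way to catch such parallel wordings without resorting to fuzzy search is to allow an optional separator between the elements of a phrase. This is a minimal sketch of that idea only; it does not describe how dtSearch or Archivarius treat spaces and dashes, and the function name is made up for illustration.

```python
import re

# Spaces, hyphen-minus, and the Unicode hyphens/dashes U+2010..U+2015; may also be empty.
SEPARATOR = r"[\s\-\u2010-\u2015]*"


def phrase_pattern(*elements):
    """Build a regex matching the elements written apart, hyphenated, or together."""
    body = SEPARATOR.join(re.escape(element) for element in elements)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)
```

For example, `phrase_pattern("full", "text", "search")` matches "full text search", "full-text search" and "fulltext-search" alike, but not "full context search".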

Which makes me wonder which of these tools might be able to correctly treat as equal English and American English, but without doing so by "fuzzy searching" which would bring masses of unwanted pseudo-hits...

(History's irony: askSam, by its overwhelming success in those ancient times, "killed" another, similar "full text db" program which HAD semantic search, whilst AS, even 30 years later, never got to that (and has now been moribund for some 5, 6 or 8 years)... and semantic search cannot be found yet in any of those 2- and low-3-figure desktop search tools (but in 4- and 5-figure corporate tools, it seems)... and all this is about market considerations, not about technology: technology-wise, not speaking of (possible) AI, it would all be some additional concordance tables, needed especially at indexing time, and less so at search time.)

And no, I'm not trying to talk you into running dtSearch indexing for days: It would just put unnecessary strain on your hardware, and from your findings and what we think we know, we can rather safely assume it would be somewhere between 6 and 8 full days of indexing, when X1 needs 10, and Copernic needs 15. (Even though I'm musing about possible surprises, and then, you ran your stuff for 25 consecutive days now, so some 5 days more, percentage-wise... ;-) ) Let's just say, that would have been utterly instructive. ;-)


The 8.3 problem/solution is often mentioned; it is explained best here:

"NTFS actually will perform fine with many more than 10,000 files in a directory as long as you tell it to stop creating alternative file names compatible with 16 bit Windows platforms. By default NTFS automatically creates an '8 dot 3' file name for every file that is created. This becomes a problem when there are many files in a directory because Windows looks at the files in the directory to make sure the name they are creating isn't already in use. You can disable '8 dot 3' naming by setting the NtfsDisable8dot3NameCreation registry value to 1. The value is found in the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FileSystem registry path. It is safe to make this change as '8 dot 3' name files are only required by programs written for very old versions of Windows.

A reboot is required before this setting will take effect.
Dan Finucane (community wiki, edited Oct 24 '08 at 23:06)"


Since Copernic is (again) on bits today, here's another element relevant to our subject (from over there):

"Jaap Aap Hi, is my understanding from the fine print correct that this includes the upgrade to v5? And will sorting by relevance be included at that point?"

I would have worded this "SOME sorting by relevance", since there are innumerable ways of implementing sorting by relevance into a search tool, but it's clear as day this functionality, while being of the highest possible importance, has not been treated by developers (of the "consumer products" discussed here at least) with due attention, up to now, if I'm not mistaken? (This being said, it's obvious that a badly-implemented "display-by-relevance" would need to come with the option to disable it.)

General Software Discussion / Desktop search; NTFS file numbers
« on: January 11, 2015, 07:55 AM »
This is a spin-off of page 32 (!) of this thread,

since I don't think real info should be buried within page 32 or 33 of a someday gross-page-long thread of which readers will perhaps read page 1, and then the very last (pages) only; on the other hand, even buried on some page 32, wrong and/or incomplete "info" should not be left unattended.

Re searching:

Read my posts in

(re searching, and re tagging, the latter coming with the 260 chars for path plus filename limitations of course if you wanna do it within the file name... another possibly good reason to "encode" tags, in some form of .oac (Organisation(al things) - Assurances - Cars), instead of "writing them out")

Among other things, I say over there that you are probably well advised to use different tools for different search situations, according to the specific strengths of those tools; this is in accordance with what users say over here in the above DC thread.

Also, note that just searching within subsets of data is not only a very good idea for performance reasons (File Locator et al.), but also for getting (much) less irrelevant results: If you get 700 "hits", in many instances, it's not really a good idea to try to narrow down by adding further "AND" search terms, since that would probably exclude quite some relevant hits; narrowing down to specific directories would probably be the far better ("search in search") strategy; btw, another argument for tagging, especially for additional, specific tagging of everything that is in the subfolder into which it "naturally" belongs, but which belongs into alternative contexts, too (ultimately, a better file system should do this trick).

(Citations from the above page 32:)

Armando: "That said, I always find it weird when Everything is listed side by side with other software like X1, DTSearch or Archivarius. It's not the same thing at all! Yes, most so called "Desktop search" software will be able to search file names (although not foldernames), but software like Everything won't be able to search file content." - Well said, I run into this irresponsible stew again and again; let's say that with "Everything" (and with Listary, which just integrates ET for this functionality), the file NAME search problem has definitely been resolved, but that does not resolve our full text search issues. Btw, I'm sure ET has been mentioned on pages 1 to 31 of that thread over and over again, and it's in the nature of such overlong threads that they treat the same issues again and again, again and again giving the same "answers" to those identical problems, but of course, this will not stop posters who just try to maximize their post count, instead of trying to shut up whenever they cannot add something new to the object of discussion. (I have said this before: Traditional forum sw is not the best solution for technical fora (or then, any forum), some tree-shaped sw (integrating a prominent subtree "new things", and other "favorites" sub-trees) would have been a thousand times better, and yes, such a system would obviously expose such overly-redundant, just-stealing-your-time posts.) (At 40hz: Note I never said 100 p.c. of your posts are crap, I just say 95 or more p.c. of them are... well, sometimes they are quite funny at least, e.g. when a bachelor tries to tell fathers of 3 or 4 how to raise children: It's just that some people know it all, but really everything, for every thing in this life and this world, they are the ultimate expert - boys of 4 excel in this, too.)

Innuendo on Copernic: Stupid bugs, leaves out hits that should be there. I can confirm both observations, so I discarded this crap years ago, and there is no sign things have evolved in the right direction over there in the meantime, quite the contrary (v3>v4, OMG).

X1: See jity2's instructive link: . My comment, though: X1's special option which finds any accented char by just entering the respective base char (any? did you try capitals, too, and "weird" non-German/French accented chars?) is quite ingenious (and new info for me, thank you!), and I think it can be of tremendous help IF it works "over" all possible file formats (but I very much doubt this!) and without fault; just compare with File Locator's "handling" (i.e. in fact mistreating) of accented chars even in simple .rtf files (explained in the outliner thread). Thus, if X1 found (sic, I don't dare say "finds") all these hits by simply entering "relevement" for finding "relèvement" (which, please note, could have been wrongly written "rélèvement" in some third-party source text within your "database" / file-system-based data repository, a detail which would make you miss it when entering the correct wording), this would be a very strong argument for using X1. You clearly should not undervalue this feature, especially since you're a Continental and will thus probably have stored an enormous number of texts containing accented chars, quite a few of which will have accent errors in the original.
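
For readers who want to check how such a base-char search could behave on their own texts, here is a minimal sketch of accent folding with Python's standard `unicodedata` module. It is not a description of how X1 implements its option, and the function names are illustrative; letters without a decomposition (e.g. "ø") are left unchanged by this approach.

```python
import unicodedata


def fold_accents(text):
    """Decompose characters, drop combining marks, and ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_insensitive_find(needle, haystack):
    """True if needle occurs in haystack once accents and case are ignored."""
    return fold_accents(needle) in fold_accents(haystack)
```

With this folding, "relevement", "relèvement" and the misspelt "rélèvement" all reduce to the same string, so a search for any of them finds the others, capitals included.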

X1 again, a traditional problem of X1 not treated here: What about its handling of OL (Outlook) data? Not only did ancient X1 versions not treat such data well; far worse, X1 was deemed, by some commentators, to damage OL files, which of course would be perfectly unacceptable. What about this? I can't trial the current X1 version (nor buy it, which I would have done otherwise) with my XP Win version, and it might be that this X1-vs.-OL problem has been resolved in the meantime (but even then, the question would remain which OL versions might still be affected: X1-current vs. OL-current possibly ok, but X1-current vs. OL-ancient-versions =?!). I understand that few people would be sufficiently motivated to trial this upon their real data, but then, better trial this with, let's say, a replication of your current data put onto an alternative pc, instead of running the risk that even X1-current will damage OL data on your running system, don't you think so? (And then, please, share your hopeful all-clear signal, or your warnings, as the case may be - which would of course be a step further, not necessarily included within your first step of verifying...)

Innuendo on X1 vs. the rest, and in particular dtSearch:

"X1 - Far from perfect, but the absolute best if you use the criteria above as a guideline. Sadly, it seems they are very aware of being the best and have priced their product accordingly. Very expensive...just expensive enough to put it over the line of insulting. If you want the best, you and your wallet will be oh so painfully aware that you are paying for the best."

"dtSearch - This is a solution geared towards corporations and the cold UI and barely there acceptable list of features make this an unappetizing choice for home users. I would wager they make their bones by providing lucrative support plans and willingness to accept company purchase orders. There are more capable, less expensive, more efficient options available."

This cannot stay uncommented since it's obviously wrong in some respects, from my own trialling both; of course, if X1 has got some advantages (beyond the GUI, which indeed is much better, but then, some macroing for dtSearch could probably prevent some premature decision like jity2's one: "In fact after watching some videos about it, I won't try it because I don't use regex for searching keywords, and because the interface seems not very enough user friendly (I don't want to click many times just to do a keyword search !)."), please tell us!

First of all, I can confirm that both developers have (competent) staff that is really and VERY helpful in giving information and in discussing features, or even lack of features (i.e. no comparison with the usual "either it's the developer himself, or some incompetent support person", incompetent since not trained, not informed, and not even half-way correctly paid). Both X1 and dtSearch people are professional and congenial, and if I say dtSearch staff is even "better" than X1 staff, this, while being true, is not to denigrate X1 staff: we're just discussing different degrees of excellence here. (Now compare with Copernic.)

This being said, X1 seems to be visually-brilliant sw for standard applics, whilst dtSearch FINDS IT ALL. In fact, when trialling, I did not encounter any exotic file format from which I wasn't able to get the relevant hits, whilst in X1, if it was not in their (quite standard file format) list, it was not indexed, and thus was not found: It's as simple as that. (Remember the forensic objectives of dtSearch: it's exactly this additional purpose that makes it capable of searching lots of even quite widespread file formats where most other (index-based) desktop search tools fail.)

Also, allow for a brief digression into askSam country: The reason some people cling to it is the rarity of full-text "db's" able to find numerics. Okay, okay, any search tool can find "386", be it as part of a "string", or even as a "word" (i.e. as a number, or as part of a number), but what about "between 350 and 400"? Okay, okay, you can try (and even succeed, in part) with regex (= again, dtSearch instead of X1). But askSam does this, and similar, with "pseudo-fields", whereas normally, for such tasks, you need "real" db's, and as we all know, for most text-heavy data, people prefer text-based sw instead of putting it all into relational db's. As you also know, there are some SQLite/other-db-based 2-pane outliners / basic IMS' that have got additional "columns" in order to put numeric data into, but that's not the same (and even there, searching for numeric data RANGES is far from straightforward).

Now that's for numeric ranges in db's, and now look into dtSearch's possibilities of identifying numeric ranges in pseudo-fields in "full text", similar to askSam, and you will see the incredible (and obviously, again, regex-driven) power of dtSearch.
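
To illustrate what "between 350 and 400" in full text involves, here is a minimal sketch that combines a regex with a numeric comparison. It only handles standalone whole numbers (no decimals, no thousands separators), and it is not a description of dtSearch's or askSam's own range syntax; the pattern and function name are assumptions made for this example.

```python
import re

# A run of digits that is not glued to letters or digits, and is not part of
# a decimal or a thousands-separated number such as 3.86 or 1,386.
WHOLE_NUMBER = re.compile(r"(?<![\w.,])\d+(?![\w]|[.,]\d)")


def find_in_range(text, low, high):
    """Return (offset, value) pairs for whole numbers with low <= value <= high."""
    hits = []
    for match in WHOLE_NUMBER.finditer(text):
        value = int(match.group())
        if low <= value <= high:
            hits.append((match.start(), value))
    return hits
```

A call like `find_in_range(text, 350, 400)` then returns the positions of "386" and "400" as numbers, while skipping "i386" or "3.86"; restricting the search to a pseudo-field would mean applying it only to the text that follows the field label.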

Thus, dear Innuendo, your X1 being "the absolute best" is perfectly unsustainable, but it's in order to inform you better that I post this, and not at all in order to insinuate you had known better whilst writing the above.


Re ntfs file numbers:

jity2 in the above DC thread: "With CDS V3.6 size of the index was 85 Go with about 2,000,000 files indexed (Note: In one hdd drive I even hit the NTFS limit : too much files to handle !) . It took about 15 days to complete 24/24 7/7." Note: the last info is good to know... ;-(

It's evident that 2 million (!) files cannot reach any "NTFS limit" unless you do lots of things completely wrong; and even if you had persistently left out 3 zeros, the limit would have been 8.6 (or, with the XP number, 4.3), but nothing near 2.0:

eVista on :

"In short, the absolute limit on the number of files per NTFS volume seems to be 2 at the 32nd power minus 1*, but this would require 512 byte sectors and a maximum file size limit of one file per sector. Therefore, in practice, one has to calculate a realistic average file size and then apply these principles to that file size."

Note: That would be a little less than 4.3 billion files (i.e. 2^32-1; for Continentals: 4,3 Milliarden/milliards/etc.) for XP, whilst it's 2^64-1 for Vista on, i.e. slightly less than 8.6 billion files.

EDIT: OF COURSE THAT IS NOT TRUE: The number you get everywhere is 2^32 = slightly less than 4.3 billion files, and I read that's for XP, whilst from Vista on, it would be double that, which would indeed make it a little less than 8.6 billion (I cannot confirm this, of course), and that would then be 2^33, not 2^64 (I obviously got led astray by Win32/64, which probably is behind that doubling, though).
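
The powers of two involved are easy to check. This small sketch only does the arithmetic for the exponents mentioned above and compares them with jity2's 2 million files; it says nothing about which exponent actually applies to which Windows version.

```python
def max_count(bits):
    """Largest count representable with the given number of bits: 2^bits - 1."""
    return 2 ** bits - 1


reported_files = 2_000_000  # jity2's figure quoted above

for bits in (32, 33, 64):
    limit = max_count(bits)
    share = reported_files / limit
    print(f"2^{bits}-1 = {limit:,} files; 2 million files is {share:.6%} of that")
```

Even against the smallest of these figures, 2^32-1 = 4,294,967,295, two million files use well under a tenth of a percent, and 2^64-1 is in the order of 10^19, i.e. nowhere near 8.6 billion.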

No need to list all the google finds, just let me say that with "ntfs file number" you'll get the results you need, incl. wikipedia, MS...

But then, special mention to

with an absolutely brilliant "best answer", and then also lots of valuable details further down that page.

I think this last link will give you plenty of ideas how to better organize your stuff, but anyway, no search tool whatsoever should choke on some "2,000,000 limit", ntfs or otherwise.

Spin off "Desktop search; NTFS file numbers" here:

General Software Discussion / And IT Man of the Year 2014 Is...
« on: December 24, 2014, 08:05 AM »
Diego Garcia.

(who of course stands for a rare, high-brow collaborative programming effort). Here's why (in French, but a Century ago, that was the lingua franca for the educated people of this world anyway, so some google translation effort should not be out of your reach):

(this is "part 2", but which includes part 1 - as you all know, the French do have a reputation of being a little unorganized).

You will learn that Boeing have their own, official patent for their ways to remote control their own aircraft, which comes in handy e.g. whenever they decide, for whatever reason, that such an engine should be brought down immediately.

It's a secret for no one that ace technology can almost exclusively be found in weaponry, and for programming, that's similar - whilst e.g. if you want to hear the most elaborate lies there are, both government authorities and air carriers (and their paid or free yappies) are prime addressees.

Of course, they ain't bearable as long as you consider their output fun, especially since if you don't take it all on second degree, your intellectual prerequisite should be that you think logic is a new iApp (just one example: yes, in order to bring down an aircraft onto some military base, in order to destroy it, yes, first they will let you do that, instead of intercepting you, and second, it's a brilliant idea to drill traditional landing on their runway: but let's not forget most people will swallow anything that comes from their thinking delegates - if that reminds you of stories of spit in the North Korean camps).

And now for the reasons of all this, well, don't trust "science" and her lies either, just "trust" the bad connections within your own, poor brain (or do even not if you don't want to fall to self-deceit any other second):

Here's a wonderful specimen of why, for example, even corporations like MS ain't able to output decent software (just two lesser-known examples: yes, Word has got cross-links, but have a look at the way they are implemented; yes, there is Active Directory, but look at its incredibly bad permissions M), and it also explains why even very smart people's output, in most cases, is abysmal: the smarter you are, the sooner in your life you will have internalized that total self-censorship is in your primal interest: They call this "survival instinct" (Darwin's "best fit");  it's not but in very few industries where "anything goes" that ironically you are entitled to set your thinking free (and the more perverted and / or strange the better).

But back to the regular way of collaboration, which assures that nothing outstanding will be created, even if the combined I.Q. of 3 people amounts to 500, and how they "sell" you their propaganda (note the lovely pic, which I'd call the "shut up" pic - well, I'm just the messenger, and of course that pic reminded me of Uncle Tom, and of the mythological three apes; also note the perfect white-collar clothing of the man in it - so please identify with him if you're white, too: if even the bogeyman can be hand-tame, you can be a "good dog!", too!):

Well, if you wonder how a "normal person" can title

"How to Succeed at Work? Censor Yourself",

here's why,

"After a childhood of jumping from country to country, Nathan is used to feeling like a tourist everywhere he goes."

Yes, that's the fate of many a diplomat's child: Lifelong deracination to the point of believing in the salvationary nature of any Ebola saliva they feed you, instead of just gulping it in order to survive for some more days. (Of course, I don't even mention possible insurgency against your gaolers: That's as out of the question for the lifelong inmates of Western oligarchies as it is for Pyongyang's slaves.)

If up to now, you only felt that it was oh so queer that even very smart people a) "believe" and / or b) produce quite underwhelming output, even in big corporations where there's plenty of resources, well, face it:

The human brain's interconnections ain't done that brilliantly yet... which might end up quite soon in some new ai mainframes even queerer than man himself, and that could be another end (except for the mythical cockroaches, and then, in some more million years from now...).

In the meantime, "Merry Christmas!" and similar are sort of an obscenity, don't you think so?

The above "How to Succeed at Work? Censor Yourself" is one side of the coin, the findings of the Milgram experiment (1963) being the other side, the "coin" being Man's Perverted Nature.

Hence, no hope for any decent MS software ever! ;-) And sorry for possibly having spoiled your Christmas illusion, but at least smart people should revert to thinking mode now and then, at the very least, and perhaps Christmas' contemplative mood could lower your traditional, human resistance to home truths.

(Notes: a: Man's worth in this society being determined by his standing, it's consequential that the smartest coders go to MS et al., instead of doing their own thing, since most own things in coding don't generate high 6- to low 7-digit incomes p.a., and that's why even from "independent developers", you don't get real goodies in most occurrences either; b: Don't blame me for not having read the original Cornell Univ article from "Creativity from Constraint? How Political Correctness Influences Creativity in Mixed-Sex Work Groups", I reminded you of the reasons for this just days ago; don't blame me for not developing the bias between Nathan's article and the mixed-sex work group setting we're referring to - believe us: mixed-sex group thinking has very little to do with mental auto-crippling, fascism just being another word for group-dynamics, and vice-versa, and leafing thru any newspaper of your choice does show the effects of this very unhealthy miswiring of most of ours' brains, page after page.)

General Software Discussion / Re: Good news for any InfoSelect users
« on: December 21, 2014, 12:52 PM »

Your infatuation with IS is established, and I also am acquainted with the ways some people here kick up long-buried threads from the dust again, with titles having become hilarious in time, so I didn't really fall for the obvious trap "Good news", for the revival of a more-than-5-years-old thread. In other words, I have not been astonished at all by the absence of any good news. On the other hand, you never know, there might be a chance there is, so you have to look it up, and that way, they get you anyway - it's disrespect for readers NOT interested in lies but in news, but as long as foxes do protect the hens...


Thus, since I have been lured into this anyway, let me say that some, "But what IS has never addressed is the Cloud and all the features that, as in OneNote, allow one seamlessly to store and search things that aren't straight text, including audio." is of sociological interest indeed, but that in fact, illiterates discuss linguistics.

Any discussion about how to group and make available files and info therein in multiple, optimized ways, is a modern one, and an expedient one, cf. , and I do not deny that cloud access has become worthwhile to discuss, too; any whining about not all your stuff being replicated within some proprietary db repository is just completely unsexy, vieux jeu, blatant amateur blah blah (not to be confounded with possible enthusiasts' rectifiable beginners' mistakes).

You simply have to conceive that smart sw designers - Mr. Lewis here - might have realized that the "put it in the pim" concept has had its day, and that they act accordingly* (and oh, that, "The response to my second e-mail was no response at all. I don't understand why MicLog refuses even to confirm that they are (a) working on an upgrade; (b) not working on an upgrade. If it's the latter, then there's nothing to lose by saying so." - that was a good one! cf. the never-ending dying of askSam) - and get some other clothes, please, your old ones are unbearable to look at. (Oh, it isn't even you? You a replica? That fits then.)

* = Look at it from another angle: One smart developer at least doesn't degrade himself to taking any more money from idiot users willing to throw their money again and again into the wrong direction. It's just that you ain't accustomed to honest merchants anymore, Slatty.
