Recent Posts

6651
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 15, 2007, 03:20 PM »
We've strayed far from the theme of this thread, I think. Perhaps we should open a new "coders" corner?
-Crush
We should have a moderator move all the file search/indexing stuff to a different thread :)

I agree that people are normally interested in fuzzy rather than exact matches, but I still can't help loving the simplicity & speed of "compare two strings just by comparing two integers" :)

About open/close speed, I'd keep the relevant files open for the duration of the program, usually you're going to do multiple searches (at least I tend to) - and there's not much reason not to keep them open.

Case-insensitive Boyer-Moore is cute, and a decent approach when you're doing partial searches without wildcards - I would definitely do search-term analysis and choose the method depending on that.
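For illustration, here's roughly what I mean - a quick, untested sketch using the simpler Boyer-Moore-Horspool variant (skip table only), folding case with toupper(). Just the idea, not tuned code:

Code: C++
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive Boyer-Moore-Horspool substring search.
// Returns the index of the first match, or std::string::npos if not found.
std::size_t FindNoCase(const std::string& haystack, const std::string& needle)
{
    if (needle.empty())
        return 0;
    if (needle.size() > haystack.size())
        return std::string::npos;

    // Build the skip table from the uppercased needle (last char excluded).
    std::size_t skip[256];
    for (std::size_t i = 0; i < 256; ++i)
        skip[i] = needle.size();
    for (std::size_t i = 0; i + 1 < needle.size(); ++i)
        skip[std::toupper((unsigned char)needle[i])] = needle.size() - 1 - i;

    std::size_t pos = 0;
    while (pos + needle.size() <= haystack.size())
    {
        // Compare right-to-left, case-folded.
        std::size_t j = needle.size();
        while (j > 0 && std::toupper((unsigned char)haystack[pos + j - 1]) ==
                        std::toupper((unsigned char)needle[j - 1]))
            --j;
        if (j == 0)
            return pos;
        pos += skip[std::toupper((unsigned char)haystack[pos + needle.size() - 1])];
    }
    return std::string::npos;
}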

I still think the stringtable is a good idea, since it simplifies the main structures so much - and with caching, it should become very efficient as well.

Sorting would destroy some features, such as adding changes with time stamps. Accessing the entries in the right time order would require a total reorganisation of the sorting. That also made it an easier decision to do the sorting with extra tables.
-Crush
Well, the sorting was only for the stringtable really, and only to make inserts and lookups very fast (fast exact lookups are useful not just for exact matches, but also when adding new (possibly already existing) strings to the stringtable).

Sorting your filelists based on different criteria should indeed be done by creating an additional vector of pointers/indexes and sort those, leaving the original structures intact. Much faster than swapping around, too.
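Something like this, roughly (an untested sketch; it assumes the FileObject struct from my other post further down the page, and sorts by size just as an example):

Code: C++
#include <algorithm>
#include <cstdint>
#include <vector>

// Build a separate index vector and sort *that* by whatever criterion you
// want (file size here), leaving the FileObject array itself untouched.
std::vector<std::uint32_t> SortBySize(const std::vector<FileObject>& files)
{
    std::vector<std::uint32_t> order(files.size());
    for (std::uint32_t i = 0; i < order.size(); ++i)
        order[i] = i;

    std::sort(order.begin(), order.end(),
        [&](std::uint32_t a, std::uint32_t b) { return files[a].fileSize < files[b].fileSize; });

    return order;   // files[order[0]] is the smallest file
}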

Besides, my filesystem benchmarking also helped me find a new way to scan directory structures on media much faster than the "normal" way.
-Crush
Do tell :)

I was playing around with a FindFirstFile/FindNextFile loop that didn't SetCurrentDirectory(), didn't recurse, and used a single MAX_PATH char array... but I realized that FindFirstFile internally does a SetCurrentDirectory(), so you don't save much by doing it this way, apart from some limited amount of stack memory.
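Something along those lines (a rough, untested sketch, not the actual code - it uses std::wstring instead of the single char array, and skips most error handling): an explicit stack of pending directories instead of recursion or SetCurrentDirectory().

Code: C++
#include <windows.h>
#include <string>
#include <vector>

// Non-recursive directory scan: keep a stack of directories still to visit,
// and build full paths explicitly instead of changing the current directory.
void ScanTree(const std::wstring& root, std::vector<std::wstring>& filesOut)
{
    std::vector<std::wstring> pending(1, root);
    while (!pending.empty())
    {
        std::wstring dir = pending.back();
        pending.pop_back();

        WIN32_FIND_DATAW fd;
        HANDLE h = FindFirstFileW((dir + L"\\*").c_str(), &fd);
        if (h == INVALID_HANDLE_VALUE)
            continue;
        do
        {
            std::wstring name = fd.cFileName;
            if (name == L"." || name == L"..")
                continue;
            if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
                pending.push_back(dir + L"\\" + name);
            else
                filesOut.push_back(dir + L"\\" + name);
        } while (FindNextFileW(h, &fd));
        FindClose(h);
    }
}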

bloomsearching?
6653
Yup, but keep in mind this came fairly late in the call, after I had established that the machine was stone cold dead, pushing up the daisies, joined the bleeding choir invisible, an ex-computer.
-Ralf Maximus
Fair enough :)

That kind of scripted no-brain "support" really sucks... but that's what you get most of the time, unless you find a competent contact person you can get through to directly.
6654
One time I called to get an RMA on a laptop that was totally dead, and the tech wanted me to hold the F2 key down while booting and check out some BIOS settings. I patiently explained that the computer WOULD NOT BOOT, but did that faze them? No. The Script Must Be Followed.
In all fairness, you can often get into the BIOS even though the system won't boot, since that term these days is more associated with getting the OS up rather than the old "basic bootstrapping". If I hear somebody say their system won't boot, my first guess is that it's an OS or BIOS problem of sorts - otherwise people tend to say "it's dead" instead of "it doesn't boot" :)
6655
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 15, 2007, 09:15 AM »
I have something like the following in mind - 48 bytes on an x86-32 machine with default padding, 72 bytes for x86-64. Which means you could keep the main structures in-memory for 1 million files and only use 48 megabytes of memory, not bad for speed.

And since it's a fixed-size structure, it would be very easy and efficient to do read caching and only keep part of the structure in memory. Having stringtable indexes for both the "real" path/file strings and their uppercase versions means that if you're searching for a non-wildcard string, or checking whether a file already exists, you're only doing an integer comparison instead of a (relatively :)) slow string compare. Here's my idea of a fileobject structure:

Code: C++
struct FileObject
{
        uint32          strPath;                // stringtable index of pathname (containing folder)
        uint32          strPathUp;              // -"-, uppercase version
        uint32          strName;                // stringtable index of filename
        uint32          strNameUp;              // -"-, uppercase version

        uint64          fileSize;
        uint32          fileFlags;              // flags & attributes
        FILETIME        timeCreate;             // file creation time
        FILETIME        timeUpdate;             // file last update time

        uint32          blobIndex;              // "blob" table index (plugin data)
};


The string table requires some more explanation, in order to see why it's a good idea. Unique strings are only stored once, so identically named files have the same index (fast compares, as mentioned earlier). The string data can be cached as well, so you don't have to keep it all in memory. String indexes are static (i.e., adding a new string doesn't cause other string indexes to change) - but at the same time we really want the strings to be sorted, to have fast lookups. We want to be able to use binary searches.

So, the string table consists of a "string index table" that maps the string indexes used in the FileObject to the sorted indexes (which will change when a string is added). Beyond that, the stringtable consists of {u32 size, u32 filepos} pairs (or u64 if you want to support really huge systems). So you can keep the string index tables entirely in memory, or (again, since it's a fixed-size structure) still do fast & efficient lookups - the index tables as well as the strings themselves can be cached.

Strings and uppercase strings could be in two separate tables, or share one table.

So, if you're doing a non-wildcard search, you start by uppercasing the search string, and then check whether it's present in the uppercase string table. If not, you can abort your search right away. If it is present, the lookup gives you an integer for your comparisons, and you can then do an ultra-fast binary search for the file. Wildcard searches require somewhat more work, but are still not that bad if you implement caching properly.
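In code, the exact-lookup part could look roughly like this (a bare-bones in-memory sketch with made-up names; the real thing would go through the index/cache layers described above):

Code: C++
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Uppercase string table kept fully in memory for illustration. 'sortedIds'
// holds string ids ordered by their text, so an exact (non-wildcard) lookup
// is one binary search; the returned id can then be compared as an integer
// against the strNameUp fields of the FileObjects.
struct StringTable
{
    std::vector<std::string>   text;       // id -> uppercase string
    std::vector<std::uint32_t> sortedIds;  // ids sorted by their text

    // Returns the id of 'needle' (already uppercased), or UINT32_MAX if absent.
    std::uint32_t Lookup(const std::string& needle) const
    {
        auto it = std::lower_bound(sortedIds.begin(), sortedIds.end(), needle,
            [&](std::uint32_t id, const std::string& s) { return text[id] < s; });
        if (it == sortedIds.end() || text[*it] != needle)
            return UINT32_MAX;
        return *it;
    }
};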

The blob table would take some pondering, since it has more variable-length stuff. I'm a big fan of "top-level fixed-size structures" because it makes things easier and more efficient, at the cost of some additional indirection...

Code: C++
struct BlobIndex
{
        u32             blobCount;      // *number* of blob items for this index
        u64             fileOffset;     // file position of BlobItem list
};

struct BlobItem
{
        GUID            blobType;       // blob type - I prefer GUIDs to ints
        u32             blobSize;       // size of the blob item
        u64             fileOffset;     // file position of the blob data
};

Again, the main structure (array of BlobIndexes) is small, so you can keep a lot of them in memory without problems, and it's easy to cache as well because of the fixed size. It's also fairly fast to get the list of blob items, since it's just a seek to bidx.fileOffset and a read of blobCount*sizeof(BlobItem) bytes.
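Fetching the blob list for one entry would then be something like this (sketch only - it assumes the structs above plus an open Win32 HANDLE, and skips error handling):

Code: C++
#include <windows.h>
#include <vector>

// Seek to the BlobIndex's file offset and read its blobCount BlobItems in one go.
std::vector<BlobItem> ReadBlobItems(HANDLE file, const BlobIndex& bidx)
{
    std::vector<BlobItem> items(bidx.blobCount);
    if (items.empty())
        return items;

    LARGE_INTEGER pos;
    pos.QuadPart = (LONGLONG)bidx.fileOffset;
    SetFilePointerEx(file, pos, NULL, FILE_BEGIN);

    DWORD bytesRead = 0;
    ReadFile(file, &items[0], (DWORD)(items.size() * sizeof(BlobItem)), &bytesRead, NULL);
    return items;
}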

With this approach, and proper serialization, you can eliminate the "write-seekback-updatewrite-seekend" dance entirely, you don't have to do a "read-seek" loop just to get your lists, etc.

Even for hundreds of thousands of files you could keep the entire main structures in memory without being too much of a memory hog (speed often comes at a size penalty, though), but it would still be very easy to do caching if your class layout is decent.

It would be relatively trivial to code as well; I think the "worst" code to write would be updating the string-index table upon sorting, but that's really not too bad. Binary search of strings, as well as string-compare-by-integer-check, means very nice speed properties.

Also, with proper class design, it's very easy to do performance/stats collection, which you can use to tune things like cache sizes... either by analyzing manually, or by adapting at runtime.

If you want full-text searching in files, that could probably be added as a blob item, although I think I'd prefer an additional file for that.
6656
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 15, 2007, 02:57 AM »
My idea was to keep the full string table in memory, this way you wouldn't have to do additional seek/read stuff. This could take a fair amount of memory, but that might be a decent sacrifice for a lot of speed, depending on what the program is for :)

You could also keep the string table on disk and only have {offset, length} pairs in memory, and access the table with a memory-mapped file - that puts you at the mercy of the Windows filesystem cache and is only really suitable when you only need read access to the strings, but it would use less memory permanently.
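For the mapping part, something like this (a bare Win32 sketch with made-up names, error handling trimmed):

Code: C++
#include <windows.h>

// Map the on-disk string table read-only; the in-memory {offset, length}
// pairs then index straight into the returned view. The caller keeps the
// two handles and cleans up later with UnmapViewOfFile/CloseHandle.
const char* MapStringTable(const wchar_t* path, HANDLE* fileOut, HANDLE* mapOut)
{
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return NULL;

    HANDLE map = CreateFileMappingW(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (!map)
    {
        CloseHandle(file);
        return NULL;
    }

    *fileOut = file;
    *mapOut  = map;
    return (const char*)MapViewOfFile(map, FILE_MAP_READ, 0, 0, 0);
}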

Hm, you say plugins can write additional data... unless a plugin can write substantial amounts of data (several megabytes), I would make serialization write to a memory stream before even going to a file class; that way you can easily calculate sizes and write out big chunks without seeking back and forth.
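By "memory stream" I just mean a growable buffer the plugin serializes into, so the size is known before anything touches the file and the whole chunk goes out in one write - a toy version:

Code: C++
#include <cstddef>
#include <vector>

// Toy memory stream: append-only byte buffer; flush buf.size() bytes to the
// file class in a single write once serialization is done.
struct MemStream
{
    std::vector<unsigned char> buf;

    void Write(const void* data, std::size_t size)
    {
        const unsigned char* p = (const unsigned char*)data;
        buf.insert(buf.end(), p, p + size);
    }
};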

How often is plugin data used compared to the "main structure" information? Fixed-size records are nice, you could probably keep the entire "basic" file information set completely in memory, and move plugin data chunks to another file. That would make the basic information very fast & easy to deal with...

Are you doing a file indexer, or something else? What's the typical usage scenario? How often are writes/updates done compared to reads? What kind of file numbers are you dealing with?

I really like brainstorming these kinds of optimization scenarios :)

I only wondered why this performance problem isn't remarked on more often by others handling large amounts of data.
-Crush
Some people care, other people don't... I know that some console developers care a lot :), including extensive logging in their bigfile code so they can trace usage and read patterns and reorganize data in the bigfiles accordingly (it makes a lot of difference when you're dealing with the über-slow seeks of optical media). It also means pondering a lot about data structures, finding ways to read them more-or-less directly (with simple post-read fixups) instead of slow & inefficient member-by-member serializing...

There was an article on a gamedev forum by the developers of... I think it was the Commandos game; a good read anyway.
6657
General Software Discussion / Re: Fixed Drive Letters for Removable Drives
« Last post by f0dder on October 14, 2007, 06:21 PM »
Yeah, there really isn't any way to assign them automatically on other computers. Software designed to run on portable devices (and heck, just about any other software as well) shouldn't use hardcoded paths.
6658
I'm not too keen on either of the two formats, but OOXML is clearly evil, and the way MS have been pushing it and trying to fast-track it through ISO is nasty. Shame on them.

I still find myself running Office 2000 though, because even on my not-so-shabby machine, OpenOffice is a drag to load - even on second runs, when executables, DLLs, etc. are all cached. And the museum I do admin'ing for can't move away from MS Office either, because "everybody else" is using MSO, and (because of closed file formats) OO can't read everything properly.

It's a big royal mess. I hope MS will ultimately dump OOXML and use either its old .doc format or ODF as its primary document format. But that's a dream, and we all know it won't happen, and that ISO will eventually accept OOXML as a standard. You know how it goes: "god money, I'll do anything for you..."
6659
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 14, 2007, 03:23 PM »
If you only have simple POD types, you can serialize it all in one go if you stuff it in a struct (which comes naturally if you use the pImpl idiom) - of course there are some potential portability issues with doing this, and it won't work for non-POD types...
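I.e. something along these lines (a sketch with made-up names; the static_assert is just a modern-compiler guard to make sure the struct stays trivially copyable):

Code: C++
#include <cstdint>
#include <cstdio>
#include <type_traits>

// All simple POD members, so the whole thing can go out in a single write.
// Caveats as mentioned: padding and endianness make this non-portable.
struct Record
{
    std::uint32_t id;
    std::uint32_t flags;
    std::uint64_t size;
};
static_assert(std::is_trivially_copyable<Record>::value, "keep Record POD");

bool WriteRecord(std::FILE* f, const Record& r)
{
    return std::fwrite(&r, sizeof(r), 1, f) == 1;
}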

But even if you can write it out like that and don't have to resort to member-by-member serialization, you definitely should use a write cache to minimize user<>kernel mode transitions.

Having to seek back and forth sounds bad, can't you rearrange your data structures to avoid it? Like, instead of storing the variable-length strings, split those off to a string table and simply use an index or offset integer in the file struct...
6660
IntelliSense in Visual Studio, nothing but that...
6661
Why anything else but Firefox?

It's a slow and bloated pig, but at least it has fewer (and less severe) security problems than IE, and renders things well.
6662
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 14, 2007, 04:21 AM »
I have no idea how to reduce the usermode<>kernelmode switching without my own buffer. Do you have a simple solution?

Nevertheless, I'll use my own caching system in the future and not trust the "normal" filesystem too much. Let's see next how caching with VMem works compared to normal memory.
Simple solution: do larger writes - using your own buffer is one decent way to do that. But you can probably find other ways to increase efficiency as well (e.g., don't write an integer at a time, write out chunks of integers from your array).
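For example (a trivial sketch) - instead of looping fwrite() over single integers, hand the whole array over in one call, so the runtime and the kernel see a few large writes instead of one per element:

Code: C++
#include <cstdio>
#include <vector>

// One fwrite for the whole array instead of values.size() separate calls.
void WriteInts(std::FILE* f, const std::vector<int>& values)
{
    if (!values.empty())
        std::fwrite(&values[0], sizeof(int), values.size(), f);
}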

Caching with video memory is going to be slower than normal memory; transfers there are a bit more expensive... especially if you're stuck with AGP instead of PCI-e, since AGP readbacks are slooooow.
6663
General Software Discussion / Re: Adobe Acrobat Reader Security Vulnerability
« Last post by f0dder on October 14, 2007, 04:19 AM »
Well f0dder <grin>, MS doesn't agree with you - they owned up to it being a Windows XP+IE7 bug, per Betanews:
http://www.betanews....Microsoft/1192118748
They actually don't disagree.

What Microsoft has patched is the ShellExecute function, which didn't do proper verification. But if something can get to ShellExecute without user intervention, you have a serious problem...
6664
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 13, 2007, 04:39 PM »
You're interpreting the results wrongly - your bottleneck is certainly the user<>kernel switching, which is very evident from the high kernel usage in the first test. The filesystem caches are efficient enough, but writing one byte at a time has never been a good idea :)
6665
General Software Discussion / Re: How can a BSOD ruin an mbr???
« Last post by f0dder on October 13, 2007, 04:36 PM »
A parity error sounds pretty serious; I've never seen one on any of my own systems... and I think you would only see it with ECC memory?
6666
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 13, 2007, 12:14 PM »
Ah, but you are writing one char at a time. As stated previously, check your CPU usage, especially the time spent in kernelmode...
6667
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 13, 2007, 08:41 AM »
So you're writing 100 kilobytes one byte at a time? Check your CPU usage during that, and make sure to "show kernel times" - you'll probably find CPU usage to be rather high, with a lot of time spent in kernel mode.

To ensure that all data has been written, I included the Close() of the file in the test loop.
-Crush
Are you doing open+close for each byte? That's going to be god-awfully slow. NTFS does filesystem metadata journalling, so open+close has some overhead, including disk access...

The results show how the OS buffer works: it only caches access to the HD tracks and sectors of the file, not collecting the given data intelligently!
-Crush
Nah, NTFS doesn't cache sectors, it caches file streams (remember that each file on NTFS can have multiple streams), which at least theoretically should mean a bit better performance if files are fragmented etc.

You do want to make sure you don't end up calling WriteFile (which ends up doing user<>kernel transitions) with too-small buffers. I don't know if MFC's CFile class does caching internally, or goes directly to WriteFile. Even though you get delayed writes even if you only write one byte at a time, the user<>kernel transition costs kill you. And you also don't want to perform too many operations that require filesystem metadata journalling.

And another thing: if you know the output filesize (or a guesstimate of it) before you start writing, by all means grow the file to the expected size beforehand (seek to it, SetEndOfFile, seek back to the start). That makes sure your file ends up in as few fragments as possible, and it's a superfast operation on NTFS.
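In Win32 terms, roughly (a sketch, minimal error handling):

Code: C++
#include <windows.h>

// Pre-grow the output file to its expected final size, then seek back to the
// start - fast on NTFS, and it helps keep the file in few fragments.
bool PreallocateFile(HANDLE file, LONGLONG expectedSize)
{
    LARGE_INTEGER size;
    size.QuadPart = expectedSize;
    if (!SetFilePointerEx(file, size, NULL, FILE_BEGIN))
        return false;
    if (!SetEndOfFile(file))
        return false;

    LARGE_INTEGER zero;
    zero.QuadPart = 0;
    return SetFilePointerEx(file, zero, NULL, FILE_BEGIN) != FALSE;
}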
6668
General Software Discussion / Re: How can a BSOD ruin an mbr???
« Last post by f0dder on October 13, 2007, 08:07 AM »
That particular incident is one of the reasons I've only used nvidia since... too bad, since there's one point where ATI at least excels over nvidia: speed when you're using screen rotation. But then again, I hardly do that anymore :)

Oh yeah, and ATI fanboiz of course claim that it's a Windows bug, because it's mentioned in an MS KB article - imho it's the ATI driver that doesn't play clean.
6669
General Software Discussion / Re: How can a BSOD ruin an mbr???
« Last post by f0dder on October 13, 2007, 06:19 AM »
A BSOD means you're having problems of one form or another. It could be a hardware failure, and those can be tricky and do all sorts of strange things. Or it could be a driver running amok and trashing whatever filesystem memory structures might get flushed back to disk...

I've had really nasty crap happening because of ATI drivers.
6670
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 13, 2007, 06:05 AM »
I'm not familiar with those classes you mention, but from the names I reckon they're MFC - and most likely doing their own buffering... but even then, you don't want to write single bytes at a time if you can help it.

I wonder if seek() causes a cache flush - sounds likely to me. What size objects were you dealing with?

fusionIo sounds pretty sweet, but also extremely expensive :)
6671
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 12, 2007, 06:36 PM »
If SuperStupidName's MRU works decently (i.e., caching your huge .avi file doesn't evict useful data), and it has safe but slightly less conservative write caching than NT's default, then it might be something I should take a look at...

I still wish somebody would do an NT version of vramdir, though :). And a hybrid drive with the flash used for registry+FS metadata. And superfast+huge+cheap solid state disks. And unicorns, wonderful unicorns.

Crush: if you're writing out one byte at a time, you're going to be dead in the water, even if you're not bypassing the OS cache. Why? user<>kernel mode switching is pretty expensive.
6672
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 12, 2007, 10:06 AM »
Why would you cache playback of a movie file? You'd need a really crappy disk system for video playback to be a problem (except CPU-wise, for HD content). I guess it might make a difference if you have uncompressed HD video, but then you're most likely running off high-end hardware anyway :)

Windows is by default too conservative about its caching though, we can agree on that. I used to enable "LargeSystemCache", but that's a reaaaaally bad idea if you're using ATI graphics drivers (or at least it used to be).

Using a ramdisk for compiles is pretty nice; too bad that ramdisks are fixed-size. For Win9x there was a really cool product called "vramdir" which let you create "ram folders" that were dynamically grown/shrunk and backed by the paging file. That was pretty damn cool and efficient, and I'm surprised I haven't seen anything like it for NT.

Performance of the current hybrid drives is probably going to be "meh", since they're all (the ones I've seen, anyway) laptop drives, and those tend to be slow by definition.

What would be really cool would be a hybrid drive with a fast and large enough persistent flash store - use part of it as a static read cache: basically stuff drivers + kernel + system DLL files there, as well as files that, from usage analysis, are required during boot. That would give you pretty fast system start-up; once the system is started, the OS should do its own caching (including not paging out / discarding any drivers or parts of the kernel - memory is much faster than disk and flash).

Then, use part of the flash for write caching. But not as a generic caching layer - only put very specific stuff there: stuff that's modified often but does need to be persisted. I'm thinking specifically of the Windows registry; it's modified all the bloody time, meaning your system disk will hardly ever spin down. Filesystem metadata should also be stored there. Perhaps also keep smaller files there, which would let you work on smaller things like word processing and casual web browsing with your harddrive spun down.

Even if you don't care about your disk spinning, keeping the registry and filesystem metadata almost exclusively in flash cache would mean basically no disk seeks or reads/writes when the system is idle. And especially for the FS metadata, the quantity of data isn't large, but you tend to get a bunch of seeks - and flash memory doesn't have the seek disadvantages that spinning magnetic platters have.

This could give quite nice speedups, depending on how you use your computer.

Of course this would all be obsolete if flash memory became cheap and abundant, and we could have 320-gig solid-state drives at a decent price and somewhat faster transfer rates... but I don't think that's likely to happen anytime soon. So, please let us see (cheap!) hybrid drives for the desktop, with more flash than a silly 256 or 512 megabytes, and on fast drives. And proper OS support... meaning XP, of course.

...okay, I think I went off on a rambling tangent :)
6673
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 12, 2007, 09:02 AM »
Windows has been hardware-accelerated for quite a while, btw - just not using 3D.

The flash memory of hybrid drives isn't really meant for write caching as far as I understand, but rather as a static read cache, to speed up things like boot...
6674
Living Room / Re: Use video RAM as a swap disk?
« Last post by f0dder on October 12, 2007, 03:46 AM »
Readback from video memory is "pretty damn slow" compared to regular memory, but should still be plenty faster than disk. Even 512 megs is a pretty small amount for swap, though. And you'd need to be pretty careful about how it's done to avoid losing data.

Funny that this topic re-appears - I just saw it on Slashdot yesterday, but it was already discussed several months ago. I don't personally think video card memory is viable as swap, but there might be some merit in using it for a static read cache, like the flash-mem part of the upcoming hybrid harddrives...
6675
General Software Discussion / Re: Adobe Acrobat Reader Security Vulnerability
« Last post by f0dder on October 12, 2007, 03:42 AM »
Seems Windows is the culprit, not Adobe - for this flaw
http://www.betanews...._PDF_Flaw/1192118748
SKA
If the exploit can happen just by opening a .pdf file, without clicking the link, the problem is with Adobe, not Microsoft.

Depending on ShellExecute to do URI filtering? That should be punishable by death. You just do not pass on unverified input, whether it's coming from the keyboard, a file, the network, etc.