Topic: Use video RAM as a swap disk?  (Read 14521 times)

Ralf Maximus

  • Supporting Member
  • Joined in 2007
  • Posts: 927
Use video RAM as a swap disk?
« on: October 11, 2007, 10:27:18 PM »
Found this link via BoingBoing:
http://gentoo-wiki.c...n_video_card_as_swap

Briefly, it describes how to set up the RAM in an old video card so that Linux can use it as a fast(?) RAMdisk.  Very interesting idea, for a number of reasons:

- video RAM, especially in newer cards, can be substantial: 512MB or 1GB;

- video ram is usually optimized to be very fast, with direct CPU access and wide data paths;

- depending on hardware, you may be able to leverage insanely fast hardware memory move/copy functions, transformations, etc.  (Though I think applying hardware texture shading to a block of data would be, erm, counterproductive...)

The article leaves it as an exercise for the user to actually implement the thing, does not delve into performance much, and warns that video memory is not ECC protected and thus may not be as stable as "real" RAM.  To this I say "pish posh", but then I like to juggle flaming chainsaws in a pool of gasoline.

What are your thoughts on this?  How would you like a Windows driver that allocated half your video RAM as a swap disk?  I mean, do you really need all that stuff when you're not playing BioShock?

Could one use DirectX to get at the RAM?

How feasible/desirable is this for real-world applications once you get past the "holy shiat that's cool" factor?

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • Posts: 9,029
  • [Well, THAT escalated quickly!]
Re: Use video RAM as a swap disk?
« Reply #1 on: October 12, 2007, 03:46:01 AM »
Readback from video memory is "pretty damn slow" compared to regular memory, but should still be plenty faster than disk. Even 512meg is a pretty small amount for swap, though. And you'd need to be pretty careful as to how it's done to avoid losing data.

Funny that this topic re-appears - I just saw it on slashdot yesterday - but it was already discussed several months ago. I don't personally think video card memory is viable as swap, but there might be some merit in using it for a static read-cache, like the flashmem part of the upcoming hybrid harddrives...
- carpe noctem

Lashiec

  • Member
  • Joined in 2006
  • Posts: 2,374
Re: Use video RAM as a swap disk?
« Reply #2 on: October 12, 2007, 08:11:16 AM »
Some points:

 1) Nobody is going to buy a graphics card with 512 MB or 1 GB and use it solely for swapping. Mine is 256 MB and I assure you it's used for something else ;)

 2) Even if you buy a card like that, you could also buy an extra gig of system RAM - I mean, you can afford it.

 3) The desktop is going to be hardware accelerated in the future, so that almost rules out this possibility.

As a gimmick it's OK, but given all the caveats, and the hoops you have to jump through to make it work, it's not worth it. I think it's better to use hybrid hard drives, especially as they're researching things like this
« Last Edit: October 12, 2007, 08:19:15 AM by Lashiec »

Ralf Maximus
Re: Use video RAM as a swap disk?
« Reply #3 on: October 12, 2007, 08:50:12 AM »
Agreed, if you're investing in a high-end 1GB video card, you're probably the kind of person who spends most of their computer time playing games.  I was envisioning those times when most of the RAM is idle -- like when running Windowed non-DirectX apps.

And today's 1GB monster card is tomorrow's toy.  I remember when 1MB was a big deal.  "Who'd EVAR use that much video RAM?!"

Hardware-accelerated desktop, aye.  Vista's already doing some of that, I believe.  So maybe 2D Windows' days are numbered.

Oh well, it still tickles my geek sense when folks do stuff like this.  It reminds me of those days when memory was so precious we'd steal it from the graphics system and pray the user wasn't running in a graphics mode that would notice.

f0dder
Re: Use video RAM as a swap disk?
« Reply #4 on: October 12, 2007, 09:02:30 AM »
Windows has been hardware accelerated for quite a while, btw - just not using 3D.

The flash memory of hybrid drives isn't really meant for write caching, as far as I understand, but rather as a static read cache, to speed up things like boot...
- carpe noctem

Ralf Maximus
Re: Use video RAM as a swap disk?
« Reply #5 on: October 12, 2007, 09:39:30 AM »
Ach, I'd forgotten about the 2D acceleration.  But that doesn't use much RAM, does it? Nothing close to what a modern FPS uses, anyway.

I'm intrigued by the new hybrid drives, but the performance reviews I've seen have been "meh".  I think the real attraction there is robustness (laptops will benefit) and power savings (laptops, again).  When 100% solid-state drives start shipping, then I'll be interested.

As far as write-caching goes: for over a year I've been using SuperSpeed SuperCache II, otherwise known as the "really great product with a stupid name".  Basically it's a $79 set of volume filters/drivers that lets the user designate as much system RAM as desired for caching disk activity.  It augments Windows' own caching algorithms and is completely transparent to all apps.

Performance for some things -- video playback -- is phenomenal.  I've allocated 1.5GB to read-caching and most .AVI files end up completely in memory.  Compiles of large software projects also speed up by 2x or 3x.

Write-caching, when it works, is very nice.  However, I've not hit upon a combination of memory allocation and write-delay that yields performance with total safety.  SuperCache's reading implementation is perfect and solid; not so the write-caching.  I've experienced mysterious system hangs and BSODs traced back to the write-cache being overloaded.  It's possible my system is to blame (I do not have a plain vanilla Dell desktop anymore :-) ), but figuring it out is pretty low on my priority list, as the read-caching alone is totally worth it.

Anyway, it's not directly related to the video RAM thread but thought it worth a mention.

f0dder
Re: Use video RAM as a swap disk?
« Reply #6 on: October 12, 2007, 10:06:46 AM »
Why would you cache playback for a movie file? You'd need a really crappy disk system for video playback to be a problem (except CPU-wise, for HD content). Guess it might make a difference if you have uncompressed HD video, but then you're most likely running off high-end hardware anyway :)

Windows is by default too conservative about its caching though, we can agree on that. I used to enable "LargeSystemCache", but that's a reaaaaally bad idea if you're using ATI graphics drivers (or at least it used to be).

Using a ramdisk for compiles is pretty nice; too bad that ramdisks are fixed-size. For win9x there was a really cool product called "vramdir" which allowed you to create "ram folders", which were dynamically grown/shrunk and backed by the paging file. That was pretty damn cool and efficient, and I'm surprised I haven't seen anything like it for NT.

Performance for the current hybrid drives is probably going to be "meh", since they're all (the ones I've seen, anyway) laptop drives, and those tend to be slow by definition.

What would be really cool would be a hybrid drive with a fast and big enough persistent flash store - use part of it as a static read cache: basically stuff drivers + kernel + system DLL files there, as well as files that, usage-analyzed, are required during boot. Would give you pretty fast system start-up; once the system is started, the OS should do its own caching (including not paging out / discarding any drivers or parts of the kernel; memory is much faster than disk and flash).

Then, use part of the flash for write cache. But not as a generic caching layer - only do very specific stuff there: stuff that's modified often but does need to be persisted. I'm thinking specifically of the Windows registry; it's modified all the bloody time, meaning your system disk will hardly ever spin down. Filesystem metadata should also be stored there. Perhaps also keep smaller files there; that would allow you to work on smaller things like word processing and casual web browsing with your harddrive spun down.

Even if you don't care about your disk spinning, keeping the registry and filesystem metadata almost exclusively in flash cache would mean basically no disk seeks or reads/writes when the system is idle. And especially for the FS metadata, the quantity of data isn't large, but you tend to get a bunch of seeks - and flash memory doesn't have the seek disadvantages that spinning magnetic platters have.

This could give quite nice speedups, depending on how you use your computer.

Of course this would all be obsolete if flash memory became cheap and abundant, and we could have 320gig solid-state drives at a decent price and somewhat faster transfer rates... but I don't think that's likely to happen anytime soon. So, please let us see (cheap!) hybrid drives for the desktop, with more flash than silly 256 or 512 megabytes, and on fast drives. And proper OS support... meaning XP, of course.

...okay, I think I went off on a rambling tangent :)
- carpe noctem

Ralf Maximus
Re: Use video RAM as a swap disk?
« Reply #7 on: October 12, 2007, 10:47:26 AM »
Quote from: f0dder
Why would you cache playback for a movie file? You'd need a really crappy disk system for video playback to be a problem (except CPU-wise, for HD content). Guess it might make a difference if you have uncompressed HD video, but then you're most likely running off high-end hardware anyway :)

Heh.  It's not that I *want* to cache the whole thing; that's just what happens with my configuration.  It's eerie to see the disk spin for a bit, the video start, then no further disk activity.  On a really humongoid file.

Theoretically my Windows system files (including the registry) are being cached in RAM thanks to SuperSpeedStupidName II.  I'm forced to guess, because there's no easy way to interrogate the utility to see *what* it's cached.  But it uses a "most frequently accessed" mechanism, so it makes sense.  Aaaaand, bottom line, my system is way snappier with it turned on.

Quote from: f0dder
Using a ramdisk for compiles is pretty nice, too bad that ramdisks are fixed-size.

I actually started the quest for faster disk access by investigating RAM disks.  Allocating even 1G (from my available 3.25G) worked fine, and for experimentation I set up some .bat files to copy my development projects to the fake disk for testing.  Upon startup it took about two minutes to move all the files (booo), but compiling was very much faster, especially when the linker kicks in and thrashes the disk.

What I found interesting was that I could use the RAM caching utility instead, skip the RAM disk initialization, and realize the same performance boost -- all without worrying about power failures.

So that's what I'm doing now, and it's been pretty solid for over a year.

Crush

  • Member
  • Joined in 2006
  • Posts: 399
  • Hello dude!
Re: Use video RAM as a swap disk?
« Reply #8 on: October 12, 2007, 06:30:55 PM »
I'll test in the next weeks how well GFX memory can perform as a cache in my project, compared to "normal" memory. In the last weeks I examined the read/write behaviour of hard disks and what can be done to get my tasks to maximum speed. The result was really shocking for me: caching is a really fine thing if it's done right. Especially write-caching can boost the speed of programs enormously. As an example: writing out 1 million bytes with standard IO write routines, one after the other (no caching), took 119 seconds. The same task with a simple home-brewed caching system and only a 64K write cache sped this up to 2.4 seconds! Read-caching is much, much better - no comparison to write-caching - every byte you can use for it is gold. I never thought about using GFX mem this way... I'll see.

f0dder
Re: Use video RAM as a swap disk?
« Reply #9 on: October 12, 2007, 06:36:29 PM »
If SuperStupidName's MRU works decently (i.e., caching your huge .avi file doesn't evict useful data), and it has safe but slightly less conservative write caching than NT's default, then it might be something I should take a look at...

I still wish somebody would do an NT version of vramdir, though :). And a hybrid drive with the flash used for registry+FS metadata. And superfast+huge+cheap solid state disks. And unicorns, wonderful unicorns.

Crush: if you're writing out one byte at a time, you're going to be dead in the water, even if you're not bypassing the OS cache. Why? user<>kernel mode switching is pretty expensive.
- carpe noctem

Crush
Re: Use video RAM as a swap disk?
« Reply #10 on: October 13, 2007, 02:47:30 AM »
I don't switch off the cache by writing the file with BIOS interrupts or anything like that. I'm using CFile and CStdioFile, and I also thought there should be a rather good OS write-cache in the background. The first time I noticed this behaviour was serializing bigger lists of simple data with CArchive, and this was even slower than the normal CStdioFile::Write() function. After creating lists of 7-10 MB size I couldn't believe that my HD (in my 3-year-old HP laptop) is that slow. Benchmark results (with HD Tach 3) showed a throughput of 35.1 MB/s maximum and 23.6 MB/s average read speed.

This wasn't all:
My prog needed to rewrite, after each object block, the size of its data, for several reasons, and the simplest way seemed to be checking the file pointer before and after writing an object to calculate its length, then writing it to the block start by seeking to the position, overwriting the placeholder, and seeking back to the end of the file to repeat this task. The standard seek() was extremely slow compared to seeking within the write cache, which only writes to the HD on overflow or when forced to flush.

I also think that caching data like sound or gfx isn't quite reasonable. I'm working with directory/file information like FARR does, and a good caching system in the background makes searches turbo-fast - especially if your code is much faster than the file access. (btw: others in the FARR threads also cried out for a search cache)

It would be a pity if the transfer rates of future drives like Fusion-io (http://www.techworld...dex.cfm?newsid=10210), which can reach 600 MB/s, couldn't be used as optimally as possible because of slow or non-existent caching systems & code.
« Last Edit: October 13, 2007, 02:57:54 AM by Crush »

f0dder
Re: Use video RAM as a swap disk?
« Reply #11 on: October 13, 2007, 06:05:28 AM »
I'm not familiar with those classes you mention, but from the names I reckon they're MFC - and most likely doing their own buffering... but even then, you don't want to write single bytes at a time if you can help it.

I wonder if seek() causes a cache flush - sounds likely to me. What size objects were you dealing with?

fusionIo sounds pretty sweet, but also extremely expensive :)
- carpe noctem

Crush
Re: Use video RAM as a swap disk?
« Reply #12 on: October 13, 2007, 08:16:07 AM »
Please keep in mind, regarding the speeds, that I was talking of only 4 megabytes, and that I reduce the size in the following test to only 100 kilobytes!

The objects are from 16 up to 40-50 bytes on average, I guess. Most of the class elements are 2-4 bytes in size. I now wanted to repeat the test with bigger elements, and have to admit that I also wrote 4-byte-sized integers during the above-mentioned test. Now I tested it again with
100,000 ints (4 byte):
Normal: first run 22.7 s, second run 18.7 s -> perhaps the OS write-buffer is slightly visible here?
Quickfile: 0.03145 s

Then the same test with 100,000 bytes, as I wanted first:
Normal: 1.) 18.46 s 2.) 18.52 s
Quickfile: 1.) 0.02432 s 2.) 0.02442 s

I think you'll look the same way I did at first. The first test is 721 times faster with the write cache, and the second 759 times. To ensure that all data had been written, I included the Close() of the file in the timed loop. I don't use super-special IO routines: my class is derived from CFile directly and only adds the write cache - so it's certain that the same routines are used for writing.

The results show how the OS buffer works: it only caches the access to the HD tracks and sectors of the file, not collecting the given data intelligently!

A very interesting thing is that there seems to be even more potential. I calculate a throughput of 4,111,842 bytes/s, and the HD Tach 3 benchmark stated an average speed of 23.6 to 35.1 MB/s. OK, the file system and OS need some time, but is it really so much that I slow down to 1/6 of the possible speed? The more things I try out, the more I believe that most software does not use the hardware optimally.

Later tests and benchmarks with file reading and directory structure analysis led to similar results. If there is a caching system, it doesn't give you the full power of access that it could! I implemented some new caching features & hacks, and the overall speed is much, much higher than in the beginning.

Something like FusionIO will be standard for normal users in 4-5 years, or even sooner.
« Last Edit: October 13, 2007, 08:21:37 AM by Crush »

f0dder
Re: Use video RAM as a swap disk?
« Reply #13 on: October 13, 2007, 08:41:16 AM »
So you're writing 100 kilobytes one byte at a time? Check your CPU usage during that, and make sure to "show kernel times" - you'll probably find CPU usage to be rather high, with a lot of time spent in kernel mode.

Quote from: Crush
To assure that all datas have been written I included the Close() of the file in the test loop.
Are you doing open+close for each byte? That's going to be god-awfully slow. NTFS does filesystem metadata journalling, so open+close has some overhead, including disk access...

Quote from: Crush
The results show how the OS-Buffer works: It only caches the access to the HD tracks and sectors of the file not collecting the given datas intelligently!
Nah, NTFS doesn't cache sectors, it caches file streams (remember that each file on NTFS can have multiple streams), which at least theoretically should mean a bit better performance if files are fragmented etc.

You do want to make sure you don't end up calling WriteFile (which does user<>kernel transitions) with too-small buffers. I dunno if MFC's CFile class does caching internally, or goes directly to WriteFile. Even though you get delayed writes even when writing one byte at a time, the user<>kernel transition costs kill you. And you also don't want to perform too many operations that require filesystem metadata journalling.

And another thing: if you know the output filesize (or a guesstimate of it) before you start writing, by all means grow the file to the expected size beforehand (seek to it, SetEndOfFile, seek to start). It makes sure your file is in as few fragments as possible, and it's a superfast operation on NTFS.
- carpe noctem

Crush
Re: Use video RAM as a swap disk?
« Reply #14 on: October 13, 2007, 10:37:30 AM »
@f0dder
No, I'm not doing an open/close for each byte... only one open before writing the bytes and one close after the loop. I meant that the Close() command is also inside the time-measurement loop, to ensure that the data has actually been written to the HD. It would be silly to do it the other way :)

My system can only handle about 150 file open/close tasks a second. Writing 100,000 bytes that way would take over 11 minutes!

The Write() surely uses the system caching by default, because there are flags at opening that can turn off the cache buffer. The files I'm creating don't contain any alternate data streams. Turning off the cache is an extra flag for those using the "normal" file functions without any special knowledge.

But it was a good hint to check the file size: I didn't write 1,000,000 or 100,000 bytes/ints - it was 0x1000000 and 0x100000, i.e. 16x more data.
This leads to a throughput of 43,115,789 bytes/s with QuickFile... that's what I expected at first. It also means the normal file path only managed 56,802 bytes/s, which is extremely disappointing. But I can now forget the remark in my last post about never reaching the maximum throughput.

For the mentioned test I didn't use seek() and only wrote characters as fast as possible.

Perhaps you'd like to see the main loop; then you will understand that I don't use any dirty tricks:

__int64 nc1, nc2;
CFile ff(_T("E:\\x.y"), CFile::modeCreate|CFile::modeWrite|CFile::shareExclusive|CFile::typeBinary|CFile::osSequentialScan);
char num = 0;

QueryPerformanceCounter((LARGE_INTEGER*) &nc1); // start the high-resolution performance counter

// the main loop
// oh, I see that I used hexadecimal 0x100000 = 1,048,576 bytes, sorry :-[ but this had no influence till now, I only compared CFile & QuickFile timings
for (int x = 0; x < 0x100000; x++)
    ff.Write(&num, sizeof(num));
ff.Close();

QueryPerformanceCounter((LARGE_INTEGER*) &nc2); // stop the timer to calculate the elapsed time

f0dder
Re: Use video RAM as a swap disk?
« Reply #15 on: October 13, 2007, 12:14:36 PM »
Ah, but you are writing one char at a time. As stated previously, check your CPU usage, especially the time spent in kernelmode...
- carpe noctem

Crush
Re: Use video RAM as a swap disk?
« Reply #16 on: October 13, 2007, 01:50:00 PM »
Here's the kernel-mode usage with normal IO (plain CFile, in a ~10 second bench):

Results for User Mode Process BENCHMARKER.EXE (PID = 3180)

    User Time                   = 1.73% of the Elapsed Time // the time used by the program itself is very small
    Kernel Time                 = 48.11% of the Elapsed Time // a rather big kernel-mode usage!!!

                                  Total      Avg. Rate
    Page Faults          ,            0,         0/sec.
    I/O Read Operations  ,            0,         0/sec.
    I/O Write Operations ,      2779646,         268322/sec.
    I/O Other Operations ,            0,         0/sec.
    I/O Read Bytes       ,            0,         0/ I/O
    I/O Write Bytes      ,      2779646,         1/ I/O // the IO scanner shows that the bytes are written separately, with no cache!
    I/O Other Bytes      ,            0,         0/ I/O

OutputResults: ProcessModuleCount (Including Managed-Code JITs) = 22
Percentage in the following table is based on the Total Hits for this Process

Time   196 hits, 25000 events per hit --------
 Module                                Hits   msec  %Total  Events/Sec
ntdll                                   117      10359    59 %      282363   // too much
kernel32                                 48      10359    24 %      115841   // too much
MFC80U                                   27      10359    13 %       65160   // too much
Benchmarker                               4      10359     2 %        9653 // ok


And here are the results with QuickFile (write caching in ~10 second bench)


Results for User Mode Process BENCHMARKER.EXE (PID = 4028)

    User Time                   = 19.27% of the Elapsed Time  // the main CPU time is spent in the program itself, that's fine
    Kernel Time                 = 1.79% of the Elapsed Time   // this is something I can accept

                                  Total      Avg. Rate
    Page Faults          ,            0,         0/sec.
    I/O Read Operations  ,            0,         0/sec.
    I/O Write Operations ,         2749,         274/sec.  // the caching leads to far fewer hardware write actions
    I/O Other Operations ,            0,         0/sec.
    I/O Read Bytes       ,            0,         0/ I/O
    I/O Write Bytes      ,    180158464,         65536/ I/O  // here you see my standard IO-cacheblock size (0x10000)
    I/O Other Bytes      ,            0,         0/ I/O

Time   1576 hits, 25000 events per hit --------
 Module                                Hits   msec  %Total  Events/Sec
Benchmarker                             961      10015    60 %     2398901   // great! most time is spent creating the cache!
MSVCR80                                 614      10015    38 %     1532700   // I think this CPU time is used by the CFile class itself
ntdll                                     1      10015     0 %        2496           // this is acceptable  :D

This shows that the write cache relieves kernel32 & ntdll - there is definitely some kind of cache active in WinXP with NTFS, but it's not very effective for write actions with small portions. The coders at M$ perhaps concentrated on optimizing read-caching more than writing. I'd like to know how Linux filesystems would perform in such a test.
« Last Edit: October 13, 2007, 01:53:50 PM by Crush »

f0dder
Re: Use video RAM as a swap disk?
« Reply #17 on: October 13, 2007, 04:39:19 PM »
You're interpreting the results wrongly - your bottleneck is certainly the user<>kernel switching, very evident with such a high kernel usage in the first test. The filesystem caches are efficient enough, but writing one byte at a time has never been a good idea :)
- carpe noctem

Crush
Re: Use video RAM as a swap disk?
« Reply #18 on: October 13, 2007, 06:21:15 PM »
I have no idea how to reduce the user-mode<>kernel-mode switching without my own buffer. Do you have a simple solution?

Nevertheless, I'll use my own caching system in the future and not trust the "normal" filesystem too much. Next, let's see how caching with VMem performs compared to normal memory.

f0dder
Re: Use video RAM as a swap disk?
« Reply #19 on: October 14, 2007, 04:21:58 AM »
Quote from: Crush
I have no idea how to reduce the user-mode<>kernel-mode switching without my own buffer. Do you have a simple solution?

Nevertheless, I'll use my own caching system in the future and not trust the "normal" filesystem too much. Next, let's see how caching with VMem performs compared to normal memory.
Simple solution: do larger writes - using your own buffer is one decent way to do larger writes. But you can probably find other ways to increase efficiency as well (e.g., don't write an integer at a time, write out chunks of integers from your array).

Caching with video memory is going to be slower than normal memory; transfers there are a bit more expensive... especially if you're stuck with AGP instead of PCI-e, since AGP readbacks are slooooow.
- carpe noctem

Crush
Re: Use video RAM as a swap disk?
« Reply #20 on: October 14, 2007, 06:08:44 AM »
I know it's faster to write bigger chunks - that's the principle behind write-caching. The problem is: what if the structures/objects consist of very many small data types and you have a huge number of them to write?

Example:

struct fileobject
{
  char type;
  char attributes;
  UINT modificationdate;
  UINT flags;
  UINT strlen;
  string filename;
} fo;

Normally you define the output in a class and write each member of the structure one after the other. That's also how archive classes serialize objects, similar to this:
void Fileobject::Write(CFile & out)
{
  out.Write(&type, sizeof(type));
  out.Write(&attributes, sizeof(attributes));
  out.Write(&modificationdate, sizeof(modificationdate));
  out.Write(&flags, sizeof(flags));
  out.Write(&strlen, sizeof(strlen));
  out.Write(filename.c_str(), strlen);  // write the characters, not the string object
};

Often I have file structures with several 100,000 objects or more (especially on HDs). The only way I could imagine to save this with the normal filesystem is to write the static part of the structure clustered, like this:

outputFile.Write((char*)&fo, (char*)&fo.filename - (char*)&fo);   // fixed-size members in one go
outputFile.Write(fo.filename.c_str(), fo.strlen);                 // then the variable-length name

It's possible, and much faster than writing byte by byte (that was only an example to show the system's shortcomings), but not a very fine way to save data, is it? One problem still exists: the number of write commands depends on the size of the structure. Caching is still much faster than serializing. I first did it this way and wondered why some of my results sometimes needed 10 seconds or more just for writing a few megabytes of data. Often, building up the structure in memory by reading the directory structure of partitions took less time (analyzing the dirs, the internal XP cache was working very fast). This was the reason why I got into benchmarking and thinking about IO speeds. As I said before: if you also have to seek() after each object to write the complete object size somewhere else and seek back to the end, you turn a slow donkey into an even slower turtle. This is unfortunately what I'm forced to do because of some features that need it!
« Last Edit: October 14, 2007, 06:16:27 AM by Crush »

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 9,029
  • [Well, THAT escalated quickly!]
    • View Profile
    • f0dder's place
    • Read more about this member.
    • Donate to Member
Re: Use video RAM as a swap disk?
« Reply #21 on: October 14, 2007, 03:23:04 PM »
If you only have simple POD types, you can serialize it all in one go if you stuff it into a struct (which comes naturally if you use the pImpl idiom) - of course there are some potential portability issues with doing this, and it won't work for non-POD types...

But even if you can do this writing and don't have to resort to member-by-member, you definitely should use a write cache to minimize user<>kernel mode transitions.

Having to seek back and forth sounds bad - can't you rearrange your data structures to avoid it? Like, instead of storing the variable-length strings inline, split them off into a string table and simply use an index or offset integer in the file struct...
- carpe noctem

Crush

  • Member
  • Joined in 2006
  • **
  • Posts: 399
  • Hello dude!
    • View Profile
    • Read more about this member.
    • Donate to Member
Re: Use video RAM as a swap disk?
« Reply #22 on: October 15, 2007, 01:51:36 AM »
It's not possible to reference the strings at another position (this would lead to extensive file seeking) - I want to be able to jump directly to the files and entries by offset, skip unneeded data, and only access entries when I want to analyze/visualize them, for various design reasons. My Quickfile class prevents unnecessary slowdowns in the future - that's enough for my needs. I just wonder why this performance problem isn't mentioned more often by others handling large amounts of data. The seek is needed for each data block to write its length in front of it (which also makes skipping sets faster) and to read single blocks from the file into memory without useless data. Calculating the size before writing the block is nearly impossible, because plugins can weave additional information into the blocks. Writing the block size also helps to rearrange/insert/cut/rebuild files faster when they change. The strings are the main search & sort criteria, so they shouldn't be moved out into additional files or blocks - that would mean many extra open/close/seek calls.
« Last Edit: October 15, 2007, 02:00:04 AM by Crush »

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 9,029
  • [Well, THAT escalated quickly!]
    • View Profile
    • f0dder's place
    • Read more about this member.
    • Donate to Member
Re: Use video RAM as a swap disk?
« Reply #23 on: October 15, 2007, 02:57:17 AM »
My idea was to keep the full string table in memory; this way you wouldn't have to do any additional seek/read work. It could take a fair amount of memory, but that might be a decent sacrifice for a lot of speed, depending on what the program is for :)

You could also keep the string table on disk and only have {offset, length} pairs in memory, and access the table through a memory-mapped file - that puts you at the mercy of the Windows filesystem cache and is only really suitable when you need read-only access to the strings, but it would use less memory permanently.

Hm, you say plugins can write additional data... unless a plugin can write substantial amounts of data (several megabytes), I would make serialization write to a memory stream before even going to a file class; that way you can easily calculate sizes and write out big chunks without seeking back and forth.

How often is plugin data used compared to the "main structure" information? Fixed-size records are nice, you could probably keep the entire "basic" file information set completely in memory, and move plugin data chunks to another file. That would make the basic information very fast & easy to deal with...

Are you doing a file indexer, or something else? What's the typical usage scenario? How often are writes/updates done compared to reads? What kind of file numbers are you dealing with?

I really like brainstorming these kinds of optimization scenarios :)

Quote from: Crush
I just wonder why this performance problem isn't mentioned more often by others handling large amounts of data.
Some people care, others don't... I know that some console developers care a lot :) - they include extensive logging in their bigfile code so they can trace usage and read patterns and reorganize the data in the bigfiles accordingly (that makes a lot of difference when you're dealing with the über-slow seeks of optical media). It also means pondering a lot about data structures and finding ways to read them more or less directly (with simple post-read fixups) instead of slow & inefficient member-by-member serializing...

There was an article on a gamedev forum by the developers from... I think it was the Commandos game, good read anyway.
- carpe noctem
« Last Edit: October 15, 2007, 03:03:32 AM by f0dder »

Crush

  • Member
  • Joined in 2006
  • **
  • Posts: 399
  • Hello dude!
    • View Profile
    • Read more about this member.
    • Donate to Member
Re: Use video RAM as a swap disk?
« Reply #24 on: October 15, 2007, 06:15:11 AM »
You're right! I'm working on a high-speed file indexer for extremely large amounts of data - especially for really big and fast media like the FusionIO! One of its features is the ability to choose how it handles the information: seeking the data in small parts on disk and remembering their positions for the results, keeping the data cached in memory, or assembling the real data from the references in memory for further work - so you can make the speed/memory trade-off yourself. This way it is possible to search and browse through millions of results with rather little memory.

Related to this, I made these two threads:
http://www.donationc...dex.php?topic=7764.0
http://www.donationc...dex.php?topic=7183.0 // info about the cataloger itself. The result list is quite old - I've since more than doubled the speed!

The plugins should be able to write as much data as they like, but only information that can be searched for should go into the main search base; the rest goes into separate datasets (files). They are called only as often as the file type triggers them. Some plugins should also be able to attach additional temporary data from the internet to the results and the database.

I personally want to index CD/DVD/HD/network/FTP and/or HTTP (like a web spider).

The number of files in a single dataset can reach several hundred thousand entries. Big networks could deliver several million.

Quote
I really like brainstorming these kinds of optimization scenarios
I like this too, and I've thought long and hard about all the possible optimizations that could be done. Most of the things I found forced me into compromises with advantages & disadvantages in several directions - I had to weigh them against each other.

« Last Edit: October 15, 2007, 06:40:39 AM by Crush »