Bvckup 2
Deozaan:
Bvvckup! :P
f0dder:
Have a little faith, will you? :)-apankrat (December 19, 2012, 03:50 PM)
--- End quote ---
*Grin* :)
Hope you don't take my posts as grumpy-old-man. I'm just interested in these things, and some of what you're saying sounds weird compared to my own experiences. But I can handle being proved wrong, and always like learning new stuff ;)
(Also, I've been spending quite some time looking at backup software lately - pretty much everything sucks in one way or another. Closest I've come yet are Genie Timeline which was kinda nice but had bugs and shortcomings, and Crashplan which does some of the stuff GTL sucked at better, but has its own problems - *sigh*.)
Hm, you might have a point wrt. warm cache querying - but have you tested the code across several OSes, especially pre-Vista? That's when Microsoft started doing a lot of work on lock-free data structures and algorithms in the kernel. Have you tested on XP and below?
This is with warmed up cache. C:\ was scanned in full immediately before this test. Interestingly enough, playing with the order in which sub-directories are queued for scanning can speed things up by an additional 5-10%:-apankrat (December 19, 2012, 03:50 PM)
--- End quote ---
Hrm, last I played with different scanning techniques was back on XP - that's some years ago, which also means considerably slower hardware. I tested NTFS, FAT32 and even ISO9660 (on a physical CD, since that's the slowest seek-speed I had available). I tried depth- vs. breadth-first, tried eliminating SetCurrentDirectory calls since that'd mean fewer user<>kernel transitions (and I had hoped CWD wouldn't change, but it did - FindFirstFile probably changes directory internally), spent some effort on making the traversal non-recursive and eliminating as many memory allocations as possible... and nothing really made much of a difference. Was hellish doing cold-boots between each and every benchmark :)
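For anyone wanting to repeat the depth- vs. breadth-first comparison, a minimal sketch of a non-recursive traversal follows. It is written in Python rather than against the Win32 calls mentioned above, and the function name `walk` and its `breadth_first` flag are made up for illustration; it is not the code either poster used.

```python
import os
from collections import deque

def walk(root, breadth_first=True):
    """Non-recursive directory traversal; returns the paths of all entries found."""
    pending = deque([root])   # directories still waiting to be scanned
    found = []
    while pending:
        # Taking from the front (FIFO) gives a breadth-first order,
        # taking from the back (LIFO) gives a depth-first order.
        current = pending.popleft() if breadth_first else pending.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    found.append(entry.path)
                    # Do not descend into symlinked directories, to avoid loops.
                    if entry.is_dir(follow_symlinks=False):
                        pending.append(entry.path)
        except OSError:
            # Unreadable directory: skip it and carry on.
            continue
    return found
```

Both orders visit the same set of entries; only the sequence of directory reads differs, which is the part that could matter for seek patterns on a cold cache.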
Can't remember if that was before or after I got a Raptor disk - so it might have been on hardware before NCQ got commonplace, and it was definitely on XP. Still, even with NCQ, it's my experience that you don't need a lot of active streams before performance dies with mechanical disks. For SSDs, the story is entirely different, though - there, on some models, a moderate queue depth can be necessary to reach full performance. So a cold-scan on an SSD might benefit from multiple threads - I'd be surprised if a mechanical disk did, though!
Got any benchmark code you're willing to share? I'd be interested in trying it out on my own system, I'm afraid I didn't keep the stuff I wrote back then (and there were no threaded versions anyway).
there will always be time when the OS is not doing anything for us, because our app is busy copying what it got from the OS into its own data structures. So if we have 2+ threads pulling at the API, it eliminates these idle OS times.-apankrat (December 19, 2012, 03:50 PM)
--- End quote ---
It's my understanding that what you're generally waiting for when traversing the filesystem is disk I/O - the CPU overhead of data structure copying and user<>kernel switches should be entirely dwarfed compared to the I/O. Which is why I'm surprised you say multiple threads help when there's a mechanical disk involved. I'd like to verify myself - and I'd like even more if somebody can find a good explanation :D
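A rough way to test the multi-threaded claim is a scanner where several workers pull directories from a shared queue. The sketch below is only an assumption about how such a test could be structured (the name `threaded_scan` and the `workers` parameter are hypothetical), not Bvckup's implementation, and it says nothing by itself about which side of the argument is right.

```python
import os
import queue
import threading

def threaded_scan(root, workers=4):
    """Count directory entries under root using several scanning threads."""
    pending = queue.Queue()       # directories waiting to be scanned
    pending.put(root)
    count = 0
    lock = threading.Lock()

    def worker():
        nonlocal count
        while True:
            current = pending.get()
            if current is None:   # sentinel: no more work
                pending.task_done()
                return
            seen = 0
            try:
                with os.scandir(current) as entries:
                    for entry in entries:
                        seen += 1
                        if entry.is_dir(follow_symlinks=False):
                            pending.put(entry.path)
            except OSError:
                pass              # unreadable directory: skip it
            with lock:
                count += seen
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    pending.join()                # wait until every queued directory is done
    for _ in threads:
        pending.put(None)         # tell each worker to exit
    for t in threads:
        t.join()
    return count
```

Timing `threaded_scan(path, 1)` against `threaded_scan(path, 4)` on the same tree, once with a cold cache and once warm, would be one way to see whether the extra threads help on a given disk.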
Thirdly, the problem of the OS sitting idle becomes even more pronounced when you do an over-the-network scan.-apankrat (December 19, 2012, 03:50 PM)
--- End quote ---
That's a part I'm fully convinced you're right, without seeing benchmarks :) - there's indeed quite some latency even on a LAN, and the SMB/CIFS protocol sucks.
I got the point re: marketing speak though. I will try and back it up with the graphs :)-apankrat (December 19, 2012, 03:50 PM)
--- End quote ---
Please also change the wording, though :P - even with graphs, the sentence is still suspicious. I'm too tired at the moment to come up with something better that isn't going to confuse normal people, though :)
With regards to the MFT/USN - I really don't want to descend to that level. I considered using USN, for example, for move detection and it is - basically - a support hell. As much as I love troubleshooting NTFS nuances, this is just not my cup of tea.-apankrat (December 19, 2012, 03:50 PM)
--- End quote ---
It's a nastily low level to be operating at - and it definitely shouldn't be the only scanning method available, since it might break anytime in the future. I'm also not sure MFT scanning is the best fit for a backup program, it's my understanding you pretty much have to read it in its entirety (possibly lots of memory use, constructing a larger in-memory graph than necessary, or spending CPU on pruning items you're not interested in?) - but g'darnit it's fast. WizTree can scan my entire source partition in a fraction of the time it takes to traverse just part of it via API calls...
USN is tricky to get right, and I haven't had time to play enough with it myself. But IMHO the speed benefits should make it worth it. Without USN parsing, after (re)starting the backup program, you have to do a complete traversal of all paths in the backup set. It's quite a lot faster simply scanning the USN logs and picking up changes - but yes, complex.
---
What about Symlinks, Hardlinks and Junctions? Do you handle those correctly, and have they given you much headache? :)
apankrat:
Hm, you might have a point wrt. warm cache querying - but have you tested the code across several OSes, especially pre-Vista? That's when Microsoft started doing a lot of work on lock-free data structures and algorithms in the kernel. Have you tested on XP and below?-f0dder (December 19, 2012, 06:01 PM)
--- End quote ---
I tested on XP, but I foolishly lent my copy of Win 3.11 to someone so I'm afraid it's going to stay just the XP for now.
I tried depth- vs. breadth--apankrat (December 19, 2012, 03:50 PM)
--- End quote ---
Consistent 10% difference :)
Got any benchmark code you're willing to share? I'd be interested in trying it out on my own system, I'm afraid I didn't keep the stuff I wrote back then (and there were no threaded versions anyway).
--- End quote ---
Will do in a bit. I assume the command-line version is OK?
What about Symlinks, Hardlinks and Junctions? Do you handle those correctly, and have they given you much headache? :)
--- End quote ---
You bet they did, but I'd like to think I have them sorted out. See here.
I skipped the hardlinks though. That'd be chasing a very far end of the tail of the demand curve, I just don't have time for this now.
f0dder:
I tested on XP, but I foolishly lent my copy of Win 3.11 to someone so I'm afraid it's going to stay just the XP for now.-apankrat (December 20, 2012, 01:36 PM)
--- End quote ---
*big grin* - good luck running win3.x on modern hardware, too :P. Except for fSekrit, I personally wouldn't bother supporting anything lower than XP these days. But there's still a fair amount of people on that system, for various reasons... if writing LEAN_AND_MEAN software, there's a fair amount of people who'll appreciate XP support :)
I tried depth- vs. breadth--f0dder (December 19, 2012, 06:01 PM)
--- End quote ---
Consistent 10% difference :)-apankrat (December 19, 2012, 03:50 PM)
--- End quote ---
Hmm! Also with a single-threaded scan? I really didn't see any noticeable performance difference, which confused me - I would have supposed breadth-first to be faster (unless my hazy overview of MFT is wrong). Perhaps I simply got the code wrong?
Got any benchmark code you're willing to share? I'd be interested in trying it out on my own system, I'm afraid I didn't keep the stuff I wrote back then (and there were no threaded versions anyway).-f0dder (December 19, 2012, 06:01 PM)
--- End quote ---
Will do in a bit. I assume the command-line version is OK?-apankrat (December 19, 2012, 03:50 PM)
--- End quote ---
Sure thing. Would be nice with some source as well, but I can understand if it'll be too time-consuming to remove dependencies on code you want to keep private :)
What about Symlinks, Hardlinks and Junctions? Do you handle those correctly, and have they given you much headache? :)-f0dder (December 19, 2012, 06:01 PM)
--- End quote ---
You bet they did, but I'd like to think I have them sorted out. See here. I skipped the hardlinks though. That'd be chasing a very far end of the tail of the demand curve, I just don't have time for this now.-apankrat (December 19, 2012, 03:50 PM)
--- End quote ---
Seems like a sane enough scheme to handle things. I'm not sure there's a one-size-fits-all solution for this, anyway - and hardlink handling is another headache. Not sure if I'd prefer to have, say, block de-duplication handle it. Would probably be less processing time to have specific hardlink support, and would be needed for proper restore - ugh :). But that's obviously outside the scope of what Bvckup is designed for!
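To give an idea of what specific hardlink support would involve at scan time, here is a minimal sketch that groups paths pointing at the same underlying file. It assumes the platform reports a stable (`st_dev`, `st_ino`) pair for each file; the function name `find_hardlinks` is hypothetical, and restoring the links afterwards is a separate problem this does not touch.

```python
import os
import stat
from collections import defaultdict

def find_hardlinks(root):
    """Group file paths under root that refer to the same underlying file."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)   # lstat: do not follow symlinks
            except OSError:
                continue              # vanished or unreadable: skip it
            if not stat.S_ISREG(st.st_mode):
                continue              # only regular files are considered
            if st.st_nlink > 1:       # more than one name for this file
                groups[(st.st_dev, st.st_ino)].append(path)
    # Only report groups where at least two names live inside root.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

A file whose other links live outside the scanned tree has `st_nlink > 1` but forms a group of one, so it is dropped from the result.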
Bvvckup! :P-Deozaan (December 19, 2012, 04:21 PM)
--- End quote ---
I like that - reminds me of Gobliiins though in (sane) reverse :P
apankrat:
Thanks to those who submitted the error reports. There were two problems: one was the demo throwing an exception on older XP boxes, which turned out to be caused by incorrect API documentation, and the other was with UAC-less setups, where the program didn't realize it already had full admin rights. Both are fixed now, thanks a lot for helping. Please re-test at will if still interested.