
Bvckup 2


KynloStephen66515:
* Stephen66515 may have to review this, seeing as he did the review for the first version :P

f0dder:
Hey, just wanted to let you know I haven't forgotten this thread :)

Finally got around to setting up the testbox again, so it should be ready for some... testing :). Might install a bit more software on it, just to have a few more files - the more the merrier. Did end up with a Raptor disk in it, though... pondering whether I should use a somewhat slower disk, but I guess I'll run some tests first. Hopefully during the coming week :)

I've tried running a few cold-cache tests and got numbers that are wildly different - from 131 seconds to 70 to 24. This is on a box that was shut down, then powered on (with the network connection disabled, with the Windows Search and Windows Update services disabled, virtually no active tasks in Task Manager and no resident antivirus/malware apps... so theoretically there's nothing that would actively populate the disk index cache on boot). I will see what I've missed in a bit and re-run the tests.-apankrat (December 27, 2012, 12:10 PM)
--- End quote ---
Sounds weird that you got such discrepancies - did you manage to make it somewhat reliable? Obviously there's going to be some fluctuation, but even 70 vs 24 seconds sounds... interesting.

How are you doing the threading? A thread pool of workers that fetch a path from a queue and, while processing a path, add found directories to the queue and "do whatever" with files? (Guess it could be somewhat difficult controlling breadth vs depth scanning reliably that way?) Or a different level of granularity?

With regards to getting a 3x speed-up on a warm cache - what's not to like? :) Arguably, this is the use case for real-time backups, with an operational profile consisting largely of frequent scans of a small number of selected locations.-apankrat (December 27, 2012, 12:10 PM)
--- End quote ---
Oh, it's not that it's not something to like, I was just thinking that as a user, the absolute time difference isn't that big - and it will be dwarfed by syncing even a few files. (I'm pretty sensitive to timing myself, so it's something I might notice - but I'm thinking regular users here).

As a programmer, I think it's a cool achievement, though... and it wasn't something I had expected - guess I should revisit that old benchmark code :). And I can't see much reason not to do the threading, at least when keeping the number of threads sane. I pondered a bit about the whole thing; there's some increase in CPU usage and thus SpeedStep activity, power consumption and heat (and potential noise from the CPU fan), but I don't think any of those are realistic concerns, given the short running time.

Then there's the additional memory consumption from threads - but when sticking with a sane number of threads, that shouldn't be a problem either, neither for user- nor kernel-mode memory.

So I don't see a reason to not do the threading, with the data given so far. And the 1.5x cold speedup is nice, hope I can duplicate that on my testbox! :)

apankrat:
Sounds weird that you got such discrepancies - did you manage to make it somewhat reliable? Obviously there's going to be some fluctuation, but even 70 vs 24 seconds sounds... interesting.-f0dder (January 13, 2013, 06:28 PM)
--- End quote ---

I did. I don't remember exactly what it was, but I disabled a few more things to prevent any voluminous disk activity on boot, and it helped. The numbers I posted afterwards are consistent across several reboots.

How are you doing the threading? A thread pool of workers that fetch a path from a queue and, while processing a path, add found directories to the queue and "do whatever" with files? (Guess it could be somewhat difficult controlling breadth vs depth scanning reliably that way?) Or a different level of granularity?
--- End quote ---

Yep, that's pretty much how it works, and indeed the depth/breadth order is not very strict, but it's not too far off either.
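To make that concrete, here's a minimal sketch of the scheme - purely my illustration, not bvckup2's actual code; the helper names and the pool size of 4 are made up, and error handling and reparse-point (junction) checks are omitted:

--- ---// Work-queue scanner sketch: workers pop a directory from a shared deque,
// enumerate it with FindFirstFile/FindNextFile, and push subdirectories back.
// push_front() approximates depth-first, push_back() breadth-first - neither
// is strict, since workers interleave, which matches the "not too far off"
// behavior described above.
#include <windows.h>
#include <cwchar>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

struct ScanQueue {
    std::deque<std::wstring> dirs;
    std::mutex lock;
    int active = 0;                    // workers currently inside a directory
};

static void worker(ScanQueue& q, bool breadth_first)
{
    for (;;) {
        std::wstring dir;
        {
            std::lock_guard<std::mutex> g(q.lock);
            if (q.dirs.empty()) {
                if (q.active == 0) return;   // no work left and none coming
            } else {
                dir = std::move(q.dirs.front());
                q.dirs.pop_front();
                q.active++;
            }
        }
        if (dir.empty()) { std::this_thread::yield(); continue; }

        WIN32_FIND_DATAW fd;
        HANDLE h = FindFirstFileW((dir + L"\\*").c_str(), &fd);
        if (h != INVALID_HANDLE_VALUE) {
            do {
                if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
                    if (wcscmp(fd.cFileName, L".") != 0 &&
                        wcscmp(fd.cFileName, L"..") != 0) {
                        std::lock_guard<std::mutex> g(q.lock);
                        if (breadth_first)
                            q.dirs.push_back(dir + L"\\" + fd.cFileName);
                        else
                            q.dirs.push_front(dir + L"\\" + fd.cFileName);
                    }
                } else {
                    // ... record the file entry for the backup engine here ...
                }
            } while (FindNextFileW(h, &fd));
            FindClose(h);
        }
        std::lock_guard<std::mutex> g(q.lock);
        q.active--;
    }
}

int wmain()
{
    ScanQueue q;
    q.dirs.push_back(L"e:");                     // scan root
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; i++)                  // pool size = the config knob
        pool.emplace_back(worker, std::ref(q), false /* depth-first-ish */);
    for (auto& t : pool) t.join();
    return 0;
}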

With regards to getting a 3x speed-up on a warm cache - what's not to like? :) Arguably, this is the use case for real-time backups, with an operational profile consisting largely of frequent scans of a small number of selected locations.-apankrat (December 27, 2012, 12:10 PM)
--- End quote ---
Oh, it's not that it's not something to like, I was just thinking that as a user, the absolute time difference isn't that big - and it will be dwarfed by syncing even a few files. (I'm pretty sensitive to timing myself, so it's something I might notice - but I'm thinking regular users here).
--- End quote ---

I agree, but the difference between something like 2 s and 500 ms is a significant one (from a usability perspective). The difference is going to be even bigger when scanning disks somewhat more populated than the one in my wife's laptop :)

As a programmer, I think it's a cool achievement, though... and it wasn't something I had expected - guess I should revisit that old benchmark code :). And I can't see much reason not to do the threading, at least when keeping the number of threads sane. I pondered a bit about the whole thing; there's some increase in CPU usage and thus SpeedStep activity, power consumption and heat (and potential noise from the CPU fan), but I don't think any of those are realistic concerns, given the short running time.

Then there's the additional memory consumption from threads - but when sticking with a sane number of threads, that shouldn't be a problem either, neither for user- nor kernel-mode memory.

So I don't see a reason to not do the threading, with the data given so far. And the 1.5x cold speedup is nice, hope I can duplicate that on my testbox! :)
--- End quote ---

Looking forward to seeing what you get. In the actual app there's a config option that controls the size of the thread pool. If threading is perceived as harmful, one can always turn it down or disable it altogether.

f0dder:
Right,

I actually got around to running some benchmarks last weekend, but got sidetracked and forgot to post anything :). So far I've only run warm-cache tests - for cold-cache, I really really really want to be able to automate the process. I want to collect a lot of data sets, but I'm way too lazy to manually do all the reboots necessary :-)

First, specs:
Testbox:
   ASUS P5K-VM
   Corsair XMS2 2GB DDR2 800MHz (2x1GB)
   Intel Core2 E6550 @ 2.33GHz
   Western Digital 3.5" 74GB Raptor

Workstation:
   ASUS P8Z77-V PRO
   Corsair 16GB DDR3 1600MHz (4x8GB)
   Intel Core i7 3770 Ivy Bridge
   INTEL SSD 520 Series 120GB
   Western Digital 2.5" 300GB VelociRaptor

For the workstation, I ran the test on the VelociRaptor, which is a big dump of all sorts of crap :). The testbox was freshly installed with Win7 x64 Enterprise, LibreOffice 3.6.4, Piriform Defraggler (didn't defrag it, though), Chrome, and all Windows Updates as of, well, last weekend. I furthermore copied ~33 gigs of FLAC music from my server to get some meat on the filesystem - there's ~2.3 gigs free. The Windows partition is only ~52 gigs, as I didn't want to nuke the Linux test install I had on the disk - so the Windows partition starts ~18 gigs into the disk. Furthermore, I've disabled the following services: Defrag, Superfetch, Windows Search (hopefully that turns off indexing?). Other than that, it's a pretty vanilla install; I even left the 2-gig pagefile in place.

Anyway, I started by running a warmup, then I generated output files by running the following quick hackjob batch file - it does 16 identical rounds, each sweeping 1 to 16 threads in both depth- and breadth-first order, so 512 runs in total. Oh, and it also starts each round with a single verbose run:
--- ---@echo off
FOR /L %%I IN (1,1,16) DO CALL :ONEROUND %%I
GOTO :EOF

:ONEROUND
SET OUTFILE=results-run-%1.txt
ECHO ********** ROUND %1, Verbose Stats for 4 threads
ECHO ********** ROUND %1, Verbose Stats for 4 threads > %OUTFILE%
bvckup2-demo2-x64.exe -t 4 -v e:\ >> %OUTFILE%

FOR /L %%I IN (1,1,16) DO CALL :ONEBENCH %1 %%I
GOTO :EOF

:ONEBENCH
ECHO ========== ROUND %1, Breadth, %2 Threads
ECHO ========== ROUND %1, Breadth, %2 Threads >> %OUTFILE%
bvckup2-demo2-x64.exe -q -t %2 --breadth-first e:\ >> %OUTFILE%

ECHO ========== ROUND %1, Depth, %2 Threads
ECHO ========== ROUND %1, Depth, %2 Threads >> %OUTFILE%
bvckup2-demo2-x64.exe -q -t %2 e:\ >> %OUTFILE%
GOTO :EOF


It would seem that the difference between depth- and breadth-first is pretty small for the warm-cache tests, and that there's not much to be gained from using more threads than CPU cores (makes sense for the warm-cache scenario). There doesn't seem to be much of a penalty to using more threads than cores, though - it just uses slightly more system resources.

I'm attaching a zip file with the raw output from the hackjob batch file, and pondering a decent way to visualize it. I guess the 16 consecutive runs should be processed into {min, max, avg, median} values - should be easy enough to do the processing, but how to handle the rendering? Some LibreOffice spreadsheet, some HTML + JavaScript charting? Got any good ideas? :)

Also, if I find a way to automate the cold-cache testing (suggestions would be very welcome!), I'll throw in stats from my old dualcore-with-SSD laptop.

apankrat:
Apologies for not replying sooner. I've been pushing out a major website redesign.

--

It would seem that the difference between depth- and breadth-first is pretty small for the warm-cache tests.
--- End quote ---

I was seeing a consistent 10-15% speed-up with depth-first. Your data seems to support this to a degree, too. Another thing was that NOT using FindExLargeFetch in the warm-cache scenario results in a 20-30% speed-up (from 1000 ms to 800 ms). I added a command-line argument to control this in more recent versions of bvckup2-demo2.exe.
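For reference, this is the Win32 knob in question - the flag's formal name is FIND_FIRST_EX_LARGE_FETCH, passed to FindFirstFileEx (Windows 7 / Server 2008 R2 and up). The open_scan() wrapper below is a made-up illustration, not the demo's actual source:

--- ---// Toggling the large-fetch hint on directory enumeration. The flag asks the
// OS to use larger buffers per directory read - per the numbers above, that
// helps cold scans but apparently hurts warm ones.
#include <windows.h>

HANDLE open_scan(const wchar_t* pattern, WIN32_FIND_DATAW* fd, bool large_fetch)
{
    return FindFirstFileExW(pattern,
                            FindExInfoBasic,          // skip short (8.3) names
                            fd,
                            FindExSearchNameMatch,
                            NULL,
                            large_fetch ? FIND_FIRST_EX_LARGE_FETCH : 0);
}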

I guess the 16 consecutive runs should be processed into {min,max,avg,mean} values - should be easy enough to do the processing, but how to handle the rendering?
--- End quote ---


I'd take the middle 80% of the samples (sort, then trim 10% off each end) and average those out - in other words, a 10% trimmed mean. Also, as a bonus check: if the standard deviation is high, then there's too much volatility and the sample is not representative. With regards to visualizing, let me see what I can do. JS + Canvas might be the simplest option.
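Something along these lines, as a quick sketch (trimmed_stats() is a made-up name; the 10% trim matches the suggestion above, and it assumes the timings are already parsed into a vector):

--- ---// Aggregating one cell of the results: sort, drop 10% of the samples at
// each end, average what's left (a 10% trimmed mean), and compute the
// standard deviation of the trimmed set as the volatility check.
#include <algorithm>
#include <cmath>
#include <vector>

struct Stats { double mean; double stddev; };

Stats trimmed_stats(std::vector<double> v)
{
    std::sort(v.begin(), v.end());
    size_t cut = v.size() / 10;                   // 10% off each end
    std::vector<double> mid(v.begin() + cut, v.end() - cut);
    if (mid.empty())
        return Stats{ 0, 0 };

    double sum = 0;
    for (double x : mid) sum += x;
    double mean = sum / mid.size();

    double var = 0;                               // population variance -
    for (double x : mid)                          // good enough for a quick
        var += (x - mean) * (x - mean);           // "too volatile?" check
    return Stats{ mean, std::sqrt(var / mid.size()) };
}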

Also, if I find a way to automate the cold-cache testing (suggestions would be very welcome!)
--- End quote ---

I think the key to unlocking this problem is called shutdown.exe :)
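Roughly like this, as a sketch: after one cold-cache run completes, re-arm the harness via the per-user RunOnce key and reboot through shutdown.exe. This assumes auto-logon is enabled so the box comes back to a desktop unattended, and that the session can reboot the machine; "ColdCacheBench" and the script path are invented names:

--- ---// Reboot-per-sample automation sketch. RunOnce entries execute once at next
// logon and are then deleted, so a real harness would also check a run
// counter (e.g. in a file) before re-arming, so the loop eventually stops.
#include <windows.h>
#include <stdlib.h>
#include <wchar.h>

#pragma comment(lib, "advapi32.lib")

static bool arm_next_run(void)
{
    const wchar_t* cmd = L"cmd /c c:\\bench\\coldcache.cmd";    // hypothetical
    return RegSetKeyValueW(HKEY_CURRENT_USER,
                           L"Software\\Microsoft\\Windows\\CurrentVersion\\RunOnce",
                           L"ColdCacheBench",
                           REG_SZ,
                           cmd,
                           (DWORD)((wcslen(cmd) + 1) * sizeof(wchar_t)))
           == ERROR_SUCCESS;
}

int wmain()
{
    if (arm_next_run())
        _wsystem(L"shutdown.exe /r /t 5");   // /r = reboot, 5 s grace period
    return 0;
}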
