Topic: Bvckup 2  (Read 59786 times)

KynloStephen66515

  • Animated Giffer in Chief
  • Honorary Member
  • Joined in 2010
  • **
  • Posts: 3,741
Re: Bvckup 2
« Reply #50 on: December 28, 2012, 03:00 PM »
* Stephen66515 may have to review this, seeing as he did the review for the first version :P

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 9,153
  • [Well, THAT escalated quickly!]
    • f0dder's place
Re: Bvckup 2
« Reply #51 on: January 13, 2013, 06:28 PM »
Hey, just wanted to let you know I haven't forgotten this thread :)

Finally got around to setting up the testbox again, so it should be ready for some... testing :). Might install a bit more software on it, just to have a few more files - the more the merrier. Did end up with a Raptor disk in it, though... pondering whether I should use a somewhat slower disk, but I guess I'll run some tests first. Hopefully during the coming week :)

I've tried running a few cold-cache tests and got numbers that are wildly different - from 131 seconds to 70 to 24. This is on a box that was shut down, then powered on (with network connection disabled, with Windows Search and Windows Update services disabled, virtually no active tasks in the Task Manager and no resident antivirus/malware apps... so theoretically there's nothing that would actively populate disk index cache on boot). I will see what I've missed in a bit and re-run the tests.
Sounds weird that you got such discrepancies - did you manage to make it somewhat reliable? Obviously there's going to be some fluctuation, but even 74 vs 24 seconds sounds... interesting.

How are you doing the threading? Threadpool of workers that fetch a path from a queue, and while processing a path add found directories to the queue and "do whatever" with files? (Guess it could be somewhat difficult controlling breadth vs depth scanning reliably that way?) Or a different level of granularity?

With regards to getting 3x speed up on a warm cache - what's not to like? :) Arguably, this is the use case for real-time backups, with an operational profile consisting largely of frequent scans of a small number of selected locations.
Oh, it's not that it's not something to like, I was just thinking that as a user, the absolute time difference isn't that big - and it will be dwarfed by syncing even a few files. (I'm pretty sensitive to timing myself, so it's something I might notice - but I'm thinking regular users here).

As a programmer, I think it's a cool achievement, though... and it wasn't something I had expected - guess I should revisit that old benchmark code :). And I can't see much reason not to do the threading, at least when keeping the number of threads sane. I pondered a bit about the whole thing; there's some increase in CPU usage and thus speedstep, power consumption and heat (and potential noise from CPU fan), but I don't think any of those are realistic concerns, given the short running time.

Then there's additional memory consumption because of threads - but when sticking with a sane number of threads, that shouldn't be a problem either, for user- or kernel-mode memory.

So I don't see a reason to not do the threading, with the data given so far. And the 1.5x cold speedup is nice, hope I can duplicate that on my testbox! :)
- carpe noctem

apankrat

  • Supporting Member
  • Joined in 2010
  • **
  • Posts: 155
    • swapped.cc
Re: Bvckup 2
« Reply #52 on: January 14, 2013, 03:23 AM »
Sounds weird that you got such discrepancies - did you manage to make it somewhat reliable? Obviously there's going to be some fluctuation, but even 74 vs 24 seconds sounds... interesting.

I did. Don't remember what exactly it was, but I disabled a few more things to prevent any voluminous disk activity on boot and it helped. The numbers I posted afterwards are consistent across several reboots.

How are you doing the threading? Threadpool of workers that fetch a path from a queue, and while processing a path add found directories to the queue and "do whatever" with files? (Guess it could be somewhat difficult controlling breadth vs depth scanning reliably that way?) Or a different level of granularity?

Yep, that's pretty much how it works, and indeed the depth/breadth order is not very strict, but it's not too far off either.
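
For anyone who wants to play with the idea, here is a minimal sketch of the queue-plus-worker-pool scan described above. It is written in Python with `os.scandir` purely for illustration and is not the Bvckup 2 code; `scan_tree`, `num_threads` and `depth_first` are made-up names. Swapping the FIFO queue for a LIFO one is assumed here as a cheap way to bias the order towards depth-first, which also shows why the ordering is only approximate once several workers pull from the same queue.

```python
import os
import queue
import threading


def scan_tree(root, num_threads=4, depth_first=False):
    """Return all file paths under root, scanned by a pool of worker threads."""
    # LIFO pops the most recently found directory first (roughly depth-first);
    # FIFO works through directories in discovery order (roughly breadth-first).
    pending = queue.LifoQueue() if depth_first else queue.Queue()
    pending.put(root)
    files = []
    files_lock = threading.Lock()

    def worker():
        while True:
            path = pending.get()
            if path is None:          # sentinel: no more work
                pending.task_done()
                return
            try:
                with os.scandir(path) as entries:
                    for entry in entries:
                        if entry.is_dir(follow_symlinks=False):
                            pending.put(entry.path)   # queue subdirectory
                        else:
                            with files_lock:          # "do whatever" with files
                                files.append(entry.path)
            except OSError:
                pass                  # unreadable directory: skip it
            finally:
                pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    pending.join()                    # returns once every queued path is done
    for _ in threads:
        pending.put(None)             # wake each worker so it can exit
    for t in threads:
        t.join()
    return files
```

Subdirectories are queued before the parent is marked done, so `pending.join()` cannot return while work is still outstanding.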

With regards to getting 3x speed up on a warm cache - what's not to like? :) Arguably, this is the use case for real-time backups, with an operational profile consisting largely of frequent scans of a small number of selected locations.
Oh, it's not that it's not something to like, I was just thinking that as a user, the absolute time difference isn't that big - and it will be dwarfed by syncing even a few files. (I'm pretty sensitive to timing myself, so it's something I might notice - but I'm thinking regular users here).

I agree, but the difference between something like 2s and 500 ms is a significant one (from the usability perspective). The difference is going to be even bigger when scanning somewhat more populated disks than my wife's laptop :)

As a programmer, I think it's a cool achievement, though... and it wasn't something I had expected - guess I should revisit that old benchmark code :). And I can't see much reason not to do the threading, at least when keeping the number of threads sane. I pondered a bit about the whole thing; there's some increase in CPU usage and thus speedstep, power consumption and heat (and potential noise from CPU fan), but I don't think any of those are realistic concerns, given the short running time.

Then there's additional memory consumption because of threads - but when sticking with a sane number of threads, that shouldn't be a problem either, for user- or kernel-mode memory.

So I don't see a reason to not do the threading, with the data given so far. And the 1.5x cold speedup is nice, hope I can duplicate that on my testbox! :)

Looking forward to seeing what you get. In the actual app there's a config option that controls the size of the thread pool. If threading is perceived as harmful, one can always turn it down or disable it altogether.
Alex

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 9,153
  • [Well, THAT escalated quickly!]
    • f0dder's place
Re: Bvckup 2
« Reply #53 on: January 19, 2013, 02:49 PM »
Right,

I actually got around to running some benchmarks last weekend, but got sidetracked and forgot to post anything :). So far I've only run warm-cache tests - for cold-cache, I really really really want to be able to automate the process. I want to collect a lot of data sets, but I'm way too lazy to manually do all the reboots necessary :-)

First, specs:
Testbox:
   ASUS P5K-VM
   Corsair XMS2 2GB DDR2 800MHz (2x1GB)
   Intel Core2 E6550 @ 2.33GHz
   Western Digital 3.5" 74GB Raptor

Workstation:
   ASUS P8Z77-V PRO
   Corsair 16GB DDR3 1600MHz (4x8GB)
   Intel Core i7 3770 Ivy Bridge
   INTEL SSD 520 Series 120GB
   Western Digital 2.5" 300GB VelociRaptor

For the workstation, I ran the test on the VelociRaptor which is a big dump of all sorts of crap :). The testbox was freshly installed with Win7-x64 enterprise, LibreOffice 3.6.4, Piriform Defraggler (didn't defrag it, though), Chrome, and all Windows Updates as of, well, last weekend. I furthermore copied some ~33gig of FLAC music from my server to get some meat on the filesystem - there's ~2.3gig free. The Windows partition is only ~52gig, as I didn't want to nuke the Linux test install I had on the disk - so the Windows partition starts ~18gig into the disk. Furthermore, I've disabled the following services: Defrag, Superfetch, Windows Search (hopefully turns off indexing?). Other than that, it's a pretty vanilla install, I even left the 2gig pagefile in place.

Anyway, I started by running a warmup, then I generated output files by running the following quick hackjob batch file - it does 16 identical passes of 1 to 16 threads, and both depth and breadth - so 512 total runs. Oh, and it also starts each pass with a single verbose run:
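@echo off
REM 16 rounds; each round sweeps 1..16 threads in both scan orders.
FOR /L %%I IN (1,1,16) DO CALL :ONEROUND %%I
GOTO :EOF

REM One round (%1 = round number): a single verbose 4-thread run, then the sweep.
:ONEROUND
SET OUTFILE=results-run-%1.txt
ECHO ********** ROUND %1, Verbose Stats for 4 threads
REM A single ">" creates or truncates the per-round output file.
ECHO ********** ROUND %1, Verbose Stats for 4 threads > %OUTFILE%
bvckup2-demo2-x64.exe -t 4 -v e:\ >> %OUTFILE%

FOR /L %%I IN (1,1,16) DO CALL :ONEBENCH %1 %%I
GOTO :EOF

REM One benchmark pair (%1 = round, %2 = thread count): breadth-first, then depth-first.
:ONEBENCH
ECHO ========== ROUND %1, Breadth, %2 Threads
ECHO ========== ROUND %1, Breadth, %2 Threads >> %OUTFILE%
bvckup2-demo2-x64.exe -q -t %2 --breadth-first e:\ >> %OUTFILE%

REM Depth-first is the tool's default, so no extra switch is needed.
ECHO ========== ROUND %1, Depth, %2 Threads
ECHO ========== ROUND %1, Depth, %2 Threads >> %OUTFILE%
bvckup2-demo2-x64.exe -q -t %2 e:\ >> %OUTFILE%
GOTO :EOF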


It would seem that the difference between depth- and breadth-first is pretty small for the warm-cache tests, and that there's not much to be gained from using more threads than CPU cores (makes sense for the warm cache scenario). It doesn't seem like there's a lot of penalty to using more threads than cores, though - but it obviously uses slightly more system resources.

I'm attaching a zip file with the raw output from the hackjob batch file, and pondering a decent way to visualize it. I guess the 16 consecutive runs should be processed into {min,max,avg,mean} values - should be easy enough to do the processing, but how to handle the rendering? Some LibreOffice spreadsheet, some HTML + JavaScript charting? Got any good ideas? :)

Also, if I find a way to automate the cold-cache testing (suggestions would be very welcome!), I'll throw in stats from my old dualcore-with-SSD laptop.
- carpe noctem

apankrat

  • Supporting Member
  • Joined in 2010
  • **
  • Posts: 155
    • swapped.cc
Re: Bvckup 2
« Reply #54 on: January 28, 2013, 04:13 AM »
Apologies for not replying sooner. I was pushing out the major website redesign.

--

It would seem that the difference between depth- and breadth-first is pretty small for the warm-cache tests.

I was seeing a consistent 10-15% speedup with depth-first. Your data seems to support this to a degree too. Another thing was that NOT using FindExLargeFetch in the warm-cache scenario results in a 20-30% speedup (from 1000 ms to 800 ms). I added a command-line argument to control this in more recent versions of bvckup2-demo2.exe

I guess the 16 consecutive runs should be processed into {min,max,avg,mean} values - should be easy enough to do the processing, but how to handle the rendering?


I'd take the middle 80% of the samples (sort, then trim 10% on each end) and average them, i.e. a trimmed mean. Also, as a bonus, if the standard deviation is high, then there's too much volatility and so the sample is not representative. With regards to visualizing, let me see what I can do. JS + Canvas might be the simplest option.
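
A minimal sketch of that "sort, trim 10% on each end, average" step, in Python. The function name `summarize` is made up, and taking the standard deviation over the full, untrimmed sample is an assumption about what "volatility" should cover here.

```python
import statistics


def summarize(samples, trim=0.10):
    """Return (trimmed mean, population std dev of all samples)."""
    ordered = sorted(samples)
    k = int(len(ordered) * trim)          # samples to drop at each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    # The mean uses only the middle slice; the deviation deliberately uses
    # everything, so a wild outlier still shows up as high volatility.
    return statistics.mean(kept), statistics.pstdev(ordered)
```

With 16 timings per configuration, as in the batch runs above, `k` works out to 1, so the single fastest and single slowest run are dropped before averaging.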

Also, if I find a way to automate the cold-cache testing (suggestions would be very welcome!)

I think the key to unlocking this problem is called shutdown.exe :)

Alex

tomos

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 11,959
Re: Bvckup 2
« Reply #55 on: January 28, 2013, 06:27 AM »
Apologies for not replying sooner. I was pushing out the major website redesign.

very nice :Thmbsup: :D
Tom

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 9,153
  • [Well, THAT escalated quickly!]
    • f0dder's place
Re: Bvckup 2
« Reply #56 on: January 28, 2013, 03:41 PM »
Apologies for not replying sooner. I was pushing out the major website redesign.
No problem - I've been slow/distracted myself. Nice and simple web design - I personally find it slightly annoying that the 'features' text only fades in after the features text area has expanded, but that's a minor quibble :). Also, the outline and bgcolor-change (I think?) effect on your buttons is very subtle - having the browser on my secondary TFT, at first I wasn't sure if my eyes were playing tricks on me :P

Btw, "Cache-aware reading" - are there actually any APIs or IOCTLs you can use to determine whether a file is cached? Or are you simply using FILE_FLAG_NO_BUFFERING all the time? :)

Also, if I find a way to automate the cold-cache testing (suggestions would be very welcome!)
I think the key to unlocking this problem is called shutdown.exe :)
Well, I'd want something entirely automated - boot, auto-login and perform test, reboot, advance to next test et cetera. I'm not going to do 512 boots unless it's something 100% automated the machine can perform while I'm at work :P (sure, could probably be handled by being pretty creative with batch files, but... ugh.)
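
One way the boot / test / reboot / advance loop could be sketched, in Python. This assumes the script is set to run automatically at logon (auto-login plus a Startup entry or scheduled task, not shown) and that the benchmark switches are the ones from the batch file above. The state and result file names are hypothetical, as are the function names; `shutdown /r /t 0` is the stock Windows restart command.

```python
import json
import os
import subprocess

DEMO_EXE = "bvckup2-demo2-x64.exe"   # binary name taken from the batch file
TARGET = "e:\\"                      # scan location taken from the batch file


def plan_runs(rounds=16, max_threads=16):
    """List every benchmark command; each one is meant to get its own cold boot."""
    runs = []
    for _ in range(rounds):
        for threads in range(1, max_threads + 1):
            runs.append([DEMO_EXE, "-q", "-t", str(threads), "--breadth-first", TARGET])
            runs.append([DEMO_EXE, "-q", "-t", str(threads), TARGET])
    return runs


def run_next_and_reboot(state_path="coldcache-state.json",
                        out_path="coldcache-results.txt"):
    """Run the next pending benchmark, record it, then restart the machine."""
    runs = plan_runs()
    index = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            index = json.load(f)["next"]      # where the previous boot left off
    if index >= len(runs):
        return False                          # all runs completed
    result = subprocess.run(runs[index], capture_output=True, text=True)
    with open(out_path, "a") as out:
        out.write(" ".join(runs[index]) + "\n" + result.stdout)
    with open(state_path, "w") as f:
        json.dump({"next": index + 1}, f)     # advance before rebooting
    if index + 1 < len(runs):
        subprocess.run(["shutdown", "/r", "/t", "0"])
    return True
```

The plan comes to 512 commands for 16 rounds of 16 thread counts in two scan orders, matching the warm-cache batch run.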

Another thing was that NOT using FindExLargeFetch in warm-cache scenario results in 20-30% speed up (from 1000 ms to 800 ms). I added a command-line argument to control this in more recent versions of bvckup2-demo2.exe
Iirc I did the test with the version supporting this switch, but since it wasn't documented I didn't think to include it - what's it do?
- carpe noctem

apankrat

  • Supporting Member
  • Joined in 2010
  • **
  • Posts: 155
    • swapped.cc
Re: Bvckup 2
« Reply #57 on: January 29, 2013, 01:39 PM »
very nice :Thmbsup: :D
Thanks.
Alex

apankrat

  • Supporting Member
  • Joined in 2010
  • **
  • Posts: 155
    • swapped.cc
Re: Bvckup 2
« Reply #58 on: January 29, 2013, 01:46 PM »
Btw, "Cache-aware reading" - are there actually any APIs or IOCTLs you can use to determine whether a file is cached? Or are you simply using FILE_FLAG_NO_BUFFERING all the time? :)
-f0dder

Let me push the beta out first, I will elaborate then. It's a bit more complicated than FILE_FLAG_NO_BUFFERING, but there are no IOCTLs involved. If you think about it, you should be able to figure it out ;)

Iirc I did the test with the version supporting this switch, but since it wasn't documented I didn't think to include it - what's it do?
-f0dder

I should've updated the post with the -? dump, shouldn't I?

Syntax: bvckup2-demo2.exe [-t <threads>] [-v | -q] location-to-scan

  -t   number of threads to use
  -v   verbose, dump API timing profile
  -q   quiet, print only final timing

  --breadth-first    scan siblings then children
  --no-large-fetch   do NOT use FindExLargeFetch
  --no-info-basic    do NOT use FindExInfoBasic, use FindExInfoStandard instead

  Thread count defaults to the number of CPU cores if not specified.

The default is to scan depth first (children, then siblings), use FIND_FIRST_EX_LARGE_FETCH and ask only for FindExInfoBasic. The last three keys on the command-line list allow overriding these.
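
To make the switches concrete, here is a rough sketch of how they might map onto `FindFirstFileExW` arguments, using Python's `ctypes`. The mapping in `scan_options` is an assumption based only on the help text above, not on the demo's source; `scan_options` and `list_dir` are made-up names. The constant values are the ones from the Windows SDK headers, and `list_dir` only works on Windows (7 or later for the basic info level and the large-fetch flag).

```python
import ctypes

FIND_EX_INFO_STANDARD = 0        # FindExInfoStandard
FIND_EX_INFO_BASIC = 1           # FindExInfoBasic: no 8.3 short names
FIND_EX_SEARCH_NAME_MATCH = 0    # FindExSearchNameMatch
FIND_FIRST_EX_LARGE_FETCH = 0x2  # ask for a larger directory query buffer


class FILETIME(ctypes.Structure):
    _fields_ = [("dwLowDateTime", ctypes.c_uint32),
                ("dwHighDateTime", ctypes.c_uint32)]


class WIN32_FIND_DATAW(ctypes.Structure):
    _fields_ = [("dwFileAttributes", ctypes.c_uint32),
                ("ftCreationTime", FILETIME),
                ("ftLastAccessTime", FILETIME),
                ("ftLastWriteTime", FILETIME),
                ("nFileSizeHigh", ctypes.c_uint32),
                ("nFileSizeLow", ctypes.c_uint32),
                ("dwReserved0", ctypes.c_uint32),
                ("dwReserved1", ctypes.c_uint32),
                ("cFileName", ctypes.c_wchar * 260),
                ("cAlternateFileName", ctypes.c_wchar * 14)]


def scan_options(no_large_fetch=False, no_info_basic=False):
    """Translate the demo's two opt-out switches into (info level, flags)."""
    info_level = FIND_EX_INFO_STANDARD if no_info_basic else FIND_EX_INFO_BASIC
    flags = 0 if no_large_fetch else FIND_FIRST_EX_LARGE_FETCH
    return info_level, flags


def list_dir(path, info_level, flags):
    """Windows only: yield (name, attributes) for entries directly under path."""
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    find_data_p = ctypes.POINTER(WIN32_FIND_DATAW)
    k32.FindFirstFileExW.restype = ctypes.c_void_p
    k32.FindFirstFileExW.argtypes = [ctypes.c_wchar_p, ctypes.c_int, find_data_p,
                                     ctypes.c_int, ctypes.c_void_p, ctypes.c_uint32]
    k32.FindNextFileW.restype = ctypes.c_int
    k32.FindNextFileW.argtypes = [ctypes.c_void_p, find_data_p]
    k32.FindClose.argtypes = [ctypes.c_void_p]

    data = WIN32_FIND_DATAW()
    handle = k32.FindFirstFileExW(path + "\\*", info_level, ctypes.byref(data),
                                  FIND_EX_SEARCH_NAME_MATCH, None, flags)
    if handle is None or handle == ctypes.c_void_p(-1).value:  # INVALID_HANDLE_VALUE
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        while True:
            if data.cFileName not in (".", ".."):
                yield data.cFileName, data.dwFileAttributes
            if not k32.FindNextFileW(handle, ctypes.byref(data)):
                break
    finally:
        k32.FindClose(handle)
```

For example, `list_dir("e:", *scan_options(no_large_fetch=True))` would enumerate the top level of the drive with the basic info level and no large-fetch flag.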

Based on my experiments the fastest way to scan a warm cache is to first scan the location with FindExInfoStandard and then scan with FindExInfoBasic, without LargeFetch and depth first. In my case, it cut down the scanning time from ~1000 ms to ~700 ms, which is significant. But that's just for the warm cache. For the cold cache the above defaults appear to be the best.
Alex

apankrat

  • Supporting Member
  • Joined in 2010
  • **
  • Posts: 155
    • swapped.cc
Re: Bvckup 2
« Reply #59 on: May 24, 2013, 02:26 PM »
This DNF of backup software is going beta next week.

I thought I'd let you guys know as I would love a good teardown :)

Some work-in-progress screenshots and whatnots are over at http://bvckup2.com/wip
Alex

Nod5

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 1,169
Re: Bvckup 2
« Reply #60 on: June 07, 2013, 01:01 PM »
I can't find a link to the beta on your site

apankrat

  • Supporting Member
  • Joined in 2010
  • **
  • Posts: 155
    • swapped.cc
Re: Bvckup 2
« Reply #61 on: June 07, 2013, 03:37 PM »
It's in closed beta at the moment. PM me if you want in. If not, it should hit public beta in a couple of weeks.
Alex

apankrat

  • Supporting Member
  • Joined in 2010
  • **
  • Posts: 155
    • swapped.cc
Re: Bvckup 2
« Reply #62 on: September 27, 2015, 03:43 PM »
Also, if I find a way to automate the cold-cache testing (suggestions would be very welcome!), I'll throw in stats from my old dualcore-with-SSD laptop.

I found a way (a while ago, I just keep forgetting to update this thread).

This is done by flushing the "Standby list" of the Windows Memory Manager. For example, with Sysinternals' RAMMap:

[Attached screenshot: rammap.png]

It can also be done with NtSetSystemInformation from within the app if needed.
Alex

BGM

  • Honorary Member
  • Joined in 2008
  • **
  • Posts: 562
    • bgmCoder DC
Re: Bvckup 2
« Reply #63 on: October 26, 2015, 10:09 AM »
I just wanted to say that I use this program on my server to back up all my network's Windows workstations. It is wonderful!

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 9,153
  • [Well, THAT escalated quickly!]
    • f0dder's place
Re: Bvckup 2
« Reply #64 on: November 11, 2015, 02:37 AM »
Also, if I find a way to automate the cold-cache testing (suggestions would be very welcome!), I'll throw in stats from my old dualcore-with-SSD laptop.
I found a way (a while ago, I just keep forgetting to update this thread).

This is done by flushing the "Standby list" of the Windows Memory Manager. For example, with Sysinternals' RAMMap.

It can also be done with NtSetSystemInformation from within the app if needed.
Great, thanks!

Just tested on Win8.1 x64, and it indeed works. Interesting, because "back in the day" I did a bit of searching, and the consensus seemed to be that the read cache couldn't be flushed, even through NtSetSystemInformation. Any idea when this was introduced? Has it been there a while, but just overlooked? RAMMap apparently doesn't run on XP, so perhaps Vista+?

Also, seems you can flush cache for a single file by opening with FILE_FLAG_NO_BUFFERING and closing again, for all Windows versions. Here's a StackOverflow post with a bit of information :)
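
A small sketch of that open-and-close trick via Python's `ctypes`, for anyone who wants to try it. It is Windows only, `drop_file_from_cache` is a made-up name, and whether the cached pages really get dropped is the claim from the post above rather than something this snippet verifies. The constants are the standard `CreateFileW` values from the Windows SDK headers.

```python
import ctypes

GENERIC_READ = 0x80000000
FILE_SHARE_READ = 0x00000001
FILE_SHARE_WRITE = 0x00000002
OPEN_EXISTING = 3
FILE_FLAG_NO_BUFFERING = 0x20000000
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value


def drop_file_from_cache(path):
    """Windows only: open path unbuffered and close it again straight away."""
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.CreateFileW.restype = ctypes.c_void_p
    k32.CreateFileW.argtypes = [ctypes.c_wchar_p, ctypes.c_uint32, ctypes.c_uint32,
                                ctypes.c_void_p, ctypes.c_uint32, ctypes.c_uint32,
                                ctypes.c_void_p]
    k32.CloseHandle.argtypes = [ctypes.c_void_p]

    handle = k32.CreateFileW(path, GENERIC_READ,
                             FILE_SHARE_READ | FILE_SHARE_WRITE,
                             None, OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, None)
    if handle is None or handle == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())
    k32.CloseHandle(handle)   # no reads needed; the open/close is the whole trick
```

For a cold-cache benchmark over a whole tree this would have to be called per file, so flushing the standby list is probably the more practical route there.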
- carpe noctem