
Author Topic: Seeking experiences from people backing up relatively large personal data sets  (Read 14199 times)

JavaJones

  • Review 2.0 Designer
  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 2,739
I am currently dealing with some issues with CrashPlan, the combined online and local backup service I reviewed and selected last year for my personal backup needs: https://www.donation...ex.php?topic=26224.0

One of the problems I am seeing is really high memory use, 1.5-2GB for the backup process (running as a service) at peak. It starts out lower but climbs over the course of a day or two to about that level, then hangs there, presumably as a result of performing more complex operations on the large data set, e.g. encryption, deduplication, versioning, etc.
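A simple way to quantify that climb is to sample the backup service's resident memory over time. Here is a minimal sketch using Python's psutil; the process name is a guess, so check Task Manager for the name the CrashPlan service actually runs under:

```python
# Sample the resident memory of the backup service every 10 minutes so the
# climb can be logged and compared later. The process name below is a guess;
# check Task Manager for the name the CrashPlan service actually runs under.
import time
import psutil

TARGET = "crashplanservice.exe"      # assumed name

def resident_mb(name):
    total = 0
    for p in psutil.process_iter(["name", "memory_info"]):
        if (p.info["name"] or "").lower() == name:
            total += p.info["memory_info"].rss
    return total / 1024**2

while True:
    print(f"{time.strftime('%Y-%m-%d %H:%M:%S')}  {resident_mb(TARGET):,.0f} MB")
    time.sleep(600)
```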

Now until recently I've been reasonably happy with CrashPlan, but my confidence has definitely been shaken lately. I'm not seeking actual recommendations for other options just yet, but I'm starting the research process. A big part of that is trying to determine whether what I am experiencing is anywhere close to normal *considering my data backup needs*. It may simply be that I'm asking too much of the system and need to get more reasonable, hehe. So what I would love is to hear from other people who are doing fairly large backups to *online* systems, ideally with the following features/characteristics (or close to):

  • Data set at least 1TB, preferably around 2TB (my full data set is 1.9TB at present)
  • Number of files at least 1 million, ideally 1.5 million or more (I have 1.5 million files backed up at present)
  • Combined local and online backup (online backup is an important component; if you're only doing local, your info may be valuable, but it makes it not a direct comparison with CrashPlan)
  • Encryption (being done locally)
  • Deduplication being done on the backup set(s)
  • Continuous backup/file system monitoring (not a critical requirement, as I don't absolutely need the feature, but this is the way CrashPlan runs, so it would make for the most direct comparison; see the sketch just below this list)
  • File versioning
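For reference on the continuous-monitoring point above, this is roughly what that style of file-system watching looks like. A minimal sketch using the third-party watchdog package; the watched path is just an example:

```python
# Minimal continuous file-system monitoring in the CrashPlan style: watch a
# folder and queue changed files for backup instead of rescanning everything.
# Uses the third-party "watchdog" package; the path is just an example.
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class QueueForBackup(FileSystemEventHandler):
    def on_modified(self, event):
        if not event.is_directory:
            print(f"queue for backup: {event.src_path}")

    on_created = on_modified          # treat brand-new files the same way

observer = Observer()
observer.schedule(QueueForBackup(), r"D:\Photos", recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()
```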

The info I'm looking for is:
1: What software are you using?
2: How often/on what schedule does it run?
3: How much data are you backing up, both in terms of number of files and total size?
4: How much memory does the process (or processes) use at peak and on average?
5: How much CPU does the backup process use when actively backing up?

Hearing from other CrashPlan users with circumstances similar to mine would certainly be useful. It's very possible that the combination of data size, number of files, and features such as deduplication and file versioning simply makes such high memory use somewhat inevitable (or forces a much slower backup by paging out to disk a lot more). If so, then it's time for me to think about dropping some features, possibly versioning (or perhaps reducing the length of version history). But I won't know until I can get some reference points as to whether this seems normal under the circumstances. Trying a bunch of different backup systems myself seems fairly infeasible, as most would make me pay to upload more than a fraction of my data, and online backup is a critical component of this.

Any info you can provide on your experiences would be great. Thanks!

- Oshyan

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,291
  • Tell me something you don't know...
Not sure if my experience will be helpful, but here goes...

1: What software are you using,
Acronis True Image
FreeNAS - Dedicated NAS box (HP Microserver)

2: How often/on what schedule does it run,
Acronis runs in non-stop mode.

3: How much data are you backing up, both in terms of number of files, and total size,
Complete system. 128 GB SSD (going to bump this up to a 256 GB SSD when I have time to stick the drive in).

4: How much memory does the process (or processes) use at peak and on average,
Negligible. I never even notice it. So, I couldn't even tell you. I have 16 GB RAM in this box, so memory is rarely ever an issue.

5: How much CPU does the backup process use when actively backing up.
Again, never even notice it running. I have an AMD Phenom II X6 1090T CPU, which has a good amount of power.

Here's how my hardware stacks up in the WEI:

[Screenshot: Windows Experience Index scores]

Acronis backs up to a dedicated 2 TB external drive. This protects my system and any files I've not backed up to the FreeNAS box.

A large amount of my storage is on the FreeNAS box. It has 4 drives in RAID 5. (Not optimal, but whatever - it works.)

I periodically *MOVE* files from my system to the FreeNAS box. They have RAID redundancy there, and I also have the 2 TB backup for the entire system and any files I've not backed up to the FreeNAS.

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker

Jibz

  • Developer
  • Joined in 2005
  • ***
  • Posts: 1,187
I don't know much about the specifics of CrashPlan, but let's try a little back-of-the-envelope math :-*.

1.5 million files, an average path length of 64 (given Unicode paths), a 128-bit hash, date/time, attributes, and a little room for other bookkeeping -- let's say 128 bytes per file. That's around 200 MB for the file list.

While 128 bytes per file could be on the low side, this does not really look like the cause.

2 TB of data, and CrashPlan does data de-duplication at the block level. Let's guess 16k blocks, that's 128 million blocks. Presumably we need to store a hash and some kind of ID for each block -- let's say 16 bytes per block, that's 2 GB of data.

If the program holds all of that in memory while doing the backup, I guess that could explain the memory usage you are seeing.
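Spelling that estimate out (the block size and per-entry byte counts are guesses, not CrashPlan's actual internals):

```python
# Back-of-the-envelope estimate of the bookkeeping a deduplicating backup
# needs to keep in memory. Bytes-per-entry and block size are guesses,
# not CrashPlan's actual internals.
num_files = 1_500_000
bytes_per_file_entry = 128          # path, hash, timestamps, attributes, slack
file_list_mb = num_files * bytes_per_file_entry / 1024**2

data_bytes = 2 * 1024**4            # ~2 TB backup set
block_size = 16 * 1024              # assumed dedup block size
num_blocks = data_bytes // block_size            # ~128 million blocks
bytes_per_block_entry = 16          # hash + block ID
dedup_table_gb = num_blocks * bytes_per_block_entry / 1024**3

print(f"file list:   ~{file_list_mb:.0f} MB")    # ~183 MB
print(f"dedup index: ~{dedup_table_gb:.1f} GB")  # ~2.0 GB
```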

If you have all your data in one large backup set, you could try dividing it up into multiple sets (for instance, I have one set for my images, which is fairly large but only runs once a day, and a couple for other stuff like my user profile and work folder, which run every 15 minutes). But since it looks like the de-duplication is done across the entire machine, this might not help that much. It could be that data sets of the size you have are rare enough that they prioritize speed over memory usage.

JavaJones

  • Review 2.0 Designer
  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 2,739
Thanks, Renegade. Unfortunately, with that little data (relatively speaking), it's not a direct comparison. I also have a very beefy machine, actually a bit beefier than yours. ;) And I have 16GB of RAM. Most of the time I can "spare" 2GB for my backup process; it just seems like I shouldn't have to. However...

Jibz, I appreciate the angle you took, and it was something I was thinking about as well but didn't really know how to quantify. From your back-of-the-envelope calculations, the memory use could indeed be justifiable for deduplication. I'm kind of tempted to disable that, if I can, and see what happens. I do have 2 separate backup sets, one for photos (like you, though I haven't changed its backup frequency, and maybe I should) and one for everything else. The photos are by far the largest backup set, about two thirds of the data.

So, I'll try to tweak a few things, but would still love to hear some feedback from others with similar backup needs/scenarios, especially anyone using one of the other "unlimited" online backup services with 1+TB of data, e.g. Carbonite, Backblaze, etc.

- Oshyan

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,291
  • Tell me something you don't know...
Quote from: JavaJones
Thanks, Renegade. Unfortunately, with that little data (relatively speaking), it's not a direct comparison. I also have a very beefy machine, actually a bit beefier than yours. ;) And I have 16GB of RAM. Most of the time I can "spare" 2GB for my backup process; it just seems like I shouldn't have to. However...

Yeah, 128 GB isn't really a lot for backups. All the heavy lifting is done by the NAS. It's just easier for me to have RAID be the "backup" system. It's a very different approach to backups than having a software solution.
Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker

Shades

  • Member
  • Joined in 2006
  • **
  • Posts: 2,939
As far as I know, Windows doesn't allocate more than 2 GByte of RAM to any process, unless you have specially built executables (Large Address Aware) and boot your Windows system with some extra parameters.

Been there, done that, don't recommend these tricks to anyone unless they absolutely need them. Your system will become noticeably slower after a while and will keep deteriorating until it becomes too slow. Ah well, serves you right for bypassing the Windows memory manager.

Tested this setup once with Excel (2003 through 2010) in a scripting environment, and I was baffled by how (any version of) Excel is able to suck all the resources from an 8 GByte i7 so quickly.

JavaJones

  • Review 2.0 Designer
  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 2,739
I'm running a 64-bit version of Windows 7. Provided the application I'm running is 64-bit, it can allocate as much memory as I have available. I don't recall whether I'm running 64-bit Java (CrashPlan is programmed in Java, unfortunately), nor whether CrashPlan itself would need to be specifically programmed to take advantage of the 64-bit memory space or if simply running on a 64-bit JVM would do the trick (I'm guessing the latter). But in any case, my memory limit in the config files is 2048MB, and it's not going over that.
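One quick way to see which JVM the system "java" is (assuming the service uses it; CrashPlan may bundle its own JVM, in which case check that one instead). A small sketch:

```python
# Check whether the "java" on the PATH is a 64-bit JVM; a 32-bit JVM on Windows
# can't be given a heap anywhere near 2 GB, regardless of the app's own settings.
import subprocess

result = subprocess.run(["java", "-version"], capture_output=True, text=True)
banner = result.stderr or result.stdout      # java prints its version banner to stderr
print(banner.strip())
print("64-bit JVM" if "64-Bit" in banner else "looks like a 32-bit JVM")
```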

- Oshyan

TaoPhoenix

  • Supporting Member
  • Joined in 2011
  • **
  • Posts: 4,642
I just recently backed up about 150 gigs by hand, over a few days. I kept getting "file name too long" errors, and since it was low-importance data I just skipped those pieces. Do these programs copy "filename too long" files?

Shades

  • Member
  • Joined in 2006
  • **
  • Posts: 2,939
Nope, Windows won't allow any process to consume more than 2 GByte. Believe me, I brought a 64 GByte RAM computer that also has 64 processors to its knees because of that.

It doesn't matter if the OS and/or the application is 32-bit or 64-bit. The Windows memory manager won't let you... unless you step in and take over from it. See KB article 833721 for the startup parameter '/3GB'. Still, you are only allowed to consume a maximum of 3 GByte of RAM per process.

Only after enabling that startup parameter on a 64-bit OS and having a 64-bit compiled application can you go over the 3 GByte limit to reach a 4 GByte limit. Expect to run your system into the ground sooner rather than later, though.

As far as I know, both the 32-bit and 64-bit Windows memory managers see 4 GByte of RAM, even if your PC has less physical RAM. Whatever is not there will be delegated to the hard disk (swap file). 2 GByte of this RAM is for non-Windows processes only; the other 2 GByte is also open to Windows kernel processes.

With the /3GB parameter you limit the Windows kernel to just 1 GByte, and any non-Windows process is allowed to use 3 GByte. Neither you nor the Windows memory manager is going to be pleased with this. So, if you have an application that consumes 2 GByte, that application is clearly doing something so wrong it doesn't deserve a place on your hard disk in the first place.

Yes Excel, I'm looking at you...

4wd

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 5,644
Quote from: Shades
Only after enabling that startup parameter on a 64-bit OS and having a 64-bit compiled application can you go over the 3 GByte limit to reach a 4 GByte limit. Expect to run your system into the ground sooner rather than later, though.

As far as I know, both the 32-bit and 64-bit Windows memory managers see 4 GByte of RAM, even if your PC has less physical RAM. Whatever is not there will be delegated to the hard disk (swap file). 2 GByte of this RAM is for non-Windows processes only; the other 2 GByte is also open to Windows kernel processes.

With the /3GB parameter you limit the Windows kernel to just 1 GByte, and any non-Windows process is allowed to use 3 GByte. Neither you nor the Windows memory manager is going to be pleased with this. So, if you have an application that consumes 2 GByte, that application is clearly doing something so wrong it doesn't deserve a place on your hard disk in the first place.

From Comparison of 32-bit and 64-bit memory architecture for 64-bit editions of Windows XP and Windows Server 2003

System PTEs
A pool of system Page Table Entries (PTEs) that is used to map system pages such as I/O space, Kernel stacks, and memory descriptor lists. 64-bit programs use a 16-terabyte tuning model (8 terabytes User and 8 terabytes Kernel). 32-bit programs still use the 4-GB tuning model (2 GB User and 2 GB Kernel). This means that 32-bit processes that run on 64-bit versions of Windows run in a 4-GB tuning model (2 GB User and 2GB Kernel). 64-bit versions of Windows do not support the use of the /3GB switch in the boot options. Theoretically, a 64-bit pointer could address up to 16 exabytes. 64-bit versions of Windows have currently implemented up to 16 terabytes of address space.

From RAM allocation for applications in Windows 7 x64/ Where is the 3GB switch for x86 apps?

Memory allocation is set automatically.

If a 32-bit application is compiled with the IMAGE_FILE_LARGE_ADDRESS_AWARE switch set it is allocated a 4GB address space in 64-bit Windows. If not, it is allocated 2GB.

For 64-bit applications, if IMAGE_FILE_LARGE_ADDRESS_AWARE is set when compiled - the default is set - it can use up to 8TB. If IMAGE_FILE_LARGE_ADDRESS_AWARE is cleared it can use up to 2GB.
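For reference, the flag quoted above can be inspected directly: dumpbin /headers from the Visual Studio tools lists it among the file characteristics, or a few lines of Python can read it from the PE header. A minimal sketch, using notepad.exe only as an example target:

```python
# Read the IMAGE_FILE_LARGE_ADDRESS_AWARE bit straight from an executable's
# PE/COFF header (the same thing the /LARGEADDRESSAWARE linker switch sets).
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020

def is_large_address_aware(path):
    with open(path, "rb") as f:
        header = f.read(4096)                       # PE headers sit at the front
    e_lfanew = struct.unpack_from("<I", header, 0x3C)[0]   # offset of "PE\0\0"
    if header[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # COFF file header follows the 4-byte signature; Characteristics is at offset 18.
    characteristics = struct.unpack_from("<H", header, e_lfanew + 4 + 18)[0]
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)

print(is_large_address_aware(r"C:\Windows\notepad.exe"))  # example target
```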

Shades

  • Member
  • Joined in 2006
  • **
  • Posts: 2,939
64-bit compilers set the large-address-aware flag by default. OK, that fills a gap in my knowledge. VS2010 and higher, most likely. The CodeGear RAD Studio (C++) version I have to work with doesn't even come with a 64-bit compiler, so all the limits still apply  :( 


superboyac

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 6,347
Well, I've been planning forever to build myself some kind of file server for backing up all my stuff. The total amount to be backed up may be anywhere from 4-10 TB, and I will be using double or triple redundancy. I've posted here all over the place about it. Eventually, when I figure it out, I will post a schematic diagram here of how the whole thing works.

Here's a thread about it with lots of good contributions:
https://www.donation...ex.php?topic=20801.0

JavaJones

  • Review 2.0 Designer
  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 2,739
As far as I know almost everything you just said about memory limits in modern Windows is wrong. :huh:

I *routinely* run applications using more than 4GB of RAM (never mind 2GB). I do *not* have the "/3GB switch" enabled (that was for old 32-bit Windows). 64-bit applications are not the ones that need the /LARGEADDRESSAWARE compile flag; it's for 32-bit apps that want to access more than 2GB (but no more than 4GB). On 32-bit Windows, 32-bit apps compiled with /LARGEADDRESSAWARE can use up to 3GB of memory *when the /3GB switch is enabled*. On 64-bit Windows, 32-bit apps using /LARGEADDRESSAWARE can use up to 4GB. 64-bit apps can use vastly more memory than 4GB.
http://blogs.msdn.co...05/06/01/423817.aspx
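A trivial way to see this for yourself, assuming enough free RAM/swap on the machine:

```python
# Quick sanity check: a 64-bit process is not bound by the old 2/3/4 GB limits.
# On a 64-bit Python this allocates a 5 GB buffer (given enough free RAM/swap);
# a 32-bit process would hit MemoryError long before that.
import ctypes

print(f"pointer size: {ctypes.sizeof(ctypes.c_void_p) * 8}-bit")
buf = bytearray(5 * 1024**3)        # 5 GB in one allocation
print(f"allocated {len(buf) / 1024**3:.1f} GB in a single process")
```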

Edit: I see 4wd essentially beat me to it. :D

- Oshyan

Shades

  • Member
  • Joined in 2006
  • **
  • Posts: 2,939
I don't run 64-bit because a) my compiler doesn't support it, and b) I was asked to by the maintainers of the 64-processor system I mentioned earlier. They run an optimized Windows 2008 Server 64-bit version with my 32-bit application.

So if I am wrong there, so be it, my bad. From the links that 4wd pointed me to, it states that 64-bit applications are compiled with the /LargeAddressAware setting enabled by default.

I would not be contesting your opinions the way I do if I hadn't seen it fail with my own eyes. Let me explain a bit.

The software I work on needs to do a lot of calculation and has to do it really fast; serious money is involved (buying, selling, auctioning, predicting and billing of energy usage on time) and it is vital for the continuity of the Dutch, German, Belgian and (a small part of the) British energy grid.

It runs as expected/desired in 32-bit environments using plain/basic Microsoft and Oracle APIs. With this software it is possible to create services that can each contain several scripted processes (yes, this software has its own scripting language) for the automatic generation and processing of EDINE, EDIGAS and EDIEL type messages through Exchange, POP3 or SOAP (XML), and the generation and processing of reports created in HTML, Excel, CSV, TXT, XML, etc.

Because of the huge datasets that this 64-processor system has to pull from and store into database schemas, this limit of 2 GByte per process is very, very real. And with my own eyes I saw that system f.ck up majorly because of that limit. And with the LargeAddressAware setting it still f.cked up, but now at 3 GByte. Hence my assumption that the same misery of 32-bit limitations was also implemented in 64-bit. I didn't bother to do the research, sue me.

However, according to the online documentation you and 4wd pointed me to, I was wrong, so thanks for filling in that gap.   

However, since most of the companies that make use of this software are (being forced into) upgrading to newer server OSs, more misery comes my way. There was even an error message from the 2008 R2 OS that it had lost the connection to the hard disk (assuming the disk had developed an error), because I was accessing a database on that disk about two hours into an onslaught of SOAP calls.

Needless to say, that 500 GByte database was lost, Oracle was not able to repair itself anymore, and one more weekend was spent re-uploading that sucker. Anyway, I needed to continue my test, so I pulled out my old W2003 Oracle server that contained an older version of that database and, guess what, it handled the onslaught with ease for 5 days straight.

To me, 64-bit OSs may be nice in theory; in practice they're job security for IT and not so much for serious work. Believe whatever kool-aid you like/read, I will do the same, and we will all be happier for it.  :)

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 9,153
  • [Well, THAT escalated quickly!]
Shades, LargeAddressAware does give you a full 4GB address space to play with on a 64-bit OS, but that address space has to be used for more than heap memory, not least DLLs. You say your app is huge and you're dealing with Oracle databases? There's likely a large chunk of address space eaten up right there.
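A rough way to see how much a process's mapped libraries take up (resident size only, so it understates the reserved address space). A sketch using psutil, inspecting its own process as the example:

```python
# Rough look at how much memory mapped DLLs/shared libraries occupy in a
# process (resident size only, so it understates the reserved address space).
import psutil

proc = psutil.Process()              # this process, as an example
lib_bytes = sum(m.rss for m in proc.memory_maps(grouped=True)
                if m.path.lower().endswith((".dll", ".so")))
print(f"resident memory from shared libraries: {lib_bytes / 1024**2:.1f} MB")
```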

Also: the company I work for has a bunch of hardcore Oracle DBAs, and seeing the stuff that's on our internal mailing list - bugs, license prices, the way whOracle deals with customers, and how fragile it all seems to be - I'm surprised anybody is willing to touch that product.
- carpe noctem

Shades

  • Member
  • Joined in 2006
  • **
  • Posts: 2,939
Most days I actually like the Oracle products (so, you see, I am delusional). But there are also days when I can curse them into hell because of some stupidity or other. You are definitely right about the business practices of Oracle; I think they actually believe that when you install their software you are "pimping" your PC.

At the time (about 10 years ago) there was a test done with this software and several databases (MSSQL 2000, Oracle 9, Firebird ?, PostgreSQL) and Oracle 9 came out as the best overall option. Never looked back since, to be honest, just kept upgrading Oracle.

Ath

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 3,629
Quote from: Shades
...just kept upgrading Oracle.

That's what they're hoping all their customers will do :tellme:

Sounds like the motto of a certain PC/OS maker that also sells quite a lot of smartphones, using a fruit for a logo... :-[

communityfair

  • Participant
  • Joined in 2012
  • *
  • Posts: 3
Great collection of information, I am glad to find your article. Keep sharing such type of information in future too.

JavaJones

  • Review 2.0 Designer
  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 2,739
I finally had to let go of CrashPlan, primarily because of the memory use issue. I am now using iDrive. Details if you're interested:
https://www.donation....msg381832#msg381832

- Oshyan