Seeking experiences from people backing up relatively large personal data sets


JavaJones:
I am currently dealing with some issues with CrashPlan, the combined online and local backup service I reviewed and selected last year for my personal backup needs: https://www.donationcoder.com/forum/index.php?topic=26224.0

One of the problems I am seeing is really high memory use: 1.5-2GB for the backup process (running as a service) at peak. It starts out lower but climbs over the course of a day or two to about that level, then stays there, presumably as a result of performing more complex operations on the large data set, e.g. encryption, deduplication, versioning, etc.

Now until recently I've been reasonably happy with CrashPlan, but my confidence has definitely been shaken lately. I'm not seeking actual recommendations for other options just yet, but I'm starting the research process. A big part of that is trying to determine whether what I am experiencing is anywhere close to normal *considering my data backup needs*. It may simply be that I'm asking too much of the system and need to get more reasonable, hehe. So what I would love is to hear from other people who are doing fairly large backups to *online* systems, ideally with the following features/characteristics (or close to):


* Data set at least 1TB, preferably around 2TB (my full data set is 1.9TB at present)
* Number of files at least 1 million, ideally 1.5 million or more (I have 1.5 million files backed up at present)
* Combined local and online backup (online backup is an important component; if you're only doing local backup, your info may still be valuable, but it won't be a direct comparison with CrashPlan)
* Encryption (being done locally)
* Deduplication being done on the backup set(s)
* Continuous backup/file system monitoring (not a critical requirement as I don't absolutely need the feature, but this is the way CrashPlan runs, so it makes for the most direct comparison)
* File versioning
The info I'm looking for is:

1: What software are you using,
2: How often/on what schedule does it run,
3: How much data are you backing up, both in terms of number of files, and total size,
4: How much memory does the process (or processes) use at peak and on average,
5: How much CPU does the backup process use when actively backing up.

Hearing from other CrashPlan users with similar circumstances to mine would certainly be useful. It's very possible that the combination of data size, number of files, and features such as deduplication and file versioning simply makes such high memory use somewhat inevitable (or forces a much slower backup by paging out to disk a lot more). If so, then it's time for me to think about dropping some features, possibly versioning (or perhaps reducing the length of the version history). But I won't know until I can get some reference points as to whether this seems normal under the circumstances. Trying a bunch of different backup systems myself seems fairly infeasible, as most would make me pay to upload more than a fraction of my data, and online backup is a critical component of this.

Any info you can provide on your experiences would be great. Thanks!

- Oshyan

Renegade:
Not sure if my experience will be helpful, but here goes...

1: What software are you using,
Acronis True Image
FreeNAS - Dedicated NAS box (HP Microserver)

2: How often/on what schedule does it run,
Acronis runs in non-stop mode.

3: How much data are you backing up, both in terms of number of files, and total size,
Complete system: 128 GB SSD (going to up this to a 256 GB SSD when I have time to stick the drive in).

4: How much memory does the process (or processes) use at peak and on average,
Negligible. I never even notice it. So, I couldn't even tell you. I have 16 GB RAM in this box, so memory is rarely ever an issue.

5: How much CPU does the backup process use when actively backing up.
Again, never even notice it running. I have an AMD Phenom II X6 1090T CPU, which has a good amount of power.

Here's how my hardware stacks up in the WEI (Windows Experience Index):

[WEI scores screenshot]

Acronis backs up to a dedicated 2 TB external drive. This protects my system and any files I've not backed up to the FreeNAS box.

A large amount of my storage is on the FreeNAS box. It has 4 drives in RAID 5. (Not optimal, but whatever - it works.)

I periodically *MOVE* files from my system to the FreeNAS box. They have RAID redundancy there, and I also have the 2 TB backup for the entire system and any files I've not backed up to the FreeNAS.

Jibz:
I don't know much about the specifics of CrashPlan, but let's try a little back-of-the-envelope math :-*.

1.5 million files, an average path length of 64 (given unicode paths), a 128-bit hash, date/time, attributes, and a little room for other bookkeeping -- let's say 128 bytes per file; that's around 200 MB for the file list.

While 128 bytes per file could be on the low side, this does not really look like the cause.

2 TB of data, and CrashPlan does de-duplication at the block level; let's guess 16 KB blocks, that's 128 million blocks. Presumably we need to store a hash and some kind of ID for each block -- let's say 16 bytes per block, that's 2 GB of data.
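For anyone who wants to poke at the numbers, here's the same estimate as a few lines of Python -- the 128 bytes per file, 16 KB block size, and 16 bytes per block are just my guesses above, not anything CrashPlan actually documents:

--- Code: ---
# Back-of-the-envelope memory estimate (all per-file and per-block sizes
# here are guesses, not CrashPlan's actual internal format).

FILES = 1_500_000          # files in the backup set
BYTES_PER_FILE = 128       # path + 128-bit hash + timestamps + attributes (guess)

DATA_BYTES = 2 * 1024**4   # ~2 TB of data
BLOCK_SIZE = 16 * 1024     # assumed de-duplication block size (16 KB)
BYTES_PER_BLOCK = 16       # hash + block ID per block (guess)

file_list_mb = FILES * BYTES_PER_FILE / 1024**2
num_blocks = DATA_BYTES // BLOCK_SIZE
block_index_gb = num_blocks * BYTES_PER_BLOCK / 1024**3

print(f"file list:   ~{file_list_mb:.0f} MB for {FILES:,} files")
print(f"block index: ~{block_index_gb:.1f} GB for {num_blocks:,} blocks")
# file list:   ~183 MB for 1,500,000 files
# block index: ~2.0 GB for 134,217,728 blocks
--- End code ---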

If the program holds all of that in memory while doing the backup, I guess that could explain the memory usage you are seeing.

If you have all your data in one large backup set, you could try dividing it up into multiple sets (for instance, I have one set for my images, which is fairly large but only runs once a day, and a couple for other stuff like my user profile and work folder, which run every 15 minutes). But since it looks like the de-duplication is done across the entire machine, this might not help that much. It could be that data sets of the size you have are rare enough that they prioritize speed over memory usage.

JavaJones:
Thanks Renegade. Unfortunately with that little data (relatively speaking), it's not a direct comparison. I also have a very beefy machine, actually a bit beefier than yours. ;) And have 16GB of RAM. Most of the time I can "spare" 2GB for my backup process; it just seems like I shouldn't have to. However...

Jibz, I appreciate the angle you took, and it was something I was thinking about as well but didn't really know how to quantify. From your "back of the napkin" calculations, the memory use could indeed be justifiable for deduplication. I'm kind of tempted to disable that if I can and see what happens. I do have two separate backup sets, one for photos (like you, though I haven't changed its backup frequency, and maybe I should), and one for everything else. The photos are by far the largest backup set, about two-thirds of the data.

So, I'll try to tweak a few things, but would still love to hear some feedback from others with similar backup needs/scenarios, especially anyone using one of the other "unlimited" online backup services with 1+TB of data, e.g. Carbonite, Backblaze, etc.

- Oshyan

Renegade:
--- Quote from: JavaJones on November 23, 2012, 02:12 AM ---
Thanks Renegade. Unfortunately with that little data (relatively speaking), it's not a direct comparison. I also have a very beefy machine, actually a bit beefier than yours. ;) And have 16GB of RAM. Most of the time I can "spare" 2GB for my backup process; it just seems like I shouldn't have to. However...
--- End quote ---

Yeah, 128 GB isn't really a lot for backups. All the heavy lifting is done by the NAS. It's just easier for me to have RAID be the "backup" system. It's a very different approach to backups than having a software solution.
