I have the following use case: 4TB of graphics design work (collections of high-res images, project files, etc.). Less than 1% of the data changes from week to week, and the data set also grows by less than 1% per week. I would like to use File backup to the Azure archive tier for this setup, and I am thinking that real-time backup makes the most sense. I am also interested in keeping only the last 3 versions of each file, and I need daily email reports indicating whether the backup passed or failed. What would be the most efficient way to set up this job? Would real-time backup create a huge performance hit on the server that houses this data? How will the real-time job be able to iterate over the entire data set 24/7 (or is my functional understanding of standard backup jobs misleading me here)? Any input would be greatly appreciated!
Most likely you'll notice a big performance hit during the scanning procedure, so it's better to set up a standard scheduled backup for this data set.
The first backup will take a lot of time to complete, but after that only the changed files will be uploaded to storage and everything should be fine.
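To give a rough idea of what that change detection looks like, here's a simplified Python sketch of a modified-time scan. This is just an illustration, not CloudBerry's actual code, and the path and cutoff time are made up:

```python
import os
from datetime import datetime, timedelta

# Hypothetical values, for illustration only
SOURCE_ROOT = r"D:\DesignWork"
last_backup_time = datetime.now() - timedelta(days=7)

changed = []
for dirpath, dirnames, filenames in os.walk(SOURCE_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        # Every file still has to be stat'd, even if almost nothing changed,
        # which is why the scan phase dominates on multi-terabyte data sets.
        mtime = datetime.fromtimestamp(os.path.getmtime(path))
        if mtime > last_backup_time:
            changed.append(path)

print(f"{len(changed)} files changed since {last_backup_time:%Y-%m-%d}")
```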
Ok, so the reason I was interested in real-time is that I have a similar client who is using File backup for a file server's data set. As you can see here, the scan still takes an enormously long time just to back up a few files! (The client has about 100Mbps upstream, so that is definitely not a bottleneck.) I was hoping for an alternative route to avoid this lengthy scan time for the new use case.
It's not a bandwidth problem. Scanning large quantities of files will naturally take a lot of time, and real-time backup would also be taxing for your system, which is why I would personally recommend a standard scheduled backup plan.
The other side effect you may run into relates to keeping only the last 3 versions: if you're in active editing sessions and saving files frequently, you may end up sending the same file to the cloud over and over with only small changes between versions. That means if you needed to go back to yesterday's edit, it may already be gone because of the retention settings. As Matt said, I think it's probably better to consider a fixed schedule, especially if the backups can run during off-hours when the user is off the PC and the broadband connection has less contention than it does during business hours.
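To make that retention risk concrete with made-up numbers, say a designer saves the same project file three times in one morning under a "keep last 3 versions" policy:

```python
# Hypothetical timeline for one frequently saved project file,
# under a "keep the last 3 versions" retention policy
versions = [
    "Mon 17:00",  # yesterday's final save
    "Tue 09:15",  # today's edits...
    "Tue 09:40",
    "Tue 10:05",
]

kept = versions[-3:]  # retention only keeps the newest three
print(kept)           # ['Tue 09:15', 'Tue 09:40', 'Tue 10:05']
# Monday's version is already gone, even though it's only a day old.
```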
Yeah, makes sense. How can I get the scan speed optimized? Today I enabled the Fast NTFS option on the client from the screenshot above to see how that improves the situation.
Let us know if that helps. It may depend on how many files you have in the folders you are backing up. The numbers you posted do not look healthy. Where are the files you're backing up stored? Local disk, NAS, file server?
Have you ever tested the read speed off the SAN? Not saying that's the issue, but I've seen poorly performing SANs many times in my history with customers. You could try copying a large file off the SAN to the NUL device from a command prompt to test the read speed.
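If the command prompt route is awkward, a quick Python script can give you a rough sequential read number as well. This is just a sketch, and the file path is a placeholder you'd point at something large on the SAN:

```python
import time

TEST_FILE = r"\\SAN-SHARE\projects\large_sample.tif"  # placeholder path
CHUNK = 8 * 1024 * 1024  # read in 8 MB blocks

start = time.perf_counter()
total = 0
with open(TEST_FILE, "rb") as f:
    while True:
        block = f.read(CHUNK)
        if not block:
            break
        total += len(block)
elapsed = time.perf_counter() - start

mb = total / 1_048_576
print(f"Read {mb:.0f} MB in {elapsed:.1f} s ({mb / elapsed:.0f} MB/s)")
```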
Hey guys, me again with the same goal of trying to squeeze every last drop of performance out of this use case. I am going with your recommendation of NOT using real-time backup but rather a scheduled plan. David, I don't have an easy way to test the SAN speed, but I honestly don't believe that is the bottleneck, as we are talking SAS disks in RAID 10 with VMware VMs. I would like to focus on some other possibilities.
I'm still getting the following results on this file server. As you can see, backing up 173 files totaling 1.16 MB on 4/27/2019 to a local NAS took 7 hours! Backing up 720 files totaling 91.28 MB to Azure took 13.5 hours.
So I increased the RAM and virtual cores provisioned to the file server. The VM now has 8GB RAM and 6 virtual cores. However, I am seeing that CloudBerry is still not using everything that's available, which points to a bottleneck somewhere. Can you take a look at these settings and let me know what you recommend?
Are we talking VMware virtual disks and you are using the CloudBerry agent installed on one or more of the virtual machines? If so, is this shared VM storage or dedicated virtual disks on the VMs themselves?
I think more importantly: have you opened a support case? If not, I'd encourage you to continue this conversation with the support team (reference this thread when contacting support). All we can do here is speculate (which will not be fun for you if you're looking for a quick answer).
The support team needs to understand the environment in more detail, examine the logs to see where the issues are located, and can then make CloudBerry configuration recommendations that might help or provide you with information if it turns out to be a non-CloudBerry issue.
In your screenshots, thread count can only help when there are CPU cycles available for compression and encryption and there is spare broadband / network capacity to the storage, and even then it requires that we can read data fast enough to feed the added threads. If data reads are slow, then adding threads will likely have no effect on backup speed, since thread count is not the bottleneck in that case.
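To put some rough, entirely made-up numbers on that bottleneck point:

```python
# Rough pipeline model: overall backup speed is bounded by the slowest stage.
# All figures below are invented, purely for illustration.
read_mb_s = 15       # how fast data comes off the source disk/SAN
compress_mb_s = 200  # CPU capacity for compression/encryption
upload_mb_s = 12     # upstream bandwidth (~100 Mbps)

print("Effective:", min(read_mb_s, compress_mb_s, upload_mb_s), "MB/s")

# More threads mainly scale the CPU stage, not the read stage:
compress_mb_s *= 2
print("With more threads:", min(read_mb_s, compress_mb_s, upload_mb_s), "MB/s")
```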
But I still recommend opening a support case first.
James, I'm having a similar issue, but with hundreds of thousands of small audio files. CloudBerry scans everything for changes, and it looks like it's going to take hours before uploading only a few MB of files. Any luck with your progress? Many thanks.
I was reading somewhere in another forum about v6.0.1.66 working better for the scanning and that the latest version, ending in .22, is no good. I'm trying to chase down that version and will let you know.
I'm also trying to figure out why, during a scheduled backup, it seems to scan through all the files sorted by name instead of by date modified. I understand CloudBerry Backup uses the date modified to know which files it needs to back up during a scheduled run, but if it sorted by date modified instead, wouldn't the scan actually go faster? I've got 290,000 small files (less than 1MB each) that CloudBerry has to scan through to get to the new stuff, for example when the new item to back up starts with the letter Z. When I tested by adding a new folder whose name starts with the letter A, the scheduled scan caught that new folder immediately, but then kept going through the rest of the files looking for new stuff. You guys seem to need logs to understand my issue better, but I don't know how logs would help when this looks like a fundamental problem with how the scheduled scan walks the backup source.
Eric, if scanning is the issue, the logs will help confirm that. Are you using the Fast NTFS Scan option? If not, you can try it. Sending the logs takes about a minute from the product (Tools - Diagnostic) - please send them for review.
@David Gugick
Done. I also accidentally hit stop on the first backup. It now needs to scan through 100,000 files before continuing with the backup. Pretty crazy...
I know this is an old thread, but I am trying the latest Cloudberry version before I completely give up on this software. I don't understand why it takes 7.5 hours to scan 650 GB of used space on my SSD when only 128 MB of files are discovered to be changed since the last scan.
If you want proof that this is ludicrous, RoboCopy (a tool that comes with Windows) can scan every file on my drive for changes in about 35 minutes (which I have done, because I keep a mirrored backup drive), and that is with it set to use only 1 thread!
I love the cloud functions, but Cloudberry has never been a serious contender because of the abysmal scan speeds.
I think you’d have to detail your settings for backup as a scan should not take that long. VSS options, Fast NTFS Scan, and anything else from your backup plan worth reporting and we’ll take a look. A support case would likely yield the fastest resolution (Tools - Diagnostic option).