npoIT
Influencer
Posts: 14
Liked: never
Joined: Jan 27, 2012 1:29 pm
Full Name: npoIT
Contact:

Windows share permission changes slows backup

Post by npoIT »

Hello -

We run reverse incremental backups with CBT enabled on some VMs with a number of volumes containing several TB of files. We haven't been able to nail down a clear pattern, but it is starting to seem that (at least sometimes) when someone modifies the security settings on a folder containing a large number of files (e.g. add a new user), our next incremental backup takes a very long time to process and actually creates a rather large incremental file. Does that make any sense to anyone? Has anyone seen any similar behavior? I have confirmed that there haven't been any significant changes in the files themselves (e.g. no one has added or changed a bunch of files).

Thanks.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Windows share permission changes slows backup

Post by tsightler »

Veeam doesn't back up files; it backs up changed blocks. If you have a very large directory structure and you change permissions, the ACL has to be propagated throughout the file tree, which changes a lot of blocks.
npoIT
Influencer
Posts: 14
Liked: never
Joined: Jan 27, 2012 1:29 pm
Full Name: npoIT
Contact:

Re: Windows share permission changes slows backup

Post by npoIT »

I understand that Veeam backs up blocks, not files. My question stems from the fact that I am under the impression that NTFS stores the ACLs in the MFT (Master File Table), not in the files themselves, so I would not expect permission changes to affect all that many blocks. It would seem that I have the wrong expectation, though, so I am wondering if anyone can explain this in more detail. Any ideas as to how to reduce this type of churn would also be appreciated.
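For a rough sense of scale, here is a back-of-envelope sketch of how much MFT data a permission change could touch. It assumes 1 KB NTFS file records (the usual default) and a hypothetical 1 MB backup block size; the file counts are illustrative only, and whether every record is actually rewritten depends on how the change is inherited. The function name `mft_changed_bytes` is made up for this example.

```python
# Back-of-envelope estimate of MFT churn caused by a permission change.
# Assumptions (not from the thread): 1 KB file records, 1 MB backup blocks.

MFT_RECORD_BYTES = 1024          # assumed NTFS file record size
BACKUP_BLOCK_BYTES = 1024 ** 2   # assumed block size used by the backup job


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def mft_changed_bytes(files_touched: int, total_files: int) -> tuple[int, int]:
    """Return (best_case, worst_case) bytes of changed backup blocks.

    Best case: the touched records are contiguous and pack densely into blocks.
    Worst case: each touched record sits in a different block, capped by the
    number of blocks the whole MFT occupies.
    """
    best_blocks = ceil_div(files_touched * MFT_RECORD_BYTES, BACKUP_BLOCK_BYTES)
    mft_blocks = ceil_div(total_files * MFT_RECORD_BYTES, BACKUP_BLOCK_BYTES)
    worst_blocks = min(files_touched, mft_blocks)
    return best_blocks * BACKUP_BLOCK_BYTES, worst_blocks * BACKUP_BLOCK_BYTES


best, worst = mft_changed_bytes(files_touched=1_000_000, total_files=1_000_000)
print(f"best case:  {best / 1024 ** 3:.2f} GiB")
print(f"worst case: {worst / 1024 ** 3:.2f} GiB")
```

Under these assumptions, rewriting the record of every one of a million files touches on the order of a gigabyte of MFT, so MFT record updates alone would not obviously account for a very large increment.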
chrisdearden
Veteran
Posts: 1531
Liked: 226 times
Joined: Jul 21, 2010 9:47 am
Full Name: Chris Dearden
Contact:

Re: Windows share permission changes slows backup

Post by chrisdearden »

Have you got shadow copies enabled on the folder?
npoIT
Influencer
Posts: 14
Liked: never
Joined: Jan 27, 2012 1:29 pm
Full Name: npoIT
Contact:

Re: Windows share permission changes slows backup

Post by npoIT »

No shadow copies.
npoIT
Influencer
Posts: 14
Liked: never
Joined: Jan 27, 2012 1:29 pm
Full Name: npoIT
Contact:

Re: Windows share permission changes slows backup

Post by npoIT »

I am still having trouble getting this VM backed up. I have some new information, but I don't know if it is relevant or not.

We have had this sort of problem in the past and it has been resolved by running another full backup. Given that, we think it is some sort of CBT issue. I browsed the datastore today and noticed that although the size of the *-ctk.vmdk file on all of my other volumes is exactly 8000.50 KB, on the problem volume it is currently 7782.90 KB. I don't know how big it normally is on that volume, but that seems odd. Perhaps for whatever reason it has become corrupt and that is why the backup throughput is so slow?

If the ctk file is corrupted, then even if I let it run to completion I would have to think at least the backup for that volume would be corrupted. This is a nearly 2TB volume in a job that backs up 10TB in 2 VMs containing a total of ~29 volumes. There is a lot of free space included in that 10TB, but running a full backup on the whole job still takes a long time. I searched the forum and the responses in this thread http://forums.veeam.com/viewtopic.php?f=24&t=9755 seem to indicate that perhaps one cannot just delete the ctk file (which is what VMware seems to suggest in at least some cases - supposedly it will just be rebuilt if it is missing).

I should probably re-run a full backup anyway, but before I do that I'd like to test whether deleting/resetting the ctk file and/or running a full just on that volume clears up this problem. Any thoughts would be appreciated. While it isn't impossible, taking this host offline for the enable/disable CBT procedure (as described in the forum post above) could impact a lot of people/operations (even off-hours), so I would like to avoid it if at all possible.
npoIT
Influencer
Posts: 14
Liked: never
Joined: Jan 27, 2012 1:29 pm
Full Name: npoIT
Contact:

Re: Windows share permission changes slows backup

Post by npoIT »

OK, now that I have had my coffee I see that this is a red herring. All of my other datastores are the same size but this one is slightly smaller and when I look at other VMs with other datastore sizes, the ctk file seems to be proportionally sized. Mea culpa.

I set up a new job for this one VM with just this one volume in it and I am doing a full backup now which is running at normal throughput rates. After that runs I will try the incremental job again and see if that improves. If not, I will run that as a full backup and see if that works properly. I will report back when I have the results.

I am still interested to know if/how the ctk file can be reset for one datastore though.
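The proportionality described above can be checked against the two ctk sizes quoted earlier in the thread. This is only a sketch: it treats the ctk size as scaling linearly with disk size (an assumption), and it uses the roughly 1.9 TB size of the problem disk that is mentioned in a later post.

```python
# Sanity check: does the change-tracking file scale with the size of the disk?
# The ctk sizes are the ones quoted in the thread. A linear relationship
# between ctk size and disk size is assumed, not confirmed.

CTK_OTHER_KB = 8000.50    # *-ctk.vmdk size on the other volumes
CTK_PROBLEM_KB = 7782.90  # *-ctk.vmdk size on the problem volume
PROBLEM_DISK_TB = 1.9     # approximate size of the problem disk

ratio = CTK_PROBLEM_KB / CTK_OTHER_KB
implied_other_disk_tb = PROBLEM_DISK_TB / ratio

print(f"ctk size ratio: {ratio:.4f}")
print(f"implied size of the other disks: {implied_other_disk_tb:.2f} TB")
```

A ratio just under 1 is what you would expect if the problem disk is simply a little smaller than the others, which fits the "red herring" conclusion.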
npoIT
Influencer
Posts: 14
Liked: never
Joined: Jan 27, 2012 1:29 pm
Full Name: npoIT
Contact:

Re: Windows share permission changes slows backup

Post by npoIT »

The full backup on this volume completed with normal throughput as expected. I then re-ran the incremental backup and, perhaps not surprisingly, it was still slow. I am in the process of running the full VM backup now and it is proceeding as expected.

So to restate my original post:

(A) Changing permissions on a large number of files is causing my incremental backups to run very slowly. Not only do they start running slowly, but they get progressively slower as the job continues and it seems that the job might never finish. It definitely looks like it wouldn't finish within a 24 hour backup window. The disk that is causing the problem is 1.9TB and it is about half empty. After running the incremental backup for 6+ hours it is only 4% done and it is processing at only 4 MB/s. A full backup of that volume runs at ~55 MB/s and completes within 6 1/2 hours. I don't understand why changing file permissions would cause this, but I would like to, so that I could try to avoid being forced to run a full backup in these situations.

(B) Whatever the cause of this problem might be, I would like to know if there is any relatively easy way to work around it for one specific disk within a job. That is, is there some way that I can force a full backup for just one volume within a job but have the remainder of the job run an incremental backup as usual? Could I just delete the *-ctk.vmdk file for that volume?

This could easily be worked around if there was just one particular volume that was sometimes problematic, but that is not the case. The only thing I can think of right now is to break this job up into one job per VM. I could break one VM into more than one job, but I'd rather not as a volume could be added to the VM without it being properly added to a backup job.
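The figures in (A) can be put side by side with a little arithmetic. This is a rough sketch only: it assumes decimal units (1 TB = 1,000,000 MB), reads the progress percentage as a fraction of the whole 1.9 TB disk, and reads the incremental rate as megabytes per second.

```python
# Rough arithmetic on the figures quoted in (A).
# Assumptions: decimal units, and "4% done" means 4% of the whole disk.

DISK_MB = 1.9 * 1_000_000    # 1.9 TB disk expressed in MB
SECONDS_PER_HOUR = 3600

# Incremental run: 4% processed after 6 hours.
elapsed_hours = 6
fraction_done = 0.04
observed_rate = fraction_done * DISK_MB / (elapsed_hours * SECONDS_PER_HOUR)
projected_total_hours = elapsed_hours / fraction_done  # if the rate held steady

# Full run: ~55 MB/s for 6.5 hours.
full_mb = 55 * 6.5 * SECONDS_PER_HOUR

print(f"incremental rate implied by progress: {observed_rate:.1f} MB/s")
print(f"projected incremental duration: {projected_total_hours:.0f} hours")
print(f"data moved by the full backup: {full_mb / 1_000_000:.2f} TB")
```

The implied rate of about 3.5 MB/s is in line with the reported throughput, and at that pace the incremental would need roughly 150 hours, far beyond a 24 hour window. Since the job was getting progressively slower, the real figure would be worse.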
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Windows share permission changes slows backup

Post by tsightler »

What do the bottleneck statistics show? This will give you a really good clue as to the actual problem.

You can always reset CBT on a VM by following the instructions in KB 1113.
npoIT
Influencer
Posts: 14
Liked: never
Joined: Jan 27, 2012 1:29 pm
Full Name: npoIT
Contact:

Re: Windows share permission changes slows backup

Post by npoIT »

tsightler wrote:What do the bottleneck statistics show? This will give you a really good clue as to the actual problem.

You can always reset CBT on a VM by following the instructions in KB 1113.
The bottleneck says target, but that seems unlikely. This is extremely slow for a SAN backup and the system is clearly capable of more throughput (as evidenced by the subsequent full backup). Only one job is running at a time. We are running reverse incrementals, so perhaps there is some housekeeping involved that is causing the problem (see below).

Thanks for the KB link, but that is what I am trying to avoid as it is difficult to shut down these VMs - even off hours. I've read at least one VMware KB item where they recommend resetting the CBT by deleting the file and they say it will just be recreated if it is missing (presumably from that point forward), but I suspect Veeam has internal CBT database entries that would be very unhappy if I did that.

We are now wondering if this might be an issue with the internal file database and not the VMware *-ctk.vmdk file. The volume in question literally has about a million files on it in many directories, so it is possible that the permission change caused a lot of overhead in the processing. Rather than just guess, I am going to go ahead and open a ticket, send the logs, and see what support can ascertain. We still have one job left that includes this volume and can still be used to demonstrate this issue. I will need to get that rerun over the coming weekend though, so our window of opportunity is relatively brief.
foggy
Veeam Software
Posts: 21069
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Windows share permission changes slows backup

Post by foggy »

npoIT wrote: (A) Changing permissions on a large number of files is causing my incremental backups to run very slowly. Not only do they start running slowly, but they get progressively slower as the job continues and it seems that the job might never finish. It definitely looks like it wouldn't finish within a 24 hour backup window. The disk that is causing the problem is 1.9TB and it is about half empty. After running the incremental backup for 6+ hours it is only 4% done and it is processing at only 4 MB/s. A full backup of that volume runs at ~55 MB/s and completes within 6 1/2 hours. I don't understand why changing file permissions would cause this, but I would like to, so that I could try to avoid being forced to run a full backup in these situations.
A large increment is expected if you use the "Propagate permissions to child objects" setting, as this changes a large number of files in all subfolders. Besides, NTFS reserves 12.5% of the disk space for the MFT by default. Changing permissions on a large number of files could also result in significant MFT growth. Add to this the reverse incremental backup mode, which requires 3x the number of I/O operations compared to a full backup due to random reads and writes, and you get a much slower backup. With a 100GB change rate per VM, you are likely to have the incremental job running longer than the full one.
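To illustrate the 3x figure, here is a minimal sketch of the I/O that lands on the backup target per changed block in each mode. The 1 MB block size is an assumption chosen for the example, and `target_io_ops` is a made-up helper, not part of any Veeam tooling.

```python
# I/O operations on the backup target per changed block, by backup mode.
# Reverse incremental: read the old block from the full backup file, write it
# to the rollback file, then write the new block into the full backup file.
# Forward incremental: write the new block to the increment file.
IO_PER_BLOCK = {"reverse_incremental": 3, "forward_incremental": 1}


def target_io_ops(changed_gb: float, block_mb: float, mode: str) -> int:
    """Estimate target-side I/O operations for a given amount of changed data."""
    changed_blocks = int(changed_gb * 1024 / block_mb)
    return changed_blocks * IO_PER_BLOCK[mode]


# 100 GB of changes with an assumed 1 MB block size.
for mode in IO_PER_BLOCK:
    print(f"{mode}: {target_io_ops(100, 1, mode)} I/O operations")
```

The reverse incremental operations are also largely random rather than sequential, so on spinning disks the slowdown can be larger than the raw 3x count suggests.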
npoIT
Influencer
Posts: 14
Liked: never
Joined: Jan 27, 2012 1:29 pm
Full Name: npoIT
Contact:

Re: Windows share permission changes slows backup

Post by npoIT »

Thanks for the reply. I did not realize that the MFT could be so large. It really doesn't seem like adding a permission for "this folder only" on a folder that only contains 3 subfolders should make a massive change to it though, but I can't be 100% certain that was all that changed as I did not make the change myself.

With that said, I have gone over this case with support and they agree that something about this seems very strange, but they haven't come up with any explanation so far. I do have one additional point of information to add. We had 2 jobs that showed these same symptoms. On one of them we reran a full and it performed as expected and has been running reverse incrementals just fine since then. On the other one we switched from reverse incremental to forward incremental with synthetic fulls, and it was able to back up the "problematic" volume in a reasonable amount of time and with good throughput. That strikes me as very odd, but perhaps it won't to someone who understands the details of how these options change the nature of the job. The support tech I am working with is looking into that and is going to get back to me.

We have decided forward incrementals with synthetic fulls are the best option for us anyway, so we will be switching over to them ASAP. I will be interested to see if this issue crops up again after we make that change or if the forward incrementals just don't behave the same way.
