-
- Influencer
- Posts: 14
- Liked: never
- Joined: Jan 27, 2012 1:29 pm
- Full Name: npoIT
- Contact:
Windows share permission changes slows backup
Hello -
We run reverse incremental backups with CBT enabled on some VMs with a number of volumes containing several TB of files. We haven't been able to nail down a clear pattern, but it is starting to seem that (at least sometimes) when someone modifies the security settings on a folder containing a large number of files (e.g. add a new user), our next incremental backup takes a very long time to process and actually creates a rather large incremental file. Does that make any sense to anyone? Has anyone seen any similar behavior? I have confirmed that there haven't been any significant changes in the files themselves (e.g. no one has added or changed a bunch of files).
Thanks.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Windows share permission changes slows backup
Veeam doesn't back up files; it backs up changed blocks. If you have a very large directory structure and you change permissions, the new ACL has to be propagated throughout the file tree, which changes a lot of blocks.
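To illustrate why block-level tracking can flag far more data than was actually written, here is a toy sketch. It is not how CBT or Veeam is implemented; the 256 KiB tracking granularity, the 1 KiB write size, and the function name `dirty_bytes` are assumptions chosen only for illustration. How much gets flagged depends on where the small writes land.

```python
def dirty_bytes(offsets, write_size, block_size):
    """Return the number of bytes flagged as changed when every block
    touched by a write is marked dirty as a whole."""
    dirty = set()
    for off in offsets:
        first = off // block_size
        last = (off + write_size - 1) // block_size
        dirty.update(range(first, last + 1))
    return len(dirty) * block_size


KIB = 1024
BLOCK = 256 * KIB   # hypothetical tracking granularity
WRITE = 1 * KIB     # hypothetical size of one small metadata update

# 1000 small updates spread 1 MiB apart: each one dirties its own block.
scattered = dirty_bytes([i * 1024 * KIB for i in range(1000)], WRITE, BLOCK)

# 1000 small updates packed back to back: they share a handful of blocks.
contiguous = dirty_bytes([i * WRITE for i in range(1000)], WRITE, BLOCK)

print(f"scattered:  {scattered / (1024 * KIB):.0f} MiB flagged")
print(f"contiguous: {contiguous / (1024 * KIB):.0f} MiB flagged")
```

In both cases only about 1 MiB is really written; the scattered layout is the one that inflates the increment.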
-
- Influencer
- Posts: 14
- Liked: never
- Joined: Jan 27, 2012 1:29 pm
- Full Name: npoIT
- Contact:
Re: Windows share permission changes slows backup
I understand that Veeam backs up blocks, not files. My question stems from my impression that NTFS stores ACLs in the MFT (Master File Table), not in the files themselves, so I would not expect permission changes to affect all that many blocks. It seems my expectation is wrong, though, so I am wondering if anyone can explain this in more detail. Any ideas on how to reduce this type of churn would also be appreciated.
-
- Veteran
- Posts: 1531
- Liked: 226 times
- Joined: Jul 21, 2010 9:47 am
- Full Name: Chris Dearden
- Contact:
Re: Windows share permission changes slows backup
Have you got shadow copies enabled on the folder?
-
- Influencer
- Posts: 14
- Liked: never
- Joined: Jan 27, 2012 1:29 pm
- Full Name: npoIT
- Contact:
Re: Windows share permission changes slows backup
No shadow copies.
-
- Influencer
- Posts: 14
- Liked: never
- Joined: Jan 27, 2012 1:29 pm
- Full Name: npoIT
- Contact:
Re: Windows share permission changes slows backup
I am still having trouble getting this VM backed up. I have some new information, but I don't know if it is relevant or not.
We have had this sort of problem in the past and it has been resolved by running another full backup. Given that, we think it is some sort of CBT issue. I browsed the datastore today and noticed that although the size of the *-ctk.vmdk file on all of my other volumes is exactly 8000.50 KB, on the problem volume it is currently 7782.90 KB. I don't know how big it normally is on that volume, but that seems odd. Perhaps for whatever reason it has become corrupt and that is why the backup throughput is so slow?
If the ctk file is corrupted, then even if I let the job run to completion, I would have to think that at least the backup for that volume would be corrupted. This is a nearly 2TB volume in a job that backs up 10TB in 2 VMs containing a total of ~29 volumes. There is a lot of free space included in that 10TB, but running a full backup on the whole job still takes a long time. I searched the forum and the responses in this thread http://forums.veeam.com/viewtopic.php?f=24&t=9755 seem to indicate that perhaps one cannot just delete the ctk file (which is what VMware seems to suggest in at least some cases - supposedly it will just be rebuilt if it is missing).
I should probably re-run a full backup anyway, but before I do that I'd like to test whether deleting/resetting the ctk file and/or running a full just on that volume clears up this problem. Any thoughts would be appreciated. While it isn't impossible, taking this host offline for the enable/disable CBT procedure (as described in the forum post above) could impact a lot of people/operations (even off-hours), so I would like to avoid it if at all possible.
-
- Influencer
- Posts: 14
- Liked: never
- Joined: Jan 27, 2012 1:29 pm
- Full Name: npoIT
- Contact:
Re: Windows share permission changes slows backup
OK, now that I have had my coffee I see that this is a red herring. All of my other datastores are the same size but this one is slightly smaller and when I look at other VMs with other datastore sizes, the ctk file seems to be proportionally sized. Mea culpa.
I set up a new job for this one VM with just this one volume in it and I am doing a full backup now which is running at normal throughput rates. After that runs I will try the incremental job again and see if that improves. If not, I will run that as a full backup and see if that works properly. I will report back when I have the results.
I am still interested to know if/how the ctk file can be reset for one datastore though.
-
- Influencer
- Posts: 14
- Liked: never
- Joined: Jan 27, 2012 1:29 pm
- Full Name: npoIT
- Contact:
Re: Windows share permission changes slows backup
The full backup on this volume completed with normal throughput as expected. I then re-ran the incremental backup and, perhaps not surprisingly, it was still slow. I am in the process of running the full vm backup now and it is proceeding as expected.
So to restate my original post:
(A) Changing permissions on a large number of files is causing my incremental backups to run very slowly. Not only do they start running slowly, but they get progressively slower as the job continues and it seems that the job might never finish. It definitely looks like it wouldn't finish within a 24 hour backup window. The disk that is causing the problem is 1.9TB and it is about half empty. After running the incremental backup for 6+ hours it is only 4% done and it is processing at only 4 Mb/s. A full backup of that volume runs at ~55 MB/s and completes within 6 1/2 hours. I don't understand why changing file permissions would cause this, but I would like to so that I could try to avoid being forced to run a full backup in these situations.
(B) Whatever the cause of this problem might be, I would like to know if there is any relatively easy way to work around it for one specific disk within a job. That is, is there some way that I can force a full backup for just one volume within a job but have the remainder of the job run an incremental backup as usual? Could I just delete the *-ctk.vmdk file for that volume?
This could easily be worked around if there was just one particular volume that was sometimes problematic, but that is not the case. The only thing I can think of right now is to break this job up into one job per VM. I could break one VM into more than one job, but I'd rather not as a volume could be added to the VM without it being properly added to a backup job.
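For a rough sense of scale, the figures in (A) can be extrapolated as below. This is only a back-of-the-envelope sketch: it assumes the incremental keeps its average pace so far, which is optimistic given that the job was getting slower as it ran. The variable names are illustrative only.

```python
# Figures taken from point (A) above.
elapsed_hours = 6.0     # incremental had been running for 6+ hours
fraction_done = 0.04    # job reported 4% complete
full_hours = 6.5        # duration of a full backup of the same volume
window_hours = 24.0     # backup window

# Linear extrapolation (assumption: constant pace for the rest of the job).
projected_hours = elapsed_hours / fraction_done

print(f"Projected incremental duration: {projected_hours:.0f} h")
print(f"Versus the backup window: {projected_hours / window_hours:.1f}x")
print(f"Versus the full backup: {projected_hours / full_hours:.0f}x")
```

Even under that optimistic assumption the incremental would need several backup windows, while the full backup fits in one.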
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Windows share permission changes slows backup
What do the bottleneck statistics show? This will give you a really good clue as to the actual problem.
You can always reset CBT on a VM by following the instructions in KB 1113.
-
- Influencer
- Posts: 14
- Liked: never
- Joined: Jan 27, 2012 1:29 pm
- Full Name: npoIT
- Contact:
Re: Windows share permission changes slows backup
tsightler wrote: What do the bottleneck statistics show? This will give you a really good clue as to the actual problem.
The bottleneck says target, but that seems unlikely. This is extremely slow for a SAN backup and the system is clearly capable of more throughput (as evidenced by the subsequent full backup). Only one job is running at a time. We are running reverse incrementals, so perhaps there is some housekeeping involved that is causing the problem (see below).
tsightler wrote: You can always reset CBT on a VM by following the instructions in KB 1113.
Thanks for the KB link, but that is what I am trying to avoid as it is difficult to shut down these VMs - even off hours. I've read at least one VMware KB item where they recommend resetting the CBT by deleting the file and they say it will just be recreated if it is missing (presumably from that point forward), but I suspect Veeam has internal CBT database entries that would be very unhappy if I did that.
We are now wondering if this might be an issue with the internal file database and not the VMware *-ctk.vmdk file. The volume in question literally has about a million files on it in many directories, so it is possible that the permission change caused a lot of overhead in the processing. Rather than just guess, I am going to go ahead and open a ticket, send the logs, and see what support can ascertain. We still have one job left that includes this volume and can still be used to demonstrate this issue. I will need to get that rerun over the coming weekend though, so our window of opportunity is relatively brief.
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Windows share permission changes slows backup
npoIT wrote: (A) Changing permissions on a large number of files is causing my incremental backups to run very slowly. Not only do they start running slowly, but they get progressively slower as the job continues and it seems that the job might never finish. It definitely looks like it wouldn't finish within a 24 hour backup window. The disk that is causing the problem is 1.9TB and it is about half empty. After running the incremental backup for 6+ hours it is only 4% done and it is processing at only 4 Mb/s. A full backup of that volume runs at ~55 MB/s and completes within 6 1/2 hours. I don't understand why changing file permissions would cause this, but I would like to so that I could try to avoid being forced to run a full backup in these situations.
A large increment is expected if you use the "Propagate permissions to child objects" setting, as this changes a large number of files in all subfolders. Besides, NTFS reserves 12.5% of the disk space for the MFT by default. Changing permissions on a large number of files could also result in significant MFT growth. Add to this the reverse incremental backup mode, which requires 3x the number of I/O operations compared to a full backup due to random reads and writes, and you get a much slower backup. With a 100GB change rate per VM, you are likely to have the incremental job running longer than the full one.
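The numbers in this reply can be put side by side with a small sketch. The 1 MB backup block size and the helper name `target_io_ops` are assumptions for illustration and are not taken from this thread; the 3 operations per block stand for one read plus two writes on the target, which is the usual way the 3x figure for reverse incremental is explained.

```python
def target_io_ops(changed_gb: float, block_mb: float, ios_per_block: int) -> int:
    """Count target-side I/O operations needed to store the changed data."""
    blocks = int(changed_gb * 1024 / block_mb)
    return blocks * ios_per_block


CHANGED_GB = 100   # change rate mentioned in the reply
BLOCK_MB = 1       # assumed backup block size (hypothetical)

# One write per block, as when blocks are simply appended to a backup file.
single_write_ops = target_io_ops(CHANGED_GB, BLOCK_MB, ios_per_block=1)

# Reverse incremental: 3 operations per changed block.
reverse_ops = target_io_ops(CHANGED_GB, BLOCK_MB, ios_per_block=3)

# 12.5% MFT reservation on the 1.9 TB volume from the original post, in GB.
mft_zone_gb = 0.125 * 1.9 * 1000

print(f"single-write ops: {single_write_ops}")
print(f"reverse incremental ops: {reverse_ops}")
print(f"MFT reservation: {mft_zone_gb:.1f} GB")
```

The operation count alone understates the gap, because the reverse incremental operations are mostly random rather than sequential.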
-
- Influencer
- Posts: 14
- Liked: never
- Joined: Jan 27, 2012 1:29 pm
- Full Name: npoIT
- Contact:
Re: Windows share permission changes slows backup
Thanks for the reply. I did not realize that the MFT could be so large. It really doesn't seem like adding a permission for "this folder only" on a folder that only contains 3 subfolders should make a massive change to it though, but I can't be 100% certain that was all that changed as I did not make the change myself.
With that said, I have gone over this case with support and they agree that something about this seems very strange, but they haven't come up with any explanation so far. I do have one additional point of information to add. We had 2 jobs that showed these same symptoms. On one of them we reran a full and it performed as expected and has been running reverse incrementals just fine since then. On the other one we switched from reverse incremental to forward incremental with synthetic fulls, and it was able to back up the "problematic" volume in a reasonable amount of time and with good throughput. That strikes me as very odd, but perhaps it won't to someone who understands the details of how these options change the nature of the job. The support tech I am working with is looking into that and is going to get back to me.
We have decided forward incrementals with synthetic fulls are the best option for us anyway, so we will be switching over to them ASAP. I will be interested to see if this issue crops up again after we make that change or if the forward incrementals just don't behave the same way.