-
- Influencer
- Posts: 11
- Liked: never
- Joined: Sep 04, 2013 3:47 pm
- Contact:
Issues with large File Server backups
Hey all,
I have a client with a job that processes roughly 7 TB of data across 7 different file server VMs. We are currently running a daily reverse incremental on the VMs. We have had Veeam set up for around a month now and have had nothing but issues with this job. It began with running out of space on the repository due to the large backup size, which we resolved by moving to a new repository dedicated to this one job. After that, we had a Zlib decompression error, which we resolved by running a new active full backup. That wasn't the most desirable solution, as we lost our entire chain of backups up to that point.
We are now having a new error processing one VM: "Client error: end of file. Failed to process [srcCopyLocal] command. Exception from server: RLE decompression error: [526296] bytes decoded to [524288]. Failed to process [srcCopyLocal] command."
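For anyone reading along, here is roughly what that error text means: an RLE length mismatch says the decompressed block didn't come out to the size recorded for it, which points at the data being damaged somewhere along the way rather than at the decompressor itself. A minimal sketch of the failure mode in Python (illustrative only; I'm assuming a simple byte-pair encoding here, not Veeam's actual format):

Code:

# Toy run-length decoder with the same kind of length sanity check that
# produces "[526296] bytes decoded to [524288]". Not Veeam's format.
def rle_decode(encoded: bytes, expected_size: int) -> bytes:
    """Decode (count, value) byte pairs and verify the decoded size."""
    out = bytearray()
    for i in range(0, len(encoded) - 1, 2):
        count, value = encoded[i], encoded[i + 1]
        out.extend(bytes([value]) * count)
    if len(out) != expected_size:
        # A single flipped bit in a count byte silently changes the length.
        raise ValueError(
            f"RLE decompression error: [{len(out)}] bytes decoded to "
            f"[{expected_size}]"
        )
    return bytes(out)

block = bytes([4, 0x41, 3, 0x42])        # decodes to "AAAABBB"
print(rle_decode(block, 7))              # passes the length check
corrupted = bytes([9, 0x41, 3, 0x42])    # count byte damaged in transit
try:
    rle_decode(corrupted, 7)
except ValueError as err:
    print(err)                           # same style of error as above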
It seems as though the only explanation for these repeated failures is network packet loss, with the backup ending up corrupt. Please correct me if I'm wrong in assuming that. Is there any way to set up a job so that if there is a failure during one of the backups, we don't have to keep running new active fulls and losing our backup chain? Or is there an option for better backup verification before a run is confirmed successful, so the job doesn't keep working off a corrupt restore point? Would changing the storage optimization to a WAN target help?
FYI, we are currently running inline deduplication, dedupe-friendly compression, and LAN target storage optimization, and deduplication is enabled on the target Windows Server 2012 repository box.
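On my own WAN target question: as far as I understand, the storage optimization setting mainly controls the block size Veeam slices the disks into (WAN target uses smaller blocks than LAN target), so smaller blocks mean finer-grained dedup and smaller units to resend, at the cost of more metadata. A toy example of why block size changes dedup behaviour (the block sizes below are assumptions for illustration, not the exact Veeam values):

Code:

# Toy fixed-block deduplication to show how block size affects dedup hits.
# Block sizes are illustrative assumptions, not Veeam's documented values.
import hashlib

def dedup_stats(data: bytes, block_size: int):
    """Count total vs. unique blocks for a given block size."""
    seen = set()
    total = 0
    for off in range(0, len(data), block_size):
        total += 1
        seen.add(hashlib.sha256(data[off:off + block_size]).digest())
    return total, len(seen)

# Simulated repetitive file-server data.
data = (b"A" * 1_000_000 + b"B" * 300_000) * 4 + b"C" * 123_456

for label, size in (("LAN-style block", 512 * 1024),
                    ("WAN-style block", 256 * 1024)):
    total, unique = dedup_stats(data, size)
    print(f"{label} ({size // 1024} KB): {unique}/{total} blocks unique")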
Thanks in advance,
Jordan
case# 00462685
-
- Product Manager
- Posts: 20400
- Liked: 2298 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Issues with large File Server backups
I’m wondering what type of repository it is. Common Windows repository or CIFS share?
I'm asking because, starting with version 6.5, we have indeed had inline network traffic validation. In general, with this functionality, blocks that become corrupted during network transfer are resent automatically.
Traffic is validated between two Veeam agents, though that is not the case with a CIFS share, which can't run the Veeam transport service.
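Conceptually, the source-side agent checksums each block before it goes on the wire and the target-side agent verifies it on arrival, requesting a resend on mismatch. A simplified sketch of that idea (illustrative only, not the actual protocol):

Code:

# Simplified model of inline traffic validation between two agents: the
# source attaches a checksum, the target recomputes it and requests a
# resend on mismatch. Illustrative only, not the real Veeam protocol.
import hashlib

def transfer(block: bytes, corrupt_first_attempt: bool = True) -> bytes:
    """Send a block until the target's checksum matches the source's."""
    checksum = hashlib.sha1(block).hexdigest()          # computed at the source
    attempt = 0
    while True:
        attempt += 1
        wire = block
        if corrupt_first_attempt and attempt == 1:      # simulate damage in transit
            wire = wire[:-1] + bytes([wire[-1] ^ 0xFF])
        if hashlib.sha1(wire).hexdigest() == checksum:  # verified at the target
            return wire
        print(f"attempt {attempt}: checksum mismatch, resending block")

block = bytes(64 * 1024)                                # one 64 KB block of data
good = transfer(block)
print("block accepted:", len(good), "bytes")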
Thanks.
-
- Influencer
- Posts: 11
- Liked: never
- Joined: Sep 04, 2013 3:47 pm
- Contact:
Re: Issues with large File Server backups
The repository is a common Windows repository.
-
- Influencer
- Posts: 11
- Liked: never
- Joined: Sep 04, 2013 3:47 pm
- Contact:
Re: Issues with large File Server backups
I have now experienced this issue with a different Veeam job. I have a case open for it now, #00478940. I've searched the forums for an answer but haven't come across much besides my own post here.
Can anyone provide assistance? At first glance it looks like network data transfer corruption, but with inline data verification I would hope that wouldn't be the issue.
-
- Expert
- Posts: 179
- Liked: 8 times
- Joined: Jul 02, 2013 7:48 pm
- Full Name: Koen Teugels
- Contact:
Re: Issues with large File Server backups
Did you use a vmxnet3 adapter?
-
- Influencer
- Posts: 11
- Liked: never
- Joined: Sep 04, 2013 3:47 pm
- Contact:
Re: Issues with large File Server backups
Yes and no; we weren't sure whether it was only the proxy that needed its network adapter changed. So all of the 60+ VMs still have the default e1000e adapter, while the proxies have the vmxnet3 adapter.
I was told that with inline data verification this shouldn't be an issue, though.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Issues with large File Server backups
I'd probably be more suspicious of the Windows 2012 dedupe, as I've seen several users report strange "corruption"-style issues when using it. I'm not sure that's the problem, but there have been several threads about strange corruption issues, and in almost all cases the users had Windows 2012 dedupe enabled on the repository:
http://forums.veeam.com/viewtopic.php?f ... 94&start=0
If you're using reverse incremental, there's probably not a big advantage to using Windows 2012 dedupe anyway; just use Veeam compression and you'll probably get very similar savings.
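Part of the reason is how reverse incremental behaves on disk: every run injects the changed blocks straight into the full backup file and saves the blocks they replace into a rollback file, so the VBK gets rewritten every night, which limits what post-process dedup can save and means a corrupt block in it affects every restore point. A rough sketch of the mechanism (simplified, not the real VBK/VRB layout):

Code:

# Simplified reverse-incremental mechanics: changed blocks overwrite the
# full backup in place, and the replaced blocks are kept as a rollback.
# Not the real VBK/VRB format, just the idea.

def reverse_incremental(full: list, changes: dict) -> dict:
    """Apply changed blocks to the full in place; return the rollback."""
    rollback = {}
    for index, new_block in changes.items():
        rollback[index] = full[index]   # keep the old block for older restore points
        full[index] = new_block         # the full file always holds the latest state
    return rollback

full_backup = ["blk0", "blk1", "blk2", "blk3"]                 # Sunday's full
monday_rb = reverse_incremental(full_backup, {1: "blk1'"})
tuesday_rb = reverse_incremental(full_backup, {1: "blk1''", 3: "blk3'"})

print("latest full:", full_backup)                             # most recent point
# Restoring Monday's point means walking the rollbacks backwards from today:
monday_view = list(full_backup)
for idx, old in tuesday_rb.items():
    monday_view[idx] = old
print("Monday view:", monday_view)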
-
- Influencer
- Posts: 11
- Liked: never
- Joined: Sep 04, 2013 3:47 pm
- Contact:
Re: Issues with large File Server backups
tsightler wrote: If you're using reverse incremental, there's probably not a big advantage to using Windows 2012 dedupe anyway; just use Veeam compression and you'll probably get very similar savings.
I am using reverse incremental for the job, and am currently deduping almost 2 TB of data using the Windows 2012 dedup capabilities.
I originally implemented this after reading the following article:
http://www.veeam.com/blog/how-to-get-un ... ation.html
I'll try turning off deduplication and see if that resolves things. It's just unfortunate this happened, because the article, written/endorsed by Veeam, strongly pushes this option for getting the best performance out of your backup repository.
-
- Influencer
- Posts: 11
- Liked: never
- Joined: Sep 04, 2013 3:47 pm
- Contact:
Re: Issues with large File Server backups
So, all, I'm hoping to get some more thoughts on this issue.
To recap:
- I am backing up to a Windows Server 2012 repository with volume level deduplication enabled.
- I received the RLE decompression error with per job deduplication enabled AND disabled.
- Reducing compression from a higher setting to the dedup-friendly level didn't help the situation.
I have opened a ticket about this issue a few times now, and each time the closing suggestion has been that one of the above options would be the resolution. I end up restarting the backup chain, and of course that resolves the issue right away, but it has happened again after some amount of time. This will be the 2nd or 3rd time I've experienced it with one of our jobs.
With the article I linked in the post above, I was confident in enabling both volume-level AND job-level deduplication. The results I'm seeing are phenomenal in terms of how much space is being saved. Is disabling the volume-level deduplication the only option for troubleshooting this further, or are there other options I can explore tweaking? I would love not to have to remove the volume deduplication, because of the increased retention length it allows us to have.
Thanks in advance for any input.
Also, I have opened a new case regarding this: #00487453.
-
- Expert
- Posts: 230
- Liked: 41 times
- Joined: Feb 18, 2011 5:01 pm
- Contact:
Re: Issues with large File Server backups
7 TB across 7 file servers? My first reaction would be to split it into multiple jobs, even one job per server, to reduce the size of each backup. Aside from speeding up the backup process, it would help you troubleshoot this issue. It would also keep you from having to do an active full on all 7 TB and 7 servers if a job bombs.
-
- Influencer
- Posts: 11
- Liked: never
- Joined: Sep 04, 2013 3:47 pm
- Contact:
Re: Issues with large File Server backups
Thank you for the response.
The issue I'm most recently posting about involves a job that backs up a set of Lync servers totaling around 500 GB. For the file servers, there is a little under 7 TB of data spread across 7 different file server VMs. The job that backs up these file servers is split into two parts, both pointing to the same repository.
I appreciate your input regarding splitting the file servers into multiple jobs, though, and I think I will do this. You make a good point that it would keep me from having to rerun an active full on multiple file servers. The only downside is that if I do this, I will have to rerun each job afterwards, creating a new active full on the repository, which we currently do not have space for.
-
- Service Provider
- Posts: 182
- Liked: 48 times
- Joined: Sep 03, 2012 5:28 am
- Full Name: Yizhar Hurwitz
- Contact:
Re: Issues with large File Server backups
Hi.
I also endorse splitting such large jobs (the 7 TB job) into multiple smaller jobs; in your case, one file server per job.
Having several smaller VBK files will make your experience much better and safer, as the scope for corruption (and for troubleshooting) is more contained.
It will also help Windows 2012 dedup cope with smaller files.
I also suspect that Windows 2012 dedup might be related to the problems, so I suggest trying a plain non-deduped target volume first, and only adding dedup back into the picture (on the same or a different volume) once you have good results.
I suggest starting with a fresh NTFS volume if possible, instead of trying to rehydrate the existing data. If you don't want to wipe your existing backups, you can take a different Windows PC with enough free space (even a PC with a single SATA drive, for testing) and configure it as another repository. Just make sure it is configured with a maximum of 1 concurrent job.
Then target a single job to that repository and check the backup results for several days.
Yizhar