Issues with large File Server backups

Post by jhladish »

Hey all,

I have a client with a job that processes roughly 7TB of data across 7 different file server VMs. We are currently running a daily reverse incremental on these VMs. We have had Veeam set up for around a month now and have had nothing but issues with this job. It began with running out of space on the repository due to the large backup size, which we resolved by moving to a new repository dedicated to this one job. After that, we had a Zlib decompression error, which we resolved by running a new active full backup. That was far from ideal, as we lost our entire chain of backups up to that point.

We are now getting a new error when processing one VM: "Client error: end of file Failed to process [srcCopyLocal] command. Exception from server: RLE decompression error: [526296] bytes decoded to [524288]. Failed to process [srcCopyLocal] command."

It seems as though the only explanation for these repeated failures is network packet loss leaving the backup corrupt. Correct me if I'm wrong in assuming that. Is there any way to set up the job so that if there's a failure during a backup, we don't have to keep running new active fulls and losing our backup chain? Or is there an option for more thorough backup verification before a run is confirmed successful, so the job doesn't keep working from a corrupt restore point? Would changing the storage optimization to a WAN target help?

FYI, we are currently running inline deduplication, dedupe-friendly compression, and LAN target storage optimization, and deduplication is enabled on the target Windows Server 2012 repository box.
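
From what I can gather, the error just means a block didn't decompress to the size Veeam expected - and 524288 bytes is exactly 512 KB, which I assume is the block size behind the LAN target storage optimization we're using. Here's a toy Python illustration of that kind of size check (definitely not Veeam's actual code, just the general idea):

# Toy run-length decoder: the block header says how big the plain data
# should be; if the compressed bytes are damaged, the runs no longer add
# up and you get exactly this style of "[X] bytes decoded to [Y]" error.
def rle_decode(stream: bytes, expected_size: int) -> bytes:
    out = bytearray()
    for count, value in zip(stream[0::2], stream[1::2]):  # (run length, byte value)
        out.extend(bytes([value]) * count)
    if len(out) != expected_size:
        raise ValueError(
            f"RLE decompression error: [{len(out)}] bytes decoded to [{expected_size}]"
        )
    return bytes(out)

good = bytes([4, 65, 3, 66])       # decodes to b'AAAABBB' (7 bytes)
corrupt = bytes([6, 65, 3, 66])    # same block with one damaged run-length byte
print(rle_decode(good, 7))         # b'AAAABBB'
try:
    rle_decode(corrupt, 7)
except ValueError as err:
    print(err)                     # RLE decompression error: [9] bytes decoded to [7]

So whichever way the damage is happening, the block Veeam is reading back no longer matches what it wrote.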

Thanks in advance,
Jordan

case# 00462685

Re: Issues with large File Server backups

Post by veremin »

I'm wondering what type of repository it is - a common Windows repository or a CIFS share?

I'm asking because, starting with version 6.5, we do indeed have inline network traffic validation. With this functionality, blocks that become corrupted during network transfer are resent automatically.

Traffic is validated between two Veeam agents, so this doesn't apply to a CIFS share, which can't run the Veeam transport service.
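
Conceptually it works like this (a rough Python sketch of the idea only, not our actual implementation):

import zlib

def send_block(data: bytes):
    # Source-side agent: ship the block together with its CRC32 checksum.
    return data, zlib.crc32(data)

def receive_block(data: bytes, checksum: int, resend, max_retries: int = 3) -> bytes:
    # Target-side agent: recompute the checksum and request a resend on a
    # mismatch, so a block corrupted in transit never lands in the backup file.
    for _ in range(max_retries):
        if zlib.crc32(data) == checksum:
            return data
        data, checksum = resend()  # ask the source to transmit the block again
    raise IOError("block still corrupt after retries")

# Usage: simulate one block damaged by the network.
block = b"x" * 1024
sent, crc = send_block(block)
damaged = b"y" + sent[1:]
assert receive_block(damaged, crc, lambda: send_block(block)) == block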

Thanks.

Re: Issues with large File Server backups

Post by jhladish »

The repository is a common Windows repository.

Re: Issues with large File Server backups

Post by jhladish »

I have now experienced this issue with a different Veeam job and have a case open for it, #00478940. I've searched the forums for an answer but haven't come across much besides my own post here.

Can anyone provide assistance? At first glance it looks like network data transfer corruption, but with inline traffic verification I would hope that wouldn't be the issue.

Re: Issues with large File Server backups

Post by kte »

Did you use a VMXNET3 adapter?

Re: Issues with large File Server backups

Post by jhladish »

Well, we weren't sure whether the network adapter only needed to be changed on the proxies. So all of the 60+ VMs still have the default E1000E adapter, while the proxies have the VMXNET3 adapter.

I was told that with inline data verification this shouldn't be an issue, though.

Re: Issues with large File Server backups

Post by tsightler »

I'd probably be more suspicious of the Windows 2012 dedupe, as I've seen several users report strange "corruption"-style issues when using it. I'm not sure that's the problem, but there have been several threads about strange corruption issues, and in almost all cases the repository was using Windows 2012 dedupe:

http://forums.veeam.com/viewtopic.php?f ... 94&start=0

If you're using reverse incremental, there's probably not a big advantage to using Windows 2012 dedupe anyway; just use Veeam compression and you'll likely get very similar savings.

Re: Issues with large File Server backups

Post by jhladish »

tsightler wrote: If you're using reverse incremental, there's probably not a big advantage to using Windows 2012 dedupe anyway; just use Veeam compression and you'll likely get very similar savings.
I am using reverse incremental for the job, and I'm currently deduping almost 2 TB of data with the Windows 2012 dedup capabilities.

I originally implemented this after reading the following article
http://www.veeam.com/blog/how-to-get-un ... ation.html

I'll try turning off deduplication and see if that resolves anything. It's just unfortunate this occurred, because the article, written and endorsed by Veeam, strongly recommends this option for the best performance of your backup repository.
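
For reference, this is roughly how I plan to check the savings and then back the volume dedup out - a sketch driving the standard Windows Server 2012 dedup cmdlets from Python on the repository server, where "E:" is just a placeholder for our repository volume and rehydration obviously needs enough free space:

import subprocess

REPO_VOLUME = "E:"  # placeholder - replace with the actual repository volume

def ps(command: str) -> str:
    # Run a PowerShell command on the repository server and return its output.
    result = subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# 1. See how much the volume is actually saving before giving dedup up.
print(ps(f"Get-DedupVolume -Volume {REPO_VOLUME} | "
         "Select-Object Volume, SavedSpace, SavingsRate"))

# 2. Stop new data from being deduplicated.
ps(f"Disable-DedupVolume -Volume {REPO_VOLUME}")

# 3. Rehydrate what has already been deduplicated.
ps(f"Start-DedupJob -Volume {REPO_VOLUME} -Type Unoptimization")

# 4. Check on the rehydration job.
print(ps("Get-DedupJob"))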

Re: Issues with large File Server backups

Post by jhladish »

So, everyone, I hope I can get some more thoughts on this issue.

To recap:
- I am backing up to a Windows Server 2012 repository with volume-level deduplication enabled.
- I received the RLE decompression error with per-job deduplication enabled AND disabled.
- Reducing compression from a higher setting to the dedupe-friendly level didn't help the situation.

I have opened a ticket about this issue a few times now, and each time the closing suggestion has been one of the options above. I end up restarting the backup chain, which of course resolves the issue right away, but after some amount of time it has happened again. This will be the second or third time I've experienced this with one of our jobs.

Given the article I linked in the post above, I was confident in enabling both volume- and job-based deduplication. The results I'm seeing are phenomenal in terms of how much space is being saved. Is disabling the volume-level deduplication the only option for troubleshooting this going forward, or are there other options I can explore tweaking? I would love not to have to remove the volume deduplication, because of the increased retention length it allows us to have.

Thanks in advance for any input.

Also, I have opened a new case regarding this: #00487453.

Re: Issues with large File Server backups

Post by zoltank » 1 person likes this post

7TB across 7 file servers? My first reaction would be to split it into multiple jobs, even one job per server, to reduce the size of each backup. Aside from speeding up the backup process, it would help you troubleshoot this issue. It would also keep you from having to do an active full on all 7TB and 7 servers if a job bombs.

Re: Issues with large File Server backups

Post by jhladish »

Thank you for the response.

The issue I am most recently posting about concerns a job that backs up a set of Lync servers totaling around 500 GB. For the file servers, there is a little under 7 TB of data spread across 7 different file server VMs. The job that backs up these file servers is split into two parts, which point to the same repository.

I appreciate your input about splitting the file servers into multiple jobs, though, and I think I will do this. You make a good point that it would keep me from having to rerun an active full on multiple file servers. The only downside is that if I do this, I will have to rerun each job afterwards, creating a new active full on the repository, which we currently do not have space for.

Re: Issues with large File Server backups

Post by yizhar »

Hi.

I also endorse splitting such large jobs (the 7TB job) into multiple smaller jobs - in your case, one file server per job.
Having several smaller VBK files will make your experience much better and safer, as the scope for corruption (and for troubleshooting) is more contained.
It will also help Win2012 dedup, which copes better with smaller files.

I also suspect that Win2012 dedup might be related to the problems, so I suggest trying first on a plain, non-deduplicated target volume; only once you have good results, add dedup back into the picture on the same or a different volume.
I suggest starting with a fresh NTFS volume if possible, instead of trying to rehydrate the existing data. If you don't want to wipe your existing backups, you can take a different Windows PC with enough free space (even a PC with a single SATA drive, for testing) and configure it as another repository. Just make sure it is configured with 1 concurrent job max.
Then target a single job to that repository and check the backup results for several days.

Yizhar