DataDomain replication performance issue


Re: DataDomain replication performance issue

by Mike Resseler » Mon Oct 31, 2016 7:08 am

Hi Ferrus,

I have no information on this at the moment, but I will try to contact some people to find out more. I will get back to you as soon as possible.

Mike
Mike Resseler
Veeam Software
 
Posts: 3000
Liked: 354 times
Joined: Fri Feb 08, 2013 3:08 pm
Location: Belgium, the land of the fries, the beer, the chocolate and the diamonds...
Full Name: Mike Resseler

Re: DataDomain replication performance issue

by ferrus » Mon Oct 31, 2016 10:00 am

Thanks for that. I'm going to hold off on creating any new MTrees for now.

Let me know if we can be of any assistance.
Our performance still reflects the graph on the previous page.
ferrus
Veeam ProPartner
 
Posts: 105
Liked: 18 times
Joined: Thu Dec 03, 2015 3:41 pm
Location: UK

Re: DataDomain replication performance issue

by Mike Resseler » Mon Oct 31, 2016 8:02 pm

Ferrus,

This was discussed with DEV and QC today. Unfortunately, it seems there is indeed something in the base file relationships which causes the Data Domain VSR capability to break. So yes, this means that the DD replication takes longer at this point in time. There is a discussion going on right now between Veeam and EMC to see where a potential solution can be defined, but I'm afraid it might take some time.

Sorry, but I hope it at least helps a bit with the struggle you have today to know that we are aware of it.

M/

Re: DataDomain replication performance issue

by ferrus » Mon Oct 31, 2016 8:40 pm

Mike

That's actually great news.

We've had a call in with EMC since July, which has been frustrating us more and more.
At the back of our minds we suspected it was a Veeam issue, or an incorrect configuration in our estate. We put the call in to them to help us identify the source of the problem.
Their response so far has been to deny that a problem exists with our replication, and to try to prove that everything is the same as it ever was. This is despite us demonstrating that, in the hours following the Veeam upgrade, we went from having clean replication an hour or two after the copy jobs completed to a constant backlog of between 200 and 700TB, which has lasted for the past 3-4 months.
Last week they closed the call (again), with no mention of this issue.

Now we know there's a problem, and it's being investigated - the pressure is off us slightly.
We know the replication worked fantastically in Veeam 8 - with a bandwidth 10x smaller, so I'm hoping it can return.

Let us know if you need access to our estate for any testing, etc.

Re: DataDomain replication performance issue

by Mike Resseler » Tue Nov 01, 2016 5:07 am

Ferrus,

Thank you for your kind offer... We might take you up on it ;-) (A product manager or I will PM you.)

Thanks

Mike

Re: DataDomain replication performance issue

by adb98 » Fri Nov 04, 2016 2:14 pm

I had the same issue and here is the solution that worked for me.

So I had worked with a guy from EMC support named Joseph. He is LV3 United States Sev 1 support. This guy is the big gun they call out when everyone is stuck (getting ready to make a big purchase with EMC always helps when needing support :D). Anyway, I was also having replication issues after going to Veeam 9. I'm not sure what changed in Veeam to cause it, but we figured out a solution with the help of Joe in EMC support and fixed it.

***Our Solution***
We found that there are only so many streams on a DD, which we knew. What we did not know is that there is a limit to how many streams a single Mtree can use, and I couldn't believe no one in lower support knew that. A DD can have 196 streams, like our new DD4200, but each Mtree has a limit on how many it can use for replication. When you hit that limit, replication is capped and will not go any faster, so you never fully saturate your link or push it to what it can do.
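To illustrate the cap described above, here is a minimal sketch in Python. The 196-stream figure is the one quoted in this post; the per-Mtree cap of 32 is a made-up placeholder (the real value depends on the DD model and DDOS version, so check the EMC documentation), and the assumption that streams add up linearly across Mtrees is a simplification.

```python
def usable_replication_streams(system_streams: int, mtree_count: int, per_mtree_cap: int) -> int:
    """Streams replication can use when every Mtree is capped individually."""
    return min(system_streams, mtree_count * per_mtree_cap)


SYSTEM_STREAMS = 196  # figure quoted in the post for a DD4200
PER_MTREE_CAP = 32    # hypothetical placeholder, NOT a documented EMC value

for mtrees in (1, 7):
    used = usable_replication_streams(SYSTEM_STREAMS, mtrees, PER_MTREE_CAP)
    print(f"{mtrees} Mtree(s): {used} of {SYSTEM_STREAMS} streams usable")
```

Under these assumed numbers, a single Mtree leaves most of the appliance's streams idle, while seven Mtrees are enough to reach the system-wide limit.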

We had one giant Mtree for Veeam, as we have other backup products. What Joe with EMC support did was create 6 more Mtrees. He then set up DDBoost for these Mtrees and we worked to spread out our jobs between them. We disabled the job and used fast copy on the DD to move the data over to the new Mtree. Once that was done, we created a new repository for the data in the new Mtree and then, in the job, changed the path to point to the new repository. We then enabled the job, ran it and ensured all was OK. Finally, we set up new replication points for the new Mtrees.
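For anyone planning the same split, one possible way to decide which job goes to which Mtree is sketched below in Python. This is only an assumption about how you might balance things (largest job first, always onto the least-loaded Mtree); the post does not say how the jobs were actually divided, and the job names, sizes and Mtree names are invented for the example.

```python
import heapq


def spread_jobs(job_sizes_tb: dict[str, float], mtrees: list[str]) -> dict[str, list[str]]:
    """Greedy balancing: place the largest jobs first, each onto the least-loaded Mtree."""
    heap = [(0.0, name) for name in mtrees]  # (current load in TB, Mtree name)
    heapq.heapify(heap)
    placement: dict[str, list[str]] = {name: [] for name in mtrees}
    for job, size in sorted(job_sizes_tb.items(), key=lambda kv: kv[1], reverse=True):
        load, name = heapq.heappop(heap)
        placement[name].append(job)
        heapq.heappush(heap, (load + size, name))
    return placement


# Hypothetical jobs and Mtree names, purely for illustration.
jobs = {"exchange": 14, "sql": 8, "file": 6, "vdi": 4, "web": 2}
plan = spread_jobs(jobs, ["veeam-mt1", "veeam-mt2", "veeam-mt3"])
for mtree, assigned in plan.items():
    print(mtree, assigned)
```

The resulting plan only tells you where each job's repository should live; the actual move is still the disable, fast copy, repoint and re-enable sequence described above.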

This kinda freaks Veeam out a little and puts the old backup directory and job under Disk (imported), but that is OK: when you use fast copy it makes a mirror copy of the data, so Veeam thinks that is old data. If you look at the job, though, it has all the data in the new location. I gave it a good week and then deleted the old job from Disk (imported). Joe said it was always wise to just give it a few days. Then I went through and deleted the old repository if it was not being used by anything else.

By doing this my replication has not had an issue.
adb98
Influencer
 
Posts: 14
Liked: 1 time
Joined: Thu Jul 21, 2016 5:03 pm
Full Name: Aaron B

Re: DataDomain replication performance issue

by DeadEyedJacks » Fri Nov 04, 2016 3:37 pm

Just for reference:

We have two DD4200s with twelve source mtrees and twelve mirrors offsite on a DD4500, plus two DD2200s with six mtrees mirrored offsite to the same DD4500.
This was by design, as we were aware that the replication streams per mtree are a potential bottleneck, and if DD4x00s support ~128 mtrees, why would you stick everything in a single ddboost container?
We do intend to periodically create additional mtrees, as we foresee potential issues with extended resync or recovery times as mtrees grow.

The Veeam Backup system was built on v9 and so far has 458TB of source VM data backed up and replicated offsite, a notional 6PB of pre-compression restore points.
So far replication is performing in line with WAN link speeds, which range from 100Mb/s through 1Gb/s to 10Gb/s.

The DDOS code base is a mix of 5.6.0 through 5.7.1.
Microsoft, NetApp, Symantec, Veeam, Veritas and VMware certified professional
MCTS, MCSE, NCDA, NCIE-BR, ASC, SCS, VMTSP, VTS, VCP
DeadEyedJacks
Veeam ProPartner
 
Posts: 68
Liked: 7 times
Joined: Mon Oct 12, 2015 2:55 pm
Location: UK
Full Name: DeadEyedJacks

Re: DataDomain replication performance issue

by ndolson » Sun Nov 06, 2016 10:44 pm

We had a very similar problem after upgrading to v9 - I believe we were one of the first support tickets on this issue with EMC. Something did indeed change with the way Veeam was marking data in v9. It caused our "pre compressed bytes sent" value on mtree replication to bloat incredibly - something like a petabyte needed to be sent before the appliances would be in sync, which would never happen over a 1 Gb link. Given the size of our environment, it'd have been mathematically impossible to have that volume of data to replicate. Unfortunately, our only option for quick resolution was to wipe the file system on our remote DD appliance, bring it back to our primary site, and perform a collection replication to resync locally.

There were some new settings in the backup jobs that weren't present prior to v9 that Veeam recommended we have enabled, whereas EMC said the opposite was true and those should *not* be enabled. I don't recall specifically what those were, but our EMC SR# was 80283034 and the Veeam SR# was 01802332, and you may be able to reference those cases for details. About a day after EMC recommended we change our job settings to THEIR best practices, the Veeam "best practices" article was updated to reflect that. This was a huge PITA for us; hopefully you can find a resolution that doesn't involve dropping your remote DD's data. I'm still a little salty about it.
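For a sense of scale, here is a rough back-of-the-envelope calculation in Python. It assumes a decimal petabyte (10^15 bytes), a link running at its full 1 Gb/s line rate with no protocol overhead, and that the whole pre-compressed volume would cross the wire as-is. In practice deduplication and compression reduce what is actually sent, so treat this only as an illustration of why the reported figure looked impossible.

```python
def transfer_days(bytes_to_send: float, link_bits_per_second: float) -> float:
    """Days needed to push a volume of data over a link at full line rate."""
    seconds = bytes_to_send * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15            # decimal petabyte, an assumption
GIGABIT_PER_SECOND = 10**9   # nominal 1 Gb link, no overhead

days = transfer_days(PETABYTE, GIGABIT_PER_SECOND)
print(f"{days:.1f} days")  # prints "92.6 days"
```

So even in the best case, a backlog of that size would take about three months to clear on such a link, with new backups arriving the whole time.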
ndolson
Influencer
 
Posts: 13
Liked: 2 times
Joined: Thu Jan 08, 2015 3:56 pm
Full Name: Neal

Re: DataDomain replication performance issue

by mk2311 » Mon Nov 07, 2016 10:06 am

We have two DD990s. In April this year, we upgraded the DDOS from v5.5.2 to v5.7.1.10.

At the same time we upgraded Veeam from v8 to v9.

The DD started raising replication lag alerts about a week later. At one point, the lag was in excess of 100TB.

EMC said it was a Veeam issue. Veeam said it was an EMC issue.

To resolve it, Veeam recommended that we change the backup jobs' 'Advanced Settings / Maintenance' option. We checked 'Defragment And Compact Full Backup File' and ran this weekly on selected days. Gradually, the lag count dropped and eventually cleared. It has recently re-appeared, but this may be down to some huge filesystem backups we have started running (20TB plus).
mk2311
Novice
 
Posts: 3
Liked: never
Joined: Tue Apr 28, 2015 2:12 pm
Full Name: Jeff White

Re: DataDomain replication performance issue

by Mike Resseler » Mon Nov 07, 2016 10:17 am

Hi Neal,

First, I'm very sorry to hear about the problems that you had. It is indeed a PITA. Thanks for the additional information that you gave us. We can use all the information to investigate. Really appreciated!

Mike

Re: DataDomain replication performance issue

by ndolson » Mon Nov 07, 2016 2:56 pm

ndolson wrote: [...] our "pre compressed bytes sent" value on mtree replication [...]

To correct this statement, it was the "pre-compressed bytes remaining" metric that was skewed when we monitored replication stats.

Thanks Mike, looking forward to finding out what the fix is, even though we're no longer affected by it.

Re: DataDomain replication performance issue

by ferrus » Fri Nov 11, 2016 4:20 pm

Thanks to everyone who has posted. The symptoms everyone's reporting match ours completely.
Unfortunately, I don't think we got through to the best support people at EMC!
It seems there are a lot more of us with the issue now than when I first posted this thread. With a bit of luck, this might lead to a speedier resolution.

--------------------------

Just wondering if there's anything in the causes of the problem that might also affect DD Health Check performance?

I know it's a bit of a stretch, but since upgrading we've also had much longer monthly Health Check times.
I initially put this down to the increases in our job sizes rather than anything to do with the upgrade, but it's becoming a real problem.

Our largest copy job, the Exchange backup, is around 14TB in size. The Health Check to the fibre-connected DD2500 is now well into its 8th day.
We've lost all the daily copy jobs, and two weeklies, while it's been running.

The further it progresses, the slower it appears to get as well. It's still only at 90% at the moment, and I don't think it has increased by a percent today.
At the current rate, this would mean losing half of every month just for the health check of the previous two weeks' data.

I'd be grateful for any advice.

Re: DataDomain replication performance issue

by ndolson » Sun Nov 13, 2016 10:29 pm

Is it because the CPU is busier than normal, trying to crunch the replication delta between your appliances caused by the bloated "pre compressed bytes remaining" value? I noticed the weekly data reclamation task was taking much longer than it previously did while this issue was affecting us, because the CPU was trying to compare the "new" [erroneous] data with what was on the destination appliance.

Re: DataDomain replication performance issue

by ferrus » Thu Nov 17, 2016 12:47 pm

So after 330+ hours of the Exchange Backup Job Health Check (just under 14 days), the operation was stopped at 99% by a Windows Update automatic reboot :evil:
Regardless of how it ended, it's unworkable to have a Health Check that runs that long. It's held back 13 nightly Backup Copies, and 2 or 3 archived weekly/monthly restore points, of some of our most important data.
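To put a number on how slow that is, here is a quick estimate in Python of the average rate the Health Check was achieving. It assumes the job is exactly 14TB (decimal), that the check reads the data once, and that the percentage shown tracks bytes processed, none of which is confirmed, so the result is only a rough average.

```python
def effective_throughput_mb_s(job_size_tb: float, fraction_done: float, hours: float) -> float:
    """Average MB/s, assuming progress is proportional to bytes read."""
    bytes_checked = job_size_tb * 10**12 * fraction_done
    return bytes_checked / (hours * 3600) / 10**6


rate = effective_throughput_mb_s(job_size_tb=14, fraction_done=0.99, hours=330)
print(f"{rate:.1f} MB/s")  # prints "11.7 MB/s"
```

That is an average over the whole run; since the check appeared to slow down as it progressed, the rate towards the end would have been lower still.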

I took a look at the CPU of the DD. Not sure what a normal benchmark figure is - but it certainly wasn't using all of the processor capacity.

Not sure if this is related to the replication performance issue - or even if the issue is Veeam or EMC.
Can anyone from Veeam provide any pointers?

Re: DataDomain replication performance issue

by ndolson » Mon Nov 21, 2016 3:20 pm

What model do you have, and how many disks? I don't recall specifically the CPU utilization of ours when we were dealing with this issue, but unless you have a large number of disks in the appliance, I could see a scenario where the capabilities of the drives would max out well before the CPUs hit max utilization.
