V6 replica digests errors?

Post by **danieln** » Dec 02, 2011 6:05 pm

Hi,

I've reseeded the VMs for DR twice. Now, after I made some changes, the job throws errors, and I am not really prepared to reseed the whole thing all over again.
Does anyone have an idea what is happening?

The replication goes through as normal up to the end when I receive "Failed to process [saveDiskDigestsCli]".

Any help please... I was just about ready to redeploy the DR, and now it looks like I will have to reseed.

Thank you,
Daniel.

Post by **tsightler** » Dec 02, 2011 6:47 pm

I would strongly suggest opening a support case to investigate. However, with V6, even in the absolute worst case (seed data corrupted), you can always simply create a new job and map it to the existing replica on the remote target. In other words, as long as the remote replica still exists in some form, it can always be reused as a seed.

Post by **danieln** » Dec 02, 2011 7:07 pm

Hi,

I just did that. I do not expect a quick answer, as V6 has probably caused a lot of tickets to be opened.

True, I do not have to reseed, but the digest re-creation will take forever, maybe as long as a new reseed. The DR environment is currently on site.
I had this issue after the V5 to V6 upgrade: the replicas were as sluggish as a new seed would be. The job was doing nothing but scanning the destination for hours.

I will give it a try anyway...

Thank you,
Daniel.

Post by **danieln** » Dec 02, 2011 7:22 pm

Wow... that is interesting. I will lose lots of the replicas (actually ALL, since it is a fairly recent replica and the initial VMDK is probably zero) if I try to recreate the job and use the existing replicas.

"12/2/2011 2:20:19 PM :: VM disk size was changed since last sync, deleting all restore points"
"12/2/2011 2:20:19 PM :: Deleting all restore points"

So back to reseeding...

Post by **tsightler** » Dec 02, 2011 7:35 pm

I've generally found that calculating digests is still somewhat faster than a full replication pass, even when the target is local. Calculating full digests places much less CPU load on the proxies, and it is a read-only operation, so you can run many of the tasks simultaneously. But you're right, it might take just as long.

You mention in your first post "after making some changes" is when the problem occurred. It might help to understand what those changes were.

Post by **tsightler** » Dec 02, 2011 7:41 pm

danieln wrote: Wow... that is interesting. I will lose lots of the replicas (actually ALL, since it is a fairly recent replica and the initial VMDK is probably zero) if I try to recreate the job and use the existing replicas.

"12/2/2011 2:20:19 PM :: VM disk size was changed since last sync, deleting all restore points"
"12/2/2011 2:20:19 PM :: Deleting all restore points"

Deleting the restore points will still use the existing VMs as replicas and will not lose anything. How can the initial VMDK be "zero"? That's what seeding is: getting an initial baseline VMDK. Deleting the restore points will simply merge the data from those restore points into the VMDK prior to resizing the replica's disks.

But my question is, did you actually change the size of your source disks since you performed the seed? If not, then I'd be concerned that you may be doing something procedurally incorrect.

Post by **danieln** » Dec 02, 2011 8:00 pm

The changes were to the dedupe and compression settings, to make them more suitable for WAN, plus choosing the proper proxy server. I cannot recall anything else.
I think the initial VMDK is zero because there were not enough restore points to roll over into the original VMDK, which, according to the Veeam documentation, is zero at the first seed. I did not read the user guide thoroughly, I must admit.

Anyway, once Veeam had removed ALL snapshots from the disks, I took a brand new snapshot and attempted to power on the replica straight from ESX.
VMware complains and does not power it on:
Reason: The parent virtual disk has been modified since the child was created.
Cannot open the disk '/vmfs/volumes/d28a33a7-6ee4c8a9/<SERVER NAME>/<SERVER NAME>-000003.vmdk' or one of the snapshot disks it depends on.

So this time I clearly have to reseed.

So clearly I lost my replica...

Post by **tsightler** » Dec 02, 2011 8:13 pm

If the documentation says this, well, I'd have to think it is wrong. The initial seed fills the VMDK with data. Further replications create snapshots and then roll these snapshots into the base as they are removed. Very simple.

Anyway, my concern is why, after such minor changes, the system was so confused that it thought you had changed the sizes of the disks. My worry was that you might be replicating to ESX hosts directly, rather than through vCenter, and had then moved servers around or otherwise made configuration changes that would cause Veeam to lose track of which source VM points to which target VM.

Post by **danieln** » Dec 03, 2011 12:34 am

Hi,

Well... about the documentation... I read it briefly, trying to find pointers to what went wrong today... so I may be wrong.
There have been no changes. The replication goes from vCenter (3 ESX/ESXi 4.1 hosts) to a standalone ESX 4.1 (not ESXi) target, as it did in V5 before I upgraded. No vMotions or edits to the guests during this time. Clearly not all 15 machines, 2 TB, it happened with.

At this point, since the DR environment is local, reseeding is the faster option to get back in sync. The disk digests (whatever that means) take too long, almost as long as a full replica, so I see no point in trying to reuse the current replicas and risk them still being bad anyway. It may prove worth doing if the DR is offsite and bandwidth is limited... Yes, it works in my case if I remove the last 1-2 snapshots and resync from there in a new job. Nevertheless, even normal-sized disks take almost 1 hour to 'digest'.

What I am afraid of is that I may deploy this to the DR site and be in the same spot days from now, and this time the DR will be far away with limited bandwidth.
I hope Veeam support will figure out from the logs what I or Veeam B&R did wrong, if they ever get back to me.

I am tired... 3 reseeds in the past week for various reasons, some not Veeam related. This is going to be the fourth one. I hope it is my lucky number.

I should have stuck to the rule of thumb: never upgrade to a major version until at least the first service pack appears. This is my advice for all the Veeam users out here. Wait a little bit longer if you can and let others smash their heads against the wall. I was eager to upgrade since V5 was almost useless for replicas for DR purposes, and now I have blood on my forehead.

Thank you,
Daniel.

Post by **Gostev** » Dec 03, 2011 10:18 am

@Daniel, currently there are no known issues around the new v6 replication. I am getting great feedback from people who have already deployed it, both here and on Twitter, and internally at Veeam we have been running it for months. So I suggest that you contact support for troubleshooting, since your issue could very well be environmental or the result of the product being incorrectly deployed. For example, long digest calculation may be the result of poor backup proxy data access performance.

That said, I will not argue that your rule of thumb is generally good for any software, and I personally always follow it.

Post by **tsightler** » Dec 03, 2011 3:05 pm

Also, he is replicating to a standalone ESX host, which means that any event that causes the VM-ids to change would cause his replication to work incorrectly. The reason I'm concerned about this is the message he posted earlier:

"12/2/2011 2:20:19 PM :: VM disk size was changed since last sync, deleting all restore points"
"12/2/2011 2:20:19 PM :: Deleting all restore points"

He mentioned that he didn't change the size of any source VMs, which makes me think that somehow Veeam was not mapping the same source VMs to the same target VMs. I certainly don't know if that's what actually happened, but looking at the symptoms makes me think that the VM-ids on the ESX host changed for some reason.

Post by **danieln** » Dec 03, 2011 6:30 pm

Thank you for your help.

I am concerned about this happening again when my DR is redeployed offsite. I hope Veeam support can guide me towards what the issue was.

I wonder: what is 'digests'? Is it a read of the destination replica to compare against the source? Is only the destination SAN responsible for the speed, or both source and destination?
I know my storage is not the fastest, and I use network mode only as I have had issues (mostly because of my impatience) when using hot add... but what digest speeds have people here seen as normal?
In my case an 80 GB disk, less than half actually used, results in around 30 to 40 minutes of digests. A large disk (500 GB+) took 7+ hours to digest and reuse the seed, while a full reseed from scratch would have been 4 hours or so (I am currently on LAN speeds).

Regarding no known issues on v6: it hit me hard with incremental backups running out of space after the upgrade. So I was very concerned this week: I had to erase all of the backups and replicas, so for 2 days my environment was totally unprotected. Probably the issue is covered in some docs Veeam has, and since I did not RTFM beforehand, it is my fault...
Everyone knows backups and replicas are a waste of time and resources.
Restores are important. Luckily, during this time, I did not need any restores.

I am a little frustrated with going in circles on my DR, but I have to admit V6 seems faster. Just today a full backup of a large disk took 5 hours, while before it took 8 hours.
Replicas: I am not sure yet, I will see once they run over WAN, but I do expect improvements there too.
I like the idea of proxies, so the load can be spread across any idle machines.
I also like the way replicas are now made using ESX snapshots. I feel confident DR is going to work if ever needed, while on v5 I was not so sure...
@Anton: you are right, not all is bad. There are plenty of things V6 has to brag about.

Post by **danieln** » Dec 03, 2011 8:07 pm

Sorry to bother you again.
Has anyone experienced this digests issue while reusing older replicas?

I ran a little experiment with one powered-down Linux box. A replica was already present. I tried to create a new replica job using the existing replica as a seed.
It looks to me like the digests step does nothing other than waste time, at least in my setup. Basically the digest went through, but the whole disk was read all over again anyway. This is a replica of a powered-off guest. It should have transferred 0, not 5 GB.
Here is a copy/paste of the screen:
12/3/2011 2:32:09 PM :: Queued for processing at 12/3/2011 2:32:09 PM
12/3/2011 2:32:09 PM :: Required resources have been assigned
12/3/2011 2:32:12 PM :: VM processing started at 12/3/2011 2:32:11 PM
12/3/2011 2:32:12 PM :: VM size: 20.0 GB (5.0 GB used)
12/3/2011 2:32:16 PM :: Using source proxy replicadb01 [nbd]
12/3/2011 2:32:18 PM :: Using target proxy drsql01 [nbd]
12/3/2011 2:32:20 PM :: Discovering replica VM
12/3/2011 2:32:31 PM :: Preparing replica VM
12/3/2011 2:32:41 PM :: Creating snapshot
12/3/2011 2:32:48 PM :: Processing configuration
12/3/2011 2:33:06 PM :: Creating helper snapshot
12/3/2011 2:33:10 PM :: Calculating digests Hard disk 1 (20.0 GB)
12/3/2011 2:41:11 PM :: Hard Disk 1 (20.0 GB) => here it stated it read 5 GB, exactly the active data on disk.
12/3/2011 2:43:27 PM :: Deleting helper snapshot
12/3/2011 2:43:37 PM :: Removing snapshot
12/3/2011 2:43:40 PM :: Finalizing
12/3/2011 2:43:40 PM :: Applying retention policy
12/3/2011 2:44:47 PM :: Busy: Source 99% > Proxy 52% > Network 0% > Target 0%
12/3/2011 2:44:47 PM :: Primary bottleneck: Source
12/3/2011 2:44:47 PM :: Processing finished at 12/3/2011 2:44:47 PM

Reseeding the replica of the same linux box all over again from zero was faster (5 minutes versus 13):
12/3/2011 2:56:18 PM :: Queued for processing at 12/3/2011 2:56:18 PM
12/3/2011 2:56:18 PM :: Required resources have been assigned
12/3/2011 2:56:21 PM :: VM processing started at 12/3/2011 2:56:20 PM
12/3/2011 2:56:21 PM :: VM size: 20.0 GB (5.0 GB used)
12/3/2011 2:56:25 PM :: Using source proxy replicadb01 [nbd]
12/3/2011 2:56:27 PM :: Using target proxy drsql01 [nbd]
12/3/2011 2:56:27 PM :: Discovering replica VM
12/3/2011 2:56:34 PM :: Creating snapshot
12/3/2011 2:56:40 PM :: Processing configuration
12/3/2011 2:57:03 PM :: Creating helper snapshot
12/3/2011 2:57:08 PM :: Hard Disk 1 (20.0 GB) -> same amount read, 5 GB
12/3/2011 2:59:35 PM :: Deleting helper snapshot
12/3/2011 2:59:46 PM :: Removing snapshot
12/3/2011 2:59:50 PM :: Finalizing
12/3/2011 2:59:50 PM :: Applying retention policy
12/3/2011 3:00:59 PM :: Busy: Source 96% > Proxy 57% > Network 4% > Target 45%
12/3/2011 3:00:59 PM :: Primary bottleneck: Source
12/3/2011 3:00:59 PM :: Processing finished at 12/3/2011 3:00:59 PM
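
The two job logs above can be compared step by step. Below is a minimal Python sketch for doing that, assuming each timestamp marks the start of its step (so a step lasts until the next line) and that every line follows the `date time AM/PM :: message` layout of the paste. `step_durations` is a hypothetical helper written for this thread, not something Veeam ships.

```python
from datetime import datetime

# Timestamp layout as it appears in the pasted job log (assumed US style).
LOG_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def step_durations(log_text):
    """Return (message, seconds) pairs: time from each log line to the next."""
    entries = []
    for line in log_text.strip().splitlines():
        stamp, _, message = line.partition(" :: ")
        entries.append((datetime.strptime(stamp.strip(), LOG_FORMAT), message.strip()))
    return [
        (message, (next_time - time).total_seconds())
        for (time, message), (next_time, _) in zip(entries, entries[1:])
    ]


# Four lines taken from the first log (seed reuse with digests).
sample = """
12/3/2011 2:33:06 PM :: Creating helper snapshot
12/3/2011 2:33:10 PM :: Calculating digests Hard disk 1 (20.0 GB)
12/3/2011 2:41:11 PM :: Hard Disk 1 (20.0 GB)
12/3/2011 2:43:27 PM :: Deleting helper snapshot
"""

for message, seconds in step_durations(sample):
    print(f"{seconds:6.0f} s  {message}")
```

Under that assumption the digest step accounts for 481 seconds (about 8 minutes) of the first run, which is most of the difference between the two jobs.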

Does anyone have any pointers on what I may be doing wrong here?

Thank you.

Post by **Gostev** » Dec 03, 2011 8:22 pm

Just what I said above:

Gostev wrote: long digest calculation may be the result of poor backup proxy data access performance

For some reason, your DR site's host has very poor management interface performance (a few times worse than the production host). As you can see from the bottleneck analysis numbers, the DR site's backup proxy spends all of its time waiting for the requested data to be returned by the host. Network mode performance issues are quite common with VMware: it could be the host version, host hardware, or a networking issue.

We always recommend using virtual proxy servers (capable of hot add) for best performance. Especially if you are planning to replicate over LAN (in which case, you should not bother with seeding in the first place). However, if your intent is to eventually replicate over WAN, then VM data access won't matter much for backup proxy servers, as overall job performance will still be primarily limited by your WAN bandwidth.

Thanks.

Post by **danieln** » Dec 03, 2011 9:55 pm

Well, I can understand that the Linux-based SAN I put together for DR may be slow. Nevertheless, I see no explanation for why the digests step does nothing other than waste time. To be honest, I do not know what digests is.
With or without digests, the same amount of data (I mean ALL disk content) is read from the source disk, and the actual sync/read from source takes the same time: approximately 2 minutes 30 seconds. I would have expected it to use CBT and read only the changes.
Over the wire the data transfer seems lower, so it may help when re-reseeding over the WAN. I will actually try the same experiment over the WAN once the DR is deployed.

Sorry, but hot add has a few times left me with VMDKs and snapshots open, locked, and growing. It has been quite a headache: on a few rare occasions I had to reboot ESX servers because I could not find a way to unlock the files. I know there is a how-to in this forum about this, but I couldn't find a way to identify what process was locking the files.
Plus, the difference in speed was not impressive in my case. My production SANs are probably not the fastest around.

Regarding my original concern, why my replica and digests went wrong: Veeam support just replied. It looks like changing the block size (which I guess is what Storage Optimizations -> WAN target means) requires me to erase all the digest files so they can be recreated with the new block size. Maybe it is another case of me not doing the RTFM.

Thank you!

Post by **Gostev** » Dec 03, 2011 11:18 pm

danieln wrote: Nevertheless, I see no explanation for why the digests step does nothing other than waste time. To be honest, I do not know what digests is.

Oh, I did not realize you don't know what the digest calculation process is. A digest is essentially a table with hashes of all virtual disk blocks. To create it, the proxy needs to read the whole disk. The digest is then sent over to the source site to compare the replica VM disks with the source VM disks and make sure the seeded VM disks are exactly identical and not corrupted. As soon as the replica disks are validated, CBT kicks in and the following runs are incremental only.
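
The idea can be sketched roughly as follows in Python. This only illustrates block-level hashing and comparison as described above; SHA-256, the 4 KB block size, and the names `disk_digest` and `changed_blocks` are assumptions made for the example, not Veeam's actual digest format or code.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # assumed block size, for this sketch only


def disk_digest(disk: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one hash per fixed-size block; the whole disk has to be read."""
    return [
        hashlib.sha256(disk[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(disk), block_size)
    ]


def changed_blocks(source: List[str], replica: List[str]) -> List[int]:
    """Return the indexes of blocks whose hashes differ between two digests."""
    if len(source) != len(replica):
        raise ValueError("digests cover a different number of blocks")
    return [i for i, (s, r) in enumerate(zip(source, replica)) if s != r]


# Toy "disks": the replica differs from the source in one byte of block 1.
source_disk = bytes(4 * BLOCK_SIZE)
replica_disk = bytearray(source_disk)
replica_disk[BLOCK_SIZE + 10] = 0xFF

# Only the tables of hashes are compared, not the disk contents themselves.
print(changed_blocks(disk_digest(source_disk), disk_digest(bytes(replica_disk))))  # prints [1]
```

In this toy case only block 1 would have to be sent again. The sketch also shows why two digests have to be built with the same block size to be comparable: with different sizes the tables no longer line up entry for entry.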

danieln wrote: Hot add has a few times left me with VMDKs and snapshots open, locked, and growing. It has been quite a headache: on a few rare occasions I had to reboot ESX servers because I could not find a way to unlock the files. I know there is a how-to in this forum about this, but I couldn't find a way to identify what process was locking the files.

This was in the early days of hot add; we now have code in place that prevents this from happening. Anyway, using hot add would definitely speed up the data retrieval, and thus all the operations, including digest calculation.

Post by **danieln** » Dec 10, 2011 4:00 am

Hi,

I've done the test: reseeding a small machine. Well, it helps. Unfortunately my disk speeds still keep it slow.
Anyway, reseeding the same small machine runs at the same speed over LAN or over WAN if changes are close to 0.

I have a question though: is there any particular reason why, on reseeding, the hashes on local and remote are not computed in parallel?

As I see it now:
- the remote proxy reads the remote seed (entirely) and computes the hash = quite some time
- after this, the local proxy reads the source (entirely) and computes the hash = quite some time
- the hashes are probably compared afterwards by either the source proxy or the Veeam server, and therefore only changed blocks are sent over the WAN = tiny amounts, depending on changes

I wonder why steps 1 and 2 do not run in parallel. In my case, with large machines, it would take 4 hours + 4 hours + peanuts. It could take 4 hours + peanuts. Maybe this parallel processing can be implemented in SP1. So far the replication seems to follow a linear path, step 1 to 2 to 3..., and therefore it wastes time.
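
The arithmetic behind this can be sketched in Python. The numbers are the ones from the post (about 4 hours per full read) plus a hypothetical 5 minutes standing in for "peanuts". The thread pool only illustrates running two independent hashing passes side by side; it says nothing about how Veeam actually schedules its proxies, and all function names are made up for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def serial_minutes(target_read, source_read, transfer):
    """Total time when the two full reads run one after the other."""
    return target_read + source_read + transfer


def parallel_minutes(target_read, source_read, transfer):
    """Total time when the two full reads overlap: the slower one dominates."""
    return max(target_read, source_read) + transfer


def hash_blocks(blocks):
    """Hash every block of one side (source or replica)."""
    return [hashlib.sha256(block).hexdigest() for block in blocks]


def hash_both_sides(source_blocks, replica_blocks):
    """Run the two independent hashing passes concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        source_future = pool.submit(hash_blocks, source_blocks)
        replica_future = pool.submit(hash_blocks, replica_blocks)
        return source_future.result(), replica_future.result()


# 4 hours + 4 hours + "peanuts" (assumed 5 minutes), expressed in minutes.
print(serial_minutes(240, 240, 5))    # 485
print(parallel_minutes(240, 240, 5))  # 245
```

The overlap only pays off if the two reads do not compete for the same resource, which seems to be the case here since one read hits the DR storage and the other the production storage.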

Other than that, replication seems way faster than in V5: Exchange used to take 7+ hours. Now it is done in 3 hours.

All the best. Daniel.
