HaroldC
Novice
Posts: 5
Liked: never
Joined: Jun 21, 2011 6:05 pm

Single VM failing

Post by HaroldC »

I've got a case with support but I thought I'd check here just in case someone has run into a similar issue.

I have one very large VM. Well, actually it has a second virtual hard drive that is very large, sized at 1TB with 750GB used. I have a single job that processes only this one VM. It gets through the first hard disk and then fails after several hours of processing the second hard disk (the large one) with this error:

Code:

6/19/2012 4:01:02 AM :: Error: Client error: Timed out to wait for free pre-read buffer.
Unable to retrieve next block transmission command. Number of already processed blocks: [439254].

I make small changes, such as switching the backup proxy or repository, and then retry the job, only to have it fail after 16 hours of processing.

We have 3 proxies that are Server 2003 machines, the Veeam backup server, and a separate backup proxy that is a Windows 2008 server.
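
For anyone comparing failure points across retries, here is a minimal Python sketch that pulls the timestamp and the processed-block counter out of the error text quoted in this post. The function names are hypothetical, and the size estimate rests on an assumption: it treats each counted block as 1024 KiB, but the thread does not say what block size this job used or exactly what the counter measures.

```python
import re
from datetime import datetime

# Error text copied from the job log quoted in this post.
LOG = (
    "6/19/2012 4:01:02 AM :: Error: Client error: "
    "Timed out to wait for free pre-read buffer.\n"
    "Unable to retrieve next block transmission command. "
    "Number of already processed blocks: [439254]."
)


def parse_failure(log_text):
    """Return (timestamp, processed_block_count); either may be None."""
    ts = re.search(
        r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) ::",
        log_text,
        re.MULTILINE,
    )
    blocks = re.search(r"already processed blocks: \[(\d+)\]", log_text)
    when = datetime.strptime(ts.group(1), "%m/%d/%Y %I:%M:%S %p") if ts else None
    count = int(blocks.group(1)) if blocks else None
    return when, count


def approx_gib(block_count, block_kib):
    """Data covered by block_count blocks of block_kib KiB each, in GiB."""
    return block_count * block_kib / (1024 * 1024)


when, count = parse_failure(LOG)
print(when, count)
# ASSUMPTION: 1024 KiB per block; the job's real block size is not given.
print(round(approx_gib(count, 1024), 1), "GiB")
```

Running the same parse on the log from each retry makes it easy to see whether the job dies at roughly the same block count or at a different point every time.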
Gostev
Chief Product Officer
Posts: 31815
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Single VM failing

Post by Gostev »

Please include your support case ID.
unnhem
Lurker
Posts: 1
Liked: never
Joined: Oct 18, 2011 10:49 am
Full Name: Niklas Unnhem

Re: Single VM failing

Post by unnhem »

Please, if you find anything that fixes the error, post it here.
I have an ongoing, long-running support request about the same kind of error (ID#5195744).

Re: Single VM failing

Post by HaroldC »

My support case id is 5198986.

I'm really at a loss. All of my other VMs and backup jobs work fine. This one just doesn't.
Cokovic
Veteran
Posts: 295
Liked: 59 times
Joined: Sep 06, 2011 8:45 am
Full Name: Haris Cokovic

Re: Single VM failing

Post by Cokovic »

I don't know if this will be of any help to you, but I recently had a major issue with one of our VMs that I couldn't get backed up. This VM has a total of 29 VMDKs with 13.5TB of provisioned space in total. Always at around hard disk 17 or 18 I got a SAN transport error, and failback to network mode failed too. I had support cases open with both Veeam and VMware, and we finally solved it. In the end it was a setting on the corresponding host, needed because of the big size of this VM. We also have a lot of 750GB hard disks within this VM.

It's just a guess, but try increasing MaxHeapSize on the VMware host where the failing VM is running. By default it is set to 80MB, and it can be increased up to 256MB on ESXi 5. You will find it if you click your host in vSphere Client --> Configuration --> Advanced Settings --> VMFS3. It's really just a guess, but that did the trick for us. Since then my backups have been running fine for this VM. After changing this value you have to reboot the host.
cmcc82
Novice
Posts: 5
Liked: never
Joined: Jan 02, 2012 2:51 pm
Full Name: Celia Cristaldo

Re: Single VM failing

Post by cmcc82 »

Cokovic, if I decide to change this setting under Advanced Settings, I don't have to restart any ESX host, right?
Thanks

Re: Single VM failing

Post by Cokovic »

No, you do have to restart the ESX server where you change this value, as it will only take effect after a reboot. If you have VMotion licensed, this shouldn't be a problem.
jeremyh8
Enthusiast
Posts: 81
Liked: 11 times
Joined: Jun 17, 2012 1:28 am
Full Name: Jeremy Harrison

Re: Single VM failing

Post by jeremyh8 »

Did this resolve your issue?
pendragoncrw
Enthusiast
Posts: 38
Liked: 3 times
Joined: Jun 14, 2010 3:06 am
Full Name: C White

Re: Single VM failing

Post by pendragoncrw »

I am experiencing the same problem as the original poster, also with a VM that has some large drives: 40GB, 800GB, 500GB, and 600GB. My other job, with 29 VMs in it including a few larger ones, runs without any problems. Support case is 00169169, but I understand I need to wait for Level 2 support now (response times were a little slow yesterday due to snow, as I understand it).

We are running ESXi 5.0 U1, which according to VMware should be able to have 8TB of open virtual disk on a single host with the default 80MB heap size. The total storage in use by the host holding our troublesome VM is a little over 4TB, and there have been no problems with numerous normal and storage VMotions, so I'm not convinced the heap size setting is really the culprit. Our other 2 ESXi hosts have on average 1TB of active VMFS storage.

Any ideas or updates while support chews on this?

Thanks,
Chris
foggy
Veeam Software
Posts: 21139
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Single VM failing

Post by foggy »

According to the OP's case, the error went away after consolidating the VM.

Re: Single VM failing

Post by pendragoncrw »

Before I started any of the backup jobs, this VM did need a consolidation, and it was performed. There were no active or orphaned snapshots for this VM prior to our first attempt to back it up. Would you still recommend trying a consolidate?

I reviewed the "Needs Consolidation" status in vCenter for all my VMs, and they are all "No".

Thanks,
Chris
Vitaliy S.
VP, Product Management
Posts: 27377
Liked: 2800 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Single VM failing

Post by Vitaliy S. »

If there are no snapshots then there is nothing to consolidate. Looks like you have a slightly different issue.

Re: Single VM failing

Post by pendragoncrw »

I'm forcing transport mode to network, and it looks like we are having better luck. If network mode works out, support has instructed me to try VA (Virtual Appliance) mode with the virtual disk it keeps hanging on excluded. After that test, a possible clone/migration may be in the future.

Thanks for the help.

Chris

Re: Single VM failing

Post by pendragoncrw »

Just to update.

Tried running the backup with the proxy forced to network mode. It made it through the disk it usually had problems with (disk 2), but slowed to a crawl on disk 3 (a few hundred MB every 45 minutes to an hour). I'm waiting for the job to time out and will then try cloning the VM and backing up the clone.

First time I've ever had a problem like this with a VM. Any thoughts are welcome.

Thanks,
Chris
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Single VM failing

Post by tsightler »

Can you give me a few more details about the job setup? For example, have you changed any of the job settings, such as block size (Local, LAN, WAN)? Is the repository SMB/CIFS? How much memory do you have on the proxy and repository? How big is the VBK when you start having performance problems?

Re: Single VM failing

Post by pendragoncrw »

Job 1: 30 VMs, about 2.5TB total storage, which compresses and de-dupes down to 600GB for a full. It uses the same repository, proxy, and settings as below. This job runs flawlessly and includes our 1TB Exchange 2010 database server.
Job 2: Single problem VM
The job is set up as reverse incremental with 30 restore points retained.
Proxy is 8gig RAM VM with 4 vCPUs. It and the repository server hardly break a sweat.
Repository is Windows 2008 storage server with 6gb RAM, 8 cores, running Veeam Agent (not being used as NFS/CIFS/ISCSI device)
Compression is set to LAN, de-dupe left at default.

There doesn't seem to be any consistency in the failure point. In VA mode, it would typically die somewhere on the second disk, though once it died on the first disk; today in network mode it was on the third disk. No other VM operations are affected. I'm also not sure why Veeam takes so long to fail out.

I'm making enough space to clone it tonight, although I might resize the disks inside the VM and use VMware Converter to get a fresh copy of it. This is one of our main production servers, and it is hard to justify playing with it so much when it works fine for everything except being backed up by Veeam. This weekend will be my only chance to get significant downtime with it for quite a while, so it's either fix it this weekend or put an alternative backup method in place.

Again, this is very atypical of my experience with Veeam (I have it at about 20 individual clients), but it sure hurts when it happens. The only thing unique about this VM is that it had a rough history with snapshots before I came to it, which required a few rounds of taking a manual snapshot, deleting all, shutting down the VM, trying again, etc., but it's clean now.

Re: Single VM failing

Post by pendragoncrw »

The job finally failed out around 11:45pm, after 6hr 24min spent on the third disk. Same error: "Timed out to wait for free pre-read buffer."

I guess if network mode did not change anything, it's time to look at the VM itself.

Chris

Re: Single VM failing

Post by tsightler »

pendragoncrw wrote:
Proxy is 8gig RAM VM with 4 vCPUs. It and the repository server hardly break a sweat.
Repository is Windows 2008 storage server with 6gb RAM, 8 cores, running Veeam Agent (not being used as NFS/CIFS/ISCSI device)
Compression is set to LAN, de-dupe left at default.

OK, some good info there, specifically the part about storage optimization being set to LAN (512K blocks) and server memory. I'd really like to know how big the VBK is after it crashes. I'm guessing it's going to be larger than 1TB. If so, you'll probably need to set the block size back to the storage optimization of Local (1MB) and try again.
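
The arithmetic behind the block-size suggestion can be sketched in a few lines of Python: halving the block size doubles the number of blocks that have to be tracked for the same amount of data. This is only a counting exercise under the 512K and 1MB figures quoted in this post. The function name is hypothetical, and the per-block overhead in Veeam is not stated anywhere in this thread, so no memory figure is computed.

```python
KIB_PER_GIB = 1024 * 1024


def block_count(size_gib, block_kib):
    """Number of blocks needed to cover size_gib GiB, rounded up."""
    size_kib = size_gib * KIB_PER_GIB
    return -(-size_kib // block_kib)  # ceiling division on integers


# Compare the two storage optimization settings for a 1TB (1024 GiB) file.
for label, block_kib in (("LAN", 512), ("Local", 1024)):
    print(label, block_count(1024, block_kib))
```

Whatever the real cost per block is, the LAN setting carries twice as many blocks as Local for the same file size, which is why switching back to Local is a cheap variable to eliminate.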

Re: Single VM failing

Post by pendragoncrw »

The reported size of the VBK file in Explorer is usually between 300GB and 400GB when it heads south, much smaller than the VBK generated by our working job when it runs a full (which has the same block size).

Re: Single VM failing

Post by tsightler »

OK, not as big as I expected. I might still try running the job with Local instead of LAN optimization just to eliminate the VeeamAgent memory consumption from being a possible issue.

I know you mentioned that you don't think the heap size is your likely issue, but it might be worth checking the stats by running this command from the ESXi console:

Code:

memstats -r heap-stats | grep "\(vmfs\)\|\(size\)"

Never hurts to be sure. Do you happen to have any other place that you can use as a repository for a test, perhaps even using the same storage server as a CIFS share? Just to change some things up. BTW, what do the realtime bottleneck stats show while the job is running? Do you have a support case opened?

Re: Single VM failing

Post by pendragoncrw »

I did check the heap stats using the command you listed, and the counters were in spec according to the KB and forum articles from VMware. I kept an eye on them when the job started, when it slowed down, and at the end. The counter numbers were more or less similar to those of the good job when it was backing up our 1TB Exchange 2010 database server.

On the bad job, the real-time bottleneck stats always show "source" as the bottleneck (between 65 and 85% of the time).
On the good job, the real-time bottleneck stats always show "target" as the bottleneck (between 85-95% of the time) with source coming in second around 45%.

I have a Synology device I can use to test with a different repository; I've just got to clear some space on it. Unfortunately, with a VM this big, testing an individual variable is a time-consuming process.

Hopefully I hear back from support today (case was opened on Wednesday morning) since I added last night's logs to it.

Chris

Re: Single VM failing

Post by pendragoncrw »

Based on support's recommendation, I cloned the VM and tried to back up the clone while it was turned off. Same behavior, and I can clearly see the drop-off in SAN activity when reads drop to almost nil. I'm going to VMotion it to a different host and try again.

The production VM works perfectly fine (as do all other VMs on that host), so I am very perplexed about what is going on.

Chris
mnaveedishtiaq
Lurker
Posts: 1
Liked: never
Joined: Jul 18, 2012 8:49 am
Full Name: Muhammad Naveed

Re: Single VM failing

Post by mnaveedishtiaq »

Hi All,

I was experiencing a similar sort of error with one of the VMs. I tried many options and at last resolved the issue with the following workaround, after a month of effort.

Create a local backup of the VM.
Copy it to the secondary site.
Seed the job from the production site with that copy.

Please try the same; maybe this will work in your case as well.

Regards,

Muhammad Naveed
Mopad
Novice
Posts: 4
Liked: never
Joined: Apr 29, 2010 6:56 pm
Full Name: Benjamin

Re: Single VM failing

Post by Mopad »

I had and am having the same problem. I have two backup jobs. One points to a local backup repository, and the other to an offsite repository. The local job was the first to start having the same issues described in this thread. It kept failing on a certain VM. The VM has two VMDKs: one is 50GB and the other is 1.8TB. It always failed on the 1.8TB VMDK. The backup job would run fine on the VM until it hit the 70-72% mark. It would then lock up the repository server and hang for hours (sometimes up to 40-60 hours). Running an active full backup would complete successfully. Any incremental after the full would fail. I was running WS2008R2 on both repositories. I also created a new backup job, and the new job would still fail on the same VM when an incremental was run.

I finally blew away my local repository server and installed Win7 32-bit. My local onsite backups have been working ever since reinstalling Win7 on the local repository.

Now my offsite repository is having the exact same problem.

What's really weird is that the backup job with the offsite repository was failing before Christmas break (I work in a K-12 school). During break, when no one was here, the job completed successfully 11 days straight. The day everyone came back, the job started failing again.

Re: Single VM failing

Post by Vitaliy S. »

Hi Benjamin, can you please tell me what our support team says on that behavior? Have you logged a ticket?

Re: Single VM failing

Post by Mopad »

Here is what support told me to do:

Let's check that the job does not get stuck on the user profiles. Open the properties for the job "Offsite", click Next 3 times, click the Advanced button, highlight MAHELStaff, click Edit, open the Indexing tab, and exclude the whole "C:", or just disable indexing.
Please let us know the results of the job.

I doubt that will help any, since the job is not failing on the VMDK that holds my C: drive.

Support case # 00168695. There won't be much activity, since the job was completing successfully during Christmas break. But I will start giving feedback on the recommendations from support.

Here is the support case for the first one I opened, which is now closed: 00159629. It's closed because I reinstalled the backup repository OS. I don't really feel like doing that to the offsite repository....

Re: Single VM failing

Post by Mopad »

After applying support's suggestions, the job is still failing. Logs have been uploaded.
goldsmith
Influencer
Posts: 14
Liked: never
Joined: Oct 20, 2010 9:41 am
Full Name: Neil Whitehead

Re: Single VM failing

Post by goldsmith »

We are also experiencing this problem with one of our 2003 VMs (about 1TB total storage): "Error: Client error: Timed out to wait for traffic control event."

Any suggestions are more than welcome.

Re: Single VM failing

Post by Vitaliy S. »

The best way to troubleshoot this would be to open a support ticket with our technical team.

Re: Single VM failing

Post by goldsmith »

It might be worth looking at KB940349 from Microsoft, as it is an update for Windows Server 2003 VSS. I will try applying this patch tonight and let you know the result.

This is a replication job and it is rather large, so it may take a day or two to post an update.