DR Testing from AWS Object Storage, failing to restore SQL DBs.

ConradGoodman · Post by **ConradGoodman** » May 06, 2021 8:44 am this post

Case Opened: #04794533

We are currently doing DR testing.

Scenario: restoring application items (all SQL DBs from a source Windows Agent on a physical host) to a new physical Windows Agent host.

If I perform this restore from the performance tier, all 7TB of the DBs restore without any issues.

If I perform the same restore with the performance tier in maintenance mode, forcing the restore to come from the capacity tier, I get the same failure every time, but at differing points in the restore process.

The failure can occur while restoring one database or another, 1h 30m into the job or 8h into the job.

The consistency is that the target server for the restore logs out a bunch of warnings and errors to the system log, and then the job fails wherever it is, and hard fails the rest of the DBs.

All events are in the System log. Here is an example of each (Event IDs 140, 137 and 50):

Log Name:      System
Source:        Microsoft-Windows-Ntfs
Date:          05/05/2021 22:34:28
Event ID:      140
Task Category: None
Level:         Warning
Keywords:      (8)
User:          SYSTEM
Computer:     x
Description:
The system failed to flush data to the transaction log. Corruption may occur in VolumeId: C:\VeeamFLR\Ux_e6c6f41a\Volume0, DeviceName: \Device\HarddiskVdkVolume33.
({Drive Not Ready}
The drive is not ready for use; its door may be open. Please check drive %hs and make sure that a disk is inserted and that the drive door is closed.)
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Ntfs" Guid="{3ff37a1c-a68d-4d6e-8c9b-f79e8b16c482}" />
    <EventID>140</EventID>
    <Version>0</Version>
    <Level>3</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000008</Keywords>
    <TimeCreated SystemTime="2021-05-05T22:34:28.008137000Z" />
    <EventRecordID>14777</EventRecordID>
    <Correlation />
    <Execution ProcessID="11832" ThreadID="9600" />
    <Channel>System</Channel>
    <Computer>x</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="VolumeId">C:\VeeamFLR\x_e6c6f41a\Volume0</Data>
    <Data Name="DeviceName">\Device\HarddiskVdkVolume33</Data>
    <Data Name="Error">0xc00000a3</Data>
  </EventData>
</Event>

Log Name:      System
Source:        Ntfs
Date:          05/05/2021 22:34:28
Event ID:      137
Task Category: (2)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      x
Description:
The default transaction resource manager on volume C:\VeeamFLR\x_e6c6f41a\Volume0 encountered a non-retryable error and could not start.  The data contains the error code.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Ntfs" />
    <EventID Qualifiers="49156">137</EventID>
    <Level>2</Level>
    <Task>2</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2021-05-05T22:34:28.009429900Z" />
    <EventRecordID>14778</EventRecordID>
    <Channel>System</Channel>
    <Computer>x</Computer>
    <Security />
  </System>
  <EventData>
    <Data>
    </Data>
    <Data>C:\VeeamFLR\x_e6c6f41a\Volume0</Data>
    <Binary>1C0004000200300002000000890004C000000000A30000C000000000000000000000000000000000A30000C0</Binary>
  </EventData>
</Event>

Log Name:      System
Source:        Ntfs
Date:          05/05/2021 22:34:50
Event ID:      50
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      x
Description:
{Delayed Write Failed} Windows was unable to save all the data for the file . The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Ntfs" />
    <EventID Qualifiers="32772">50</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2021-05-05T22:34:50.167607900Z" />
    <EventRecordID>14863</EventRecordID>
    <Channel>System</Channel>
    <Computer>x</Computer>
    <Security />
  </System>
  <EventData>
    <Data>
    </Data>
    <Data>
    </Data>
    <Binary>04000400020030000000000032000480000000006E0200C0000000000000000000000000000000006E0200C0</Binary>
  </EventData>
</Event>
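The `<Binary>` payloads in Event IDs 137 and 50 above end with a little-endian 32-bit status code. The following is a minimal sketch for decoding it, assuming the final four bytes of the payload hold the NTSTATUS value (this matches the `0xc00000a3` shown in the `Error` field of Event ID 140). The helper names `final_ntstatus` and `describe` are hypothetical, and the lookup table only covers the two codes that appear in this thread.

```python
import struct

# NTSTATUS codes seen in the events pasted above. Names follow Microsoft's
# published NTSTATUS list; this table is deliberately not exhaustive.
KNOWN_NTSTATUS = {
    0xC00000A3: "STATUS_DEVICE_NOT_READY",
    0xC000026E: "STATUS_VOLUME_DISMOUNTED",
}


def final_ntstatus(binary_hex: str) -> int:
    """Return the last little-endian 32-bit value of an event's <Binary> field.

    Assumption: the trailing DWORD of the payload is the final status code.
    """
    raw = bytes.fromhex(binary_hex)
    if len(raw) < 4:
        raise ValueError("Binary payload is shorter than 4 bytes")
    (status,) = struct.unpack("<I", raw[-4:])
    return status


def describe(binary_hex: str) -> str:
    """Format the trailing status code together with its name, if known."""
    status = final_ntstatus(binary_hex)
    return f"0x{status:08X} {KNOWN_NTSTATUS.get(status, 'unknown')}"


# Payloads copied verbatim from the Event ID 137 and Event ID 50 entries above.
EVENT_137 = "1C0004000200300002000000890004C000000000A30000C000000000000000000000000000000000A30000C0"
EVENT_50 = "04000400020030000000000032000480000000006E0200C0000000000000000000000000000000006E0200C0"

print(describe(EVENT_137))  # 0xC00000A3 STATUS_DEVICE_NOT_READY
print(describe(EVENT_50))   # 0xC000026E STATUS_VOLUME_DISMOUNTED
```

Read this way, the sequence is consistent with the mounted restore volume under `C:\VeeamFLR` first reporting "not ready" and later being treated as dismounted. This only narrows down the symptom and does not identify the cause.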

The pattern is roughly 12 x Event ID 140, 1 x Event ID 137, repeating for 2 mins and then 12 x Event ID 50, at which point the job fails.
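To check a pattern like this against a larger export, the events can be tallied per minute and Event ID. Below is a rough sketch, assuming each event is available as an Event XML string in the same schema as the examples above (for instance, copied from the XML view in Event Viewer). The function name `summarize` and the trimmed sample documents are illustrative only; the samples keep just the `EventID` and `TimeCreated` values from the three events in this post.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by the Event XML shown in this thread.
EVENT_NS = "http://schemas.microsoft.com/win/2004/08/events/event"
NS = {"e": EVENT_NS}


def summarize(event_xml_docs):
    """Count events per (minute, EventID) from Event XML strings."""
    counts = Counter()
    for doc in event_xml_docs:
        system = ET.fromstring(doc).find("e:System", NS)
        event_id = int(system.find("e:EventID", NS).text)
        stamp = system.find("e:TimeCreated", NS).get("SystemTime")
        minute = stamp[:16]  # keep "YYYY-MM-DDTHH:MM", drop seconds
        counts[(minute, event_id)] += 1
    return counts


def trimmed_event(event_id, system_time):
    """Build a cut-down Event XML document (illustrative, not a full export)."""
    return (
        f'<Event xmlns="{EVENT_NS}"><System>'
        f"<EventID>{event_id}</EventID>"
        f'<TimeCreated SystemTime="{system_time}" />'
        "</System></Event>"
    )


# EventID and SystemTime values taken from the three events pasted above.
SAMPLES = [
    trimmed_event(140, "2021-05-05T22:34:28.008137000Z"),
    trimmed_event(137, "2021-05-05T22:34:28.009429900Z"),
    trimmed_event(50, "2021-05-05T22:34:50.167607900Z"),
]

for (minute, event_id), count in sorted(summarize(SAMPLES).items()):
    print(minute, event_id, count)
```

With a full export, a run of minutes dominated by 140 and 137 followed by a burst of 50 would match the sequence described above.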

Can anyone shine any light on this?

Post by **HannesK** » May 06, 2021 9:19 am this post

Hello,

"Can anyone shine any light on this?"

Yes, support.

Please remember that the forums are mainly run by product management and posting a case on the forum a few minutes after creating it doesn't really change anything. Support has to figure it out with the logfiles.

Guessing makes little sense here.

Best regards,
Hannes

ConradGoodman · Post by **ConradGoodman** » May 06, 2021 9:50 am this post

Fair enough. Occasionally an issue is more widely known, and other members have helped me by sharing their experiences.

Cheers

Conrad

ConradGoodman · Post by **ConradGoodman** » May 14, 2021 8:00 am this post

Guys

This has been going on now for 12 days. I have had countless to and fro via email with a level 1 support engineer, and asked multiple times to escalate this, as currently we cannot prove that we can restore our DBs from AWS Capacity tier.

What I am trying to achieve is simple - restore the bulk of our SQL data from capacity tier, while performance tier is in maintenance mode.

I have this morning received the following:

"On reviewing the capacity tier restore scenarios, it seems that we can't have all performance extents in Maintenance mode:
https://helpcenter.veeam.com/docs/backu ... ml?ver=110
This DR test restore scenario doesn't seem to be covered by the user guide and unsupported by Veeam.

Also here ( https://helpcenter.veeam.com/docs/backu ... ml?ver=110 ) we find:
In particular, you can promptly restore data from the capacity tier in case of disaster without creating a scale-out backup repository anew. For more information about this feature, see Importing Object Storage Backups.

So, a proper way to test this would be to have a fresh Veeam installation (or have current Veeam connect to a fresh database to simulate a fresh install) and import the Object Storage and restore:
https://helpcenter.veeam.com/docs/backu ... ml?ver=110"

Nowhere within those links does it state that I can't do what I'm trying to do. Indeed, it was a suggestion from a Veeam employee that this is how to force a restore from AWS.

Given we've been trying to do this for nearly 2 weeks, it feels unacceptable to come up with such a comment so far down the line.

Secondly, the first link states:

"If the entire scale-out backup repository becomes unavailable, Veeam Backup & Replication restores data from the capacity tier only.
For example, both performance extents that store required backup files to restore a virtual machine are not available. In such a scenario, Veeam Backup & Replication restores data from the capacity tier only."

The quote above suggests that if all performance extents are unavailable (in this case, we only have one, and it is in maintenance mode), VBR restores from the capacity tier only.

I need this case escalated. We simply cannot have a situation where our DR backups cannot be tested or restored from like this!

Post by **HannesK** » May 14, 2021 9:32 am this post

Hello,
I need to check where the "The performance or capacity extents must not be in Maintenance mode." comes from. Because that always worked fine for me in the past.

Escalations can be done via the "Talk to manager" button. But I'm afraid that would probably also end again at the "The performance or capacity extents must not be in Maintenance mode."

I will check about the reasons and come back on this (that can take some time).

Best regards,
Hannes

ConradGoodman · Post by **ConradGoodman** » May 14, 2021 9:35 am this post

Thanks Hannes.

Secondly, we did try to download the backup file from the capacity tier to the performance tier, but this only downloaded 500GB from AWS and synthesized the rest locally, so it is not a fair test that our AWS backups are OK.

Either way, we need to ensure that what we have in AWS is restorable, without building a new Veeam server!

Post by **HannesK** » May 14, 2021 9:39 am this post

A new Veeam server is free from the Veeam side, so that's a valid option from my point of view. But I'm still curious where the limitation comes from.

ConradGoodman · May 14, 2021 10:14 am

Free from your point of view, but given we need a spare SQL server to restore our DBs to, and now potentially an unbudgeted spare Veeam server to download backups onto, it's far from free for us.

soncscy · Post by **soncscy** » May 16, 2021 8:12 pm this post

Why not just use the same server then?

As I understand it, you can create a second DB on the same server (call it "Veeam_Test" or something) and point Veeam at it using their utility: https://helpcenter.veeam.com/docs/backu ... ml?ver=110

Add in just some local storage and the bucket you've offloaded to into a scale out repo, and perform your test.

Yes, you need to stop services for a bit, but it would at least validate it, no?

ConradGoodman · Post by **ConradGoodman** » May 17, 2021 7:28 am this post

Because that server is constantly backing up our datacenter SQL logs.

soncscy · Post by **soncscy** » May 17, 2021 8:13 am this post

Mmm, but doesn't Maintenance mode cause the same interruption? Or do the SQL backups target a different repository?

ConradGoodman · Post by **ConradGoodman** » May 17, 2021 8:20 am this post

The latter. We have a standard repository for local backups, and copy backups from there to the SOBR that get offloaded. T-logs don't get offloaded.

ConradGoodman · Post by **ConradGoodman** » May 17, 2021 2:04 pm this post

FYI: the L2 engineer has confirmed that no such limitation exists. I can restore the backup from the capacity tier directly.

Post by **HannesK** » May 20, 2021 9:06 am this post

Looks like you are making progress with the issue. I also cannot remember that limitation, but it will take some time to verify. I will come back once I have more answers.

ConradGoodman · Post by **ConradGoodman** » May 20, 2021 9:52 am this post

Thanks Hannes, guess you are following the ticket.

Happy to report that the latest regkey solved the issue, and even close to doubled our restore ingress speed from AWS to the Veeam server.

Have a few queries open on the ticket still, but glad to get there in the end, even if it took an undesirable amount of time and attempts to get there!

Post by **HannesK** » Jun 22, 2021 3:00 pm this post

Sorry for the delay.

The documentation is correct, as the "The performance or capacity extents must not be in Maintenance mode." restriction only refers to the following two operations:

1. Rescan a scale-out backup repository.
For more information on how to rescan a scale-out backup repository, see Rescanning Scale-Out Repositories.

2. Copy data from the capacity tier to the performance tier.
For more information on how to copy data, see Copying to Performance Tier.

The support engineer just misunderstood it, and your scenario is covered in the section below:

A performance extent in a scale-out backup repository may become unavailable or be in maintenance mode. To restore data in such case, you can use any method described in Data Recovery.

ConradGoodman · Post by **ConradGoodman** » Jun 24, 2021 9:04 am this post

Thanks for the clarification.
