Veeam Case IDs: #07456899, 07456835, 07456820, 07454900, 07448663
Original Issue: Full VM restore from a vendor's offsite backup object storage repository failed
Discovered a bigger issue: a data integrity problem across the us-central-1 region, with all buckets affected
Questions:
1. How can Veeam check the data integrity of the backups in these repositories in their entirety?
2. How can you get reports of these checks (e.g., Health Checks, VeeamOne)?
3. Will orphaned backups be checked?
This highlights the importance of validating the integrity of all your backups. If you have backups in us-central-1 and, like me, did not get notified of this event, test-restore your backups: Veeam gave no errors as more and more backups were copied up to this repository.
Error Restore job failed Error: S3 error: We encountered an internal error. Please retry the operation again later.
Error Restore job failed Error: Code: InternalError
Error Restore job failed Error: Unable to retrieve next block transmission command. Number of already processed blocks: [17454].
Error Restore job failed Error: Failed to download disk ' '.
Error Restore job failed Error: Agent failed to process method {DataTransfer.SyncDisk}.
Kudos to the Object Storage Specialist that quickly identified the issue and where to go to validate the issue.
Log Error Details:
[08.10.2024 14:35:17.182] <01> Error (3) Failed to restore vm. Name: [RDS]
[08.10.2024 14:35:17.182] <01> Error (3) S3 error: We encountered an internal error. Please retry the operation again later. (Veeam.Backup.Common.CCppComponentException)
[08.10.2024 14:35:17.182] <01> Error (3) Code: InternalError (Veeam.Backup.Common.CCppComponentException)
[08.10.2024 14:35:17.182] <01> Error (3) in c++: Request ID: B016901F2450827E:A Other: Detail: 'unexpected status: 500 Internal Server Error', HostId: 'exnMRe/FTC6YKOccUGgXb8z/FQJlzYuB4ooxF1rpY0tZ6EMWGmFPlDhVOdbWGyTjZq6xCL/5g3rF', CMReferenceId: 'MTcyODQyMzMwODUxMCAzOC45MS40Mi4xMDkgQ29uSUQ6OTM3NDgzMTcwL0VuZ2luZUNvbklEOjg3MjkyODEvQ29yZToyMQ=='
[08.10.2024 14:35:17.182] <01> Error (3) in c++: CS3VersioningUtils::DownloadFileVersionAsync async task has failed, path [/Veeam/Backup/Veeam/Clients/{2cef1f7c-eaf1-4ee9-a1d4-c432e0ca971d}/9472af2b-8d69-449c-8fa1-823128ebf308/CloudStg/Data/{52b1b0ee-90c1-407f-bd5f-c108b3d7ecc2}/{18fdc2e5-499a-4c6e-9d41-6564bfa44aea}/21657_bf84382f27588807d7bb35959bd84b5c_c2852082b01b94dc076d3b02a56173d8], version [001697073994195151438-wS0kHUouLS], offset [0], length [0]
[08.10.2024 14:35:17.182] <01> Error (3) in c++: Failed to request content for cloud data block
[08.10.2024 14:35:17.182] <01> Error (3) in c++: Data area could not be serialized
Actually, I do not think that "Please retry the operation again later" will yield a favorable result.
I say this because I noticed the Wasabi region is us-central-1, and it is known that Wasabi has had some data issues in that region going back 3 or 4 months.
Anyway, it is best to reach out to Wasabi Support and give them the following items so they can investigate why Veeam is unable to serialize/enumerate the data blocks:
Request ID: B016901F2450827E:A
CMReferenceId: 'MTcyODQyMzMwODUxMCAzOC45MS40Mi4xMDkgQ29uSUQ6OTM3NDgzMTcwL0VuZ2luZUNvbklEOjg3MjkyODEvQ29yZToyMQ=='
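If a log contains many failed reads, collecting these identifiers by hand gets tedious. Below is a minimal Python sketch of a hypothetical helper (the function name and output keys are my own, not part of Veeam or vendor tooling) that pulls the Request ID, HTTP status, HostId and CMReferenceId out of log lines shaped like the ones quoted above, plus the object path and version from `DownloadFileVersionAsync` lines. The regular expressions assume the exact formatting shown in this thread; Veeam's log format is not a documented interface, so they may need adjusting for other versions.

```python
import re
from typing import Dict, List

# Hypothetical patterns written against the log lines quoted in this thread.
_ID_PATTERNS = {
    "request_id": re.compile(r"Request ID: (\S+)"),
    "status": re.compile(r"unexpected status: (\d{3})"),
    "host_id": re.compile(r"HostId: '([^']*)'"),
    "cm_reference_id": re.compile(r"CMReferenceId: '([^']*)'"),
}
# Matches "...DownloadFileVersionAsync ... path [<path>], version [<version>]"
_OBJECT_PATTERN = re.compile(
    r"DownloadFileVersionAsync.*?path \[([^\]]+)\], version \[([^\]]+)\]"
)


def extract_s3_errors(log_text: str) -> Dict[str, List[Dict[str, str]]]:
    """Collect S3 request identifiers and failed object versions from log text."""
    requests: List[Dict[str, str]] = []
    objects: List[Dict[str, str]] = []
    for line in log_text.splitlines():
        if "Request ID:" in line:
            # One dict per failed request, holding whichever IDs the line contains.
            ids: Dict[str, str] = {}
            for name, pattern in _ID_PATTERNS.items():
                match = pattern.search(line)
                if match:
                    ids[name] = match.group(1)
            requests.append(ids)
        obj = _OBJECT_PATTERN.search(line)
        if obj:
            objects.append({"path": obj.group(1), "version": obj.group(2)})
    return {"requests": requests, "objects": objects}
```

Passing the text of the restore job's log to `extract_s3_errors` gives the IDs to paste into the vendor ticket. The `objects` entries could also be compared against any list of non-recovered objects the vendor sends, keeping in mind that the path as logged may not match the bucket key character for character.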
Verification:
Vendor Responses:
Response (3rd Response)
Thank you for contacting Vendors Name Support. I had a look at the 500 level HTTP responses on GET requests for bucket ' ' for this month and was able to determine that the errors are due to an issue on one of our storage subsystems in the us-central-1 region. I am including a summary of the issue as well as the update that was sent out yesterday which includes lists of the affected objects.
The purpose of this update is to provide details on a data integrity incident that affected a small number of customers that were using Vendors Name us-central-1 (Plano, TX) storage region. This update provides the details on what happened, how Vendors Name will prevent it from happening again, and recommendations Vendors Name has for the affected customers.
On 30 August 2024, in Vendors Name us-central-1 region, an incident occurred where an input-output module in a storage system became inoperable and prevented access to a number of disks that it served. Simultaneously, the system software managing the data storage disks improperly took multiple other disks offline. As a result of this behavior, a number of objects that were being served by this portion of disks within the region were impacted in a manner that makes these objects not recoverable at this time.
Vendors Name has made the appropriate adjustments in the Vendors Name software that controls our hard disk management to prevent this problem from happening again. This problem has not occurred in any other Vendors Name storage region.
Recommendations
Although Vendors Name is working with Vendors Name file system engineers to repair the affected data, this work will not be completed before 23 September 2024 (and could possibly take longer). In addition, Vendors Name is not yet able to guarantee the repair will be successful. For this reason, Vendors Name is recommending that affected customers re-upload your data to Vendors Name using standard backup or upload procedures.
We apologize for the impact of this problem on your operations. Please let us know if you have any questions.
Hello,
Following up on our previous email we would like to inform you that all work to repair affected data in the us-central-1 region has been completed. Unfortunately, our team was unable to recover all impacted objects that were affected on 08/30/2024, and we have compiled several lists of objects for your team to review. These lists include:
1. Total impacted objects (no information given)
2. Total recovered objects (no information given)
3. Total non-recovered objects (no information given)
For any objects that are non-recoverable, we recommend to re-upload to your bucket(s) if possible. If these objects are part of a backup, it is recommended to run a FULL backup from your backup application so that your backup chain is not missing any important data. We greatly apologize for any inconvenience this may have caused your team, and if you have any questions, please reach back out via this case so that we can further assist.
Regards,
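Before re-running fulls, it may help to confirm which objects in a bucket still fail to read back. The following is a minimal Python sketch, not a Veeam feature: it downloads every object under a prefix through the S3 API and records any read that raises an error (such as the 500 InternalError seen in the restore). It assumes `boto3` is installed and credentials are configured; `run`, `probe_objects` and the other names are hypothetical, and the endpoint, bucket and prefix are placeholders. It reads only the current version of each key, so if the bucket is versioned (the log above shows Veeam requesting a specific version), older versions are not checked. A successful download only proves the provider returned bytes, not that Veeam's backup data is consistent, so this complements SureBackup and health checks rather than replacing them. Reading every object will also generate request charges, and possibly egress charges, depending on the provider.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

CHUNK = 8 * 1024 * 1024  # read objects in 8 MiB pieces to bound memory use


def probe_objects(keys: Iterable[str], fetch: Callable[[str], None]) -> List[Tuple[str, str]]:
    """Try to read every key; return (key, error message) for each failed read."""
    failures: List[Tuple[str, str]] = []
    for key in keys:
        try:
            fetch(key)
        except Exception as exc:  # any failure to download counts as unreadable
            failures.append((key, str(exc)))
    return failures


def s3_keys(client, bucket: str, prefix: str) -> Iterator[str]:
    """List object keys under a prefix using boto3's list_objects_v2 paginator."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def s3_fetcher(client, bucket: str) -> Callable[[str], None]:
    """Return a fetch function that downloads the full current-version object body."""
    def fetch(key: str) -> None:
        body = client.get_object(Bucket=bucket, Key=key)["Body"]
        while body.read(CHUNK):  # discard the data; only whether reads succeed matters
            pass
    return fetch


def run(endpoint_url: str, bucket: str, prefix: str) -> None:
    """Hypothetical entry point: print every key under `prefix` that could not be read."""
    import boto3  # assumption: boto3 is installed and credentials are configured

    client = boto3.client("s3", endpoint_url=endpoint_url)
    for key, error in probe_objects(s3_keys(client, bucket, prefix), s3_fetcher(client, bucket)):
        print(f"UNREADABLE {key}: {error}")
```

Keeping the listing, downloading and error collection separate means `probe_objects` can also be fed a vendor-supplied list of keys instead of a full bucket listing.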
Response (4th Response)
I checked the buckets you have in the us-central-1 and, indeed, these are affected by the issue we saw in that region.
I have added your buckets to the internal investigation ticket.
Our engineering team is working on this issue around the clock as a top priority.
We apologize for the impact of this problem on your operations. Please let us know if you have any questions.
- Influencer
- Full Name: John Watson
- Veeam Software
Re: Full VM Restore Restore job failed Error: S3 error: We encountered an internal error.
Hi John,
Very sorry to hear about this challenge with your S3 provider. I will cut right to the chase on your questions, and then a few comments.
1. Use a SureBackup job and set the Backup Verification Mode to "Backup verification and content scan only". This will check all the backups linked to a job.
2. The SureBackup job can be configured to send email and SNMP notifications with the results. Similarly, you can use the Recovery Verification report in VeeamOne to check the results of your SureBackup jobs.
3. You can use SureBackup as in 1, but you first need to take an extra step: create a "dummy" job that uses the repository your orphaned backups are on, then map the backup (step 3) and point it to the orphaned backups you see. Do not set a schedule for the job, as we don't want it to run. I tested this quickly, and you don't need to map each VM 1:1; just add any VM to the backup job and move on to the mapping step. We will NOT be running the backup job; we just need it to exist so we can link it to the SureBackup job.
I see there are still ongoing discussions with our Advanced Support regarding the situation. Let's see what they're able to offer before jumping on the SureBackup jobs, though based on your description I do need to set the stage that potentially not much can be done; best to wait for Support's conclusion.
> Kudos to the Object Storage Specialist that quickly identified the issue and where to go to validate the issue.
I am very glad that Veeam Support's Object Storage team was able to help get you on the right track right away, and thank you for the kind words; I'll make sure this feedback gets to the specialist.
And thank you for sharing the details on this situation; very sorry to hear that it happened. Let's keep working with Support: as I see it, there is a planned remote session with Advanced Support to review the situation and determine what options are available now.
David Domask | Product Management: Principal Analyst
- Service Provider
- Full Name: Nick Lynn
Re: Full VM Restore Restore job failed Error: S3 error: We encountered an internal error.
Thank you for posting about this, as it helped me solve the issue! The only reason I caught the problem was that my health checks were failing for the copy jobs.
We opened Veeam case #03450920, but the tech did not get very far with the logs and wanted a fresh set; because the Health Check runs only once a month, the case was closed.
Also shame on Wasabi for not doing a better job notifying their customers about this issue!