All instances of the storage metadata are corrupted

Post by **jgiacone** » Aug 31, 2016 2:29 pm

****
[error] All instances of the storage metadata are corrupted.
****

Has anyone else run into this error during incremental backups? We have several Ubuntu workstations on which we are testing the Veeam Agent for Linux, and this error happens on only one of them. We are taking a full backup of the machine. The first backup runs fine. However, every subsequent backup returns the above error after the volume snapshot is created, and the backup fails. If we create a new job, everything works fine until the next backup, when the same error pops up. An interesting note: if we do a file-level backup (instead of the entire system) and select just certain folders, this does not occur.

Another thing to note: the full backup that actually succeeds cannot be restored to another machine. The machine reports the same metadata error during the restore. I've dug around and can't find much about this, other than a Veeam KB article explaining what metadata is, how Veeam uses it, and apparently how you are screwed if it gets corrupted. Any help or insight is appreciated.

Post by **nielsengelen** » Aug 31, 2016 3:12 pm

Could you try installing BETA2 which is available now and see if this still happens?

Post by **jgiacone** » Sep 07, 2016 2:14 pm

Thanks, I installed BETA2 on all of our test machines. All are working fine, except the machine that had the 'metadata' issue. That machine no longer gets the metadata issue, but now has a different issue:

2016-09-07 06:00:02 Job started at 2016-09-07 06:00:02
     2016-09-07 06:00:02 Starting backup
     2016-09-07 06:00:10 Creating volume snapshot                                                                                                         00:00:02
     2016-09-07 06:00:23 [error] Retrieved less bytes from the storage [0] than required [5632]. Offset: [1457563136]. File: [/tmp/veeam/...
     2016-09-07 06:00:23 [error] Failed to restore file from local backup. VFS link: [summary.xml]. Target file: [MemFs://RestoreText_{defdb0e0-3b68-47...
     2016-09-07 06:00:23 [error] Agent failed to process method {DataTransfer.RestoreText}.
     2016-09-07 06:00:23 [error] Failed to perform backup

And this is in the logs:

*************************

ERR |Job has failed.
[07.09.2016 06:00:23] <140363314071296> lpbcore| >>  |Retrieved less bytes from the storage [0] than required [5632]. Offset: [1457563136]. File: [/tmp/veeam/.../BackupJob1_2016-09-03T060014.vib].
[07.09.2016 06:00:23] <140363314071296> lpbcore| >>  |--tr:Failed to retrieve V4 data block. Block offset: [1457563136]. Compressed size: [5173]. Original size: [16739].
[07.09.2016 06:00:23] <140363314071296> lpbcore| >>  |--tr:Failed to retrieve restore block [0] for file [summary.xml].
[07.09.2016 06:00:23] <140363314071296> lpbcore| >>  |--tr:Failed to read block [0] from restore point
[07.09.2016 06:00:23] <140363314071296> lpbcore| >>  |--tr:Next asynchronous read request cannot be processed.
[07.09.2016 06:00:23] <140363314071296> lpbcore| >>  |--tr:Asynchronous data reader has failed.
[07.09.2016 06:00:23] <140363314071296> lpbcore| >>  |--tr:Failed to process conveyored task.
[07.09.2016 06:00:23] <140363314071296> lpbcore| >>  |Failed to restore file from local backup. VFS link: [summary.xml]. Target file: [MemFs://RestoreText_{defdb0e0-3b68-4703-990b-65424a1203af}]. CHMOD mask: [32528].

*************************

Any ideas? Again, this is only happening on one machine, but it also happens to be the same machine that had the corrupted metadata issue.

Thanks again,
Joe

Post by **PTide** » Sep 07, 2016 2:25 pm

Hi,

Please describe the machine that is failing the backup job. Here is what we need to know:

- disks configuration (lsblk -af output)
- dmesg -T output
- Is any RAID present in the system
- Backup mode and target (shared folder/USB/Local)
- archived /var/log/veeam directory

Could you please upload everything to the dropbox?
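For anyone gathering the same information, the items in the list above can be collected in one pass. The following is only a minimal Python sketch: the function name `collect_diagnostics`, the output file names, and the archive name `veeam-logs.tar.gz` are made up for illustration and are not part of any Veeam tooling. It runs the two commands named in the list and archives the log directory if it exists.

```python
import os
import shutil
import subprocess
import tarfile

# The two commands requested in the list above; output file names are hypothetical.
COMMANDS = {
    "lsblk.txt": ["lsblk", "-af"],
    "dmesg.txt": ["dmesg", "-T"],
}


def collect_diagnostics(out_dir, log_dir="/var/log/veeam"):
    """Save command output and an archive of log_dir into out_dir.

    Returns the list of files that were created.
    """
    os.makedirs(out_dir, exist_ok=True)
    created = []
    for filename, cmd in COMMANDS.items():
        if shutil.which(cmd[0]) is None:
            continue  # tool not installed; skip it rather than fail
        result = subprocess.run(
            cmd, capture_output=True, text=True, errors="replace", check=False
        )
        path = os.path.join(out_dir, filename)
        with open(path, "w") as f:
            f.write(result.stdout)
            f.write(result.stderr)  # keep permission errors and the like visible
        created.append(path)
    if os.path.isdir(log_dir):
        archive = os.path.join(out_dir, "veeam-logs.tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            # Store the logs under the directory's own name, e.g. "veeam/...".
            tar.add(log_dir, arcname=os.path.basename(log_dir.rstrip("/")))
        created.append(archive)
    return created
```

Reading `/var/log/veeam` and running `dmesg` may require root, so this would typically be run with elevated privileges. The RAID and backup mode questions still need to be answered by hand.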

Thank you

Post by **premeau** » Sep 07, 2016 2:34 pm

I am seeing this on a CentOS 6 x86 box backing up to a Windows 2012R2 CIFS share.

My Agent.Repair.Log:

[07.09.2016 07:00:03] <3040869232> stg    | Checking metadata of the storage [/tmp/veeam/<desthost>Linux/<localhost> OmegaBackup/OmegaBackup_2016-09-06T203535.vbk]
[07.09.2016 07:00:03] <3040869232> stg    |   Opening storage [/tmp/veeam/<desthost>Linux/<localhost> OmegaBackup/OmegaBackup_2016-09-06T203535.vbk] in read-only mode.
[07.09.2016 07:00:03] <3040869232> stg    |     Opening storage file [HostFS:///tmp/veeam/<desthost>Linux/<localhost> OmegaBackup/OmegaBackup_2016-09-06T203535.vbk].
[07.09.2016 07:00:03] <3040869232> stg    |       Applying shared lock on file [/tmp/veeam/<desthost>Linux/<localhost> OmegaBackup/OmegaBackup_2016-09-06T203535.vbk].
[07.09.2016 07:00:03] <3040869232> stg    |     Creating size-quoted storage file. Current file size: [4646961152].
[07.09.2016 07:00:03] <3040869232> stg    |     Standard block size: "1048576".
[07.09.2016 07:00:03] <3040869232> stg    |     Loading metastore.
[07.09.2016 07:00:03] <3040869232> stg    |       Enumerating instances of metadata stored on disk.
[07.09.2016 07:00:03] <3040869232> stg    |         Version of the metadata stored in slot [4096]: [12].
[07.09.2016 07:00:03] <3040869232> stg    |         Version of the metadata stored in slot [53248]: [12].
[07.09.2016 07:00:03] <3040869232> stg    |       Storage current dedupe limit: [4194304].
[07.09.2016 07:00:03] <3040869232> stg    |       Initialized bank cache.
[07.09.2016 07:00:03] <3040869232> stg    |       Bank cache limits: 
[07.09.2016 07:00:03] <3040869232> stg    |         Dirty: [138]
[07.09.2016 07:00:03] <3040869232> stg    |         Clean: [162]
[07.09.2016 07:00:03] <3040869232> stg    |         Overall: [194]
[07.09.2016 07:00:03] <3040869232> stg    |       Dedupe disabled: [false].
[07.09.2016 07:00:03] <3040869232> stg    |       Prepared dynamic load bank factory, using bank cache.
[07.09.2016 07:00:03] <3040869232> stg    |       Loading snapshot from slot [4096].
[07.09.2016 07:00:19] <3040869232> stg    |       Loading snapshot from slot [4096]. Failed.
[07.09.2016 07:00:19] <3040869232> stg    |     Loading metastore. Failed.
[07.09.2016 07:00:19] <3040869232> stg    |   Opening storage [/tmp/veeam/<desthost>Linux/<localhost> OmegaBackup/OmegaBackup_2016-09-06T203535.vbk] in read-only mode. Failed.
[07.09.2016 07:00:19] <3040869232> stg    |   Closing storage file [HostFS:///tmp/veeam/<desthost>Linux/<localhost> OmegaBackup/OmegaBackup_2016-09-06T203535.vbk].
[07.09.2016 07:00:19] <3040869232> stg    |     Removing lock on file [/tmp/veeam/<desthost>Linux/<localhost> OmegaBackup/OmegaBackup_2016-09-06T203535.vbk].
[07.09.2016 07:00:19] <3040869232> stg    | Checking metadata of the storage [/tmp/veeam/<desthost>Linux/<localhost> OmegaBackup/OmegaBackup_2016-09-06T203535.vbk] Failed.
[07.09.2016 07:00:19] <3040869232> cli    | ERR |Failed to process method {Stg.CheckMetadataCorrupt}
[07.09.2016 07:00:19] <3040869232> cli    | >>  |Retrieved less bytes from the storage [0] than required [5246976]. Offset: [4662856704]. File: [/tmp/veeam/<desthost>Linux/<localhost> OmegaBackup/OmegaBackup_2016-09-06T203535.vbk].
[07.09.2016 07:00:19] <3040869232> cli    | >>  |--tr:Failed to parse metadata stored in slot [4096]
[07.09.2016 07:00:19] <3040869232> cli    | >>  |--tr:Unable to load metadata snapshot. Slot: [4096].
[07.09.2016 07:00:19] <3040869232> cli    | >>  |--tr:Failed to load metastore
[07.09.2016 07:00:19] <3040869232> cli    | >>  |--tr:Failed to load metadata partition.
[07.09.2016 07:00:19] <3040869232> cli    | >>  |--tr:Failed to open storage for read access. Storage: [/tmp/veeam/<desthost>Linux/<localhost> OmegaBackup/OmegaBackup_2016-09-06T203535.vbk].
[07.09.2016 07:00:19] <3040869232> cli    | >>  |--tr:Failed to check metadata of the storage [/tmp/veeam/<desthost>Linux/<localhost> OmegaBackup/OmegaBackup_2016-09-06T203535.vbk].
[07.09.2016 07:00:19] <3040869232> cli    | >>  |--tr:Storage check failed.
[07.09.2016 07:00:19] <3040869232> cli    | >>  |An exception was thrown from thread [3040869232].

And my lsblk output -- no RAID, just a single simple partitioned disk:

NAME   FSTYPE LABEL UUID                                 MOUNTPOINT
loop0                                                    
loop1                                                    
loop2                                                    
loop3                                                    
loop4                                                    
loop5                                                    
loop6                                                    
loop7                                                    
sr0                                                      
sda                                                      
├─sda1 ext4         a4d3c8f0-2430-4515-99d3-c3747adea3e9 /boot
├─sda2 swap         8388d48a-d81a-4a03-be62-7fa47ef5b017 [SWAP]
└─sda3 ext4         9b5f9e0a-922c-4c20-9c95-d8d36b5f4e80 /
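One detail stands out in the Agent.Repair.Log excerpt in this post: the reported file size is 4646961152 bytes, while the failed read starts at offset 4662856704, which is 15895552 bytes past that size. That would be consistent with the .vbk on the share being shorter than its metadata expects, although the log alone does not prove it. Below is a minimal Python sketch for spotting this pattern. The function name and the regular expressions are assumptions based only on the log lines quoted in this thread, not on any documented Veeam log format.

```python
import re

# Patterns derived from the log lines quoted in this thread (an assumption,
# not a documented format).
SHORT_READ = re.compile(
    r"Retrieved less bytes from the storage \[(\d+)\] than required \[(\d+)\]\. "
    r"Offset: \[(\d+)\]"
)
FILE_SIZE = re.compile(r"Current file size: \[(\d+)\]")


def check_short_reads(log_text):
    """Return one dict per short-read error found in log_text.

    When the log also reports a file size, each dict records how many bytes
    the failed read would extend past that size.
    """
    size_match = FILE_SIZE.search(log_text)
    file_size = int(size_match.group(1)) if size_match else None
    findings = []
    for retrieved, required, offset in SHORT_READ.findall(log_text):
        read_end = int(offset) + int(required)
        findings.append({
            "offset": int(offset),
            "required": int(required),
            "retrieved": int(retrieved),
            "file_size": file_size,
            # A positive value means the read runs past the reported file size.
            "bytes_past_eof": None if file_size is None else read_end - file_size,
        })
    return findings
```

The sketch uses the first file size it finds, so a log that covers several storage files would need per-file tracking. For logs without a "Current file size" line, such as the .vib error earlier in the thread, `bytes_past_eof` is left as `None`.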

Post by **PTide** » Sep 08, 2016 12:36 pm

Hi SPremeau,

Could you please try to back up to some other destination? For example, you can use a USB drive or an NFS share instead of CIFS.

Thank you.

Post by **jgiacone** » Sep 08, 2016 1:46 pm

Hi, thanks for the response. The requested files have been uploaded here:

https://drive.google.com/folderview?id= ... sp=sharing

- There is no RAID present in the system
- Backup mode is 'Entire Machine' and the target is a CIFS share on an Isilon storage array

Thanks,
Joe

Post by **PTide** » Sep 08, 2016 3:05 pm

jgiacone wrote: Backup mode is 'Entire Machine' and the target is a CIFS share on an Isilon storage array

I assume that all other machines back up to the very same CIFS share under the same credentials. Is that correct?

Post by **premeau** » Sep 08, 2016 8:24 pm

PTide wrote: Could you please try to backup to some other destination? For example you can use USB drive or NFS share instead of CIFS.

I was able to do a full backup and one incremental to an NFS share hosted by an OpenFiler VM.

Post by **Gostev** » Sep 08, 2016 11:46 pm

With B&R, this error would indicate an issue with backup storage, so as the very first troubleshooting test I would try backing up to a different target.

Post by **nielsengelen** » Sep 09, 2016 6:48 am

If you mount the CIFS share on that server manually, can you perform normal file copies to it (take some large folders) without a problem?

What about writing big files using 'dd'? Does an issue such as a timeout or the like occur at any point?
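As a complement to a plain copy or `dd` run, a write followed by a read-back with checksum comparison can show whether the data on the share matches what was sent. The following is a minimal Python sketch under stated assumptions: the function name `write_and_verify` and the test file name are hypothetical, and the directory passed in is assumed to be wherever the CIFS share is mounted.

```python
import hashlib
import os


def write_and_verify(directory, size_mib=1024, chunk_mib=4):
    """Write random data into directory, read it back, and compare SHA-256.

    Returns True if the file has the expected size and the digests match.
    The test file is removed afterwards.
    """
    path = os.path.join(directory, "veeam_share_test.bin")  # hypothetical name
    chunk_bytes = chunk_mib * 1024 * 1024
    chunks = size_mib // chunk_mib
    written = hashlib.sha256()
    try:
        with open(path, "wb") as f:
            for _ in range(chunks):
                chunk = os.urandom(chunk_bytes)
                f.write(chunk)
                written.update(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data out before reading back
        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_bytes), b""):
                read_back.update(chunk)
        size_ok = os.path.getsize(path) == chunks * chunk_bytes
        return size_ok and written.digest() == read_back.digest()
    finally:
        if os.path.exists(path):
            os.remove(path)
```

A call such as `write_and_verify("/mnt/share")` (mount point assumed) would exercise the share with 1 GiB of data. The read-back may be served partly from the client's cache, so a passing result does not rule out a storage problem, while an exception or a `False` result is a strong hint.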

Post by **PTide** » Sep 09, 2016 1:58 pm

Joseph,

One more question - do other Ubuntu machines have the same OS version and back up to the same destination (CIFS on Isilon) as the failing one?

Thanks

Post by **jgiacone** » Sep 12, 2016 2:45 pm

Hi all,

Thank you for the replies. I will try to answer these in order:

1.) "I assume that all other machines backup to the very same CIFS share, under the same credentials, is that correct?"

-Partially. All of the other machines back up to the same CIFS share, but not all of them use the same credentials. The machine with the error does use the same credentials as two other machines: one with the exact same OS level (Ubuntu 16.04), and a Windows PC. Both of these other backups run without issue.

2.) "With B&R, this error would indicate an issue with backup storage, so as the very first troubleshooting test I would try backing up to a different target."

-OK, we can definitely try that. However, wouldn't the other 10+ machines, several with the same OS, experience the same problem if the target is the issue?

3.) "If you mount the CIFS share on that server manually can you perform normal file copies to it (take some large folders) without a problem?"

-Yes, I can copy very large files without issues to the CIFS share, if I mount it manually.

4.) "One more question - do other Ubuntu machines have the same OS version and backup to the same destination (CIFS on Isilon ) as the failing one?"

-Yes, I set up another Ubuntu machine, with the same OS level, that uses the same credentials, to better try and root out this issue. That machine's backup has been configured identically and backs up flawlessly. A week of backups, no issue.

Again, thanks for all of the replies. Please let me know if any other information is needed.

-Joe

Post by **PTide** » Sep 12, 2016 3:34 pm

jgiacone wrote: Yes, I set up another Ubuntu machine, with the same OS level, that uses the same credentials, to better try and root out this issue. That machine's backup has been configured identically and backs up flawlessly. A week of backups, no issue.

jgiacone wrote: The machine with the error does use the same credentials as two other machines, one with the exact same OS level (Ubuntu 16.04) <...> Both of these other backups run without issue

These two facts make me think that the cause might be somewhere in between the server and the shared storage. It might be a long shot, but could you please swap the network connections of two servers: the failing one, and the other one that uses the same credentials and the same OS but does not fail backups?

Thanks

Post by **Gostev** » Sep 12, 2016 11:24 pm

jgiacone wrote: 2.) "With B&R, this error would indicate an issue with backup storage, so as the very first troubleshooting test I would try backing up to a different target."

-OK, we can definitely try that. However wouldn't the other 10+ machines, several with the same OS, experience the same problem, if the target is the issue?

Yes, I would expect backups for all machines to be affected in a similar manner.

Post by **jgiacone** » Sep 19, 2016 1:55 pm

Sorry for the belated response; other work duties pulled me away from this last week. Thanks for the replies. I switched the network connections and got the same results. The Ubuntu install that always fails still failed with the same message, while the other one backed up without a problem. Since my last post, I have added more Windows and Linux (Ubuntu 16.04) machines to our backup. This machine is still the only one that fails.

Thanks,
Joe

Post by **PTide** » Sep 19, 2016 2:54 pm

Apparently the problem is with the machine itself. Is it an option to install another distro on the problematic host and try to back up again?

Post by **jgiacone** » Sep 27, 2016 2:52 pm

That is my suspicion as well. Yes, that is an option. However, it is not the preferred one. We are evaluating this as a potential replacement for our current physical backup product. I would rather track down the issue than tell my other admins to rebuild a server that is working fine, just to get it to back up properly. That will be a hard sell. I will continue to work on this. If I find a solution, I will post it. Thanks all, for the advice.

-Joe

Post by **jgiacone** » Sep 29, 2016 8:34 pm

Interesting note on this: two days ago, I changed the backup settings from "full machine" to "volumes" and just selected all of the volumes. It has backed up fine for 3 days now.

Post by **jgiacone** » Oct 07, 2016 6:29 pm

An update to this: I ran into something odd. I ran updates on this Ubuntu box last Friday, and the backups haven't failed since. I have a week's worth of successful backups. On PTide's advice, I had created two additional backup jobs on this machine, another full and another volume backup, so we could see log information from new jobs. After the updates, these jobs and the ones that had been failing all started running successfully. Here are the logs; maybe someone curious will be able to find a solution in them. At this time, I assume the updates fixed something that was broken, but I am not sure what it was. I keep this box updated, as I have been doing with all of the Ubuntu boxes that I am testing.

https://drive.google.com/folderview?id= ... sp=sharing

Thanks everyone, for your suggestions.

Thanks again,
Joe
