I'm running a virtualized server environment across 4 locations in China. Backups are stored locally and replicated to HQ in China. At HQ I also have a backup copy job to a second location on the campus.
So far so good: everything went smoothly, with no problems doing backups, replication or restores. The auditors are happy too.
Last Wednesday a colleague accidentally rebooted the central file server, which holds 6 TB of shared drives, user profiles and home directories. Normally that is no big issue, but this time the server came back up with the DATA disk recognized as RAW, and no repair was possible from Server Manager.
Then the nightmare started. I ran chkdsk /r, and chkdsk.exe began scanning the 6 TB. Stages 1 to 3 were quite fast, about 3 h in total, and then stage 4 started with an ETA of 900 h.
We opened a Priority 1 ticket with Microsoft, and the feedback was: let it run, otherwise your data is gone...
But I was still confident, because I had a backup and a copy of that backup, and I had received zero backup errors over the last few weeks. So I opened the Veeam file explorer, and I was shocked: I could not see the D:\ drive... I opened the previous day's restore point, then the one before that, and the one before that... nothing, no D:\ drive visible.
Only after going back 10 days did I find the last fully consistent file-level backup... 10 days of work LOST? So I waited another night and half a day and worked out that if chkdsk.exe continued checking the disk at 30 MB/s, it would finish 2.5 days later... That was not acceptable. I started an Instant VM Recovery of the VM and hit the same issue. Then I booted the server with a third-party tool and found that the data was still there, so I saw a little light in the dark.
After that I decided, against Microsoft's recommendation, to stop chkdsk, and surprise: no data was readable... As a last resort I ran chkdsk.exe /f, and after 3 h of checking at the file level only, plus some repairs, the data was recovered. I immediately started a backup...
We then investigated the issue. On the day of the last successful backup we found an Event ID 55: NTFS Master File Table error.
We also found an Event ID 98 telling us that the disk was healthy! After the accidental reboot, error 55 showed up again and we had a RAW disk.
So the question is: how can Microsoft not take the disk offline immediately, or at least mark it as dirty, and why did Veeam B&R not detect that something was suspicious? From one day to the next there is no D:\ drive anymore in the backup, yet it contains the same amount of data...
Of course we will push Microsoft, but I also expect my backup software to offer a way to check whether a single-file restore is possible from every disk...
Greetings from Shanghai