• At HKLM\Software\Wow6432Node\Veeam\Backup and Replication, create DWORD value SkipTapeAlerts = 1
• Extend the polling interval at HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Veeam\Veeam Backup and Replication\TapeDeviceWatcherTimeoutSec
• DWORD – polling interval in seconds (set to decimal 300 = five minutes)
Since then, it has been working fine. Like you though, my jobs would fail randomly whilst running.
I just tried Lightsout changes and unfortunately it still failed.
Then I tried Velimira changes and it failed as well.
When it does, I still can see the following error in Windows Event Viewer: The device, \Device\Tape0, is not ready for access yet.
I should have a new tape controller tomorrow for me to try. Unfortunately it was not possible to add it directly on the second bank (needs a second CPU apparently), so will have to remove a 10Gb NIC to make room for it
I will let you know how it goes.
Once again, thanks for all the contributors. It is great to see such a caring community.
Well, last night, as I removed our 10Gb Nic card to leave space for an extra tape controller, I rebooted the server and miracle, this morning I found that the tape copy worked for the first time!
I performed another tape copy from the other job that failed last week-end and it worked too!
I do not know what the fix was, either the HP WBEM either the registry change, but it looks like a reboot was necessary to activate it properly.
I am so grateful with this community to have solve this issue with its help. Thanks all.
If I may hijack this thread a little more: for those that use LTO6, what sort of processing rate do you get?
I was expecting more compared with our LTO5 using HP Dataprotector. However I noticed that HP DP uses more agents to read the disks, so more data are written to tape at the same time. Here that might might the bottleneck as there is only 1 file to write and the tape must be waiting for more data.
Saying that Veeam thinks the bottleneck is the target (tape drive) at 99%, so not too sure.
I've got an HP MSL2024 Tape Library connected via 8gb Fibre and I get 'Processing Rate' anywhere from between 150 MB/s to 80 MB/s..
Not too sure why its so inconsistent !! Seems like in the middle of the job, the 'throughput' remains almost 0 for many minutes before intermittently kicking in again !
20100 wrote:If I may hijack this thread a little more: for those that use LTO6, what sort of processing rate do you get?
I was expecting more compared with our LTO5 using HP Dataprotector. However I noticed that HP DP uses more agents to read the disks, so more data are written to tape at the same time. Here that might might the bottleneck as there is only 1 file to write and the tape must be waiting for more data.
Saying that Veeam thinks the bottleneck is the target (tape drive) at 99%, so not too sure.
Since Veeam data is already compressed and there is just a single big file it should be pretty easy for a single agent to stream at the full tape speed, which for LTO-6 is around 160MB/s. I would expect real world speeds in the 130-160MB/s range assuming everything else is optimal. Some people are used to seeing speeds of 300MB/s in other software, but that's writing data that is not already compressed. Since an LTO-6 tape device can do 2.5:1 compression in hardware it is capable of receiving uncompressed data at around 400MB/s (160MB/s x 2.5 compression = 400MB/s), but Veeam backups are generally already compressed at similar levels and thus you'll be much closer to the actual native speed instead.
jazzoberoi wrote:I'm doing a backup as i type this and again.. the throughput has dropped to 0kb for the last half hour !!
Can someone please shed some light on this ? What logs do i check to see whats going on ?
I don't know that this is the type of problem that can be resolved via the forum as it's pretty difficult to understand what's going on. Are there other backups running during this time as well, or just the backup to tape? I certainly wouldn't expect that behavior if there's nothing else going on.
Thanks for reply. Yes this is the type of speed I am getting, so it looks we are OK.
Obviously our previous backup solution does backup non compressed data which explain the speed difference.
It has also been confirmed with the amount of data on tape, which you already explained to me in a different thread.
tsightler wrote:
I don't know that this is the type of problem that can be resolved via the forum as it's pretty difficult to understand what's going on. Are there other backups running during this time as well, or just the backup to tape? I certainly wouldn't expect that behavior if there's nothing else going on.
No other backup is running. Checked the Task Manager and everything is normal, CPU is below 20% and plenty of RAM available..
jazzoberoi wrote:Hi, opened up a case # 00829368. Lets see what comes out of it !
I will follow the case as well as I'm definitely interested in what is causing this, it's certainly not something I would expect. In the vendor testing that I've done we've generally seen quite consistent performance for large files. The only time I ever see anything like what your posting is when backing up directories that have lots of smaller files but that doesn't seem to be the case in your screenshot. It will be very interesting to see what support has to say.
Hi, Jazz
Thanks for testing and provided the logs.
The first job it takes about 3 hours, the second job take about two and half hours , the difference between these two jobs are half hours.
The bottleneck is showing below
First job: Load: Source 8% > Proxy 13% > Network 11% > Target 99%
Second job: Load: Source 13% > Proxy 21% > Network 14% > Target 98%
Both bottlenecks are target, which is your process rate will depend on the target, 1% writting speed difference is acceptable.
You can also see where the proxy and network are on different load each time, where you have same backup size.
Seriously, !?
I know the difference between 2.5 and 3hrs is 'half hour'.. and i also know that the difference between 98% and 99% is 1%. I also know from the processing stats that its showing the Tape to be the bottleneck.
I did not lodge a support case and upload all the logs to only be told the obvious !?
What i want to know is why is there extended periods of absolute no activity ? Why is the data processing rate so low when everything is connected to Fibre and that Tape Drive is capable of handling speeds upto 160MB/s with close to 20 brand new LTO6 tapes. And why am i sometimes able to complete the job in 1.5 hours and sometimes it takes more than 3 hrs !?
Tape Drive is capable of handling speeds upto 160MB/s with close to 20 brand new LTO6 tapes. And why am i sometimes able to complete the job in 1.5 hours and sometimes it takes more than 3 hrs !?
We are aware of this problem and doing our best to get the performance improved. Specifically, we want to make the throughput more stable, and eliminate this “spiky’ issue in the next major releases. By the way, thank you for the logs - it helps a lot in our improvement investigations.
Regarding the support case handling, use ‘talk to the manager’ option in the customer portal anytime, whenever you feel it's needed.
Its also good to know that its a 'known issue' and is under investigation. Otherwise the worst part of lodging a support case in most cases is proving that there is indeed an issue, atleast now half the battle is won
Please let me know if you need me to upload any more logs to help you with your investigations... any indication on progress or eta will be highly appreciated !
Regarding the ‘talk to the manager’ option, i wasn't aware of such a function, but i wont need to use it anymore
Just received some new communication from support regarding my case.. Just wanted to know whether or not I'm still dealing with the same support person and should do as he says or wait for instructions from either Tom or d.popov ..
Hi Jazz,
Yes, please, keep working with the support – I’ve forwarded you case to QA and DEV teams: as I’ve posted before this seems like an issue we need to fix in the product. I’ll follow up on Monday once I get any response. Thanks!
After following the KB1887 - http://www.veeam.com/kb1887 and creating the DWORD "TapeDeviceWatcherTimeoutSec" with a value "300" , my problem was fixed.
Based on your case, I think we need to review the default timeout we are using now. However, its not going to be changed in the upcoming VBR update – more likely in the major release, just to avoid any collateral damage such correction may bring.