inayama
Influencer
Posts: 15
Liked: never
Joined: Dec 08, 2016 12:02 am
Full Name: Tadashi Inayama
Contact:

Backup Copy Files Going Offline on Dell DR4300

Post by inayama »

Hello,

I am having an issue where the files copied by Backup Copy Jobs periodically go offline (shown with red X's) on a CIFS share on our Dell DR4300 (not Rapid CIFS). I can bring the files back online by rescanning the Backup Repo, but I have to disable all Backup Copy Jobs before I rescan, or I get errors stating that the files are locked by a running session. The Backup Copy Jobs themselves run fine (writes seem to succeed), but if I were to run FLR or a recovery against the files in the Backup Copy Jobs, I would get a read error and would have to rescan manually.

The Support Tech had me change the Maximum Number of Concurrent Connections on the Backup Repo (for the DR4300) from 4 to 2, but the files still went offline.

We checked for dropped packets between the gateway server (the Veeam Backup server) and the DR4300, but did not find any in a 24-hour period.

I would also like to know if there is any way to automate the manual rescan via a script so I can run it as a scheduled task.

Support Ticket # 01984964

Any help or suggestion is appreciated.

Thank you,
Tadashi

PTide
Product Manager
Posts: 5574
Liked: 531 times
Joined: May 19, 2015 1:46 pm
Contact:

Post by PTide »

Hi,

What are your gateway server settings for that repository?

Thank you

Post by inayama »

Hello.

The gateway server for the Backup Repository is set as:

Following Server:
xxxxx (Backup Server)

It is the Veeam Backup Server.

Let me know if you need more info.

Thank you,
Tadashi

Post by inayama »

Hello.

I also tried reducing the Maximum Number of Concurrent Connections to 1 and disabled the "Align backup file data blocks" option. The "Decompress backup data blocks before storing" option is enabled, the "This repository is backed up by rotated hard drives" option is disabled, and the "Use per-VM backup files" option is enabled.

The files ended up going offline again.

Thank you,
Tadashi

Post by PTide »

Do you have any other machine that is closer to the share than the backup server? If yes then please try to configure it as a gateway and see if that resolves the issue.
inayama wrote: I would also like to know if there is any way to automate the manual rescan via a script so I can run it as a scheduled task.
Please check this PS cmdlet
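
The cmdlet that was linked here is not named in the thread. As a rough sketch only, assuming it is Rescan-VBREntity from the Veeam PowerShell snap-in (VeeamPSSnapin) and that the repository is looked up with Get-VBRBackupRepository, a scheduled task could call a small Python wrapper like the one below. The repository name and helper names are hypothetical.

```python
import subprocess


def build_rescan_script(repo_name):
    """Return PowerShell text that rescans one Veeam backup repository.

    Assumptions (not confirmed by this thread): the linked cmdlet is
    Rescan-VBREntity, and the VeeamPSSnapin snap-in is installed on the
    machine that runs the task.
    """
    # Double any single quotes so the name is safe inside a PowerShell
    # single-quoted string.
    safe_name = repo_name.replace("'", "''")
    return (
        "Add-PSSnapin VeeamPSSnapin; "
        f"$repo = Get-VBRBackupRepository -Name '{safe_name}'; "
        "Rescan-VBREntity -Entity $repo -Wait"
    )


def build_command(repo_name):
    """Return the argument list to hand to the Windows Task Scheduler or subprocess."""
    return [
        "powershell.exe",
        "-NoProfile",
        "-NonInteractive",
        "-Command",
        build_rescan_script(repo_name),
    ]


def rescan(repo_name):
    """Run the rescan and return PowerShell's exit code (Windows only)."""
    return subprocess.run(build_command(repo_name), check=False).returncode
```

Calling rescan("codd06") on the backup server would start the rescan. This sketch does not disable the Backup Copy Jobs first, which the original post says was necessary to avoid "locked by a running session" errors.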

Thank you.

Post by inayama »

Hello,

The Veeam Backup Server is on the same VLAN and IP subnet as the DR4300. Both the Veeam Backup Server and the DR4300 are connected via dual 10 Gbps links to the same FEX. Physically, the DR4300 sits on top of the Veeam Backup Server.

I will also check the site you forwarded.

Thank you,
Tadashi

Post by inayama »

Hello.

The Veeam Support Engineer suggested disabling parallel processing, so we did that, but the files for the Backup Copy Jobs went offline again.

Thank you,
Tadashi

foggy
Veeam Software
Posts: 19285
Liked: 1741 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Post by foggy »

Looks strange, indeed. Have you looked at the storage itself? Does it log any related events/messages at the time the backups go offline?

Post by inayama »

Hello.

I did check the logs on Dell DR4300, and I did not see any corresponding errors on the DR4300. I also opened a case with Dell (SR: 940346626) and had a conf call with Veeam and Dell, but could not complete the root cause analysis or come to a resolution. Dell has the complete logs from DR4300, but I'm still waiting for a complete analysis.

Another interesting bit is that the files are not locked from a Windows standpoint: I can rename the Veeam backup files via the share folders, but Veeam sees these files as offline. The "locking" is done only at the Veeam level, not at the Windows OS level.

Veeam support requested more logs, so I uploaded them.

Our Veeam Solutions Architect approved our plan to run the Backup Copy Jobs to a CIFS share presented by a Dell DR4300 (or any other NAS appliance), and we are currently backing up to Dell DR4200/DR4300 via NetBackup without any problems. This problem is stopping us from releasing/certifying Veeam for production, so we are looking for a quick resolution that avoids re-architecting the storage for Veeam.

Thank you,
Tadashi

Post by inayama »

Hello.

So I got a reply from Dell Support:

Tadashi,

I received an update on the log review. We are not seeing any disconnects caused by the DR. However, there are two possible registry edits that could help. Microsoft posted them as a way to alleviate dropped connections on the Windows side. They would have to be applied to every Windows server that is attaching to the DR. Those servers would have to be rebooted. Here are the KBs:

http://support.microsoft.com/kb/102067 and http://support.microsoft.com/en-us/kb/170359.

1) Set SessTimeout to 3600 seconds. This value can be found (or created if not present) under

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\
Value Name: Sesstimeout
Data Type: REG_DWORD - Number
Value: 3600 (Decimal)

2) Set TcpMaxDataRetransmissions to 64. This value can be found (or created if not present) under

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters
Value Name: TcpMaxDataRetransmissions
Data Type: REG_DWORD - Number
Value: 64 (Decimal)
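
For anyone who wants to apply the two values above to several servers, here is a minimal sketch that renders them as .reg file text. The helper name render_reg is hypothetical; the key paths, value names, and decimal values are copied from the message above. Note that .reg files store DWORDs in hexadecimal, so 3600 becomes 00000e10 and 64 becomes 00000040.

```python
def render_reg(entries):
    """Render (key_path, value_name, dword_value) tuples as .reg file text.

    DWORD values are given as plain integers and written as 8-digit hex,
    which is the form the .reg format expects.
    """
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key_path, value_name, value in entries:
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{value_name}: {value} does not fit in a DWORD")
        lines.append(f"[{key_path}]")
        lines.append(f'"{value_name}"=dword:{value:08x}')
        lines.append("")
    return "\n".join(lines)


# The two values suggested in the message above.
DELL_SUGGESTED = [
    (
        r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanWorkstation\Parameters",
        "Sesstimeout",
        3600,
    ),
    (
        r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters",
        "TcpMaxDataRetransmissions",
        64,
    ),
]

REG_TEXT = render_reg(DELL_SUGGESTED)
```

Writing REG_TEXT to a file with a .reg extension gives something that can be reviewed before importing; as the message says, each server would still need a reboot afterwards.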

I will try these out and see how it goes.

Thank you,
Tadashi

Post by inayama »

Hello,

The registry edits did not work; the files went offline again.

I also created two more Backup Repositories on CIFS shares from two additional Dell DRs (a DR4000 and a DR4100), and their files went offline as well.

Not sure what to try out next.

Any suggestions would be appreciated.

Thank you,
Tadashi

Post by inayama »

Hello,

From the logs:

[09.11.2016 03:27:18] Error Unable to investigate free space:
[09.11.2016 03:27:18] Error System.Exception: Heartbeat check failed for repo 'codd06'
[09.11.2016 03:27:18] Error at Veeam.Backup.ResourceScanner.CRepositoryScanProcessor.Execute()
[09.11.2016 03:27:18] Info [StorageDB] Marking storage \\CODD06\veeam\Backup Copy Job 3\qqcoqadwin511.vm-9682016-11-08T210000.vib id 67bcb217-d493-40f5-b010-85cfb86fb2b7 as RepositoryUnavailable (prev. value Available)
[09.11.2016 03:27:18] Info [StorageDB] Marking storage \\CODD06\veeam\Copy Prod Fri 9pm Daily Rvrs Incr\qqcoqadwin538.vm-12262016-11-08T000000.vib id 779ae21e-3b28-4d41-b8c6-588fbc1b5a80 as RepositoryUnavailable

So, keying in on why Veeam marks the files as offline:

Error Unable to investigate free space: [09.11.2016 03:27:18] Error System.Exception: Heartbeat check failed for repo 'codd06'

Does anyone have any insight into the "investigate free space" step? Is there a way to manually set the time or the interval for "investigate free space" so I can troubleshoot this step, or even to disable it so the files do not go offline?
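
In case it helps anyone correlate these events, here is a minimal sketch that scans log text like the excerpt above for heartbeat failures and for the files that get marked RepositoryUnavailable. The regular expressions are inferred only from the few lines quoted in this post, so treat them as assumptions about the log format rather than a documented schema; the function names are hypothetical.

```python
import re

# Entries start with a timestamp such as [09.11.2016 03:27:18]; splitting on
# it also handles several entries pasted onto a single line.
TIMESTAMP = re.compile(r"(?=\[\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}\])")
HEARTBEAT = re.compile(r"Heartbeat check failed for repo '([^']+)'")
MARKED = re.compile(
    r"Marking storage (.+?) id ([0-9a-f-]{36}) as RepositoryUnavailable"
)

SAMPLE = r"""
[09.11.2016 03:27:18] Error Unable to investigate free space:
[09.11.2016 03:27:18] Error System.Exception: Heartbeat check failed for repo 'codd06'
[09.11.2016 03:27:18] Error at Veeam.Backup.ResourceScanner.CRepositoryScanProcessor.Execute()
[09.11.2016 03:27:18] Info [StorageDB] Marking storage \\CODD06\veeam\Backup Copy Job 3\qqcoqadwin511.vm-9682016-11-08T210000.vib id 67bcb217-d493-40f5-b010-85cfb86fb2b7 as RepositoryUnavailable (prev. value Available)
[09.11.2016 03:27:18] Info [StorageDB] Marking storage \\CODD06\veeam\Copy Prod Fri 9pm Daily Rvrs Incr\qqcoqadwin538.vm-12262016-11-08T000000.vib id 779ae21e-3b28-4d41-b8c6-588fbc1b5a80 as RepositoryUnavailable
"""


def split_entries(text):
    """Split raw log text into individual timestamped entries."""
    return [entry.strip() for entry in TIMESTAMP.split(text) if entry.strip()]


def summarize(text):
    """Collect repos whose heartbeat failed and the files marked unavailable."""
    failed_repos = []
    offline_files = []
    for entry in split_entries(text):
        heartbeat = HEARTBEAT.search(entry)
        if heartbeat:
            failed_repos.append(heartbeat.group(1))
        marked = MARKED.search(entry)
        if marked:
            offline_files.append(marked.group(1))
    return {"failed_repos": failed_repos, "offline_files": offline_files}
```

Running summarize(SAMPLE) lists the repo name from the heartbeat error alongside the two .vib paths, which makes it easier to line up the timestamps with job activity on the DR.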

Thank you,
Tadashi

Post by foggy »

Tadashi, these kinds of questions are better addressed to the support engineer, so please keep the investigation going.

Post by inayama »

Hello.

So from Veeam and Dell Support, it looks like there is a compatibility issue:

Veeam, as part of the heartbeat process, queries the storage for free space ("investigate free space"). The Dell DR4XXX treats these queries as a "management process", which has lower priority than primary processes such as writing data to disk. So when Backup Copy Jobs are running, the DR4XXX answers the queries more slowly, sometimes too slowly for Veeam. Veeam then generates the "Unable to investigate free space....Heartbeat check failed for repo..." error and marks all the files in that Backup Repo as offline.

We asked Veeam Support if there was a way to increase the timeout for the "investigate free space" query. The engineer said that there is a registry key to change the frequency of the test, but he could not find a way to increase the timeout.

Does anyone know how to increase the timeout for the "investigate free space" query, or for the other Backup Repo heartbeat checks in general?

Thank you,
Tadashi

Post by foggy »

I'm not sure if there's a timeout specifically for this check. This should be an RPC call, so could you please verify whether you have the default RPC timeout value (RpcRequestTimeoutSec) redefined in the registry? The default is 3600 seconds, so it should be sufficient.

Post by inayama »

Hello.

I'm not sure where that registry key is in Win2k12 R2, but support did have me increase TcpMaxDataRetransmissions to 64.

They also had me change a registry key for heartbeat, so we'll see if that works.

Our initial POC with Veeam B&R 8.0 worked with the Dell DR4300, and we were able to re-confirm that it works with Veeam B&R 8.0. But we are already on vSphere 6.0 U2, which is not supported by Veeam B&R 8.0 U3. So something must have changed in the heartbeat process between 8 and 9.

Thank you,
Tadashi

Mike Resseler
Product Manager
Posts: 6146
Liked: 718 times
Joined: Feb 08, 2013 3:08 pm
Full Name: Mike Resseler
Location: Belgium
Contact:

Post by Mike Resseler »

Tadashi,
Thanks for coming back to the forum with the additional information. Please let us know the result of those changes.

Again, thanks!
Mike

Post by inayama »

Hello,

The registry hack worked; the files are no longer going offline.

[HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication]
"DoNotCreateHeartbeatFile"=dword:00000001

The support engineer said that there is no KB article regarding this registry key.

Here's what he said: "Regarding the details of the change: the registry key changes how the heartbeat process is performed. With the key set, it doesn't create a temporary file on the repository to verify that it's writable. You can work with this key set; no issues have been reported with it. There is no KB on this."

Does anyone else have any information regarding this key?

Was there some change to the heartbeat process between Veeam B&R 8.0 and 9.x?

Thank you,
Tadashi

Post by Mike Resseler »

Hi Tadashi,

First, I am glad that it worked and that Veeam and Dell together could find a solution. Interesting case though.

Unfortunately, I am not aware of any changes to the heartbeat process between 8 and 9.x, nor do I know what this key specifically does. I hope that someone else knows more about it and can answer your question.

Please keep us informed if issues arise again.

Thanks
Mike

davidwatts71
Enthusiast
Posts: 25
Liked: 6 times
Joined: Oct 30, 2017 8:05 am
Full Name: David Alexander Watts
Contact:

Post by davidwatts71 »

Had the same problem with a Quest DR6300. The reg key also fixed this problem.

Thanks David
