mkaec
Veteran
Posts: 465
Liked: 136 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Why Is It An Error to Interrupt an Idle BCJ?

Post by mkaec »

I have the VBR server scheduled to install Windows Updates on Friday afternoon, as this is the least likely time for a backup to be running. When it reboots after the update, I get emails about the BCJs failing, and they show in the history with an error of "Job has failed unexpectedly". The last time this happened, I checked and found the BCJs were actually "idle" at the time of the reboot. According to the log, they started at 4:00 AM, ran for 20-30 minutes and sat idle for the rest of the day until "failing" while "waiting for the new copy interval".

I went in and added exclusions for when the server is likely to reboot. Additionally, if I want to manually reboot the server at another time for some reason, I need to disable all the BCJs first and then remember to turn them back on after the reboot.

I would prefer if BCJs weren't so high maintenance. To me, it is not a failure if a BCJ is interrupted while it is idle.
Shestakov
Veteran
Posts: 7328
Liked: 781 times
Joined: May 21, 2014 11:03 am
Full Name: Nikita Shestakov
Location: Prague
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by Shestakov »

Hi Marc,
A VBR reboot should not affect the backup copy job in your case if you reboot the server while the job is in the "idle" state.
How often do you reboot the server? Do you observe the described behavior each time you reboot?
Thanks!
mkaec
Veteran
Posts: 465
Liked: 136 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by mkaec »

The server reboots once a month when Windows updates are applied. Then sometimes it will be manually rebooted, like if an SQL update is applied.

I just tested now and I did not get an email, but the BCJs are now showing in the Last 24 Hours \ Failed node. They had a status of "Waiting for the new copy interval" prior to the reboot. It's a cosmetic thing, but admins tend not to like red Xs. :)
JaxIsland7575
Veteran
Posts: 391
Liked: 107 times
Joined: Apr 27, 2015 1:59 pm
Full Name: Ryan Jacksland
Location: NY, USA
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by JaxIsland7575 »

I am not sure if this helps, but I can confirm what Marc is seeing on v8. I have yet to determine what situation causes an email to be sent, but I have BCJs running at 1400. I manually disabled the jobs at 0700 and they all failed, and some sent an email. Other times on the same server I have just rebooted it without disabling the jobs and I get the same failures.

I don't understand why it's considered a failure. The job is set to run at 1400 and by 1410 the jobs are done, so if I reboot the server in the next 23 hours 49 minutes it should still count as completed successfully, because it copied over the latest data.

Didn't mean to hijack this, but I thought I would share that I see the same thing and also do not understand why it's labeled a failure.

Cheers!
VMCE v9
Shestakov
Veteran
Posts: 7328
Liked: 781 times
Joined: May 21, 2014 11:03 am
Full Name: Nikita Shestakov
Location: Prague
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by Shestakov » 1 person likes this post

I believe the logic is as follows: a backup copy job runs continuously, so any interruption is considered a failure.
However, I see your point, and we will discuss the warning behaviour with the R&D team if we have more similar requests.

You may also leverage Veeam ONE monitoring capabilities to be notified of VM reboots, VBR server availability, etc.
Peejay62
Expert
Posts: 235
Liked: 37 times
Joined: Aug 06, 2013 10:40 am
Full Name: Peter Jansen
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by Peejay62 »

JaxIsland7575 wrote:
Didn't mean to hijack this, but I thought I would share that I see the same thing and also do not understand why it's labeled a failure.
I am in on this one. I also experience this (IMHO annoying) behavior. We have a scheduled reboot of the Veeam servers once a week. I added a disabled time slot covering the reboot for the continuous copy jobs, but I still receive a lot of errors for the "interrupted" copy jobs. In my daily reporting from EM everything always looks very smooth: some warnings, occasionally an error and of course tons of successes ;-) but on the day after the reboot there are a lot of errors (failed copy jobs). So if any improvement becomes available, I'd gladly make use of it.

thanks, Peter
Shestakov
Veteran
Posts: 7328
Liked: 781 times
Joined: May 21, 2014 11:03 am
Full Name: Nikita Shestakov
Location: Prague
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by Shestakov »

Thanks for the feedback Peter!
We will think about that.
mkaec
Veteran
Posts: 465
Liked: 136 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by mkaec »

And then errors get reported in the "Last 24 Hours" digest email, which prompts a visit to the console to see what failed.
mkaec
Veteran
Posts: 465
Liked: 136 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by mkaec »

It looks like this issue still occurs in 9.0 update 2.
Shestakov
Veteran
Posts: 7328
Liked: 781 times
Joined: May 21, 2014 11:03 am
Full Name: Nikita Shestakov
Location: Prague
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by Shestakov »

Hello Marc,
That's expected. The issue is not critical, so the fix was not targeted for the minor update.
Thanks
mkaec
Veteran
Posts: 465
Liked: 136 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by mkaec »

It looks like the issue is also still present in 9.5.
Shestakov
Veteran
Posts: 7328
Liked: 781 times
Joined: May 21, 2014 11:03 am
Full Name: Nikita Shestakov
Location: Prague
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by Shestakov »

Hello Marc,
Indeed, the change didn't make it into the v9.5 release because of its low priority.
mkaec
Veteran
Posts: 465
Liked: 136 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by mkaec »

Hmm... Didn't get fixed in a point release. Didn't get fixed in a major release. I believe all the options have been eliminated. :)

On a serious note, great job on the 9.5 release! I think the ReFS 3.1 functionality is a huge game changer.

Gostev noted that the 9.5 release has been super smooth. Maybe that'll open up a window to work on some of the small boring stuff.
Shestakov
Veteran
Posts: 7328
Liked: 781 times
Joined: May 21, 2014 11:03 am
Full Name: Nikita Shestakov
Location: Prague
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by Shestakov » 1 person likes this post

For Marc and others following the thread,
the issue is to be solved in the upcoming update.
mkaec
Veteran
Posts: 465
Liked: 136 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by mkaec »

Awesome! Thanks.
mkaec
Veteran
Posts: 465
Liked: 136 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by mkaec »

I think this just happened to me today in 9.5.0.1536. Windows Updates installed and the Veeam server rebooted. I logged in, started up Veeam, and my BCJs are listed in the Last 24 Hours \ Failed group. When I open them up, all VMs have a success status and one log contains:

Waiting for the new copy interval 11:33:56
Job has failed unexpectedly

The other log contains:

Waiting for the new copy interval 11:43:21
Job has been stopped with failures.

The jobs were idle for over 11 hours before the server rebooted.
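
For anyone who wants to tell this case apart from a real failure when collecting data for support, here is a minimal Python sketch that checks whether a failure message directly follows the idle message in a job log. The message strings are the ones quoted above. Everything else is an assumption made for illustration: the function names are hypothetical, and the trailing HH:MM:SS value is treated as a duration, which is only a guess based on the "idle for over 11 hours" observation. This is not a Veeam tool or API.

```python
import re

# Message strings are taken from the log excerpts quoted in this thread.
# The "message HH:MM:SS" line layout is an assumption, not a documented format.
IDLE_MARKER = "Waiting for the new copy interval"
FAILURE_MARKERS = (
    "Job has failed unexpectedly",
    "Job has been stopped with failures",
)
DURATION_RE = re.compile(r"(\d+):(\d{2}):(\d{2})\s*$")


def idle_seconds(line):
    """Return the trailing HH:MM:SS value of a line in seconds, or None."""
    match = DURATION_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def interrupted_while_idle(log_lines):
    """True if a failure message directly follows the idle marker."""
    lines = [line.strip() for line in log_lines if line.strip()]
    for previous, current in zip(lines, lines[1:]):
        if previous.startswith(IDLE_MARKER) and current.startswith(FAILURE_MARKERS):
            return True
    return False
```

Feeding it the two lines from the first log above would flag the job as interrupted while idle, which is the pattern worth mentioning in a support case.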
foggy
Veeam Software
Posts: 21138
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by foggy »

Hi Marc, according to our records, the fix for this issue was included in Veeam B&R 9.5 U2. What Veeam B&R should do in case of a server reboot is wait for the jobs to stop correctly prior to stopping the service, which ensures the proper job status is shown. Since you're running U3 and still see the opposite behavior, I recommend contacting support for confirmation. Either the job wasn't in Idle status at the moment of the reboot (it was waiting for resources to start processing, for example), or some issue prevented the job from being stopped within 10 minutes and it was terminated. Either way, this should be investigated more closely.
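
The behavior described in this post (ask the jobs to stop, wait up to 10 minutes, terminate whatever is left) can be illustrated with a small Python sketch. This is only a model of the general "graceful stop with a deadline" pattern, not Veeam's actual implementation. The job objects and their `request_stop()` and `is_stopped()` methods are hypothetical, as are the function and parameter names; only the 10-minute limit comes from the post.

```python
import time

STOP_TIMEOUT_SECONDS = 10 * 60  # the 10-minute limit mentioned in the post


def stop_jobs_before_shutdown(jobs, timeout=STOP_TIMEOUT_SECONDS,
                              poll_interval=1.0,
                              clock=time.monotonic, sleep=time.sleep):
    """Ask every job to stop, wait up to `timeout` seconds, report the outcome.

    `jobs` maps a job name to an object with hypothetical `request_stop()`
    and `is_stopped()` methods. Jobs still running at the deadline are
    reported as "terminated", which is where a failed status would come from.
    """
    for job in jobs.values():
        job.request_stop()

    deadline = clock() + timeout
    pending = set(jobs)
    while pending and clock() < deadline:
        pending = {name for name in pending if not jobs[name].is_stopped()}
        if pending:
            sleep(poll_interval)

    # Final check so a job that stopped during the last sleep is not penalized.
    pending = {name for name in pending if not jobs[name].is_stopped()}
    return {
        name: "terminated" if name in pending else "stopped cleanly"
        for name in jobs
    }
```

In this model, an idle job that answers the stop request promptly ends up as "stopped cleanly", while a job that cannot stop in time is "terminated", matching the two explanations offered above.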
mkaec
Veteran
Posts: 465
Liked: 136 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by mkaec »

I rebooted the backup server now and the BCJs did not get marked as failed. So, it's not an every-time occurrence. If it happens again, I'll try to gather data for a support case.
foggy
Veeam Software
Posts: 21138
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Why Is It An Error to Interrupt an Idle BCJ?

Post by foggy »

Thanks for sharing, this confirms the assumptions from my post. Feel free to contact support in case the behavior shows up again.