Host-based backup of Nutanix AHV VMs.
arogarth
Service Provider
Posts: 81
Liked: 18 times
Joined: Sep 25, 2017 7:15 am
Location: Frankfurt/M., Germany
Contact:

Backup Problems with AHV - Task failed unexpectedly

Post by arogarth »

Hello everyone,

on one AHV cluster we have over 20 nodes and over 700 VMs, backed up using Veeam AHV.
First of all, with Proxy v4 the backups run much better and faster, good job, when they run ;)

All VMs are split across 5 jobs in total. A job can contain over 200 objects.

Whenever more than one job is running at the same time, they fail with "Task failed unexpectedly".
The cluster reports no errors, the network is fine, and sometimes the jobs do finish successfully...

The AHV proxy also has more resources than it needs.

As one of the last steps, I disabled the job scheduler and started each job one by one. I did this over several days, and it worked perfectly.
Now I have written a small Python script that triggers the next job if none is running.

Code: Select all

#!/usr/bin/env python3

import argparse
import logging
import requests
from datetime import datetime
import sys
from urllib3.exceptions import InsecureRequestWarning

# Suppress only the single warning from urllib3 needed.
requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)

parser = argparse.ArgumentParser(description='')
parser.add_argument('--host', required=True)
parser.add_argument('--username', default='veeam')
parser.add_argument('--password', default='veeam')
parser.add_argument('--log', default="info")
ARGS = parser.parse_args()

AHV_HOST=ARGS.host
AHV_USER=ARGS.username
AHV_PASSWORD=ARGS.password

log = logging.getLogger(__name__)
log.addHandler(logging.StreamHandler())
log.setLevel(ARGS.log.upper())

payload = {
  'grantType': 'password',
  'userName': AHV_USER,
  'password': AHV_PASSWORD,
  'longLivedRefreshToken': False
}
res = requests.post(f'https://{AHV_HOST}/api/oauth2/token', data=payload)

if res.status_code != 200:
    raise Exception(res)

log.info('Login successful')

TOKEN = res.json()['accessToken']
session = requests.Session()
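# Non-empty placeholder credentials stop requests from falling back to ~/.netrc;
# the per-request auth={} below then keeps basic auth from replacing the bearer token.
session.auth = ("", "")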
session.headers['Content-Type'] = 'application/json'
session.headers['Accept'] = 'application/json'
session.headers['Authorization'] = f'Bearer {TOKEN}'
log.debug(session.headers)

res = session.get(f'https://{AHV_HOST}/api/v4/jobs', auth={})
if res.status_code != 200:
    raise Exception(res)
jobs = res.json()
jobs = jobs['results']

def sort_date(obj):
    date = obj['lastRunUtc'].split('.')[0]
    timestamp = datetime.strptime(date, '%Y-%m-%dT%H:%M:%S').timestamp()
    log.debug(timestamp)
    return timestamp

running_jobs = list(filter( lambda x: x['status'] == 'Running', jobs) )
if len(running_jobs) > 0:
    log.warning('A Job is running - "%s"', running_jobs[0]["name"])
    sys.exit(0)

next_job = sorted(jobs, key=sort_date)[0]

log.info('Start next job "%s"', next_job["name"])
res = session.post(f'https://{AHV_HOST}/api/v4/jobs/{next_job["id"]}/start', auth={})
if res.status_code != 202:
    raise Exception(res)

Save the script on the proxy as "/home/veeam/run-next-job.py" and schedule it with cron using "crontab -e":

Code: Select all

*/15 *  *   *   *     python3 /home/veeam/run-next-job.py --host <hostname> --username <user> --password <password>

And finally: yes, I have a ticket open with support (#05949961) to find the cause of the error ;)
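
The selection step of the script (do nothing while a job is running, otherwise start the job with the oldest "lastRunUtc") can also be written as a small function without any network calls, which makes it easier to test. This is only a rough sketch: the names `parse_utc` and `pick_next_job` are made up for illustration, and I am assuming that a job which has never run may come back with an empty or missing "lastRunUtc", which I have not verified against the API.

```python
from datetime import datetime, timezone


def parse_utc(value):
    """Parse a timestamp such as '2023-03-01T10:15:30.123Z'.

    Assumption: a never-run job may have no lastRunUtc; such jobs sort first.
    """
    if not value:
        return datetime.min.replace(tzinfo=timezone.utc)
    # Drop fractional seconds (as the original script does) and a trailing 'Z'.
    base = value.split('.')[0].rstrip('Z')
    parsed = datetime.strptime(base, '%Y-%m-%dT%H:%M:%S')
    return parsed.replace(tzinfo=timezone.utc)


def pick_next_job(jobs):
    """Return the job to start next, or None if a job is running or none exist."""
    if any(job.get('status') == 'Running' for job in jobs):
        return None
    if not jobs:
        return None
    # The job whose last run is the oldest goes first.
    return min(jobs, key=lambda job: parse_utc(job.get('lastRunUtc')))
```

In the script, `pick_next_job(jobs)` would stand in for the `running_jobs` check and the `sorted(...)[0]` line; a return value of `None` means there is nothing to start in this cron run.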
Socials: https://arogarth.net
HannesK
Product Manager
Posts: 14322
Liked: 2890 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: Backup Problems with AHV - Task failed unexpectedly

Post by HannesK »

Hello,
oh, that case has already been open for quite a long time :-(

I will talk to support to make sure we get an answer here on why the tasks fail.

Best regards,
Hannes
PS: we will soon release the V5 beta with a distributed proxy architecture
arogarth
Service Provider
Posts: 81
Liked: 18 times
Joined: Sep 25, 2017 7:15 am
Location: Frankfurt/M., Germany
Contact:

Re: Backup Problems with AHV - Task failed unexpectedly

Post by arogarth » 1 person likes this post

Hey Hannes, that sounds very good :)

The long-running ticket is partly my fault too: I have to find time for every new test, and the error seems to occur randomly...

Support is great, and the next steps are scheduled for Monday.

Regards,
Martin
Socials: https://arogarth.net