On one AHV cluster we have over 20 nodes and over 700 VMs, backed up using Veeam AHV.
First of all, with Proxy v4 the backups run much better and faster, good job, when they run at all.
All VMs are split across 5 jobs. A single job can contain over 200 objects.
Whenever more than one job is running at the same time, they fail with "Task failed unexpectedly".
The cluster has no errors, the network is fine, and sometimes the jobs finish successfully...
The AHV proxy also has more resources than it needs.
As one of the last steps, I disabled the job scheduler and started each job one by one. I did this over several days, and it worked perfectly.
Now I have written a small Python script that triggers the next job if none is running.
Code:
#!/usr/bin/env python3
import argparse
import logging
import requests
from datetime import datetime
import sys
from urllib3.exceptions import InsecureRequestWarning
# Suppress only the single warning from urllib3 needed.
requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)
parser = argparse.ArgumentParser(description='')
parser.add_argument('--host', required=True)
parser.add_argument('--username', default='veeam')
parser.add_argument('--password', default='veeam')
parser.add_argument('--log', default="info")
ARGS = parser.parse_args()
AHV_HOST=ARGS.host
AHV_USER=ARGS.username
AHV_PASSWORD=ARGS.password
log = logging.getLogger(__name__)
log.addHandler(logging.StreamHandler())
log.setLevel(ARGS.log.upper())
# Request an access token from the AHV proxy.
payload = {
    'grantType': 'password',
    'userName': AHV_USER,
    'password': AHV_PASSWORD,
    'longLivedRefreshToken': False
}
res = requests.post(f'https://{AHV_HOST}/api/oauth2/token', data=payload)
if res.status_code != 200:
    raise Exception(f'Login failed: HTTP {res.status_code}')
log.info('Login successful')
TOKEN = res.json()['accessToken']
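# Reuse one session that sends the Bearer token with every request.
session = requests.Session()
# The dummy session auth plus auth={} on each call keeps requests from
# applying any other credentials, so the Authorization header stays as set.
session.auth = ("", "")
session.headers['Content-Type'] = 'application/json'
session.headers['Accept'] = 'application/json'
session.headers['Authorization'] = f'Bearer {TOKEN}'
log.debug(session.headers)
# Fetch all backup jobs.
res = session.get(f'https://{AHV_HOST}/api/v4/jobs', auth={})
if res.status_code != 200:
    raise Exception(f'Listing jobs failed: HTTP {res.status_code}')
jobs = res.json()['results']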
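def sort_date(obj):
    """Sort key: the job's last run time as a Unix timestamp."""
    # Drop the fractional seconds before parsing.
    date = obj['lastRunUtc'].split('.')[0]
    timestamp = datetime.strptime(date, '%Y-%m-%dT%H:%M:%S').timestamp()
    log.debug(timestamp)
    return timestamp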
# Do nothing while any job is still running.
running_jobs = [job for job in jobs if job['status'] == 'Running']
if running_jobs:
    log.warning('A job is running - "%s"', running_jobs[0]["name"])
    sys.exit(0)
# Start the job whose last run is the oldest.
next_job = min(jobs, key=sort_date)
log.info('Start next job "%s"', next_job["name"])
res = session.post(f'https://{AHV_HOST}/api/v4/jobs/{next_job["id"]}/start', auth={})
if res.status_code != 202:
    raise Exception(f'Starting job failed: HTTP {res.status_code}')
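The selection rule in the script (do nothing while a job is running, otherwise take the job with the oldest last run) could also be kept apart from the HTTP calls, which makes it easy to try out without a proxy. This is only a sketch: the names parse_last_run and pick_next_job are made up for illustration, and it assumes that a job which has never run comes back with an empty or missing lastRunUtc, which I have not verified against the API.

```python
from datetime import datetime, timezone


def parse_last_run(job):
    """Return the job's last run as a UTC datetime.

    Assumption: a job that never ran has no (or an empty) 'lastRunUtc';
    such jobs get the earliest possible date so they are picked first.
    """
    raw = job.get('lastRunUtc')
    if not raw:
        return datetime.min.replace(tzinfo=timezone.utc)
    # Drop fractional seconds (as the script does) and a trailing 'Z' if present.
    stamp = raw.split('.')[0].rstrip('Z')
    parsed = datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%S')
    return parsed.replace(tzinfo=timezone.utc)


def pick_next_job(jobs):
    """Return the job to start next, or None if nothing should be started."""
    if not jobs:
        return None
    if any(job.get('status') == 'Running' for job in jobs):
        return None
    return min(jobs, key=parse_last_run)
```

In the script this would replace the running_jobs check and the sorted(...) call: if pick_next_job(jobs) returns None, exit, otherwise post to the start URL of the returned job.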
Code:
*/15 * * * * python3 /home/veeam/run-next-job.py --host <hostname> --username <user> --password <password>
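One thing to keep in mind with this crontab line is that the password is visible in the crontab and in the process list. A possible variation, purely as a sketch, is to let the script fall back to an environment variable when --password is not given. The helper name resolve_password and the variable name VEEAM_AHV_PASSWORD are made up for this example.

```python
import os


def resolve_password(cli_value, environ=os.environ, var='VEEAM_AHV_PASSWORD'):
    """Prefer the --password value; otherwise read the (hypothetical) env variable."""
    if cli_value:
        return cli_value
    value = environ.get(var)
    if not value:
        raise SystemExit(f'No password: pass --password or set {var}')
    return value
```

To use it, the --password argument would get default=None and the script would set AHV_PASSWORD = resolve_password(ARGS.password).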