RESTful knowledge exchange
rdock
Novice
Posts: 4
Liked: never
Joined: Apr 14, 2021 5:03 pm
Full Name: Rob Dock

Question Regarding backupFiles DataSize and BackupSize fields

Post by rdock »

Hello!

I'm looking to automate billing reports by using the API to grab backup utilization from the Enterprise Manager server. I'd like to use the API so the process doesn't break when a new customer is added or a backup server is added or decommissioned.

Our setup involves separate jobs tied to each customer across several backup servers. My aim has been (a rough code sketch follows this list):
  • Enumerate the backup servers using

    Code: Select all

    /api/query?type=BackupServer
  • Pull out the repositories using

    Code: Select all

    /api/backupServers/{id}/repositories
  • Get the BackupReferenceList for each repository using

    Code: Select all

    /api/repositories/{id}/backups
  • List out the BackupFiles for the list using

    Code: Select all

    /api/backups/{id}/backupFiles?format=Entity
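To make the flow concrete, here's roughly how my script chains these calls and sums DataSize per backup. This is only a sketch: get() is a hypothetical helper that sends the session cookie, requests JSON via the Accept header, and returns the parsed body, and I'm assuming each reference's Href can be suffixed with the child resource path and that the entity list comes back under 'BackupFiles'.

Code: Select all

// Sketch only: get(url) is a hypothetical async helper that GETs the url
// with the X-RestSvcSessionId cookie and 'Accept: application/json' set,
// returning the parsed JSON body.
async function sumDataSizePerBackup(get) {
    const servers = await get('/api/query?type=BackupServer');
    for (const srv of servers.Refs) {
        // Assuming each reference Href can be suffixed with the child path.
        const repos = await get(`${srv.Href}/repositories`);
        for (const repo of repos.Refs) {
            const backups = await get(`${repo.Href}/backups`);
            for (const backup of backups.Refs) {
                const files = await get(`${backup.Href}/backupFiles?format=Entity`);
                // Assuming the entity list is returned under 'BackupFiles'.
                const dataSize = files.BackupFiles
                    .reduce((sum, f) => sum + f.DataSize, 0);
                console.log(backup.Name, dataSize);
            }
        }
    }
}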
Here's where I get stuck: I can see the BackupSize, DataSize, DeduplicationRatio, and CompressRatio on each job, but I'm unsure how they all fit together. In my testing, the sum of all the DataSize fields across the BackupFiles comes closest to what the Enterprise Manager web UI lists in the "Total Size" column under Reports > {backupServer} for each job. This number is how we're currently billing.

The problem is that it's not quite the same. It's close, but I have to assume some rounding is happening, as the sum of the DataSize fields from the API is just a bit more than what the web UI reports for "Total Size". Or perhaps I'm not using the correct values?

Another question concerns compression: if I apply the CompressRatio percentage to the DataSize [DataSize × (1/CompressRatio)], I'm still a little off/under what "Total Size" reports in the web UI. Is this how compression is calculated by the Enterprise Manager server (or by the individual backup server, if that's where the data comes from)?

To sum up:
  • What do the BackupSize and DataSize numbers actually represent?
  • How do I factor in the CompressRatio (and the DeduplicationRatio, if necessary) to get the Total Size of a selected job?
  • Is there an easier way to get the Total Size of a selected job using the API?
Thank you in advance!
oleg.feoktistov
Veeam Software
Posts: 1912
Liked: 635 times
Joined: Sep 25, 2019 10:32 am
Full Name: Oleg Feoktistov

Re: Question Regarding backupFiles DataSize and BackupSize fields

Post by oleg.feoktistov » 1 person likes this post

Hi Rob,

- BackupSize is the actual size of each backup file kept on the backup repository upon backup session completion.
- DataSize is the amount of data read from the source VM on each task session run.
- The TotalSize value you see under Reports > Server > Jobs View is actually a sum of the VM sizes determined on each task session run. These values are reflected in the TotalSize property of the corresponding task session model retrieved from the /backupTaskSessions/{id} endpoint.
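For instance, checking a single task session looks like this (a minimal sketch with the request module; {id} and the session id are placeholders to substitute):

Code: Select all

const request = require('request');

// Substitute your EM host, a real task session id and a valid
// X-RestSvcSessionId value obtained on logon.
request({
    method: 'GET',
    url: 'https://emHost:9398/api/backupTaskSessions/{id}',
    headers: {
        'Accept': 'application/json',
        'Cookie': 'X-RestSvcSessionId={sessionId}'
    }
}, function (error, response) {
    if (error) throw new Error(error);
    const taskSession = JSON.parse(response.body);
    // TotalSize is the vm size determined on this task session run.
    console.log(taskSession.TotalSize);
});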

I don't think you need to calculate either the compression or the deduplication ratio to get the information you are after.

Hope it helps,
Oleg
rdock
Novice
Posts: 4
Liked: never
Joined: Apr 14, 2021 5:03 pm
Full Name: Rob Dock

Re: Question Regarding backupFiles DataSize and BackupSize fields

Post by rdock »

Oleg,

Thank you for your reply! I found the TotalSize values right where you said they'd be, under /backupTaskSessions/{id}.

I'm now working on getting the TotalSize values for each of the jobs on each of the backup servers in our Enterprise Manager. My method is:

- Query for BackupTaskSessions:

Code: Select all

/api/query?type=BackupTaskSession&sortAsc=name&pageSize=1
(we have over 950k task sessions, so I'm not sure how this is truly going to work)
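I'm assuming the query service pages through results with the pageSize and page parameters, so enumerating everything would mean walking pages like:

Code: Select all

/api/query?type=BackupTaskSession&sortAsc=name&pageSize=1000&page=2
which would still be close to a thousand requests just to list the references.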

- Get the parent BackupSession IDs and navigate to them individually:

Code: Select all

/api/backupSessions/{id}
- Get the BackupTaskSessionReferenceList and navigate to each entity individually:

Code: Select all

/api/backupSessions/{id}/taskSessions?format=Entity
- Parse the XML to get the TotalSize field for each taskSession and add them together to get the total for the job identified by the {id} in the request (a sketch of this last step is below).
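For that last step, instead of parsing XML I'm considering just requesting JSON via the Accept header, so the summing would look something like this (a sketch; get() is the same hypothetical JSON helper as in my first post, and I'm assuming the entity list comes back under 'TaskSessions'):

Code: Select all

// Sketch: sum TotalSize across the task sessions of one backup session.
// get(url) is a hypothetical helper that returns the parsed JSON body.
async function totalSizeForSession(get, sessionId) {
    const tasks = await get(`/api/backupSessions/${sessionId}/taskSessions?format=Entity`);
    // Assuming the entity list is returned under 'TaskSessions'.
    return tasks.TaskSessions.reduce((sum, t) => sum + t.TotalSize, 0);
}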

Given that there are so many task sessions, is there a better way to formulate a query against the API so that I can filter on the backup session name and see the TotalSize fields, without having to write code that walks the individual BackupTaskSession IDs for all 950k sessions?

Thanks again!

-- Rob
oleg.feoktistov
Veeam Software
Posts: 1912
Liked: 635 times
Joined: Sep 25, 2019 10:32 am
Full Name: Oleg Feoktistov

Re: Question Regarding backupFiles DataSize and BackupSize fields

Post by oleg.feoktistov »

I see what you mean, but I'm afraid it is not feasible with EM REST. This summarised value is not held in the EM database, but calculated during middleware processing. That's how you see it in the jobs report view.

As for your approach, you could also go vice versa:
- Get jobs list from /api/jobs.
- Get sessions list from BackupSessionReferenceList.
- Get task sessions list for each session.
- Acquire TotalSize values for each task session and add them together.

Looks more structured, I think.

Thanks,
Oleg
oleg.feoktistov
Veeam Software
Posts: 1912
Liked: 635 times
Joined: Sep 25, 2019 10:32 am
Full Name: Oleg Feoktistov

Re: Question Regarding backupFiles DataSize and BackupSize fields

Post by oleg.feoktistov » 1 person likes this post

By the way, if you are precisely after the TotalSize value displayed for each job under Jobs View, you won't need to query every task session you have: the TotalSize property shows the value calculated on the last session run. So eventually it comes down to getting the last backup session for each job and summing the TotalSize values from its child task sessions.
I threw it all together in a Node.js script (that's what I usually use for testing) to obtain the info for one sample job. It's a bit rough, but it should give you the idea:

Code: Select all

// Accept the self-signed certificate on the EM server (test use only).
process.env["NODE_TLS_REJECT_UNAUTHORIZED"] = '0';
const request = require('request');
const config = require('./config.json');
const hostAddr = config.hostAddr;
const jobsUrl = `https://${hostAddr}:9398/api/jobs`;
let name = 'Backup Job';
let header = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'Cookie': `X-RestSvcSessionId=${config.sessionId}`
};

// Build GET request options for a given url.
function options(url) {
    return {
        'method': 'GET',
        'url': url,
        'headers': header
    };
}

// Return the item with the latest CreationTime.
function latest(items) {
    let latestDatetime = new Date(Math.max.apply(null, items.map(item => {
        return new Date(item.CreationTime);
    })));
    return items.find(item => {
        return new Date(item.CreationTime).getTime() === latestDatetime.getTime();
    });
}

request(options(jobsUrl), function (error, response) {
    if (error) throw new Error(error);
    let jobs = JSON.parse(response.body)["Refs"];
    jobs.forEach(function (job) {
        if (job.Name !== name) return;
        job["Links"].forEach(function (ref) {
            if (ref.Type !== 'BackupJobSessionReferenceList') return;
            // List the job's sessions and pick the most recent one.
            request(options(ref.Href), function (error, response) {
                if (error) throw new Error(error);
                let sessions = JSON.parse(response.body)["Refs"];
                sessions.forEach((session) => {
                    // Session names look like 'Job Name@<creation time>'.
                    session.CreationTime = session.Name.split('@')[1];
                });
                let lastSession = latest(sessions);
                let taskSessionsUrl;
                lastSession["Links"].forEach((link) => {
                    if (link.Type === 'BackupTaskSessionReferenceList') {
                        taskSessionsUrl = link.Href;
                    }
                });
                // Sum TotalSize over the last session's task sessions.
                request(options(taskSessionsUrl), function (error, response) {
                    if (error) throw new Error(error);
                    job.TotalSize = 0;
                    JSON.parse(response.body)["Refs"].forEach((item) => {
                        let taskSessionUrl = `${item.Href}?format=Entity`;
                        request(options(taskSessionUrl), function (error, response) {
                            if (error) throw new Error(error);
                            let taskSession = JSON.parse(response.body);
                            // Convert bytes to GB and accumulate; logs the
                            // running total after each task session response.
                            job.TotalSize += taskSession.TotalSize / (1024 ** 3);
                            console.log(job);
                        });
                    });
                });
            });
        });
    });
});
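For reference, the config.json the script reads from its own directory just holds the host address and an existing session id (the X-RestSvcSessionId value returned when you log on via POST /api/sessionMngr/?v=latest); the values below are placeholders:

Code: Select all

{
    "hostAddr": "emHost.domain.local",
    "sessionId": "<X-RestSvcSessionId value from the logon response>"
}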
Thanks,
Oleg
oleg.feoktistov
Veeam Software
Posts: 1912
Liked: 635 times
Joined: Sep 25, 2019 10:32 am
Full Name: Oleg Feoktistov

Re: Question Regarding backupFiles DataSize and BackupSize fields

Post by oleg.feoktistov »

A correction from my side, to be precise: DataSize is not the amount of data read from the source, but the amount of allocated + sparse blocks in storage before compression. Hence, though read size and data size can correlate, they are different metrics. Thanks!