Agentless, cloud-native backup for Amazon Web Services (AWS)
felixasiapac
Novice
Posts: 8
Liked: never
Joined: May 09, 2023 2:01 am
Full Name: Felix
Contact:

Appliance sizing

Post by felixasiapac »

Veeam support case ID: 06026284

Hi Veeam R&D team,

We understand from tech support that the guideline at https://bp.veeam.com/vbcloud/guide/aws/ ... iance.html offers best practices that may not fit every real-world environment.

We have a client whose environment needed a higher spec than the guideline recommends.

In general, with good guidelines, the spec actually required should not deviate too far from what the sizing documentation advertises.

If it needs 2x or 3x more, they can still justify it.

However, this particular client needed 5x to 10x more resources, which is far from what the guidelines state.

This suggests to them that the software does not work as advertised, or that a bug or configuration issue is keeping Veeam from running optimally.

This client's system hung/froze a few times. From the load average at the time of the incidents, we saw 7 postgres subprocesses in high resource demand, and the t3.medium, which has only 2 vCPUs, was unable to handle the load average of >11.

Image
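
To put the load average figure in context, here is a minimal Python sketch that divides a load average by the vCPU count of each instance type mentioned in this thread. The value 11 stands in for the ">11" we observed, the function names are hypothetical, and the threshold of 1.0 per vCPU is only a common rule of thumb (on Linux the load average also counts tasks waiting on I/O), not a Veeam recommendation.

```python
def load_per_vcpu(load_average: float, vcpus: int) -> float:
    """Return the load average divided by the number of vCPUs."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return load_average / vcpus


def is_oversubscribed(load_average: float, vcpus: int, threshold: float = 1.0) -> bool:
    """True when load per vCPU exceeds the (assumed) rule-of-thumb threshold."""
    return load_per_vcpu(load_average, vcpus) > threshold


if __name__ == "__main__":
    observed_load = 11.0  # representative of the ">11" seen during the hang
    # vCPU counts are the ones quoted in this thread.
    for name, vcpus in [("t3.medium", 2), ("c5.xlarge", 4), ("c5.2xlarge", 8)]:
        ratio = load_per_vcpu(observed_load, vcpus)
        print(f"{name}: {ratio:.2f} load per vCPU, "
              f"oversubscribed={is_oversubscribed(observed_load, vcpus)}")
```

On a live appliance, `os.getloadavg()` and `os.cpu_count()` from the Python standard library could supply the two inputs instead of the hard-coded values.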

Tech support (case ID: 06026284) made the same point: the VBAws appliance needs more vCPUs.

As a Veeam partner, we tried to convince this user of their resource demand using the case updates. However, we would like the guideline at https://bp.veeam.com/vbcloud/guide/aws/ ... iance.html to be updated with a more detailed vCPU calculation and a disclaimer, to prevent doubt among users who may not get the chance to read case 06026284 from us.

This is because any user can find the page https://bp.veeam.com/vbcloud/guide/aws/ ... iance.html through Google, and that info has misled some of them into thinking that a VBA on t3.medium can handle up to 1000 workloads. In this client's environment, with 200+ workloads, the system indicated a need for 12 vCPUs during the failure.

- Number of policies: 15
- Total workload: 209
- t3.medium = 2 vCPUs, 4 GB RAM (general recommendation based on the "Advised number of workloads" on the sizing page)
- t3.large = 2 vCPUs, 8 GB RAM (general calculation based on sizing page: (200 * 15) + (3 * 209) = 3627 = ~3.7 GiB)
- c5.2xlarge = 8 vCPUs, 16 GB RAM (actual compute demand we found during the system/backup failure that showed high load average >11)
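
The memory calculation in the list above can be reproduced with a short Python sketch. The coefficients (200 per policy, 3 per workload) are taken from our calculation in this post; the function name is hypothetical, and treating the result as megabytes is our assumption, since the post only gives the rounded figure.

```python
def estimate_ram(policies: int, workloads: int,
                 per_policy: int = 200, per_workload: int = 3) -> int:
    """Sizing arithmetic as applied in this post.

    The coefficients come from the post's own calculation; the unit of the
    result (assumed to be MB) is not stated explicitly there.
    """
    if policies < 0 or workloads < 0:
        raise ValueError("counts must be non-negative")
    return per_policy * policies + per_workload * workloads


if __name__ == "__main__":
    # Values for this client: 15 policies, 209 workloads.
    print(estimate_ram(policies=15, workloads=209))
```

Note that this formula only covers RAM. It says nothing about vCPUs, which is the gap this post asks the sizing guide to close.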

During further troubleshooting, we saw that the PostgreSQL log (/var/lib/postgresql/12/main/log) was recording "Broken pipe" errors every few seconds or minutes, regardless of whether backup activity was busy or idle.

Image
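
To quantify how often the error occurs, a small Python sketch like the one below could count "Broken pipe" lines per minute. It assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, which depends on the server's `log_line_prefix` setting. The sample lines are made up for illustration and are not taken from the client's log, and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Captures "YYYY-MM-DD HH:MM" at the start of a line (assumed log format).
_MINUTE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2})")


def broken_pipe_per_minute(lines: Iterable[str]) -> Counter:
    """Count lines containing 'Broken pipe', grouped by minute."""
    counts: Counter = Counter()
    for line in lines:
        if "Broken pipe" not in line:
            continue
        match = _MINUTE.match(line)
        # Lines without a recognizable timestamp are grouped under "unknown".
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    "2023-05-09 02:01:03.120 UTC [1201] LOG:  could not send data to client: Broken pipe",
    "2023-05-09 02:01:41.877 UTC [1207] LOG:  could not send data to client: Broken pipe",
    "2023-05-09 02:01:59.002 UTC [1207] LOG:  checkpoint starting: time",
    "2023-05-09 02:02:10.450 UTC [1215] LOG:  could not send data to client: Broken pipe",
]

if __name__ == "__main__":
    for minute, count in sorted(broken_pipe_per_minute(SAMPLE).items()):
        print(minute, count)
```

The function accepts any iterable of lines, so an open file handle for a log file can be passed in directly. Comparing the per-minute counts during busy and idle periods would show whether the errors track backup activity.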

Could you also please advise whether you see these "Broken pipe" errors on your side, and whether they are caused by a lack of vCPUs or by a configuration issue?

Thank you.
nielsengelen
Product Manager
Posts: 5634
Liked: 1181 times
Joined: Jul 15, 2013 11:09 am
Full Name: Niels Engelen
Contact:

Re: Appliance sizing

Post by nielsengelen »

Hi Felix,

We are currently updating our sizing guide to add more insight into vCPU usage as well as the new limitations/maximums related to VB for AWS v6. The updates should be online very soon, which will hopefully assist you here.

Could you tell me if this client is running v5 or v6 already?
Personal blog: https://foonet.be
GitHub: https://github.com/nielsengelen
felixasiapac
Novice
Posts: 8
Liked: never
Joined: May 09, 2023 2:01 am
Full Name: Felix
Contact:

Re: Appliance sizing

Post by felixasiapac »

Hi Niels,

Thanks for the info.

The client in the screenshot above is running v5. Last week we were able to convince them to upgrade the instance type to c5.xlarge (4 vCPUs, 8 GB RAM).
After the instance type upgrade, we no longer observed system hangs, freezes, or OOM.
However, we still see the "Broken pipe" error constantly appended to /var/lib/postgresql/12/main/log every few seconds or minutes.

Last month, another client upgraded VBA from v5 to v6 together with an instance type upgrade from t3.medium to c5.2xlarge (8 vCPUs, 16 GB RAM). Before the upgrade, that client also experienced the same system hang, freeze, and OOM issues, with the same "Broken pipe" error in /var/lib/postgresql/12/main/log.

After the upgrade, the hang/freeze/OOM issues on that client were gone, and so was the "Broken pipe" error.
So we do not know whether the "Broken pipe" error was resolved by the VBA upgrade from v5 to v6 or by the instance type upgrade from t3.medium to c5.2xlarge, because they performed both upgrades within a short period last month.

Do you see the "Broken pipe" errors (/var/lib/postgresql/12/main/log) on your side with VBA v5 and t3.medium?
Are those errors caused by a lack of vCPUs or by a configuration issue in the VBA?