mkevenaar
Full Name: Maurice Kevenaar

[BUG] API Health endpoint

Post by mkevenaar

Hi all,

I would like to report what appears to be a bug or design issue with the Veeam Backup for Microsoft 365 REST API /Health endpoint.

According to the documentation, the /v8/Health endpoint returns the health status of the NATS server and the PostgreSQL configuration database. The response model also appears to support returning an overall Unhealthy state with individual component entries.

Observed behavior:
  • When all components are healthy, the endpoint returns a normal health response.
  • Currently, only nats and configurationDB are exposed.
  • When one of these components becomes unavailable, the /Health endpoint itself becomes unavailable.
Expected behavior:
  • The /Health endpoint should remain reachable as long as the REST API service itself is running.
  • It should return HTTP 200, or a newly introduced 503 (Service Unavailable), with a response body that marks the affected component as Unhealthy.
  • The endpoint should not depend so tightly on the same components it is supposed to report on.
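To illustrate what I mean by not depending too tightly on the monitored components, here is a minimal Python sketch of a health handler that isolates each check. I do not know how the product implements this internally, so everything here is an assumption: build_health_report, check_nats and check_configuration_db are hypothetical names, and the two check functions are stand-ins rather than real probes.

```python
from typing import Callable, Dict, Tuple


def build_health_report(checks: Dict[str, Callable[[], None]]) -> Tuple[int, dict]:
    """Run every check in isolation so one failing dependency cannot
    take down the health response itself."""
    entries = {}
    for name, check in checks.items():
        try:
            check()  # hypothetical probe: returns normally or raises
            entries[name] = {"status": "Healthy"}
        except Exception as exc:
            # The failure is reported in the body instead of propagating.
            entries[name] = {"status": "Unhealthy", "description": str(exc)}

    healthy = all(e["status"] == "Healthy" for e in entries.values())
    http_status = 200 if healthy else 503
    return http_status, {
        "status": "Healthy" if healthy else "Unhealthy",
        "entries": entries,
    }


def check_nats() -> None:
    """Stand-in for a NATS probe (assumption); succeeds in this sketch."""


def check_configuration_db() -> None:
    """Stand-in for a PostgreSQL probe (assumption); fails in this sketch."""
    raise ConnectionError("Unable to connect to PostgreSQL configuration database")


code, body = build_health_report(
    {"nats": check_nats, "configurationDB": check_configuration_db}
)
```

With the failing stand-in above, the handler still answers, with a 503 and a body that names configurationDB as the unhealthy entry.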
Why this is a problem:

The purpose of a health endpoint is to allow external monitoring systems to detect degraded or unhealthy components. If the endpoint disappears or fails completely when a monitored dependency is down, monitoring cannot distinguish between:
  • REST API service failure
  • NATS failure
  • configuration database failure
  • network or authentication issue
This makes the endpoint less useful for operational monitoring and alerting.
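From the monitoring side, the ambiguity looks roughly like the sketch below. classify_probe is a hypothetical helper, not part of any Veeam tooling, and the category names are my own. It takes the HTTP status of a probe (None when no response arrived at all) and the raw body.

```python
import json
from typing import Optional


def classify_probe(http_status: Optional[int], body: Optional[str]) -> str:
    """Map the outcome of one /Health probe to a coarse category."""
    if http_status is None:
        # No HTTP response at all: REST API down, network problem, or a
        # failed dependency that took the endpoint with it. The monitor
        # cannot tell these apart.
        return "unreachable"
    if http_status in (401, 403):
        return "authentication"
    try:
        report = json.loads(body or "")
    except json.JSONDecodeError:
        return "error without health body"
    if isinstance(report, dict) and "status" in report:
        # Only this branch gives the monitor something actionable.
        return "reported: " + str(report["status"])
    return "error without health body"
```

Only the "reported" category lets a monitor act on a specific component. With the behavior I observed, a NATS or database outage falls into "unreachable" or "error without health body" instead.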

Suggested improvement:

The /Health endpoint should degrade gracefully and return a structured response similar to:

{
  "status": "Unhealthy",
  "entries": {
    "nats": {
      "status": "Healthy"
    },
    "configurationDB": {
      "status": "Unhealthy",
      "description": "Unable to connect to PostgreSQL configuration database"
    }
  }
}
That would make the endpoint useful for monitoring even during partial component failures.
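As a rough sketch of how a monitoring check could consume such a response, assuming the body has exactly the shape proposed above (unhealthy_components is a hypothetical helper name):

```python
import json
from typing import Dict


def unhealthy_components(body: str) -> Dict[str, str]:
    """Return {component: description} for every entry not marked Healthy."""
    report = json.loads(body)
    return {
        name: entry.get("description", "no description")
        for name, entry in report["entries"].items()
        if entry["status"] != "Healthy"
    }


# The suggested response from this post, used as sample input.
sample = """
{
  "status": "Unhealthy",
  "entries": {
    "nats": {"status": "Healthy"},
    "configurationDB": {
      "status": "Unhealthy",
      "description": "Unable to connect to PostgreSQL configuration database"
    }
  }
}
"""

alerts = unhealthy_components(sample)
for component, reason in alerts.items():
    print(f"ALERT {component}: {reason}")
```

An alert per component like this is only possible if the endpoint keeps answering while a dependency is down.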

How to reproduce:
  1. Open the Veeam Backup for Microsoft 365 REST API endpoint:

    /v8/Health
  2. Verify the normal healthy response:

    {
      "status": "Healthy",
      "entries": {
        "nats": {
          "status": "Healthy"
        },
        "configurationDB": {
          "status": "Healthy"
        }
      }
    }
    
  3. Stop the NATS service or make the PostgreSQL configuration database unavailable.
  4. Query the /v8/Health endpoint again.
  5. Observe that instead of returning an Unhealthy status for the affected component, the endpoint itself becomes unavailable or returns an error.
Can someone confirm whether this is expected behavior or whether it should be treated as a bug?