AlexHeylin
Veteran
Posts: 563
Liked: 174 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin

Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by AlexHeylin »

I'm sure I've requested this elsewhere, but I can't find a ticket or forum post for it.

Please can Veeam use unique EventIDs when writing to Windows event log?

By that I mean, don't use the same EventID for a success and a failure.

For example
Source "Veeam MP", eventID "40002" should mean EITHER
"The storage management session for repository RepoName has finished: Failed
Repository name: Repo-Name
Job type: Dehydrate
Session result: Failed
Mode: MoveOldBackupFiles, CopyNewBackupFiles"
or
"The storage management session for repository RepoName has finished: Success
Repository name: RepoName
Job type: Dehydrate
Session result: Success
Mode: CopyNewBackupFiles"
but not both.

For those of us getting millions of events per day and searching them thousands of times each day, being able to search just by EventID (which is easily indexed) rather than by substring matching in the message text (which can't use an index) makes a huge difference to search time and server load. It would also make it possible to write a set of filters that look for "bad" events on systems which don't support searching the message text.
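A minimal sketch of the indexing point, assuming Python's built-in sqlite3 as a stand-in for whatever database the events are stored in (the poster later mentions MySQL, whose planner output looks different). The `events` table, its columns and the `idx_event_id` index are hypothetical. Rather than timing anything, it asks SQLite for its query plan: the EventID filter can use the index, while a leading-wildcard LIKE on the message text has to scan every row. Note that both sample rows share Event ID 40002, so the indexed query alone cannot separate the failure from the success, which is exactly the complaint.

```python
import sqlite3

# Hypothetical event store: one row per collected Windows event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, source TEXT, message TEXT)")
conn.execute("CREATE INDEX idx_event_id ON events (event_id)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (40002, "Veeam MP", "The storage management session for repository RepoName has finished: Failed"),
        (40002, "Veeam MP", "The storage management session for repository RepoName has finished: Success"),
    ],
)


def plan(sql: str, params: tuple = ()) -> list:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Equality on the indexed event_id column: the plan names idx_event_id.
by_id = plan("SELECT * FROM events WHERE event_id = ?", (40002,))
# Leading-wildcard LIKE on the message text: no index applies, so the plan is a full scan.
by_text = plan("SELECT * FROM events WHERE message LIKE ?", ("%Failed%",))

print(by_id)
print(by_text)
```

The exact plan wording varies between SQLite versions, so it is safest to look only for the index name and the word SCAN.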

I'm aware of at least three separate EventIDs which Veeam routinely logs that are ambiguous in one form or another (such as "backup restore point was created" vs "replication restore point was created").

I realise the work and problems involved in changing existing EventIDs, but please don't make the problem any worse by adding more ambiguous ones in future.

Thanks

Alex
HannesK
Product Manager
Posts: 15598
Liked: 3445 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by HannesK »

Hello,
just to understand it better... the "Level" of Event 40002 can be Error / Warning / Information. Can this information not be used for your use case?
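A sketch of this suggestion, assuming the collected records keep the standard Windows Level value (1 = Critical, 2 = Error, 3 = Warning, 4 = Information) next to the source and EventID. The `Event` class and the function names are hypothetical, and treating Error-level 40002 events as failed sessions is an assumption based on the Level values mentioned above, not documented behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional

# Standard Windows event log Level values.
LEVEL_NAMES = {1: "Critical", 2: "Error", 3: "Warning", 4: "Information"}


@dataclass
class Event:
    source: str
    event_id: int
    level: int
    message: str


def level_of_40002(ev: Event) -> Optional[str]:
    """Return the Level name for a 'Veeam MP' 40002 event, or None for any other event."""
    if ev.source != "Veeam MP" or ev.event_id != 40002:
        return None
    return LEVEL_NAMES.get(ev.level, "Unknown")


def failed_storage_sessions(events: List[Event]) -> List[Event]:
    """Select 40002 events logged at Error level, without looking at the message text.

    Assumption: failed sessions are the ones logged at Error level.
    """
    return [ev for ev in events if level_of_40002(ev) == "Error"]


events = [
    Event("Veeam MP", 40002, 2, "The storage management session for repository RepoName has finished: Failed"),
    Event("Veeam MP", 40002, 4, "The storage management session for repository RepoName has finished: Success"),
]
print(len(failed_storage_sessions(events)))  # 1
```

This only works if the collecting system stores the Level alongside the EventID, and it separates failure from success for this one event; it does not help where two kinds of job share an ID at the same level.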

Best regards,
Hannes
wishr
Veteran
Posts: 3077
Liked: 455 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by wishr »

Hi Alex,

Thank you for the feedback.

These events are used for integration with Veeam monitoring products, and some of them were created a long time ago. While rewriting the old events may be extremely costly (due to changes required in several products), we'll try to take your feedback into account when adding new events.
AlexHeylin
Veteran
Posts: 563
Liked: 174 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by AlexHeylin »

Hi both,

We'll look at using the "Level" for this particular event, but thanks for taking the feedback on board as a general point.

Thanks

Alex
wishr
Veteran
Posts: 3077
Liked: 455 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by wishr »

Alex,

I apologize for the confusion. I've checked with the integration team and it turned out these events have been created based on feedback from customers specifically for 3rd-party integrations. We'll take a look at that.

Thanks
AlexHeylin
Veteran
Posts: 563
Liked: 174 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by AlexHeylin »

Just a quick update that this use of the same event ID for processing a VM for backup and processing the VM for replication has caused us an issue again today. Based on the information in the logs, it's not possible to identify if a VM was processed via the backup job, or via the replication job. Some VMs are processed by both backup and replication jobs, for different reasons and targeted at different sites / storage / hypervisors.

Any news on disambiguating these event IDs, or if that's particularly problematic - adding the name of the job that processed the VM into the message text so we could parse it out - or even a straight "TYPE: [Backup|Replication]" field to the text.... something / anything we could work with?
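If a field like the one proposed here were added to the message text, pulling it out would take one regular expression. This is a sketch only: Veeam does not emit a `TYPE:` field today, and the exact `TYPE: Backup` / `TYPE: Replication` format is an assumption taken from the proposal above. The function name is hypothetical.

```python
import re
from typing import Optional

# Hypothetical field format from the feature request; not present in current Veeam events.
TYPE_FIELD = re.compile(r"\bTYPE:\s*(Backup|Replication)\b", re.IGNORECASE)


def job_type_from_message(message: str) -> Optional[str]:
    """Return 'Backup' or 'Replication' if the proposed TYPE field is present, else None."""
    match = TYPE_FIELD.search(message)
    return match.group(1).capitalize() if match else None


print(job_type_from_message("VM My VM Name task has finished with 'Success' state. TYPE: Replication"))
print(job_type_from_message("VM My VM Name task has finished with 'Success' state."))
```

Since the poster's RMM keeps only roughly the first 900 to 1000 characters of the message (as mentioned later in the thread), such a field would probably be safest near the start of the text.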

Thanks
wishr
Veteran
Posts: 3077
Liked: 455 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by wishr »

Hi Alex,

No updates so far. Don't you have the job type mentioned in the message text or within the EventData section of the XML of event 40002? Or are you asking about another event?

As a general recommendation, I would strongly suggest using the REST API of VBR or EM for integration with 3rd-party solutions, because we've built them specifically for such scenarios and the data there contains all the required elements. It's also much easier for us to apply changes to the REST API based on your feedback, thanks to its native versioning mechanisms. Changing the existing WEL events may cause issues with existing integrations.

Thanks
AlexHeylin
Veteran
Posts: 563
Liked: 174 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by AlexHeylin »

Hi @Wishr,

Sorry it's taken me a while to come back to this. I'm working on a VBR server today which shows this issue nicely.
It has jobs for backup, backup copy, and replication set up. Any given VM (or agent machine) might be included in Backup, Backup Copy, and / or Replication jobs.
While the JOBS can be distinguished by the job event IDs and / or event text, it is not always possible to reliably tell what the tasks they run actually did, and sometimes the parent job that ran them cannot be identified either. Sometimes we can achieve this through our naming conventions, which then appear in the message, but this is problematic for many reasons and should not be needed.

Here are some examples with my comments:
EventID: 10010
Message: VM 'My VM Name' restore point has been created.
Is this a backup, backup copy, or replication restore point? Which job does it belong to? Where was it created? According to https://www.veeam.com/kb1834 it should be "Restore point created", which is ambiguous.

EventID: 150
Message: VM My VM Name task has finished with 'Success' state.
Is this a backup, backup copy, or replication task? Which job does it belong to? Where did it run? According to https://www.veeam.com/kb1834 it should be "Backup task finished", which is not actually true as this event is also generated for replication tasks.
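To make the gap concrete, here is a sketch of parsing the Event ID 150 text quoted above, assuming the message always has exactly this shape (the pattern is inferred from the single example in the post, and the function name is hypothetical). The VM name is not quoted, so the pattern relies on the fixed " task has finished with" wording to find where the name ends. The result holds only the VM name and the state; nothing in it says which job, or which kind of job, produced the task.

```python
import re
from typing import Dict, Optional

# Pattern for the Event ID 150 message text quoted above. The VM name is unquoted,
# so a lazy match runs up to the fixed " task has finished with" wording.
TASK_FINISHED = re.compile(r"^VM (?P<vm>.+?) task has finished with '(?P<state>[^']+)' state\.$")


def parse_task_finished(message: str) -> Optional[Dict[str, str]]:
    """Return {'vm': ..., 'state': ...} for a task-finished message, else None."""
    match = TASK_FINISHED.match(message.strip())
    return match.groupdict() if match else None


parsed = parse_task_finished("VM My VM Name task has finished with 'Success' state.")
print(parsed)  # {'vm': 'My VM Name', 'state': 'Success'}
```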

Here's what we would like to see - separate easily identifiable events for:
  • Backup Job 'job name' started = Event ID 110 shared with Replication Job, but message text can be used to disambiguate
  • Backup Copy Job 'job name' started = Event ID 410
  • Replication Job 'job name' started = Event ID 110 shared with Backup Job, but message text can be used to disambiguate
  • Backup Task for VM 'My VM Name' completed with 'status' = Event ID 150; this seems to clash with Replication Task Completed, and there is no text that allows disambiguation (adding [backup|replication] and the job name would be helpful)
  • Backup Copy Task for VM 'My VM Name' completed with 'status' = Event ID 450 (adding the job name would be helpful)
  • Replication Task for VM 'My VM Name' completed with 'status' = I'm inferring from our data that this generates Event ID 150 but without any text that allows disambiguation from Backup tasks (adding [backup|replication] and the job name would be helpful)
  • Backup Job 'job name' completed with 'status' = Event ID 190 shared with Replication Job, but message text can be used to disambiguate
  • Backup Copy Job 'job name' completed with 'status' = Event ID 490
  • Replication Job 'job name' completed with 'status' = Event ID 190 shared with Backup Job, but message text can be used to disambiguate
Also, Event ID 10010 is ambiguous as to whether it's a backup, backup copy, or replication RP, and which job it belongs to.
Also Event ID 0 ("Session Job Name has been completed.") is near useless (doesn't give the status or job name as text) and I hope only there for legacy compatibility.
Across all the messages, whether quotes are used appears to be random. Quotes might be better, but please don't change it now or you'll break everyone's existing parsing. Perhaps just pick one convention and stick to it in future?
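The list above can be captured as a small lookup table, so a check can tell when the EventID alone is enough and when it has to fall back to message text or other data. This sketch encodes only IDs whose coverage the post states directly. The kind labels ("Backup", "BackupCopy", "Replication") and the function name are illustrative, and listing all three kinds for 10010 reflects the post's point that the event doesn't say which kind it is.

```python
from typing import Dict, FrozenSet, Optional

# Job kinds that can emit each event ID, as reported in the list above.
KINDS_BY_EVENT_ID: Dict[int, FrozenSet[str]] = {
    110: frozenset({"Backup", "Replication"}),                  # job started
    150: frozenset({"Backup", "Replication"}),                  # VM task finished
    410: frozenset({"BackupCopy"}),                             # backup copy job started
    450: frozenset({"BackupCopy"}),                             # backup copy task finished
    10010: frozenset({"Backup", "BackupCopy", "Replication"}),  # restore point created (kind not stated)
}


def job_kind_from_id(event_id: int) -> Optional[str]:
    """Return the job kind if the event ID alone identifies it, else None."""
    kinds = KINDS_BY_EVENT_ID.get(event_id)
    if kinds is not None and len(kinds) == 1:
        return next(iter(kinds))
    return None


print(job_kind_from_id(450))  # BackupCopy
print(job_kind_from_id(150))  # None
```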

In case it's not clear from what we're asking for, in many instances we're checking both that the Job completed and that the Tasks for specific machines completed. The reason for this is to ensure those machines are being processed and haven't been removed from the job, or not run for some reason. These checks are separate for backup, backup copy, and replication tasks.

For the purpose of this discussion assume that the event logs are the only data available to do these automated checks, and that we're doing this at scale across multiple servers at multiple clients (we're an MSP).

Thanks
AlexHeylin
Veteran
Posts: 563
Liked: 174 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by AlexHeylin »

To highlight the need for the ability to check the backup, backup copy, and replication tasks for each VM - while looking at something else I realised we're backing up the Hyper-V replica of a machine (replica_VM-NAME) rather than the actual VM (VM-NAME). This is likely down to human error and a failover having been performed without updating the jobs in VBR. If we were able to reliably check the VM tasks as I requested above, our daily checks would have identified this problem the morning after the mistake was made.
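A sketch of the per-machine daily check described above, assuming each task result has already been tagged with its job kind. That tagging is exactly what the current events don't reliably allow, so here it is an input rather than something derived from Veeam's messages. `TaskResult`, `missing_tasks` and the kind names are hypothetical. The sample data mirrors the replica example: the backup task succeeded, but for `replica_VM-NAME` rather than `VM-NAME`, so the check flags `VM-NAME` as missing from backup.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, NamedTuple, Set


class TaskResult(NamedTuple):
    when: datetime
    job_kind: str  # e.g. "Backup", "BackupCopy", "Replication": assumed already known
    vm: str
    state: str     # e.g. "Success"


def missing_tasks(
    expected: Dict[str, Set[str]],
    results: Iterable[TaskResult],
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> Dict[str, Set[str]]:
    """Per job kind, return expected machines with no successful task in the window."""
    seen: Dict[str, Set[str]] = {kind: set() for kind in expected}
    for r in results:
        in_window = timedelta(0) <= now - r.when <= window
        if r.job_kind in seen and r.state == "Success" and in_window:
            seen[r.job_kind].add(r.vm)
    # Keep only kinds that actually have missing machines.
    return {kind: vms - seen[kind] for kind, vms in expected.items() if vms - seen[kind]}


now = datetime(2024, 1, 2, 8, 0)
expected = {"Backup": {"VM-NAME"}, "Replication": {"VM-NAME"}}
results = [
    TaskResult(datetime(2024, 1, 1, 22, 0), "Backup", "replica_VM-NAME", "Success"),
    TaskResult(datetime(2024, 1, 1, 23, 0), "Replication", "VM-NAME", "Success"),
]
print(missing_tasks(expected, results, now))  # {'Backup': {'VM-NAME'}}
```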
wishr
Veteran
Posts: 3077
Liked: 455 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov
Contact:

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by wishr »

Hi Alex,

Thank you for the feedback.

Could you please explain why you are not considering using the REST API instead of the WEL events for that? Not only would the REST API help you avoid all the aforementioned issues in the currently available VBR version, it would also open up lots of additional opportunities for managing your backups. We designed the REST API in our products specifically to cover usage scenarios similar to yours.

By the way, the KB article you are referring to is no longer updated, because the functionality it was written for has been discontinued.

Thanks
AlexHeylin
Veteran
Posts: 563
Liked: 174 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by AlexHeylin »

Hi @wishr,

As an MSP with clients that range from "full service", where we do everything, to "monitoring only", where we have very limited control over the setup, the answer to that is very simple. We already have a MySQL DB that contains all the Veeam events from ALL the systems for ALL the clients we look after. Parsing that data is "simple" compared to building a system to interrogate the REST API on each VBR server (which won't all be running the same version) and then transport that data back to us securely over the Internet, which is a LOT more work than parsing data we already have in the MySQL DB.

We also track other backup solutions and various daily health checks using the same database, system, and codebase, so splitting this out to have to deal with Veeam data in a different way is not something we'd consider unless it was easy and more reliable than the data we have. Sadly neither of those are true. Many other systems used by MSPs also rely on parsing events to track backup, backup copy, and replication jobs (though right now, they can't do that accurately and granularly due to the event ambiguity).

While you might think the REST API was designed for this scenario, it just wasn't. For a start, our central system can't connect to the REST API on tenant systems directly: the Internet and at least one firewall are in the way, and we certainly could not change that. Even if it could connect, authenticating and accessing the API on a number of customer machines is far more complex than parsing data we already have. Ironically, Veeam used to have a pre-built solution for this in the shape of the Veeam plugin for LabTech, which was retired despite my protests about the lack of a replacement. That presented the data we need in the database we need it in, and in a format / schema we could easily work with. We were told VAC would be the replacement. It's not a replacement, for many reasons. Even using the VSPC plugin for LabTech (CW Automate) to pull the data over into the LabTech database where we want it is just not a suitable solution, because the data in VSPC is incomplete and, in our experience, untrustworthy. Add to that the very complex DB structure the plugin creates, and it becomes virtually impossible to deal with the data at the DB level (which is where we need to deal with it). When I asked Veeam for a copy of the schema for the data inserted into the LabTech DB, I was told "don't use it, we might change it any time". When I said "I'll take the risk, please provide the schema so we can use the data", my request was declined. The schema is so complex that after days of trying to reverse-engineer it I still couldn't reliably relate some tables to each other, making the data useless to me.

The nearest thing Veeam have for this scenario is VAC / VSPC, which we're running, but it can't be deployed to all clients (unlike our RMM, which already collects the event logs). We've tried using VSPC for this and found it a complex and unhelpful layer of abstraction from the actual data. We also found it massively over-opinionated, with a terrible tendency to state as fact that something failed for a completely different reason from why it actually failed, while hiding the event message that would often make the cause very clear (if only we could see it). Add to that the fact that features / changes are often released in VBR that either break something, or don't work / prevent data being visible in VSPC until at least vNext or vNext+1 of VSPC, and we gave up trying to use VSPC for anything other than managing VCC services for clients where we're providing VCC services (not all our clients, not even all our clients who run VBR).

I appreciate you're trying to help, but the help we require is for the log events to be unambiguous, not to try and get the data another way, which we already know either won't work reliably or would require a huge amount of ongoing software development to be possible at all. Veeam have already led us down all these other roads, and only after spending a lot of time trying to make them do what we need did we discover that they aren't suitable. Event logs might be basic, but they're easy and they work well compared to the other options in this scenario.
wishr wrote: "By the way, the KB article you are referring to is no longer updated, because the functionality it was written for has been discontinued."
That's true, but I couldn't find the actual authoritative document on this. Do you know where it is?
Finding the correct doc can be a challenge in Veeam's doc system: the search often returns many results with low relevance, or returns a few results, none of which is the correct doc.
It doesn't change anything I said about the ambiguity of some of those event IDs.

Thanks
wishr
Veteran
Posts: 3077
Liked: 455 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov
Contact:

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by wishr »

Hi Alex,

Many thanks for providing feedback in such a detailed way. This is definitely a good example of what we are looking for on the forums.

I get the point. I'll speak internally to the person responsible for VBR events. While I cannot guarantee that we can quickly solve all the issues with the events you outlined above, we'll definitely check what we can do. Changing the existing events is primarily painful for us because at least three Veeam products use them in conjunction with the WMI interface: the products take data from the events, then look into WMI using the IDs provided in the event to get additional data. Therefore, some events may not provide the information you expect them to provide, because the integrated products take it from WMI. Changing an event that is used by these products implies applying changes in all three products to support that change, unless it's a simple change of the event message. I would say this is the biggest concern, as the timelines of the product updates differ. Also, applying changes in three products at once adds overhead for the QC team, because they will have to re-test the existing functionality relying on the changed events, which may affect our ability to deliver new features that most customers are waiting for. Anyway, we understand your pain and will see what we can do here.

Regarding the event descriptions, they are all included in the corresponding user guide section. If anything is not correct, we would appreciate it if you could report any discrepancies using the "Send feedback" button at the bottom of the web page where you found the issue.

Again, thanks for the detailed feedback.
AlexHeylin
Veteran
Posts: 563
Liked: 174 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by AlexHeylin »

Hi Wishr,

Thanks very much for understanding, and for the feedback on my feedback. I suggest it might be better and easier to add new, more detailed, unambiguous events and leave the existing ones in place. There appears to be a precedent for this, as it looks like it has already been done: IIRC, some events are already logged both with an ID of 100 - 999 (legacy event) and with an ID > 10000 (newer, more detailed event). We're not expecting this improvement to happen overnight in the next release, but it would be great if the principle could be accepted, so that any new events are compliant from the start and the problematic existing events can be dealt with over time. Event ID 150 is the biggest pain point, so it would be great if that could be done first.

Thanks for the link to the official event IDs. If I find any variance I'll report it.

Thanks again.
wishr
Veteran
Posts: 3077
Liked: 455 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov
Contact:

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by wishr »

Alex,

Agreed. Adding new events is one of the possible ways. We'll discuss that internally.

Thanks
Egor Yakovlev
Product Manager
Posts: 2632
Liked: 752 times
Joined: Jun 14, 2013 9:30 am
Full Name: Egor Yakovlev
Location: Prague, Czech Republic

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by Egor Yakovlev »

Hi Alex,

Thanks for interesting ideas!

My few cents here:
- for event IDs 110 and 150, you can use the third <Data> field to tell the Job Type: "0" means "Backup", and "1" means "Replication".
- for event ID 190, the Job Type is the fourth <Data> field.
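A sketch of this suggestion using Python's standard `xml.etree.ElementTree`, assuming the event has been exported as XML in the standard Windows event schema (namespace `http://schemas.microsoft.com/win/2004/08/events/event`). The field positions and the "0"/"1" mapping come from the post above; applying the same "0"/"1" mapping to Event ID 190 is an assumption, since only the field position is given for that ID. The function name is hypothetical and the first two `<Data>` values in the sample are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# 0-based position of the job-type <Data> element per event ID, per the post above.
JOB_TYPE_FIELD_INDEX = {110: 2, 150: 2, 190: 3}
# Mapping stated for 110/150; assumed (not confirmed) to also hold for 190.
JOB_TYPE_NAMES = {"0": "Backup", "1": "Replication"}


def job_type_from_event_xml(xml_text: str) -> Optional[str]:
    """Return the job type encoded in EventData, or None if this event doesn't carry one."""
    root = ET.fromstring(xml_text)
    event_id_text = root.findtext("e:System/e:EventID", namespaces=NS)
    if event_id_text is None:
        return None
    index = JOB_TYPE_FIELD_INDEX.get(int(event_id_text))
    if index is None:
        return None
    data = root.findall("e:EventData/e:Data", NS)
    if len(data) <= index:
        return None
    return JOB_TYPE_NAMES.get((data[index].text or "").strip(), "Unknown")


sample = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID Qualifiers="0">150</EventID></System>
  <EventData><Data>placeholder-1</Data><Data>placeholder-2</Data><Data>1</Data></EventData>
</Event>"""
print(job_type_from_event_xml(sample))  # Replication
```

As the next reply points out, this only helps where the collecting system keeps the XML EventData at all.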

/Cheers!
AlexHeylin
Veteran
Posts: 563
Liked: 174 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: Feature Request: Please ensure EventIDs are unique (disambiguate them)

Post by AlexHeylin »

Hi Egor,
I wrote a detailed response nine hours ago and forgot to click Submit. :roll: Here's the short version.

Our RMM, like many systems, only accesses the "classic" log information. We have an additional limitation that it only collects the first 1000 (usually more like 900) characters of the Message field. It doesn't access the XML data you're referring to in any way. You're the first person I've ever seen refer to that XML data as having the slightest real-world use. In 25 years as a systems engineer I've never seen anything read it, or expect anything else to read it.

So while what you suggest might be technically correct, it's of no use to us as we can't collect that data. As such, my request still stands, please.
Thanks