Discussions related to using object storage as a backup target.
Veeancent
Novice
Posts: 4
Liked: never
Joined: Feb 13, 2019 4:05 pm
Full Name: Vincent Benoit

Data Lake using Veeam Data Integration API vs Veeam Tiered/Copied data on AWS S3

Post by Veeancent »

Hi there,

Let's say I want to run either DLaaS on the AWS EMR platform or an on-premises data lake on Hadoop, and I'm looking for the most efficient way to leverage my Veeam backup data.

So how would these two options compare? Option one: present Veeam backup files to an on-premises Hadoop cluster through Veeam's upcoming Data Integration API, using the so-called VeeamFLR folder as the primary data source, and copy data from the Veeam repository to HDFS or S3 on request. Option two: present data to AWS EMR from Veeam Cloud Tiered/Direct Copied data resting on S3, so that it is used directly as the data lake source.

Finally, would there be any further advantages (or disadvantages, apart from high AWS operating costs) to presenting Veeam backup files to AWS EMR from a Veeam B&R instance running on EC2 or on VMware on AWS? That instance would hold the backup files, present the data via the Veeam Data Integration API, and copy data on request from the Veeam repository to HDFS or S3 on AWS.
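To make the "copy on request" step concrete, here is a rough sketch of what I have in mind, not a tested integration. It assumes the backup content is already exposed as an ordinary mounted folder; the mount path, the destination prefix, and the `upload` callback are all hypothetical placeholders of mine, and nothing here calls a Veeam API.

```python
import os
from pathlib import Path, PurePosixPath
from typing import Callable, Iterator, Tuple


def plan_copy(mount_root: str, dest_prefix: str,
              suffixes: Tuple[str, ...] = ()) -> Iterator[Tuple[Path, str]]:
    """Yield (local_file, destination_key) pairs for files under mount_root.

    mount_root  : folder where the backup content is mounted (assumption).
    dest_prefix : key/path prefix in the target store, e.g. "datalake/raw".
    suffixes    : optional lowercase extensions to keep, e.g. (".csv",).
    """
    root = Path(mount_root)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if suffixes and not name.lower().endswith(suffixes):
                continue  # skip files the data lake does not need
            src = Path(dirpath) / name
            rel = src.relative_to(root)
            # Build a forward-slash key regardless of the local OS.
            yield src, str(PurePosixPath(dest_prefix, *rel.parts))


def copy_on_request(mount_root: str, dest_prefix: str,
                    upload: Callable[[Path, str], None],
                    suffixes: Tuple[str, ...] = ()) -> int:
    """Run the plan through a caller-supplied upload function; return the file count."""
    copied = 0
    for src, key in plan_copy(mount_root, dest_prefix, suffixes):
        upload(src, key)  # hypothetical callback: S3 put, HDFS put, etc.
        copied += 1
    return copied
```

The `upload` callback is where the real target would plug in, for example a thin wrapper around boto3's `upload_file` for S3 or a call out to `hdfs dfs -put` for HDFS. Keeping it as a parameter means the same walk could serve both the on-premises and the AWS variant.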

Many thanks

Vincent
TM West Switzerland
nielsengelen
Product Manager
Posts: 5796
Liked: 1215 times
Joined: Jul 15, 2013 11:09 am
Full Name: Niels Engelen

Re: Data Lake using Veeam Data Integration API vs Veeam Tiered/Copied data on AWS S3

Post by nielsengelen »

Hi Vincent,

As this feature is still in development and being fine-tuned, it is a bit hard to tell. Cost is certainly one thing you will have to factor in when using AWS EMR; it will mostly depend on the number of disks and the amount of data you want to present. Presenting the data to local Hadoop clusters will be cheaper and most likely faster, but this would require some testing closer to GA to confirm.
Personal blog: https://foonet.be
GitHub: https://github.com/nielsengelen