- Posts: 4
- Liked: never
- Joined: Feb 13, 2019 4:05 pm
- Full Name: Vincent Benoit
Let's say that I want to run either DLaaS on the AWS EMR platform or an on-premise Data Lake on Hadoop, and I'm looking for the most efficient way to leverage my Veeam backup data.
So how do these two options compare? The first is presenting Veeam backup files to an on-premise Hadoop cluster through Veeam's upcoming Data Integration API, using the so-called VeeamFLR folder as the primary data source, and copying data on request from the Veeam Repository to HDFS or S3. The second is presenting data to AWS EMR from Veeam Cloud Tier/Direct Copy data resting on S3, so that it can be used directly as a Data Lake data source.
Ultimately, would there be any further advantages (or disadvantages, except high AWS operating costs) of presenting Veeam backup files to AWS EMR from a running Veeam B&R instance on EC2 or on VMware on AWS, that would hold backup files and present data via Veeam Data Integration API, then on request copy data from the Veeam repository to HDFS or S3 on AWS?
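To make the "on request copy" step concrete, here is a rough sketch of what I have in mind. It only builds the copy commands for every file under a mounted folder and does not run them. The mount path, bucket name and function name are placeholders of mine, not anything from Veeam; the only real tools assumed are the standard `aws s3 cp` and `hdfs dfs -put` commands.

```python
from pathlib import Path, PurePosixPath
from typing import List


def plan_copy_commands(mount_root: Path, dest_uri: str) -> List[List[str]]:
    """Build one copy command per file found under mount_root.

    mount_root is a hypothetical folder where backup content is mounted.
    dest_uri must start with "s3://" (uses `aws s3 cp`) or "hdfs://"
    (uses `hdfs dfs -put`). Nothing is executed here.
    """
    if dest_uri.startswith("s3://"):
        prefix = ["aws", "s3", "cp"]
    elif dest_uri.startswith("hdfs://"):
        # Note: `hdfs dfs -put` expects the target directory to exist already.
        prefix = ["hdfs", "dfs", "-put"]
    else:
        raise ValueError(f"unsupported destination: {dest_uri}")

    base = dest_uri.rstrip("/")
    commands = []
    for path in sorted(mount_root.rglob("*")):
        if not path.is_file():
            continue  # skip directories, only files are copied
        # Keep the folder layout of the mount point in the destination key.
        rel = PurePosixPath(*path.relative_to(mount_root).parts)
        commands.append(prefix + [str(path), f"{base}/{rel}"])
    return commands
```

Something like `plan_copy_commands(Path("C:/VeeamFLR"), "s3://example-bucket/lake")` would then give a list that could be reviewed before handing each entry to `subprocess.run`. The same planning step would apply whether the mount lives on-premise or on an EC2 instance; only the destination changes.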
TM West Switzerland
- Product Manager
- Posts: 5486
- Liked: 1158 times
- Joined: Jul 15, 2013 11:09 am
- Full Name: Niels Engelen
As this feature is still in development and being fine-tuned, this is a bit hard to tell. Cost is for sure one thing you'll have to deal with when using AWS EMR, and it will mostly depend on the number of disks and the amount of data you want to present. Presenting the data to a local Hadoop cluster will be cheaper and most likely faster, but this would require some testing closer to GA to confirm.