seanleyne
Novice
Posts: 5
Liked: never
Joined: Jan 30, 2011 11:08 pm
Full Name: Sean Leyne
Contact:

THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions

Post by seanleyne »

The recent posting contained the following:

"For example, in Veeam Backup & Replication, we use MD5 128bit hashes for deduplication. So, the odds of the collision happening is 1 in 2^128 (or 10^38). If we assume that there's less than a yottabyte (1 billion petabytes) of data on the planet Earth, then the odds of a hash collision with two random chunks of data are roughly 1`461`501`637`330`900`000`000`000`000 times greater than the number of bytes in the known computing universe. Obviously, this chance are way too slim to ever be concerned, as realistically this can only happen once in millions of years."

Unfortunately, MD5 collisions can be found in fewer than 2^21 operations (I hope I read the Wikipedia article correctly: http://en.wikipedia.org/wiki/Comparison ... te_note-10). In fact, collisions have also been found in the longer SHA-0 hash, and SHA-1 has a theoretical collision attack at 2^51 operations.

So I would suggest that Veeam adopt a safer hash algorithm for deduplication purposes. The downside is that each hash will need more storage (224 to 512 bits, depending on the algorithm chosen).
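
To put rough numbers on that storage cost, here is a minimal Python sketch using the standard hashlib module. The helper names and the one-million-block figure are my own illustration, not anything taken from Veeam's product or documentation.

```python
import hashlib

# Candidate algorithms: MD5 plus the SHA-2 family (224 to 512 bits).
ALGORITHMS = ("md5", "sha224", "sha256", "sha384", "sha512")


def digest_bits(name: str) -> int:
    """Size of one digest in bits for the named hashlib algorithm."""
    return hashlib.new(name).digest_size * 8


def index_overhead_bytes(num_blocks: int, name: str) -> int:
    """Bytes needed to store one digest per block (digests only, no other metadata)."""
    return num_blocks * hashlib.new(name).digest_size


for name in ALGORITHMS:
    print(name, digest_bits(name), "bits,",
          index_overhead_bytes(1_000_000, name), "bytes per million blocks")
```

Going from MD5 to SHA-256 doubles the per-block digest size, and SHA-512 quadruples it, so the extra cost scales linearly with the number of blocks being tracked.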
Gostev
Chief Product Officer
Posts: 31804
Liked: 7298 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions

Post by Gostev »

Hi Sean, I guess you did not read the posting carefully if you are still referring to hash collision attacks (in other words, purposely achieving hash collisions by generating random data chunks in the specific manner). But this was the whole point of my posting anyway? If you ever have hackers obtain write access to your virtual disks to pull something like this, you will have much bigger problems than dedupe hash collisions I suppose ;)
seanleyne
Novice
Posts: 5
Liked: never
Joined: Jan 30, 2011 11:08 pm
Full Name: Sean Leyne
Contact:

Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions

Post by seanleyne »

I think that you may have missed an important aspect: random data chunks (i.e. file data) can generate identical MD5 hashes. The collision can occur without the need for an attack/hack.

Since the hash is the basis for identifying unique file blocks, any possible collision is a very bad thing.
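
To illustrate why a collision matters here, this is a minimal sketch of a hash-keyed block store in Python. It is only my assumption of how hash-based deduplication works in general, not Veeam's actual implementation; the `DedupStore` class and its `verify` option are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one copy of each block, keyed by its MD5 digest."""

    def __init__(self, verify: bool = True):
        self.blocks = {}      # digest -> block bytes
        self.verify = verify  # compare bytes when a digest is already known

    def put(self, block: bytes) -> bytes:
        digest = hashlib.md5(block).digest()
        existing = self.blocks.get(digest)
        if existing is None:
            self.blocks[digest] = block
        elif self.verify and existing != block:
            # Same digest, different content: without this check the new
            # block would silently be replaced by the old one on restore.
            raise ValueError("hash collision: different blocks share a digest")
        return digest

    def get(self, digest: bytes) -> bytes:
        return self.blocks[digest]
```

With `verify=False` the store trusts the digest alone, which is exactly the case where a collision would corrupt data without any warning. The byte comparison removes that risk at the cost of reading the stored block back on every duplicate.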
Gostev
Chief Product Officer
Posts: 31804
Liked: 7298 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions

Post by Gostev »

seanleyne wrote:I think that you may have missed an important aspect: random data chunks (i.e. file data) can generate identical MD5 hashes. The collision can occur without the need for an attack/hack.
You are right, and as a part of my posting I did provide the non-attack probability of two different data chunks having the same MD5 hash ;) It is too low even if you were to back up all the data on planet Earth with a single Veeam B&R job ;) which we all know is impossible, as we recommend a maximum of 8TB of data per job today (with the default storage settings).
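
For anyone who wants to check the non-attack numbers themselves, here is a rough Python sketch of the standard birthday-bound estimate. It assumes MD5 behaves like a uniformly random 128-bit function on non-adversarial data, and the 1 MiB block size is purely an assumption for illustration, not a statement about the product's actual block size.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int = 128) -> float:
    """Birthday-bound estimate of at least one accidental collision among num_blocks."""
    pairs = num_blocks * (num_blocks - 1) // 2  # number of distinct block pairs
    expected = pairs / 2 ** hash_bits           # expected number of colliding pairs
    return -math.expm1(-expected)               # 1 - exp(-expected), accurate for tiny values


def blocks_in(data_bytes: int, block_bytes: int = 1024 ** 2) -> int:
    """Number of blocks needed for data_bytes (ceiling division); block size is assumed."""
    return -(-data_bytes // block_bytes)


# An 8TB job (taken as 8 TiB) split into assumed 1 MiB blocks.
p = collision_probability(blocks_in(8 * 1024 ** 4))
print(f"{p:.3e}")
```

Under these assumptions the result is on the order of 1e-25 for a single 8TB job. The estimate only approaches even odds at around 2^64 blocks, which is the gap between accidental collisions and the deliberate attacks discussed earlier in the thread.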
mongie
Expert
Posts: 152
Liked: 24 times
Joined: May 16, 2011 4:00 am
Full Name: Alex Macaronis
Location: Brisbane, Australia
Contact:

Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions

Post by mongie »

we recommend maximum of 8TB of data per job today (with the default storage settings).
Woah, really... so my 9.5TB fulls aren't really a great idea?

Never heard this before.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions

Post by tsightler »

If you are happy with the performance and you can perform FLRs with no problems, then it's not a huge deal, as the 8TB best practice recommendation leaves some "headroom" before you hit any trouble. It's important to note, though, that there is a special setting for jobs >16TB, and sometimes it's best to use this setting even as you approach 10TB, especially if you keep a long backup chain. When you perform a backup, or especially an FLR, look to see how much memory the VeeamAgent.exe process uses. If it's getting close to the maximum size of a 32-bit Windows process (typically around 2GB), then you're getting very close to hitting the limits.
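
If you'd rather script that check than watch Task Manager, something along these lines might work. It is only a sketch: it shells out to the Windows `tasklist` command and parses its CSV output, the 80% warning threshold and all function names are my own assumptions, and the "Mem Usage" column reports working set rather than virtual address space, so treat the result as a rough indicator only.

```python
import csv
import io
import re
import subprocess

LIMIT_BYTES = 2 * 1024 ** 3  # ~2GB, typical user address space of a 32-bit Windows process


def parse_tasklist_csv(text: str) -> list[tuple[int, int]]:
    """Return (pid, memory_bytes) pairs from `tasklist /FO CSV /NH` output."""
    results = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 5 or not row[1].isdigit():
            continue  # skip blank lines and informational messages
        # The memory column looks like "1,234,567 K"; keep digits only.
        kilobytes = int(re.sub(r"\D", "", row[4]) or 0)
        results.append((int(row[1]), kilobytes * 1024))
    return results


def near_limit(memory_bytes: int, threshold: float = 0.8) -> bool:
    """True when usage is at or above the (assumed) warning fraction of the limit."""
    return memory_bytes >= threshold * LIMIT_BYTES


def query_agent_memory(image_name: str = "VeeamAgent.exe") -> list[tuple[int, int]]:
    """Run tasklist (Windows only) and return memory usage per matching process."""
    out = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_tasklist_csv(out)
```

Calling `query_agent_memory()` during a backup or FLR and passing each memory value to `near_limit` gives a quick yes/no answer per agent process.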
dellock6
VeeaMVP
Posts: 6165
Liked: 1971 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions

Post by dellock6 »

Any chance that in the near future the Veeam agent will become a 64-bit process? I remember some past statements that being 32-bit was not a problem, but it seems those statements need to be revised...

Luca.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
foggy
Veeam Software
Posts: 21138
Liked: 2141 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions

Post by foggy »

See here: Is Veeam Backup & Replication still 32 bit? However, I'm not sure how near this future actually is.
Gostev
Chief Product Officer
Posts: 31804
Liked: 7298 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions

Post by Gostev » 2 people like this post

v7 will have 64-bit agents with 95% probability (unless something goes terribly wrong in the final testing).
dellock6
VeeaMVP
Posts: 6165
Liked: 1971 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions

Post by dellock6 »

Oh, that's nice! Even if it's not among the 7 new features, it's really a cool one :)

Thanks Anton!
Gostev
Chief Product Officer
Posts: 31804
Liked: 7298 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions

Post by Gostev »

In fact, there are many nice technical enhancements in v7 besides those 7+2 major ones. However, those are something only hardcore geeks like me and you can appreciate ;) so no marketing for those... you will have to wait for the beta What's New for the full list! Or make me spill the beans on the forum, just like you did above!

By the way, you sound like you already know about all 7 features, how? :mrgreen:
dellock6
VeeaMVP
Posts: 6165
Liked: 1971 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions

Post by dellock6 » 1 person likes this post

Ahahaha, I did not have any preview or insider info. I just phrased it badly; it should have been "even if this feature is not going to be listed among the 7 major ones, FOR ME it is a major one". Like you said, it's a deeply geeky one, and that's why I like it. It's going to save many headaches now that VMs (and thus backup files) are becoming bigger and bigger.

Luca.