-
- Novice
- Posts: 5
- Liked: never
- Joined: Jan 30, 2011 11:08 pm
- Full Name: Sean Leyne
THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collisions
The recent posting contained the following:
"For example, in Veeam Backup & Replication, we use MD5 128-bit hashes for deduplication. So, the odds of a collision happening are 1 in 2^128 (or 10^38). If we assume that there's less than a yottabyte (1 billion petabytes) of data on planet Earth, then the odds of a hash collision with two random chunks of data are roughly 1,461,501,637,330,900,000,000,000,000 times greater than the number of bytes in the known computing universe. Obviously, this chance is way too slim to ever be a concern, as realistically this can only happen once in millions of years."
Unfortunately, MD5 collisions can occur with fewer than 2^21 operations (I hope I read the Wikipedia article correctly: http://en.wikipedia.org/wiki/Comparison ... te_note-10). In fact, collisions have also been found in the longer SHA-0 hash, and SHA-1 has theoretical collisions at 2^51.
So, I would suggest that Veeam adopt a safer hash algorithm for deduplication purposes; the downside is that the hash will need more storage (224 to 512 bits, depending on the algorithm chosen).
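To put numbers on the storage trade-off, here is a minimal Python sketch that reads each algorithm's digest size from the standard `hashlib` module and multiplies it by a block count. The 8 TiB job split into 1 MiB blocks is a hypothetical assumption for illustration, not a documented Veeam block size, and the calculation ignores any other per-block index metadata.

```python
import hashlib

# MD5 (used per the quoted posting) plus the SHA-2 family suggested above.
ALGORITHMS = ["md5", "sha224", "sha256", "sha384", "sha512"]


def digest_bytes(name: str) -> int:
    """Size in bytes of one digest from the named hashlib algorithm."""
    return hashlib.new(name).digest_size


def index_overhead(name: str, num_blocks: int) -> int:
    """Bytes needed to keep one digest per block (no other metadata counted)."""
    return digest_bytes(name) * num_blocks


if __name__ == "__main__":
    # Hypothetical job: 8 TiB of unique data in 1 MiB blocks = 2**23 blocks.
    blocks = (8 * 2**40) // 2**20
    for name in ALGORITHMS:
        bits = digest_bytes(name) * 8
        mib = index_overhead(name, blocks) / 2**20
        print(f"{name:7s} {bits:3d} bits {mib:5.0f} MiB of digests")
```

Under those assumptions (8,388,608 blocks), this prints 128 MiB of digests for MD5 and 512 MiB for SHA-512, so the stronger hashes cost between 1.75x and 4x the index space of MD5.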
-
- Chief Product Officer
- Posts: 31804
- Liked: 7298 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collision
Hi Sean, I guess you did not read the posting carefully if you are still referring to hash collision attacks (in other words, purposely achieving hash collisions by generating data chunks in a specific manner). But wasn't that the whole point of my posting anyway? If hackers ever obtain write access to your virtual disks to pull something like this off, you will have much bigger problems than dedupe hash collisions, I suppose.
-
- Novice
- Posts: 5
- Liked: never
- Joined: Jan 30, 2011 11:08 pm
- Full Name: Sean Leyne
Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collision
I think that you may have missed an important aspect: random data chunks (i.e. file data) can generate identical MD5 hashes. The collision can occur without the need for an attack/hack.
Since the hash is the basis for identifying unique file blocks, any possible collision is a very bad thing.
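Sean's concern is easiest to see in a toy content-addressed store, where the digest alone decides whether a block is "new". The Python sketch below is hypothetical (the `DedupStore` class and its `verify` flag are invented for illustration and are not how Veeam implements deduplication). It shows one commonly used mitigation: confirming a hash hit with a byte-for-byte comparison, so a collision is detected instead of silently reusing the wrong block.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store keyed by a hash digest.

    Hypothetical illustration only. With verify=True, every hash hit is
    confirmed byte-for-byte, so two different blocks that happen to share
    a digest can never be silently merged into one entry.
    """

    def __init__(self, algorithm: str = "md5", verify: bool = False):
        self.algorithm = algorithm
        self.verify = verify
        self.blocks: dict[bytes, bytes] = {}  # digest -> stored block

    def put(self, block: bytes) -> bytes:
        """Store the block if its digest is new; return the digest as its reference."""
        key = hashlib.new(self.algorithm, block).digest()
        existing = self.blocks.get(key)
        if existing is None:
            self.blocks[key] = block
        elif self.verify and existing != block:
            # Same digest, different content: a genuine hash collision.
            raise ValueError("hash collision detected")
        # Without verify, a colliding block would be deduplicated to the wrong data.
        return key

    def get(self, key: bytes) -> bytes:
        return self.blocks[key]
```

For example, `store = DedupStore(verify=True)` followed by two `store.put(b"a" * 4096)` calls returns the same digest both times and keeps a single stored copy. The cost of `verify=True` is reading back the stored block on every hit, which is one reason some systems rely on the hash alone when the collision probability is judged negligible.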
-
- Chief Product Officer
- Posts: 31804
- Liked: 7298 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collision
seanleyne wrote: I think that you may have missed an important aspect: random data chunks (i.e. file data) can generate identical MD5 hashes. The collision can occur without the need for an attack/hack.
You are right, and as part of my posting I did provide the non-attack probability of two different data chunks having the same MD5 hash. It stays far too low to matter even if you were to back up all the data on planet Earth with a single Veeam B&R job, which we all know is impossible anyway, as we recommend a maximum of 8TB of data per job today (with the default storage settings).
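The non-attack probability can be estimated with the birthday bound: with n distinct blocks and an ideal b-bit hash there are n(n-1)/2 pairs, each colliding with probability 2^-b, so the chance of any collision is roughly 1 - exp(-n(n-1)/2^(b+1)). For n = 2 this reduces to the 1 in 2^128 figure from the original posting. The Python sketch below applies it to an 8 TiB job, assuming a hypothetical 1 MiB block size (the actual block size depends on the storage settings and is not stated in this thread) and treating MD5 as a uniformly random function, which is the usual assumption for accidental, non-adversarial collisions.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int = 128) -> float:
    """Birthday-bound estimate that at least two of num_blocks distinct blocks
    share a digest, assuming the hash behaves like a uniformly random function.

    n*(n-1)/2 pairs, each colliding with probability 2**-b, gives
    p ~= 1 - exp(-n*(n-1) / 2**(b+1)).
    """
    expected_collisions = num_blocks * (num_blocks - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-expected_collisions)


if __name__ == "__main__":
    # Hypothetical: 8 TiB job in 1 MiB blocks -> 2**23 unique blocks.
    blocks = (8 * 2**40) // 2**20
    print(f"{blocks} blocks: p ~= {collision_probability(blocks):.2e}")
```

For 2^23 blocks this gives about 1.03e-25 (roughly 2^-83), many orders of magnitude below typical hardware error rates.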
-
- Expert
- Posts: 152
- Liked: 24 times
- Joined: May 16, 2011 4:00 am
- Full Name: Alex Macaronis
- Location: Brisbane, Australia
Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collision
Gostev wrote: we recommend a maximum of 8TB of data per job today (with the default storage settings).
Whoa, really... so my 9.5TB fulls aren't really a great idea?
I've never heard this before.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collision
If you are happy with the performance and you can perform file-level restores (FLRs) with no problems, then it's not a huge deal: the 8TB best-practice recommendation leaves some "headroom" before you'll hit any trouble. It's important to note, though, that there is a special setting for jobs larger than 16TB, and sometimes it's best to use this setting even as you approach 10TB, especially if you keep a long backup chain. When you perform a backup, or especially an FLR, check how much memory VeeamAgent.exe uses. If it's getting close to the maximum size of a 32-bit Windows process (typically around 2GB), then you're getting very close to hitting the limits.
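The memory check described above can be scripted. The Python sketch below is a hypothetical helper: it uses the third-party `psutil` package (if installed) to read the resident memory of any process named `VeeamAgent.exe` and compares it against a 2 GiB ceiling, the typical user address space of a 32-bit Windows process. The 80% warning threshold is an arbitrary assumption, and resident memory is only a rough proxy, since the 32-bit limit actually applies to the process's virtual address space.

```python
# Typical user-mode address space of a 32-bit Windows process (2 GiB).
LIMIT_32BIT = 2 * 1024**3


def headroom_status(mem_bytes: int, limit: int = LIMIT_32BIT,
                    warn_fraction: float = 0.8) -> str:
    """Classify memory use against the 32-bit ceiling (threshold is arbitrary)."""
    used = mem_bytes / limit
    if used >= 1.0:
        return "at limit"
    if used >= warn_fraction:
        return "close to limit"
    return "ok"


def agent_memory(process_name: str = "VeeamAgent.exe") -> list:
    """Resident memory (bytes) of each running process with the given name.

    Uses the third-party psutil package; returns [] if it is not installed.
    Resident memory is only a rough proxy for address-space usage.
    """
    try:
        import psutil
    except ImportError:
        return []
    sizes = []
    for proc in psutil.process_iter(["name", "memory_info"]):
        info = proc.info
        # memory_info is None when access to the process is denied.
        if info["name"] == process_name and info["memory_info"] is not None:
            sizes.append(info["memory_info"].rss)
    return sizes


if __name__ == "__main__":
    for rss in agent_memory():
        print(f"{rss / 2**20:.0f} MiB -> {headroom_status(rss)}")
```

Run it on the proxy while a backup or FLR is in progress; a "close to limit" result corresponds to the situation where the special large-job setting is worth considering.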
-
- VeeaMVP
- Posts: 6165
- Liked: 1971 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collision
Any chance that the Veeam agent will become a 64-bit process in the near future? I remember some statements in the past that being 32-bit was not a problem, but it seems those statements need to be revisited...
Luca.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
-
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collision
See here: Is Veeam Backup & Replication still 32 bit? However, I'm not sure how near this future actually is.
-
- Chief Product Officer
- Posts: 31804
- Liked: 7298 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collision
v7 will have 64-bit agents with 95% probability (unless something goes terribly wrong in the final testing).
-
- VeeaMVP
- Posts: 6165
- Liked: 1971 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collision
Oh, that's nice! Even if it's not among the 7 new features, it's a really cool one.
Thanks Anton!
-
- Chief Product Officer
- Posts: 31804
- Liked: 7298 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collision
In fact, there are many nice technical enhancements in v7 besides those 7+2 major ones. However, only hardcore geeks like you and me can appreciate them, so there is no marketing for those... you will have to wait for the beta What's New for the full list! Or make me spill the beans on the forum, just like you did above!
By the way, you sound like you already know about all 7 features. How?
-
- VeeaMVP
- Posts: 6165
- Liked: 1971 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
Re: THE WORD FROM GOSTEV (March 18-24) - MD5 Hash collision
Ahahaha, I did not have any preview or insider info; I just phrased it badly. The better way to say it would have been: "even if this feature is not going to be listed among the 7 major ones, FOR ME it is a major one". Like you said, it's a deeply geeky one, and that's why I like it. It's going to save many headaches, as VMs (and thus backup files) are becoming bigger and bigger.
Luca.