Recently, a post to the full-disclosure mailing list described an update to the well known MD5 collision problem. The authors - Marc Stevens, Arjen K. Lenstra, and Benne de Weger - provided a method whereby they can append only a few thousand bytes to two arbitrary files, with the result that both files have the same MD5 value. This is known as a "chosen prefix collision." Not only that, but they produced their proof-of-concept files using one machine in less than two days. If you distribute the work, you can make it go faster.
While what they have achieved is not the same as producing an identical MD5 for an existing file, it's still not a good thing. In particular it causes serious trouble for application white-listing implementations. Why? Imagine this scenario:
- malware author creates a harmless application.
- malware author creates a malicious application.
- malware author uses the chosen prefix collision method to alter these two applications to produce an identical MD5 value.
- malware author uploads the harmless application to a Web site that hosts a drive-by download exploit. The drive-by download installs the harmless application everywhere.
- harmless application is submitted to a classification server, which white-lists the file because it is harmless.
- malware author uploads the malicious application to a Web site that hosts a drive-by download exploit. The drive-by download installs the malicious application everywhere.
- since the MD5 values are the same, the malicious application now looks identical to the harmless application and is not blocked.
And we wouldn't even know that it's happening. Oh.
Suddenly (or not - it has been expected for some time), MD5 isn't looking so good anymore. But wait, there's more: the harmless application can even be digitally signed after alteration, making it look even more harmless. And so can the malicious application. That's because plenty of digital signatures are verified by using MD5.
How about SHA-1? No, the same kind of problem exists there.
Multiple checksums? Useless if the second algorithm is weaker than the first one (MD5 and CRC32, for example). If the second algorithm is strong, then it is sufficient on its own and the MD5 calculation becomes a waste of time.
Can we be suspicious about the extra bytes? Not reliably - it might look like compressed data in an installer application, or some kind of signature.
SHA-2 is looking good.
More...
Bookmarks