Audits, Triage, and the Future of Hashing
DoD Cyber Crime Conference, 2012
Slides (pdf)
fasthash.py - Script for hash matching using file sizes
Everybody knows cryptographic hashes can be used to find identical files. But advanced techniques can help you detect hash collisions, find files even faster, and get a better sense of what's really in your data. We'll talk about hash collisions, why you should stop using MD5 (like, really for serious this time), using partial hashes, and hash set auditing. Each technique will be explained and demonstrated.
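As a rough illustration of two of the ideas above (matching on file size before hashing, and using partial hashes), here is a minimal Python sketch. It is not the fasthash.py script listed above. It assumes one plausible pipeline: skip any file whose size matches no known file, then hash only the first few kilobytes, and compute a full SHA-256 only when both cheaper checks pass. The function names (`sha256_of`, `build_index`, `find_matches`) and the 4096-byte partial length are hypothetical choices made for this example.

```python
import hashlib
import os


def sha256_of(path, limit=None, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, or of its first `limit` bytes."""
    digest = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as handle:
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            block = handle.read(want)
            if not block:
                break  # end of file
            digest.update(block)
            if remaining is not None:
                remaining -= len(block)
    return digest.hexdigest()


def build_index(paths, partial_bytes=4096):
    """Index known files as {size: {partial_hash: {full_hash, ...}}}."""
    index = {}
    for path in paths:
        size = os.path.getsize(path)
        partial = sha256_of(path, limit=partial_bytes)
        full = sha256_of(path)
        index.setdefault(size, {}).setdefault(partial, set()).add(full)
    return index


def find_matches(root, index, partial_bytes=4096):
    """Return paths under `root` whose contents match a file in `index`."""
    matches = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue  # skip broken links and special files
            # Cheapest test first: no known file has this size, so no read needed.
            by_partial = index.get(os.path.getsize(path))
            if by_partial is None:
                continue
            # Next cheapest: hash only the first `partial_bytes` bytes.
            full_hashes = by_partial.get(sha256_of(path, limit=partial_bytes))
            if full_hashes is None:
                continue
            # Only now pay for a full read to confirm the match.
            if sha256_of(path) in full_hashes:
                matches.append(path)
    return matches
```

A call such as `find_matches("/evidence", build_index(["known1.bin", "known2.bin"]))` (paths are placeholders) would return the files under `/evidence` that are byte-for-byte matches by SHA-256. The ordering matters: the size lookup costs only a metadata read, so most non-matching files are rejected without being opened at all.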