Identifying Almost Identical Files Using Context Triggered Piecewise Hashing
J. Kornblum
Proceedings of the Digital Forensic Research Workshop (DFRWS)
2006


Homologous files share identical sets of bits in the same order. Because such files are not completely identical, traditional techniques such as cryptographic hashing cannot be used to identify them. This paper introduces a new technique for constructing hash signatures by combining a number of traditional hashes whose boundaries are determined by the context of the input. These signatures can be used to identify modified versions of known files even if data has been inserted, modified, or deleted in the new files. The description of this method is followed by a brief analysis of its performance and some sample applications to computer forensics.
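
The idea in the abstract, piecewise hashes whose boundaries are set by the content itself, can be illustrated with a toy sketch. This is not the paper's implementation. The window size, block size, trigger rule, the use of FNV-1a as the piece hash, and the names `trigger_points` and `signature` are all assumptions chosen for illustration. A small rolling hash over the last few bytes decides where each piece ends, and each piece contributes one character to the signature.

```python
from collections import deque

# All constants below are illustrative assumptions, not values from the paper.
WINDOW = 7        # bytes of context seen by the rolling hash
BLOCK_SIZE = 64   # a trigger fires roughly once per BLOCK_SIZE bytes
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
FNV_OFFSET = 0x811C9DC5   # 32-bit FNV offset basis
FNV_PRIME = 0x01000193    # 32-bit FNV prime


def trigger_points(data: bytes) -> list:
    """Return the indices at which the rolling hash ends a piece.

    The rolling value depends only on the last WINDOW bytes, so the same
    local content triggers at the same place wherever it sits in a file.
    """
    window = deque([0] * WINDOW, maxlen=WINDOW)
    total = 0      # plain sum of the bytes in the window
    weighted = 0   # position-weighted sum, newest byte weighted most
    points = []
    for i, byte in enumerate(data):
        weighted += WINDOW * byte - total   # age every byte by one step
        total += byte - window[0]           # slide the window forward
        window.append(byte)                 # maxlen drops the oldest byte
        if (total + weighted) % BLOCK_SIZE == BLOCK_SIZE - 1:
            points.append(i)
    return points


def signature(data: bytes) -> str:
    """Build a signature with one base64 character per piece."""
    cuts = set(trigger_points(data))
    chars = []
    h = FNV_OFFSET
    for i, byte in enumerate(data):
        h = ((h ^ byte) * FNV_PRIME) & 0xFFFFFFFF   # FNV-1a step
        if i in cuts:
            chars.append(B64[h & 63])   # keep the low 6 bits of the piece hash
            h = FNV_OFFSET              # start a fresh hash for the next piece
    chars.append(B64[h & 63])           # trailing piece after the last trigger
    return "".join(chars)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    original = bytes(rng.randrange(256) for _ in range(2000))
    modified = b"some inserted bytes " + original
    print(signature(original))
    print(signature(modified))
```

Because each piece hash is reset at a trigger and the triggers depend only on nearby bytes, an insertion near the start of a file changes the first characters of the signature while the characters for later pieces stay the same. A cryptographic hash of the whole file would change completely. Comparing two signatures, for example with an edit-distance measure, is a separate step that this sketch leaves out.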

