Getting Fuzzy With It
High Technology Crime Investigation Association, Mid-Atlantic Chapter

Slides (pdf)    

The blessing and the curse of "similarity" is that there's more than one way to do it. This talk will explain the fuzzy hashing tool ssdeep and some of the other similarity detection techniques available today. Some of these programs work, like ssdeep, at the level of ones and zeros: they identify files that share long sequences of identical bytes. The sdhash program by Dr. Vassil Roussev, for example, works on the same principle but produces different results. Which is better? (It depends!) A slew of other programs, available now or in development, handle many other kinds of similarity matching. These programs can find identical phrases of text, similar phrases (e.g. "planting shrubbery" is similar to "planting greenery"), similar functionality in programs, and so on.
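As a rough illustration of what "long sequences of identical bytes" means, here is a toy Python sketch that finds the longest run of bytes two inputs share. It is not how ssdeep or sdhash work internally. It only uses the standard library's difflib module, and the function name longest_shared_run and the sample sentences are made up for this example.

```python
from difflib import SequenceMatcher


def longest_shared_run(a: bytes, b: bytes) -> bytes:
    """Return the longest run of bytes that appears in both inputs."""
    # autojunk=False keeps difflib from ignoring frequently occurring bytes.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


# Hypothetical sample data: two sentences that differ in a couple of words.
original = b"We spent the weekend planting shrubbery along the fence."
edited = b"They spent the weekend planting greenery along the fence."

print(longest_shared_run(original, edited))
```

A byte-level comparison like this sees that most of the two sentences match exactly, but it has no idea that "shrubbery" and "greenery" mean nearly the same thing. Catching that takes one of the other kinds of similarity matching.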

