Statistical Validation and Data Analytics in E-Discovery
DoD Cyber Crime Conference, 2011

Slides (pdf)    

Computers are fantastic at finding identical pieces of data, but terrible at finding similar data. This is a problem in eDiscovery, where we would like to find documents that are similar to each other or responsive to a query. Several statistical analysis methods enable a computer to do these tasks and more. This talk will demonstrate these methods, clarify what 'similar' means in a given context, and introduce the technology behind tomorrow's eDiscovery tools.
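The gap between "identical" and "similar" can be made concrete with a small sketch. The abstract does not name the methods covered in the talk, so the choices here are assumptions made purely for illustration: a SHA-256 hash stands in for exact matching, and Jaccard similarity over word sets stands in for one possible definition of 'similar'. The function names and sample documents are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Exact-match fingerprint: any change to the text changes the hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' word sets, from 0.0 to 1.0."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty documents are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


# Hypothetical documents that differ in a single word.
doc_a = "please review the attached contract before friday"
doc_b = "please review the attached contract before monday"

# Hash comparison only answers "identical or not".
identical = fingerprint(doc_a) == fingerprint(doc_b)

# A similarity score grades how close the documents are.
similarity = jaccard(doc_a, doc_b)

print(f"identical: {identical}, similarity: {similarity:.2f}")
```

The hash comparison reports only that the two documents differ, while the similarity score shows they share most of their words. Which measure and which threshold count as 'similar' depends on the task, and other definitions would give different scores.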

