Unprecedented Review Accuracy and Efficiency in Federal Criminal Investigation

A global transportation company was under investigation for possible infractions of the Foreign Corrupt Practices Act (FCPA) in India. The company’s legal counsel needed to quickly produce responsive documents and find key documents to prepare their defense.


85% or Higher Precision and Recall on Both Rounds of TAR

810 Hot Docs Identified in Three Weeks

Key Results

  • 4M total documents reduced to fewer than 250K through 2 rounds of responsive review, with precision and recall of 85% or higher.
  • 810 key documents quickly delivered to outside counsel, saving them hours of review and freeing up time for case strategy.

A Complex Dataset Requiring Nuanced Approaches

The company collected 2M documents from executives in India and the U.S. Information in the documents was extremely sensitive, making it critical to produce only those documents related to the India market. This would be impossible for most TAR tools, which use machine learning and therefore can’t reliably distinguish conversations about the company’s business in India from discussions pertaining solely to U.S. business.

Finding key documents to prepare a defense was challenging as well. The company wanted to learn whether vendors and other third parties had bribed officials in violation of the FCPA, but references to any such violations were sure to be obscure rather than overt.

Zeroing In On the Right Conversations

Lighthouse used a hybrid approach, supplementing machine learning models with powerful linguistic modeling. First, our linguistic experts created a model to remove documents that merely referred to India but didn’t pertain to business in that market, so that the machine learning TAR wouldn’t pull them into the responsive set. Then our responsive review team developed geographic filters based on documents confirmed as India-specific and used those filters to train the machine learning model. 

The TAR model created an initial responsive set, which our linguists refined even further with an additional model, based on nuances of English used in communications across different regions of India. By the end, our hybrid approach had reduced the corpus by 97%, with an 87% precision rate and 85% recall. 
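For readers less familiar with the two metrics, the short sketch below shows how precision and recall are computed from review counts. The case study does not publish the underlying counts, so the numbers here are hypothetical, chosen only so that the output matches the reported 87% and 85%.

```python
def precision(true_pos, false_pos):
    """Share of documents tagged responsive that really are responsive."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Share of all truly responsive documents that the model found."""
    return true_pos / (true_pos + false_neg)


# Hypothetical validation-sample counts (not from the case study).
TRUE_POS = 1479   # responsive documents the model tagged as responsive
FALSE_POS = 221   # non-responsive documents the model tagged as responsive
FALSE_NEG = 261   # responsive documents the model missed

p = precision(TRUE_POS, FALSE_POS)
r = recall(TRUE_POS, FALSE_NEG)
print(f"precision={p:.0%} recall={r:.0%}")
```

High precision means reviewers spend little time on irrelevant documents; high recall means few responsive documents are left behind in the discarded set.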

Once this first phase of review was successfully completed, Lighthouse dove into an additional 2M documents collected from custodians located in India.

Finding Key Documents Among Obfuscated Communications

To help inform a defense, our search specialists focused on language that bad actors outside the company might have used to obfuscate bribery. The team used advanced search techniques to examine how often, and in what context, certain verb-noun pairs indicating an “exchange” were used (for instance, commonly used innocent pairings like “give a hand” vs. rarer pairs like “give reward”). The team could then focus on the documents containing language indicating an attempt to conceal or imply.
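The idea of comparing common and rare verb-noun pairs can be illustrated with a simplified sketch. This is not Lighthouse's method: the verb list, sample sentences, and rarity threshold are all hypothetical, and a plain regular expression stands in for real linguistic analysis, so the word captured after the verb is not guaranteed to be a noun.

```python
import re
from collections import Counter

# Hypothetical list of "exchange" verbs (an assumption for illustration).
EXCHANGE_VERBS = ("give", "gave", "send", "sent", "offer", "offered")

# Match an exchange verb, an optional determiner, then the following word.
PAIR_PATTERN = re.compile(
    r"\b(" + "|".join(EXCHANGE_VERBS) + r")\s+(?:(?:a|an|the|some)\s+)?([a-z]+)\b"
)


def count_pairs(documents):
    """Count how often each (verb, following word) pair occurs."""
    counts = Counter()
    for text in documents:
        for verb, noun in PAIR_PATTERN.findall(text.lower()):
            counts[(verb, noun)] += 1
    return counts


def rare_pairs(counts, max_count=1):
    """Return pairs seen at most max_count times, sorted for stable output."""
    return sorted(pair for pair, n in counts.items() if n <= max_count)


# Hypothetical sample sentences (not from the reviewed corpus).
SAMPLE_DOCS = [
    "Can you give a hand with the shipment paperwork?",
    "Please give a hand to the new hire.",
    "We should give reward once the permit clears.",
]

pair_counts = count_pairs(SAMPLE_DOCS)
flagged = rare_pairs(pair_counts)
print(flagged)
```

In this toy sample, the frequent pairing is left alone and only the unusual one is flagged for a closer look. A production approach would likely rely on part-of-speech tagging and the surrounding context rather than a regular expression.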

$1.7M Saved, 810 Key Documents Found to Support Defense

Lighthouse performed responsive review on two datasets of 2M documents each, reducing them to fewer than 250K documents and saving the client more than $1.7M.

Out of the 237K responsive documents, Lighthouse uncovered 810 hot docs spanning 7 themes of interest. The work was completed in just 3 weeks and enabled outside counsel to provide the best defense for the company.
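The headline figures can be checked with simple arithmetic. The sketch below uses only numbers stated in this case study (two datasets of 2M documents, 237K responsive documents, 810 hot docs); the function and variable names are illustrative.

```python
def reduction_rate(before, after):
    """Fraction of the original corpus removed by review."""
    return 1 - after / before


TOTAL_DOCS = 2 * 2_000_000   # two datasets of 2M documents each
RESPONSIVE_DOCS = 237_000    # responsive set after both rounds
HOT_DOCS = 810               # key documents delivered to outside counsel

overall_reduction = reduction_rate(TOTAL_DOCS, RESPONSIVE_DOCS)
hot_doc_share = HOT_DOCS / RESPONSIVE_DOCS

print(f"overall reduction: {overall_reduction:.1%}")
print(f"hot docs as share of responsive set: {hot_doc_share:.2%}")
```

Across both rounds combined, roughly 94% of the collected documents were set aside, and the hot docs amount to well under 1% of the responsive set.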