What Large Language Models, Predictive AI, and Generative AI Mean for eDiscovery

April 3, 2024



Karl Sobylak
Lon Troyer

Large language models (LLMs) have changed how people think and talk about AI. As legal teams become increasingly open to using AI in eDiscovery, it helps to get a little more familiar with what LLMs are and what they can do.

LLMs enable two types of AI

LLMs entered the lexicon thanks to chat platforms like ChatGPT, which use LLMs to mimic human language and conversation. LLMs are also behind powerful tools for assessing responsiveness and supporting other human judgments in eDiscovery. Compared to the machine learning models that are most commonly used in technology-assisted review (TAR), LLMs can analyze data in a more sophisticated way, including the nuances of language.

These examples demonstrate the two types of AI enabled by LLMs: generative and predictive. Both types do exactly what their names suggest.

  • Predictive AI predicts things, such as the likelihood that a document should be classified (as responsive, privileged, etc.) based on prior coding.
  • Generative AI generates things, creating new content, such as answers to questions and summaries of source material (ChatGPT is an example).

Predictive AI and generative AI both have applications in eDiscovery. Neither is “better” than the other. In fact, they are most powerful when stacked together. And there is still a lot of untapped potential within each type—including how they can complement each other.

Predictive AI: established and measurable

Predictive AI can play a valuable role in classifying documents at the scale eDiscovery demands today, both for responsiveness and for other categories such as privilege, PII, PHI, and trade secrets. We have been working with predictive LLMs since 2019 and have used them to analyze 2 billion documents.

Predictive AI classifiers can significantly reduce eyes-on review. Attorneys start by reviewing a sample of the full document set. Predictive AI then learns from the attorneys’ decisions and predicts how a human would classify the rest of the documents. Meticulous QC and retraining improve the model as the review progresses.
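
To make the train-on-a-sample, predict-the-rest loop concrete, here is a minimal Python sketch. It does not reproduce how an LLM-based classifier works internally: a simple naive Bayes bag-of-words model stands in for the predictive model, and the documents, labels, and class name are hypothetical. What it illustrates is the workflow itself: an attorney-coded sample goes in, probability scores for unreviewed documents come out, and QC labels are folded back in for retraining.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased whitespace tokens; a real model would use far richer features.
    return text.lower().split()


class NaiveBayesClassifier:
    """Toy bag-of-words classifier standing in for a predictive model."""

    def fit(self, docs: list[str], labels: list[bool]) -> "NaiveBayesClassifier":
        self.word_counts = {True: Counter(), False: Counter()}
        self.label_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, doc: str, label: bool) -> float:
        # Log prior plus log likelihood of each token, with add-one smoothing.
        total_docs = sum(self.label_counts.values())
        score = math.log((self.label_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for word in tokenize(doc):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict_proba(self, doc: str) -> float:
        """Estimated probability that `doc` is responsive."""
        pos = self._log_score(doc, True)
        neg = self._log_score(doc, False)
        return 1.0 / (1.0 + math.exp(neg - pos))


# 1. Attorneys code a review sample (hypothetical documents and labels).
sample_docs = [
    "pricing agreement with competitor",
    "competitor pricing discussion meeting",
    "office holiday party schedule",
    "cafeteria menu for next week",
]
sample_labels = [True, True, False, False]

# 2. Train on the coded sample and score the unreviewed documents.
model = NaiveBayesClassifier().fit(sample_docs, sample_labels)
unreviewed = ["notes on competitor pricing", "holiday party photos"]
scores = {doc: model.predict_proba(doc) for doc in unreviewed}

# 3. QC and retraining: attorneys check a batch of predictions, and their
#    confirmed or corrected labels join the training set before refitting.
qc_labels = {"holiday party photos": False}
sample_docs += list(qc_labels)
sample_labels += list(qc_labels.values())
model = NaiveBayesClassifier().fit(sample_docs, sample_labels)
```

In practice the QC step repeats over several rounds, and each refit sees more attorney decisions than the last.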

The precision of AI predictions can be astonishing. One of the first times we ever used predictive AI for responsiveness, it proved to be 10 times more precise than the traditional machine learning models that are still standard in TAR today.
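
Precision answers a specific question: of the documents the model flags as responsive, what fraction really are? The Python sketch below computes precision alongside recall (the fraction of truly responsive documents the model finds) from a QC sample in which attorneys have coded documents the model also scored. The function name and labels are hypothetical and only illustrate the arithmetic.

```python
def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Compare model predictions with attorney coding on the same documents."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)       # flagged and responsive
    false_pos = sum(1 for p, a in pairs if p and not a)  # flagged but not responsive
    false_neg = sum(1 for p, a in pairs if a and not p)  # responsive but missed
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical QC sample: model predictions vs. attorney decisions.
predicted = [True, True, True, True, False, False, False, False]
actual    = [True, True, True, False, True, True, False, False]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.75, recall=0.60
```

Higher precision means reviewers spend less time on documents the model wrongly flagged: at a precision of 0.75, one in four flagged documents is a false positive.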

But because LLMs deliver more nuanced analysis, they also require more computing power, and, as with many newer technologies, that power comes at a higher price. There is therefore a risk-reward calculation to using AI in this way: the larger the dataset, the more you gain from AI’s help.
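
The idea that larger datasets gain more can be checked with simple arithmetic. The Python sketch below assumes a cost model that is our own simplification, not a pricing formula: a fixed cost for training-sample review and setup, a per-document compute cost, and savings from documents removed from eyes-on review. The function names and all figures are hypothetical.

```python
import math


def net_savings(num_docs: int, fixed_cost: float, ai_cost_per_doc: float,
                review_cost_per_doc: float, review_reduction: float) -> float:
    """Review spend avoided by the model, minus what the model costs to use."""
    avoided_review = num_docs * review_reduction * review_cost_per_doc
    ai_spend = fixed_cost + num_docs * ai_cost_per_doc
    return avoided_review - ai_spend


def break_even_docs(fixed_cost: float, ai_cost_per_doc: float,
                    review_cost_per_doc: float, review_reduction: float) -> float:
    """Dataset size at which per-document savings first cover the fixed cost."""
    margin_per_doc = review_reduction * review_cost_per_doc - ai_cost_per_doc
    if margin_per_doc <= 0:
        return math.inf  # the model never pays for itself under these inputs
    return fixed_cost / margin_per_doc


# Hypothetical inputs: $30,000 fixed cost (sample review, setup), $0.25 per
# document for compute, $2.00 per document for eyes-on review, and 50% of
# documents removed from eyes-on review.
params = dict(fixed_cost=30_000, ai_cost_per_doc=0.25,
              review_cost_per_doc=2.00, review_reduction=0.5)
print(break_even_docs(**params))      # 40000.0
print(net_savings(20_000, **params))  # -15000.0
print(net_savings(100_000, **params)) # 45000.0
```

Under these assumptions, the fixed cost is spread over more documents as the dataset grows, so the tradeoff turns positive only past the break-even size. If the per-document compute cost exceeds the per-document review savings, no dataset is large enough.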

Generative AI: compelling and still emerging

Generative AI (or “gen AI,” for short) is dominating attention right now because it’s the new kid on the block. It’s also extremely compelling—you can experience it firsthand.

But legal teams have to be smart about how they weave gen AI into their eDiscovery work so that it actually adds value. For example, clients sometimes ask us whether gen AI can form the basis of a new kind of TAR workflow for responsiveness. Technically, the answer is yes, but gen AI hasn’t been optimized for this use case, so the results are not necessarily worth the cost.

To see where (and whether) gen AI can add value, it’s important to identify and test pragmatic use cases. Our current gen AI focus is an AI-powered privilege log builder. When it was tested on real matter data, outside counsel found the AI-generated log lines to be 12% more accurate than log lines written by third-party contract reviewers.

Integrating predictive and generative AI in eDiscovery

Present and future use cases will include opportunities for predictive AI and gen AI to feed outputs into one another.

We are already seeing this in the area of privilege detection and logging. We can use a predictive AI model to predict privilege classifications for a set of documents (based on a sample reviewed by attorneys, as above). Documents classified as privileged can then be fed into a gen AI model that drafts a privilege log description, based on intricate analysis of the documents and understanding of the privilege parameters.
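
A minimal Python sketch of that hand-off follows. Both models are stubbed with placeholder functions, and the names (`build_privilege_log`, `LogEntry`, the 0.5 threshold) are hypothetical. The point is the structure: only documents the predictive step scores as privileged reach the generative drafting step, and the drafts still go to attorneys for review.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LogEntry:
    doc_id: str
    privilege_score: float
    description: str  # draft text for attorney review, not a final log line


def build_privilege_log(
    documents: dict[str, str],
    predict_privilege: Callable[[str], float],
    draft_description: Callable[[str], str],
    threshold: float = 0.5,
) -> list[LogEntry]:
    """Send only documents predicted privileged to the generative drafting step."""
    entries = []
    for doc_id, text in documents.items():
        score = predict_privilege(text)  # predictive step
        if score >= threshold:
            # Generative step: draft a log description for this document.
            entries.append(LogEntry(doc_id, score, draft_description(text)))
    return entries


def toy_privilege_model(text: str) -> float:
    # Placeholder for a trained predictive model; a keyword rule is used here.
    return 0.9 if "counsel" in text.lower() else 0.1


def toy_log_drafter(text: str) -> str:
    # Placeholder for a generative model prompted with the document and the
    # privilege criteria; this stub ignores the text and returns a template.
    return "Confidential communication with counsel reflecting a request for legal advice."


documents = {
    "DOC-001": "Email to outside counsel asking about contract risk",
    "DOC-002": "Team lunch order for Friday",
}
log = build_privilege_log(documents, toy_privilege_model, toy_log_drafter)
for entry in log:
    print(entry.doc_id, entry.privilege_score, entry.description)
```

Keeping the two models behind simple function parameters makes it easy to swap either one, or to adjust the threshold, without changing the pipeline itself.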

Someday, the sequence may flow in the other direction, with outputs from gen AI fed into a predictive model. For example, you might use gen AI to create hypothetical documents of a certain type. These documents would then serve as the seed set that predictive AI studies and uses as a basis for identifying other documents of the same type.
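
Because this reverse flow is still speculative, the Python sketch below is a thought experiment rather than a workflow. A stubbed generator supplies synthetic examples, and, in place of a trained predictive model, a simple nearest-centroid similarity ranking over word counts scores real documents against the synthetic seed set. Every function name and document here is hypothetical.

```python
import math
from collections import Counter
from typing import Callable


def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_by_synthetic_seeds(
    generate_examples: Callable[[str, int], list[str]],
    description: str,
    corpus: dict[str, str],
    n_seeds: int = 3,
) -> list[tuple[str, float]]:
    """Rank real documents by similarity to generated examples of a document type."""
    seeds = generate_examples(description, n_seeds)  # generative step
    centroid = Counter()
    for seed in seeds:
        centroid.update(bag_of_words(seed))  # combined seed-set word profile
    scored = [(doc_id, cosine(bag_of_words(text), centroid)) for doc_id, text in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_generator(description: str, n: int) -> list[str]:
    # Placeholder for a generative model asked to write n hypothetical
    # documents matching `description`; returns canned examples here.
    examples = [
        "draft agreement to fix prices with competitor",
        "call notes about coordinating prices with rival",
        "email proposing to split territories with competitor",
    ]
    return examples[:n]


corpus = {
    "A": "notes from call with competitor about prices",
    "B": "quarterly office supply order",
}
ranking = rank_by_synthetic_seeds(toy_generator, "price-fixing communications", corpus)
print(ranking)
```

A real implementation would train an actual classifier on the synthetic seeds and would need attorney review to confirm that the generated examples are representative before relying on the results.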

A critical component of using and integrating these technologies will continue to be the experts who know how to build technology into review workflows to unlock its full value. Technology on its own will not deliver the results and value that eDiscovery requires.

As we build AI-powered solutions, we make sure we can answer the following questions. We encourage you to ask them as well, as you evaluate what AI will mean for your eDiscovery practice:

  • How do we know the model performed well?
  • How do we know that this was a good review?
  • How defensible is the model and workflow?
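
For the first question, one widely used check in TAR validation is an elusion test: attorneys review a random sample of the documents the model set aside, and the rate of responsive documents found there is used to estimate how many were missed. A minimal Python sketch of the point estimate follows; the function name and figures are hypothetical, and a real protocol would also report a confidence interval and document the sampling design.

```python
def estimate_recall(found_responsive: int, discard_size: int,
                    sample_size: int, responsive_in_sample: int) -> float:
    """Point estimate of recall from an elusion sample of the set-aside documents."""
    elusion_rate = responsive_in_sample / sample_size  # share of set-aside docs that are responsive
    estimated_missed = elusion_rate * discard_size     # projected responsive docs left behind
    return found_responsive / (found_responsive + estimated_missed)


# Hypothetical matter: 9,000 responsive documents found in review, 100,000
# documents set aside, and 5 responsive documents in a 500-document sample.
recall = estimate_recall(found_responsive=9_000, discard_size=100_000,
                         sample_size=500, responsive_in_sample=5)
print(f"estimated recall: {recall:.1%}")  # estimated recall: 90.0%
```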

For a deeper look at how we’re using predictive and generative AI at Lighthouse, check out our AI-Powered Privilege Review solution.

About the Author

Karl Sobylak

Karl is responsible for the innovation, development, and deployment of cutting-edge, big data analytics-based products that create better, more optimized legal outcomes for our clients, including reduced cost and risk. After graduating from SUNY Albany with a B.S. in Computer Science and Applied Mathematics in 2003, Karl joined a start-up eDiscovery services company, where he learned everything he could about the legal world, including operations, development, services, and strategy. With more than 16 years of expertise in the legal industry, creating data-centric solutions, and applying risk mitigation tactics, Karl has a strong background that has allowed him to help reduce legal costs, improve precision and recall rates, and achieve favorable legal results.

About the Author

Lon Troyer

Dr. Lon Troyer is Vice President of Review and Advanced Analytics at Lighthouse, overseeing the application of analytics, search, and information retrieval expertise to implement solutions to clients’ litigation and regulatory compliance challenges. His teams specialize in leveraging artificial intelligence and search technologies as well as extensive investigative experience to scope, design, and implement innovative solutions for clients throughout the data lifecycle.

Drawing on his diverse background in technology-assisted review, linguistic modeling, advanced information retrieval strategies, and project management, Lon leads the team that provides Lighthouse’s full suite of review solutions.

During his career, Lon has worked domestically and internationally on dozens of high-stakes matters spanning a wide variety of industries and matter types, including antitrust, class action, IP, product liability, and other types of litigation, as well as internal and government investigations.

Prior to joining Lighthouse, Lon was the Executive Managing Director and Head of Professional Services at H5, taught constitutional law in graduate school at the University of California, Berkeley, and gained practical experience in corporate law at Sidley Austin. Lon earned his undergraduate degree at Williams College and his Ph.D. from the University of California, Berkeley.