Does It Actually Work? How to Measure the Efficacy of Modern AI

June 6, 2024



Karl Sobylak

The first step to moving beyond the AI hype is also the most important.

That’s when you ask: How can AI actually make my work better? It’s the great ROI question. Technology solutions are only as good as the benefits they provide. So it’s critical to consider an AI solution’s efficacy (its ability to deliver the benefits you expect of it) before bringing it on board.

To help you do that, let’s walk through what efficacy means in general, and then look at what it means for the two types of modern AI.

Efficacy varies depending on the solution

You can measure efficacy in whatever terms matter most to you. For simplicity’s sake, let’s focus on quality, speed, and cost.

When you’re looking to improve efficacy in those ways, it’s important to remember that not all AI is the same. You need to choose technology suited for your task.

The two types of AI that use large language models (LLMs) are predictive AI and generative AI (for a detailed breakdown, see our previous article on LLMs and the types of AI). Because they perform different functions, they impact quality, speed, and cost in different ways.

Measuring the efficacy of predictive AI

Predictive AI estimates likelihoods, such as how likely a document is to be responsive or privileged.

Here’s how it works, using privilege as an example.

  1. Attorneys review and code a sample set of documents.
  2. Those docs are fed to the AI model to train it, essentially teaching it what does and doesn’t count as privileged for this matter.
  3. Then, the classifier analyzes the rest of the dataset and assigns a percentage to each document: The higher the percentage, the more likely the document is to be privileged.
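
To make the three steps above concrete, here’s a minimal sketch in Python. It uses a toy word-count (naive Bayes) model as a stand-in; it is not the LLM-based classifier described in this post, and the function names and sample documents are made up for illustration. It also assumes the coded sample contains both privileged and non-privileged docs.

```python
import math
from collections import Counter

def train(coded_docs):
    """Steps 1-2: learn word counts from attorney-coded (text, is_privileged) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_privileged in coded_docs:
        doc_counts[is_privileged] += 1
        word_counts[is_privileged].update(text.lower().split())
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def score(model, text):
    """Step 3: return a 0-100 percentage; higher means more likely privileged."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_prob = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        lp = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                # Laplace smoothing so unseen word/label pairs are not zero
                lp += math.log((word_counts[label][word] + 1)
                               / (total_words + len(vocab)))
        log_prob[label] = lp
    # Normalize the two log scores into a percentage
    top = max(log_prob.values())
    p_priv = math.exp(log_prob[True] - top)
    p_not = math.exp(log_prob[False] - top)
    return 100 * p_priv / (p_priv + p_not)

# Hypothetical coded sample (a real training set would be thousands of docs)
coded = [
    ("legal advice from counsel regarding the merger", True),
    ("attorney client memo on litigation risk", True),
    ("lunch menu for friday", False),
    ("quarterly sales figures attached", False),
]
model = train(coded)
likely_priv = score(model, "memo from counsel with legal advice")
likely_not = score(model, "sales lunch on friday")
print(round(likely_priv), round(likely_not))
```

The point is the shape of the workflow, not the model: a small coded sample goes in, and every remaining document comes out with a percentage you can act on.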

The training period is a critical part of the efficacy equation. It requires an initial investment in eyes-on review, but it sets the AI up to help you reduce eyes-on review down the line. The value is clearest in large matters: Having attorneys review 4,000 documents during the training period is more than worth it when AI removes more than 100,000 from privilege review.
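
The trade-off is simple arithmetic. Here’s a quick sketch using the figures above, treating 100,000 as a floor because the example says “more than.”

```python
training_docs = 4_000    # reviewed by attorneys to train the model
docs_removed = 100_000   # lower bound: the example says "more than 100,000"

# Eyes-on review avoided after subtracting the training investment
net_docs_avoided = docs_removed - training_docs

# Docs removed from review for every doc reviewed during training
removed_per_training_doc = docs_removed / training_docs

print(net_docs_avoided)          # 96000
print(removed_per_training_doc)  # 25.0
```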

With that in mind, here’s how you could measure the efficacy of a predictive AI priv classifier.

Quality: Does AI make effective predictions?

AI privilege classifiers can be very effective at identifying privilege, including catching documents that other methods miss. A client in one real-life matter used our classifier in combination with search terms—and our classifier found 1,600 privileged docs that weren’t caught by search terms. Without the classifier, the client would have faced painful disclosures and clawbacks.

Speed: Does predictive AI help you move faster?

AI can accelerate review in multiple ways. Some legal teams use the percentages assigned by their AI priv classifier to prioritize review, starting with the docs most likely to be privileged and working through the rest in descending order. Others use the percentages to cull the review population, reviewing only docs that meet a set likelihood threshold and removing those that fall below it.
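
Both approaches are easy to picture in code. This is a rough sketch, assuming each document already has a percentage from the classifier; the doc IDs, scores, and the 50% cutoff are placeholders, not recommendations.

```python
def prioritize(scores):
    """Return doc IDs ordered from most to least likely privileged."""
    return sorted(scores, key=scores.get, reverse=True)

def cull(scores, threshold):
    """Keep only docs at or above the threshold; the rest leave the review set."""
    return {doc_id: s for doc_id, s in scores.items() if s >= threshold}

# Hypothetical classifier output: doc ID -> percentage
scores = {"DOC-001": 92.0, "DOC-002": 8.5, "DOC-003": 55.0}

review_order = prioritize(scores)
review_set = cull(scores, threshold=50.0)

print(review_order)  # ['DOC-001', 'DOC-003', 'DOC-002']
print(review_set)    # {'DOC-001': 92.0, 'DOC-003': 55.0}
```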

One of our clients often does both. For first-level (1L) review, they prioritize docs that score in the middle. Docs with extremely high or low percentages are culled: The most likely docs go straight to second-level (2L) review, while the least likely docs go straight to production. By using this method during a high-stakes Second Request, the client was able to remove 200,000 documents from privilege review.
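
That combined workflow amounts to a three-way routing rule. Here’s a sketch of the idea; the 10% and 90% cutoffs are hypothetical (the client’s actual thresholds aren’t stated here), and the right values would depend on the matter.

```python
def route(score, low=10.0, high=90.0):
    """Route a doc by its privilege percentage (cutoffs are illustrative)."""
    if score >= high:
        return "2L review"    # most likely privileged: skip 1L
    if score <= low:
        return "production"   # least likely privileged: skip priv review
    return "1L review"        # the uncertain middle gets eyes-on review

# Hypothetical classifier output: doc ID -> percentage
scores = {"DOC-001": 97.0, "DOC-002": 3.0, "DOC-003": 55.0}
routing = {doc_id: route(s) for doc_id, s in scores.items()}

print(routing)
# {'DOC-001': '2L review', 'DOC-002': 'production', 'DOC-003': '1L review'}
```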

Cost: Does predictive AI save you money?

Improving speed and quality can also improve your bottom line. During the Second Request mentioned above, our client saved 8,000 hours of attorney time and more than $1M during privilege review.
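
For a sense of scale, here’s what those figures imply. This sketch assumes the hours saved map directly to the 200,000 documents removed and that the savings are all attorney time; the post doesn’t break the numbers down that way, so treat the results as rough.

```python
docs_removed = 200_000     # removed from privilege review in the Second Request
hours_saved = 8_000        # attorney hours saved
dollars_saved = 1_000_000  # lower bound: the savings were "more than $1M"

# Implied review pace and implied cost per attorney hour
docs_per_hour = docs_removed / hours_saved
implied_hourly_rate = dollars_saved / hours_saved

print(docs_per_hour)        # 25.0
print(implied_hourly_rate)  # 125.0

def estimated_savings(docs, pace, hourly_rate):
    """Estimate dollars saved for your own matter (all inputs are yours to supply)."""
    return docs / pace * hourly_rate
```

Plugging your own document counts, review pace, and rates into estimated_savings gives a first-pass number to test against a vendor’s claims.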

Measuring the efficacy of generative AI

Generative AI (or “gen AI”) generates content, such as responses to questions or summaries of source content. Use cases for gen AI in eDiscovery vary widely—and so does the efficacy.

For our first gen AI solution, we picked a use case where efficacy is straightforward: privilege logs. In this case, we’re not giving gen AI open-ended questions or a sprawling canvas. We’re asking it to draft something very specific, for a specific purpose. That makes the quality and value of its output easy to measure.

This is another case where AI’s performance is tied to a training period, which makes efficacy more significant in larger matters. After analysts train the AI on a few thousand priv logs, the model can generate tens of thousands on its own.

Given all that, here’s how you might measure efficacy for gen AI.

Quality: Does gen AI faithfully generate what you’re asking it to?

This is often tricky, as discussed in an earlier blog post about AI and accuracy in eDiscovery. Depending on the prompt or situation, gen AI can do what you ask it to without sticking to the facts.

So for gen AI to deliver on quality and defensibility, you need a use case that affords:

  • Control—AI analytics experts should be deeply involved, writing prompts and setting boundaries for the AI-generated content to ensure it fits the problem you’re solving for. Control is critical to drive quality.
  • Validation—Attorneys should review and be able to edit all content generated by AI. Validation is critical to measure quality.

Our gen AI priv log solution meets these criteria. AI experts guide the AI as it generates content, and attorneys approve or edit every log the AI generates.

As a result, the solution reliably hits the mark. In fact, outside counsel has rated our AI-generated log lines as better than those written by first-level contract attorneys.
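
If attorneys review every draft, you can put a number on quality. One simple option, sketched here, is the share of AI drafts approved without edits. The metric, function name, and sample log lines are illustrative assumptions, not a description of how our solution reports quality.

```python
def acceptance_rate(reviews):
    """Share of AI drafts that attorneys approved unchanged.

    reviews: list of (ai_draft, final_text) pairs after attorney review.
    """
    if not reviews:
        raise ValueError("no reviewed drafts to measure")
    unchanged = sum(1 for draft, final in reviews
                    if draft.strip() == final.strip())
    return unchanged / len(reviews)

# Hypothetical reviewed log lines: (AI draft, attorney-approved final)
reviews = [
    ("Email requesting legal advice re: contract.", "Email requesting legal advice re: contract."),
    ("Memo reflecting legal advice of counsel.", "Memo reflecting legal advice of counsel."),
    ("Email chain with counsel.", "Email chain with counsel seeking legal advice re: merger."),
    ("Draft agreement prepared by counsel.", "Draft agreement prepared by counsel."),
]
rate = acceptance_rate(reviews)
print(rate)  # 0.75
```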

Speed: Does gen AI help you move faster?

If someone (or something) writes content for you, it’s usually going to save you time. But as I said above, you shouldn’t accept whatever AI generates for you. Consider it a first draft—one that a person needs to review before calling it final.

But reviewing content is a lot faster than drafting it, so our priv log solution and other gen AI models can definitely save you time.

Cost: Does gen AI save you money?

Giving AI credit for cost savings can be hard with many use cases. If you use gen AI as a conversational search engine or case-strategy collaborator, how do you calculate its value in dollars and cents?

But with priv logs, the financial ROI is easy to track: What do you spend on priv logs with gen AI vs. without? Many clients have found that using our gen AI for the first draft is cheaper than using attorneys.
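
The with-and-without comparison can be sketched in a few lines. Every input below is a placeholder (entry count, minutes per entry, hourly rate, and AI fee are invented for illustration), so swap in your own numbers.

```python
def priv_log_cost(entries, minutes_per_entry, hourly_rate, ai_fee_per_entry=0.0):
    """Attorney time cost plus any per-entry AI fee, in dollars."""
    attorney_cost = entries * minutes_per_entry * hourly_rate / 60
    return attorney_cost + entries * ai_fee_per_entry

entries = 10_000  # hypothetical number of log entries

# Without gen AI: attorneys draft each entry from scratch
without_ai = priv_log_cost(entries, minutes_per_entry=3, hourly_rate=60)

# With gen AI: attorneys only review and edit the AI's first draft
with_ai = priv_log_cost(entries, minutes_per_entry=1, hourly_rate=60,
                        ai_fee_per_entry=0.50)

savings = without_ai - with_ai
print(without_ai, with_ai, savings)
```

The comparison only works because attorney review time stays in the “with AI” column: the draft is cheaper, but the validation step still has a cost.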

Where can AI be effective for you?

This post started with one question—How can AI make your work better?—but you can’t answer it without also asking where.

Where are you thinking about applying AI? Where could your team benefit the most?

So much about efficacy depends on the use case. It determines which type of AI can deliver what you need. It dictates what to expect in terms of quality, speed, and cost, including how easy it is to measure those benefits and whether you can expect much benefit at all.

If you’re struggling to figure out what benefits matter most to you and how AI might deliver on them, sign up below to receive our simple guide to thinking about AI. It walks through seven dimensions of AI that are changing eDiscovery, including sections on efficacy, ethics, job impacts, and more. Each section includes a brief questionnaire to help you clarify where you stand and what you stand to gain.

About the Author

Karl Sobylak

Karl is responsible for the innovation, development, and deployment of cutting-edge products based on big-data analytics that create better, more optimized legal outcomes for our clients, including reduced cost and risk. After graduating from SUNY Albany with a B.S. in Computer Science and Applied Mathematics in 2003, Karl joined a start-up eDiscovery services company, where he learned everything he could about the legal world, including operations, development, services, and strategy. With more than 16 years of expertise in the legal industry, creating data-centric solutions, and applying risk mitigation tactics, Karl possesses a strong background that has allowed him to help reduce legal costs, improve precision and recall rates, and gain favorable legal results.