Analytics and Predictive Coding Technology for Corporate Attorneys: Six Use Cases

September 2, 2021



John Del Piero

Below is a copy of a featured article written by Jennifer Swanton of Medtronic, Shannon Capone Kirk of Ropes & Gray, and John Del Piero of Lighthouse for Legaltech News.

This is the second article in a two-part series, designed to help create a better relationship between corporate attorneys and advanced technology. In our first article, we worked to demystify the language technology providers tend to use around AI and analytics technology.

With the terminology defined, we now focus on six specific ways that corporate legal teams can put this type of technology to work in the eDiscovery and compliance space to improve cost, outcomes, and efficiency.

1. Document Review and Data Prioritization: The earliest example of how to maximize the value of analytics in eDiscovery was the introduction of TAR (technology-assisted review). CAL (continuous active learning), one form of TAR, allows counsel to see the documents most likely to be relevant much earlier in the process than if they were simply looking at search term results, which are not categorized or prioritized and are often overbroad. Plainly put, it is the difference between an organized review and a disorganized review.

Data prioritization offers strategic value to the case team, enabling them to get to the crux of a case earlier in the process and ultimately develop a better strategic plan for cost and outcomes. This process also makes it possible to reach a point in the review where the likelihood of finding additional relevant information is so low that no further review is needed. This can save time and money on large document review projects. Such prioritization is critical for time-sensitive internal investigations as well.
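The feedback loop and stopping point described above can be sketched in a few lines. This is a toy illustration, not how any commercial TAR or CAL product works: the term-weight scoring, the `reviewer` callback standing in for the human coder, the `seed_terms` list, and the `patience` stopping rule are all assumptions made for this example.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def score(text, weights):
    # Sum of learned term weights; a higher score means "more likely relevant".
    return sum(weights[t] for t in set(tokenize(text)))


def cal_review(docs, reviewer, seed_terms, patience=3):
    """Review documents in priority order, re-ranking after every coding call.

    `reviewer` is a hypothetical stand-in for the human coder: it takes the
    document text and returns True (relevant) or False (not relevant).
    Review stops after `patience` consecutive non-relevant documents.
    Returns (review_order, relevant_ids).
    """
    weights = Counter({t: 1 for t in seed_terms})
    unreviewed = dict(enumerate(docs))
    order, relevant, misses = [], [], 0
    while unreviewed and misses < patience:
        # Highest score first; ties go to the lowest document id.
        doc_id = max(unreviewed, key=lambda i: (score(unreviewed[i], weights), -i))
        text = unreviewed.pop(doc_id)
        is_relevant = reviewer(text)
        order.append(doc_id)
        # Feedback step: terms in relevant documents gain weight, others lose it.
        delta = 1 if is_relevant else -1
        for t in set(tokenize(text)):
            weights[t] += delta
        if is_relevant:
            relevant.append(doc_id)
            misses = 0
        else:
            misses += 1
    return order, relevant
```

A real workflow would typically validate the stopping point with statistical sampling rather than a simple run of non-relevant documents; the `patience` rule here only illustrates the idea of stopping once new relevant material becomes unlikely.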

To dive further into the Pandora analogy we used earlier: if you were to listen to a random shuffle of songs on Pandora without giving feedback on what you like and don’t like, you’d likely listen for days before encountering several songs you love. If you give Pandora feedback, however, it learns, and you’re likely to hear several songs you love within hours. So why suffer days of listening to show tunes and harp solos when what you really love is the brilliant artistry found in songs by the likes of Ray LaMontagne?

2. Custodian and Data Source Identification: Advanced analytics that can analyze complex concepts within data can be a powerful tool to clearly identify your relevant data custodians, where that data lives, and other data sources worth considering. Most conceptual analytics technology can now provide real-time visibility into information about custodians, including the date range of the data collected and the data types delivered. More advanced technology that also analyzes metadata can provide you with a deeper understanding of how custodians interact with other people, including the ability to analyze patterns in timing and speech, and even the sentiment and tone of those interactions.

All of this information can be used to help quickly determine whether a prospective custodian has information relevant to the case that needs to be collected, or whether any supplemental collections are required to close a gap in the date range collected. This, in turn, will help reduce the number of collections required and minimize processing time in fast-paced cases. These tools also help determine which data sources are likely to hold your most relevant information and where supplemental collections may be warranted.

Above: Brainspace display of communication networks, which enable users to identify custodians of interest, as well as related people and conversations.
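The date-range gap check mentioned in this section is simple to illustrate. The sketch below assumes each custodian's collections are available as (start, end) date pairs and that the matter has a known relevant period; the function name `coverage_gaps` and the data layout are hypothetical, not taken from any particular platform.

```python
from datetime import date, timedelta


def coverage_gaps(collected, period_start, period_end):
    """Return (start, end) gaps in the relevant period that no collection covers.

    `collected` is a list of inclusive (start, end) date pairs for one custodian.
    """
    one_day = timedelta(days=1)
    gaps = []
    cursor = period_start  # first day not yet known to be covered
    for start, end in sorted(collected):
        if end < cursor:
            continue  # collection ends before the first uncovered day
        if start > period_end:
            break  # collection starts after the relevant period
        if start > cursor:
            gaps.append((cursor, start - one_day))
        cursor = max(cursor, end + one_day)
    if cursor <= period_end:
        gaps.append((cursor, period_end))
    return gaps


# Example: two collections that leave April and May uncovered.
collections = [
    (date(2020, 1, 1), date(2020, 3, 31)),
    (date(2020, 6, 1), date(2020, 12, 31)),
]
gaps = coverage_gaps(collections, date(2020, 1, 1), date(2020, 12, 31))
```

Any gap returned is a candidate for a supplemental collection; an empty list suggests the custodian's date range is already covered.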

3. Identifying Privileged and Personal Information: Another powerful way to leverage analytics in the eDiscovery workflow is to identify privileged documents in a far more cost-effective way than we could in the past. New privilege categorization software creates significant efficiencies by analyzing the text, metadata, and previous coding of documents in order to categorize documents according to the likelihood that they are actually privileged.

More advanced analytics tools can now identify documents that have been flagged as privileged by traditional privilege term screens but have a high likelihood of not containing privileged communications. For example, the technology may identify that a document was sent to a third party (thus breaking attorney-client privilege) or that the only privilege term within the document is contained within a boilerplate footer.
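The two signals in that example can be expressed as simple rules. The sketch below is a deliberately simplified assumption of how such checks might look: the document layout (`to`, `cc`, `body`), the `footer_marker` string, and the idea of listing trusted domains (which would need to include outside counsel) are all hypothetical, and real tools rely on far richer models than string matching.

```python
def likely_not_privileged(doc, trusted_domains, privilege_terms,
                          footer_marker="CONFIDENTIALITY NOTICE"):
    """Return reasons a privilege-screen hit may be a false positive.

    `doc` is a dict with "to", optional "cc", and "body" (hypothetical layout).
    `trusted_domains` should hold the company's and its counsel's email domains.
    An empty list means neither rule fired; it does not mean the document
    is privileged.
    """
    reasons = []

    # Rule 1: any recipient outside the trusted domains.
    recipients = doc["to"] + doc.get("cc", [])
    external = [r for r in recipients
                if r.rsplit("@", 1)[-1].lower() not in trusted_domains]
    if external:
        reasons.append("sent to third party: " + ", ".join(external))

    # Rule 2: privilege terms appear only after the boilerplate footer marker.
    body, _, footer = doc["body"].partition(footer_marker)
    in_body = any(term in body.lower() for term in privilege_terms)
    in_footer = any(term in footer.lower() for term in privilege_terms)
    if in_footer and not in_body:
        reasons.append("privilege term appears only in boilerplate footer")

    return reasons
```

Documents that return one or more reasons could be routed to a lighter privilege check, while the final call on privilege stays with the reviewing attorney.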

These more advanced analytics tools can be much more effective at identifying privileged documents than a privilege search term list, and can help case teams successfully meet rolling production deadlines by pushing the documents that are less likely to be privileged (i.e., those that require less privilege review) to the front of the review line. When these tools are integrated with other eDiscovery applications, you can also create a defensible privilege log that can be produced for the litigation team.

Additionally, flagging potential personally identifiable information (PII) and protected intellectual property (IP) caught up in a large data set can be challenging, but analytics technology provides in-house legal teams with an important ally for automating those processes. Advanced analytics can streamline the process of locating and isolating this sensitive data, which is often hiding in a variety of different systems, folders, and other information silos. These tools allow you to flag information protected by the Health Insurance Portability and Accountability Act (HIPAA) based on common format and structure, helping you move quickly through documents and accurately identify and redact the necessary information.
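Flagging by "common format and structure" usually comes down to pattern matching. The sketch below uses Python's standard `re` module with three illustrative patterns; the pattern set, the labels, and the `[REDACTED-...]` placeholder format are assumptions for this example and are nowhere near a complete HIPAA or PII screen.

```python
import re

# Illustrative patterns only; real screens cover many more formats.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DOB": re.compile(r"\bDOB[:\s]+\d{1,2}/\d{1,2}/\d{4}\b", re.IGNORECASE),
}


def flag_pii(text):
    """Return a list of (label, matched_text) pairs found in the text."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        hits.extend((label, m.group()) for m in pattern.finditer(text))
    return hits


def redact(text):
    """Replace every match with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label}]", text)
    return text


sample = "Patient DOB: 04/12/1975, SSN 123-45-6789, call 555-867-5309."
print(flag_pii(sample))
print(redact(sample))
```

Format-based rules like these are fast but miss anything that does not follow the expected structure, which is why they are typically paired with human review.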

4. Information Governance: One of the high-stakes elements of large data collections is parsing out highly sensitive records, such as those that contain PII and protected IP. Identifying this information is important both to protect company data and to comply with the growing number of data privacy regulations worldwide, including Europe’s General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and HIPAA. Analytics can help identify and flag documents according to their appropriate document classification. This can be helpful both for the business in its day-to-day operations and for the legal team in responding to requests.

5. Data Re-Use: One of the biggest potential benefits of analytics is the ability to save time and money on your next matter. Technologically advanced companies are now starting to use analytics technology to integrate previous attorney work product, case information, and documents across all of the organization’s matters. On a micro level, recycling and analyzing previous work product allows companies to stop reinventing the wheel on each case and aids in much faster identification of privileged, personal, and non-responsive documents.

For example, organizations often pay to store documents that contain previous privilege tagging from past matters in inactive or archived databases. Those documents, sitting unused in storage, can be separately re-ingested and used to train a privilege model in the new matter, allowing legal teams to immediately eliminate documents that were identified as privileged in previous reviews—even prior to any human coding in the new matter.
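The simplest form of this re-use is recognizing documents that were already coded as privileged in an earlier matter. The sketch below matches documents by a hash of their normalized text. Note that this is exact-match carry-forward, a much simpler technique than training a privilege model, and the function names and data layout are hypothetical.

```python
import hashlib


def fingerprint(text):
    # Normalize whitespace and case so trivial formatting changes still match.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def carry_forward_privilege(prior_coded, new_docs):
    """Split new documents into (previously_privileged, remaining).

    `prior_coded` is an iterable of (text, is_privileged) pairs from past matters.
    """
    prior_privileged = {fingerprint(text) for text, is_priv in prior_coded if is_priv}
    flagged, remaining = [], []
    for doc in new_docs:
        if fingerprint(doc) in prior_privileged:
            flagged.append(doc)
        else:
            remaining.append(doc)
    return flagged, remaining
```

Documents in the first list can be set aside before any human coding in the new matter, subject to whatever confirmation the legal team requires; a trained model would extend this to documents that are similar but not identical.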

On a macro level, this type of advanced capability enables organizations to make data-driven decisions across their entire eDiscovery landscape. Rather than looking at each new matter individually through a single lens, legal teams can use advanced analytics to analyze previously coded data across the organization’s entire legal portfolio. This can provide previously unavailable insights, such as which custodians tend to hold the most privileged documents matter over matter, or whether a data source rarely produces responsive documents. Data re-use can also come in handy in portfolio matters that have overlapping custodians and data sets and need common production. The overall results are more strategic legal and data decisions, more favorable case outcomes, and increased cost efficiency.
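A portfolio-level insight such as "which custodians hold the most privileged documents" is, at its core, an aggregation over past coding decisions. The sketch below assumes prior coding can be exported as (matter, custodian, is_privileged) records; that record layout and the function name are assumptions for illustration.

```python
from collections import defaultdict


def privilege_rate_by_custodian(coded_docs):
    """Rank custodians by the share of their documents coded privileged.

    `coded_docs` is an iterable of (matter, custodian, is_privileged) records
    pooled across matters. Returns (custodian, rate) pairs, highest rate first.
    """
    totals = defaultdict(lambda: [0, 0])  # custodian -> [privileged, total]
    for _matter, custodian, is_privileged in coded_docs:
        totals[custodian][0] += int(is_privileged)
        totals[custodian][1] += 1
    rates = {c: priv / total for c, (priv, total) in totals.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

The same grouping, keyed on data source and responsiveness instead of custodian and privilege, would surface sources that rarely produce responsive documents.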

6. Accuracy: Finally, and potentially most important, analytics tools can increase accuracy and produce a better work product. Studies have shown that tools like predictive coding can be more accurate than manual human review. That, coupled with the potential for cost savings, makes a strong case for adopting these technologies.

As useful as these new analytics tools are to in-house legal teams in their efforts to manage eDiscovery today, it is important to understand that the great promise of these technologies lies in the fact that they are in a state of continuous improvement. Because analytics tools learn, they refine themselves and “get smarter” as they review more data sets. We all know that we’re just on the cusp of what analytics will bring to our profession, but we believe the future of this technology in the area of eDiscovery management is here now.

About the Author

John Del Piero

John focuses on developing integrated partnerships with law firms and corporations to manage fast-moving, complex litigation and investigations. He manages relationships with various AmLaw 200, Global 100, and Fortune 500 clients. John has overseen some of the most complex engagements including global antitrust investigations from both US and EU institutions, large-scale FERC investigations, FCPA matters, and complex class actions. He graduated from Vanderbilt University with an engineering and economics degree and is known for helping clients develop repeatable, integrated, and defensible processes.