Since the dawn of modern litigation, attorneys have grappled with finding the most efficient and strategic method of producing discovery. However, the shift to computers and electronically stored information (ESI) within organizations since the 1990s exponentially complicated that process. Rather than sifting through filing cabinets and boxes, litigation teams suddenly found themselves looking to technology to help them review and produce large volumes of ESI pulled from email accounts, hard drives, and more recently, cloud storage. In effect, because technology changed the way people communicated, the legal industry was forced to change its discovery process.
The Rise of TAR
Due to growing data volumes in the mid-2000s, having large teams of attorneys review electronic documents one by one was becoming infeasible. Forward-thinking attorneys again looked to technology to make the process more practical and efficient: specifically, to a subset of artificial intelligence (AI) technology called “machine learning” that could help predict the responsiveness of documents. Using machine learning to score every document in a dataset by its likelihood of responsiveness, so that less of the dataset needs human review, became known as technology-assisted review (TAR).
TAR proved invaluable because machine learning classification enabled attorneys to prioritize important documents for human review and, in some cases, to avoid reviewing large portions of a dataset altogether. In the original form of TAR, a small number of highly trained subject matter experts review and code a randomly selected group of documents, which are then used to train the computer. Once trained, the computer scores every document in the dataset according to its likelihood of responsiveness. A cutoff score is then set using statistical measures (such as estimated recall, the share of all responsive documents that the cutoff would capture). Documents scoring below the cutoff are deemed unlikely to be responsive to the discovery request and do not require human review.
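To make the TAR 1.0 steps above concrete, here is a minimal Python sketch. It assumes a multinomial naive Bayes classifier as a stand-in for whatever algorithm a given TAR platform actually uses. It also assumes the cutoff is chosen to reach a target recall (80% here, an arbitrary illustrative value) on a separately coded control set, and it uses invented toy documents. The names `NaiveBayesScorer`, `choose_cutoff`, and `tar1_review` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately crude stand-in for real text processing."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesScorer:
    """Multinomial naive Bayes with add-one smoothing (illustrative, not any vendor's model)."""

    def fit(self, docs, labels):
        # labels: True = coded responsive by the subject matter expert, False = not responsive
        self.doc_counts = Counter(labels)
        self.term_counts = {True: Counter(), False: Counter()}
        for text, label in zip(docs, labels):
            self.term_counts[label].update(tokenize(text))
        self.vocab = set(self.term_counts[True]) | set(self.term_counts[False])
        self.totals = {k: sum(c.values()) for k, c in self.term_counts.items()}
        return self

    def score(self, text):
        """Log-odds that a document is responsive; higher means more likely responsive."""
        log_odds = math.log((self.doc_counts[True] + 1) / (self.doc_counts[False] + 1))
        v = len(self.vocab)
        for tok in tokenize(text):
            if tok not in self.vocab:
                continue  # terms never seen in training carry no evidence
            p_resp = (self.term_counts[True][tok] + 1) / (self.totals[True] + v)
            p_non = (self.term_counts[False][tok] + 1) / (self.totals[False] + v)
            log_odds += math.log(p_resp / p_non)
        return log_odds


def choose_cutoff(scores, labels, target_recall):
    """Highest score cutoff whose recall on a coded control set meets the target."""
    responsive = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not responsive:
        raise ValueError("control set contains no responsive documents")
    for found, s in enumerate(responsive, start=1):
        if found / len(responsive) >= target_recall:
            return s
    return responsive[-1]  # only reached if target_recall > 1.0; keep every document


def tar1_review(seed_docs, seed_labels, control_docs, control_labels, collection,
                target_recall=0.8):
    """Train once, score everything, and keep only documents at or above the cutoff."""
    model = NaiveBayesScorer().fit(seed_docs, seed_labels)
    cutoff = choose_cutoff([model.score(d) for d in control_docs], control_labels, target_recall)
    to_review = [d for d in collection if model.score(d) >= cutoff]
    return cutoff, to_review


# Toy data, invented purely for illustration.
seed_docs = ["merger price negotiation with acme", "acme merger due diligence memo",
             "holiday party lunch menu", "fantasy football league standings"]
seed_labels = [True, True, False, False]
control_docs = ["draft merger agreement acme", "acme price terms",
                "office lunch order", "football tickets"]
control_labels = [True, True, False, False]
collection = ["acme merger schedule", "lunch menu for friday",
              "price sheet from acme", "football scores"]

cutoff, to_review = tar1_review(seed_docs, seed_labels, control_docs, control_labels, collection)
print(to_review)  # ['acme merger schedule', 'price sheet from acme']
```

Real reviews use far larger samples and account for sampling error when estimating recall; this toy only shows the shape of the workflow: train once, score everything, set a cutoff, and review what sits above it.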
Eventually, a second iteration of TAR was developed. Known as TAR 2.0, it is based on the same supervised machine learning technology as the original TAR (now known as TAR 1.0). But rather than training once on an initial set of coded documents, as TAR 1.0 does, TAR 2.0 continuously learns from reviewer decisions as the review progresses, an approach often called continuous active learning (CAL). This eliminates the need for highly trained subject matter experts to train the system with a control set of documents at the outset of the matter. TAR 2.0 workflows can sort and prioritize documents as reviewers code, constantly pushing the documents most likely to be responsive to the top of the review queue.
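The continuous-learning loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how any particular TAR 2.0 product works: a toy term-weighting model stands in for the real classifier, a hypothetical `reviewer` function simulates human coding decisions, and the loop stops once an entire batch comes back non-responsive. All function names are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Distinct lowercase word tokens in a document."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def train(coded):
    """Hypothetical toy model: each term gets +1 per responsive doc and -1 per
    non-responsive doc containing it (a stand-in for a real classifier)."""
    weights = Counter()
    for text, responsive in coded.items():
        for tok in tokens(text):
            weights[tok] += 1 if responsive else -1
    return weights


def score(weights, text):
    return sum(weights[tok] for tok in tokens(text))


def continuous_active_learning(collection, seed, reviewer, batch_size=2):
    """Retrain after every batch of reviewer decisions and always serve the
    highest-scoring unreviewed documents next."""
    coded = dict(seed)  # every coding decision made so far, seed included
    review_order = []
    while True:
        unreviewed = [d for d in collection if d not in coded]
        if not unreviewed:
            break
        weights = train(coded)  # learn from all decisions, including the latest batch
        ranked = sorted(unreviewed, key=lambda d: score(weights, d), reverse=True)
        batch = ranked[:batch_size]
        decisions = {d: reviewer(d) for d in batch}  # human coding, simulated here
        coded.update(decisions)
        review_order.extend(batch)
        if not any(decisions.values()):
            break  # illustrative stopping rule: a whole batch came back non-responsive
    return review_order, coded


# Toy data, invented purely for illustration.
collection = ["acme merger pricing", "team lunch friday", "acme merger closing date",
              "football pool", "acme price memo", "parking garage notice",
              "fantasy draft picks", "holiday schedule"]
seed = {"acme merger pricing": True, "team lunch friday": False}


def reviewer(doc):
    # Hypothetical stand-in for a human reviewer's responsiveness call.
    return "acme" in doc


order, coded = continuous_active_learning(collection, seed, reviewer)
print(order)
# ['acme merger closing date', 'acme price memo', 'football pool', 'parking garage notice']
```

The key difference from the TAR 1.0 workflow is that the model is retrained inside the loop, so each batch of reviewer decisions immediately reshapes the ranking of what is left; here, two low-ranked documents are never reviewed at all. In practice, the decision to stop reviewing is usually validated statistically, for example with an elusion test on a random sample of the unreviewed documents, rather than with a simple batch rule like the one above.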
Modern Data Challenges
But while both TAR 1.0 and TAR 2.0 are still widely used in eDiscovery today, the data landscape looks drastically different than it did when TAR first made its debut. Smartphones, social media applications, ephemeral messaging systems, and cloud-based collaboration platforms, for example, did not exist twenty years ago, but all are commonly used for communication within organizations today. These new technologies generate vast amounts of complex data that, in turn, must be collected and analyzed during litigation and investigations.
Aside from the new variety of data, the volume and velocity of modern data are also significantly different than they were twenty years ago. For instance, according to Statista, the amount of data created, captured, copied, and consumed worldwide in 2010 was just two zettabytes. By 2020, that volume had grown to 64.2 zettabytes, roughly a 32-fold increase in a decade.
Despite this modern data revolution, litigation teams are still performing TAR with the same machine learning technology they used when it was first introduced over a decade ago, and that technology was already more than a decade old back then. TAR as it currently stands is not built for big data: the extremely large, varied, and complex modern datasets that attorneys must increasingly handle when responding to discovery requests. These simple AI systems cannot scale to large datasets the way more advanced forms of AI can. They also lack the ability to take context, metadata, and modern language into account when making coding predictions. The snail’s pace of TAR’s evolution, set against the lightning-fast evolution of modern data, is quickly becoming a problem.
The Future of TAR
The solution to the challenge of modern data lies in updating TAR workflows to include a variety of more advanced AI technologies, together with bringing on technology experts and linguists to help wield them. To begin with, for TAR to remain effective in a modern data environment, it is necessary to incorporate tools that leverage more advanced subsets of AI, such as deep learning and natural language processing (NLP), into the TAR process. In contrast to simple machine learning (which can only analyze the text of a document), newer tools leveraging more advanced AI can analyze metadata, context, and even the sentiment of the language used within a document. Additionally, bringing in linguists and experienced technologists to expertly handle massive data volumes allows attorneys to focus on the substantive legal issues at hand, rather than attempting to become an eDiscovery Frankenstein (i.e., a lawyer, a data scientist, a technology expert, and a linguist all rolled into one).
This combination of advanced AI technology and expert service will enable litigation teams to reinvent data review and make it more feasible, effective, and manageable in the modern era. For example, because more advanced AI can handle large data volumes and analyze documents along multiple dimensions, technology experts and attorneys can work together to put a system in place that recycles data and attorney work product from previous eDiscovery reviews. This type of “data reuse” can be especially helpful for the traditionally more expensive and time-consuming parts of an eDiscovery review, such as identifying privileged and sensitive information. It can also help remove large swaths of ROT (redundant, obsolete, or trivial data). When technology experts can leverage past data to train a more advanced AI tool, legal teams can immediately reduce the need for human review in the current case. In this way, advanced AI and expert service together can reduce the endless “reinventing the wheel” that has historically happened on each new matter.
The same cycle that brought technology into the discovery process is once again driving change in eDiscovery. The way people communicate at work, and the systems used to facilitate that communication, are changing, and current TAR technology is not equipped to handle that change effectively. It’s time to start incorporating more modern AI technology and expert services into TAR workflows to make eDiscovery feasible in the modern era.
Source: “Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2025,” Statista, https://www.statista.com/statistics/871513/worldwide-data-created/
About the Author
Sarah is an eDiscovery Evangelist and Proposal Content Strategist at Lighthouse. Before coming to Lighthouse, she worked for a decade as a practicing attorney at a global law firm, specializing in eDiscovery counseling and case management, data privacy, and information governance. At Lighthouse, she happily utilizes her eDiscovery expertise to help our clients understand and leverage the ever-changing world of legal technology and data governance. She is a problem solver and a collaborator and welcomes any chance to discuss customer pain points in eDiscovery. Sarah earned her B.A. in English from Penn State University and her J.D. from Delaware Law School.