Big Data and Analytics in eDiscovery: Unlock the Value of Your Data

May 20, 2020



Karl Sobylak

The current state of eDiscovery is complex, inefficient, and cost-prohibitive as data types and volumes continue to explode. Organizations of all sizes are bogged down by enormous amounts of non-responsive and duplicative electronically stored information (ESI) that still makes it to the review stage, persistently the most expensive phase of eDiscovery.

Data is at the center of this conundrum, and the challenge presents itself in a number of forms, including:

  1. Scale of Data - In the era of big data, the volume, or amount of data generated, is a significant issue for large-scale eDiscovery cases. IDC predicts that by 2025, 49 percent of the world’s stored data will reside in public cloud environments and worldwide data will grow 61 percent to 175 zettabytes.
  2. Different Forms of Data - While the volume of ESI is expanding dramatically, its variety is also increasing, and a big part of the challenge of managing big data is the varying kinds of data the world now generates. Gone are the days in eDiscovery where the biggest challenge was processing and reviewing structured, computer-based data like email, spreadsheets, and documents.
  3. Analysis of Data - Contending with large amounts of data raises another significant issue: the velocity, or speed, at which data is generated, as well as the rate at which that data is processed for collection and analysis. The old approach is to put everything into a database and try to analyze it later. But in the era of big data, the old ways are expensive and time-consuming, and the much smarter method is to analyze data in real time as it is generated.
  4. Uncertainty of Data - Data, whether big or small, must be accurate. If you’re regularly collecting, processing, and amassing large amounts of data, none of it will matter if your data is unreliable or untrustworthy. The data to be analyzed must first be accurate and untainted.

When you combine all of these aspects of data, it is clear that eDiscovery is actually a big data and analytics challenge!

While big data and analytics have historically been considered too complex and elaborate, the good news is that massive progress has been made in these fields over the past decade. The right people, process, and technology, in the form of packaged platforms, are more accessible than ever.

Effective use of a robust and intelligent big data and analytics platform enables organizations to revamp their inefficient and non-repeatable eDiscovery workflows by intelligently learning from past cases. A powerful big data and analytics tool uses artificial intelligence (AI) and machine learning to create customized data solutions by harvesting data from all of a client’s cases and ultimately creating a master knowledge base in one big data and analytics environment.
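To make the idea of a master knowledge base concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular platform: it uses simple exact-match hashing of normalized text rather than AI or machine learning, and the function names (content_hash, build_knowledge_base, triage), matter IDs, and sample documents are all hypothetical.

```python
import hashlib
from collections import defaultdict


def content_hash(text: str) -> str:
    """Return a SHA-256 key for lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_knowledge_base(past_matters):
    """Map each content hash to the (matter, decision) pairs recorded in past matters.

    past_matters is a hypothetical structure:
    {matter_id: [(document_text, coding_decision), ...]}
    """
    kb = defaultdict(list)
    for matter_id, documents in past_matters.items():
        for text, decision in documents:
            kb[content_hash(text)].append((matter_id, decision))
    return kb


def triage(new_documents, kb):
    """Split new documents into those with prior decisions and those needing fresh review."""
    seen, unseen = [], []
    for text in new_documents:
        history = kb.get(content_hash(text))
        if history:
            seen.append((text, history))
        else:
            unseen.append(text)
    return seen, unseen


# Hypothetical sample data: two past matters and two newly collected documents.
past_matters = {
    "matter-A": [("Quarterly newsletter: office closed Friday.", "non-responsive")],
    "matter-B": [
        ("Quarterly  newsletter: office closed FRIDAY.", "non-responsive"),
        ("Pricing agreement draft v3", "responsive"),
    ],
}
kb = build_knowledge_base(past_matters)
seen, unseen = triage(
    ["quarterly newsletter: office closed friday.", "New contract memo"], kb
)

for text, history in seen:
    print("Previously coded:", text, history)
print("Needs review:", unseen)
```

In this sketch, a document that was already coded in earlier matters surfaces with its history attached, so a review team could prioritize or bulk-handle it, while unseen documents go to fresh review. A real platform would likely add near-duplicate detection and predictive models on top of this kind of lookup.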

In particular, the most effective big data and analytics solution should provide:

  • Comprehensive Analysis – The ability to integrate disparate data sources into a single holistic view. This view gives you actionable insights, leading to better decision making and more favorable case outcomes.
  • Insightful Access – Overall and detailed visibility into your data landscape in a manner that empowers your legal team to make data-driven decisions.
  • Intelligent Learnings – The ability to learn as you go, through a powerful analytics and machine learning platform that enables you to make sense of vast amounts of data on demand.

One of the biggest mistakes organizations make in eDiscovery is failing to use big data and analytics to drive greater efficiency and cost savings. Most organizations hold enormous amounts of untapped knowledge locked away in archived or inactive matters. With big data and analytics platforms more accessible than ever, the opportunity to learn from the past to optimize the future has never been greater.

If you are interested in this topic or just love to talk about big data and analytics, feel free to reach out to me at

About the Author

Karl Sobylak

Karl is responsible for the innovation, development, and deployment of cutting-edge, big data analytics-based products that create better and more optimized legal outcomes for our clients, including the reduction of cost and risk. After graduating from SUNY Albany with a B.S. in Computer Science and Applied Mathematics in 2003, Karl joined a start-up eDiscovery services company, where he learned everything he could about the legal world, including operations, development, services, and strategy. With more than 16 years of experience in the legal industry creating data-centric solutions and applying risk mitigation tactics, Karl has a strong background that has allowed him to help reduce legal costs, improve precision and recall rates, and gain favorable legal results.