To Reinvigorate Your Approach to Big Data, Catch the Advanced AI Wave

October 19, 2022



Mitch Montoya

Emerging challenges with big data (large sets of structured or unstructured data that require specialized tools to decipher) have been well documented, with estimates of worldwide data consumed and created by 2025 reaching unfathomable volumes. However, these challenges present an opportunity for innovation. Over the past few years, we’ve seen a renaissance in AI products and solutions that help address and move past these issues. From smaller players creating bespoke algorithms to bigger technology companies developing solutions with broader applications, there are substantial opportunities to harness AI and rethink how to manage data.

A recent announcement of Microsoft’s Syntex highlights the immense possibilities for, and investment in, leveraging AI to manage content and augment human expertise and knowledge. The new feature in Microsoft 365 promises advanced AI and automation to classify and extract information, process content, and help enforce security and compliance policies. But what do new solutions like this mean for eDiscovery and the legal industry?

There are three key AI benefits reshaping the industry you should know about:

1.    Meeting the challenges of cloud and big data
2.    Transforming data strategies and workflows
3.    Accelerating through automation

Meeting the challenges of cloud and big data 

Anyone close to a recent litigation or investigation has witnessed the challenge posed by today’s explosion of data—not just volume, but the variety, speed, and uncertainty of data. To meet this challenge, traditional approaches to eDiscovery need to be updated with more advanced analytics so teams can first make sense of data and then strategize from there. 

Alongside the need to analyze post-export documents, it is clear that proactively managing an organization’s data is increasingly essential. Organizations across all industries must comply with an increasingly complex web of data privacy and retention regulations. To do so, they must understand what data they are storing, map how that data flows throughout the organization, and have rules in place to govern the classification, deletion, retention, and protection of data that falls within regulated categories. However, the rise of new collaboration platforms, cloud storage, and hybrid working has introduced new levels of data complexity and put pressure on information governance and compliance practices, making older, traditional information governance workflows unworkable. Leveraging automation and analytics driven by AI moves teams from a reactive to a proactive posture.

For example, teams can automate a classification system with advanced AI where it reads documents entering the organization’s digital ecosystem, classifies them, and labels them according to applicable sensitivity or retention categories implemented by the organization—all of which is organized under a taxonomy that can be searched later. This not only helps an organization better manage data and risks upfront—creating a more complete picture of the organization’s data landscape—but also informs better and more efficient strategies downstream. 
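The classify-label-search flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: simple keyword rules stand in for the advanced AI model, and the taxonomy labels, keywords, and names such as `TAXONOMY`, `classify`, and `build_index` are hypothetical rather than taken from any particular product.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical taxonomy: label -> keywords that suggest it.
# Assumption: a real system would use a trained model, not keyword rules.
TAXONOMY = {
    "sensitivity/confidential": {"confidential", "privileged", "salary"},
    "retention/contract": {"agreement", "contract", "indemnify"},
    "retention/financial": {"invoice", "payment", "ledger"},
}


@dataclass
class Document:
    doc_id: str
    text: str
    labels: set = field(default_factory=set)


def classify(doc):
    """Attach every taxonomy label whose keywords appear in the document."""
    words = set(re.findall(r"[a-z]+", doc.text.lower()))
    doc.labels = {label for label, keywords in TAXONOMY.items() if words & keywords}
    return doc


def build_index(docs):
    """Classify documents as they arrive and index their IDs by label."""
    index = defaultdict(list)
    for doc in docs:
        for label in classify(doc).labels:
            index[label].append(doc.doc_id)
    return index


docs = [
    Document("d1", "This services agreement is confidential."),
    Document("d2", "Invoice attached for payment."),
]
index = build_index(docs)
print(index["retention/contract"])
```

Because labels are assigned when a document enters the system, a later search is a lookup by label rather than a fresh pass over every document.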

Transforming data strategies and workflows 

New AI capabilities give legal and data governance teams the freedom to think more holistically about their data and develop strategies and workflows that address their most pressing challenges. For eDiscovery, this does not necessarily mean discarding legacy workflows that have proven valuable (such as those built on technology-assisted review, or TAR), but rather augmenting them with advanced AI, such as natural language processing or deep learning, which can handle greater data complexity and provide insights that let teams approach a matter with greater agility.

But the rise of big data means that legal teams need to start thinking about the eDiscovery process more expansively. An effective eDiscovery program needs to start long before data collection for a specific matter or investigation and should contemplate the entire data life cycle. Otherwise, you will waste substantial time, money, and resources trying to search and export insurmountable volumes of data for review. You will also find yourself increasingly at risk for court sanctions and prolonged eDiscovery battles if your team is unprepared or ill-equipped to find and properly export, review, and produce the requested data within the required timeline. 

For compliance and information governance teams, this proactive approach to data has even greater implications since the data they’re handling is not restricted to specific matters. In both cases, AI can be leveraged to classify, organize, and analyze data as it emerges—which not only keeps it under control but also gives quicker access to vital information when teams need it during a matter.

Advanced AI can be applied to analyze and organize data created and held by specific custodians who are likely to be pulled into litigation or investigations, giving eDiscovery teams an advantage when starting a matter. Similarly, sensitive or proprietary information can be collected, organized, and searched far more seamlessly so teams don’t waste time or resources when a matter emerges. This allows more time for case development and better strategic decisions early on.

Accelerating through automation  

Data growth shows no signs of slowing, underscoring the need for data governance systems that are scalable and automated. Without them, organizations risk expending valuable resources on continually updating programs to keep pace with data volumes and reanalyzing their key information.

The best solutions allow experts in your organization to refine and adjust data retention policies and automation as the organization’s data evolves and regulations change. In today’s cloud-based world, automation is a necessity. For example, a patchwork of global and local data privacy regulations (GDPR, California’s CCPA, etc.) includes restrictions related to the timely disposal of personal information after the business use for that data has ended. However, those restrictions often conflict with or are triggered by industry regulations that require companies to keep certain types of documents and data for specific periods of time. When you factor in the dynamic, voluminous, and complex cloud-based data infrastructure that most companies now work within, it becomes obvious why a manual, employee-based approach to categorizing data for retention and disposal is no longer sustainable. AI automation can identify personal information as it enters the company’s system, immediately classify it as sensitive data, and label it with specific retention rules.
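As a rough illustration of that last step, the sketch below flags personal information in incoming text and attaches a retention rule. Everything in it is an assumption made for the example: two simplified regular expressions stand in for an AI-based detector, and the category names and retention periods in `RETENTION_DAYS` are hypothetical placeholders, not legal guidance.

```python
import re
from datetime import date, timedelta

# Assumption: simplified patterns standing in for an AI-based detector.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical retention periods in days; real values depend on the
# regulations and policies that apply to the organization.
RETENTION_DAYS = {"personal_data": 365, "general": 7 * 365}


def label_document(text, received):
    """Detect personal information and attach a retention label."""
    found = sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))
    category = "personal_data" if found else "general"
    return {
        "sensitive": bool(found),
        "pii_types": found,
        "retention_category": category,
        "dispose_after": received + timedelta(days=RETENTION_DAYS[category]),
    }


record = label_document(
    "Contact jane.doe@example.com about SSN 123-45-6789.",
    date(2022, 10, 19),
)
print(record["retention_category"], record["dispose_after"])
```

Keeping the retention periods in a single table is a deliberate choice in this sketch: when a regulation changes, the organization’s experts adjust one value instead of relabeling documents by hand.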

This type of automation not only keeps organizations compliant, it also enables legal and data governance teams to support their organization’s growth—whether it’s through new products, services, or acquisitions—while keeping data risk at bay. 


Advancements in AI are providing more precise and sophisticated solutions for the unremitting growth in data, if you know how to use them. For legal, data governance, and compliance teams, there are substantial opportunities to harness the robust creativity in AI to better manage, understand, and deploy data. Rather than being inhibited by endless data volumes and inflexible systems, these teams can use AI to put their expertise to work and ultimately do better at the work that matters.

About the Author

Mitch Montoya

Mitch is a Content Marketing Manager at Lighthouse whose focus is connecting industry leaders, clients, and communities to the stories and solutions that impact them most. At Lighthouse, Mitch writes and develops stories highlighting the advancements in artificial intelligence, big data, and information governance. He also brings together industry leaders for thoughtful conversations on the legal technology revolution as the producer of Lighthouse’s podcast, Law & Candor. Prior to Lighthouse, he was a Thought Leadership Marketing Manager at H5, specializing in creative storytelling, brand and messaging development, and content and digital strategy. Mitch started his career as a journalist and earned a Master of Science in Journalism from Northwestern University and a Bachelor of Arts in English Language and Literature from the University of Chicago.