Primary Research: Modification of Modern Attachments in Google Post-Send

June 30, 2026

By:

Josh Headley

The Challenge

An attorney reviewing an important Gmail message notices that even though the message was sent two years ago, one of the linked attachments was modified just a couple of months ago. How long after the message was sent did the referenced file change? What exactly was altered, and does it have any bearing on her client’s case? Does the version in hand still have meaningful probative value?

The e-discovery industry has grappled with “modern attachments”—items that present only as a reference or link to content stored outside the email system—for quite some time. Some are generally static: photographs, training videos, AI-generated images, PDF files, ZIP containers. Others are purpose-built for collaboration, with multiple users contributing content over time, leaving comments, and resolving tasks. Collection, review, and production of this dynamic information can be a moving target.

Our Study

We set out to gather empirical evidence to evaluate the true scope of the situation. Although this topic has many layers—each with important and often subjective legal implications—we focused narrowly on two questions:

  • What is the ratio of non-editable modern attachments to those that are inherently dynamic and collaborative? In other words, how many of these linked items actually have the ability to evolve in place over time?
  • Of the editable modern attachments, what is the frequency and prevalence of modification after a communication containing the link has been sent?

Helpful Metadata

The following data elements can be acquired or computed using output from Google Vault and a variety of other acquisition toolkits.

Gmail Sent Date/Time: the UTC timestamp of when the message was delivered to Google’s cloud servers for routing to recipients.

GDrive Modified Date/Time: the UTC timestamp of the most recent modification to the hyperlinked Google Drive item at the time of collection. The collection event may have taken place hours, days, or many years after the original communication. A change in the Modified Date of an editable linked item after the communication was transmitted is the marker of post-transmittal modification for the purposes of this study.

GDrive Item Type: to separate static GDrive items from editable ones, we built an inventory of document types considered collaborative and editable for the purposes of the study. A JPEG is an image unlikely to be edited over time; a Google Sheets file may remain static but also welcomes in-place edits throughout its life.

Gmail Age at Collection (Days): the number of days elapsed between when the Gmail message was sent and when the collection event took place. To allow sufficient time for hyperlinked items to potentially undergo modification, we excluded messages where this figure was less than 180 days (approximately six months). This threshold is adjustable for future studies.
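The four data elements above can be combined into a simple per-link classification. The following is a minimal sketch, not the study's actual tooling: the `LinkedItem` record, the `classify` function, and the contents of `EDITABLE_MIME_TYPES` are assumptions made for illustration (the study's real inventory of editable types is not listed in this article). Only the 180-day threshold and the "Modified Date later than Sent Date" test come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed stand-in for the study's inventory of collaborative, editable types.
EDITABLE_MIME_TYPES = {
    "application/vnd.google-apps.document",
    "application/vnd.google-apps.spreadsheet",
    "application/vnd.google-apps.presentation",
}

MIN_AGE_DAYS = 180  # exclusion threshold described in the study; adjustable


@dataclass
class LinkedItem:
    """Hypothetical record for one Gmail-to-GDrive hyperlink (all times UTC)."""
    gmail_sent: datetime
    drive_modified: datetime
    collected: datetime
    mime_type: str


def age_at_collection_days(item: LinkedItem) -> int:
    """Whole days between the Gmail sent time and the collection event."""
    return (item.collected - item.gmail_sent).days


def classify(item: LinkedItem) -> str:
    """Bucket a link using the metadata elements described above."""
    if age_at_collection_days(item) < MIN_AGE_DAYS:
        return "excluded"  # too recent to have had a fair chance to change
    if item.mime_type not in EDITABLE_MIME_TYPES:
        return "static"
    if item.drive_modified > item.gmail_sent:
        return "modified_after_send"
    return "unmodified"


utc = timezone.utc
example = LinkedItem(
    gmail_sent=datetime(2024, 1, 1, tzinfo=utc),
    drive_modified=datetime(2025, 11, 1, tzinfo=utc),
    collected=datetime(2026, 1, 1, tzinfo=utc),
    mime_type="application/vnd.google-apps.document",
)
print(age_at_collection_days(example), classify(example))
```

Keeping the threshold and the type inventory as module-level constants makes it easy to rerun the same classification with different assumptions in a later study.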

Findings

Using 19 data sets spanning industries and organization sizes, we examined 271,145 hyperlinks to GDrive items from within Gmail messages. Of those, 150,321 links (55%) pointed to items considered collaborative and editable in place, while the remaining 45% pointed to typically non-editable binary files such as images and video. Of the editable corpus, 90,574—60%—were modified in some manner after the Gmail message was sent and prior to the collection event.
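For readers who want to check the arithmetic, the short sketch below recomputes the percentages from the three counts reported above. The variable names are illustrative; the static-item count and the share of all links modified after send are derived here from the reported figures and are not stated separately in the study.

```python
total_links = 271_145          # hyperlinks examined
editable_links = 150_321       # collaborative, editable in place
modified_after_send = 90_574   # editable items modified post-send

static_links = total_links - editable_links  # derived: 120,824

editable_share = editable_links / total_links
static_share = static_links / total_links
modified_share_of_editable = modified_after_send / editable_links
modified_share_of_all = modified_after_send / total_links  # derived

print(f"editable: {editable_share:.1%}")                          # 55.4%
print(f"static: {static_share:.1%}")                              # 44.6%
print(f"modified, of editable: {modified_share_of_editable:.1%}")  # 60.3%
print(f"modified, of all links: {modified_share_of_all:.1%}")      # 33.4%
```

Put differently, roughly one in three of all hyperlinks examined pointed to an editable item whose Modified Date changed after the message was sent.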

Caveat Emptor

The primary limiting factor in this study is the reliance on the Modified Date of hyperlinked GDrive items as the barometer for substantive change to document content. In Google Workspace, this metadata field is more volatile than it is on a Windows-centric file system such as NTFS or FAT. Simply opening a Google Doc and pressing the space bar causes the item to auto-save and update the Modified Date. Other triggering actions include making or editing comments, resolving comments, updating permissions, and renaming the file.

We recognize this is an imperfect metric for evaluating substantive content changes. Due to the volatility of the Modified Date, our tally of modified items necessarily overstates the true frequency of post-transmittal content modification.

A second compounding factor: Google Vault follows links anywhere in a Gmail thread, including replied-to body text and forwarded content buried months or years deep. A link from an old message in a long thread gets re-collected at acquisition time, giving it more calendar time to accumulate Modified Date changes without any actual content alteration. This also causes our post-transmittal modification tally to be overstated.

Additional Parameters and Implications

At the time of publication, Google Vault does not include hyperlinked GDrive items from other potential sources within the Google ecosystem, such as Calendar, GChat, or other Google Docs files. The source data for this initial study therefore focused only on Gmail. It remains unclear whether the type of communication providing the links has a material effect on the resulting metrics.

Improvements for Future Studies

[Figure: Google Drive Audit Log, sample events for a Google Doc]

Future studies should seek to move beyond reliance on GDrive Modified Dates as indicators of substantive content modification. Hash values offer little additional help: the server-side hash values provided by Vault include the Modified Date in the hash computation, further reducing visibility. The following data points and workflows may prove useful:

  • Number of post-transmittal revisions. Fifty subsequent revisions over three months may suggest substantive change; five revisions very soon after transmission may indicate noise such as comment resolution.
  • Number of different contributors. If ten different accounts made post-transmittal changes, this may indicate meaningful collaborative activity, as opposed to a single actor refining a draft.
  • File size. Generally unreliable except for drastic changes—for example, an item that grew from 24KB to 240KB—and even then the change could be entirely attributable to commenting activity.
  • Acquisition of all versions of each hyperlinked item. This would yield a rich dataset but is impractical for most routine e-discovery collections due to data volume, time, and cost. AI may be able to assist with evaluating content changes across versions if provided a thoughtful prompt.
  • Expansion of source communication types. This study focused on Gmail, but many other sources warrant consideration: Google Calendar and Groups, GChat and in-meeting messaging, Slack channels and direct messages, text messages, WhatsApp, Telegram, Discord, Signal, and others.
  • Google Drive Audit Log. The audit log may provide the clearest window into whether a document’s content—as distinct from its metadata, comments, or permissions—was actually changed. It records events with enough granularity to distinguish views, comments, renames, permission changes, and true edits. Filtering to “edit” events alone would be a meaningful improvement over relying on Modified Date. Key challenges include:
    • Default log retention of only six months, though events can be ported continuously to SIEM systems such as Google Security Operations or Splunk.
    • Potentially tens of thousands of entries per GDrive item, requiring targeted filtering at collection time.
    • High-level tenant access requirements, particularly for pulling data via the Reports API and/or Drive Activity API.
    • Unavailability of equivalent log data for personal Google accounts not part of a Workspace tenant.
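Several of the ideas above (counting post-transmittal revisions, counting distinct contributors, and filtering audit log entries to true edits) can be sketched together. This is a hedged illustration only: the event records are hypothetical dictionaries with assumed keys `event`, `actor`, and `time`, and the event labels are placeholders rather than a confirmed audit log schema. A real export would need to be mapped into this shape first.

```python
from datetime import datetime, timezone


def summarize_post_send_edits(events, sent_at):
    """Count 'edit' events after sent_at and the distinct accounts behind them.

    `events` is an iterable of dicts with assumed keys 'event', 'actor',
    and 'time' (timezone-aware datetime). Non-edit events are ignored.
    """
    edits = sorted(
        (e for e in events if e["event"] == "edit" and e["time"] > sent_at),
        key=lambda e: e["time"],
    )
    return {
        "edit_count": len(edits),
        "contributors": len({e["actor"] for e in edits}),
        "first_edit": edits[0]["time"] if edits else None,
        "last_edit": edits[-1]["time"] if edits else None,
    }


utc = timezone.utc
sent_at = datetime(2024, 3, 1, 15, 0, tzinfo=utc)

# Hypothetical sample entries for a single Google Doc.
sample_events = [
    {"event": "edit", "actor": "a@example.com", "time": datetime(2024, 2, 28, 9, 0, tzinfo=utc)},
    {"event": "view", "actor": "b@example.com", "time": datetime(2024, 3, 2, 10, 0, tzinfo=utc)},
    {"event": "edit", "actor": "a@example.com", "time": datetime(2024, 3, 5, 11, 0, tzinfo=utc)},
    {"event": "rename", "actor": "a@example.com", "time": datetime(2024, 4, 1, 8, 0, tzinfo=utc)},
    {"event": "edit", "actor": "b@example.com", "time": datetime(2024, 6, 10, 14, 0, tzinfo=utc)},
]

summary = summarize_post_send_edits(sample_events, sent_at)
print(summary["edit_count"], summary["contributors"])
```

In this sample, the view and rename entries would each be capable of moving a Modified Date or otherwise adding noise, yet only the two edits after the send time are counted. The time span between `first_edit` and `last_edit` could then feed the kind of "fifty revisions over three months" versus "five revisions shortly after sending" judgment described in the first bullet.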

Final Thoughts

The data points leveraged in this initial study are admittedly imperfect, but more precise metrics are coming within reach as our industry’s tools and collective knowledge about collaboration platforms improve. We hope this inaugural research leads to further discoveries that better inform discovery practitioners and the courts.

Tools & Further Reading

Lighthouse’s Linked Files Solution for Google Workspace

Lighthouse’s Modern Data Solutions

Craig Ball, “A Dog and Its Tail: Don’t Let Version Uncertainty Cloud Linked Attachment Production”

Metaspike’s Forensic Email Collector

About the Author

Josh Headley

In his role as Manager of Digital Forensics, Josh provides a range of technology services including forensic evidence collection, theft and misappropriation investigations, and departing employee red flag reporting. With over 12 years of ediscovery experience in Big Law and service provider environments, he is uniquely positioned to approach digital evidence from a legal workflow perspective. Prior to this role, Josh held positions in litigation support, solutions architecture, and sales engineering. Skilled at guiding the development of software applications and utilities to streamline electronic discovery operations, he also directed a large data processing team as well as an application development group at a leading ediscovery services provider.