Like most cloud-based productivity platforms, Google offers solutions for both home and business environments. Applications that are free for personal use, such as Gmail, Google Docs, and Google Drive, deliver a rich set of communication and Office-like functionality that has near feature parity with their commercial, corporate-focused G Suite counterparts. From the perspective of evidence acquisition in the civil arena, we find a significant number of organizations bypassing the conventional Microsoft stack in favor of G Suite. These organizations tend to operate in the technology space, including biotech, electronics, engineering, and all flavors of “garage” startups.
While cloud platforms enable a limitless world of collaboration and information storage, they also introduce an alternative set of metadata that can trip up seasoned examiners and eDiscovery practitioners. This can be particularly problematic for metadata dates. Historically, determining the date of a file that moved between computers has been quite simple; however, arriving at the “best” date for any given piece of cloud evidence can be a subjective exercise, and it is limited to the metadata exposed, and potentially altered, by the cloud platform. In the following post, I’ll dive into how this issue arises so that practitioners and analysts can use the most accurate evidence date for their eDiscovery needs.
A “document” in Google Docs is simply a set of records and field values stored in a database. This departs from the traditional concept of a document being contained in a stand-alone file on your computer’s desktop. Currently, to be reviewed alongside traditional ESI, a Google Doc (e.g., a spreadsheet or presentation) must be pulled from Google’s database, converted into a traditional document file, and downloaded for processing and review.
Thus, the handling of dates can become an issue for documents within G Suite. If a Microsoft (MS) Excel document is created by a user on their laptop, uploaded to Google Drive, edited in place, and then later downloaded for eDiscovery purposes, what is the document’s date? A typical MS Office (Excel, Word, PowerPoint, etc.) document has three dates assigned by the file system (think: my laptop’s hard drive): Created, Modified, and Accessed. It also has up to three dates “embedded” inside the file itself: Created, Modified, and Last Printed. What happens when the Excel file makes a round trip to Google and back? With so many dates to choose from, it’s tough to pick just one!
Before the upload to Google Drive, here are the file system dates for our MS Excel document. Notice that the file system is telling us the document was created on June 30, 2020, at 11:33 AM.
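For readers who would rather capture these file system values with a script than from a properties dialog, here is a minimal Python sketch using the standard library’s os.stat. The function name file_system_dates is my own, and there is one assumption to call out: a true creation time (st_birthtime) is only exposed on some platforms, and st_ctime means “created” on Windows but “metadata last changed” on most Unix systems, so the sketch labels the fallback accordingly.

```python
import os
from datetime import datetime, timezone


def file_system_dates(path):
    """Return the file system timestamps for `path` as UTC datetimes.

    `file_system_dates` is an illustrative helper name, not part of any tool.
    """
    st = os.stat(path)

    def to_utc(epoch_seconds):
        return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

    dates = {
        "modified": to_utc(st.st_mtime),
        "accessed": to_utc(st.st_atime),
    }

    # st_birthtime exists only on some platforms (an assumption to verify
    # on the examiner's own machine before relying on it).
    birth = getattr(st, "st_birthtime", None)
    if birth is not None:
        dates["created"] = to_utc(birth)
    else:
        # On most Unix systems st_ctime is the metadata-change time,
        # not the creation time, so it is labeled honestly here.
        dates["created_or_changed"] = to_utc(st.st_ctime)
    return dates
```

Recording these values before any upload or download gives you a baseline to compare against later in the round trip.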
And here are the embedded “application” dates. Note that “Date last saved” is essentially a “modified” date, and this document has not yet been printed. By looking at the application-level dates, we can also tell that the file was actually created at 11:04 AM, and then copied to its present location at 11:33 AM.
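The embedded dates can be read without opening Excel. A modern .xlsx file is a ZIP package, and the application-level Created, Modified, and Last Printed values conventionally live in a part named docProps/core.xml. The sketch below is one way to pull them out with Python’s standard library; the helper name embedded_dates is hypothetical, and it assumes the conventional part name rather than resolving it through the package relationships, so treat it as a starting point rather than a forensic-grade parser.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dcterms": "http://purl.org/dc/terms/",
}


def embedded_dates(source):
    """Return the embedded dates from an .xlsx/.docx/.pptx file.

    `source` may be a path or a binary file-like object. Values are returned
    as the raw strings stored in the file, or None when a value is absent.
    """
    empty = {"created": None, "modified": None, "last_printed": None}
    with zipfile.ZipFile(source) as package:
        try:
            raw = package.read("docProps/core.xml")  # assumed conventional location
        except KeyError:
            return empty

    root = ET.fromstring(raw)

    def text_of(tag):
        node = root.find(tag, NS)
        return node.text if node is not None else None

    return {
        "created": text_of("dcterms:created"),
        "modified": text_of("dcterms:modified"),      # shown as "Date last saved"
        "last_printed": text_of("cp:lastPrinted"),
    }
```

Returning the raw strings rather than parsed dates keeps the sketch from silently discarding a value it cannot interpret.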
After uploading to Google Drive, Google will assign its own Created and Modified dates to the item. You’ll notice in the graphic below that Google’s displayed Modified date of June 30 at 1:36 PM matches the Modified date of the original file. So far so good! But take a look at Google’s recording of the Created date: it’s been set by Google to simply “11:23 AM” on the date of the upload action (July 10, 2020). Notice also that Google indicates the document was created “with Google Drive Web.”
Now, let’s make an edit to the Excel file. There are two ways to accomplish this in Google Drive: 1) you can edit the document “in place” using Google Docs without abandoning the original MS Excel format, or 2) you can do a “Save As” and convert the document into Google Sheets format. In this example, we are going to use method #1 and make a couple of edits to our MS Excel file. Google Docs immediately auto saves the file for us. Let’s look at the dates.
After editing in Google Drive, but leaving as Excel format, you’ll notice in the graphic below that Google’s Modified date has been changed to the time of the edit. This makes sense. The Created date, which Google previously set to the time of upload, remains the same.
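If you collect item metadata through the Google Drive API rather than the web interface, the same two dates come back as the createdTime and modifiedTime fields, formatted as RFC 3339 strings. The sketch below does not call the API; it only shows how such values might be compared against the embedded creation date to confirm that Google’s Created date reflects the upload rather than the original authoring. The function names are mine, and the sample timestamps are illustrative only: the article’s screenshots show local times, and the UTC offsets here are assumed.

```python
from datetime import datetime


def parse_rfc3339(value):
    """Parse an RFC 3339 timestamp such as '2020-07-10T18:23:00.000Z'."""
    # Older Python versions reject a trailing 'Z', so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def drive_created_postdates_authoring(drive_metadata, embedded_created):
    """Return True when Drive's createdTime is later than the embedded date.

    A True result is consistent with Drive's Created date recording the
    upload action rather than when the document was first authored.
    """
    drive_created = parse_rfc3339(drive_metadata["createdTime"])
    return drive_created > parse_rfc3339(embedded_created)


# Illustrative values only (offsets assumed, not taken from the example file).
sample_metadata = {
    "createdTime": "2020-07-10T18:23:00.000Z",
    "modifiedTime": "2020-07-10T18:27:00.000Z",
}
print(drive_created_postdates_authoring(sample_metadata, "2020-06-30T18:04:00Z"))  # True
```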
Let’s assume that this record is needed for eDiscovery purposes, and it is downloaded from Google Drive to a forensic examiner’s machine to pass along to the case team. When the file reaches the machine, the creation of the new file results in the following file system date values. Notice that they’ve all been changed to the date/time of the download action!
However, if we take a look inside the Excel file at the embedded “application” dates, we notice that the creation date of 6/30/2020 at 11:04 AM has remained unaltered throughout this entire process. The “Date last saved,” on the other hand, reflects the time of the download action. We might have expected this date to be set to 11:27 AM, the time at which the document was edited in Google Drive, but it is unfortunately altered by the download action. The image on the right shows the “Info” tab from MS Excel itself, which indicates a blank value for “Last Modified.”
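One practical takeaway from this round trip is that any date sitting at, or very near, the moment of download deserves suspicion. As a rough sketch of how that check might be automated, the Python below flags candidate dates that fall within a tolerance of a known download time. The function name, the two-minute tolerance, and the idea of treating proximity as a warning sign are all my own assumptions; a flag is a prompt for closer review, not a conclusion.

```python
from datetime import datetime, timedelta, timezone


def flag_download_artifacts(candidates, download_time, tolerance=timedelta(minutes=2)):
    """Flag candidate dates that fall within `tolerance` of the download time.

    `candidates` maps a label (for example "fs_created") to a timezone-aware
    datetime, or to None when the value is missing. Returns a dict mapping
    each label to True (likely reset by the download) or False.
    """
    flags = {}
    for label, value in candidates.items():
        if value is None:
            flags[label] = False  # nothing to flag; the date is simply absent
        else:
            flags[label] = abs(value - download_time) <= tolerance
    return flags


# Hypothetical values for illustration; the download time is not from the article.
download = datetime(2020, 7, 10, 18, 40, tzinfo=timezone.utc)
example = {
    "fs_created": download,
    "fs_modified": download,
    "embedded_created": datetime(2020, 6, 30, 18, 4, tzinfo=timezone.utc),
    "embedded_last_saved": download + timedelta(seconds=5),
}
for label, suspicious in flag_download_artifacts(example, download).items():
    print(label, "likely reset by download" if suspicious else "not near download time")
```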
Using the same Excel file, I will now choose to “Save as Google Sheets”.
You’ll notice that the creation and modification timestamps in the graphic below have been set to the time at which the MS Excel file was converted to a Google Sheet. Google also indicates the application that created the document was “Google Sheets.”
I made a couple of edits to the file in Google Sheets and then right-clicked to download it to my workstation. First, Google converts the file from Google Sheets format into MS Excel format.
Let’s see what dates we get! Here are the file system dates on the machine that downloaded the document. You’ll notice that all of them are set to the date and time of the download action, which is inconvenient for eDiscovery review to say the least.
Unfortunately, the embedded application-level dates have also been altered:
And, inside Excel, the application’s “Info” tab shows only empty values. The conversion of the MS Excel file into Google Sheets format resulted in the loss of the application-level dates. Currently, a Google Sheets document created from scratch will not have any useful application-level dates for us to leverage in downstream eDiscovery processes.
The examples above only cover a personal Google Drive account accessed via the Chrome web browser. However, users can also access Google Cloud resources via Apple and Android mobile devices, a “sync” folder on their computer, and countless third-party applications that leverage Google’s API (application programming interface). Evidence extraction can occur using a simple “right-click-download” method, proprietary applications for forensic examiners, Google’s Takeout interface for consumers, and Google’s Vault archiving system for enterprise customers. Each combination of the end user’s application and the evidence extraction method can result in a unique series of changes to the evidence dates.
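Because each pairing can behave differently, it may help to enumerate the combinations up front and test them one at a time. The short Python sketch below builds that checklist from the access and extraction methods named above. The list contents are simply my paraphrase of this paragraph, and build_scenario_matrix is a hypothetical helper, not a feature of any forensic tool.

```python
from itertools import product

# How the end user touched the document (paraphrased from the paragraph above).
ACCESS_METHODS = [
    "Chrome web browser",
    "Apple mobile device",
    "Android mobile device",
    "sync folder",
    "third-party API application",
]

# How the examiner pulled the evidence out.
EXTRACTION_METHODS = [
    "right-click download",
    "forensic collection application",
    "Google Takeout",
    "Google Vault",
]


def build_scenario_matrix(access_methods=ACCESS_METHODS, extraction_methods=EXTRACTION_METHODS):
    """Return every access/extraction pairing as a list of dicts to test."""
    return [
        {"access": access, "extraction": extraction}
        for access, extraction in product(access_methods, extraction_methods)
    ]


scenarios = build_scenario_matrix()
print(len(scenarios), "scenarios to validate")
```

Recording the observed date behavior against each row turns a one-off experiment into a reference you can reuse on the next matter.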
About the Author
Manager, Forensics | Josh Headley is a member of Lighthouse's Advisory group. In his role as Manager of Digital Forensics, Josh provides a range of technology services including forensic evidence collection, theft and misappropriation investigations, and departing employee red flag reporting. With over 12 years of ediscovery experience in Big Law and service provider environments, he is uniquely positioned to approach digital evidence from a legal workflow perspective. Prior to this role, Josh held positions in litigation support, solutions architecture, and sales engineering. Skilled at guiding the development of software applications and utilities to streamline electronic discovery operations, he also directed a large data processing team as well as an application development group at a leading ediscovery services provider.