Nothing can derail an eDiscovery project like a poorly planned email collection, and one of the most common roadblocks we see is searching live email environments before collection. Running live searches across email systems causes several problems, including: loss of the opportunity to sample and test search terms; inability to search non-searchable messages and attachments; difficulty identifying custodians; breaks in chain of custody for search exports; and the lack of a preservation copy for future use, to name just a few.

Rather than performing a live search in existing email environments, D4 recommends a forensic collection for each custodian. Such a collection properly preserves the data set. In addition to the preservation copy, a forensic collection provides a second copy of the data that can be ingested into an eDiscovery tool. Using an eDiscovery tool is beneficial because it is designed for indexing large amounts of data, searching non-searchable data, sampling and testing search terms, using analytics to expand search terms, and deduplicating data while tracking the custodians of the deduplicated files.

Take these nine steps for your most defensible email collection protocol yet:

1. Extract Full PST/NSF Files from Live Email Environments

When working with a live email environment, D4 recommends that you export each custodian’s mailbox to a PST (Microsoft Exchange/Outlook) or NSF (Lotus Notes/Domino) file rather than searching within Exchange.
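Because chain of custody is one of the concerns raised above, it helps to record a hash of each export as soon as it is created. The Python sketch below assumes the PST or NSF file has already been written to disk by your email platform’s own export tool; `custody_record` and its fields are hypothetical, not part of any product.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large PST/NSF exports need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(custodian: str, export_path: Path) -> dict:
    """Build a hypothetical chain-of-custody log entry for one mailbox export."""
    return {
        "custodian": custodian,
        "file": export_path.name,
        "size_bytes": export_path.stat().st_size,
        "sha256": sha256_of_file(export_path),
    }
```

Re-hashing the file later and comparing digests shows whether it changed between collection and processing.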

2. Import Custodian PST/NSFs to an eDiscovery Tool

Most eDiscovery tools accept PST and NSF files for ingestion. At import, the custodian is assigned to every file within the mail store (PST or NSF). This lets the tool track each file as it is deduplicated, preserving a record of every custodian who held it.
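The custodian tracking described above can be illustrated with a minimal Python sketch of global deduplication. It assumes each item’s content has already been normalized into bytes for hashing; real tools typically hash selected metadata fields for email, so the `Item` and `DedupedItem` types and the hashing choice here are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Item:
    custodian: str
    content: bytes  # hypothetical: normalized message bytes used for hashing


@dataclass
class DedupedItem:
    digest: str
    content: bytes
    custodians: list[str] = field(default_factory=list)


def global_dedupe(items: list[Item]) -> list[DedupedItem]:
    """Keep one copy per hash while recording every custodian who held it."""
    unique: dict[str, DedupedItem] = {}
    for item in items:
        digest = hashlib.sha256(item.content).hexdigest()
        if digest not in unique:
            # First time this content is seen: it becomes the kept copy.
            unique[digest] = DedupedItem(digest, item.content)
        record = unique[digest]
        # Duplicates are suppressed, but their custodian is still recorded.
        if item.custodian not in record.custodians:
            record.custodians.append(item.custodian)
    return list(unique.values())
```

Only one copy of a message goes to review, yet a report of “all custodians” for that message remains available.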

3. Resolve Errors and Exceptions

While ingesting files, most eDiscovery tools alert the user to files with exceptions due to encryption, corruption, unrecognized file types, extraction errors, and similar problems. Exceptions should be resolved where possible (for example, by supplying passwords for encrypted files), and any that cannot be resolved are reported as non-processed documents.
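As a rough illustration of the exception workflow, the Python sketch below separates unresolved exceptions from everything else and tallies them by type for a non-processed report. The `IngestResult` record and its fields are hypothetical; each tool exposes its own exception log.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class IngestResult:
    file_name: str
    custodian: str
    error: Optional[str] = None  # e.g. "encrypted", "corrupt"; None if processed cleanly
    resolved: bool = False       # True once the exception was remediated


def non_processed_report(results: list[IngestResult]):
    """Return unresolved exceptions plus a tally by exception type."""
    unresolved = [r for r in results if r.error is not None and not r.resolved]
    tally = Counter(r.error for r in unresolved)
    return unresolved, tally
```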

4. OCR All Non-Text Messages

Certain attachments and messages (for example, scanned PDFs and image files) may have no searchable text after import. Running Optical Character Recognition (OCR) software across these files extracts their text so they can be indexed and searched.
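Before OCR can run, the non-searchable files have to be identified. A minimal Python sketch, assuming a document counts as non-searchable when its extracted text falls below a character cutoff; both the `Doc` type and the `MIN_CHARS` value are hypothetical and should be tuned per matter.

```python
from dataclasses import dataclass

MIN_CHARS = 20  # hypothetical cutoff for "no meaningful text"


@dataclass
class Doc:
    doc_id: str
    extracted_text: str


def needs_ocr(doc: Doc, min_chars: int = MIN_CHARS) -> bool:
    """Flag documents whose extracted text is empty or nearly empty."""
    return len(doc.extracted_text.strip()) < min_chars


def ocr_queue(docs: list[Doc]) -> list[str]:
    """IDs of documents to send through OCR before indexing."""
    return [d.doc_id for d in docs if needs_ocr(d)]
```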

5. Identify and Analyze Search Terms

An eDiscovery expert should review search terms to ensure they work as intended in the specific tool where they will be applied. This analysis should:

  • a) Identify “stop words”: common words and punctuation excluded from the index, which most search engines ignore. Examples include about, but, the, to, from, and you. Search term analysis should be run before indexing so that the stop word list or punctuation list can be adjusted if needed.
  • b) Ensure the search syntax conforms to that of the search engine.
  • c) Check proper names closely to make sure nicknames and formatting variations are captured.
  • d) Confirm email addresses are properly formatted.
  • e) Avoid number strings, leading wildcards, and single letters as much as possible, since these terms seldom yield the desired result.
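To make these checks concrete, here is a minimal Python sketch of a pre-flight “lint” pass over individual search terms. The stop-word list, the regular expressions, and the `lint_term` function are hypothetical illustrations; the real stop-word list and syntax rules must come from the tool that will run the search.

```python
import re

# Hypothetical stop-word list; confirm the list your tool's index actually uses.
STOP_WORDS = {"about", "but", "the", "to", "from", "you"}
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")


def lint_term(term: str) -> list[str]:
    """Return warnings for a single search term (hypothetical rules)."""
    warnings = []
    # b) basic syntax: quotes and parentheses must be balanced
    if term.count('"') % 2:
        warnings.append("unbalanced quotes")
    if term.count("(") != term.count(")"):
        warnings.append("unbalanced parentheses")
    for tok in re.findall(r'[^\s"()]+', term):
        if tok.lower() in STOP_WORDS:
            warnings.append(f"stop word: {tok}")            # a)
        if tok.startswith("*"):
            warnings.append(f"leading wildcard: {tok}")     # e)
        if tok.isdigit():
            warnings.append(f"number string: {tok}")       # e)
        if len(tok) == 1 and tok.isalpha():
            warnings.append(f"single letter: {tok}")       # e)
        if "@" in tok and not EMAIL_RE.match(tok):
            warnings.append(f"malformed email: {tok}")     # d)
    return warnings
```

Check c), nicknames and name variants, still needs human judgment; a script can only flag what it is told to look for.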

6. Index All Data Including Messages

Once all files have searchable text, the eDiscovery tool builds an index of every word in every file in the database, excluding stop words. Searches are then run against this index.
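The index described above is an inverted index: a map from each word to the documents containing it. A minimal Python sketch, assuming a simple lowercase alphanumeric tokenizer (commercial tools use far more elaborate tokenization); it also shows why a stop word can never produce a hit.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str], stop_words=frozenset()) -> dict[str, set[str]]:
    """Map each indexed word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token not in stop_words:  # stop words never enter the index
                index[token].add(doc_id)
    return index


def search_all(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Documents containing every term (a simple AND query)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()
```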

7. Report Results

Once the searches have been run, search term reports should be reviewed for anomalies. Terms returning few or no results should be checked for syntax and formatting to confirm they are working as expected (refer to step 5). Terms returning large hit counts should be reviewed with the intent of applying limiters, such as proximity connectors or additional qualifying terms.
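A search term report with anomaly flags can be sketched in a few lines of Python. The thresholds below (fewer than 5 hits, or more than 10% of the collection) are hypothetical defaults for illustration, not an industry standard; pick values that fit the matter.

```python
def search_term_report(hits: dict[str, int], total_docs: int,
                       low: int = 5, high_ratio: float = 0.10):
    """Sort terms by hit count and flag likely anomalies (hypothetical thresholds)."""
    rows = []
    for term, count in sorted(hits.items(), key=lambda kv: kv[1]):
        if count < low:
            flag = "check syntax/formatting"   # few or no hits
        elif count > high_ratio * total_docs:
            flag = "consider limiters"         # overly broad term
        else:
            flag = ""
        rows.append((term, count, flag))
    return rows
```

For example, a misspelled term such as `projct x` with zero hits sorts to the top of the report and is flagged for a syntax check.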

8. Sample Results

Certain eDiscovery tools can generate random sample sets from search results. Samples can be reviewed to confirm the efficacy of search results and weed out false positives.
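If your tool lacks built-in sampling, the idea can be sketched in Python. The sample size uses the common estimate n = z²·p(1−p)/e², which gives 385 documents at 95% confidence (z = 1.96) with a ±5% margin and p = 0.5; a finite-population correction would shrink that number for small result sets, and it is omitted here. Recording the random seed is an assumption on our part about good practice: it lets the same sample be regenerated later if the process is challenged. The function names are hypothetical.

```python
import math
import random


def sample_size(z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Sample-size estimate n = z^2 * p(1-p) / e^2, without finite-population correction."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def draw_sample(doc_ids: list[str], n: int, seed: int) -> list[str]:
    """Reproducible simple random sample of search hits (same seed, same sample)."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, min(n, len(doc_ids)))
```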

9. Refine Search Terms

After running the first iteration of the terms and having reviewed the results and sample sets, a refined set of terms can be re-run across the data set.

The above protocol is best practice for working with live email environments, but not all organizations have the same data retention policies, and not all custodians store their email in the corporate email environment. In fact, many organizations encourage their employees to archive email on local shares or on their laptops. Searches run across the live environment often do not account for these files.

For these reasons, D4 recommends that, in addition to the forensic collection of the Exchange environment, firms also conduct custodian interviews. These interviews have two primary goals: to reach every employee connected to the matter, and to identify every location where each of them may store data.

With these basic principles in place, your collection will set up the rest of the eDiscovery project for success, with no train wrecks around the next bend.