Deduplication. What a headache. Time and time again, we encounter identical records that refuse to deduplicate. Contributing factors include email migration, the format in which custodians store their emails, file manipulation, and collection from multiple email clients. Did you know that emails collected from mobile devices usually include images and icons as separate attachments, whereas the original email collected from the Exchange server keeps them together? Stubborn.

The hashing process

The process for hashing emails is different from the process used to hash loose files and attachments. For emails, the system looks at various metadata fields, such as To, From, CC, Subject, and Sent Date, and combines their values into a single string. That string is then run through a hashing algorithm, so any variation in those fields results in a different hash value. Non-emails are more straightforward: the system hashes the file in its entirety, creating a unique fingerprint of its content.
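To make the difference concrete, here is a minimal sketch of both approaches. The field list, the separator character, and the choice of SHA-256 are assumptions for illustration only; review platforms choose their own fields, normalization rules, and algorithms. The example shows how a single cosmetic change to one field (here, a hypothetical trailing space added during migration) is enough to produce a different email hash.

```python
import hashlib

# Hypothetical field list: each platform chooses and normalizes its own fields.
EMAIL_HASH_FIELDS = ("To", "From", "CC", "Subject", "SentDate")


def email_hash(metadata: dict) -> str:
    """Concatenate selected metadata values into one string, then hash it."""
    # A separator unlikely to appear in the values keeps ("ab", "c")
    # distinct from ("a", "bc") after concatenation.
    combined = "\x1f".join(metadata.get(f, "") for f in EMAIL_HASH_FIELDS)
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


def file_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a loose file or attachment over its entire binary content."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


original = {
    "To": "a@example.com",
    "From": "b@example.com",
    "CC": "",
    "Subject": "Q3 budget",
    "SentDate": "2020-01-15T09:30:00Z",
}
# Same message, but (hypothetically) a migration step appended a space to Subject.
migrated = dict(original, Subject="Q3 budget ")
print(email_hash(original) == email_hash(migrated))  # False
```

The file hash ignores metadata entirely, which is why two copies of the same attachment usually deduplicate cleanly while two copies of the same email may not.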

Consider metadata matching

You might have tried email threading or near deduplication, but what about metadata matching? The EQ Data Intelligence Group has developed a workflow in which a select number of metadata fields are compared to identify matching emails. Each selected field is compared across emails, and matches are flagged with specific values. One of the key fields is the internet message ID, a unique identifier created by the email client when the message is sent. Because every copy of a message carries the same ID, emails collected from two different sources can match even when their hash values differ. Once the matching process is complete, an EQ Data Intelligence Group Consultant prepares a set of match families for sampling to confirm that metadata matching is yielding the desired results. If it is, the matched emails can be removed from the review population, significantly reducing the overall volume that needs to undergo review.
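The general idea can be sketched as grouping emails by internet message ID and then comparing a handful of fields within each group. This is not the EQ workflow itself: the field names, the normalization (whitespace collapsing and case-folding), and the flag values `keep`, `match`, and `partial` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical comparison fields; the actual selection is a workflow decision.
MATCH_FIELDS = ("from", "to", "cc", "subject", "sent_date")


@dataclass
class Email:
    doc_id: str
    source: str  # e.g. "Exchange" or "Mobile"
    message_id: str  # internet message ID header value
    metadata: dict = field(default_factory=dict)


def normalize(value: str) -> str:
    """Collapse whitespace and case-fold so cosmetic differences don't block a match."""
    return " ".join(value.split()).casefold()


def metadata_match(emails: list) -> dict:
    """Flag each document: 'keep' for the first copy of a message ID,
    'match' when every MATCH_FIELDS value agrees with that copy,
    'partial' when the message ID matches but some field differs."""
    groups = defaultdict(list)
    for email in emails:
        # Some tools store the ID with angle brackets, others without.
        groups[email.message_id.strip().strip("<>")].append(email)

    flags = {}
    for group in groups.values():
        kept = group[0]
        flags[kept.doc_id] = "keep"
        for other in group[1:]:
            agrees = all(
                normalize(kept.metadata.get(f, "")) == normalize(other.metadata.get(f, ""))
                for f in MATCH_FIELDS
            )
            flags[other.doc_id] = "match" if agrees else "partial"
    return flags


meta = {"from": "b@example.com", "to": "a@example.com",
        "subject": "Q3 budget", "sent_date": "2020-01-15T09:30Z"}
emails = [
    Email("DOC-001", "Exchange", "<abc123@mail.example.com>", meta),
    Email("DOC-002", "Mobile", "abc123@mail.example.com",
          {**meta, "from": "B@example.com", "subject": "Q3 budget "}),
    Email("DOC-003", "Exchange", "<xyz789@mail.example.com>",
          {**meta, "subject": "Re: Q3 budget"}),
]
print(metadata_match(emails))
# {'DOC-001': 'keep', 'DOC-002': 'match', 'DOC-003': 'keep'}
```

Documents flagged `partial` are natural candidates for the sampling step, since they share a message ID but disagree on at least one compared field.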

Pre-Review Analysis: Simple yet effective.

The Pre-Review Analysis, or PRA for short, is a workflow innovation that has been around for quite a while, although it is arguably underutilized. Pre-Review Analysis is not a complex process, and that is the beauty of it.

The benefits of the PRA approach include:

  • Gaining a collaborative understanding of overall review goals
  • Identifying potential exceptions before beginning document-by-document review
  • Mitigating starts and stops in review, which helps avoid expensive delays

So, how does it work? An EQ Data Intelligence Group Consultant starts by conducting a data analysis, looking at file and record types, their percentages of total volume, and so on. The consultant also conducts a date/chronology analysis to identify gaps in time, along with various other types of data analysis.

The next step is the world of analytics: email threading to identify large email threads, near-duplicate analysis to identify large groups of similar documents, case caption concept search, search term analysis, and foreign language analysis, among others.

Once the PRA is complete, the case team meets with the consultant to review the results and collaborate on next steps. These could include tailored workspace dashboards, custom reports based on PRA findings, and review and workflow strategies. The PRA is a useful tool for understanding the collection before review begins. Why not test the water before diving in?
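The first two PRA steps, a file-type breakdown by percentage of volume and a date/chronology gap check, are simple enough to sketch. The function names and the 30-day gap threshold are hypothetical; an actual PRA would run these profiles against the review platform's own fields and thresholds chosen with the case team.

```python
from collections import Counter
from datetime import date


def file_type_breakdown(file_types: list) -> dict:
    """Percentage of total volume per file type, largest share first."""
    counts = Counter(t.lower() for t in file_types)
    total = sum(counts.values())
    return {t: round(100 * n / total, 1) for t, n in counts.most_common()}


def date_gaps(dates: list, min_gap_days: int = 30) -> list:
    """Return (before, after) pairs of consecutive document dates
    that are more than min_gap_days apart."""
    ordered = sorted(set(dates))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if (b - a).days > min_gap_days]


print(file_type_breakdown(["MSG", "msg", "msg", "pdf", "xlsx"]))
# {'msg': 60.0, 'pdf': 20.0, 'xlsx': 20.0}
sent = [date(2019, 1, 5), date(2019, 1, 20), date(2019, 6, 1), date(2019, 6, 3)]
print(date_gaps(sent))
# [(datetime.date(2019, 1, 20), datetime.date(2019, 6, 1))]
```

A gap like the one above does not prove anything is missing, but it is exactly the kind of finding worth raising with the case team before document-by-document review starts.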

Interested in learning more about alternative deduplication methods? We recently hosted a webinar on the topic and published a companion white paper.

Lili Sorondo Rosenberg is a Director of Delivery for EQ, the legal consulting division of Special Counsel. Connect with Lili via LinkedIn or email today!