How do you respond when opposing counsel argues that it is not possible to cull a large, collected data set with potentially responsive data by applying keywords because of the shorthand nature of email communications and text messages? What if this argument is further based on a belief that individuals communicated in cryptic format and would not necessarily use terminology that would be searchable using a traditional keyword list? What if portions of those communications contain foreign language vernacular and vocabulary that would not be apparent to American reviewers and lawyers?
This case study describes how D4 applied a statistical sampling workflow to substantiate D4’s and our client’s confidence that the review conducted was thorough and defensible.
Wrongful Termination Dispute Workflow
Background and Initial Review
D4 forensically collected email data and mobile devices from two defendants in a wrongful termination dispute. All collections and workflows were devised and performed with assistance and advice from Counsel for defendants. The mobile data collected included Call Logs, SMS, MMS, and Chats. Searching the email data for variations of plaintiff’s name, together with the mobile data batches that hit on plaintiff’s name, yielded 11,490 documents for review. Just over 93% of the reviewed documents were not responsive, and only about 4.5% were deemed responsive. An additional 2% were marked privileged.
Review of documents was performed by US licensed practicing attorneys under the direction of a licensed practicing attorney experienced in document review, management, and review quality control. The design of the review and the criteria identified for responsiveness and privilege centered around allegations of the complaint and the document requests herein.
The use of search criteria has been found to be both effective and acceptable as a filter prior to review for production of documents. In this case, filtering prior to review was an absolute necessity because the data collected from defendants exceeded 160,000 emails, files, and text messages. Filtering emails using only a name is perhaps the broadest possible form of filtering, making the resulting search set unusually broad. D4 used no Boolean (“AND” or “WITHIN”) clauses to limit the search terms. As a result, every single document containing those search terms was reviewed for responsiveness.
Furthermore, filtering text messages using phone numbers is the broadest possible form of filtering this type of electronically stored information (“ESI”), so the broadest possible review was performed on these text messages. Again, D4 used no Boolean (“AND” or “WITHIN”) clauses to limit the phone-number search terms. As a result, every single document containing those search terms was reviewed for responsiveness.
Parties filtering ESI prior to review, in both California and Federal Court jurisdictions, “meet and confer” to discuss the means and filters by which a large volume of ESI may be filtered prior to review. Upon information and belief, plaintiff was asked for any search terms they wished to be included in the review, and either failed or refused to participate. After review, the resulting production set of 1,040 documents was produced to Plaintiff’s counsel.
Challenge: Burdensome eDiscovery Costs
A day after the documents were produced, our client informed us that Plaintiff’s counsel had requested that all text messages and WeChat messages between 8/1/2015 and 3/29/2018 found on mobile devices belonging to the defendants be reviewed for production. Counsel for Defendants shared with us that Plaintiff’s counsel argued:
- (a) that it is not possible to cull this data set based on keyword because the shorthand format of these messages makes it impossible to predict which messages may be responsive to a set of keywords;
- (b) that he believes that the defendants communicated in cryptic format and would not necessarily use terminology that would be searchable using a traditional keyword list; and
- (c) that they use foreign language vernacular and vocabulary that would not be apparent to American reviewers and lawyers.
D4 estimated that the cost of Plaintiff’s request to review approximately 48,000 documents of email data and approximately 65,000 documents of mobile data would exceed $40,000 to $50,000, with the review taking two weeks to complete. This estimate was based on the most advantageous hourly rates for U.S. barred contract attorneys reviewing 45 or more documents per hour, supervised by an experienced U.S. barred review attorney.
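The scale behind this estimate can be sketched with simple arithmetic. The document counts and the 45 documents per hour throughput come from the paragraph above; the 80 hours per reviewer (8 hours a day for 10 working days) is an assumption made only for illustration, and the case study does not state hourly rates, so no dollar figure is derived here.

```python
import math

# Figures taken from the estimate above.
EMAIL_DOCS = 48_000
MOBILE_DOCS = 65_000
DOCS_PER_HOUR = 45  # stated minimum reviewer throughput

# Assumption (not from the case study): each reviewer works
# 8 hours a day for 10 working days in a two-week review.
HOURS_PER_REVIEWER = 8 * 10


def review_hours(total_docs: int, docs_per_hour: float) -> float:
    """Reviewer-hours needed to look at every document once."""
    return total_docs / docs_per_hour


def reviewers_needed(hours: float, hours_per_reviewer: float) -> int:
    """Smallest whole team that can cover the hours in the window."""
    return math.ceil(hours / hours_per_reviewer)


total_docs = EMAIL_DOCS + MOBILE_DOCS
hours = review_hours(total_docs, DOCS_PER_HOUR)
team = reviewers_needed(hours, HOURS_PER_REVIEWER)

print(f"Documents to review: {total_docs:,}")
print(f"Reviewer-hours at {DOCS_PER_HOUR} docs/hour: {hours:,.0f}")
print(f"Reviewers needed for a two-week review: {team}")
```

Because 45 documents per hour is stated as a minimum, the hours figure is an upper bound for a single first-pass review; supervision and quality control time are not included.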
D4 concluded that the request to perform such a full review so soon after the production of more than a thousand responsive documents was an attempt to “weaponize discovery”. This phrase is used by many eDiscovery professionals to characterize an attempt to make the cost of discovery disproportionate to the value of the underlying litigation. D4’s conclusion was supported by the Sedona Conference Principles of Proportionality (18 Sedona Conference J 141 et seq.), which were drafted by judges, technologists, and attorneys practicing on both sides of the caption in civil litigation. The document states, in part,
- “Principle 1: The burdens and costs of preserving relevant electronically stored information should be weighed against the potential value and uniqueness of the information when determining the appropriate scope of preservation.
- Principle 2: Discovery should focus on the needs of the case and generally be obtained from the most convenient, least burdensome, and least expensive sources.
- Principle 3: Undue burden, expense, or delay resulting from a party’s action or inaction should be weighed against that party.
- Principle 4: The application of proportionality should be based on information rather than speculation.
- Principle 5: Nonmonetary factors should be considered in the proportionality analysis.
- Principle 6: Technologies to reduce cost and burden should be considered in the proportionality analysis.”
In D4’s opinion, Plaintiff’s request for full review of the collected documents (a) does not take into account the needs of the case, (b) does not draw on the most convenient, least burdensome, and least expensive sources, and (c) is based on speculation rather than information.
Despite Plaintiff’s attempt to weaponize discovery and failure or unwillingness to participate in methods to pre-filter the collection prior to review, Defendants and D4 undertook to further qualify their own production. The use of statistical sampling to assure the quality of a production is consistent with instruction from The Sedona Conference, the Duke Law Center Electronic Discovery Reference Model, and various state and federal case law. Defendants and D4 undertook a measure of statistical sampling quality control that is consistent with best practices in the eDiscovery industry, as follows:
Rather than randomly sample the entire universe of documents and thereby dilute the sampling, D4 and Defendants selected the range of dates within the collection most likely to contain additional responsive documents that did not contain search terms. That range of dates is 9/1/2015 to 1/31/2016.
These dates were selected because the 1,040 documents found to be responsive and, in fact, produced were distributed in a bell curve across these dates.
It was D4’s opinion that discussions in texts and emails about a person that do not contain that person’s name are most likely to occur in the context of other documents that do, in fact, name the subject of the discussion. Otherwise, the speakers or correspondents would have limited context to identify the subject of their discussion.
The sample was sized at a confidence level of 95% with a margin of error of +/- 2%, assuming an even (50%) chance that a document is responsive or not. This sampling power, i.e. the likelihood that it will distinguish an effect of a certain size from pure luck or coincidence, is the most powerful that is recommended in any published case decision or industry guide of which we at D4 are aware, and a power that is consistent with litigation support coding and review going back as far as the late 1980s. The sampling was performed on documents between the dates above that did NOT hit on any search terms and, as a result, had not previously been reviewed.
In the sampling conducted on documents dated from 9/1/2015 to 1/31/2016, Defendants reviewed 2,780 documents using the same staffing, supervision, and quality control as described above. No additional responsive documents were found in the review of the entire sample set, which was completed today. In addition to the costs expended to date for collection, hosting, review, and production described above, Defendants expended approximately an additional $3,000 on hosting, review, supervision, and consulting to conduct this sampling.
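One way to interpret a sample with zero responsive documents is to compute an upper confidence bound on the rate of responsive documents left in the sampled population. The sketch below is an illustration, not a calculation reported in the case study. It assumes the 2,780 documents were a simple random sample from a much larger population and that the review coding was accurate. The function names are hypothetical.

```python
def zero_hit_upper_bound(sample_size: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on a rate when a sample has zero hits.

    If the true rate were p, the chance of seeing no hits in n independent
    draws is (1 - p) ** n. Setting that equal to alpha = 1 - confidence and
    solving for p gives the largest rate still consistent with the result.
    """
    alpha = 1 - confidence
    return 1 - alpha ** (1 / sample_size)


def rule_of_three(sample_size: int) -> float:
    """Common shortcut for the 95% bound after zero hits: roughly 3 / n."""
    return 3 / sample_size


n = 2780  # documents reviewed in the sample, none responsive
exact = zero_hit_upper_bound(n, confidence=0.95)
approx = rule_of_three(n)

print(f"Exact 95% upper bound: {exact:.3%}")
print(f"Rule-of-three estimate: {approx:.3%}")
```

Under those assumptions, the bound comes out at roughly one tenth of one percent of the sampled population, which is one way to quantify the statement that no responsive documents were reasonably overlooked.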
There is no means to achieve perfection in the discovery process, even with advanced tools for technology-assisted review (“TAR”) or even with the complete manual review of all documents that Plaintiff argues for. This highly powerful and focused sampling was designed and executed to provide quality control and reasonable assurance in the discovery process that no responsive documents were overlooked.