When taking on a matter that includes foreign language documents, I find clients are typically split into two distinct buckets:
- Those who deal with foreign language reviews often; and
- Those who hardly ever find themselves at the helm of a foreign language document review.
As a result, clients increasingly look to eDiscovery vendors to set the standard and to have a best-practice workflow in place for what can be a very complicated and error-prone process. I’ll start with a little story to set the stage; for purposes of this article, some specific facts have been altered to protect the confidentiality of the client described.
A large pharmaceutical company was responding to a DOJ subpoena regarding drug pricing and sales following the acquisition of a foreign entity. Right off the bat, we knew we would be dealing with foreign language documents, and we knew from which country the foreign entity hailed, so we had a fair idea of which foreign language(s) would be present in the documents. Of course, we had the typical email and loose document collections, but we also had the added complexity of some handwritten documents in the foreign language, which can be tricky. The corporate client’s first question was: “How much is this going to cost us?” Outside counsel’s first question was: “Who speaks German?”
It’s hard to know where to start when faced with a mammoth number of documents, let alone when more than half of those documents are in a language that neither you nor anyone else on your payroll speaks. Nonetheless, it’s best to start at the beginning.
Do I Have Software that Can Handle Foreign Language Records?
Assuming the preservation and collection of the foreign language data have been completed correctly and, if the data comes from the EU, in accordance with the General Data Protection Regulation (GDPR), you then have to ask yourself whether the software you plan to use for processing this data can handle foreign languages. (If you’re looking for a great article about data privacy and compliance, check out this blog on Cross-border eDiscovery.)
Best Practice Tip #1: Confirm the processing software you plan on using has multi-language OCR, so it can recognize text from files with languages other than English. Not every processing module is built the same, so check with your provider. This feature is critical if you plan on searching the data in any meaningful way once in your hosting environment.
Back to our story: we confirmed that all the data had been handled appropriately at the preservation and collection phases, and that our processing module could recognize and OCR our foreign language documents. We loaded everything, post deduplication and date restriction, into Relativity, our preferred hosting environment. Relativity has, for quite a while now, offered Language Identification as part of its Structured Analytics package. Language Identification in Relativity examines the extracted text of each document to determine the record’s primary language and whether up to two secondary languages are present. For example, an email thread could be going back and forth with German as the primary language, but then an English-speaking colleague responds and the thread continues in English. Relativity would report that document as being primarily German with a secondary language of English, which is really helpful when deciding how to bucket your foreign language documents for review.
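To make the bucketing idea concrete, here is a minimal Python sketch. It assumes the Language Identification results have been exported as simple rows of control ID, primary language, and secondary languages; the row layout, the sample IDs, and the function name are hypothetical illustrations, not Relativity’s actual export format or API.

```python
from collections import defaultdict

# Hypothetical export rows: (control_id, primary_language, secondary_languages).
# This layout is an assumption for illustration, not a Relativity format.
docs = [
    ("DOC-0001", "German", ["English"]),
    ("DOC-0002", "English", []),
    ("DOC-0003", "German", []),
    ("DOC-0004", "English", ["German"]),
]


def bucket_by_primary_language(rows):
    """Group control IDs by the primary language reported for each document."""
    buckets = defaultdict(list)
    for control_id, primary, _secondary in rows:
        buckets[primary].append(control_id)
    return dict(buckets)


buckets = bucket_by_primary_language(docs)

# Print a simple per-language count, largest bucket first.
for language, ids in sorted(buckets.items(), key=lambda kv: -len(kv[1])):
    print(f"{language}: {len(ids)} document(s)")
```

In this sketch a German-primary thread with an English reply still lands in the German bucket, which mirrors how the primary language drives batching in the story.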
Best Practice Tip #2: Use Relativity’s Language Identification to assess how many documents present in the collection are primarily a foreign language, which foreign languages those are, and easily separate documents out by language for future batching purposes. The Foreign Language reporting from Relativity is easy to understand and gives you a 10,000-foot overview of where things stand post-processing. Now, the real work begins…
Am I Working with Strategic Partners that Can Enhance My Ability to Conduct a Foreign Language Review?
Clients tend to handle foreign language document reviews in one of two ways: brute force, with contract attorneys fluent in the subject foreign language; or a mix of technology, English-speaking contract attorneys, and foreign language contract attorneys. The latter is more time consuming but, in my experience, more favorable on the purse strings overall. So, in our story, we leveraged technology as the first line of defense.
There are few software companies that specifically handle foreign language document machine translation, which is good because the few that do, do it really well. We decided to engage a third-party vendor to provide machine translation services for those records we’d identified using Relativity’s Language Identification as having a primary language of German (our foreign language in question). Workflow note: For the small subset of handwritten foreign language documents, we sent a couple of samples to the machine translation vendor to see if the scans were clear enough to machine translate. It was determined they were not, so these records were batched directly to the foreign language reviewers for manual review.
The third-party vendor would provide machine translated versions of each record that could be displayed in Relativity, so that our English-speaking contract attorneys could perform a simple Responsive/Not Responsive first-pass review. Depending on the eDiscovery vendor you work with, machine translation can be available as a plug-in directly within Relativity, or it may require you to export the record set in question and load the machine translations back in later.
Best Practice Tip #3: If you have to export/import the foreign language documents and machine translations, it is crucial that you have a solid understanding of the format the machine translation vendor requires when exporting and that you have a detailed, well-documented workflow for how you are going to import the machine translations back into Relativity and properly link them with their foreign language counterparts. If you don’t properly link the machine translations with their foreign language counterparts, things will be a mess when the time for production comes, but more on that later.
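One way to sanity-check the linking before loading anything back in is a simple cross-reference audit. The Python sketch below assumes you keep a mapping from each machine translation’s ID to the ID of its foreign language original; the mapping, the ID values, and the function name are hypothetical and do not reflect any particular vendor’s deliverable format.

```python
def audit_translation_links(original_ids, translation_to_original):
    """Report translations pointing at unknown originals, and originals
    that never received a translation.

    original_ids: set of control IDs exported for machine translation.
    translation_to_original: dict of translation ID -> original control ID.
    """
    orphaned = sorted(
        t_id
        for t_id, o_id in translation_to_original.items()
        if o_id not in original_ids
    )
    translated = set(translation_to_original.values())
    untranslated = sorted(original_ids - translated)
    return orphaned, untranslated


# Hypothetical sample data for illustration only.
originals = {"DOC-0001", "DOC-0002", "DOC-0003"}
links = {
    "MT-0001": "DOC-0001",
    "MT-0002": "DOC-0002",
    "MT-0009": "DOC-0099",  # points at an ID that was never exported
}

orphaned, untranslated = audit_translation_links(originals, links)
print("Translations with no matching original:", orphaned)
print("Originals with no translation:", untranslated)
```

Running a check like this on every import keeps the documented workflow honest: both lists should be empty before reviewers start coding the translations.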
Once the machine translations are in Relativity, we can run targeted searches, leverage advanced analytics, and have English-speaking contract attorneys review the translations for basic Responsiveness for a fraction of the price. This isn’t to say our foreign language contract attorneys won’t complete any review, but by using the machine translated versions first, we reduce the number of records they will have to look at.
How Can I Set Up an Efficient Foreign Language Review, While Also Staying Within Budget?
In our story, the client initially started with roughly 250,000 documents that Relativity Language Identification reported as having a primary language of German. Those records were machine translated, which gave us the ability to run keywords and leverage email threading and near-duplicate identification.
Best Practice Tip #4: In many cases, especially for languages that do not use the Roman alphabet, hiring a specialist to translate your search terms into the foreign language at issue, and testing those terms on the population prior to machine translation, helps ensure you do not miss potentially relevant documents. As you can imagine, it would be nearly impossible to take a proper name, product name, or finance term of art written in foreign language characters and expect machine translation alone to return a uniformly understood and correctly spelled English word.
In our story, conducting a more targeted collection up front meant that our keyword testing whittled our population down by only about 10%. Email threading, however, made a fairly sizeable dent: limiting review to inclusive emails culled our population by another 45%. Near-duplicate identification on the loose documents did not cull out any documents, but it did help us create smarter batches, so reviewers could review textually similar records in a single batch and make more consistent coding choices.
Overall, from the population we had machine translated (250,000), our English-speaking contract review attorneys had to review roughly 125,000 documents following our culling efforts. Our rate of review on that population was roughly 50 documents per hour, and we had a responsiveness rate of approximately 35%, or roughly 40,000 documents. This small subset was then batched to foreign language document reviewers with a certified fluency in German to review for substantive issue codes, privilege, and overall case strategy purposes with outside counsel. We saved the client hundreds of thousands of dollars in review costs by leveraging machine translation and using our foreign language contract attorneys to review only about 40,000 documents, as compared to the 250,000 documents we started with. Workflow note: to confirm the accuracy of our machine translations and of the coding made by our English-speaking attorneys on those translations, we did have our foreign language contract attorney reviewers sample roughly 10% of the Non-responsive records for an added layer of confidence in the workflow.
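The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the rounded percentages quoted above; the variable names are mine, and the results differ slightly from the rounded figures in the story.

```python
# Rounded figures from the story; integer math avoids floating-point noise.
machine_translated = 250_000

# Keyword testing culled roughly 10% of the population.
after_keywords = machine_translated * (100 - 10) // 100

# Inclusive-only email threading culled roughly another 45%.
after_threading = after_keywords * (100 - 45) // 100

# The story rounds the first-pass review population to 125,000.
reviewed = 125_000
review_hours = reviewed // 50          # at roughly 50 documents per hour
responsive = reviewed * 35 // 100      # at roughly a 35% responsiveness rate

print("After keyword testing:", after_keywords)
print("After email threading:", after_threading)
print("First-pass review hours:", review_hours)
print("Responsive documents:", responsive)
```

The threading step lands at 123,750, which the story rounds to 125,000, and a 35% rate on 125,000 works out to 43,750, in the same range as the roughly 40,000 documents sent on to the German-fluent reviewers.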
Are There Unique Production Considerations When Handling Foreign Language Documents?
Determining how to handle productions following foreign language review can be tricky depending on how the machine translations were handled once in Relativity. The obstacle on the technical side is making sure that:
- We only produce the foreign language copy of a responsive, non-privileged record; and
- The coding in Relativity is consistent between the variations of a single document, so as not to cause coding conflicts, which can cause confusion when approving production populations.
Clients don’t often produce machine translated copies, as they are considered work product.
You may recall that earlier in this article I stressed the importance of linking the machine translations with their corresponding foreign language copies. This is critical because it’s easy to find yourself in a situation where you’ve run a pre-production search pulling in any documents where Primary Determination is set to Responsive and Privilege is set to Not Privileged, plus family members, and end up with a search returning machine translated copies but not their foreign language equivalents. How does this happen? For example, the English-speaking reviewer marks the machine translation Responsive and Not Privileged; we then batch the foreign language counterpart of that record to the foreign language reviewer, who downgrades the record to Not Responsive. If those two records are not properly linked in Relativity, the machine translation gets picked up in the pre-production search, but the foreign language copy does not.
Best Practice Tip #5: Make sure the machine translated documents are properly linked with their appropriate foreign language equivalents to avoid inconsistent coding. If your Relativity expert is not able to do this, an alternative workflow is to give the machine translated records unique Control IDs that parallel those of their foreign language copies, so you can exclude them using a search condition. Either way, it’s critical to have a documented workflow for how you are tracking machine translated and foreign language copies of a single record and what impact that workflow has on creating production volumes.
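As a rough illustration of the parallel Control ID approach, the Python sketch below assumes each machine translation carries the original’s Control ID plus an `_MT` suffix. That suffix, the field names, and the sample coding are all hypothetical; the point is only to show how a naming convention lets you flag coding conflicts and keep translations out of a production candidate list.

```python
MT_SUFFIX = "_MT"  # hypothetical naming convention for machine translations

# Hypothetical coding export: control ID -> coding decisions.
coding = {
    "DOC-0001": {"responsive": False, "privileged": False},
    "DOC-0001_MT": {"responsive": True, "privileged": False},
    "DOC-0002": {"responsive": True, "privileged": False},
    "DOC-0002_MT": {"responsive": True, "privileged": False},
}


def is_translation(control_id):
    return control_id.endswith(MT_SUFFIX)


def original_id(control_id):
    """Strip the suffix to recover the foreign language copy's Control ID."""
    return control_id[: -len(MT_SUFFIX)] if is_translation(control_id) else control_id


def find_coding_conflicts(coding):
    """Return (original, translation) pairs whose coding disagrees."""
    conflicts = []
    for control_id, tags in coding.items():
        if not is_translation(control_id):
            continue
        original_tags = coding.get(original_id(control_id))
        if original_tags is not None and original_tags != tags:
            conflicts.append((original_id(control_id), control_id))
    return sorted(conflicts)


def production_candidates(coding):
    """Responsive, non-privileged foreign language copies only."""
    return sorted(
        control_id
        for control_id, tags in coding.items()
        if not is_translation(control_id)
        and tags["responsive"]
        and not tags["privileged"]
    )


conflicts = find_coding_conflicts(coding)
candidates = production_candidates(coding)
print("Coding conflicts to resolve:", conflicts)
print("Production candidates:", candidates)
```

Here the downgraded German record is flagged as a conflict with its translation, and only the foreign language copy of the second record reaches the candidate list, which is the outcome the scenario above is trying to guarantee.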
In summary, foreign language reviews can be overwhelming, and there are certainly many moving parts. However, designing a workflow and roadmap that accounts for the twists and turns and, most importantly, tracks the decisions made at each turn, and why they were made, will pay dividends and lead to a repeatable process that can be implemented on future foreign language reviews. Likewise, working with strategic partners to help navigate the technical nuances present in a foreign language review is paramount for producing solid work product within budget.