At Relativity Fest this year I was asked to present ideas on how the shift to cloud solutions, and more specifically RelativityOne, is shaping the future of transactional eDiscovery pricing models with an eye towards total project spend. Much of the presentation turned into an interactive discussion with those in attendance to flesh out common misconceptions, along with a deep dive into the various workflow options made available by advancements in the software tools the industry leverages on a daily basis.

What resulted was a healthy exchange of ideas, and a few heated debates, from a room representing different business drivers across the law firm, service provider, and in-house perspectives. The argument put before the crowd was that the industry at large is ready to mature beyond the traditional approaches to eDiscovery workflow and costing that were born out of the limited toolset available over a decade ago.

How Did We Get Here?

In the early 2000s virtually all software platforms developed to overcome the challenge of managing evidentiary data were split into disparate phases within the discovery life-cycle. The end goal was always to get discovery data into a format, and environment, that could be reviewed with ease by a team of attorneys. The popular review platforms of the day consisted of networked desktop installations, typically within a law firm environment, of products like Concordance or Summation. These platforms gained popularity through the late ’90s as the go-to solution for organizing, reviewing, and producing documents that largely started as hard copy and were scanned to image format. They quickly evolved to handle records that were born from electronic document sources, like email and office files. However, these were strictly review tools. The barrier to entry has always been the heavy lifting required in advance to convert the records into a form that the software platforms could understand.

As email became the primary communication source for most businesses, traditional solutions designed to handle paper files quickly buckled under the weight and volume of mass ESI stores. There was a time when entire inboxes were printed to hard copy only to have them scanned to images so they could be burned to CD and delivered to a law firm that would then load them into their review platform. It didn’t take long before the major software players in the industry responded with solutions designed to convert electronic documents natively so they could then be loaded cleanly into desktop review platforms. In what is commonly referred to as the Processing phase, containerized sets of custodial data would be collected and ingested into the software suite to create enumerated records for each email, attachment, and embedded file while maintaining a reference to the origin or parent of the document in question. Embedded text and the body of each document would be extracted in addition to any fielded system information about the file, or metadata, such as sent dates, authors, recipients, etc.
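To make the idea of enumerated records with parent references concrete, here is a minimal sketch in Python. It is not how any particular processing product works: the `Record` class, the `enumerate_records` helper, the `DOC` numbering prefix, and the sample email are all hypothetical, and the sketch assumes each record points to its immediate parent.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, Optional


@dataclass
class Record:
    control_number: str            # sequential identifier assigned at processing
    name: str                      # file name of the email, attachment, or embedded file
    parent: Optional[str]          # control number of the immediate parent, if any
    metadata: dict = field(default_factory=dict)


def enumerate_records(item: dict, counter: Iterator[int],
                      parent: Optional[str] = None,
                      prefix: str = "DOC") -> list:
    """Flatten a nested item into numbered records, depth-first."""
    control = f"{prefix}{next(counter):06d}"
    records = [Record(control, item["name"], parent, item.get("metadata", {}))]
    for child in item.get("children", []):
        records.extend(enumerate_records(child, counter, control, prefix))
    return records


# Hypothetical email with two attachments, one of which has an embedded file.
email = {
    "name": "RE: Q3 forecast.msg",
    "metadata": {"author": "a.smith", "sent": "2005-03-01"},
    "children": [
        {"name": "forecast.xls", "children": [{"name": "chart.bmp"}]},
        {"name": "notes.doc"},
    ],
}

records = enumerate_records(email, count(1))
for r in records:
    print(r.control_number, r.name, "parent:", r.parent)
```

Each record keeps only a pointer to its parent, so the full family of an email can be rebuilt later by following those pointers back to the record whose parent is empty.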

The tools available to physically gather and collect source data in a defensible manner were limited at the time. In turn, most matters that involved electronic evidence would suffer from vast overcollection. Entire email databases, file servers, and images of desktop hard drives would be created at the outset of each project to ensure there was always a pristine copy of the originals and to head off the need to disrupt end-client business operations should the need to go back to the well arise midstream. This resulted in the birth of the ECA, or early case assessment, workflow. This is where most of the revenue was generated for service providers of the day.

In hindsight, one can think of eDiscovery service providers as the manufacturing arm of the litigation support business. Large collections of unstructured documents and data would come in, they would be converted to a usable form, then delivered to law firms. Much of this paradigm has shifted since these wild-west days, but it is precisely this business relationship, paired with the limited technology stack at the time, that can be pointed to as the birthplace of the IN/OUT, otherwise known as transactional, eDiscovery pricing model. Sprinkle in some new rules of federal civil procedure that were codified in 2006 with minimal updates since, and you’ve got a recipe for a pervasive and lasting way of approaching discovery that has proven difficult to disrupt.

What is the IN/OUT Pricing Model?

The IN/OUT models were born when the housing of discovery data started to outgrow what a law firm IT department could realistically maintain internally. Storing hundreds of thousands of files being accessed by multiple, sometimes hundreds of, users at the same time requires significant horsepower, infrastructure, and human capital to manage effectively. It was around this time that service providers started offering document review hosting services to ease this burden. The premise of the IN/OUT model is based on how the processing technology was designed to work. There are exceptions to this rule, but most data processing tasks involved multiple steps.

The first step would be to identify all the files within the containers and extract the text and metadata. Often custodian-level or global duplicate suppression would occur at this stage as well. All of that extracted information would be put into a database that sits behind a user interface available through the processing platform, and front-line analysis or culling would take place. The number of gigabytes fed into the processing environment would be referred to as the “IN”. The end-user would then do their best to make broad sweeping cuts by using date filtering, keyword searches, and other methods to exclude as much as humanly possible while still maintaining a comfortable level of precision.
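The culling sequence described above (duplicate suppression, then date filtering and keyword searching) can be sketched in a few lines of Python. This is a simplified illustration only: the `Doc` class, the `global_dedupe` and `cull` helpers, and the sample documents are hypothetical, and hashing the extracted text alone is an assumption made for brevity, since real processing tools typically hash a combination of fields.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    doc_id: str
    custodian: str
    sent: date
    text: str


def global_dedupe(docs):
    """Keep the first copy of each document across all custodians."""
    seen = set()
    unique = []
    for d in docs:
        digest = hashlib.sha256(d.text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(d)
    return unique


def cull(docs, start, end, terms):
    """Dedupe, then keep documents inside the date range that hit any term."""
    lowered = [t.lower() for t in terms]
    kept = []
    for d in global_dedupe(docs):
        in_range = start <= d.sent <= end
        hits = any(t in d.text.lower() for t in lowered)
        if in_range and hits:
            kept.append(d)
    return kept


docs = [
    Doc("A1", "smith", date(2005, 3, 1), "Merger terms attached"),
    Doc("A2", "jones", date(2005, 3, 1), "Merger terms attached"),  # duplicate of A1
    Doc("A3", "smith", date(2001, 1, 15), "Old merger notes"),      # outside date range
    Doc("A4", "jones", date(2005, 6, 10), "Lunch on Friday?"),      # no keyword hit
]

survivors = cull(docs, date(2004, 1, 1), date(2006, 12, 31), ["merger"])
print([d.doc_id for d in survivors])
```

Of the four sample documents, only `A1` survives all three cuts, which mirrors how a large "IN" volume shrinks before anything is promoted to review.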

Save for a few corporate legal departments or law firms that chose to bring this technology in house, most of this was largely a black box to the ultimate end-user. Visibility into the actual documents that hit on searches, versus being left on the cutting room floor, was limited to lengthy reports that detailed the number of documents that hit on each search term. Any documents that hit on these searches would then be processed in such a way as to extract the native files. All of the original text and metadata would then be merged with the extracted natives to ultimately be combined into a single export that is perfectly formatted (usually) to meet the database schema of the requesting party. The number of gigabytes that resulted on disk after this export would be referred to as the “OUT”.

Typically a lower per unit rate would be charged by service providers for anything that went “IN,” and a higher per unit rate for what came “OUT.” Hypothetically, a 500 GB project that resulted in 100 GB after culling would often see charges anywhere from $20-$60 per GB up front, then $100-$200 per GB on the back end. Many practitioners in the industry continue to build repeatable processes and structure their entire approach to reducing discovery spend around the basic fundamentals associated with this particular model. The end goal is to leverage data processing tools to cull out as much superfluous data as possible up front in an effort to reduce the storage footprint and volume of documents that ultimately get hosted in a review platform.
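To put rough numbers on the hypothetical above, here is a short worked example in Python. It reads the quoted ranges as per-gigabyte rates, which is an assumption based on the per unit rates being tied to gigabytes in and out; the `in_out_cost` function is hypothetical, and the totals cover processing only, not hosting or review.

```python
def in_out_cost(gb_in, gb_out, rate_in, rate_out):
    """Processing charge under an IN/OUT model, with all rates in dollars per GB."""
    return gb_in * rate_in + gb_out * rate_out


GB_IN, GB_OUT = 500, 100  # volumes from the hypothetical project

low = in_out_cost(GB_IN, GB_OUT, rate_in=20, rate_out=100)
high = in_out_cost(GB_IN, GB_OUT, rate_in=60, rate_out=200)

print(f"Low end:  ${low:,}")   # 500 * 20 + 100 * 100
print(f"High end: ${high:,}")  # 500 * 60 + 100 * 200
```

Under these assumptions the same project lands anywhere between $20,000 and $50,000, and every gigabyte culled before export removes the higher back-end rate from the bill, which is why so much effort goes into shrinking the "OUT".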

There will always be use cases in today’s discovery landscape that prove the effectiveness and simplicity of this approach as the preferred alternative to other methods. However, as business users continue to find new ways to communicate and store information, and as the volume of information that we all create on a daily basis continues to climb, the risks inherent in this model are too great to overlook. More importantly, much of this approach was a byproduct of what the technology was capable of doing in the first place. Finally, looking simply at the cost to process and host discovery data only takes into account a minor portion of a much larger, and much more complex, supply chain that sees case teams outsourcing more work than ever before to service providers.

In my next post I will outline the risks inherent in this workflow and the benefits of a possible alternative solution.
