If you have had the pleasure of dealing with electronically stored information (ESI) and its preservation and collection, you may have heard terms like bit-stream forensic imaging, file slack, or unallocated clusters thrown around. You may even have used the terms yourself without fully understanding what they mean. That is all about to change…
What is ESI?
Let’s start with a brief and simple overview of how computers store information on a hard drive.
A hard drive resides inside a desktop or a laptop chassis and it’s about half the size of a brick, but much lighter (in a laptop it is much smaller and lighter).
- For the purpose of this not-too-technical discussion, we are going to assume that a hard drive contains all the ESI on a computer. Inside the hard drive are magnetic platters that can store billions of bits (binary digits) of information.
- A bit is either ON or OFF, a one (1) or a zero (0). These bits are then organized into larger segments, called bytes.
- Eight bits make up one byte, and 512 bytes (traditionally) make up one sector, which is a physical area on the hard drive. The file system then groups contiguous sectors into clusters.
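For readers who like to see the arithmetic, the units above can be sketched in a few lines of Python. This is only an illustration: the figure of eight sectors per cluster is an assumption (it matches the 4,096-byte cluster used later in this article, but real values depend on the file system), and `clusters_needed` is a made-up helper name.

```python
BITS_PER_BYTE = 8
BYTES_PER_SECTOR = 512
SECTORS_PER_CLUSTER = 8  # assumption: the real value depends on the file system

BYTES_PER_CLUSTER = BYTES_PER_SECTOR * SECTORS_PER_CLUSTER


def clusters_needed(file_size_bytes: int) -> int:
    """Number of whole clusters a file of this size occupies."""
    if file_size_bytes <= 0:
        return 0
    # Ceiling division: a partly filled cluster still counts as a whole one.
    return -(-file_size_bytes // BYTES_PER_CLUSTER)


print(BYTES_PER_CLUSTER)                  # 4096 bytes in one cluster
print(BYTES_PER_CLUSTER * BITS_PER_BYTE)  # 32768 bits in one cluster
print(clusters_needed(3_000))             # 1
print(clusters_needed(10_000))            # 3
```

The point to take away is that a file always occupies whole clusters, even when it does not fill the last one.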
Confused? Let’s use the analogy of a filing cabinet and a table of contents, or index. Our make-believe filing cabinet has several drawers. Each drawer contains 100 folders (clusters), and each folder can hold 100 pieces of paper (bytes). Taking it one step further, we can think of the words or letters on the paper as bits.
In the real world, if one wanted to retrieve a paper file located somewhere in the filing cabinet, one would consult the index, or table of contents, and pull out the file. A computer works much the same way. In addition to other complex tasks, an operating system (e.g., Windows 7 or Windows XP) tracks and maintains all of the files in its virtual filing cabinet. The difference comes when a file is deleted.
ESI Term #1: Unallocated Clusters
In the paper world, one would take a file out and throw it in the trash, or shred it and remove the entry in the index. In the digital world, it is impractical for a hard drive to “wipe”, or throw out every file when the delete key is pushed, so the computer simply crosses off the entry in the index.
The computer now treats the clusters that held the file (the “folders”) as available, but the old bytes are still there. The free space where that old data resides is known as unallocated clusters. Because the old bytes remain until new data overwrites them, computer forensic professionals are able to search, retrieve, and analyze “deleted” data in unallocated clusters.
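To make the “crossed-off index entry” idea concrete, here is a toy model in Python. It is a sketch under heavy assumptions, not how any real file system works: `ToyDisk` and its methods are invented for illustration, every file fits in a single cluster, and the index is just a dictionary.

```python
CLUSTER_SIZE = 4096  # assumed cluster size in bytes


class ToyDisk:
    """A drastically simplified drive: a list of clusters plus an index."""

    def __init__(self, cluster_count):
        self.clusters = [bytes(CLUSTER_SIZE) for _ in range(cluster_count)]
        self.index = {}  # file name -> cluster number

    def unallocated_clusters(self):
        """Cluster numbers that no index entry points to."""
        allocated = set(self.index.values())
        return [i for i in range(len(self.clusters)) if i not in allocated]

    def write_file(self, name, data):
        if len(data) > CLUSTER_SIZE:
            raise ValueError("toy model: one cluster per file")
        free = self.unallocated_clusters()
        if not free:
            raise RuntimeError("disk full")
        # Store the data (padded with zeros) and record it in the index.
        self.clusters[free[0]] = data.ljust(CLUSTER_SIZE, b"\x00")
        self.index[name] = free[0]

    def delete_file(self, name):
        # Only the index entry is crossed off; the cluster is left untouched.
        del self.index[name]


disk = ToyDisk(cluster_count=4)
disk.write_file("memo.txt", b"meet me at noon")
disk.delete_file("memo.txt")

print("memo.txt" in disk.index)     # False: the index no longer lists the file
print(disk.unallocated_clusters())  # [0, 1, 2, 3]: cluster 0 is "free" again
print(disk.clusters[0][:15])        # b'meet me at noon': the bytes are still there
```

After the deletion the file is invisible to anyone consulting the index, yet reading cluster 0 directly still returns its contents.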
ESI Term #2: File Slack
Next up is file slack, a term that is often misused by the uninformed. In its simplest terms, file slack is wasted space in a cluster. If a cluster has a capacity of 4,096 bytes and a 3,000-byte file is written to that cluster, the remaining 1,096 bytes are file slack. It’s when that original file is deleted that things become more interesting.
Go back to the days before TiVo. If you recorded a 60-minute television program on a VHS tape and then someone recorded over the first 45 minutes with another program, you would obviously lose those first 45 minutes of the 60-minute program. However, 15 minutes of the original program still remain. On a hard drive, that remaining data is file slack.
Let’s say that a computer file occupies all eight sectors of exactly one cluster, for a total of 4,096 bytes of data. (In the filing cabinet analogy, these bytes are the pieces of paper in one folder.) If that file is deleted, that cluster is marked by the operating system as available for new data to be written. Remember, the data is not deleted; only the entry in the index is removed. Let’s then assume that a new file is created that is 3,584 bytes in size and it’s looking for a place to call home. It cozily settles into the cluster that was originally occupied by our 4,096-byte file, filling the first seven sectors. In this example, if we performed forensic analysis and looked at this one cluster, we would find 512 bytes of data from the original file in the last sector. This may or may not yield ESI with evidentiary value, but nonetheless it is residual data, or more accurately, file slack.
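The two slack examples above can be checked with a short Python sketch. It is a simplified model: the cluster is just a block of bytes in memory, the letters “O” and “N” stand in for the old and new file contents, and `slack_bytes` is a hypothetical helper name. Real file systems have additional wrinkles that this ignores.

```python
SECTOR_SIZE = 512
CLUSTER_SIZE = 8 * SECTOR_SIZE  # 4,096 bytes, as in the example above


def slack_bytes(file_size, cluster_size=CLUSTER_SIZE):
    """Unused bytes in the last cluster a file occupies."""
    remainder = file_size % cluster_size
    return 0 if remainder == 0 else cluster_size - remainder


# One cluster holding the original 4,096-byte file ("O" for old content).
cluster = bytearray(b"O" * CLUSTER_SIZE)

# The file is deleted: only the index entry goes away, so `cluster` is unchanged.
# A new 3,584-byte file ("N") is then written from the start of the same cluster.
new_file = b"N" * 3_584
cluster[:len(new_file)] = new_file

# Whatever lies beyond the end of the new file is left over from the old one.
residual = bytes(cluster[len(new_file):])

print(slack_bytes(3_000))           # 1096
print(slack_bytes(3_584))           # 512
print(len(residual))                # 512
print(set(residual) == {ord("O")})  # True: all 512 bytes belong to the old file
```

The last line is the forensic point: the 512 bytes past the end of the new file are untouched remnants of the deleted one.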
ESI Term #3: Bit-Stream Forensic Image
On paper, unallocated clusters and file slack may seem like a potential treasure trove of evidence. The proverbial “smoking gun” may be lurking in these areas, so how do we get it? It is captured by what is called a bit-stream image. You may also have heard the terms forensic copy, forensic image, hard drive clone, or bit-for-bit copy; they all refer to the same thing.
A bit-stream forensic image is an exact copy of every bit that is found on the hard drive, including unallocated clusters and file slack. A bit-stream image also captures all of the metadata (data about data) associated with a file. Of course, the imaging procedure has to be executed properly, so computer forensic professionals take great care to make certain no data is altered along the way. They utilize specialized equipment, called write-blocking devices, which prevent any inadvertent alterations to the hard drive.
Data is fragile; the mere act of booting into Windows XP alters at least 50 files and creates a few, as well. I am not advocating barricading an employee’s cubicle with yellow police tape when there is a suspicion of wrongdoing, but there are prudent measures that can be taken to mitigate the possibility of destroying evidence. (See: When Companies Should Secure Their Data [Infographic])
ESI Term #4: Active File Collection
The flip side of creating a bit-stream image of a hard drive is an active file collection. Creating a bit-stream image of a hard drive is not always necessary. Some matters only require the preservation and production of active files from hard drives or other sources of ESI. An active file collection captures only those files listed in the virtual index and does not capture any deleted files, unallocated space, or file slack. In litigation matters, this is a key subject to deliberate during the meet-and-confer, where the preservation and production of ESI are discussed.
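The difference between the two collection methods can be illustrated with another toy model in Python. Everything here is an assumption made for readability: the “drive” is a small block of bytes with unrealistically tiny 16-byte clusters, the index is a dictionary, and `bit_stream_image` and `active_file_collection` are invented names, not the commands of any forensic tool.

```python
CLUSTER_SIZE = 16  # tiny clusters keep the example readable

# A toy drive of four clusters, initially all zeros.
drive = bytearray(4 * CLUSTER_SIZE)


def put(cluster_no, data):
    """Write data at the start of the given cluster."""
    start = cluster_no * CLUSTER_SIZE
    drive[start:start + len(data)] = data


put(0, b"active report")
put(1, b"deleted memo")

# The index lists only the active file: name -> (cluster number, length).
# The memo's entry has already been "crossed off".
index = {"report.txt": (0, len(b"active report"))}


def bit_stream_image(source):
    """Copy every byte of the drive, allocated or not."""
    return bytes(source)


def active_file_collection(source, file_index):
    """Copy only the files that the index still lists."""
    collected = {}
    for name, (cluster_no, length) in file_index.items():
        start = cluster_no * CLUSTER_SIZE
        collected[name] = bytes(source[start:start + length])
    return collected


image = bit_stream_image(drive)
collected = active_file_collection(drive, index)

print(len(image) == len(drive))  # True: the image covers the whole drive
print(b"deleted memo" in image)  # True: deleted data is captured
print(collected["report.txt"])   # b'active report'
print(any(b"deleted memo" in data for data in collected.values()))  # False
```

The image contains the deleted memo because it copies the drive itself; the active file collection misses it because it only follows the index.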
The most important step in the eDiscovery process is the preservation and collection of ESI. ESI is fragile, and any mishandling of it can spell disaster for your client and its organization. As an attorney, you have an obligation to guide and advise your client with respect to the complete and accurate preservation and production of all potentially relevant ESI in each proceeding. By familiarizing yourself with terms like bit-stream forensic imaging, active file collection, file slack, and unallocated clusters, you are one step closer to fulfilling that obligation and on your way to becoming a true eDiscovery attorney.
5 Common ESI Terms & Concepts Every Attorney Should Know
- Know how ESI is stored on a hard drive.
- Unallocated clusters: On a hard drive, when a file is deleted, its entry is “crossed off” the index; the freed area where the old data still resides is known as unallocated clusters.
- File slack: In its simplest terms, file slack is residual data, or wasted space, in a cluster.
- Bit-stream forensic image: A bit-stream forensic image is an exact copy of every bit that is found on a hard drive.
- Active file collection: An active file collection only captures files listed on the virtual index and does not capture any deleted files, unallocated space or file slack.