Corrupting, then immediately repairing, a PST file in hopes of recovering deleted emails may sound counter-intuitive, but here is why it makes sense.

When it comes to eDiscovery, how far is too far? For example, if there is an easy method for potentially recovering deleted files, should this be incorporated into typical eDiscovery practices? This is certainly a consideration when working with emails during the discovery process.

The question of how to handle deleted emails often comes up when analyzing or processing PST files. A PST file uses a database format designed by Microsoft and contains emails, contacts, calendar items, and other items typically associated with Microsoft Exchange and Microsoft Outlook. PSTs are generally created when connecting Outlook to a POP3, IMAP, or web-based mail account, or when an Exchange user archives their email.

At an architectural level, the PST database format contains a record for each Outlook item and an index, which points to each item.

When you empty the Deleted Items folder in your PST, you are not truly deleting the items inside. What you are actually doing is deleting the index entries that point to those items. The emails still reside in the PST, but they are not visible to the user because Outlook does not know where the items are located without the pointers in the index. There is, however, the possibility of recovering these items. To do this, one would force a repair of the index by purposely corrupting the PST (in a very specific manner). This may sound counter-intuitive, but it is necessary in order for the recovery tool to rebuild the index. Otherwise, a scan of the original (uncorrupted) PST will result in no repair attempt and will not recover any of the deleted items.
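The relationship between records and the index can be illustrated with a toy model. This is only a sketch of the idea described above, not the actual PST format: the `ToyStore` class and its method names are hypothetical, and a real repair tool does far more than re-add missing index entries.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy stand-in for a PST: raw records plus an index of visible items."""
    records: dict = field(default_factory=dict)  # record id -> message body
    index: set = field(default_factory=set)      # ids the mail client can see

    def add(self, record_id, body):
        self.records[record_id] = body
        self.index.add(record_id)

    def empty_deleted_items(self, record_id):
        # Only the index entry goes away; the record itself stays in the file.
        self.index.discard(record_id)

    def visible(self):
        return sorted(self.index)

    def repair_index(self):
        # A repair pass walks every record and re-indexes any it finds.
        recovered = sorted(set(self.records) - self.index)
        self.index.update(self.records)
        return recovered


store = ToyStore()
store.add(1, "status report")
store.add(2, "handoff notes")
store.empty_deleted_items(2)
print(store.visible())       # [1]
print(store.repair_index())  # [2]
print(store.visible())       # [1, 2]
```

In this model the "deleted" message never left `records`, which is why a pass that rebuilds the index from the records brings it back into view.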

Keep in mind that there is no guarantee the deleted items will be recovered, especially if the PST has been compacted. It is also important to note that if this process is done improperly, it could permanently corrupt all data in the file. That being said, let’s take a look at an example of this process to see the results when properly executed.

Will Corrupting and Repairing the PST File Recover the Deleted Emails?

Let’s start with a 1.55 GB PST file labeled “2011-archive,” which includes a folder under the Inbox labeled “Handoff” (see below). Inside this folder there are a total of 151 messages, with 63 marked as unread. (Email addresses have been deleted from all screenshots for privacy reasons.)

Let’s now delete all emails in the “Handoff” folder from October and September 2011. There were 23 emails, with 27 marked as unread. Now, we will empty the Deleted Items folder and close both the PST and Outlook.

Next, we will manually corrupt the PST file using a hex editor, repair it, and view the results. A hex editor is a software program that allows a user to view and edit the raw bytes of a file. To corrupt the PST, we’ll clear the values at positions 7-13 and save the changes.
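For readers who would rather script this step than use a hex editor, here is a minimal sketch. Several things in it are assumptions rather than part of the procedure above: the function name `damage_header_copy` is hypothetical, positions 7-13 are treated as zero-based decimal byte offsets (inclusive), and "clearing" is modeled as overwriting with the space byte 0x20. Check how your hex editor numbers positions before relying on these defaults. The sketch also works on a copy, so the original PST is never touched.

```python
import shutil
from pathlib import Path


def damage_header_copy(source, work_copy, first=7, last=13, fill=0x20):
    """Copy source to work_copy, then overwrite bytes first..last (inclusive) in the copy.

    The defaults (offsets 7-13, fill byte 0x20) are assumptions; adjust as needed.
    """
    source, work_copy = Path(source), Path(work_copy)
    if first < 0 or last < first:
        raise ValueError("invalid byte range")
    if source.stat().st_size <= last:
        raise ValueError("file is too small for the requested byte range")
    shutil.copyfile(source, work_copy)  # the original file is never modified
    with open(work_copy, "r+b") as handle:
        handle.seek(first)
        handle.write(bytes([fill]) * (last - first + 1))
    return work_copy


# Hypothetical usage (file names are placeholders):
# damage_header_copy("2011-archive.pst", "2011-archive-work.pst")
```

Overwriting in place, rather than removing bytes, keeps the file length and every later offset unchanged, so only the targeted header bytes differ from the original.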

To repair the PST, we’ll use ScanPST, the Inbox Repair Tool. ScanPST is a software program supplied by Microsoft that is commonly used to repair corrupted data structures and indexes in PST files. After running ScanPST, we see (below) that not only was every email recovered, but the counts of read and unread items were accurately restored as well.

Testing Other Factors: Hard Deletions, Entire File Deletions, and Compacted PSTs

While this is encouraging, there are also other factors to consider that would apply to a PST.

What about hard deletions?

A hard deletion is when someone uses the Shift+Delete key combination when deleting emails. The hard delete method is a more permanent deletion method because the deleted emails completely bypass the Deleted Items folder.

I thought this might reduce the number of emails recovered. However, when I tested this assumption, I got the same results as when I tested PST files whose emails had been deleted using the normal (soft delete) process.

What happens if entire folders are deleted rather than individual messages?

When an entire folder is deleted, should the recovery process be altered? Do the odds of full recovery decrease?

I found that when I deleted an entire folder and performed the recovery process, I was able to recover the folder and emails completely, including the folder’s original name and location in the PST folder hierarchy.

What happens if the PST is compacted after deleting the items?

Compacting reduces the size of a PST by reclaiming the space still occupied by items that have already been deleted from view. If a user compacts the PST after performing the deletions, the odds of recovering any emails are greatly reduced.

In my testing, I found that compacting a PST after emails had been deleted resulted in significantly lower recovery totals. Compacting the PST after soft-deleting the emails prevented all but one email from being recovered, and all but five when the emails had been deleted using the hard-deletion method.

When the PST was compacted after deleting a folder, the results were equally disappointing. The folder name was completely unrecoverable and only seven messages were recovered.
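Why compaction hurts recovery can be shown with another toy model. This sketch assumes a simplified file made of a list of blocks and an index mapping item ids to block positions. The functions `compact` and `scan_for_orphans` are hypothetical and do not reflect how Outlook actually compacts a PST. Real compaction evidently can leave some remnants behind, as the partial recoveries above show, whereas this idealized version leaves none.

```python
def compact(blocks, index):
    """Rewrite the file, keeping only blocks that the index still references."""
    new_blocks, new_index = [], {}
    for item_id, position in sorted(index.items(), key=lambda kv: kv[1]):
        new_index[item_id] = len(new_blocks)
        new_blocks.append(blocks[position])
    return new_blocks, new_index


def scan_for_orphans(blocks, index):
    """Return blocks that are present in the file but not referenced by the index."""
    referenced = set(index.values())
    return [block for pos, block in enumerate(blocks) if pos not in referenced]


blocks = ["mail A", "mail B", "mail C"]
index = {"A": 0, "C": 2}  # "B" was deleted: its block remains, its index entry is gone

print(scan_for_orphans(blocks, index))  # ['mail B']
blocks, index = compact(blocks, index)
print(blocks)                           # ['mail A', 'mail C']
print(scan_for_orphans(blocks, index))  # []
```

Before compaction, a scan can still find the orphaned block. Afterward, the space it occupied has been reused or dropped, so there is nothing left for a repair pass to re-index.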

As this post demonstrates, there is a reasonably quick method for potentially recovering deleted emails from PST files. As a forensic investigator, I know that such information may be vital to a case. The question remains, though: should this method be part of standard eDiscovery practice? Or are the messages considered deleted, and therefore outside typical discovery and better handled as a separate forensics project?

Answers aside, it is important to keep in mind that this process will increase total project costs because it takes more time to complete and requires the involvement of a forensics expert. Perhaps it only makes sense in situations where there is a belief that a custodian purposely or mistakenly deleted emails. If so, then the guidance of a forensic professional is certainly warranted.