On the D4 blog and other blogs written by experts in the e-discovery and litigation support fields, predictive coding, TAR, and CAR (all powered by predictive analytics) have gotten tremendous coverage, especially over the last year. Working in this field has also opened my eyes to the range of ways in which predictive and linguistic analytics are used.
On April 1, I caught this segment on health news on NPR’s All Things Considered, which posed an interesting question: Were people happier in the 1950s than they are today? Or, as we believe—and watching Mad Men certainly supports this—were they more repressed, uptight, and depressed? It’s an intriguing question.
A group of researchers set out to determine emotional states through an analysis of literature, exploring books from every year of the 20th century: over a billion words in all. NPR interviewed Alex Bentley, an anthropologist at the University of Bristol, about the results of the analysis.
In 2010, Google had digitized about 4% of all books. Bentley and his colleagues at the University of Bristol decided to mine this Google database to track the use of words over time and see whether certain words became more popular at identifiable points in history. Their computers analyzed six categories of emotion, each defined by a list of words that denote it: sadness (115 words), joy (224), anger (146), disgust (30), surprise (41) and fear (92). The researchers initially believed that the evidence of these emotions, indicated by the use of the words exemplifying each category, would be relatively consistent over time. But what they found surprised them.
They mapped all of their results onto a graph, with measurements of joy and sadness plotted on the Y-axis and the decades plotted on the X-axis. Distinct peaks and valleys emerged, tracking key events of the 20th century. The 20s were the highest peak of joy. “They really were roaring,” says Bentley. Then, in 1941, as the United States entered World War II, the trend plunged dramatically into the sadness area of the graph, rising strongly thereafter and stabilizing during the 1960s and 70s.
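For readers curious about the mechanics, here is a minimal sketch of this kind of word counting. It is not the Bristol team's actual code: the tiny word lists, the sample sentences, and the function names are all made up for illustration, and the study's own lists and normalization were more elaborate. The idea is simply to count how often words from each emotion list appear in a given year, divide by the total number of words for that year, and compare joy against sadness.

```python
import re
from collections import Counter

# Hypothetical, tiny lexicons for illustration only; the study used
# much longer lists (e.g. 224 joy words and 115 sadness words).
EMOTION_WORDS = {
    "joy": {"happy", "joy", "delight", "cheerful"},
    "sadness": {"sad", "grief", "sorrow", "gloomy"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def emotion_frequencies(texts_by_year):
    """Return {year: {emotion: share of all words in that year}}."""
    result = {}
    for year, texts in texts_by_year.items():
        counts = Counter()
        total = 0
        for text in texts:
            for token in tokenize(text):
                total += 1
                for emotion, words in EMOTION_WORDS.items():
                    if token in words:
                        counts[emotion] += 1
        result[year] = {
            emotion: (counts[emotion] / total if total else 0.0)
            for emotion in EMOTION_WORDS
        }
    return result


def joy_minus_sadness(freqs):
    """One point per year: positive leans joyful, negative leans sad."""
    return {year: f["joy"] - f["sadness"] for year, f in freqs.items()}


# Made-up sample text standing in for a year's worth of books.
sample = {
    1925: ["A happy and cheerful crowd filled the hall with delight"],
    1941: ["Grief and sorrow hung over the gloomy town"],
}
freqs = emotion_frequencies(sample)
scores = joy_minus_sadness(freqs)
```

Dividing by the total word count matters because the number of books published changes from year to year; without it, a year with more books would look more emotional across the board.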
What is interesting is that the books in the Google database were not only novels or non-fiction about current events, but also technical manuals and automotive repair guides—the entire kitchen sink of English-speaking writers and translators. Thus, says Bentley, “It’s not like the change in emotion is because people are writing about the Depression and people are writing about the war. There might be a little bit of that, but this is just, kind of, averaged over all books, and it’s just kind of creeping in.”
With the dramatic changes in culture during the 1960s and the advent of social media sites like Facebook, where people can and do express anything, has the use of these “emotion” words soared off the charts? No, and that is the most surprising part of the study. Instances of these words have declined throughout the 20th century and into the 21st. The exception? Fear-related words started to increase just before the 1980s.
HOW DOES IT APPLY TO LITIGATION?
All of this is fascinating, but what does it really tell us? And isn’t that the question that brings us back to the use of predictive analytics in litigation discovery? The Bristol study appears to be a more objective, and potentially more accurate, way to gauge emotions than self-reporting, for example. Similarly, with TAR/CAR/predictive coding, subject matter experts provide a baseline of relevance; then an algorithm, tweaked by human intervention, assesses the rest of the documents, with a degree of objectivity supplied by the predictive mathematics.
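To make that workflow concrete, here is a toy sketch of the idea. It is an illustration under stated assumptions, not how any particular TAR product works: the seed documents, the `train` and `score` function names, and the simple smoothed log-odds term weighting are all chosen here for brevity. Reviewers label a small seed set, the algorithm learns which words lean relevant, and the unreviewed documents are ranked by score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(seed_set):
    """Learn a weight per term from (text, is_relevant) pairs.

    Each weight is the log-odds of the term appearing in relevant versus
    non-relevant seed documents, with add-one smoothing. Class priors are
    ignored to keep the sketch short.
    """
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in seed_set:
        counts[is_relevant].update(tokenize(text))
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    weights = {}
    for term in vocab:
        p_rel = (counts[True][term] + 1) / (totals[True] + len(vocab))
        p_not = (counts[False][term] + 1) / (totals[False] + len(vocab))
        weights[term] = math.log(p_rel / p_not)
    return weights


def score(weights, text):
    """Sum the learned weights; terms never seen in training count as 0."""
    return sum(weights.get(token, 0.0) for token in tokenize(text))


# Hypothetical seed set coded by subject matter experts.
seed = [
    ("merger pricing agreement with competitor", True),
    ("confidential pricing discussion before merger", True),
    ("lunch menu for friday", False),
    ("office holiday party schedule", False),
]
weights = train(seed)

# Rank the unreviewed documents, most likely relevant first.
unreviewed = ["friday party menu", "draft pricing agreement"]
ranked = sorted(unreviewed, key=lambda doc: score(weights, doc), reverse=True)
```

The "tweaked by human intervention" step corresponds to reviewers checking the top-ranked documents, correcting the labels, adding them to the seed set, and training again.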