
“Data visualization is the presentation of data in pictorial or graphical format. For centuries, people have depended on visual representations such as charts and maps to understand information more easily and quickly.” (From Data Visualization: What it is and why it is important.)
Digitized archives and data visualizations are incredibly powerful tools. They give users the ability to make sense of large amounts of information, allowing them to form questions, make predictions, learn lessons and even plan future actions. As these new media for displaying information become more prominent, it is important to understand their limitations and drawbacks, as well as how they are transformed from raw data into their meaningful digitized form. In last week's blog post I discussed in more detail one approach to learning from and interpreting data visualizations. Just as our statistics classes teach us, for example, to approach graphs and charts skeptically (making sure they are not misleading or mistaken), we as scholars should also approach data visualizations with a bit of skepticism.
To make the best use of tools such as digital archives or data visualizations, we must first understand the process by which information is transformed into them. A great overview of how data can be sorted and “networked” is given in the blog post Demystifying Networks. The author first recognizes that “humanities scholars are often dealing with the interactions of many types of things, and so the algorithms developed for traditional network studies are insufficient for the networks we often have.” He then goes on to note that “humanists also struggle with fitting square pegs in round holes. Humanistic data are almost by definition uncertain, open to interpretation, flexible, and not easily definable.” Not only are humanities data difficult to transfer into digital form, but many decisions must also be made before the information is transformed, so that the digital form presents it in a way that supports the author’s arguments or intentions. When interacting with live maps, timelines or online archives, for example, we rarely consider the work and decisions that went into publishing them.
Lauren Klein, in The Image of Absence: Archival Silence, Data Visualization, and James Hemings, makes the point that “as scholars, we do not see the labor involved in transcribing manuscripts into machine-readable text, nor do we think of the discussions—equal parts technical and theoretical—that contribute to the development of the encoding standards and database design that allow us to perform our search queries.” We live and learn in an age of digitized information, and because digital forms are still relatively new, most people do not fully understand the inputs that make them possible. These tools let us interact with data in completely new ways; we just need to educate ourselves about the inputs required to produce them as final products. We must understand these inputs, just as we must understand the standards for charts and graphs in statistics, so that we can look for and correct mistakes using powerful new tools such as Google Refine.
The inputs we must understand include data networking and mapping, the use of controlled vocabularies when entering information into a database, and the potential errors and biases that may be present in visualizations.
Although the video below is relatively dry, it emphasizes the importance of controlled vocabularies and again highlights the enormous amount of behind-the-scenes work that must be completed for these tools to be usable.