{"id":1902,"date":"2015-06-25T14:23:17","date_gmt":"2015-06-25T21:23:17","guid":{"rendered":"http:\/\/miriamposner.com\/blog\/?p=1902"},"modified":"2015-06-25T14:45:38","modified_gmt":"2015-06-25T21:45:38","slug":"humanities-data-a-necessary-contradiction","status":"publish","type":"post","link":"https:\/\/miriamposner.com\/blog\/humanities-data-a-necessary-contradiction\/","title":{"rendered":"Humanities Data: A Necessary Contradiction"},"content":{"rendered":"<p><em>This is a talk that I gave at the <a href=\"http:\/\/library.harvard.edu\/harvard-purdue-data\">Harvard Purdue Data Management Symposium<\/a> on June 17, 2015, in Cambridge, Massachusetts. The audience was mostly librarians and other data-management professionals. I was the only humanities person on the program, so I wanted to talk about the ways that humanists think about data differently from people in some other fields.<\/em><\/p>\n<figure id=\"attachment_1910\" aria-describedby=\"caption-attachment-1910\" style=\"width: 300px\" class=\"wp-caption alignleft\"><a href=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/digital-humanities.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-1910\" src=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/digital-humanities-300x111.jpg\" alt=\"Two mosaics beside each other. 
The one on the left is made up of largely cool, blue images; the one on the right is composed of warmer, earthier tones.\" width=\"300\" height=\"111\" srcset=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/digital-humanities-300x111.jpg 300w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/digital-humanities-1024x378.jpg 1024w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/digital-humanities-32x12.jpg 32w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/digital-humanities.jpg 1735w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-1910\" class=\"wp-caption-text\">Sometimes I start class discussions by comparing <a href=\"http:\/\/imagequilts.com\/\">image quilts<\/a> of Google searches for &#8220;digital&#8221; (left) and &#8220;humanities&#8221; (right).<\/figcaption><\/figure>\n<p>Today I&#8217;d like to\u00a0talk about the ways in which humanists think about data, and how that\u2019s distinct from the ways in which scientists and social scientists think about it.<\/p>\n<p>Even though I think our issues can be pretty different, I want to make the case that there are some very promising ways in which libraries could make meaningful interventions in the humanities research lifecycle, both for what we might call traditional humanists and for digital humanists. So I\u2019ll start with what \u201ctraditional\u201d humanists might need help with and then move on to the needs of what we call \u201cdigital humanists\u201d (although I think in practice the distinction is a bit blurred).<\/p>\n<p>I just want to say at the outset that there are people who specialize in humanities data curation, and I am not one of those people. 
A number of talented people, including <a href=\"http:\/\/trevormunoz.com\/\">Trevor Mu\u00f1oz<\/a> at the University of Maryland and <a href=\"http:\/\/gethelp.library.upenn.edu\/contact\/subjspec\/krawson.html\">Katie Rawson<\/a> at the University of Pennsylvania, have started to take a very programmatic look at the data-curation needs of digital humanists. And I encourage you to check out <a href=\"http:\/\/www.dhcuration.org\/\">their important work<\/a>. But you don\u2019t have Trevor or Katie; you have me! So what I can do is share my own perspective and experience on what it means to work with data as a humanist, and where libraries can help.<\/p>\n<p><!--more--><\/p>\n<p>I\u2019ll start with an anecdote, and I think that anyone who consults on digital humanities projects will be familiar with this scenario. Humanities scholars will sometimes describe elaborate visualizations to me, involving charts and graphs and change over time. \u201cGreat,\u201d I respond. \u201cLet\u2019s see your data.\u201d \u201cData?\u201d they say. \u201cOh, I don\u2019t have any data.\u201d<\/p>\n<p>This is not because we\u2019re stupid or na\u00efve; it\u2019s that humanists have a very different way of engaging with evidence than most scientists or even social scientists. And we have different ways of knowing things than people in other fields. We can know something to be true without being able to point to a dataset, as it\u2019s traditionally understood. 
We can know, to take just one example, that <a href=\"http:\/\/www.worldcat.org\/oclc\/38501674\">early silent film relied on the conventions of melodrama<\/a> to create legible narratives, not because we have a spreadsheet somewhere, but because we\u2019ve immersed ourselves so deeply in our source material that we\u2019re attuned to its nuances.<\/p>\n<p>That\u2019s why humanists sometimes think you can make a visualization without data; because they want to illustrate ideas and movement, not necessarily data points as we\u2019ve been discussing them here.<\/p>\n<figure id=\"attachment_1906\" aria-describedby=\"caption-attachment-1906\" style=\"width: 300px\" class=\"wp-caption alignleft\"><a href=\"http:\/\/lareviewofbooks.org\/essay\/literature-is-not-data-against-digital-humanities\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1906 size-medium\" src=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.43.16-PM-300x214.png\" alt=\"Screenshot of LARB article called &quot;Literature is Not Data&quot;\" width=\"300\" height=\"214\" srcset=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.43.16-PM-300x214.png 300w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.43.16-PM-32x23.png 32w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.43.16-PM.png 848w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-1906\" class=\"wp-caption-text\"><i>Los Angeles Review of Books<\/i>, October 28, 2012<\/figcaption><\/figure>\n<p>In fact, very few traditional humanists would call their source material \u201cdata.\u201d You may have seen <a href=\"http:\/\/lareviewofbooks.org\/essay\/literature-is-not-data-against-digital-humanities\">this piece<\/a> in the <em>LA Review of Books<\/em> in October 2012. 
While the language is pretty hyperbolic, I do think it helps to convey how uncongenial many humanists feel the notion of data is to the work that they actually do.<\/p>\n<p>When you call something data, you imply that it exists in discrete, fungible units; that it is computationally tractable; that its meaningful qualities can be enumerated in a finite list; that someone else performing the same operations on the same data will come up with the same results. This is not how humanists think of the material they work with.<\/p>\n<p>This is not a perfect analogy, but imagine that someone called your family photograph album a dataset. It\u2019s not <em>inaccurate<\/em> per se, but it suggests that this person just fundamentally doesn\u2019t understand why you value this artifact. And it\u2019s the same with humanists. With a source, like a film or a work of literature, you\u2019re not extracting features in order to analyze them; you\u2019re trying to dive into it, like a pool, and understand it from within.<\/p>\n<p>Let\u2019s take my silent film example again. It would be <em>possible<\/em> to enumerate all of the filmic conventions that recall the conventions of melodrama. Is there a villain? Is there a heroine? Are good and evil depicted in stark, black-and-white terms? 
You could even build a dataset like this and use it to show how film changed over time.<\/p>\n<figure id=\"attachment_1908\" aria-describedby=\"caption-attachment-1908\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.46.06-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-1908\" src=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.46.06-PM-300x159.png\" alt=\"A grid listing silent films, their dates, and melodramatic conventions.\" width=\"300\" height=\"159\" srcset=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.46.06-PM-300x159.png 300w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.46.06-PM-32x17.png 32w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.46.06-PM.png 820w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-1908\" class=\"wp-caption-text\">My silent film dataset.<\/figcaption><\/figure>\n<p>But, seriously, who cares? 
There\u2019s just such a drastic difference between the richness of the actual film and the data we\u2019re able to capture about it.<\/p>\n<p><iframe loading=\"lazy\" title=\"The Lonedale Operator (1911) DW Griffith Biograph Silent Film\" width=\"500\" height=\"375\" src=\"https:\/\/www.youtube.com\/embed\/9iGos7nDTLs?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<p>A dataset like this is so much less interesting than the trained judgment of someone who\u2019s seen many of these films and can turn a nuanced observation of these changes into a real argument.<\/p>\n<p>(Of course, the video itself constitutes data, but I\u2019ll get to that in a second.)<\/p>\n<p>And I would argue that the notion of reproducible research in the humanities just doesn\u2019t have much currency, the way it does in the sciences, because humanists tend to believe that the scholar\u2019s own subject position is inextricably linked to the scholarship she produces.<\/p>\n<p>However. Things are changing, in ways both obvious and not. All of our stuff is on our computers now \u2014 all of it, from books to movies to archival documents. This is why, more than anything else, I think digital humanities is here to stay. If you <em>can<\/em> analyze something computationally, I think it\u2019s going to be really hard to tell people that they <em>shouldn\u2019t<\/em>.<\/p>\n<p>This state of affairs has created some real problems for humanists, and, I would say, some real opportunities for libraries. 
If you speak to any historian who works in an archive, I <em>guarantee<\/em> you that they have hundreds, maybe even thousands of photos shot in an archive that look like this:<\/p>\n<figure id=\"attachment_1909\" aria-describedby=\"caption-attachment-1909\" style=\"width: 94px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.49.06-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-1909\" src=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.49.06-PM-94x300.png\" alt=\"An undifferentiated, baffling list of photos labeled &quot;2012-12-24 19:19:04.jpg&quot; and the like.\" width=\"94\" height=\"300\" srcset=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.49.06-PM-94x300.png 94w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.49.06-PM-12x37.png 12w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-25-at-1.49.06-PM.png 226w\" sizes=\"auto, (max-width: 94px) 100vw, 94px\" \/><\/a><figcaption id=\"caption-attachment-1909\" class=\"wp-caption-text\">Behold, the historical dataset.<\/figcaption><\/figure>\n<p><strong>\u00a0<\/strong><\/p>\n<p>This is <em>it!<\/em> This is how historians are organizing hundreds of archival photographs!! The best-organized among them are trying to manage these irreplaceable source documents in iPhoto. And incidentally, this is a big problem for me, as someone who works on the history of lobotomy.<\/p>\n<p>It\u2019s not just historians who have a problem. 
Literature scholars, film scholars, everyone\u2019s dealing with lots of journal articles, video clips, and other sources, and is really struggling to organize them so that they can produce scholarship.<\/p>\n<p>So humanists \u2014 even those who aren\u2019t digital humanists \u2014 desperately need some help managing their stuff, and libraries are in a great position to help them. I do feel that this is an underexplored opportunity space for libraries.<\/p>\n<p>It\u2019s just that if you advertise that help as \u201cdata management,\u201d they\u2019ll have no idea you\u2019re trying to talk to them.<\/p>\n<p>I used to offer a <a href=\"https:\/\/miriamposner.com\/blog\/embarrassments-of-riches-managing-research-assets\/\">workshop<\/a> on \u201cmanaging research assets,\u201d and even that felt way too clinical to describe humanists\u2019 sources. But if you get a chance to look at the <a href=\"https:\/\/miriamposner.com\/blog\/embarrassments-of-riches-managing-research-assets\/\">blog post<\/a> that contains all the suggestions I used in the workshop, you\u2019ll see that we\u2019re cobbling together dozens of tools, none of which really do what we want them to do.<\/p>\n<p>So all of that is to say that even if they don\u2019t call their sources data, traditional humanists do have pretty pressing data-management needs. But the need becomes even greater when you\u2019re talking about people who consider themselves digital humanists \u2014 that is, people who use digital tools to explore humanities questions.<\/p>\n<p>In many ways, digital humanists will have similar data-management needs to scientists and social scientists \u2014 they\u2019ll have spreadsheets, images, and video, and will probably at least know what metadata is. 
In addition, the NEH Office of Digital Humanities, like the NSF and other funding agencies, now requires a data-management plan; so you will very soon encounter, I\u2019m sure, a humanist approaching you at the 11<sup>th<\/sup> hour with a request that you write their data-management plan for them.<\/p>\n<p>Just to give you a sense of the kinds of things humanists might do with structured data, I\u2019ll show you a <a href=\"http:\/\/dhbasecamp.humanities.ucla.edu\/gettydata\/\">project<\/a> that <a href=\"https:\/\/www.flickr.com\/photos\/101041253@N04\/18843735855\/\">my students<\/a> just completed, as part of a collaboration with the Getty Research Institute. The GRI maintains what seems to me really big data \u2014 about 1.5 million <a href=\"http:\/\/www.getty.edu\/research\/tools\/provenance\/\">records relating to the transmission and sale of works of art<\/a>, which they call the provenance index. It\u2019s a really complex and baffling database \u2014 a really great case study in humanities data, actually \u2014 because all of its records are derived from historical documents themselves, and so are eccentric, disparate, and historically and geographically uneven. But because it\u2019s so big, you can do interesting, unexpected things with it.<\/p>\n<p>For example, one of my students got really interested <a href=\"http:\/\/dhbasecamp.humanities.ucla.edu\/gettydata\/frames\/\">not in the paintings but the frames themselves<\/a>, which are fairly understudied within art history. He gathered sales data for paintings sold between 1689 and 1787, and through a combination of text analysis and secondary reading, determined that two major factors made a frame valuable during this period: its beauty or its authenticity. 
With that information, he was able to show that there was indeed a market for frames described as \u201cauthentic\u201d during this 100-year period.<\/p>\n<p>So it\u2019s quantitative evidence that seems to show something, but it\u2019s the scholar\u2019s knowledge of the surrounding debates and historiography that gives this data any meaning. It requires a lot of interpretive work.<\/p>\n<p>These are two relatively simple examples, but I think they do show a little bit about how digital humanists are tending to work with data.<\/p>\n<p>First, we often find ourselves in conflict with publishers because of the kind of work we want to do. I mentioned that my IP address had gotten blocked, but this is mild compared to what has happened to other scholars; I definitely know people who\u2019ve been threatened with lawsuits and the like for excessive downloading.<\/p>\n<p>Second, we\u2019re not generally creating data through experimentation or observation \u2014 more often than not, we\u2019re <em>mining<\/em> data from historical documents. You name it, we\u2019ve tried to mine it, from whaling logs to menus to telephone directories. This means that we tend to want different tools than scientists, and also that we have some interesting data-wrangling problems. More often than not, the categories that our historical sources used to divide up our data are not the same ones we\u2019re interested in analyzing. So we often have to do some very creative transformations and interpretation, as my student did with the frames data.<\/p>\n<p>Third, it\u2019s just awful trying to find a humanities dataset. There are various humanities data repositories or registries, but they\u2019re terribly limited. And right now we\u2019re starting to see museums and cultural institutions releasing their data, and there\u2019s just no way to know who\u2019s released what, unless you\u2019re the kind of person who stays on top of these things. 
So we urgently need some help locating these datasets, aggregating them, and perhaps even linking them.<\/p>\n<p>Fourth, we <em>do <\/em>need those web services Sayeed was describing yesterday, that are built on top of existing datasets. We are working a lot with APIs, and it\u2019s really insufficient for us to download one record at a time. And even for people who aren\u2019t going to work with APIs, if you could build visualizations of datasets on the fly, or even just access the data in quantity, it would be a big help.<\/p>\n<p>Fifth, we have a desperate need for help with data-modeling \u2014 and here is another place where I think libraries could really play a big role. This July, I\u2019ll be directing a summer institute on digital humanities and art history, and as I\u2019ve been reading through the participants\u2019 project ideas, I\u2019ve been struck by how often it seems that what the scholar <em>really<\/em> needs is data-modeling advice. For example: the art historian who wants to show how and when art objects traveled across the Indian Ocean and relate that movement to corresponding changes in artistic practice.<\/p>\n<p>What she really needs is a data model that can accommodate historical and artistic periods, geographic movement, and conventional time. Most humanities scholars are not trained to build these kinds of databases. But I think \u2014 I hope \u2014 that this is an area where the library could be a huge help.<\/p>\n<p>Finally, we may even need new kinds of data specifications, because the currently existing standards for describing time and space, for example, are actually really inadequate for our needs. To give one example, many standards for specifying dates require time calculated down to the exact day, and sometimes even the minute or second. But humanists tend to deal in words like circa, spans of time, or things like \u201cbefore\u201d or \u201cafter\u201d this event. 
Two technologists at Stanford actually took on this problem with a project called <a href=\"http:\/\/dh.stanford.edu\/topotime\/\">Topotime<\/a>. By specifying that certain characters represent things like uncertainty, contingency, or approximation, they\u2019ve shown how we could move from depicting time as a point or a line to a much broader canvas of shapes.<\/p>\n<p>Just as we need more nuanced data models for time, we find ourselves faced with a pretty limited palette of options for depicting important structures of power, like gender and race. Take the <a href=\"http:\/\/www.getty.edu\/research\/tools\/vocabularies\/ulan\/\">Union List of Artist Names<\/a>, which is an incredibly important resource that places like museums use to establish authorities \u2014 that is, to make sure they\u2019re all using the same name to refer to an artist, and to associate that name with other data about the artist. It\u2019s a tremendously important resource, and without it museums couldn\u2019t share and network information; we\u2019d never be able to figure out who holds what. But look how it <a href=\"http:\/\/www.getty.edu\/research\/tools\/vocabularies\/ulan\/about.html\">deals with gender<\/a>!<\/p>\n<p>Now, the fact that it captures gender is crucial \u2014 otherwise we wouldn\u2019t be able to say that women are underrepresented in a museum\u2019s collection \u2014 but no self-respecting humanities scholar would ever get away with such a crude representation of gender in traditional work.<\/p>\n<p>We find ourselves needing models for gender that can accommodate much more nuance than our current standards. 
For us, the proper mode of visualizing data may not be a pie chart; it may be a heat map.<\/p>\n<p>So I don\u2019t know about you, but I actually find these problems to be quite interesting and challenging: taking the datasets we\u2019ve been given \u2014 which were not at all created for our purposes \u2014 and working against their grain or reinventing them to try and tease out the things we think are really interesting.<\/p>\n<p>It requires some real soul-searching about what we think data actually is and its relationship to reality itself; where is it completely inadequate, and what about the world can be broken into pieces and turned into structured data? I think that\u2019s why digital humanities is so challenging and fun, because you\u2019re always holding in your head this tension between the power of computation and the inadequacy of data to truly represent reality.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is a talk that I gave at the Harvard Purdue Data Management Symposium on June 17, 2015, in Cambridge, Massachusetts. The audience was mostly librarians and other data-management professionals. 
I was the only humanities person on the program, so I wanted to talk about the ways that humanists think about data differently from people [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,21],"tags":[326,324,327,325],"class_list":["post-1902","post","type-post","status-publish","format-standard","hentry","category-digital-humanities","category-history-technology","tag-curation","tag-data","tag-data-management","tag-humanities"],"_links":{"self":[{"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/posts\/1902","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/comments?post=1902"}],"version-history":[{"count":10,"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/posts\/1902\/revisions"}],"predecessor-version":[{"id":1916,"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/posts\/1902\/revisions\/1916"}],"wp:attachment":[{"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/media?parent=1902"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/categories?post=1902"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/tags?post=1902"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}