Scaling up DH101

Over the last few years, enrollment in my Introduction to Digital Humanities class has been trending steadily upward, as has enrollment in the minor itself. Last spring, we had an unexpected surge in enrollment in the minor, and many of those students needed to take DH101 right away. We had to scramble a bit to accommodate everyone. After considering a few possibilities, we more or less doubled the size of our Intro class, from 45 to 88 students. We were fortunate to enlist an excellent T.A., Dustin O’Hara, to teach two sections, and my fabulous longtime co-conspirator, Francesca Albrezzi, took the other two. (We have lectures twice a week and section once a week.)

Even with the expanded class size, we had to turn lots of people away; I suspect we could fill another DH101 class in the spring, if we had the faculty bandwidth to teach it.

This was my first time teaching a true lecture course. In previous versions of DH101, I’ve been able to alternate between dispensing information and turning discussion over to the students. While we still had discussions in the larger DH101, I could no longer pretend this was a seminar.

I expected the large class size to be a challenge, but I think the bigger challenge was the classroom itself. We were lucky to find a room at all, given how late we transitioned to a larger class size, but we were stuck with a very conventional lecture hall, with bolted-down seats in immovable rows. It at least had modern AV equipment, but the fixed seating made group work hard. In my previous classroom, students’ seats were arranged in 10 or so group tables, so it was easy to alternate between hands-on work and all-eyes-up-front lecturing. Now we had no choice but to sit lecture-style.

I did what I could to ameliorate the situation. I was able to reserve the Young Research Library main conference room on a few occasions, which gave us a chance to work more collaboratively. And I did continue asking the students to check in, share work with each other, and discuss issues in small groups in the lecture hall. But the space just didn’t really lend itself to that kind of thing. This was a real bummer for me, and probably for the students, too.

The classroom arrangement actually set us back significantly in terms of technical skills, too. I wasn’t really comfortable asking students to learn technical stuff when I couldn’t circulate freely in the classroom to see how they were doing. I don’t think a lecture hall is a good environment for learning new skills on your computer, since it’s so easy to get stuck and have no way to signal for help without stopping the entire class. So technical tutorials had to be reserved for section, for the rare occasions when I could reserve the Library conference room, and for a few at-home lessons. As a result, I wasn’t able to teach the students as many skills as I have in years past.

I also struggled to check in with students as much as I’ve been used to doing. Their group project is always really challenging for them, and every project is very different. Since I’m the one who picks out the datasets, I usually like to work at least a little bit with every group. But with so many students, I had a hard time devoting attention to everyone. The result was more confusion about the assignment and expectations than in previous years, and a couple of group meltdowns. Everyone pushed through and got to the finish line, thanks in large part to the TAs’ hard work, but it was more stressful than it needed to be.

The students’ final project showcase this week reassured me that, yes, they did learn what I wanted them to, and, yes, they did learn how to do serious research and think critically about data. I loved hearing them explain what they did and how they overcame challenges, and I was really excited to hear their newfound confidence in discussing technical matters. Still, as always, it’s my errors that stand out to me.

If the class remains this size next year (and I’m still the one teaching it), there are a few things I’d do differently.

  • Rethink the final assignment. This is tough, because I’ve loved giving them “real” data, and I believe they benefit from the intense labor of making meaning from messy, incomplete, but important datasets. But I’m not sure it’s realistic for me to assemble and augment this many datasets every year. And I worry about the groups getting the attention they need to complete this very complex project when there are so many people to check in with. The alternative that makes sense to me is some kind of digital portfolio, in which students create their own examples of multiple kinds of digital work and surround it with critical commentary.
  • Undecorate the Christmas tree a little bit. As the years have gone by, I’ve tossed more and more assignments in the syllabus. I don’t think the class is more work, necessarily, but there are a lot of things to turn in and a lot of dates and assignments to remember. It’s too much. I think I could cut the blog post assignments down to just a few and simplify the final project a lot.
  • Think about asking students to complete technical modules at home. I usually like to be with students when they’re learning a new technical skill, but that wasn’t always possible. On a few occasions, I had students walk through (very carefully written) tutorials themselves at home, and they seemed to do OK. I think I could do more of this, as long as I’m cautious.
  • Get a different classroom! I don’t think we actually have a great classroom for a group this size at UCLA, but what I imagine would work well is a large room, with lots of space for my TAs and me to circulate, and multiple large tables where students can sit in groups. Multiple screens would be awesome, so that students could quickly draft and share work, but honestly, I’d happily take a large, empty room with tables and chairs, preferably one that we don’t have to set up and tear down every time (ugh).

Other miscellaneous thoughts about this year’s DH101:

  • As part of their annotated bibliography, each student needed to not only write a blurb about each of their sources, but actually obtain the book or article and submit a photo of themselves holding it. We called those “shelfies.” I’m just tired of reading book summaries that are obviously pulled from the snippets students could read on Google Books. This seemed to work really well. Students STRUGGLED to find their sources, as I expected, and waited too long, as I expected, but a number of students told us that this was the first time they’d located or checked out a book in their college career. As we did last year, we held a “research-a-thon” to help get them going on this, and while I made a mistake by holding the event during midterms week, the librarians and I were able to personally escort a number of students up to the stacks and help them read a call number.
  • Students took to network analysis more than they have in years past, perhaps because a number of them were simultaneously taking an SNA class in the Sociology department. I’m happy with the lesson plan I’ve developed to introduce network analysis, which uses a questionnaire about their favorite books, movies, and musicians to develop a homophilic network graph to show how they’re all connected. (I recorded last year’s network analysis lecture and you can see it here.)
  • For the last couple of years, it’s been clear that the hardest thing about the final assignment for my students is getting started — understanding what kind of work is necessary to start asking questions of a dataset, and how to alternate between secondary research and data analysis. The DataBasic suite really helps with this, but I think they could use step-by-step instructions to get started. Perhaps I’ll take that on at some point.
  • I just did not have the wherewithal (or the funding) to schedule a pizza-dinner hackathon, as I’ve done in previous years, but I found a simple alternative that they seemed to appreciate. I convened an evening meeting to which each group had to send at least one representative and checked in with each group that way. Then, at the same time every week, I invited each group to sign up for dedicated help with me. It worked well and allowed me to work intensively with a few groups.
  • You probably guessed this, but with a lecture this size, you need to make every announcement multiple times and send email followups, and even then, students will plead total ignorance.
  • For the last few years, I’ve started off the class with a reading from Hayden White, about the essential unknowability of history. This year I switched it up and had them read the first chapter of Michel-Rolph Trouillot’s Silencing the Past, in part because Trouillot explicitly deals with power and race in ways that White doesn’t. They really struggled to understand Trouillot, but it seemed to make an impression on them, too.
  • Of the DH projects we examined together, the one they all seemed to like the most was Gabriela Aceves Sepúlveda’s [Re]Activating Mama Pina’s Cookbook. I think they liked its consideration of the materiality of data, the questions about what “counts” as data, and the beautiful design. Also, partly because so many of my students are people of color themselves, they appreciate it when I can pull in projects from and about other people of color.

Data Packages for DH Beginners

The quarter is off and running again at lightning speed. At UCLA, we’re on the quarter system, and things move fast — just 10 weeks to get through all your material. I’m teaching DH101 again this year, and, as usual, it’s a race against the clock. The profile of my students changes a bit every year, but the typical student who enters my DH101 classroom has facility with Word, PowerPoint, maybe Excel, maybe some of the Adobe suite, but not a ton of other computer stuff. By the end of the quarter, my goal is to get them working with and thinking critically about structured data, data cleaning, data visualization, mapping, and web design.

I’ve written about this before: working in groups, my students are assigned a dataset at the beginning of the quarter. They learn how to work with it as the quarter progresses, doing a lot of secondary contextual research, interviewing an expert about it, manipulating the data, and finally building a website that makes a scholarly humanistic argument with the support of the data. You can see the mechanics of this on my course website.

People often ask me about the data I use, and indeed, that is a story in itself. I have 88 students this year, and since I don’t like any group to have more than seven people in it, I have 12 groups, each of which needs a dataset. (Really, some of them can share the same dataset; I don’t know why I get weird about this.) And they can’t just use any dataset. In fact, most of the data out there is inappropriate for them.

Here is what I look for in a dataset for my students:

  1. It has to be a CSV (or able to be wrangled into a CSV). My beginners want to be able to double-click on their dataset and see…something that they can work with. CSVs are great because they open in Excel, which is familiar to most students and allows them to immediately start doing things like filtering and simple manipulation. Plus, you can drop a CSV into almost any visualization tool. I can use a relational database, but I usually just give the students the spreadsheet that results from a query, since I just don’t have time in the quarter to teach them about more complicated data structures. Likewise, if a dataset is XML, I’ll just flatten it. But I prefer not to have to deal with this because, like I said, 12 datasets.
  2. Around 2,000 records is ideal. Here’s why: I want the dataset to be big enough that it’s too labor-intensive for the students to manipulate it by hand, but not so big that it breaks Excel. Really, I can work with bigger sets, too, but students do tend to get very anxious about working with datasets that big. Any number of fields is fine (actually more is better) because students understand fairly quickly that they can choose which fields to examine.
  3. It has to be…humanities-ish. You and I probably know that one could make a humanities argument about municipal water data, or public health information, but it takes a little bit of sophistication to get there. The most “natural” kind of analysis for these kinds of datasets would be urban planning or public health kinds of questions, and it’s too difficult for me to push students toward the kind of open-ended humanities questions I want them to pursue. It’s far easier if the data is about art, books, movies — subjects that are the traditional province of the humanities.
  4. It’s nice if it’s something they care about. I have confidence that my students will eventually become interested in any subject, once they really dig into it, but I can forestall a lot of grumbling if I can give them a dataset that’s immediately compelling to them. Things they like: fashion, food, performance, books from their youth, cartoons, comic books, TV, movies.
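
The first two criteria lend themselves to a small script. What follows is only a minimal Python sketch under stated assumptions, not the actual preparation workflow: it assumes a hypothetical XML export in which each record is a `<record>` element with simple child fields, flattens those records into rows, trims the set to about 2,000 records by random sampling, and writes CSV. The element name, the sample titles, and the function names are all invented for illustration.

```python
import csv
import io
import random
import xml.etree.ElementTree as ET


def flatten_xml(xml_text, record_tag="record"):
    """Turn each <record> element into a flat dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        rows.append({child.tag: (child.text or "").strip() for child in record})
    return rows


def trim_rows(rows, limit=2000, seed=0):
    """Keep at most `limit` rows, sampled at random but reproducibly."""
    if len(rows) <= limit:
        return list(rows)
    return random.Random(seed).sample(rows, limit)


def rows_to_csv(rows):
    """Write rows to CSV text; the header is the union of all field names."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data, invented for illustration only.
SAMPLE_XML = """
<catalog>
  <record><title>Example Film A</title><year>1931</year></record>
  <record><title>Example Film B</title><year>1948</year><genre>noir</genre></record>
</catalog>
"""

rows = flatten_xml(SAMPLE_XML)
csv_text = rows_to_csv(trim_rows(rows))
print(csv_text)
```

Random sampling is only one way to cut a set down; filtering to a date range or a single collection may preserve more of the data's internal coherence, so the choice is worth recording in the project brief.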

You can see this year’s datasets at the bottom of this page. I do not just give my students their datasets in raw form. I cut the sets down to an appropriate number of records, if necessary, and then I give them the dataset along with a “project brief,” which contains:

  1. Information about the provenance and composition of the data.
  2. The name and contact info of an expert on that subject who has agreed to allow my students to interview them.
  3. The names and contact info of librarians who can help them.
  4. The name and contact info of UCLA’s mapping specialist.
  5. Two or three secondary sources to get them going on their research. I also teach them how to citation-chain.

Here is an example of a “data package,” with the contact info removed.

If you’re thinking this is kind of an absurd amount of work for the instructor, you’re right. I really feel the students need this apparatus around their dataset, but I end up spending a good chunk of my summer hunting down data, persuading friends (and strangers) to serve as subject experts, and researching secondary sources.

Even with all of this scaffolding, students get very anxious about the project assignment, just because it’s so new to them. I’ve learned to expect it, to warn them that they’ll feel anxious about it, and to reassure them that if they’re hitting project milestones, they’ll get to the finish line on time, even if they feel at sea.

Sorry for the dashed-off blog post; I’ve been meaning to write about this for some time and finally had a few (just a few!) minutes!

New tutorials on network analysis with Cytoscape

The Cytoscape interface, featuring a pane on the left with buttons and a graph diagram on the right
I find the Cytoscape interface more intuitive than Gephi’s, although in both cases, you need to have a basic understanding of key network-analysis terms.

For some reason I got it into my head to write a bunch of tutorials on using Cytoscape for network analysis. They’re now all up on GitHub. (I’ve been moving to GitHub for tutorials because they’re easier to update there.)

I started writing these for the students in my spring-quarter class and, even though the class is over, I’ve been adding to them compulsively. They’ll take you from zero to an interactive, web-based network graph, with stops along the way for projecting a two-mode network to a one-mode network and working with node attributes. (If you don’t know what any of that stuff means, they explain that, too.)
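
For readers who want to see what projecting a two-mode network to a one-mode network means in practice, here is a minimal Python sketch. It is not taken from the tutorials, which work in Cytoscape; the names and titles are hypothetical, and the weighting rule (count the items two people share) is one common choice among several.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical two-mode data: each person is linked to the items they named.
favorites = {
    "Ana": {"Beloved", "Dune"},
    "Ben": {"Dune", "Emma"},
    "Cal": {"Emma"},
    "Dee": {"Beloved", "Dune"},
}


def project_to_one_mode(two_mode):
    """Project a person -> items mapping onto a person-person network.

    Two people are connected if they share at least one item; the edge
    weight is the number of items they share.
    """
    # Invert the mapping so each item lists the people who named it.
    members = defaultdict(set)
    for person, items in two_mode.items():
        for item in items:
            members[item].add(person)

    # Every pair of people who named the same item gains one unit of weight.
    weights = defaultdict(int)
    for people in members.values():
        for a, b in combinations(sorted(people), 2):
            weights[(a, b)] += 1
    return dict(weights)


edges = project_to_one_mode(favorites)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The projection discards the items themselves, so it is worth keeping the original two-mode table around in case someone asks why two people are connected.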

There’s a bit of a Gephi-versus-Cytoscape battle right now among people who do network analysis. I actually started out on Cytoscape, only because I found it slightly more intuitive, and switched to Gephi when I discovered most people used that. But in recent years, I’ve had a really hard time dealing with Gephi. First, there was the Legendary Java Problem, and although the new version is purportedly more stable, I actually just cannot get it to work on my Mac and have frankly kind of lost the will to keep trying.

Cytoscape is Fine. It’s designed for scientists, really, and other people who care very much about statistical measures of networks, which to be honest, I don’t really care that much about. (I don’t think most humanists trust these measures anyway, so I don’t see much point in hammering on them.) I find Cytoscape’s web service, CyNetShare, to be pretty janky-looking, but … you can interact with the network diagram, so that’s good, I guess.

To be honest, I’ve been slowly making the switch from Gephi/Cytoscape/etc. to R’s igraph package, and to D3 for displaying networks on the web, just because they’re so much nicer looking. One thing I like about Cytoscape is that after you’ve measured various aspects of your network, you can export JSON that’s set up specifically for D3’s popular force-layout network.
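
As a rough illustration of the kind of file a D3 force layout consumes, the Python sketch below builds the `nodes` and `links` structure that many D3 force-layout examples use. This is an assumption about the general shape, not a description of exactly what Cytoscape exports; the field names `id`, `source`, `target`, and `value` follow common D3 examples (string ids need a link id accessor in D3 version 4 and later), and the data is hypothetical.

```python
import json


def to_d3_json(edges, attributes=None):
    """Build a {"nodes": [...], "links": [...]} dict from weighted edges.

    `edges` maps (source, target) pairs to weights; `attributes` optionally
    maps a node id to a dict of extra fields (for example a degree score).
    """
    attributes = attributes or {}
    node_ids = sorted({node for pair in edges for node in pair})
    nodes = [{"id": node_id, **attributes.get(node_id, {})} for node_id in node_ids]
    links = [
        {"source": source, "target": target, "value": weight}
        for (source, target), weight in sorted(edges.items())
    ]
    return {"nodes": nodes, "links": links}


# Hypothetical edge list, invented for illustration.
example_edges = {("Ana", "Ben"): 1, ("Ana", "Dee"): 2}
graph = to_d3_json(example_edges, {"Ana": {"degree": 2}})
print(json.dumps(graph, indent=2))
```

Carrying measurements such as degree along as node attributes is what lets the web display size or color nodes without recomputing anything in the browser.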

When I was visiting Stanford last winter, I got to see a preview of a network analysis tool that the Humanities + Design team is building, and I really liked the way they placed the emphasis on exploration and discovery, rather than statistical measures. I’ll be looking forward to seeing the release of that tool (I think it’s called Idiographic?), since I do feel that humanists have different interests when it comes to networks than scientists or social scientists.

Materials on Image-Mining for Medical History

Last week, I taught the image-mining portion of the Images and Texts in Medical History workshop at the National Library of Medicine. I am far from an expert on OpenCV, the open-source computer-vision library. But as usual, that didn’t stop me from attempting to teach it.

The materials I created for the workshop include detailed instructions on how to use OpenCV to extract images from scanned journal pages (using a script written by Chris Adams), as well as a detailed breakdown of how to use the Python OpenCV library to take the average color of an image. I’ve also included links to my favorite resources on OpenCV and computer-vision in general. (My experience has been that there are a lot of really terrible tutorials out there, so I’ve tried to link only to those that are actually helpful.)
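
To give a sense of what taking the average color of an image involves, here is a minimal Python sketch. It is not the workshop code. The first function is plain Python and shows the arithmetic; the second assumes the `opencv-python` package is installed and uses `cv2.imread`, which returns pixel values in blue-green-red order rather than red-green-blue. The function names and the tiny test image are hypothetical.

```python
def average_color(pixels):
    """Average a sequence of (b, g, r) pixel tuples, channel by channel."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels to average")
    count = len(pixels)
    return tuple(
        sum(pixel[channel] for pixel in pixels) / count for channel in range(3)
    )


def average_color_of_file(path):
    """Average color of an image file, in OpenCV's blue-green-red order.

    Requires the opencv-python package; it is imported here so that the
    pure-Python function above works without it.
    """
    import cv2

    image = cv2.imread(path)  # NumPy array shaped (height, width, 3), or None
    if image is None:
        raise FileNotFoundError(f"could not read image: {path}")
    blue, green, red = image.mean(axis=(0, 1))
    return float(blue), float(green), float(red)


# Hypothetical 2x2 image written out as four BGR pixels: two red, two blue.
tiny = [(0, 0, 255), (0, 0, 255), (255, 0, 0), (255, 0, 0)]
print(average_color(tiny))
```

A single average flattens a lot of information (half red and half blue averages to purple), so it works best as a coarse feature for comparing many images rather than as a description of any one of them.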

Ben Schmidt taught the text-mining portion of the workshop, and his materials are really great. His handouts in particular are concise, opinionated rundowns of the strengths and weaknesses of various forms of text analysis.

In preparation for the workshop, Ben and I created a virtual machine, provisioned via Vagrant with all the dependencies and data the participants needed. If you’d like to install the VM, it has everything you need for both Ben’s and my portions of the workshop, and the instructions should be pretty clear. (The VM is based on one that Andrew Goldstone created for his Literary Data class.)

The process of getting the VM installed on participants’ own computers was … complicated. We learned many things about Vagrant and VirtualBox, including the fact that Windows 7 and 8 don’t come with any way to handle SSH.

It was definitely the most technically complex workshop either of us has attempted to teach (to a group of about 50!!). It was not hitch-free, but it was really satisfying to see participants get excited about computer vision, and to talk about ways they might use these techniques in their own research.

A better way to teach technical skills to a group

a stack of orange, blue, yellow, and pink post-it-notes
“Post-It Notes,” by Dean Hochman

My DH101 class this year was my biggest yet, with 45 undergrads. I suppose that’s not huge compared with many other classes, but DH101 is very hands-on. I am fortunate enough to have a TA, the awesome Francesca Albrezzi, who runs separate weekly labs. Still, I often have to teach students to do technical things in a large-group setting, and the size of the class this year prompted me to rethink how I do this.

As I see it, many of my students’ biggest problem with computers is their own anxiety. Obviously, I have a self-selecting group, since I teach a class with “digital” in the title, but even so, many of my students tell me that they are just “not technical.” Many of them are so convinced of this that they see any failure to get something to work as confirmation of what they already knew: they’re just not good with computers.

And since this is UCLA, the vast majority of my students do not fit the stereotype of the Silicon Valley programmer. This is awesome for the class, since we have so many different voices in the room. But it also puts many of my students at risk for stereotype threat, in which students’ performance suffers because they fear their mistakes will be seen as representative of their entire race or gender.

I’ve seen a version of this happen in workshops countless times. The instructor issues directions while students try to keep up at each step. Some students accomplish each step quickly, but some students take a little longer to find the right menu item or remember where they’ve saved a file. No matter how often you tell students to please interrupt or raise a hand if they need help, most students won’t do this. They don’t want to slow everyone else down with what they’re sure is a stupid question. Eventually, these students stop trying to follow along, and the workshop becomes, in their minds, further evidence that they’re not cut out for this.


Rehabbing DH101

Someday I’ll come up with a better way of illustrating blog posts than a Flickr search for “data,” but in the meantime, here’s “Untitled” by Karen Blaha.

I’m teaching Introduction to Digital Humanities for the third time this year, along with Francesca Albrezzi, my wonderful two-time teaching assistant, and I’m really enjoying it. It’s a challenging but rewarding class, with 45 students, a 10-week quarter, and a large number of moving parts. I reworked the syllabus quite a bit for this iteration, and I thought it might be useful to talk about what I’ve done differently and why.

As I’ve taught through the class a few times, its purpose and value have become more clear to me. My version of DH101 is about developing a humanistic attitude toward data. To me, that means the ability to hold in one’s mind simultaneously the value of any particular dataset and its inevitable poverty, compared with the phenomena it purports to describe. I want students to be able to “work” with data — that is, to analyze, visualize, and map it — but also to retain a perpetually critical, interrogative stance toward it.

In the service of this goal, I’ve completely rewritten the students’ final project assignment. The previous assignment, which I first inherited and then adapted, was for students to work in groups to build Omeka projects on topics of their choice. This had the benefit of exposing them to the demands and complexity of Dublin Core metadata, but I felt that the students were spending too much time describing objects and not enough time working directly with data. Since Omeka has no real export function, they weren’t able to do much with the metadata they were creating, besides build exhibits.

The (sort-of) selfies class

Room of boisterous students mugging for the camera
Class selfie! Lotta brilliance in this room!

Last winter I taught a class called Selfies, Snapchat, and Cyberbullies: Coming of Age Online. It was incredibly fun and rewarding, and I learned a ton. Mark Marino simultaneously taught a great class on selfies over at USC, and while we weren’t able to sync up our classes as much as we might have liked, we were able to have a joint Facebook group for them, which was really fun.

(Mark and I were able to teach our classes at all in large part because of the generosity of the scholars involved in the Selfies Research Network, to whom I owe a big debt of gratitude.)

Mark’s class generated a ton of publicity, and because he mentioned my own class, I rode Mark’s coattails a bit as we got mentioned in the New York Times, the LA Times, and elsewhere. Of course, Mark and I knew that the only reason our classes were getting any press was so that people could talk about how ridiculous a selfie class is. But it was still fun, and we tried to inject as much substance as we could into the conversation.

Meanwhile, the ever-awesome Liz Losh took the time to really dig into the substance of my class in this excellent post on the DML blog; I was really honored to be interviewed.

I got an interview request for another outlet, and since the article seems not to have seen the light of day, I thought I’d just post my responses to the interviewer’s questions here.

Incidentally, I don’t really take my own selfies, not because I disapprove of them, but because I’m really bad at it. Much respect to people who can do it well!

What enticed you to teach a class centered around the selfie?

The class wasn’t entirely centered around the selfie. It was about the experience of being a young adult in the digital age and, more broadly, how we should think about the relationship between technological and cultural change. I wanted to teach this class because I’ve heard a lot of generalizations about millennials, both in the media and from people I know, and I felt that many of these characterizations didn’t accurately reflect the complicated, diverse people I encounter in the classroom at UCLA. I wanted to submit those generalizations to rigorous scrutiny, to see whether they held up.

I also noticed that every time I mentioned social media or online culture in the classroom, students were really eager to chime in with their own experiences. I thought it would be fun and interesting for us to carefully study something they care so much about. I also have a sister who’s 21, so I felt a personal investment in countering some of the more pernicious stereotypes about young adults.

What insights and observations have you gained regarding the relationship between students and social media?

My students had a ton to say about social media and its relationship to youth culture. One thing I found most interesting is how worried they are about social media’s effects on their attention spans and relationships. That makes sense, since they’re hearing the same news stories and media messages about millennials that we are! But they’re thinking very hard about technology and social change; no one should assume that just because a young adult has her eyes on her phone, she’s not also self-aware and thoughtful.

Can you give an example of an assignment for the course?

Students’ main project was a digital ethnography (meaning an in-depth study of a particular culture) of an online community. I asked them to immerse themselves in, and in some cases participate in, an online community of their choice. We had a couple of Tinder papers, one on Yik Yak, and a few on Instagram. Students were surprised at how hard it was! We spent a long time talking about how to be an ethical, honest, and diligent participant-observer.

Based on what you’ve seen among students, are there specific aspects that constitute a typical selfie?

I think it really depends on context. Selfies can have different meanings, depending on who’s taking them and for what purpose, and often you’ll find people consciously imitating or exaggerating elements of the “typical selfie” for ironic effect. For example, many teenage girls will offer up an exaggerated “duckface” to the camera, in a conscious and ironic imitation of the “typical selfie.” Just as any portrait can, a selfie can mean many different things, and one has to be very alert to its context when one’s trying to suss out the meaning of any particular image.

Outside of classroom purposes, do you condone taking selfies? If yes, how do you justify a selfie as something more than an act of narcissism?

I don’t really think it’s my place to condone or not condone any form of participation in a visual culture. Community, as we all know, means a lot to people, and taking selfies is one way that some people participate in a community. I think we should also be very alert to what is connoted by the word “selfie.” As the term is popularly used, it’s closely associated with teenaged girls, who have frequently been the object of scathing ridicule in American culture. I think of selfie-opprobrium as somewhat akin to people’s annoyance at vocal fry: both phenomena are associated with teenage girls, and both suggest a degree of annoyance (perhaps even fear) at girls’ temerity in entering the public sphere.

What do you hope students carry away from the course?

On our last day of class, I asked students what they’d remember about what we learned. They all agreed on “It’s complicated!”, which is also the title of danah boyd’s recent book, one we read in class. What boyd means, and what my students meant, is that you can’t assume that all online youth culture is one thing, or that every young person experiences life online in the same way. Phenomena that look very similar to outside observers can turn out, on closer inspection, to have very complicated and multilayered meanings. Young people, like all human beings, are complicated, diverse, and multifaceted. Sweeping generalizations about them are unhelpful and usually wrong.

They also said they’d remember our discussions about the need to “hustle,” by which we meant the reality of labor in the twenty-first century. Students carry unprecedented educational debt these days, even as the likelihood of them owning their own homes, or even attaining the same living standard as their parents, is lower than it has ever been. Steady jobs, the kind with pensions and benefit plans, are becoming increasingly rare, and students are facing the possibility of a future made up of freelance gigs and short-term contracts. It’s no wonder they feel compelled to create complex online identities. In an economic moment in which their online identities can determine their ability to earn a wage, it’s incumbent upon them to create charismatic online personae.

What’s Next: The Radical, Unrealized Potential of Digital Humanities

This is a lightly edited version of the keynote address I was honored to give at the Keystone Digital Humanities Conference at the University of Pennsylvania on July 22, 2015. Thank you to the organizing committee for inviting me!

My sincere thanks, too, to Lauren Klein and Roderic Crooks for their advice and feedback on this talk. I’d also like to acknowledge the huge intellectual debt I owe to David Kim and Johanna Drucker, with whom I’ve argued, negotiated, and formulated a lot of these ideas, mostly in the context of teaching together. David’s important dissertation, Archives, Models, and Methods for Critical Approaches to Identities: Representing Race and Ethnicity in the Digital Humanities (UCLA, 2015), takes on many of these issues at much greater length.

I gave the title of this talk to Dot Porter some time ago in a fit of ambition, and it’s seemed wildly hubristic to me ever since. But it’s something I care a lot about, and so tonight I’d like to outline some ideas about how digital humanities might critically investigate structures of power, like race and gender.

We are doing some of that now, as evidenced by some of the work at this conference, but I don’t think we’re doing it with the energy or the creativity that we might. I’ll argue that to truly engage in this kind of work would be much more difficult, and much more fascinating, than what we’re currently discussing for the future of DH; in fact, it would require dismantling and rebuilding much of the organizing logic, like the data models or databases, that underlies most of our work.

So I’ll start by saying a little about where I think we are with digital humanities now, and also about some new directions, with respect to these structures of power, that I’d like to see the field go.


The Case of the Missing Faces

Freeman operating on a patient with his partner, James Watts

As I’ve often mentioned, I’ve been working for quite some time on a study of the photographs of Walter Freeman. Freeman, a Washington, D.C.-based physician, was the world’s foremost lobotomist; it’s estimated that he lobotomized some 3,500 people.

He was also a prolific and dedicated photographer. He almost invariably took photos of his patients before and after the procedure, acquiring reams of these images over the course of his career. In a chapter of my book, Depth Perception, I argue that Freeman was participating in a much longer-standing tradition of psychiatric photography, one that claimed that the human face could reveal the depths of the soul. (You can see a recorded version of the story of Freeman’s photographs here.)

Humanities Data: A Necessary Contradiction

This is a talk that I gave at the Harvard Purdue Data Management Symposium on June 17, 2015, in Cambridge, Massachusetts. The audience was mostly librarians and other data-management professionals. I was the only humanities person on the program, so I wanted to talk about the ways that humanists think about data differently from people in some other fields.

Two mosaics beside each other. The one on the left is made up of largely cool, blue images; the one on the right is composed of warmer, earthier tones.
Sometimes I start class discussions by comparing image quilts of Google searches for “digital” (left) and “humanities” (right).

Today I’d like to talk about the ways in which humanists think about data, and how that’s distinct from the ways in which scientists and social scientists think about it.

Even though I think our issues can be pretty different, I want to make the case that there are some very promising ways in which libraries could make meaningful interventions in the humanities research lifecycle, both for what we might call traditional humanists and for digital humanists. So I’ll start with what “traditional” humanists might need help with and then move on to the needs of what we call “digital humanists” (although I think in practice the distinction is a bit blurred).

I just want to say at the outset that there are people who specialize in humanities data curation, and I am not one of those people. A number of talented people, including Trevor Muñoz at the University of Maryland and Katie Rawson at the University of Pennsylvania, have started to take a very programmatic look at the data-curation needs of digital humanists. And I encourage you to check out their important work. But you don’t have Trevor or Katie; you have me! So what I can do is share my own perspective and experience on what it means to work with data as a humanist, and where libraries can help.
