Finding data

I’ve assembled a long list of datasets that are ready to use. If none of them works for you, you can look for other data to use for your final project.

Finding the right data can be really challenging. There’s a lot of civic and scientific data out there, but that may not be what you want. For humanities projects, my favorite places to look for data are:

Data is Plural newsletter

Every week, Jeremy Singer-Vine, a data journalist, sends out a newsletter containing interesting datasets. They’re also gathered on this spreadsheet.

The Pudding

The Pudding is an online publication that publishes many stories based on datasets. Its collection of datasets contains many treasures, including the Internet Boy Band Database and data about the diversity of makeup shades. The challenge with these datasets is that you’ll need to find something new to say about them, beyond the story that’s already been published.

/r/datasets

This is a subreddit (a category within the discussion forum Reddit) where people can ask for and offer datasets. It contains a very wide variety of data.

Humanities data repositories

Humanities data repositories exist, but none of them is comprehensive, and I’ve had very patchy luck finding anything useful in them.

Alan Liu’s list of datasets

The literature scholar Alan Liu has a very large list of datasets and corpora for the humanities. Some of them may need a bit of finagling before they’re fit for purpose (e.g., they may be in a format other than CSV).
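As an illustration of the kind of finagling this can involve, here is a minimal sketch that turns a JSON list of records into CSV using only Python’s standard library. The function name json_records_to_csv and the sample records are invented for this example, and the sketch assumes the JSON is a flat list of objects; nested data would need more work.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text.

    Hypothetical helper for illustration; assumes no nested values.
    """
    records = json.loads(json_text)

    # Collect column names in the order they first appear,
    # since not every record has to contain every field.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Fields missing from a record are written as empty cells.
    writer.writerows(records)
    return buffer.getvalue()


# Invented sample data, not drawn from any of the datasets above.
sample = (
    '[{"title": "Middlemarch", "year": 1871},'
    ' {"title": "Beloved", "year": 1987, "author": "Toni Morrison"}]'
)
csv_text = json_records_to_csv(sample)
print(csv_text)
```

To save the result, you could write csv_text to a file and open it in a spreadsheet program to check that the columns line up.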

Finding the story in your data

This is as challenging as any technical exercise you’ll do for class. Here are some guides: