I’ve assembled a long list of datasets that are ready to use. If nothing on that list works for you, you can look for other data to use for your final project.
Finding the right data can be challenging. There’s a lot of civic and scientific data out there, but it may not suit the project you have in mind. For humanities projects, my favorite places to look for data are:
Data is Plural newsletter
Every week, Jeremy Singer-Vine, a data journalist, sends out a newsletter containing interesting datasets. They’re also gathered on this spreadsheet.
The Pudding
The Pudding is an online publication that publishes many stories based on datasets. Its collection of datasets contains many treasures, including the Internet Boy Band Database and data about the diversity of makeup shades. The challenge with these datasets is that you’ll need to find something new to say about them, beyond the story that’s already been published.
/r/datasets
This is a subreddit (a category within the discussion forum Reddit) where people can ask for and offer datasets. It contains a very wide variety of data.
Humanities data repositories
Humanities data repositories exist, but none of them is comprehensive, and I’ve had very patchy luck finding anything useful in them.
Alan Liu’s list of datasets
The literature scholar Alan Liu has a very large list of datasets and corpora for the humanities. Some of them may need a bit of finagling before they’re fit for purpose (e.g., they may be in a format other than CSV).
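As one illustration of that kind of finagling, here is a minimal sketch in Python that turns a JSON list of records into CSV text using only the standard library. It assumes the data is a flat list of objects (no nesting); the function name `json_records_to_csv` and the sample records are made up for this example and do not come from any of the datasets above.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text.

    Hypothetical helper for illustration; assumes no nested values.
    """
    records = json.loads(json_text)

    # Collect every key that appears, in first-seen order, so no column is lost
    # when some records have fields that others lack.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty cells
    return buffer.getvalue()


# Made-up sample data, not taken from a real dataset.
sample = '[{"title": "Book A", "year": 1900}, {"title": "Book B", "author": "Someone"}]'
csv_text = json_records_to_csv(sample)
print(csv_text)
```

To save the result for use in a spreadsheet or visualization tool, you could write `csv_text` to a file ending in `.csv`. Datasets with nested structure, or in formats such as XML, will need more work than this.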
Finding the story in your data
Finding the story in your data is as challenging as any technical exercise you’ll do for class. Here are some guides: