Milestone A: Identify a Dataset


Due: In class, Tuesday, April 28

If you’re going to tell a story with data, you need…data! For this milestone, you’ll identify and obtain the dataset you plan to use in your group’s data storytelling project.

What to choose

Topic

You should find a dataset related to an issue of social justice you care about and want to explore in depth. Of course, seldom do we find a freely available Golden Dataset that embodies everything we want to know. So it’s likely that your process of selection will be a dialogue between what you want to do and what’s actually practical.

Size and format

I do not have hard-and-fast guidelines about the size or format of the dataset you use, but I do have suggestions. You’ll want the dataset to be large enough that it makes sense to work with it programmatically (as opposed to by hand). The number of records (rows) can vary, but it’s especially good if your dataset has a lot of attributes (columns), because that way you can see how different properties interact with each other.

If you’re comfortable working with code, your dataset can be in any format. However, if you’re still figuring things out, it’s often helpful if your dataset is formatted as (or converted to) a CSV (comma-separated values) file, a plain-text table that any spreadsheet program can open. That way, you can open it in Excel and perform simple analyses. Plus, almost all visualization software can accept a CSV.
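If your data arrives as JSON (a common format for APIs), converting it to a CSV takes only a few lines of code. Here is a minimal sketch using Python’s built-in `json` and `csv` modules. It assumes the JSON file holds a list of flat records, one dictionary per row; the function name and file names are illustrative, not from any particular tool.

```python
import csv
import json


def json_records_to_csv(json_path, csv_path):
    """Convert a JSON file containing a list of flat records (dicts) into a CSV.

    Assumption: the top level of the JSON file is a list of dictionaries.
    Nested values (lists, dicts) are written as their plain string form.
    Returns the number of records written.
    """
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected the JSON file to contain a list of records")

    # Collect every attribute that appears in any record, in first-seen order,
    # so records that lack an attribute still line up under the right column.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        # Missing attributes become empty cells (DictWriter's default restval is "").
        writer.writerows(records)
    return len(records)
```

For example, `json_records_to_csv("records.json", "records.csv")` (hypothetical file names) would produce a file you can open directly in Excel.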

APIs and other retrieval techniques

By April 28, you should have the data in hand, not just have identified a source for it.

If you’ve identified an API or some other kind of endpoint from which you want to retrieve data, I’d like you to actually retrieve the data for this milestone. That is, you should have it saved somewhere as a file. That’s because working with a new API is often harder than people think, and you don’t want to be surprised down the road. It’s OK if the dataset changes later; I just want to make sure you know how to obtain it.
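If you haven’t pulled data from an API before, the simplest version of this step is a script that requests a URL and saves the response to disk exactly as received. The sketch below uses only Python’s standard library. The function name `download_to_file`, the User-Agent string, and the example URL are placeholders, not part of any real API. Actual APIs often add wrinkles such as API keys, pagination, or rate limits, which is exactly why it’s worth trying this early.

```python
from pathlib import Path
from urllib.request import Request, urlopen


def download_to_file(url, out_path, timeout=30):
    """Fetch `url` and save the raw response body to `out_path`.

    Hypothetical helper: it keeps an untouched copy of whatever the server
    returns (JSON, CSV, etc.) before you do any cleaning.
    Returns the number of bytes written.
    """
    # Some servers reject requests without a User-Agent; this value is a placeholder.
    request = Request(url, headers={"User-Agent": "course-data-project/0.1"})
    # For HTTP(S) URLs, urlopen raises urllib.error.HTTPError on 4xx/5xx
    # responses, so a failed request won't silently save an error page.
    with urlopen(request, timeout=timeout) as response:
        body = response.read()
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create e.g. a data/ folder if needed
    out.write_bytes(body)
    return len(body)
```

A call might look like `download_to_file("https://example.com/api/records.json", "data/records.json")`, where the URL stands in for whatever endpoint you found. Saving the raw response first, rather than cleaning it on the fly, means you can always go back to exactly what the source gave you.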

Similarly, if you’ve found a resource (e.g., a table in a book or a library database) from which the data you want can be scraped, transcribed, or obtained in some other way, do that work before the milestone is due. Again, that’s just so you aren’t surprised by unforeseen complications.

Where to look

This is where you’ll put on your Sherlock Holmes hat! As you might be aware, there’s no one repository where you can reliably find the data you want, so you’ll probably have to do a fair amount of Googling.

Here are some suggestions.

  • For data related to L.A., try the city or county data portals. (For geospatial data, try the L.A. City Geohub.) There’s also a state data portal for California. L.A. City Controller Kenneth Mejia also has a data catalog, and to be honest, I’m not sure if that data is also in the city portal or if it’s separate or what. The L.A. Data Justice Hub (here at UCLA) has gathered many useful datasets related to issues of social justice.
  • The Carleton College Library has a very helpful guide to locating data. One of their suggestions is to locate an academic paper on a topic of interest to you. In many disciplines, scholars publish both an essay and the data they used to construct their argument so that other scholars can replicate what they’ve found.
  • Speaking of libraries: Did you know that the UCLA Library has a Data Science Center? Their consultants can work with you to locate data, analyze it—any step of the process! It’s easy to get in touch with them and they’re very helpful.
  • One of my very favorite places to look for unusual or interesting datasets is the weekly newsletter Data is Plural. (If you don’t want to read through the newsletters, you can view the datasets on a spreadsheet.)
  • I collect data here, although many datasets are more relevant to historical or literary analysis than social justice issues.
  • If you’re interested in analyzing data related to immigration enforcement, Project Salt Box has a “crash course” on how to get started. The Deportation Data Project (partly based here at UCLA!) regularly releases relevant data. They’ve also published a guide to sources. Finally, Austin Kocher has been regularly analyzing immigration-enforcement data and he has a useful guide to working with it.
  • If you’re interested in criminal justice, the University of Cincinnati has published a useful guide to datasets.