{"id":573,"date":"2026-04-01T19:27:47","date_gmt":"2026-04-01T19:27:47","guid":{"rendered":"https:\/\/miriamposner.com\/classes\/dh150s26\/?page_id=573"},"modified":"2026-04-01T19:27:49","modified_gmt":"2026-04-01T19:27:49","slug":"milestone-a-identify-a-dataset","status":"publish","type":"page","link":"https:\/\/miriamposner.com\/classes\/dh150s26\/assignments\/data-storytelling-project\/milestone-a-identify-a-dataset\/","title":{"rendered":"Milestone A: Identify a Dataset"},"content":{"rendered":"\n<p><strong>Due: In class, Tuesday, April 28<\/strong><\/p>\n\n\n\n<p>If you&#8217;re going to tell a story with data, you need&#8230;data! For this milestone, you&#8217;ll identify and obtain the dataset you plan to use in your group&#8217;s data storytelling project.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What to choose<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Topic<\/h3>\n\n\n\n<p>You should find a dataset related to an issue of social justice you care about and want to explore in depth. Of course, seldom do we find a freely available Golden Dataset that embodies everything we want to know. So it&#8217;s likely that your process of selection will be a dialogue between what you <em>want<\/em> to do and what&#8217;s actually <em>practical<\/em>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Size and format<\/h3>\n\n\n\n<p>I do not have hard-and-fast guidelines about the size or format of the dataset you use, but I do have suggestions. You&#8217;ll want the dataset to be large enough that it makes sense to work with it programmatically (as opposed to by hand). The number of records can vary, but it&#8217;s especially good if your dataset has a lot of <em>attributes<\/em>, because that way you can see how different properties interact with each other.<\/p>\n\n\n\n<p>If you&#8217;re comfortable working with code, your dataset can be in any format. 
However, if you&#8217;re still figuring things out, it&#8217;s often helpful if your dataset is formatted as (or converted to) a CSV (a plain-text table that opens as a spreadsheet). That way, you can open it in Excel and perform simple analyses. Plus, almost all visualization software can accept a CSV.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">APIs and other retrieval techniques<\/h3>\n\n\n\n<p>By April 28, you should have the data in hand, not just have identified a source for it.<\/p>\n\n\n\n<p>If you&#8217;ve identified an API or some other kind of endpoint from which you want to retrieve data, I&#8217;d like you to <em>actually<\/em> retrieve the data for this milestone. That is, you should have it saved somewhere as a file. That&#8217;s because it&#8217;s often harder than people think to work with a new API, and you don&#8217;t want to be surprised down the road. It&#8217;s OK if the dataset changes later; I just want to make sure you know how to obtain it.<\/p>\n\n\n\n<p>Similarly, if you&#8217;ve found a resource (e.g., a table in a book or a library database) from which the data you want can be scraped, transcribed, or obtained in some other way, do that work <em>before<\/em> the milestone is due. Again, that&#8217;s just so you aren&#8217;t surprised by unforeseen complications.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Where to look<\/h2>\n\n\n\n<p>This is where you&#8217;ll put on your Sherlock Holmes hat! As you might be aware, there&#8217;s no one repository where you can reliably find the data you want, so you&#8217;ll probably have to do a fair amount of Googling.<\/p>\n\n\n\n<p>Here are some suggestions.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For <strong>data related to L.A.<\/strong>, try the <a href=\"https:\/\/data.lacity.org\/\">city<\/a> or <a href=\"https:\/\/data.lacounty.gov\/\">county<\/a> data portals. (For geospatial data, try the <a href=\"https:\/\/geohub.lacity.org\/\">L.A. City Geohub<\/a>.) 
There&#8217;s also a <a href=\"https:\/\/data.ca.gov\/\">state data portal<\/a> for California. L.A. City Controller Kenneth Mejia also has a <a href=\"https:\/\/controller.lacity.gov\/data\">data catalog<\/a>, and to be honest, I&#8217;m not sure if that data is also in the city portal or if it&#8217;s separate or what. The <a href=\"https:\/\/communityengagement.ucla.edu\/programs\/los-angeles-data-justice-hub\/\">L.A. Data Justice Hub<\/a> (here at UCLA) has gathered many useful datasets related to issues of social justice.<\/li>\n\n\n\n<li>The Carleton College Library has a very helpful<strong> <a href=\"https:\/\/gouldguides.carleton.edu\/c.php?g=146834&amp;p=964746\">guide<\/a> to locating data<\/strong>. One of their suggestions is to locate an academic paper on a topic of interest to you. In many disciplines, scholars publish both an essay and the data they used to construct their argument so that other scholars can replicate what they&#8217;ve found.<\/li>\n\n\n\n<li>Speaking of libraries: Did you know that the<strong> <a href=\"https:\/\/www.library.ucla.edu\/help\/services-resources\/data-services\">UCLA Library has a Data Science Center<\/a><\/strong>? Their consultants can work with you to locate data, analyze it\u2014any step of the process! It&#8217;s easy to get in touch with them and they&#8217;re very helpful.<\/li>\n\n\n\n<li>One of my very <strong>favorite places<\/strong> to look for unusual or interesting datasets is the weekly newsletter <a href=\"https:\/\/www.data-is-plural.com\/\">Data is Plural<\/a>. 
(If you don&#8217;t want to read through the newsletters, you can view the datasets on a <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1wZhPLMCHKJvwOkP4juclhjFgqIY8fQFMemwKL2c64vk\/edit?gid=0#gid=0\">spreadsheet<\/a>.)<\/li>\n\n\n\n<li><strong>I collect data <a href=\"https:\/\/pinboard.in\/u:miriamposner\/t:humdata\">here<\/a>,<\/strong> although many datasets are more relevant to historical or literary analysis than social justice issues.<\/li>\n\n\n\n<li>If you&#8217;re interested in analyzing data related to <strong>immigration enforcement<\/strong>, Project Salt Box has a <a href=\"https:\/\/www.projectsaltbox.com\/p\/using-data-for-good-a-crash-course\">&#8220;crash course&#8221;<\/a> on how to get started. <a href=\"https:\/\/deportationdata.org\/index.html\">The Deportation Data Project<\/a> (partly based here at UCLA!) regularly releases relevant data. They&#8217;ve also published a <a href=\"https:\/\/www.californialawreview.org\/online\/immigration-enforcement-guide\">guide<\/a> to sources. Finally, <a href=\"https:\/\/austinkocher.substack.com\/\">Austin Kocher<\/a> has been regularly analyzing immigration-enforcement data and he has a useful <a href=\"https:\/\/austinkocher.substack.com\/p\/immigration-data-literacy-skills\">guide<\/a> to working with it.<\/li>\n\n\n\n<li>If you&#8217;re interested in <strong>criminal justice<\/strong>, the University of Cincinnati has published a useful <a href=\"https:\/\/guides.libraries.uc.edu\/c.php?g=222253&amp;p=1471273\">guide<\/a> to datasets.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Due: In class, Tuesday, April 28 If you&#8217;re going to tell a story with data, you need&#8230;data! 
For this milestone, you&#8217;ll identify and obtain the dataset you plan to use [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":491,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_eb_attr":"","_EventAllDay":false,"_EventTimezone":"","_EventStartDate":"","_EventEndDate":"","_EventStartDateUTC":"","_EventEndDateUTC":"","_EventShowMap":false,"_EventShowMapLink":false,"_EventURL":"","_EventCost":"","_EventCostDescription":"","_EventCurrencySymbol":"","_EventCurrencyCode":"","_EventCurrencyPosition":"","_EventDateTimeSeparator":"","_EventTimeRangeSeparator":"","_EventOrganizerID":[],"_EventVenueID":[],"_OrganizerEmail":"","_OrganizerPhone":"","_OrganizerWebsite":"","_VenueAddress":"","_VenueCity":"","_VenueCountry":"","_VenueProvince":"","_VenueState":"","_VenueZip":"","_VenuePhone":"","_VenueURL":"","_VenueStateProvince":"","_VenueLat":"","_VenueLng":"","_VenueShowMap":false,"_VenueShowMapLink":false,"footnotes":""},"class_list":["post-573","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/miriamposner.com\/classes\/dh150s26\/wp-json\/wp\/v2\/pages\/573","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/miriamposner.com\/classes\/dh150s26\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/miriamposner.com\/classes\/dh150s26\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/miriamposner.com\/classes\/dh150s26\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/miriamposner.com\/classes\/dh150s26\/wp-json\/wp\/v2\/comments?post=573"}],"version-history":[{"count":1,"href":"https:\/\/miriamposner.com\/classes\/dh150s26\/wp-json\/wp\/v2\/pages\/573\/revisions"}],"predecessor-version":[{"id":574,"href":"https:\/\/miriamposner.com\/classes\/dh150s26\/wp-json\/wp\/v2\/pages\/573\/revisions\/574"}],"up":[{"embeddable":true,"href":"https:\/\/miriamposner.com\/classes\/dh150s26\/wp-json\/wp\/v2\/pages\/4
91"}],"wp:attachment":[{"href":"https:\/\/miriamposner.com\/classes\/dh150s26\/wp-json\/wp\/v2\/media?parent=573"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}