Homework 2: Due January 19

This dataset has some problems (at least according to conventional wisdom about data). Please download it and open it in OpenRefine. Then:

  • eliminate leading and trailing whitespace throughout.
  • standardize the county names (the very last column).
  • separate the applicant city and state names (in the column entitled “Applicant City”) into two columns.
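
If you want to sanity-check your OpenRefine result, the same three steps can be roughly sketched in Python. This is only an illustration, not the expected solution: the sample rows, the "County" column name, the new "Applicant State" column name, and the county convention (title case, no "County" suffix) are all assumptions, so adjust them to what you actually find in the dataset.

```python
import csv
import io

# Hypothetical sample rows; the real dataset's columns and values may differ.
SAMPLE = (
    'Applicant City,County\n'
    '"  Los Angeles, CA ",los angeles county \n'
    '"Fresno, CA",  FRESNO\n'
)


def clean_row(row):
    # Step 1: strip leading and trailing whitespace from every cell.
    row = {key.strip(): value.strip() for key, value in row.items()}

    # Step 2: apply one possible county convention (an assumption):
    # title case, with any trailing " County" removed.
    county = row["County"].title()
    if county.endswith(" County"):
        county = county[: -len(" County")]
    row["County"] = county

    # Step 3: split "City, ST" at the first comma into two columns.
    # If there is no comma, the state is left empty.
    city, _, state = row.pop("Applicant City").partition(",")
    row["Applicant City"] = city.strip()
    row["Applicant State"] = state.strip()
    return row


def clean_csv(text):
    reader = csv.DictReader(io.StringIO(text))
    return [clean_row(row) for row in reader]


cleaned = clean_csv(SAMPLE)
for row in cleaned:
    print(row)
```

Real county names are rarely this tidy (misspellings, abbreviations), which is why OpenRefine's clustering is usually the better tool for the second step; a fixed rule like the one above only catches differences in case and suffix.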

Please see our OpenRefine guide if you need a refresher.

Submit the “cleaned” dataset as a CSV under the appropriate assignment on CCLE.

Further reading

On the ontology and history of data

On how to visualize data

