On data-cleaning in general, see the School of Data’s “Introduction to Data-Cleaning” for a helpful overview.

Our tool of choice is OpenRefine, which is installed on the lab computers and is also a free download.

OpenRefine Tutorials

“Introduction to OpenRefine,” developed by Owen Stephens on behalf of the British Library

“Cleaning Data with OpenRefine,” by Seth van Hooland, Ruben Verborgh and Max De Wilde

Verborgh, Ruben, and Max De Wilde. Using OpenRefine: The Essential OpenRefine Guide That Takes You From Data Analysis and Error Fixing to Linking Your Dataset to the Web. Birmingham, UK: Packt Publishing, 2013. (To get to the book, click on “EBSCO eBooks.”)

I like OpenRefine, but if you prefer, you can use Excel to do some data-cleaning. Lynda offers several video tutorials on using Excel to clean data. Search for “Cleaning Up Your Excel 2013 Data.”

Got the data bug? Want to try some programming? I recommend Automate the Boring Stuff with Python and “Cleaning OCR’d Text with Regular Expressions.”
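
For a small taste of what cleaning text with regular expressions looks like in Python, here is a minimal sketch. It is not taken from either resource above; the function name clean_ocr_line and the sample string are made up for illustration, and the hyphen rule is a simplifying assumption (it would also wrongly join a genuine hyphen followed by a space, as in "pre- and post-war").

```python
import re


def clean_ocr_line(line: str) -> str:
    """Tidy one line of OCR'd text (hypothetical helper, for illustration only)."""
    # Rejoin words split by a hyphen plus whitespace: "clean- ing" -> "cleaning".
    # Assumption: every such hyphen is a line-break artifact, which is not always true.
    line = re.sub(r"(\w)-\s+(\w)", r"\1\2", line)
    # Collapse runs of spaces and tabs into a single space.
    line = re.sub(r"[ \t]+", " ", line)
    # Drop leading and trailing whitespace.
    return line.strip()


# Made-up sample with stray spaces and broken words.
sample = "  Data   clean- ing  with   regular  expres- sions "
print(clean_ocr_line(sample))
```

Real OCR output usually needs rules tailored to the particular document, so treat this as a starting point and check the results by eye.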