Exploring OpenRefine!

Hello!

This week we are going to talk about a dataset of prisoners held in Eastern State Penitentiary between 1830 and 1839, and about the data program “OpenRefine”. We received two datasets of about 500 prisoner records each. Each record is divided into the following fields: first name, last name, age, ethnicity/religion/occupation, prisoner number, admission date, sentencing location, offence, sentencing, number of convictions, column note, discharge note and description.

OpenRefine is a program that can help you clean up data. At first glance the program looks very complicated, but once you figure it out, it can be very helpful. It can easily clean up errors such as misspellings and unnecessary spaces. It can also split a column into several categories or merge different categories, and it can help you sort the data by, for example, age, offence or sentencing. By manipulating the data this way, we can compare the different categories more easily, which helps us answer research questions. I would like to be able to split columns, so that the “ethnicity/religion/occupation” column becomes three separate categories and we can more easily see how ethnicity in particular correlates with the other categories.
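
To make the idea of trimming spaces and splitting a combined column more concrete, here is a small sketch in Python. It is not how OpenRefine does it internally, just an illustration of the same two steps. The sample records are made up, the field names are hypothetical, and I am assuming the three values are separated by a semicolon, which may not match the real dataset.

```python
import re


def clean_text(value):
    """Trim the ends and collapse runs of whitespace into a single space."""
    return re.sub(r"\s+", " ", value).strip()


def split_combined(value, delimiter=";"):
    """Split a combined cell into exactly three cleaned parts.

    The delimiter is an assumption. Missing parts become empty strings.
    """
    parts = [clean_text(part) for part in value.split(delimiter, 2)]
    parts += [""] * (3 - len(parts))
    return parts


# Made-up example records, not taken from the actual dataset.
records = [
    {"last_name": "  Smith ", "combined": "Irish;  Catholic ; Weaver"},
    {"last_name": "Jones", "combined": "English;;Laborer"},
]

for record in records:
    record["last_name"] = clean_text(record["last_name"])
    ethnicity, religion, occupation = split_combined(record.pop("combined"))
    record.update(ethnicity=ethnicity, religion=religion, occupation=occupation)

for record in records:
    print(record)
```

Note that the split is purely positional: it only works if every cell lists the three values in the same order and leaves an empty slot for anything missing. If a record simply skips a value, the remaining ones end up in the wrong category and would need to be checked by hand.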
