My data set documents probate records, specifically inventories of household objects, from families living in York County, Pennsylvania, in the 18th century. At first glance, the data set makes little sense. The columns have no obvious relation to one another, and the archaic 18th-century spelling and vocabulary don’t help with comprehension either. However, OpenRefine can at least help clean up the messy data and make it easier to understand.
Although my group doesn’t know what each column means yet, we know that our research questions focus on the gender roles associated with household objects. So, we can narrow our data by deleting anything that is unrelated to gender. Fixing inconsistent capitalization with facets, trimming extra whitespace, and merging clusters of similar values can also help clean our data.
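To make the whitespace, capitalization, and clustering steps more concrete, here is a rough Python approximation of OpenRefine's default clustering method, key collision with the "fingerprint" keying function. It is a sketch, not OpenRefine's actual code. Each value is trimmed, lowercased, stripped of accents and punctuation, and reduced to its sorted set of words, so spellings that differ only in those respects share a key. The `merge` helper is a hypothetical addition that picks the most frequent spelling in each cluster; in OpenRefine you choose the merged value yourself. The item names are invented for illustration.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Approximate OpenRefine's 'fingerprint' keying function."""
    value = value.strip().lower()
    # Decompose accented letters and drop anything that is not plain ASCII.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation: keep only word characters and whitespace.
    value = re.sub(r"[^\w\s]", "", value)
    # Sort the unique tokens so word order no longer matters.
    return " ".join(sorted(set(value.split())))


def clusters(values):
    """Group values by fingerprint, keeping only keys with 2+ distinct spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


def merge(values):
    """Hypothetical helper: replace each value with the most common trimmed spelling in its cluster."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v.strip())
    canonical = {key: Counter(vs).most_common(1)[0][0] for key, vs in groups.items()}
    return [canonical[fingerprint(v)] for v in values]


# Invented example values, not taken from the York County data.
items = ["Feather Bed", "Feather Bed", "feather bed ", "Bed, Feather", "Pewter Dish"]
print(clusters(items))
print(merge(items))
```

Because the key ignores case, surrounding whitespace, punctuation, and word order, "feather bed " and "Bed, Feather" land in the same cluster as "Feather Bed", while "Pewter Dish" stays on its own.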
One column in our data set lists the household items in each inventory. However, the items differ from row to row and are not separated by commas. I would like to split this column so that each item sits in its own cell or column, but because each object’s name contains a different number of words, there is no simple rule for where one item ends and the next begins. Grouping the item names by gender would also help answer our research questions, but unfortunately OpenRefine cannot do this: its clustering groups values that are spelled similarly, not values that share a meaning or a social category.
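One possible workaround for the splitting problem, sketched in Python, assumes that we can write down a list of the item names that appear in the inventories. The idea is to scan each cell word by word and always take the longest run of words that matches a known item. The vocabulary and the sample cell below are hypothetical, and `split_items` is an illustrative helper rather than an OpenRefine feature. Words that match nothing are kept on their own so they can be reviewed by hand.

```python
def split_items(cell, vocabulary):
    """Greedy longest-match split of a cell into known multi-word item names.

    `vocabulary` is a hypothetical set of lowercase item names.
    """
    words = cell.split()
    # The longest item name, in words, bounds how far ahead we need to look.
    longest = max((len(name.split()) for name in vocabulary), default=1)
    items = []
    i = 0
    while i < len(words):
        # Try the longest possible phrase first, then shorter ones.
        for n in range(min(longest, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase.lower() in vocabulary:
                items.append(phrase)
                i += n
                break
        else:
            # No known item starts here; keep the word alone for manual review.
            items.append(words[i])
            i += 1
    return items


# Hypothetical vocabulary and cell, invented for illustration.
VOCABULARY = {"feather bed", "iron pot", "spinning wheel", "chest", "pewter dish"}
cell = "Feather Bed iron pot Spinning Wheel old chest"
print(", ".join(split_items(cell, VOCABULARY)))
```

Joining the result with commas produces a cell that OpenRefine's "Split multi-valued cells" or "Split into several columns" commands could then divide. The quality of the split depends entirely on how complete the vocabulary is, which is why unmatched words such as "old" are left visible instead of being silently merged into a neighboring item.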