Exploring OpenRefine

Having completed the OpenRefine tutorial, I can see many possibilities for manipulating the nineteenth-century children’s book publishers dataset. Because much of our data centers on the addresses and locations of the publishers and on tracking their movement, we could refine the data to focus only on publishing houses that moved across state lines, rather than those that moved only short distances. We could also choose to focus only on publishing houses and set aside the many other roles included in the data, such as ‘Binding Designer’ and ‘Binder’, for which there is only a small amount of data.

To achieve this, we will have to use the clustering and facet tools to select what we want to focus on. To refine the publishers’ locations, we can use the ‘Split multi-valued cells’ tool to break the addresses down by state and city. Making letter case consistent and trimming leading and trailing whitespace in the columns will also help to break down our more than 5,900 pieces of data into a more manageable dataset.
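To make the cleanup steps concrete, here is a minimal Python sketch of the same idea outside OpenRefine: trim whitespace, make letter case consistent, split an address into city and state, and then keep only one role. The sample rows, the column names (`name`, `role`, `address`), and the "City, ST" address format are all assumptions for illustration; the real dataset may be laid out differently.

```python
# Hypothetical sample rows; names, columns, and address format are invented.
rows = [
    {"name": "  example & co. ", "role": "publisher ", "address": "boston , ma"},
    {"name": "Example & Co.", "role": "Publisher", "address": "New York, NY "},
    {"name": "Sample Bindery", "role": "Binder", "address": "Philadelphia, PA"},
]


def clean(value):
    """Trim leading/trailing whitespace and collapse repeated spaces."""
    return " ".join(value.split())


def split_address(address):
    """Split 'City, ST' on the last comma; assumes the state comes last."""
    if "," not in address:
        return clean(address).title(), ""
    city, _, state = address.rpartition(",")
    return clean(city).title(), clean(state).upper()


def normalize_row(row):
    """Return a row with consistent case and separate city/state fields."""
    city, state = split_address(row["address"])
    return {
        # str.title() is a rough stand-in for case cleanup; it also
        # capitalizes letters that follow an apostrophe.
        "name": clean(row["name"]).title(),
        "role": clean(row["role"]).title(),
        "city": city,
        "state": state,
    }


cleaned = [normalize_row(r) for r in rows]

# Comparable to selecting a single value in a facet on the role column.
publishers = [r for r in cleaned if r["role"] == "Publisher"]
```

After this step the first two rows share the same spelling of the publisher name, which is what lets them be treated as one house later on.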

I would like to refine the data so that we can track the movement of a single publisher: if one publisher has multiple locations, the data should not read it as more than one publisher but should instead show that the house has simply moved. We would then be able to track and map movement, rather than only count how many publishers existed in certain areas.
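As a rough sketch of that goal, the Python below groups records by publisher name so that each house has one ordered list of locations, then flags the houses whose locations span more than one state. The records are invented, and the `year` field in particular is an assumption: I do not know whether the dataset records when a publisher was at each address, and without some date the order of the moves could not be recovered.

```python
from collections import defaultdict

# Hypothetical, already-cleaned records; the "year" column is an assumption.
records = [
    {"name": "Example & Co.", "city": "Boston", "state": "MA", "year": 1850},
    {"name": "Example & Co.", "city": "New York", "state": "NY", "year": 1862},
    {"name": "Sample House", "city": "Boston", "state": "MA", "year": 1848},
    {"name": "Sample House", "city": "Cambridge", "state": "MA", "year": 1855},
]


def group_locations(records):
    """Map each publisher name to its locations, ordered by year."""
    by_publisher = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["year"]):
        by_publisher[rec["name"]].append((rec["year"], rec["city"], rec["state"]))
    return dict(by_publisher)


def moved_across_states(locations):
    """True if the publisher appears in more than one state."""
    return len({state for _, _, state in locations}) > 1


histories = group_locations(records)
interstate = sorted(
    name for name, locs in histories.items() if moved_across_states(locs)
)
```

Here `histories` holds one entry per house rather than one per address, and `interstate` lists only the houses that crossed a state line, which in this invented sample is the single house that went from Massachusetts to New York.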

 
