Using OpenRefine

For this week’s post, we used OpenRefine to clean up data on shipwrecks off the New Jersey coast. It was immediately clear that OpenRefine would be very useful for my group’s dataset on the Carnegie Museum of Art’s Contemporary Art Collection. One thing I would like to do is remove the columns that exist mainly for administrative purposes. These columns hold information for the Carnegie’s own internal referencing system, but they are not as useful for our narrative. By removing them, we would be able to tell a clearer story about the art pieces and their creators.
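To make the column-removal idea concrete, here is a minimal sketch in Python of the same operation done outside OpenRefine. The records and column names (`title`, `artist`, `accession_id`) are made up for illustration and are not taken from the actual Carnegie dataset.

```python
# Hypothetical records; the column names are illustrative only.
records = [
    {"title": "Untitled", "artist": "Jane Doe", "accession_id": "A-001"},
    {"title": "Blue Study", "artist": "John Roe", "accession_id": "A-002"},
]

# Assumed name for an internal-reference (administrative) column.
ADMIN_COLUMNS = {"accession_id"}


def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted columns."""
    return [
        {key: value for key, value in row.items() if key not in unwanted}
        for row in rows
    ]


narrative_records = drop_columns(records, ADMIN_COLUMNS)
print(narrative_records)
```

The function builds new dictionaries rather than editing the originals, so the full records stay available if an administrative field turns out to matter later.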

Some of the techniques I would need to use are trimming, merging, and clustering. These techniques would help clean our data, making it easier to work with. As for things I would like to do but don’t know how, I think it would be useful to know how to remove any record that has a missing value in one of its fields. That way, we could be sure that all the data we use is complete, which would also make our analysis easier. It would also be helpful to be able to see each of the artworks, but unfortunately images are not included in the dataset.
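As a rough sketch of what trimming and dropping incomplete records could look like, here is a small Python example. The sample rows and field names are hypothetical, and it assumes that "incomplete" means a field that is empty or missing a value. Within OpenRefine itself, a blank facet on a column combined with removing the matching rows should give a similar result.

```python
def trim_values(row):
    """Strip leading and trailing whitespace from every string value."""
    return {
        key: value.strip() if isinstance(value, str) else value
        for key, value in row.items()
    }


def is_complete(row):
    """A row counts as complete when no value is None or an empty string."""
    return all(value not in (None, "") for value in row.values())


def clean(rows):
    """Trim first, so whitespace-only cells are treated as missing."""
    trimmed = [trim_values(row) for row in rows]
    return [row for row in trimmed if is_complete(row)]


# Hypothetical sample; field names are assumptions for illustration.
sample = [
    {"title": "  Untitled ", "artist": "Jane Doe", "medium": "oil"},
    {"title": "Blue Study", "artist": "   ", "medium": "ink"},
    {"title": "Harbor", "artist": "John Roe", "medium": None},
]

cleaned = clean(sample)
print(cleaned)
```

Trimming before checking for completeness matters here: a cell containing only spaces looks filled in until the whitespace is removed.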

One comment

  1. I really liked how concise your post was. It’s good that you listed both the pros and cons of the program regarding your specific dataset; I hope you find a way to work around the drawbacks!
