OpenRefine - CMOA Photography

For this week's blog post, we learned to examine a dataset using OpenRefine. After completing the tutorial, I have a better idea of how to manipulate and organize data.

Our group is exploring the Carnegie Museum of Art's photography dataset. Because the dataset is large, I immediately noticed that OpenRefine displayed only 10 records at a time. I initially thought this was a problem and that it was unable to show my whole dataset. However, I later learned that we are not supposed to work through the data record by record; rather, we need to learn to work on it in batches.

To answer our research question, I think the data cleaning skills we learned will be extremely helpful. Since our dataset is large, it is difficult to check the information one entry at a time. However, facets (the text facet together with clustering) allow us to clean up the data by editing names and merging terms that refer to the same thing. This way, we can keep our dataset consistent.
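To make the idea of clustering and merging more concrete, here is a minimal Python sketch of a key-collision approach, similar in spirit to the "fingerprint" method OpenRefine offers. It is only an approximation written for illustration, not OpenRefine's actual code, and the function names and sample names are made up rather than taken from our dataset.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that spelling variants collide."""
    # Fold accented characters to plain ASCII where possible.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and strip punctuation.
    cleaned = ascii_text.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate the words so word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint; keep groups with 2+ variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical sample values, not real records from the museum data.
names = ["Smith, John", "John Smith", "john smith.", "Jane Doe"]
clusters = cluster(names)
print(clusters)
```

Each cluster is a set of spellings that probably refer to the same person. In OpenRefine we would then pick one spelling and merge the rest into it, after checking that the variants really are the same artist.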

Unsure of some of the techniques OpenRefine can perform, I went ahead and explored some of the other settings, including timeline and scatterplot facets. I think these will be very useful when we need to look at where the artists in our dataset died and at the relationship between their deaths and time. I am still unsure how to efficiently delete columns that are less useful or not needed. I believe one of my classmates also mentioned in their blog that it would be useful to learn how to delete columns and information all at once. I also wish to explore this setting a little deeper.
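Outside of OpenRefine, removing several columns in one pass can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the data has been exported as CSV; the column names (`artist`, `death_place`, `death_year`, `notes`, `internal_id`) and the helper `drop_columns` are hypothetical and not taken from our actual dataset.

```python
import csv
import io


def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted columns."""
    unwanted = set(unwanted)
    return [
        {key: value for key, value in row.items() if key not in unwanted}
        for row in rows
    ]


# Hypothetical sample standing in for an exported CSV file.
SAMPLE = (
    "artist,death_place,death_year,notes,internal_id\n"
    "John Smith,Pittsburgh,1950,n/a,17\n"
    "Jane Doe,Paris,1962,,42\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
slim = drop_columns(rows, ["notes", "internal_id"])

for row in slim:
    print(row)
```

The point is that the unwanted columns are named once and removed from every record together, instead of being deleted one at a time.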
