OpenRefine: Contemporary Art at the Carnegie Museum of Art

How might you manipulate the data to be more useful in answering your research questions?

Our dataset contains empty or irrelevant columns and rows that can be eliminated. For example, given the research question our group is leaning towards, the “credit line” column may not be useful, so it will likely be cut from the dataset altogether to save space. For the most part, manipulating our dataset will mean eliminating data and grouping similar or closely related entries.
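The elimination step described above, removing a column and any rows left empty, can also be sketched outside OpenRefine. The following is a minimal sketch using Python's standard `csv` module. It assumes the export has a header row and that the column is literally named `credit line`. That header name and the sample rows below are hypothetical, not taken from the actual Carnegie Museum of Art export.

```python
import csv
import io


def clean_csv(text: str, drop: set) -> str:
    """Drop the named columns, then drop rows whose remaining cells are all blank.

    `drop` holds header names; the names used below are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(text))
    kept_fields = [f for f in reader.fieldnames if f not in drop]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept_fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Skip rows that are empty once the dropped columns are ignored.
        if all(not (row.get(f) or "").strip() for f in kept_fields):
            continue
        writer.writerow({f: row[f] for f in kept_fields})
    return out.getvalue()


# Made-up sample data; "credit line" is an assumed header name.
sample = (
    "title,medium,credit line\n"
    "Untitled,16mm to digital format,Example Fund\n"
    ",,\n"
    "Dusk,oil on canvas,\n"
)
print(clean_csv(sample, {"credit line"}))
```

In OpenRefine itself, the same result comes from removing the column through the column menu and then faceting on blank cells to delete the matching rows. The script simply makes those two steps explicit.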

What OpenRefine operations will you need to perform in order to do so?

The facet operation will definitely be useful for “cleaning” our dataset. Working through the tutorial, I was already able to combine multiple art entries and tidy up the dataset. For instance, I arbitrarily chose the “medium” column to practice with the facet tool and noticed that “16mm to digital format” appeared many times. Because each entry was written slightly differently, the facet listed 25 separate “16mm to digital format” entries instead of a single grouping such as “16mm to digital format (25)”. I merged them into one grouping. Likewise, editing specific entries (for example, adding or correcting characters) will let the group refine the dataset to better suit our research question.
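Merging slightly different spellings of the same medium, as described above, is what OpenRefine's key-collision clustering does. The sketch below approximates its "fingerprint" keying method in plain Python. Each value is trimmed, lowercased, folded to ASCII, and stripped of punctuation, and its unique words are sorted. Values that produce the same key are then replaced by their most common spelling. This is an illustration of the technique, not OpenRefine's own code, and the variant spellings in the example are made up.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Approximate OpenRefine's fingerprint key for a cell value."""
    s = value.strip().lower()
    # Fold accented characters to plain ASCII.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, then sort the unique whitespace-separated tokens.
    s = re.sub(r"[^\w\s]", "", s)
    return " ".join(sorted(set(s.split())))


def merge_variants(values: list) -> list:
    """Replace each value with the most common spelling sharing its fingerprint."""
    groups = defaultdict(Counter)
    for v in values:
        groups[fingerprint(v)][v] += 1
    # most_common breaks ties by first occurrence, so the result is deterministic.
    canonical = {key: counts.most_common(1)[0][0] for key, counts in groups.items()}
    return [canonical[fingerprint(v)] for v in values]


media = [
    "16mm to digital format",
    "16mm to Digital Format",
    "16mm to digital format",
    "16mm to digital format.",
    "Video",
    "video ",
]
print(Counter(merge_variants(media)))
```

Running this on the hypothetical list collapses the four "16mm" variants into one grouping, just as the facet did. Note that a key this strict will not catch typos such as "16mn". OpenRefine's nearest-neighbor clustering methods exist for those cases.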

What would you like to be able to do to your data that you’re not sure how to do?

At the moment I cannot think of any refinement that I would not know how to perform after working through the OpenRefine tutorial. However, I am sure that as my group’s research question becomes more specific, I will need to learn new methods of refining the data to answer it.
