Working with OpenRefine
This week we had an opportunity to test out the OpenRefine software and learn about various methods for cleaning up and rearranging data. We started by creating “facets,” which group the values in an individual column so that we could filter and edit one subset of records at a time. Through a facet, we were able to remove extra whitespace, merge and re-cluster overlapping values, split data into separate columns, and add characters to certain records.
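To get a feel for what the whitespace cleanup and the “merge and re-cluster” step are doing, here is a rough Python sketch. It is only an approximation of a fingerprint-style key (trim, lowercase, drop punctuation, sort the words), not OpenRefine’s actual code, and the function names and sample values are made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a rough 'fingerprint' key so near-duplicate values collide.

    This approximates the idea behind key-based clustering; it is an
    assumption-laden sketch, not OpenRefine's implementation.
    """
    text = value.strip().lower()                     # trim extra whitespace, ignore case
    text = unicodedata.normalize("NFKD", text)       # split accented letters into base + mark
    text = text.encode("ascii", "ignore").decode("ascii")  # drop the accent marks
    text = re.sub(r"[^\w\s]", "", text)              # remove punctuation
    tokens = sorted(set(text.split()))               # unique words, in a stable order
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep only groups with variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical sample values, not taken from the real dataset.
samples = ["Oil on canvas", " oil on canvas ", "oil on Canvas.", "watercolor on paper"]
clusters = cluster(samples)
```

Each group in `clusters` is a set of spellings that probably mean the same thing, which is the point where a person decides which spelling to keep.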

Our Data: Carnegie Museum of Contemporary Art
Group 15 is responsible for the Carnegie Museum of Contemporary Art dataset. I think our group will find it most helpful to separate some of our data in the “medium” section into multiple medium columns. Some pieces of art have multiple mediums, such as “watercolor and pencil on paper” versus “watercolor on paper.” This is also true for the classification of different pieces, as some are classified into more than one type of art. Using the “merge and re-cluster” functions will also be especially helpful for this.
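As a rough idea of what splitting the “medium” column could look like, here is a small Python sketch. It assumes that the word “on” separates the materials from the surface they are applied to, and that materials are joined by “and” or commas. That will not hold for every record, and the column names (`medium_1`, `support`, and so on) are hypothetical, not names from our dataset.

```python
import re


def split_medium(medium: str):
    """Split a medium string into (materials, support).

    Assumes the pattern '<materials> on <support>'; anything that does not
    fit that pattern is returned as materials with no support.
    """
    text = medium.strip().lower()
    materials_part, sep, support = text.partition(" on ")
    # Materials may be joined by commas and/or the word "and".
    pieces = re.split(r",|\band\b", materials_part)
    materials = [piece.strip() for piece in pieces if piece.strip()]
    return materials, (support.strip() if sep else None)


def to_columns(medium: str, max_materials: int = 3) -> dict:
    """Spread the materials across a fixed number of hypothetical columns."""
    materials, support = split_medium(medium)
    padded = (materials + [None] * max_materials)[:max_materials]
    row = {f"medium_{i + 1}": material for i, material in enumerate(padded)}
    row["support"] = support
    return row


# The two examples from the paragraph above.
row_a = to_columns("watercolor and pencil on paper")
row_b = to_columns("watercolor on paper")
```

With the values split out this way, a facet on the first medium column would count both pieces under “watercolor” instead of treating them as two unrelated values.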
Currently, we’re not able to load many of the images of the artwork, so it would be helpful to see if there is a way to manipulate the data to either view the actual artwork or analyze information about the art without actually seeing it. For the most part, however, our data is fairly clean already. We are currently on a mission to fill in the holes within the data, and OpenRefine will really help us to better sort through and understand what we have.
Completely agree, we lucked out with a good dataset!
In my dataset, we also have the issue where some of the images of the artwork are not available or won’t load. If we figure out a way to manipulate the data so that the images are more accessible/reliable to open, we will let you know! The software is pretty DOPE. Great work!
I really liked your title and how enthusiastic you are about using the program. I hope sorting through your dataset goes smoothly!
I love the title of this blog post. Your group is really lucky to have received the contemporary art dataset at CMOA!!! The museum currently has a show from Ian Cheng up that is kind of relevant to our class in the sense that it’s new media; it’s definitely worth a glance.
I really loved your enthusiasm about using OpenRefine! I feel the same way about the program and its use on my group project too. It is such a great tool for organizing data by category. It makes the dataset much more manageable, because once you organize it in a specific way, you can see at a glance all of the rows that fall under that category.