OpenRefine & Data Manipulation

Data manipulation helps illuminate conclusions that may be hidden from an initial overview of a dataset. For datasets with thousands (or even millions) of points, however, this is easier said than done. Tools like OpenRefine streamline mass data manipulation, making it easier to draw conclusions from a dataset.

This works much like Excel, except that instead of working with individual cells on a sheet, we work with whole batches of data that share a column header. That is a good example of why it is important to organize data sensibly and consistently. To help with this, OpenRefine provides facets, which group together records that share a value in a given column.

Facets make possible linking and manipulation that would take hours by hand. For my group’s dataset on graphic novels, they could be used to aggregate descriptive details about each novel, as well as more concrete information like when and where it was produced.

To do this, we can create clusters of related data, merge and split clusters, and transform data points so that they match one another. Overall, this will help my group paint a clearer picture of the data we want to present and tailor our project to display exactly what we want.
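To make the clustering idea a bit more concrete, here is a rough Python sketch of how near-duplicate values can be found and merged. It is loosely modeled on the "fingerprint" key-collision approach that OpenRefine documents, but it is my own simplified approximation and not OpenRefine's actual code. The function names and the sample publisher list are made up for illustration and are not taken from my group's dataset.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Build a normalized key so near-identical spellings collide."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate the words so word order does not matter.
    return " ".join(sorted(set(text.split())))


def find_clusters(values):
    """Return groups of distinct spellings that share a fingerprint."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


def merge_clusters(values):
    """Rewrite every clustered value to its cluster's most common spelling."""
    replacement = {}
    for group in find_clusters(values):
        canonical = Counter(group).most_common(1)[0][0]
        for value in group:
            replacement[value] = canonical
    return [replacement.get(value, value) for value in values]


# Hypothetical sample column, not taken from the actual dataset.
publishers = [
    "Dark Horse Comics",
    "Dark Horse Comics",
    "dark horse comics ",
    "Comics, Dark Horse",
    "Image Comics",
]
merged = merge_clusters(publishers)
```

In this sketch the three spellings of the first publisher end up in one cluster and are rewritten to the most frequent one, while the unrelated entry is left alone. In OpenRefine itself the merge is something you review and approve cluster by cluster, which is safer than merging everything automatically.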

In terms of manipulating data to do what we want, I would love to be able to automatically group data points within a certain region, sorted by production date. This would help illuminate the evolution of publishers and content within an area, and better reflect what a graphic novel could say about the political or social climate at a given moment. This could then be compared with other regions across the US over the same period, to see how the publishers of this content differ.
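Outside of OpenRefine, the grouping I have in mind could be sketched in a few lines of Python. This is only an illustration under assumptions: it presumes each record already carries a region label and a production year, and the field names (`region`, `year`, `publisher`, `title`) and sample rows are hypothetical placeholders rather than columns from our real data.

```python
from collections import defaultdict


def group_by_region(rows):
    """Bucket records by region, then sort each bucket by production year."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["region"]].append(row)
    for region_rows in grouped.values():
        region_rows.sort(key=lambda row: row["year"])
    return dict(grouped)


def publishers_in_year(grouped, year):
    """For one year, list the publishers active in each region."""
    return {
        region: sorted({row["publisher"] for row in rows if row["year"] == year})
        for region, rows in grouped.items()
    }


# Hypothetical sample records, not taken from the actual dataset.
rows = [
    {"title": "Novel C", "publisher": "Publisher X", "region": "Northeast", "year": 1992},
    {"title": "Novel A", "publisher": "Publisher Y", "region": "Northeast", "year": 1986},
    {"title": "Novel B", "publisher": "Publisher Z", "region": "West", "year": 1986},
]
by_region = group_by_region(rows)
same_year = publishers_in_year(by_region, 1986)
```

Here `by_region` gives a per-region timeline, and `same_year` lines regions up against each other for a single year. The harder part, which this sketch skips, is assigning a region to each record in the first place when the data only has a city or state.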

One comment

  1. Really good question, re. how to group points in a similar region. I’m trying to think about whether there’s an automated way to do that. I’ll take a look at your data and do some thinking!
