The Benefits of OpenRefine: Blog Post 4

When handling a large amount of information, it can be difficult to categorize and clean up one’s data. After working through the OpenRefine tutorial, I am better equipped to manipulate my group’s data set. In particular, OpenRefine would be useful for combing through the many sentences listed for each recording and picking out key phrases and recurring words. By using the merge and re-sort features, I could pull out the names of the officials President Nixon was talking with in the audio recordings. This would show how often the former president spoke with each of these officials, lending a fresh perspective on the Watergate scandal.
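
The merge-and-count idea above can be sketched in Python. This is a minimal sketch under stated assumptions. The sample names and spellings are made up. The `fingerprint` helper is a hypothetical, simplified version of the key-collision idea behind OpenRefine’s clustering (lowercase, drop punctuation, sort the words). It is not OpenRefine’s exact algorithm.

```python
import re
from collections import Counter


def fingerprint(name: str) -> str:
    """Hypothetical simplified key: lowercase, drop punctuation, sort unique words."""
    cleaned = re.sub(r"[^\w\s]", "", name.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def count_participants(participants: list[str]) -> Counter:
    """Merge spelling variants that share a key, then count appearances per official."""
    groups: dict[str, Counter] = {}
    for raw in participants:
        groups.setdefault(fingerprint(raw), Counter())[raw.strip()] += 1
    totals = Counter()
    for variants in groups.values():
        # Label each merged group with its most frequent original spelling.
        label = variants.most_common(1)[0][0]
        totals[label] = sum(variants.values())
    return totals


# Made-up sample rows, for illustration only.
sample = ["H.R. Haldeman", "Haldeman, H.R.", "John Ehrlichman",
          "john ehrlichman", "H.R. Haldeman", "Charles Colson"]
for name, n in count_participants(sample).most_common():
    print(f"{name}: {n}")
```

Sorting the words means "Haldeman, H.R." and "H.R. Haldeman" end up with the same key, so they are counted as one person.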

OpenRefine’s ability to select only certain columns and rows would also let our group exclude information unrelated to our focus. For example, many of the recordings are simply the White House operator transferring a call to the president. Deleting these calls would keep the focus on Nixon and his conversations. Finally, the common transforms option would let us standardize the data set’s spelling by applying consistent capitalization. Consistent capitalization ensures that entries that belong together are grouped together, rather than being split apart by differences in capitalization.
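
The filtering and capitalization steps could be sketched in Python as follows. The sketch makes two hypothetical assumptions: a participants column whose names are separated by semicolons, and an operator listed as "White House Operator". The helper names are illustrative. Python’s `str.title()` stands in for OpenRefine’s title-case transform.

```python
NIXON = "Richard Nixon"
OPERATOR = "White House Operator"  # assumption: how the operator is labeled


def standardize(name: str) -> str:
    """Collapse extra whitespace and apply consistent title-case capitalization."""
    return " ".join(name.split()).title()


def is_operator_transfer(participants: list[str]) -> bool:
    """Treat a call as a transfer if the operator is the only non-Nixon participant."""
    others = {p for p in participants if p != NIXON}
    return others == {OPERATOR}


def clean_rows(rows: list[str]) -> list[list[str]]:
    """Standardize each name, then drop rows that are only operator transfers."""
    cleaned = []
    for row in rows:
        participants = [standardize(p) for p in row.split(";") if p.strip()]
        if not is_operator_transfer(participants):
            cleaned.append(participants)
    return cleaned


# Made-up sample rows with inconsistent capitalization.
rows = [
    "Richard Nixon; H.R. Haldeman",
    "richard nixon; white house operator",
    "RICHARD NIXON; john ehrlichman",
]
print(clean_rows(rows))
```

After standardization, "RICHARD NIXON" and "richard nixon" both become "Richard Nixon". The transfer check can therefore compare names exactly.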

One thing I would like to learn is how to calculate the difference between the values in two separate columns. The start and end times of the recordings are in separate columns, so to find the duration of each call, the start time must be subtracted from the end time. A feature that converted these times from strings to numbers and then performed the subtraction would be extremely helpful in determining the length of Nixon’s conversations.
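
Here is a minimal Python sketch of the subtraction. It assumes the times are stored as 24-hour strings such as "10:30:00". The real column format may differ, in which case `TIME_FORMAT` would need adjusting. It also assumes that a call whose end time is earlier than its start time ran past midnight. Within OpenRefine itself, GREL’s `toDate()` and `diff()` functions may handle the same task. The GREL reference is the place to confirm their exact syntax.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%H:%M:%S"  # assumption: 24-hour clock strings such as "14:05:30"


def call_duration(start: str, end: str) -> timedelta:
    """Convert start/end time strings to datetimes and subtract them."""
    start_t = datetime.strptime(start.strip(), TIME_FORMAT)
    end_t = datetime.strptime(end.strip(), TIME_FORMAT)
    if end_t < start_t:
        # Assumption: an end time before the start time means the call crossed midnight.
        end_t += timedelta(days=1)
    return end_t - start_t


print(call_duration("10:30:00", "10:42:15"))  # 0:12:15
print(call_duration("23:58:00", "00:03:30"))  # 0:05:30
```

Calling `.total_seconds() / 60` on the result gives the length in minutes, which is easier to sort or average across calls.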

3 comments

  1. What a thorough analysis of your dataset! You seem to have already adapted OpenRefine to your data’s needs. I have to agree that this program really allows us to take a hammer to our data and knock it into place for analysis. I also cannot wait to see what you discover from your dataset on Nixon, especially in regard to those deleted calls.

  2. Hi there,
    I found your blog post really helpful. I had a hard time understanding OpenRefine based on the class tutorial, but after reading your blog post I feel that I have a better understanding of its tools. It was helpful that you gave an example of how you would use each tool, rather than just naming the tool. This made me think about how I could use those tools for my own dataset.

  3. Oh, that’s a really good question about how to calculate the length of the call from the start time and end time. Let me do some thinking and research!
