
My group’s dataset is the Nixon White House Tapes, which I am both excited and anxious about. At first, our extremely large dataset seemed a bit overwhelming, as we were unsure how to narrow it down to approximately 2,000 records. But after exploring OpenRefine using the sample data file, NJShipWrecks.csv, I like how the data can be broken down into digestible pieces. Two of our research questions ask, “What are the correlations between the tapes and historical/societal events of his presidency?” and “What are the connections between Nixon correspondents, and how do they relate to Watergate?” For these questions, OpenRefine could be extremely helpful for cleaning up spelling and capitalization inconsistencies. This will help with merging terms (in this case, perhaps the names of correspondents) and make the data more accurate and concise to work with. For that reason, my favorite feature to experiment with in OpenRefine was the “Merge Selected and Re-Cluster” button in the Facet → Cluster section.
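To make the clustering step more concrete, here is a minimal Python sketch of the kind of key-based matching that OpenRefine's fingerprint clustering performs. It is only an approximation under my own assumptions: the `fingerprint` and `cluster` functions and the sample name spellings are hypothetical illustrations, not OpenRefine's actual code or values taken from our dataset.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a simplified fingerprint key for a text value (hypothetical helper)."""
    # Decompose accented characters and drop anything that is not ASCII.
    ascii_value = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and remove punctuation.
    cleaned = ascii_value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, remove duplicate tokens, and sort them so that
    # word order no longer matters.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep only groups with variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Made-up spellings for illustration only (not rows from the real dataset).
names = [
    "Haldeman, H. R.",
    "H. R. Haldeman",
    "h r haldeman",
    "John Dean",
    "Dean, John",
    "Henry Kissinger",
]

for group in cluster(names):
    print(group)
```

Each printed group is a set of spellings that could be merged into one value, which is roughly the decision the “Merge Selected and Re-Cluster” button asks me to confirm.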
Since I am the data specialist for our group, OpenRefine has really calmed my nerves about working with such a large dataset, especially since this is my first time working on a project like this. I hope to do the data justice and create a final product that is valuable and visualizes this historical period in a meaningful way.