Open(ly)Refin(ing)

As I was going through the OpenRefine tutorial, I didn’t quite understand exactly what the function of the program was until a few steps in. Then it clicked that this process of narrowing and reducing discrepancies between data entries would be invaluable for cleaning up the dataset as a whole, and would make it much more manageable when creating data visualizations. I couldn’t help but think about when I tried to fit the whole dataset into RAW to create a data viz for the project: what came out was a garbled mess of text overlain on text, creating black blobs of entries. By reducing the number of entries, it makes sense that this would produce not only cleaner data visualizations, but also more accurate ones. I wonder how exactly these datasets are generated; perhaps they are programmed or input manually. I could see room for error and discrepancies resulting from either method.
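To get a feel for what OpenRefine is doing when it reduces discrepancies, here is a rough Python sketch of its "key collision" clustering idea: variant spellings of the same entry collapse to the same normalized fingerprint, so they can be grouped and merged. The sample entries are made up for illustration.

```python
import re
from collections import defaultdict

def fingerprint(value):
    # Lowercase, strip punctuation, sort the unique tokens -- roughly the
    # fingerprint keying method behind OpenRefine's clustering feature.
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster(values):
    # Group entries whose fingerprints collide; each group of two or more
    # is a candidate cluster of variant spellings of the same thing.
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

# Hypothetical messy entries like those a tutorial dataset might contain.
entries = ["Nixon, Richard M.", "Richard M. Nixon", "nixon richard m",
           "Henry Kissinger"]
print(cluster(entries))
# All three Nixon variants collapse into one cluster.
```

Merging each cluster down to a single canonical spelling is exactly the kind of entry reduction that keeps a visualization from turning into overlapping blobs of near-duplicate labels.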

For the Nixon Tapes dataset, I could see how splitting multi-valued cells would be integral to separating Nixon from the people he was in conversation with. This would give a category specifically for who was interacting with Nixon at the moment of recording, and provide better data for network analysis.
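A small sketch of that split, in Python: each row holds a multi-valued participants cell, and splitting it (then filtering Nixon out) yields one row per conversation partner, which is the edge-list shape network analysis tools want. The conversation IDs, participant names, and the "; " separator are all invented for the example.

```python
# Hypothetical rows with a multi-valued "participants" cell.
rows = [
    {"conversation": "037-001", "participants": "Richard Nixon; H. R. Haldeman"},
    {"conversation": "037-002", "participants": "Richard Nixon; Henry Kissinger; Alexander Haig"},
]

def split_participants(rows, sep="; "):
    # Mimic OpenRefine's "split multi-valued cells" step: expand each cell
    # into one (conversation, participant) pair, dropping Nixon himself so
    # only his conversation partners remain.
    edges = []
    for row in rows:
        for person in row["participants"].split(sep):
            if person != "Richard Nixon":
                edges.append((row["conversation"], person))
    return edges

print(split_participants(rows))
# [('037-001', 'H. R. Haldeman'), ('037-002', 'Henry Kissinger'),
#  ('037-002', 'Alexander Haig')]
```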
