OpenRefine + Okeh Cylinder Data

Using OpenRefine to reorganize and clean up my group’s data set will help us answer our research questions. Clustering similar values in the “Description” column, which describes the content of each cylinder recording, will help us categorize the recordings more easily than reading every description to see whether they say essentially the same thing. A few pairs of columns also contain the same information, such as “Main Talent” and “MainTalent Display”, “Primary Title” and “TitleSort”, and “FirstTakeDate” and “FirstTakeDateText”. I would remove the redundant columns to minimize repetitive data.
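OpenRefine’s clustering groups cells whose computed “keys” match. Its key collision method with the fingerprint keying function roughly trims, lowercases, strips accents and punctuation, and then sorts the unique words. The short Python sketch below approximates that idea to show why two differently punctuated descriptions end up in the same cluster. It is an approximation for illustration, not OpenRefine’s actual code. The function names `fingerprint` and `cluster` and the sample descriptions are invented and are not taken from the Okeh data.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate OpenRefine's fingerprint key for a single cell value."""
    # Trim surrounding whitespace and ignore case.
    value = value.strip().lower()
    # Fold accented characters to plain ASCII (e.g. "é" becomes "e").
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation so "solo," and "solo" match.
    value = re.sub(r"[^\w\s]", "", value)
    # Split into words, remove duplicates, and sort so word order doesn't matter.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values by fingerprint; keep only groups with more than one spelling."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


# Hypothetical descriptions, invented for illustration (not from the Okeh data).
descriptions = [
    "Vocal solo, with orchestra",
    "vocal solo with orchestra",
    "Banjo solo",
]
print(cluster(descriptions))
# {'orchestra solo vocal with': ['Vocal solo, with orchestra', 'vocal solo with orchestra']}
```

In OpenRefine itself, the same step runs from the column menu (Edit cells → Cluster and edit), where each suggested cluster can be reviewed before the values are merged.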

Once the data is cleaned up, I could examine facets of different columns. For example, I could take the “FirstTakeDateText” column and look at its text facet. Within each year, I could look for similarities in the “Type” or “Description” columns and check whether the titles or the recordings themselves reflect the culture, politics, or current events of the time. I could also compare different years to see whether there are any clear changes over time in musical style or in the instruments being used. I would also want to figure out what is left out of this data, so that I can better understand how cylinder recording technology was used. One question I have: what differentiates the content in this data set from other music and recordings of the period? Hopefully, as I experiment more with OpenRefine and use the available resources, I will find more ways to analyze the data set.
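A text facet essentially counts how many rows share each distinct value in a column. The Python sketch below does the same thing separately for each year, which mirrors comparing the “Type” or “Description” values within each year. It assumes the rows have been exported from OpenRefine (for example as CSV and read with `csv.DictReader`) and that each “FirstTakeDateText” value contains a four-digit year. Both are assumptions about the data set, not facts about it. The `facet_by_year` helper and the sample rows are hypothetical placeholders.

```python
import re
from collections import Counter, defaultdict


def facet_by_year(rows, date_column="FirstTakeDateText", facet_column="Type"):
    """Count each distinct value of facet_column separately for every year."""
    facets = defaultdict(Counter)
    for row in rows:
        # Assumption: the date text contains a four-digit year somewhere in it.
        match = re.search(r"\b(\d{4})\b", row.get(date_column) or "")
        year = match.group(1) if match else "unknown"
        facets[year][row.get(facet_column, "")] += 1
    return facets


# Hypothetical placeholder rows, invented for illustration (not from the Okeh data).
rows = [
    {"FirstTakeDateText": "1906-03-12", "Type": "Vocal"},
    {"FirstTakeDateText": "3/2/1906", "Type": "Vocal"},
    {"FirstTakeDateText": "1908", "Type": "Band"},
]
facets = facet_by_year(rows)
for year in sorted(facets):
    print(year, dict(facets[year]))
```

Passing `facet_column="Description"` would give the same per-year counts for the descriptions, ideally after they have been clustered.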
