OpenRefine Cylinder Data

OpenRefine has huge potential for organizing our dataset on cylinders, which looks very clunky at first glance. The dataset includes many redundant columns, such as FirstTakeDate and FirstTakeDateText, or MainTalent and MainTalentDisplay; each pair gives essentially the same information. I will clean this up by consolidating each pair into a single column, either by editing the values in the facet list or by using Cluster and Merge to bring them together.
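Before dropping one column from a pair, it may be worth checking that the two really do agree. The sketch below is written in Python, outside OpenRefine, and only illustrates the idea. The sample rows and the helper name `rows_that_disagree` are hypothetical; only the column names MainTalent and MainTalentDisplay come from the dataset.

```python
# Hypothetical sample rows; the values are invented for illustration only.
sample_rows = [
    {"MainTalent": "Performer A", "MainTalentDisplay": "Performer A"},
    {"MainTalent": "Performer B", "MainTalentDisplay": "Performer B "},
    {"MainTalent": "Performer C", "MainTalentDisplay": "C, Performer"},
]

def rows_that_disagree(rows, col_a, col_b):
    """Return the rows where the two columns differ after trimming whitespace."""
    return [
        row for row in rows
        if (row.get(col_a) or "").strip() != (row.get(col_b) or "").strip()
    ]

mismatches = rows_that_disagree(sample_rows, "MainTalent", "MainTalentDisplay")
for row in mismatches:
    print(row["MainTalent"], "|", row["MainTalentDisplay"])
```

If the list of mismatches is empty, one column can safely be removed; if not, the differing rows show what would be lost by merging.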

Once cell transformations have made the dataset tidier, the next step is to see how I can manipulate the data to answer my research questions more effectively. I will use the facet function and sort each facet by count to see how often each language, main talent, and date group appears in these recordings. From this, I can discover which type of talent and which language are most prominent in the cylinder recordings during a specific time period, which will help me understand their historical relevance, cultural implications, and language prominence. I can also better understand who and what is left out of this data, and analyze its ontology. This can be done by placing the dataset in the context of the wider historical and cultural period to see who is heavily represented or underrepresented.
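A facet sorted by count is essentially a frequency table. As a rough illustration of what that produces, here is a minimal Python sketch using the standard library. The sample rows, the column name `Language`, and the helper `facet_counts` are assumptions made for the example, not the actual export format of the cylinder dataset.

```python
from collections import Counter

# Hypothetical sample rows; the values are invented for illustration only.
recordings = [
    {"Language": "English", "MainTalent": "Performer A"},
    {"Language": "English", "MainTalent": "Performer B"},
    {"Language": "German", "MainTalent": "Performer A"},
    {"Language": "English", "MainTalent": "Performer A"},
    {"Language": "", "MainTalent": "Performer C"},
]

def facet_counts(rows, column):
    """Count how often each non-empty value appears, most frequent first."""
    counts = Counter(row[column] for row in rows if row.get(column))
    return counts.most_common()

for value, count in facet_counts(recordings, "Language"):
    print(f"{value}: {count}")
```

Blank values are skipped here; counting them separately would be one way to see how much of the data is missing, which bears on the question of who is left out.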

To make the chronological analysis even more effective, I would like to create a timeline from my data, although I am not yet sure how; doing so would help us better understand and visualize trends.
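One possible starting point for a timeline is simply counting recordings per year. The Python sketch below assumes the dates are exported as text beginning with a four-digit year (for example `1902-05-01`); that format, the sample values, and the helper `recordings_per_year` are all assumptions, and the real FirstTakeDate column may need cleaning first.

```python
from collections import Counter

# Hypothetical date strings; the values are invented for illustration only.
take_dates = ["1902-05-01", "1902-11-20", "1905-03-14", "", "unknown"]

def recordings_per_year(dates):
    """Return (year, count) pairs in chronological order, skipping unparseable dates."""
    years = Counter()
    for date_text in dates:
        year = date_text[:4]
        if len(year) == 4 and year.isdigit():
            years[int(year)] += 1
    return sorted(years.items())

# Print a rough text timeline: one '#' per recording in that year.
for year, count in recordings_per_year(take_dates):
    print(year, "#" * count)
```

The resulting year and count pairs could then be passed to a charting tool to draw an actual timeline.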
