The graphic novels dataset contains a large amount of information, and many aspects of it can be manipulated to make it easier to understand. After playing around with OpenRefine, I learned several ways to do this. For example, creating a facet is a quick and easy way to organize a specific category by count, alphabetical order, and more. This condenses the dataset so that it can be read more easily to find trends and answer some of our research questions. One of those questions was how geography affected the content and popularity of a graphic novel. In OpenRefine, we can create facets and sort categories such as “mentions,” then compare the most popular novels with where they were published. In addition, I noticed that many entries have slight spelling and capitalization differences even though they refer to the same thing. Therefore, I can use the skills I learned in the tutorial to clean up some of the data by clustering and merging these variants.
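To get a feel for what faceting and clustering do, here is a rough Python sketch of the idea. It uses a simplified version of the key-collision "fingerprint" approach that OpenRefine offers, not OpenRefine's actual code, and the publisher values are made up for illustration rather than taken from our dataset.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Simplified clustering key: ignores case, accents, punctuation, word order."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents
    text = re.sub(r"[^\w\s]", "", text)                    # drop punctuation
    return " ".join(sorted(set(text.split())))             # sort unique tokens


def cluster_and_count(values):
    """Merge spelling variants, then count them like a text facet would."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)

    merged = Counter()
    for variants in groups.values():
        # Keep the most frequent spelling as the merged label.
        canonical = Counter(variants).most_common(1)[0][0]
        merged[canonical] += len(variants)
    return merged


# Hypothetical column values, not real rows from the dataset.
publishers = [
    "DC Comics", "dc comics", "DC Comics", "DC Comics.",
    "Image Comics", "Image Comics", "Comics, Image",
    "Pantheon",
]

facet = cluster_and_count(publishers)
for name, count in facet.most_common():
    print(f"{name}: {count}")
```

Sorting the merged counts is roughly what sorting a facet by count does, and the grouping step is why "dc comics" and "DC Comics." end up under a single label.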
At first glance, I noticed that much of our data is not easily quantifiable. I wondered how we would represent things such as the content and theme of a book in a data visualization. I would like to figure out how to put information like this into a table or chart so the audience can easily see some of the conclusions we come to.
Hi there,
My group is in a similar boat because much of our data is also not quantifiable; it is nominal data, as we learned it is called in class. I would also like to learn how to put this information into a data visualization. One way to do this might be a graph that relates the theme of each book to the date it was written.
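To make that idea a bit more concrete, here is a minimal Python sketch of one way it could work: count how many books fall under each theme per decade, which turns nominal data into numbers that a bar chart or heat map could show. The titles, years, and themes below are placeholders I made up, not rows from either of our datasets.

```python
from collections import Counter

# Hypothetical records: (title, year, theme).
books = [
    ("Book A", 1986, "war"),
    ("Book B", 1987, "war"),
    ("Book C", 1989, "superhero"),
    ("Book D", 2003, "memoir"),
    ("Book E", 2006, "memoir"),
    ("Book F", 2008, "war"),
]

# Count books for each (decade, theme) pair.
counts = Counter((year // 10 * 10, theme) for _, year, theme in books)

decades = sorted({decade for decade, _ in counts})
themes = sorted({theme for _, theme in counts})

# Print a small cross-tabulation: one row per decade, one column per theme.
print("decade".ljust(8) + "".join(theme.ljust(12) for theme in themes))
for decade in decades:
    row = "".join(str(counts[(decade, theme)]).ljust(12) for theme in themes)
    print(f"{decade}s".ljust(8) + row)
```

Each cell of the resulting table is a count, so it could be fed into a charting tool as is.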