My group’s dataset is on American Classicists and Archaeologists. After completing the OpenRefine tutorial, I found several tools that could make our data easier to analyze. For instance, given our research questions, we will probably focus on only a few columns. The split column tool can break the columns we want to look at into more specific categories, which may allow us to perform deeper analysis on the categories relevant to our research questions. Applied to the “Birthplace” column of our dataset, this tool would create two separate columns, “Birth City” and “Birth State.” I would also use Text Facet, Merge, and Cluster to clean our data.
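To make the “Birthplace” split concrete, here is a minimal sketch in Python of the same idea outside OpenRefine. It assumes that birthplaces are written as “City, State” with a comma separator, which we have not yet verified for every row. The function name `split_birthplace` and the sample rows are hypothetical and are not taken from our actual dataset.

```python
def split_birthplace(value):
    """Split a 'City, State' string into (city, state).

    Assumes a comma separates city from state; a missing part becomes ''.
    """
    if value is None:
        return "", ""
    # Split on the first comma only, so extra commas stay in the state part.
    parts = [part.strip() for part in value.split(",", 1)]
    city = parts[0]
    state = parts[1] if len(parts) > 1 else ""
    return city, state


# Hypothetical sample rows, for illustration only.
rows = [
    {"Name": "Scholar A", "Birthplace": "Boston, MA"},
    {"Name": "Scholar B", "Birthplace": "Philadelphia"},
    {"Name": "Scholar C", "Birthplace": ""},
]

# Add the two new columns to every row.
for row in rows:
    row["Birth City"], row["Birth State"] = split_birthplace(row["Birthplace"])
```

Rows with no comma keep their whole value as the city and get a blank state, so they are easy to find afterwards with a facet on the new column.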
Since our main concern with our dataset is the significant amount of missing data, I would especially like to know what methods we can use to cut out records that do not have enough information to contribute to our analysis and to answering our research questions. However, we are not sure how much data we can cut without misrepresenting the dataset. OpenRefine would also be helpful in revealing which records are insufficient.
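One possible way to approach the question of how much to cut is to count the blank fields in each record and then see how many records would survive at each cutoff. The sketch below is only an illustration in Python under assumptions of my own: the column names other than “Birthplace”, the sample rows, and the helper names `count_missing` and `rows_kept_by_threshold` are all hypothetical.

```python
def count_missing(row, columns):
    """Count how many of the given columns are blank or absent in a row."""
    return sum(1 for col in columns if not (row.get(col) or "").strip())


def rows_kept_by_threshold(rows, columns):
    """Map each allowed number of blanks to how many rows would be kept."""
    missing = [count_missing(row, columns) for row in rows]
    return {
        allowed: sum(1 for m in missing if m <= allowed)
        for allowed in range(len(columns) + 1)
    }


# Hypothetical columns and placeholder rows, for illustration only.
columns = ["Birthplace", "Degree", "Institution"]
rows = [
    {"Birthplace": "City A, ST", "Degree": "PhD", "Institution": "University X"},
    {"Birthplace": "", "Degree": "PhD", "Institution": ""},
    {"Birthplace": "", "Degree": "", "Institution": ""},
    {"Birthplace": "City B, ST", "Degree": "", "Institution": "University Y"},
]

summary = rows_kept_by_threshold(rows, columns)
for allowed, kept in summary.items():
    print(f"allow up to {allowed} blank fields: keep {kept} of {len(rows)} rows")
```

Comparing the counts at each cutoff would show how much of the dataset a given rule discards before we commit to it, which may help us judge whether a cut risks misrepresenting the data.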