OpenRefine seems like a very powerful tool for visualising large datasets and understanding the trends and patterns within them. After going through the tutorial, I now have a better understanding of how it can be utilised to examine the dataset for our own group project.
Our group will be researching the database of Scottish witches and witchcraft trials. We have two datasets – accused and cases. Both of them are rather large, so it would be useful to use OpenRefine to filter out some trivial information and clean up inconsistencies (such as making all the column names lowercase to avoid issues with things like “Occupation” versus “occupation”). We still have many different variables and columns to explore, so we want to ensure that our dataset is valid and that we don’t have redundant values, false positives/negatives, or misleading patterns.
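As a rough sketch of what those first cleaning steps might look like outside OpenRefine, the same normalisation could be done in pandas. The filenames here are assumptions rather than the database’s actual export names:

```python
import pandas as pd

# Load the two datasets (filenames are assumptions, not the real exports)
accused = pd.read_csv("accused.csv")
cases = pd.read_csv("cases.csv")

# Normalise column names so "Occupation" and "occupation" can't diverge
for df in (accused, cases):
    df.columns = df.columns.str.strip().str.lower()

# Drop exact duplicate rows, one obvious source of redundant values
accused = accused.drop_duplicates()
cases = cases.drop_duplicates()
```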
Unfortunately, there are a lot of null values and missing information, so OpenRefine could be used to help clean up column types and also cluster similar values within columns to give us more reasonable results. The split-column feature could help us look at the notes category, which might have drastically different entries describing what an accused person was like before they were accused (and how smoothly the trial went depending on their reputation and occupation). We can also merge related categories together if there isn’t enough information in each one, so that we can get a more holistic view of a group of individuals.
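OpenRefine’s clustering works by reducing each cell to a normalised key (its “fingerprint” method) and grouping cells whose keys collide. Continuing from the dataframes above, a minimal sketch of the same idea, where the “occupation” and “notes” column names and the semicolon delimiter are all guesses about the data:

```python
import re

def fingerprint(value: str) -> str:
    """Rough analogue of OpenRefine's fingerprint keyer:
    lowercase, strip punctuation, sort the unique tokens."""
    tokens = re.sub(r"[^\w\s]", " ", str(value).lower()).split()
    return " ".join(sorted(set(tokens)))

# Group near-duplicate spellings (e.g. "Mid wife" and "midwife,")
# under one shared key, mimicking OpenRefine's cluster-and-merge step
accused["occupation_key"] = accused["occupation"].fillna("").map(fingerprint)

# Split the notes field into separate columns, like OpenRefine's
# "Split into several columns" (the delimiter is an assumption)
notes_parts = accused["notes"].str.split(";", expand=True)
```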
If there were some way to fill in missing information, such as the economic status of the accused individuals, or to create some sort of heat map of the locations where the most witchcraft trials took place (accounting for how geographical areas and borders have shifted over the years), that would help give us better historical context. Reaching out to the people who curated the database could also help, since they may be able to provide more information on specific categories that were not included in the overall dataset.
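As a first step toward such a map, we could simply tally trials per recorded place and flag missing economic status explicitly rather than guessing at it. The column names here (“ses”, “settlement”) are hypothetical, and a real heat map would also need coordinates plus the historical boundary changes mentioned above:

```python
# Make missing economic status explicit instead of silently dropping it
# ("ses" is a hypothetical column name for socio-economic status)
accused["ses"] = accused["ses"].fillna("unknown")

# Count trials per recorded location as the raw input for a heat map
# ("settlement" is also a guess at the location column's name)
trial_counts = cases["settlement"].value_counts()
print(trial_counts.head(10))
```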