My group’s dataset on 19th-century children’s book publishers contains a lot of information that could be removed, so it would definitely benefit from cleaning. We can manipulate the data in many ways, but I think the most beneficial approach is to narrow it to certain authors from the United States. In addition, we can limit the data to publishers rather than covering all the other roles in the dataset.
Two OpenRefine tools my group can use for this are the facet and cluster tools. The main problem with this dataset is that there is too much information, which our group can hopefully clean up using OpenRefine operations. We can also remove blank cells and repeated entries from our data.
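As a rough illustration of the same cleanup outside OpenRefine, here is a minimal Python sketch. The column names ("name", "role", "country") and all of the rows are made up for the example and are not the real headers or values in our dataset. It keeps only United States publishers, skips rows with a blank name, and drops repeats that differ only in capitalization or stray spaces, which is a much simpler stand-in for what the cluster tool does.

```python
# Hypothetical rows: the column names and values are assumptions for
# illustration, not taken from the real dataset.
rows = [
    {"name": "Publisher A", "role": "publisher", "country": "United States"},
    {"name": "publisher a ", "role": "publisher", "country": "United States"},  # repeat with different case/spacing
    {"name": "", "role": "publisher", "country": "United States"},              # blank name
    {"name": "Illustrator B", "role": "illustrator", "country": "United States"},
    {"name": "Publisher C", "role": "publisher", "country": "England"},
]


def clean(rows, role="publisher", country="United States"):
    """Keep non-blank, non-repeated rows that match one role and one country."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue  # skip blank cells
        if row["role"] != role or row["country"] != country:
            continue  # same idea as selecting one value in a facet
        key = (name.lower(), row["role"], row["country"])
        if key in seen:
            continue  # skip repeats of a row we already kept
        seen.add(key)
        cleaned.append({**row, "name": name})
    return cleaned


cleaned_rows = clean(rows)
for row in cleaned_rows:
    print(row)
```

Only the first spelling of each name is kept here, so a real cleanup would still need someone to check which spelling is the correct one.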
There are a few things I would like to do with my data that I’m not sure how to do. First, I would like to merge two different datasets into one to make the data simpler and more effective to work with. My group has multiple datasets, and some of them are connected to each other through shared types of information. Second, I would like to eliminate the repeated entries created when authors change their place of residence. For some authors, a move is just down the street. It would be much more useful to include only an author’s major movements.
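One possible way to approach both ideas is sketched below in Python. Everything in it is an assumption made for the example: the shared "author_id" column, the two small tables, and especially the rule that a "major movement" means a change of city. Our group may end up defining a major move differently.

```python
# Hypothetical tables: the shared "author_id" column and all values are
# assumptions for illustration, not taken from the real datasets.
authors = [
    {"author_id": 1, "name": "Author A"},
    {"author_id": 2, "name": "Author B"},
]
residences = [
    {"author_id": 1, "year": 1850, "city": "Boston"},
    {"author_id": 1, "year": 1852, "city": "Boston"},  # a move within the same city
    {"author_id": 1, "year": 1860, "city": "New York"},
    {"author_id": 2, "year": 1845, "city": "Philadelphia"},
]


def merge_on_key(left, right, key):
    """Attach the matching left-hand row to every right-hand row (a simple join)."""
    lookup = {row[key]: row for row in left}
    return [{**lookup[row[key]], **row} for row in right if row[key] in lookup]


def major_moves(rows):
    """Keep a residence row only when the author's city changes (assumed rule)."""
    ordered = sorted(rows, key=lambda r: (r["author_id"], r["year"]))
    kept = []
    last_city = {}
    for row in ordered:
        if last_city.get(row["author_id"]) != row["city"]:
            kept.append(row)
        last_city[row["author_id"]] = row["city"]
    return kept


merged = merge_on_key(authors, residences, "author_id")
moves = major_moves(merged)
for row in moves:
    print(row["name"], row["year"], row["city"])
```

The merge only works if both datasets really do share a column with matching values, so that is the first thing to check before combining them.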
The dataset that your group is working with (19th-century children’s book publishers) seems really interesting! You mentioned that you want to focus only on certain publishers and authors. How will your group determine which ones to include, and could filtering out the “less important” authors lead to slightly different trends? For example, when looking at overall trends in book publishing, one publisher might favor certain themes or genres more than others. If you focus only on that publisher, you might conclude that those themes were the most popular of the period when in reality other genres were popular too.