First Day of OpenRefine

When my group and I were assigned the Osage Public Library dataset, we did not know where to begin with our project. First, we knew nothing about the Osage Public Library, and second, the dataset's more than 38,000 entries made it hard to digest. After trying OpenRefine, however, I feel a lot more at ease.

For our project, I am in charge of data management, which includes cleaning our dataset. Previously, I had tried to use Excel to get a sense of the dataset, but with tens of thousands of entries, Excel kept crashing and left me with nothing to work with. With OpenRefine, I can finally make sense of the dataset, and perhaps come up with more clearly defined research questions for the project. Filtering for the data we care about should be straightforward, since I can focus on a single category such as Dewey number, year published, or year acquired. I also noticed that some of the entries are inaccurate. For example, one entry lists 8923 as its year acquired. OpenRefine can help me find and remove invalid values like this one so that they do not skew our analysis.
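The same kind of check can be sketched outside OpenRefine as well. The Python snippet below is only an illustration under stated assumptions: the field name `year_acquired`, the sample records, and the lower bound of 1800 are all made up for the example and are not taken from the actual dataset.

```python
from datetime import date


def is_plausible_year(value, earliest=1800, latest=None):
    """Return True if value parses as an integer year in [earliest, latest].

    The default bounds are assumptions for illustration; adjust them to
    whatever range makes sense for the collection being cleaned.
    """
    if latest is None:
        latest = date.today().year
    try:
        year = int(str(value).strip())
    except ValueError:
        # Blank cells or non-numeric text are treated as invalid.
        return False
    return earliest <= year <= latest


# Hypothetical records; the real dataset's column names may differ.
records = [
    {"title": "Book A", "year_acquired": "1998"},
    {"title": "Book B", "year_acquired": "8923"},  # impossible year
    {"title": "Book C", "year_acquired": ""},      # missing value
]

# Keep only the rows whose acquisition year looks plausible.
clean = [r for r in records if is_plausible_year(r["year_acquired"])]
```

Setting invalid rows aside for review, rather than deleting them outright, may be the safer choice, since a value like 8923 could be a typo for a real year.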

As of right now, I am still figuring out most of OpenRefine's functions. I have not yet discovered how to group the data points by decade, but I have a feeling that if I continue experimenting with it, I can find a way to do it. I believe that OpenRefine is, without a doubt, a valuable asset to our project.
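For what it is worth, the arithmetic behind decade grouping is simple: integer-divide the year by 10 and multiply back by 10. Here is a minimal Python sketch of that idea, not an OpenRefine recipe. The helper name `decade_of` and the sample years are hypothetical.

```python
from collections import Counter


def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1987 -> 1980."""
    return (year // 10) * 10


# Hypothetical publication years, for illustration only.
years = [1987, 1992, 1999, 2003, 2010]

# Count how many entries fall into each decade.
counts = Counter(decade_of(y) for y in years)
```

The same floor-and-multiply step should carry over to whatever expression language a tool provides, as long as the year column is stored as a number rather than text.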
