OpenRefine is SO cool

Howdy Bloggers! I’m so excited we are finally getting into some technical work with our data and learning some new skills. This week we were asked to use OpenRefine, a program that lets a user easily clean up and manipulate a large dataset. I personally have little to no experience working with programs like this, and I found the tutorial Professor Posner had us go through extremely helpful! It was especially helpful because, as I was looking through my dataset for our final project, I found many discrepancies in the data and caught myself thinking how useful it would be not to have to clean it up manually. Some of the issues I found were differences in the spelling of terms and inconsistent use of uppercase and lowercase letters. With OpenRefine it will be much easier to locate all those errors and change them to a format we can use for our project. Another tool I think will be useful is the program’s ability to split columns. One of our categories includes three different observations, so if we can separate those into their own columns, the data will be much easier for us to work with. I’m looking forward to reading the supplementary pieces on the software and hope to learn more about the different ways we can use it to our advantage. One thing I’m not sure how to do is locate key terms that appear in different categories but still come up multiple times throughout the data. It would be helpful if we could identify those terms, extract the matching data, and use it somehow. I bet there is a way to do it through OpenRefine, so I will keep exploring! Cheers to fun datasets and a Happy Halloween!!
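To make those three ideas concrete (fixing inconsistent case, splitting one column into several, and spotting terms that show up in more than one category), here is a rough sketch in plain Python. This is not how OpenRefine does it internally, and it is not our actual dataset: the rows, the column names, and the helper functions `normalize`, `split_column`, and `terms_in_multiple_columns` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; the column names and values are invented for illustration.
rows = [
    {"material": "Oil Paint ", "notes": "canvas; framed; signed"},
    {"material": "oil paint", "notes": "Canvas; unframed; unsigned"},
    {"material": "CANVAS", "notes": "oil paint; framed; signed"},
]


def normalize(value):
    """Trim whitespace, collapse runs of spaces, and lowercase the text."""
    return " ".join(value.split()).lower()


def split_column(row, column, separator, new_names):
    """Return a copy of row with `column` split into the columns named in new_names."""
    parts = [normalize(part) for part in row[column].split(separator)]
    result = {key: value for key, value in row.items() if key != column}
    for name, part in zip(new_names, parts):
        result[name] = part
    return result


def terms_in_multiple_columns(rows):
    """Map each term to the columns it appears in, keeping terms found in 2+ columns."""
    seen = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            seen[value].add(column)
    return {term: columns for term, columns in seen.items() if len(columns) > 1}


cleaned = []
for row in rows:
    # Step 1: make spelling variants that differ only by case or spacing identical.
    row = dict(row, material=normalize(row["material"]))
    # Step 2: split the three observations packed into "notes" into their own columns.
    cleaned.append(split_column(row, "notes", ";", ["support", "framing", "signature"]))

# Step 3: find terms that turn up under more than one column.
shared = terms_in_multiple_columns(cleaned)
```

In this toy data, "oil paint" and "canvas" each end up under both the `material` and `support` columns, so they are the terms the last step flags. In OpenRefine itself I would expect to do the same kind of thing through its menus rather than by writing code.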

3 comments

  1. I really enjoyed the excitement in your blog post! I can tell you are thrilled to be picking up a new DH skill that is so useful for our studies and the work we do. Your suggestion for locating key terms in the dataset definitely seems doable and super helpful. I would love to know all the nitty-gritty functions and tools that OpenRefine offers us. Who knows what we can do with it!?

    Keep up the excitement and Happy Halloween!

  2. Howdy! I share your excitement when I get to play with the program. I agree it can be dreadful to look at datasets that are filled with inconsistent wording, especially when the dataset is huge. I believe that knowing more about the program’s features will definitely improve the data-analysis process.
