As a Statistics major at UCLA, I typically use R to clean and manipulate my data before analysis. With OpenRefine, anyone can take advantage of similar tools to clean their data! OpenRefine has many simple-to-use, built-in functions that make tricky tasks easy. As a side note, the NJShipwrecks dataset seems really neat! Hopefully it will be used again in the future.
Our group is using the Scottish Witchcraft trial data. I will likely use OpenRefine to clean variables like Names and Counties, which need to be split into separate columns with functions like Edit Columns. Editing rows is also a simple task that OpenRefine handles much better than R. Overall, I plan to use OpenRefine for cleaning before loading the data into R.
I would love to see some of the more complicated functions made available through OpenRefine. For our data, we have a bunch of columns that could be condensed or combined to make better sense of the data. This seems like something OpenRefine may be equipped to handle. The one benefit of R over OpenRefine is that you are coding your own solutions, which allows you to do more detailed and complicated edits of the data you are working with.
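To make the idea of condensing columns a bit more concrete, here is a minimal sketch of the kind of edit I have in mind. It is written in Python purely for illustration. The column names (`first_name`, `last_name`, `parish`, `county`) and the rows are made up and are not taken from the actual witchcraft trial data, and `combine_columns` is a hypothetical helper, not a function from OpenRefine or R.

```python
# Made-up rows for illustration only; the real dataset's columns will differ.
rows = [
    {"first_name": "Jane", "last_name": "Doe", "parish": "Parish A", "county": "County A"},
    {"first_name": "Mary", "last_name": "", "parish": "", "county": "County B"},
]


def combine_columns(row, sources, target, sep=" "):
    """Return a copy of `row` with the `sources` columns merged into `target`.

    Empty or whitespace-only values are skipped so that missing pieces
    do not leave stray separators in the combined value.
    """
    parts = [row[name].strip() for name in sources if row.get(name, "").strip()]
    # Keep every column that is not being merged away.
    merged = {key: value for key, value in row.items() if key not in sources}
    merged[target] = sep.join(parts)
    return merged


cleaned = []
for row in rows:
    # Condense the two name columns into one, then the two place columns.
    row = combine_columns(row, ["first_name", "last_name"], "name")
    row = combine_columns(row, ["parish", "county"], "place", sep=", ")
    cleaned.append(row)

for row in cleaned:
    print(row)
```

Skipping the blank pieces is the part that tends to be fiddly by hand, which is why I would want a scripted or built-in way to do it.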
Overall, this is a neat new program to add to our list of data resources!!
I wasn’t sure what our stats people would think of OpenRefine! I’m glad you liked it — it’s really quick and easy and accessible for everyone. Here are a couple of cool things you can do with OpenRefine: data reconciliation and calling web services.