My group was given the dataset about what was in people’s homes in 1700s Pennsylvania. It’s a huge dataset with a seemingly infinite and ‘unsortable’ amount of data. When I first scrolled through it, on the day we received the email from Professor Posner, I was overwhelmed – no doubt about that. Learning about Excel pivot tables in discussion on Friday put me at ease a bit, but after a fifteen-minute tutorial in Open Refine I finally feel equipped to manipulate and conquer our dataset.
Overall, Open Refine seems far more straightforward to use than Excel. That may simply be because it does more of the work for you… either way, I’m loving it. For our specific dataset, Open Refine could be used to clean up the content of each cell. Whether it’s trimming whitespace or fixing small differences in capitalization, our data could use a bit of a makeover. Moreover, I find the “Faceting” option in Open Refine, which groups a column’s entries by value, far easier than the pivot table in Excel, so it’s likely I’ll be using that.
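To make the cleanup idea concrete, here is a minimal Python sketch of roughly what trimming whitespace, evening out capitalization, and then counting the distinct values in a column amounts to. It is not how Open Refine works internally, and the function names and sample cell values are made up for illustration rather than taken from our dataset.

```python
from collections import Counter


def clean_cell(value: str) -> str:
    """Collapse runs of whitespace, trim both ends, and apply title case."""
    return " ".join(value.split()).title()


def facet(values):
    """Count how often each cleaned value occurs, most frequent first."""
    return Counter(clean_cell(v) for v in values).most_common()


# Hypothetical cell values, invented for this example.
items = ["  feather bed", "Feather Bed ", "pewter  plate", "PEWTER PLATE", "looking glass"]
facet_counts = facet(items)

for value, count in facet_counts:
    print(f"{value}: {count}")
```

Without the cleanup step, those five cells would count as five different values; with it, they fall into three groups, which is the kind of tidy summary a facet gives you.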
Ideally, Open Refine could search for seemingly “female names” and cluster them together so that we could view them separately from the male ones, which would be an incredible time-saver. However, that may be asking too much.
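One way this could be approximated outside Open Refine is with a lookup list of given names. The Python sketch below is only that, a sketch: the name list is a tiny hand-made placeholder, the sample names are invented, and it assumes the given name comes first in each entry. A real list would need historical research, and anything it misses simply lands in the "other" group.

```python
# Hypothetical, hand-made list of given names; a placeholder, not a real reference list.
FEMALE_GIVEN_NAMES = {"mary", "elizabeth", "sarah", "ann"}


def split_by_given_name(full_names):
    """Return (matched, unmatched) lists based on the first word of each name."""
    matched, unmatched = [], []
    for name in full_names:
        parts = name.split()
        first = parts[0].lower() if parts else ""
        if first in FEMALE_GIVEN_NAMES:
            matched.append(name)
        else:
            unmatched.append(name)
    return matched, unmatched


# Invented sample entries, not taken from the dataset.
names = ["Mary Smith", "John Miller", "elizabeth  Brown", "Thomas Penn"]
likely_female, other = split_by_given_name(names)

print("Likely female:", likely_female)
print("Other:", other)
```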
One comment
I’m glad this program can help you out with organising your huge dataset. From your brief description, it sounds like there is definitely a lot of data, enough to make the topic overwhelming. I hope your group can find the best way to sort through everything and answer your research questions!