OpenRefine Blog Post – Introduction to Digital Humanities

This week I explored OpenRefine and learned how to clean data so that it is easier to analyze. It was eye-opening to see how easy cleaning is with OpenRefine, and how much cleaning data can need. I had never thought about how a single extra space at the end of a value can keep the computer from recognizing it as the same value without that space. I will definitely use OpenRefine in the future whenever I need to clean data.
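To make the trailing-space problem concrete, here is a minimal Python sketch of the same idea outside OpenRefine. The sample values and the normalize helper are made up for illustration and are not from our dataset.

```python
# Hypothetical sample values: the second one has a trailing space,
# so it looks identical on screen but is a different string.
raw_values = ["chair", "chair ", "chair"]

# Without cleaning, the two spellings count as two distinct values.
distinct_before = set(raw_values)

def normalize(value: str) -> str:
    """Trim surrounding whitespace and collapse internal runs of whitespace."""
    return " ".join(value.split())

# After cleaning, every spelling maps to the same value.
distinct_after = {normalize(v) for v in raw_values}

print(len(distinct_before), len(distinct_after))
```

This is roughly what OpenRefine's common transforms for trimming whitespace and collapsing consecutive whitespace do to a column.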

My group’s dataset, which records people’s belongings from the 1700s to the 1800s in the US, is already pretty clean, so there is not much cleaning we need to do. However, each full name is divided into a first name column and a last name column. Since some people share a first name or a last name, this makes the pivot table look very odd, especially because names are not the focus of our analysis. To make the pivot table easier to read, we can combine the first name and last name into a single full name column. In OpenRefine we can do this with a GREL expression such as cells["Column 1"].value + " " + cells["Column 2"].value, where Column 1 and Column 2 stand in for our two name columns and the " " puts a space between the two names.
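The name combination described above can also be sketched in Python. This is only an illustration under assumptions: the column names first_name and last_name and the sample rows are hypothetical, not taken from our dataset.

```python
def full_name(first: str, last: str) -> str:
    """Join first and last name with one space, skipping empty parts."""
    parts = [first.strip(), last.strip()]
    return " ".join(p for p in parts if p)

# Hypothetical rows; the real dataset's column names may differ.
rows = [
    {"first_name": "John", "last_name": "Smith"},
    {"first_name": "Mary ", "last_name": ""},
]

full_names = [full_name(r["first_name"], r["last_name"]) for r in rows]
print(full_names)
```

Stripping each part before joining avoids the stray-space problem, and skipping empty parts keeps a missing last name from leaving a trailing space in the full name.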

Also, each record includes a description of the specific item(s) together with the number of items that fall into that category. It would be useful to see how many items each person possessed in a category, so we would need to split that column into separate parts. We learned how to do this in the OpenRefine tutorial using the split multi-valued cells operation.

I’d like to learn how to split a cell into multiple values when the entries are not consistent. For example, in the item description column the unit of measurement varies, so any rule for splitting the column would have to account for each variation.
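One possible approach to the inconsistent descriptions, sketched in Python rather than OpenRefine: read an optional leading number as the quantity, then check whether the next word is a known unit. Everything here is an assumption for illustration. The KNOWN_UNITS table, the parse_description helper, and the sample descriptions are hypothetical, and the real column may be formatted differently.

```python
from typing import NamedTuple, Optional

# Hypothetical lookup table mapping unit spellings to one canonical form.
KNOWN_UNITS = {
    "pair": "pair", "pairs": "pair",
    "yd": "yard", "yds": "yard", "yard": "yard", "yards": "yard",
    "lb": "pound", "lbs": "pound",
}

class ParsedItem(NamedTuple):
    quantity: Optional[int]
    unit: Optional[str]
    item: str

def parse_description(text: str) -> ParsedItem:
    """Split a description into quantity, unit, and item where present."""
    words = text.split()
    quantity = None
    unit = None
    # A leading whole number is treated as the quantity.
    if words and words[0].isdecimal():
        quantity = int(words.pop(0))
    # The next word is a unit only if it appears in the lookup table.
    if words:
        key = words[0].lower().rstrip(".")
        if key in KNOWN_UNITS:
            unit = KNOWN_UNITS[key]
            words.pop(0)
            # Drop a connecting "of", as in "1 pair of boots".
            if words and words[0].lower() == "of":
                words.pop(0)
    return ParsedItem(quantity, unit, " ".join(words))

for sample in ["2 chairs", "1 pair of boots", "3 yds. linen", "feather bed"]:
    print(parse_description(sample))
```

Entries with no number or no recognized unit keep None in those fields, so inconsistent rows are flagged for a closer look instead of being forced into the wrong shape.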
