Using OpenRefine – Rebecca Tan

Our group received a dataset on what was in people’s homes in 1700s Pennsylvania. Our research questions focused mainly on the issue of gender, although we also had a question on which items were considered luxury items or more costly.

In answering our research question on which items were considered more luxury items, I could use the sort function to sort cell values (under the column Real Penn) as numbers, with largest first. This will then allow me to see the kind of items considered most monetarily valuable, as well as the categories they may fall into. For example, “bonded” items tended to be much more valuable because they referred to slaves, while random items like a bed and bolster had a value of 0 Real Penn.

Another one of our research questions was whether women owned greater quantities of domestic items and why. To start with, we could determine all the instances of women in the dataset. This could be started by creating a text facet for the first name column, and then only including those names which sound indubitably female (e.g. Anne, Betty). However, determining the instances of women is more easily done using Excel pivot tables. In Excel, I can create a new column with both first and last name, and then create a pivot table to better understand the number of individuals in the dataset and how many records each person has. In the original dataset, I can create a new column that assigns gender to these records. However, sometimes in the pivot table, names may appear twice due to white space. I can use OpenRefine to trim the leading and trailing whitespace. This can be done by going to “Edit cells,” choosing “Common transforms” and then clicking “Trim leading and trailing whitespace.”

I would then want to sum up the amount of domestic items owned per individual in the dataset. This could be done by summing up the values of columns such as sewing and laundry per individual. Unfortunately, however, I am not sure how to do a sum of values for each column in OpenRefine, and then combine the sums of all the values for each column relating to domestic goods. I also need to determine the traits that make up a domestic item, so that I can pull out certain items from the content types and create a new dataset.

Leave a Reply