Blog Post 4: OpenRefine

Hello! Finally, we have gotten to what I think is the interesting part of the class: learning about new programs and software! My group's dataset is about prisoners in the Eastern State Penitentiary from 1830 to 1839. We have two large datasets with the records of 500 prisoners each. It's super nice that OpenRefine helps us clean up the data, because I feel like there are a lot of entries with minuscule differences, like uppercase vs. lowercase or minor spelling discrepancies, that make the data less digestible.

Our research questions center on the relationships between things like the prisoners' gender, age, race, and religion, so cleaning the data will make it much easier to compare different categories and analyze them quickly rather than having to sift through all the raw data. Being able to split columns is really great, because we have a column called "EthnicityReligionOccupation," and splitting it into its own distinct categories would help us tremendously in comparing the data and answering our research questions.

One thing I would like to know is whether it's possible to sort and match similar descriptions of crimes using keywords.
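To get a feel for how OpenRefine catches the uppercase vs. lowercase problem, here is a small Python sketch of the "fingerprint" key-collision idea that OpenRefine's clustering is based on. It's a rough approximation, not OpenRefine's actual code: each value is trimmed, lowercased, stripped of punctuation and accents, split into words, and the unique words are sorted and rejoined. Values that end up with the same key are grouped as likely duplicates. The function names and the crime labels below are made up for illustration and are not taken from our dataset.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximation of a fingerprint key (hypothetical helper, not OpenRefine's code)."""
    # Trim whitespace and lowercase so "Larceny" and "larceny " match
    value = value.strip().lower()
    # Fold accented characters to plain ASCII (e.g. "é" becomes "e")
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation so "Stealing, horse" loses its comma
    value = re.sub(r"[^\w\s]", "", value)
    # Split into words, drop repeats, sort, and rejoin so word order doesn't matter
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values by fingerprint; keep only keys that merged different spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


# Made-up sample values, just to show the grouping
samples = ["Larceny", "larceny ", "Horse stealing", "Stealing, horse", "Burglary"]
print(cluster(samples))
# {'larceny': ['Larceny', 'larceny '], 'horse stealing': ['Horse stealing', 'Stealing, horse']}
```

Because the key ignores case, punctuation, and word order, it catches formatting differences but not real misspellings (like "larceny" vs. "larcney"); as far as I understand, OpenRefine offers separate nearest-neighbor clustering methods for that.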
