{"id":1849,"date":"2017-10-30T01:51:06","date_gmt":"2017-10-30T08:51:06","guid":{"rendered":"http:\/\/miriamposner.com\/classes\/dh101f17\/?p=1849"},"modified":"2017-10-30T01:51:06","modified_gmt":"2017-10-30T08:51:06","slug":"open-refine-library-book-acquisition-in-osage-iowa-spencer-chau","status":"publish","type":"post","link":"http:\/\/miriamposner.com\/classes\/dh101f17\/2017\/10\/30\/open-refine-library-book-acquisition-in-osage-iowa-spencer-chau\/","title":{"rendered":"Open Refine: \u201cLibrary Book Acquisition in Osage, Iowa\u201d (Spencer Chau)"},"content":{"rendered":"<p>OpenRefine is an extremely helpful tool for analyzing datasets, especially ones that are very large, imperfect, or disorganized.<\/p>\n<p>Our group\u2019s dataset, \u201cLibrary Book Acquisition in Osage, Iowa,\u201d consists of around 40,000 book entries. Each entry is divided into fields such as the title, publisher, ID number, year of publication, and year of acquisition. However, much of the information in the dataset is incomplete, particularly the data on age, race, and language. OpenRefine makes it quick and easy to clean up the data: merging categories that are the same but spelled differently, removing extra whitespace, and editing large batches of cells all at once. The application also makes navigating a large dataset by category much more convenient, since it lets users view only the records that share certain characteristics.<\/p>\n<p>One of our preliminary research questions is \u201cWhy is there an influx of autobiographies and biographies compared to other genres, such as science and psychology books?\u201d With roughly 40,000 entries, the dataset very likely contains typing inconsistencies and spacing errors. 
With OpenRefine, it will be much easier for us to merge differently spelled publisher names, author names, and book titles into fewer, cleaner categories for our analysis. In addition, because we can click through the data and view trends for specific category values only (e.g., a certain year or publisher), we can compare the trend behind this influx much more clearly and easily, and these features may even lead us to new and surprising findings.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenRefine is an extremely helpful tool to better analyze datasets, especially when the dataset is huge in quantity, imperfect or<\/p>\n","protected":false},"author":159,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_eb_attr":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-1849","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts\/1849","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/users\/159"}],"replies":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/comments?post=1849"}],"version-history":[{"count":0,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts\/1849\/revisions"}],"wp:attachment":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/media?parent=1849"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/categories?post=1849"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/tags?post=1849"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}