{"id":1922,"date":"2017-11-01T09:38:33","date_gmt":"2017-11-01T16:38:33","guid":{"rendered":"http:\/\/miriamposner.com\/classes\/dh101f17\/?p=1922"},"modified":"2017-11-01T09:38:33","modified_gmt":"2017-11-01T16:38:33","slug":"trying-out-openrefine","status":"publish","type":"post","link":"http:\/\/miriamposner.com\/classes\/dh101f17\/2017\/11\/01\/trying-out-openrefine\/","title":{"rendered":"Trying Out OpenRefine"},"content":{"rendered":"<p><span style=\"font-weight: 400\">From poking around my dataset(s) before this assignment, I knew that there\u2019s a good amount of data cleaning to be done. My data on 19th century American children\u2019s book publishers is actually made up of multiple datasets, so I focused on the one of publishers and their addresses, with the following fields: name, start (date), end (date), street, city, state, and country.<\/span><\/p>\n<p><span style=\"font-weight: 400\">When I uploaded my data to OpenRefine and tried out the facet tool, I got a message that my data has too many different values for the facets to display. I think the most useful tool for me was the ability to sort my data, in order to find values that stand out. 
For example, sorting the start year in ascending order revealed start dates like 1, 8, or 183&#8211;which I know, from context, are erroneous.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-1923 size-full\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-31-at-6.54.02-PM.png\" alt=\"\" width=\"1672\" height=\"856\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-31-at-6.54.02-PM.png 1672w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-31-at-6.54.02-PM-300x154.png 300w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-31-at-6.54.02-PM-768x393.png 768w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-31-at-6.54.02-PM-1024x524.png 1024w\" sizes=\"auto, (max-width: 1672px) 100vw, 1672px\" \/><\/p>\n<p><span style=\"font-weight: 400\">It\u2019s also helpful for finding blank cells, like entries with no dates listed. One possibility for this data would be a timeline of active book publishers; to do this, I would begin by removing rows with no date data. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Since this dataset is of American publishers, I was confused to see addresses in other countries listed. 
If I wanted to look at the locations and movement of publishers in the United States, I would remove the rows with addresses in other countries.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-1925 size-full\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-31-at-6.56.34-PM.png\" alt=\"\" width=\"1904\" height=\"880\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-31-at-6.56.34-PM.png 1904w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-31-at-6.56.34-PM-300x139.png 300w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-31-at-6.56.34-PM-768x355.png 768w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-31-at-6.56.34-PM-1024x473.png 1024w\" sizes=\"auto, (max-width: 1904px) 100vw, 1904px\" \/><\/p>\n<p><span style=\"font-weight: 400\">Something I would like to be able to do is merge two related datasets. My group\u2019s datasets have columns labelled RTID and HDID, with numerical data in the columns. These numbers don\u2019t have any meaning within their original datasets, but a look through the other datasets reveals that the numbers correspond to information. 
For example, a \u20181\u2019 in the RTID column corresponds to the role of \u2018Publisher\u2019.<\/span><\/p>\n<p>I&#8217;d like to learn more about the features that are specific to data-processing tools like OpenRefine, because I feel like the functions I mentioned above could also be carried out in Excel, which I am more familiar with.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>From poking around my dataset(s) before this assignment, I knew that there\u2019s a good amount of data cleaning to be<\/p>\n","protected":false},"author":108,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_eb_attr":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-1922","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts\/1922","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/users\/108"}],"replies":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/comments?post=1922"}],"version-history":[{"count":0,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts\/1922\/revisions"}],"wp:attachment":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/media?parent=1922"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/categories?post=1922"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/tags?post=1922"}],"curies":[{"name":"wp","href":"https:\/\/api.w
.org\/{rel}","templated":true}]}}