{"id":1824,"date":"2017-10-30T13:53:38","date_gmt":"2017-10-30T20:53:38","guid":{"rendered":"http:\/\/miriamposner.com\/classes\/dh101f17\/?p=1824"},"modified":"2017-10-30T13:53:38","modified_gmt":"2017-10-30T20:53:38","slug":"using-openrefine-to-clean-classicists-dataset","status":"publish","type":"post","link":"http:\/\/miriamposner.com\/classes\/dh101f17\/2017\/10\/30\/using-openrefine-to-clean-classicists-dataset\/","title":{"rendered":"Using OpenRefine to Clean Classicists Dataset"},"content":{"rendered":"<p><span style=\"font-weight: 400\">Our dataset is about American Classicists &amp; Archaeologists. The research questions we have come up with are as follows:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Has the prevalence of female classicists increased over time? <\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Do the regions of granting institutions have any effect on a classicist\u2019s area of study? Has there been a shift in regional activity over time? 
<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Is there a trend in areas of study based on classicists\u2019 affiliated institutions?<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">How did refugee status affect classicists\u2019 fields of study?<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">How did the birthplace and origin of classicists affect their main affiliation and\/or work?<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400\">To address the research questions above, the following columns can be cleaned in OpenRefine so that the data is easier to analyze with other tools: <\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Area of study &amp; Granting Institution &amp; Main Affiliation &amp; Refuge: Use \u2018Text Facet\u2019 to find terms that have the same meaning but are grouped separately, and merge them.<\/span>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">OpenRefine: <\/span>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">On each of the <\/span><b>Research area &amp; Granting Institution &amp; Main Affiliation &amp; Refuge <\/b><span style=\"font-weight: 400\">columns, click the down arrow, then click <\/span><b>Edit cells<\/b><span style=\"font-weight: 400\">, then <\/span><b>Common transforms<\/b><span style=\"font-weight: 400\">. Finally, click <\/span><b>Trim leading and trailing whitespace<\/b><span style=\"font-weight: 400\">.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Click the down arrow next to the <\/span><b>Research area <\/b><span style=\"font-weight: 400\">column heading. 
Then select <\/span><b>Facet<\/b><span style=\"font-weight: 400\">, and then <\/span><b>Text Facet<\/b><span style=\"font-weight: 400\">.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Click <\/span><b>Cluster <\/b><span style=\"font-weight: 400\">to see whether any terms share the same meaning, then edit the text to merge them.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Double-check that no duplicate terms remain.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-1854 aligncenter\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.45.29-PM-252x300.png\" alt=\"\" width=\"252\" height=\"300\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.45.29-PM-252x300.png 252w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.45.29-PM.png 580w\" sizes=\"auto, (max-width: 252px) 100vw, 252px\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-1853 aligncenter\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.45.50-PM-300x176.png\" alt=\"\" width=\"300\" height=\"176\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.45.50-PM-300x176.png 300w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.45.50-PM.png 592w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Birthplace: split into two columns, \u2018Birth City\u2019 and \u2018Birth State\u2019.<\/span>\n<ul>\n<li style=\"font-weight: 
400\"><span style=\"font-weight: 400\">OpenRefine: <\/span>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">On the <\/span><b>Birthplace <\/b><span style=\"font-weight: 400\">column, click the down arrow, then click <\/span><b>Edit cells<\/b><span style=\"font-weight: 400\">, then <\/span><b>Common transforms<\/b><span style=\"font-weight: 400\">. Finally, click <\/span><b>Trim leading and trailing whitespace<\/b><span style=\"font-weight: 400\">.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Click the down arrow next to the Birthplace column heading, then <\/span><b>Edit column<\/b><span style=\"font-weight: 400\">, and finally <\/span><b>Split into several columns<\/b><span style=\"font-weight: 400\">. Enter a comma followed by a space as the separator, since those are the two characters that lie between city and state. Then click <\/span><b>OK<\/b><span style=\"font-weight: 400\">.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Then, on each of the two new columns, click the down arrow, <\/span><b>Edit column<\/b><span style=\"font-weight: 400\"> and then <\/span><b>Rename this column<\/b><span style=\"font-weight: 400\">, naming them \u2018Birth City\u2019 and \u2018Birth State\u2019.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Click the down arrow next to the <\/span><b>Refugee<\/b><span style=\"font-weight: 400\"> column heading. 
Then select <\/span><b>Facet<\/b><span style=\"font-weight: 400\">, and then <\/span><b>Text Facet<\/b><span style=\"font-weight: 400\">.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Click <\/span><b>Cluster <\/b><span style=\"font-weight: 400\">to see whether any terms share the same meaning, then edit the text to merge them.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Double-check that no duplicate terms remain.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1855 alignnone\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.48.53-PM-52x300.png\" alt=\"\" width=\"75\" height=\"433\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.48.53-PM-52x300.png 52w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.48.53-PM.png 174w\" sizes=\"auto, (max-width: 75px) 100vw, 75px\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1856 alignnone\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.49.30-PM-300x155.png\" alt=\"\" width=\"517\" height=\"267\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.49.30-PM-300x155.png 300w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.49.30-PM-768x397.png 768w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.49.30-PM-1024x529.png 1024w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/Screen-Shot-2017-10-30-at-1.49.30-PM.png 1266w\" sizes=\"auto, (max-width: 
517px) 100vw, 517px\" \/><\/p>\n<p><span style=\"font-weight: 400\">I am also interested in how to \u2018sort\u2019 the data and in when we would use the \u2018transpose\u2019 function. However, the most significant problem with our dataset is that so much data is missing. We are going to talk with the subject-matter specialist as soon as possible.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Our dataset is about American Classicists &amp; Archaeologists. The research questions we have come up with are as follows: Has<\/p>\n","protected":false},"author":164,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_eb_attr":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-1824","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts\/1824","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/users\/164"}],"replies":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/comments?post=1824"}],"version-history":[{"count":0,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts\/1824\/revisions"}],"wp:attachment":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/media?parent=1824"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/categories?post=1824"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/tags?post=1824"}],"curies":[{
"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}