{"id":1958,"date":"2017-10-31T23:07:22","date_gmt":"2017-11-01T06:07:22","guid":{"rendered":"http:\/\/miriamposner.com\/classes\/dh101f17\/?p=1958"},"modified":"2017-10-31T23:07:22","modified_gmt":"2017-11-01T06:07:22","slug":"blog-4-openrefine","status":"publish","type":"post","link":"http:\/\/miriamposner.com\/classes\/dh101f17\/2017\/10\/31\/blog-4-openrefine\/","title":{"rendered":"Blog #4: OpenRefine"},"content":{"rendered":"<p><span style=\"font-weight: 400\">My group\u2019s dataset is about classicists, their basic background information (such as date of birth and birth places), and information about their studies (such as main affiliated university and highest level of education obtained). <\/span><\/p>\n<p><span style=\"font-weight: 400\">One stark problem that OpenRefine made visible was that the data were most likely collected without a controlled vocabulary. As a result, many of the same values are expressed in different ways. For instance, \u201cBoston University\u201d is also expressed as \u201cBoston U\u201d or \u201cBoston U.\u201d (with the period). Consistent values for the main affiliated institutions would be useful in answering our research questions because we would then be able to measure accurately where the classicists were concentrated. This would inform us about any geographical segmentation of classicists. To solve this issue, OpenRefine\u2019s facet tool can be useful. Within that tool, I can combine similar wordings that mean the same thing into one value. However, using this tool alone would take an excruciatingly long time. Alternatively, I can also use the cluster and trim whitespace functions to expedite the process. What I\u2019m struggling with is how to make sure I don\u2019t miss any of the numerous variations of \u201cBoston University,\u201d even after the mentioned OpenRefine tools have been employed. 
In other words, I would be curious to know whether there is a way of verifying that the facet values are all distinct, without having to scroll through and eyeball the list of facets myself.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>My group\u2019s dataset is about classicists, their basic background information (such as date of birth and birth places), and information<\/p>\n","protected":false},"author":126,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_eb_attr":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-1958","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts\/1958","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/users\/126"}],"replies":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/comments?post=1958"}],"version-history":[{"count":0,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts\/1958\/revisions"}],"wp:attachment":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/media?parent=1958"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/categories?post=1958"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/tags?post=1958"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}