{"id":1987,"date":"2017-11-01T10:08:15","date_gmt":"2017-11-01T17:08:15","guid":{"rendered":"http:\/\/miriamposner.com\/classes\/dh101f17\/?p=1987"},"modified":"2017-11-01T10:08:15","modified_gmt":"2017-11-01T17:08:15","slug":"blog-post-4-exploring-and-using-openrefine","status":"publish","type":"post","link":"http:\/\/miriamposner.com\/classes\/dh101f17\/2017\/11\/01\/blog-post-4-exploring-and-using-openrefine\/","title":{"rendered":"Blog Post 4: Exploring and Using OpenRefine"},"content":{"rendered":"<p><span style=\"font-weight: 400\">OpenRefine will be useful for cleaning up our dataset for the Nixon White House Recordings. I personally do not find it very intuitive; however, the tutorial on how to use it was very helpful. Because our dataset is extremely large (over 20,000 entries), it is bound to contain some errors. Our data is split into many different columns:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Conversation title <\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Tape number <\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Conversation number <\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Identifier <\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Start date time <\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">End date time <\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Start date <\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">End date <\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Start time <\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">End time <\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">And many 
more&#8230;<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Our dataset feels too large, and its sheer size makes it hard to find the trends we are looking for. One idea we had for reducing the number of calls we have to deal with is to ignore all the calls to the White House operator and include only the calls in which Nixon was actually speaking. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Though our dataset is already arranged chronologically, it will be useful to use the isolating tool to separate out data that is less pertinent to our research question. The trends in phone calls are very interesting to us. The length of each conversation would also be an interesting thing to filter on, getting rid of all the calls that last one minute or less. Using the split multi-valued cells function in OpenRefine, we could isolate the calls that are with Richard Nixon only, thus limiting our dataset and allowing for clearer results. <\/span><\/p>\n<p><span style=\"font-weight: 400\">OpenRefine will be useful for making columns consistently lowercase and for fixing the few errors I expect there to be, given the size of the data. The one thing that I know my group is struggling with is how to add another column, calculated from the start time and the end time, that gives the length of the conversation. If anyone has figured this out in OpenRefine or another data-editing tool, I would love to hear from you!<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenRefine will be useful for cleaning up our dataset for the Nixon White House Recordings. 
I personally do not find<\/p>\n","protected":false},"author":160,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_eb_attr":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-1987","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts\/1987","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/users\/160"}],"replies":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/comments?post=1987"}],"version-history":[{"count":0,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts\/1987\/revisions"}],"wp:attachment":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/media?parent=1987"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/categories?post=1987"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/tags?post=1987"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}