{"id":1938,"date":"2017-10-31T22:20:24","date_gmt":"2017-11-01T05:20:24","guid":{"rendered":"http:\/\/miriamposner.com\/classes\/dh101f17\/?p=1938"},"modified":"2017-10-31T22:20:24","modified_gmt":"2017-11-01T05:20:24","slug":"openrefine","status":"publish","type":"post","link":"http:\/\/miriamposner.com\/classes\/dh101f17\/2017\/10\/31\/openrefine\/","title":{"rendered":"OpenRefine"},"content":{"rendered":"<p>My group\u2019s dataset is on Nixon and there is a lot of complex data to go through. A big part of our data analysis has been looking at the dates of his phone calls and the people he spoke to. Since our data has over a million rows, it has been challenging to comprehend all that is there, especially since it is not fully chronological. The rows are ordered by release date. Within a specific release the tapes are chronological, but after running from 1971 to 1973 the data jumps back to 1971 at the start of the next release, which has made it challenging to divide the data. The OpenRefine Cluster and Merge feature would be exceptionally useful for matching data by date. Then we could see all of the phone calls for a single date, so we wouldn&#8217;t have to scroll through every data point to find, for example, the phone calls from April 1972. I would also love to be able to easily cut rows of data at a certain point, such as the date of a phone call. We want to home in on Watergate alone to make the data more manageable, but since we are running into chronology issues, we first need to reorganize the data so that it is consistent with these dates. Our goal is to isolate the data within the dates of Watergate, see how many observations that gives us, and then cut it down further if need be. The best thing we could ask for at this point is to be able to easily delete data before a certain date and after a certain date without having to navigate through more than a million rows.
This way, our data becomes more manageable and our research becomes more focused on a specific time and event.<\/p>\n<p>It would also be useful to merge the data by the person Nixon spoke to on the phone, as we are looking to create a network map of all of Nixon\u2019s correspondents. The Split Multi-Valued Cells feature would be good because on the sheet there is a separation between Nixon and those he speaks to. This way we can cluster the cells by whom Nixon spoke with in order to detect trends.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>My group\u2019s dataset is on Nixon and there is a lot of complex data to go through. A big part<\/p>\n","protected":false},"author":114,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_eb_attr":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-1938","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts\/1938","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/users\/114"}],"replies":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/comments?post=1938"}],"version-history":[{"count":0,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/posts\/1938\/revisions"}],"wp:attachment":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/media?parent=1938"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/categories?post=1938"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/tags?post=1938"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}