{"id":1764,"date":"2017-10-26T16:20:44","date_gmt":"2017-10-26T23:20:44","guid":{"rendered":"http:\/\/miriamposner.com\/classes\/dh101f17\/?page_id=1764"},"modified":"2017-10-26T16:20:44","modified_gmt":"2017-10-26T23:20:44","slug":"get-started-with-openrefine","status":"publish","type":"page","link":"http:\/\/miriamposner.com\/classes\/dh101f17\/tutorials-guides\/data-manipulation\/get-started-with-openrefine\/","title":{"rendered":"Get started with OpenRefine"},"content":{"rendered":"<p><span style=\"font-size: 1.125rem;\">DHers spend a TON of time cleaning and manipulating data. Luckily, there&#8217;s a tool that makes all of this easier. It&#8217;s called OpenRefine, and it&#8217;s free!<\/span><\/p>\n<div id=\"wrapper\">\n<div id=\"clarify-article-content\">\n<div class=\"clarify-article-description\">\n<p>This tutorial will walk you through some of the most common data-manipulation tasks you&#8217;ll need to perform. When you&#8217;re done, you should know how to:<\/p>\n<ul>\n<li>clean up spelling inconsistencies<\/li>\n<li>remove leading and trailing whitespace<\/li>\n<li>split cells into multiple columns<\/li>\n<\/ul>\n<p>If you&#8217;re using a computer that already has OpenRefine installed on it, you can skip Step One.<\/p>\n<p><strong>Before you get started<\/strong>, download <a href=\"https:\/\/www.dropbox.com\/s\/w8gz5oifkvh376q\/NJShipwrecks.csv?dl=0\">this file<\/a> somewhere onto your computer. It&#8217;s a sample data file called NJShipwrecks.csv.<\/p>\n<\/div>\n<div class=\"clarify-steps-container\">\n<div id=\"clarify-step-1\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">1. Install OpenRefine<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>Head to www.openrefine.org\/download and download OpenRefine 2.7 as you would any software. 
It&#8217;s available for both Windows and Mac.<\/p>\n<p>NOTE: If you&#8217;re on a Mac and, when you try to open OpenRefine, you get a message saying that you can&#8217;t open software from an unidentified developer, do the following: Go to <strong>System Preferences<\/strong>, then <strong>Security and Privacy<\/strong>. On the <strong>General<\/strong> tab, click the lock to make changes, and then click on <strong>Open Anyway<\/strong>. You should now be able to open the software.<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/install-openrefine.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1765\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/install-openrefine.png\" alt=\"\" width=\"806\" height=\"484\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/install-openrefine.png 806w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/install-openrefine-300x180.png 300w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/install-openrefine-768x461.png 768w\" sizes=\"auto, (max-width: 806px) 100vw, 806px\" \/><\/a><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/install-openrefine.png\" alt=\"\" width=\"806\" height=\"484\" \/><span style=\"font-size: 1.125rem;\">\u00a0<\/span><\/div>\n<\/div>\n<\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-2\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">2. Open OpenRefine<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>Double-click on the OpenRefine icon. It should open in your web browser. 
Occasionally, for whatever reason, OpenRefine doesn&#8217;t launch when you double-click it. If this happens to you, enter <strong>localhost:3333<\/strong> in your browser&#8217;s address bar and press return.<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/open-openrefine.png\" alt=\"\" width=\"487\" height=\"257\" \/><a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/open-openrefine.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1766\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/open-openrefine.png\" alt=\"\" width=\"487\" height=\"257\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/open-openrefine.png 487w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/open-openrefine-300x158.png 300w\" sizes=\"auto, (max-width: 487px) 100vw, 487px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\"><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-3\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">3. Open your data file<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>Click on <strong>Create Project<\/strong> and then <strong>Choose Files<\/strong>. 
Navigate to the NJShipwrecks.csv file and then click <strong>Next<\/strong>.<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/open-your-data-file.png\" alt=\"\" width=\"628\" height=\"389\" \/><a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/open-your-data-file.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1767\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/open-your-data-file.png\" alt=\"\" width=\"628\" height=\"389\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/open-your-data-file.png 628w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/open-your-data-file-300x186.png 300w\" sizes=\"auto, (max-width: 628px) 100vw, 628px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\"><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-4\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">4. What the heck is this?<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>This is just a preview of the way your data will look when you&#8217;re working with it in OpenRefine. 
You shouldn&#8217;t have to make any changes; just click on <strong>Create Project<\/strong>.<\/p>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\">\u00a0<a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/what-the-heck-is-this-.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1768\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/what-the-heck-is-this-.png\" alt=\"\" width=\"641\" height=\"308\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/what-the-heck-is-this-.png 641w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/what-the-heck-is-this--300x144.png 300w\" sizes=\"auto, (max-width: 641px) 100vw, 641px\" \/><\/a><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-5\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">5. What the heck is this (part 2)?<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>This is the main interface you&#8217;ll use to work with your data. It sort of looks like Excel, but notice it shows you only 10 records at a time. That&#8217;s because you&#8217;re not supposed to be working with your data record by record; you&#8217;ll find ways to group it into batches and then work with it. 
We&#8217;ll try that next.<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/what-the-heck-is-this--part-2--.png\" alt=\"\" width=\"671\" height=\"505\" \/><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\"><a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/what-the-heck-is-this-part-2-.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1769\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/what-the-heck-is-this-part-2-.png\" alt=\"\" width=\"671\" height=\"505\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/what-the-heck-is-this-part-2-.png 671w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/what-the-heck-is-this-part-2--300x226.png 300w\" sizes=\"auto, (max-width: 671px) 100vw, 671px\" \/><\/a><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-6\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">6. Create a facet<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>In OpenRefine, a <strong>facet <\/strong>is a way to isolate certain records that share features. It&#8217;s easier to see what I mean when you try it yourself. Click on the down-arrow right next to the <strong>VESSEL TYPE<\/strong> column heading. 
Then select <strong>Facet<\/strong>, and then <strong>Text Facet<\/strong>.<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/create-a-facet.png\" alt=\"\" width=\"601\" height=\"445\" \/><a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/create-a-facet.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1770\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/create-a-facet.png\" alt=\"\" width=\"601\" height=\"445\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/create-a-facet.png 601w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/create-a-facet-300x222.png 300w\" sizes=\"auto, (max-width: 601px) 100vw, 601px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\"><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-7\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">7. Understanding facets<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>Look at the VESSEL TYPE list that appears on the lefthand side of the OpenRefine window. Can you tell what&#8217;s going on there? OpenRefine&#8217;s facet function has grouped together every term that appears in the VESSEL TYPE column, along with how many times it appears.<\/p>\n<p>You can sort the list of terms alphabetically by name, or by count, according to how many times those terms appear on the list. If you click on one of the terms, only those rows that contain that term will be selected. 
This allows you to work on your data one chunk at a time.<img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" style=\"font-size: 1.125rem;\" src=\".\/images\/docs\/get-started-with-openrefine\/understanding-facets.png\" alt=\"\" width=\"343\" height=\"406\" \/><\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/understanding-facets.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1772\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/understanding-facets.png\" alt=\"\" width=\"343\" height=\"406\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/understanding-facets.png 343w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/understanding-facets-253x300.png 253w\" sizes=\"auto, (max-width: 343px) 100vw, 343px\" \/><\/a><a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/clean-up-some-data.png\"><br \/>\n<\/a><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\"><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-8\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">8. Clean up some data<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>Look closely at that list of terms. You&#8217;ll see that it includes two terms that are probably meant to be the same: <strong>Bark steamer<\/strong> and <strong>Bark Steamer<\/strong>. Even though a human can tell they&#8217;re meant to refer to the same thing, a computer doesn&#8217;t know that. 
So it&#8217;s important to clean up this data to create accurate visualizations and analyses.<\/p>\n<p>Hover over the <strong>Bark Steamer<\/strong> term in the facet list, so that you can see the <strong>Edit <\/strong>option. Press <strong>Edit<\/strong> and, in the box that appears, change <strong>Bark Steamer<\/strong> to <strong>Bark steamer<\/strong> and press <strong>Apply<\/strong>. Now the two terms will merge into one.<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/clean-up-some-data.png\" alt=\"\" width=\"505\" height=\"254\" \/><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\">\u00a0<a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/clean-up-some-data-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1773\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/clean-up-some-data-1.png\" alt=\"\" width=\"505\" height=\"254\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/clean-up-some-data-1.png 505w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/clean-up-some-data-1-300x151.png 300w\" sizes=\"auto, (max-width: 505px) 100vw, 505px\" \/><\/a><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-9\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">9. Another way to clean up some data<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>Look again at the <strong>Facet<\/strong> box. You&#8217;ll see a button marked <strong>Cluster. <\/strong>Click it.<\/p>\n<p>The resulting box shows you terms that OpenRefine thinks should be merged together. 
Check the boxes of the terms you think should be merged and then click <strong>Merge Selected and Re-Cluster<\/strong>.<\/p>\n<p>Now experiment with some of the other items on the <strong>Method<\/strong> dropdown menu. What happens when you try different methods? Each uses a different algorithm to try to match terms.<\/p>\n<p>When you&#8217;re finished experimenting, click <strong>Close<\/strong>. You&#8217;ll notice you have fewer terms in your facet list.<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/another-way-to-clean-up-some-data.png\" alt=\"\" width=\"816\" height=\"652\" \/><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\"><a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/another-way-to-clean-up-some-data.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1774\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/another-way-to-clean-up-some-data.png\" alt=\"\" width=\"816\" height=\"652\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/another-way-to-clean-up-some-data.png 816w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/another-way-to-clean-up-some-data-300x240.png 300w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/another-way-to-clean-up-some-data-768x614.png 768w\" sizes=\"auto, (max-width: 816px) 100vw, 816px\" \/><\/a><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-10\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">10. 
Change the case of an entire column<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>A lot of the problems with the data in the <strong>VESSEL TYPE<\/strong> were the result of variant cases (e.g., <strong>Pilot schooner<\/strong> versus <strong>Pilot Schooner<\/strong>). One way to eliminate these problems would be to make all of the terms lowercase. Let&#8217;s do that now.<\/p>\n<p>Click on the down arrow next to <strong>VESSEL TYPE<\/strong>. From the dropdown menu, click <strong>Edit cells<\/strong>, and then <strong>Common transforms<\/strong>. Finally, select <strong>To lowercase<\/strong>. Voila! All the vessel types are now lowercase.<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/change-the-case-of-an-entire-column.png\" alt=\"\" width=\"638\" height=\"539\" \/><a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/change-the-case-of-an-entire-column.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1775\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/change-the-case-of-an-entire-column.png\" alt=\"\" width=\"638\" height=\"539\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/change-the-case-of-an-entire-column.png 638w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/change-the-case-of-an-entire-column-300x253.png 300w\" sizes=\"auto, (max-width: 638px) 100vw, 638px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\"><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-11\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">11. 
Get rid of extra whitespace<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>One common problem with data is extra spaces before and after the values. Those are easy to get rid of with OpenRefine. On the <strong>Year Built<\/strong> column, click the down arrow, then click <strong>Edit cells<\/strong>, then <strong>Common transforms<\/strong>. Finally, click <strong>Trim leading and trailing whitespace<\/strong>. Much better!<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/get-rid-of-extra-whitespace.png\" alt=\"\" width=\"574\" height=\"536\" \/><a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/get-rid-of-extra-whitespace.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1776\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/get-rid-of-extra-whitespace.png\" alt=\"\" width=\"574\" height=\"536\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/get-rid-of-extra-whitespace.png 574w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/get-rid-of-extra-whitespace-300x280.png 300w\" sizes=\"auto, (max-width: 574px) 100vw, 574px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\"><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-12\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">12. Split multi-valued columns<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>Several of our columns contain location, formatted as City, State. But let&#8217;s say we want states to appear in their own column. 
That&#8217;s easy to do with OpenRefine.<\/p>\n<p>Scroll to the <strong>Departure Point<\/strong> column. Click the down arrow, then <strong>Edit column<\/strong>, and finally <strong>Split into several columns<\/strong>. The popup window asks which separator currently separates the values. Enter a comma and a space, since those are the two characters that lie between city and state. Then click <strong>OK<\/strong>.<\/p>\n<p>You now have two columns! You can rename them by clicking on the down arrow, then <strong>Edit column<\/strong> and then <strong>Rename this column<\/strong>.<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/split-multi-valued-columns.png\" alt=\"\" width=\"584\" height=\"372\" \/><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\">\u00a0<a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/split-multi-valued-columns.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1777\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/split-multi-valued-columns.png\" alt=\"\" width=\"584\" height=\"372\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/split-multi-valued-columns.png 584w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/split-multi-valued-columns-300x191.png 300w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/split-multi-valued-columns-360x230.png 360w\" sizes=\"auto, (max-width: 584px) 100vw, 584px\" \/><\/a><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-13\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">13. 
Undo an action<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>If you make a mistake in OpenRefine, no worries! It&#8217;s easy to undo. Just click on the <strong>Undo\/Redo<\/strong> tab on the left-hand side of the screen. Then click on the next-to-last step in the list. Your last action will be reversed. If you change your mind and want to redo it, just click the last step in the list.<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/undo-an-action.png\" alt=\"\" width=\"361\" height=\"486\" \/><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\">\u00a0<a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/undo-an-action.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1778\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/undo-an-action.png\" alt=\"\" width=\"361\" height=\"486\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/undo-an-action.png 361w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/undo-an-action-223x300.png 223w\" sizes=\"auto, (max-width: 361px) 100vw, 361px\" \/><\/a><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-14\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">14. Add characters to selected data<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>Let&#8217;s say we want to add the prefix <strong>S.S.<\/strong> to the name of any boat that has the vessel type <strong>schooner<\/strong>. 
We&#8217;ll do that by first using our vessel type facet to select all the rows with the term <strong>schooner<\/strong> in the <strong>VESSEL TYPE<\/strong> column.<\/p>\n<p>Once you have all of the schooners selected, head to the <strong>SHIP&#8217;S NAME<\/strong> column. Click on the down arrow, then select <strong>Edit cells<\/strong>, and then <strong>Transform&#8230;<\/strong><\/p>\n<p>The popup box that follows wants you to use a language called the General Refine Expression Language (GREL) to transform your data. You don&#8217;t have to actually know GREL; you just have to be able to look up the pattern for the expression you want to write.<\/p>\n<p>When you want to add a prefix to some data in OpenRefine, the pattern looks like this:<\/p>\n<p>&quot;prefix&quot;+value<\/p>\n<p>So in the blank text box, type the following, using straight quotation marks and leaving a space after the final period:<\/p>\n<p>&quot;S.S. &quot;+value<\/p>\n<p>You&#8217;ll see a preview of how your data will look in the lower right-hand column. When you&#8217;re satisfied, press <strong>OK<\/strong>.<\/p>\n<p>Now the name of every schooner is prefixed with &#8220;S.S.&#8221;!<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/add-characters-to-selected-data.png\" alt=\"\" width=\"719\" height=\"517\" \/><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\">\u00a0<a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/add-characters-to-selected-data.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1779\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/add-characters-to-selected-data.png\" alt=\"\" width=\"719\" height=\"517\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/add-characters-to-selected-data.png 719w, 
http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/add-characters-to-selected-data-300x216.png 300w\" sizes=\"auto, (max-width: 719px) 100vw, 719px\" \/><\/a><\/div>\n<div class=\"rule\"><img decoding=\"async\" src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-15\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">15. Export your data<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>Once you&#8217;ve cleaned up your data, you&#8217;ll want to get it out of OpenRefine. To do that, click on the <strong>Export<\/strong> button in the upper right-hand corner. Then click on <strong>Comma-separated value<\/strong>. Your cleaned-up spreadsheet should begin downloading. You can download your data as many times as you want, at any stage of the project.<\/p>\n<p>To close OpenRefine, just close the window or tab in your browser.<\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/export-your-data.png\" alt=\"\" width=\"503\" height=\"489\" \/><a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/export-your-data.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1780\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/export-your-data.png\" alt=\"\" width=\"503\" height=\"489\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/export-your-data.png 503w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/export-your-data-300x292.png 300w\" sizes=\"auto, (max-width: 503px) 100vw, 503px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\"><\/div>\n<div class=\"rule\"><img decoding=\"async\" 
src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div id=\"clarify-step-16\" class=\"clarify-step-container\">\n<h2 class=\"clarify-step-title\">16. That&#8217;s just the beginning!<\/h2>\n<div class=\"clarify-step-instructions\">\n<p>These are some of the most common tasks you&#8217;ll want to perform in OpenRefine, but OpenRefine can also handle tasks of much greater complexity. To get a sense of some of these tasks, see the resources on the <strong>OpenRefine Resources<\/strong> page: <a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/tutorials-guides\/data-manipulation\/openrefine-resources\/\">http:\/\/miriamposner.com\/classes\/dh101f17\/tutorials-guides\/data-manipulation\/openrefine-resources\/<\/a><\/p>\n<\/div>\n<div class=\"clarify-step-image-wrapper\">\n<div class=\"clarify-step-image-container\"><img loading=\"lazy\" decoding=\"async\" class=\"clarify-step-image\" src=\".\/images\/docs\/get-started-with-openrefine\/that-s-just-the-beginning-.png\" alt=\"\" width=\"771\" height=\"579\" \/><\/div>\n<\/div>\n<\/div>\n<div class=\"clarify-clear\">\u00a0<a href=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/that-s-just-the-beginning-.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1781\" src=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/that-s-just-the-beginning-.png\" alt=\"\" width=\"771\" height=\"579\" srcset=\"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/that-s-just-the-beginning-.png 771w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/that-s-just-the-beginning--300x225.png 300w, http:\/\/miriamposner.com\/classes\/dh101f17\/wp-content\/uploads\/sites\/7\/2017\/10\/that-s-just-the-beginning--768x577.png 768w\" sizes=\"auto, (max-width: 771px) 100vw, 771px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<div class=\"rule\"><img decoding=\"async\" 
src=\"images\/ui\/rule.png\" alt=\"\" \/><\/div>\n<div class=\"footer\"><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>DHers spend a TON of time cleaning and manipulating data. Luckily, there&#8217;s a tool that makes all of this easier.<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":139,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_eb_attr":"","footnotes":""},"class_list":["post-1764","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/pages\/1764","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/comments?post=1764"}],"version-history":[{"count":0,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/pages\/1764\/revisions"}],"up":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/pages\/139"}],"wp:attachment":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/media?parent=1764"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}