Let’s say you have a list of lat/longs for cities. You also have a much larger spreadsheet that contains the city names, but no latitudes or longitudes. How can you add the lat/longs to the big spreadsheet, without using “find and replace” hundreds of times?

You’ll use our beloved OpenRefine! This time, we’ll use the “Add a column based on this column” function, entering a special formula that instructs OpenRefine to grab data from another spreadsheet.

Open both your spreadsheets in OpenRefine

media_1446703310466.png

I have two spreadsheets. One is called cushman_metadata.csv and is the original dataset, which contains 14,425 records. The second spreadsheet, called citylist.csv, contains three columns: City, Latitude, and Longitude. Because citylist.csv only contains each individual city name one time, it has only 928 rows.

Select the column in your original, big dataset that contains the city names

media_1446703495764.png

In this case, I geocoded my data to the City level. So I’m going to click on the little down-arrow to the left of the name of the City column.

Select “Add Column Based on This Column”

media_1446703610803.png

From that same column’s menu, select Add Column Based on This Column.

Add this formula to the text box in the window that pops up

media_1446704350649.png

Paste this formula into the Expression text box and edit it so that it reflects the names of your other spreadsheet and the relevant columns:

if (value!='null',cell.cross("citylist csv", "City").cells["Latitude"].value[0],'')

(One way to restate this formula in English would be to say, “As long as there’s something in the cells in this column, look in this other spreadsheet and use the column in that spreadsheet called ‘City’ as an index. Grab the corresponding data for that cell from the ‘Latitude’ column, or if you run into a problem, don’t put anything in the cell.”)

Here’s what that looks like

media_1446704546093.png

Once you’ve got the formula right, latitude data will show up in the new column. I’ll call that new column Latitude, since it contains my latitude info. When you’re happy, press OK. 

Hey, it worked!

media_1446704600622.png

Now you have a new column called Latitude that contains information pulled from your other spreadsheet.

Repeat that step for Longitude data.

media_1446704752834.png

Again, working from the City column in your big, original spreadsheet (mine is called cushman_metadata.csv), paste the edited formula in the Expression text box. This time, though, change the word Latitude to Longitude.

You did it!

media_1446704868250.png

Your data is fully geocoded.