{"id":3372,"date":"2017-11-20T23:11:51","date_gmt":"2017-11-21T07:11:51","guid":{"rendered":"http:\/\/miriamposner.com\/classes\/dh101f17\/?page_id=3372"},"modified":"2017-11-20T23:24:20","modified_gmt":"2017-11-21T07:24:20","slug":"derive-gender-from-a-column-of-first-names","status":"publish","type":"page","link":"http:\/\/miriamposner.com\/classes\/dh101f17\/tutorials-guides\/data-manipulation\/derive-gender-from-a-column-of-first-names\/","title":{"rendered":"Derive gender from a column of first names"},"content":{"rendered":"<p><em>This tutorial is based on\u00a0<a href=\"http:\/\/blog.silk.co\/post\/127234807482\/from-ombd-to-gender-data-on-film-directors-how-to\" rel=\"nofollow\">&#8220;From OMBD to Gender Data on Film Directors.&#8221;<\/a><\/em><\/p>\n<p>What can you do if you want to perform a gender-based analysis of your dataset, but &#8220;gender&#8221; isn&#8217;t a category in your data? You can use computational methods to perform an educated guess, based on the first name of the person.<\/p>\n<p>Is it flawless? No way. First names can often be ambiguous, and a woman could easily have a &#8220;man&#8217;s&#8221; name, or vice versa. But often a name is all we have, and sometimes the benefits of performing a gender-based analysis outweigh the problems of computationally deriving gender.<\/p>\n<p>In this tutorial, we&#8217;ll use a tool called genderize.io. Genderize takes advantage of a database of thousands of names and genders to give you a probable gender for a name. It also gives you a probability for each guess at gender. 
You can read more about it\u00a0<a href=\"https:\/\/genderize.io\/\" rel=\"nofollow\">here<\/a>.<\/p>\n<p>An important caveat: Genderize will only give you 1,000 guesses at gender per day, so you may have to divide up your names among team members, or use Genderize in installments.<\/p>\n<h2><a id=\"user-content-make-sure-you-have-a-column-that-contains-only-first-names\" class=\"anchor\" href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/derive-gender-from-a-column-of-first-names.md#make-sure-you-have-a-column-that-contains-only-first-names\" aria-hidden=\"true\"><\/a>Make sure you have a column that contains only first names.<\/h2>\n<p>If your column contains first and last names, you&#8217;ll have to use OpenRefine&#8217;s &#8220;split cells&#8221; function to isolate first names in their own column.<\/p>\n<p><a href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/images\/derive-gender-from-a-column-of-first-names\/make-sure-you-have-a-column-that-contains-only-first-names.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/github.com\/miriamposner\/derive_gender\/raw\/master\/images\/derive-gender-from-a-column-of-first-names\/make-sure-you-have-a-column-that-contains-only-first-names.png\" alt=\"\" \/><\/a><\/p>\n<h2><a id=\"user-content-load-your-spreadsheet-into-google-drive\" class=\"anchor\" href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/derive-gender-from-a-column-of-first-names.md#load-your-spreadsheet-into-google-drive\" aria-hidden=\"true\"><\/a>Load your spreadsheet into Google Drive<\/h2>\n<p>Open your CSV in Google Sheets.<\/p>\n<p><a href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/images\/derive-gender-from-a-column-of-first-names\/load-your-spreadsheet-into-google-drive.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" 
src=\"https:\/\/github.com\/miriamposner\/derive_gender\/raw\/master\/images\/derive-gender-from-a-column-of-first-names\/load-your-spreadsheet-into-google-drive.png\" alt=\"\" \/><\/a><\/p>\n<h2><a id=\"user-content-insert-five-blank-columns-to-the-right-of-the-column-of-first-names\" class=\"anchor\" href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/derive-gender-from-a-column-of-first-names.md#insert-five-blank-columns-to-the-right-of-the-column-of-first-names\" aria-hidden=\"true\"><\/a>Insert five blank columns to the right of the column of first names<\/h2>\n<p>You&#8217;ll need those columns to store the information from Genderize.<\/p>\n<p><a href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/images\/derive-gender-from-a-column-of-first-names\/insert-five-blank-columns-to-the-right-of-the-column-of-first-names.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/github.com\/miriamposner\/derive_gender\/raw\/master\/images\/derive-gender-from-a-column-of-first-names\/insert-five-blank-columns-to-the-right-of-the-column-of-first-names.png\" alt=\"\" \/><\/a><\/p>\n<h2><a id=\"user-content-enter-the-formula-to-query-genderize\" class=\"anchor\" href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/derive-gender-from-a-column-of-first-names.md#enter-the-formula-to-query-genderize\" aria-hidden=\"true\"><\/a>Enter the formula to query Genderize<\/h2>\n<p>In the column to the right of your column of first names, enter the following formula:<\/p>\n<pre><code> =\"https:\/\/api.genderize.io\/?name=\"&amp;lower(A2)\r\n<\/code><\/pre>\n<p>Except instead of\u00a0<strong>A2<\/strong>, enter the letter and number that correspond to the cell containing the first name in your first-name column.<\/p>\n<p><a href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/images\/derive-gender-from-a-column-of-first-names\/enter-the-formula-to-query-genderize.png\" 
target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/github.com\/miriamposner\/derive_gender\/raw\/master\/images\/derive-gender-from-a-column-of-first-names\/enter-the-formula-to-query-genderize.png\" alt=\"\" \/><\/a><\/p>\n<h2><a id=\"user-content-copy-that-formula-into-every-cell-in-that-column\" class=\"anchor\" href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/derive-gender-from-a-column-of-first-names.md#copy-that-formula-into-every-cell-in-that-column\" aria-hidden=\"true\"><\/a>Copy that formula into every cell in that column<\/h2>\n<p>You can do that by grabbing the tiny blue square at the bottom right of the cell and dragging it all the way to the bottom of the column. Google Sheets will automatically modify the cell reference (like A2) so that it corresponds to the cell in the appropriate row.<\/p>\n<p><a href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/images\/derive-gender-from-a-column-of-first-names\/copy-that-formula-into-every-cell-in-that-column.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/github.com\/miriamposner\/derive_gender\/raw\/master\/images\/derive-gender-from-a-column-of-first-names\/copy-that-formula-into-every-cell-in-that-column.png\" alt=\"\" \/><\/a><\/p>\n<h2><a id=\"user-content-enter-the-formula-to-send-your-query-to-genderize\" class=\"anchor\" href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/derive-gender-from-a-column-of-first-names.md#enter-the-formula-to-send-your-query-to-genderize\" aria-hidden=\"true\"><\/a>Enter the formula to send your query to Genderize<\/h2>\n<p>That formula looks like this:<\/p>\n<pre><code> =IMPORTDATA(B2)\r\n<\/code><\/pre>\n<p>except instead of B2, reference the cell in your own spreadsheet that includes the formula you added in the last step.<\/p>\n<p>Now drag that formula all the way down to the end of the column, just the way you did in the previous step.<\/p>\n<p>As you drag, 
the contents of the cell will read &#8220;Loading&#8230;&#8221; indicating that Genderize is querying its database.<\/p>\n<p><a href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/images\/derive-gender-from-a-column-of-first-names\/enter-the-formula-to-send-your-query-to-genderize.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/github.com\/miriamposner\/derive_gender\/raw\/master\/images\/derive-gender-from-a-column-of-first-names\/enter-the-formula-to-send-your-query-to-genderize.png\" alt=\"\" \/><\/a><\/p>\n<h2><a id=\"user-content-you-have-gender\" class=\"anchor\" href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/derive-gender-from-a-column-of-first-names.md#you-have-gender\" aria-hidden=\"true\"><\/a>You have gender!<\/h2>\n<p>In the blank columns you added earlier, Genderize will fill in the following information: gender, the degree of certainty of that gender (from 0 to 1), and the number of data entries it examined to arrive at the response.<\/p>\n<p>You may not need the probability and count information, but it&#8217;s good to know.<\/p>\n<p><a href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/images\/derive-gender-from-a-column-of-first-names\/you-have-gender-.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/github.com\/miriamposner\/derive_gender\/raw\/master\/images\/derive-gender-from-a-column-of-first-names\/you-have-gender-.png\" alt=\"\" \/><\/a><\/p>\n<h2><a id=\"user-content-copy-the-gender-column-and-paste-it-as-a-value-1\" class=\"anchor\" href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/derive-gender-from-a-column-of-first-names.md#copy-the-gender-column-and-paste-it-as-a-value-1\" aria-hidden=\"true\"><\/a>Copy the gender column and paste it as a value (1)<\/h2>\n<p>You&#8217;ll probably want to modify the cells that begin with\u00a0<strong>gender:&#8221;<\/strong>\u00a0so that they simply 
read\u00a0<strong>male<\/strong>,\u00a0<strong>female<\/strong>, and\u00a0<strong>null<\/strong>. But right now, if you try to modify them, Google Sheets will get confused, because it wants to display the results of its query to Genderize.<\/p>\n<p>To get around this, first insert a new column after the column that contains the\u00a0<strong>count<\/strong>\u00a0information.<\/p>\n<p><a href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/images\/derive-gender-from-a-column-of-first-names\/copy-the-gender-column-and-paste-it-as-a-value--1-.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/github.com\/miriamposner\/derive_gender\/raw\/master\/images\/derive-gender-from-a-column-of-first-names\/copy-the-gender-column-and-paste-it-as-a-value--1-.png\" alt=\"\" \/><\/a><\/p>\n<h2><a id=\"user-content-copy-the-gender-column-and-paste-it-as-a-value-2\" class=\"anchor\" href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/derive-gender-from-a-column-of-first-names.md#copy-the-gender-column-and-paste-it-as-a-value-2\" aria-hidden=\"true\"><\/a>Copy the gender column and paste it as a value (2)<\/h2>\n<p>Now copy the entire column that contains gender information.<\/p>\n<p><a href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/images\/derive-gender-from-a-column-of-first-names\/copy-the-gender-column-and-paste-it-as-a-value--2-.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/github.com\/miriamposner\/derive_gender\/raw\/master\/images\/derive-gender-from-a-column-of-first-names\/copy-the-gender-column-and-paste-it-as-a-value--2-.png\" alt=\"\" \/><\/a><\/p>\n<h2><a id=\"user-content-copy-the-gender-column-and-paste-it-as-a-value-3\" class=\"anchor\" href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/derive-gender-from-a-column-of-first-names.md#copy-the-gender-column-and-paste-it-as-a-value-3\" aria-hidden=\"true\"><\/a>Copy the 
gender column and paste it as a value (3)<\/h2>\n<p>Finally, place your cursor in the first cell of your new, blank column. From the\u00a0<strong>Edit<\/strong>\u00a0menu, choose\u00a0<strong>Paste special<\/strong>\u00a0and then choose\u00a0<strong>Values only<\/strong>.<\/p>\n<p>This will paste only the contents of your gender cells, without any of the formulas used to calculate those values.<\/p>\n<p><a href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/images\/derive-gender-from-a-column-of-first-names\/copy-the-gender-column-and-paste-it-as-a-value--3-.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/github.com\/miriamposner\/derive_gender\/raw\/master\/images\/derive-gender-from-a-column-of-first-names\/copy-the-gender-column-and-paste-it-as-a-value--3-.png\" alt=\"\" \/><\/a><\/p>\n<h2><a id=\"user-content-get-rid-of-the-extra-characters\" class=\"anchor\" href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/derive-gender-from-a-column-of-first-names.md#get-rid-of-the-extra-characters\" aria-hidden=\"true\"><\/a>Get rid of the extra characters<\/h2>\n<p>The easiest way is to use\u00a0<strong>Find and replace<\/strong>\u00a0to first replace\u00a0<strong>gender:<\/strong>\u00a0with nothing and then replace\u00a0<strong>&#8221;<\/strong>\u00a0with nothing.<\/p>\n<p><a href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/images\/derive-gender-from-a-column-of-first-names\/get-rid-of-the-extra-characters.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/github.com\/miriamposner\/derive_gender\/raw\/master\/images\/derive-gender-from-a-column-of-first-names\/get-rid-of-the-extra-characters.png\" alt=\"\" \/><\/a><\/p>\n<h2><a id=\"user-content-you-have-a-column-of-just-gender\" class=\"anchor\" href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/derive-gender-from-a-column-of-first-names.md#you-have-a-column-of-just-gender\" 
aria-hidden=\"true\"><\/a>You have a column of just gender!<\/h2>\n<p>Not too hard! You can get rid of the extra columns (columns\u00a0<strong>B<\/strong>\u00a0through\u00a0<strong>F<\/strong>\u00a0in the spreadsheet below) if you want.<\/p>\n<p><a href=\"https:\/\/github.com\/miriamposner\/derive_gender\/blob\/master\/images\/derive-gender-from-a-column-of-first-names\/you-have-a-column-of-just-gender-.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/github.com\/miriamposner\/derive_gender\/raw\/master\/images\/derive-gender-from-a-column-of-first-names\/you-have-a-column-of-just-gender-.png\" alt=\"\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial is based on\u00a0&#8220;From OMBD to Gender Data on Film Directors.&#8221; What can you do if you want to<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":139,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_eb_attr":"","footnotes":""},"class_list":["post-3372","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/pages\/3372","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/comments?post=3372"}],"version-history":[{"count":0,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/pages\/3372\/revisions"}],"up":[{"embeddable":true,"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/pages\/139"}],"wp:attachment":[{"href":"http:\/\/miriamposner.com\/classes\/dh101f17\/wp-json\/wp\/v2\/media?parent=3372"}],"curies
":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}