{"id":678,"date":"2011-03-05T20:35:06","date_gmt":"2011-03-06T03:35:06","guid":{"rendered":"http:\/\/miriamposner.com\/blog\/?p=678"},"modified":"2012-07-04T12:39:59","modified_gmt":"2012-07-04T19:39:59","slug":"batch-processing-photos-from-your-archive-trip","status":"publish","type":"post","link":"https:\/\/miriamposner.com\/blog\/batch-processing-photos-from-your-archive-trip\/","title":{"rendered":"Batch-processing photos from your archive trip"},"content":{"rendered":"<figure id=\"attachment_696\" aria-describedby=\"caption-attachment-696\" style=\"width: 300px\" class=\"wp-caption alignleft\"><a href=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/DSC02608.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-696  \" title=\"National Archives reading room\" src=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/DSC02608-300x225.jpg\" alt=\"\" width=\"300\" height=\"225\" srcset=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/DSC02608-300x225.jpg 300w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/DSC02608-1024x768.jpg 1024w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-696\" class=\"wp-caption-text\">The National Archives in College Park, Maryland, where it seemed most researchers were using digital cameras or scanners.<\/figcaption><\/figure>\n<p>Today at <a href=\"http:\/\/southeast2011.thatcamp.org\/\">THATCamp Southeast<\/a> I helped organize a session (with Andrew Famiglietti from Georgia Tech) called Research Hacks. We brainstormed ways to use technology to enhance research, both at the archive and when examining born-digital sources. After I proposed the session, I had a moment of panic when I realized I didn&#8217;t really have any great hacks to offer. 
Luckily, I had a few hours and the impetus to finally put together some techniques I&#8217;d been meaning to investigate.<\/p>\n<p>Like many researchers, I use a camera to take photos of documents during archival research trips. My problem comes when I arrive home with a bunch of photos that look like this:<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-06-at-6.45.40-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-693\" title=\"Folder full of images labeled &quot;DSC ...&quot;\" src=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-06-at-6.45.40-AM.png\" alt=\"Folder full of images labeled &quot;DSC ...&quot;\" width=\"192\" height=\"308\" srcset=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-06-at-6.45.40-AM.png 192w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-06-at-6.45.40-AM-187x300.png 187w\" sizes=\"auto, (max-width: 192px) 100vw, 192px\" \/><\/a><\/p>\n<p>Ugh. What to do with all these &#8220;DSCs&#8221;? Here&#8217;s a way to convert those images into documents that are actually searchable and usable.<\/p>\n<p><!--more--><\/p>\n<p>So, first, credit where credit is due: the idea to use Hazel and Automator comes from <a href=\"http:\/\/cliotropic.org\/blog\/\">Shane Landrum<\/a>, whose excellent talk, <a href=\"http:\/\/cliotropic.org\/blog\/talks\/camera-laptop-and-what-else\/\">&#8220;Camera, Laptop, and What Else?: Hacking Better Tools for the Short Archival Research Trip,&#8221;<\/a> I saw at Yale in 2009.<\/p>\n<h2>Renaming Photos<\/h2>\n<p><a href=\"http:\/\/www.noodlesoft.com\/hazel.php\">Hazel<\/a> is a Mac utility ($21.95; free trial for 14 days) that keeps an eye on the folders you specify. Following rules you set up, it automatically performs actions on the files in those folders. 
I use it to rename the images in a folder I&#8217;ve called &#8220;Research Photos.&#8221; In this case, I&#8217;ve told Hazel to change my image names from the dreaded &#8220;DSC000000&#8221; to &#8220;NLM&#8221; (which stands for the National Library of Medicine) and then the date the photos were taken.<\/p>\n<p><a href=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.28.54-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-680\" title=\"Using Hazel to rename files\" src=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.28.54-PM.png\" alt=\"Using Hazel to rename files\" width=\"665\" height=\"300\" srcset=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.28.54-PM.png 665w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.28.54-PM-300x135.png 300w\" sizes=\"auto, (max-width: 665px) 100vw, 665px\" \/><\/a><\/p>\n<p>That&#8217;s already a lot better. Now I have some assurance that if my photos get moved around, I&#8217;ll still at least know which archive they&#8217;re from.<\/p>\n<p>But now I have a bunch of JPGs. Personally, I prefer PDFs, because then I can run optical character recognition (OCR) on them with Acrobat Pro, as I explain below.<\/p>\n<h2>JPGs to One Big PDF<\/h2>\n<p>Please see updated instructions <a href=\"https:\/\/miriamposner.com\/blog\/?p=1253\">here<\/a> for turning all your JPGs into one big PDF.<\/p>\n<h2>Creating Searchable PDFs<\/h2>\n<p>The easiest way to run OCR, which recognizes the text in your images, is to use Adobe Acrobat Pro. Even if you don&#8217;t have it on your own computer, your university probably has it somewhere. 
But if you can&#8217;t get access to Acrobat Pro, <a href=\"http:\/\/cliotropic.org\/wip\/\">Shane outlines a couple of other options<\/a>, including Evernote and OCRopus.<\/p>\n<p>Once I&#8217;ve opened the PDF in Acrobat Pro, I click on &#8220;Document,&#8221; &#8220;OCR Text Recognition,&#8221; and then &#8220;Recognize Text Using OCR.&#8221;<\/p>\n<p>Now I have a document with text that can be copied and searched. The recognized text is really, really dirty, but it&#8217;s better than a plain old image.<\/p>\n<p><a href=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.55.58-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-683\" title=\"Selecting text in my archival photo\" src=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.55.58-PM.png\" alt=\"Selecting text in my archival photo\" width=\"775\" height=\"862\" srcset=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.55.58-PM.png 775w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.55.58-PM-269x300.png 269w\" sizes=\"auto, (max-width: 775px) 100vw, 775px\" \/><\/a><\/p>\n<h2>Getting Your PDF into Zotero<\/h2>\n<p>I like to keep all my research PDFs in Zotero, so I create a Zotero item for the PDF. 
Then I drag the PDF onto that item, control-click the attachment, and choose &#8220;Rename File from Parent Metadata.&#8221; That renames the PDF file to match the Zotero item record I just created.<\/p>\n<p><a href=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.58.44-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-684\" title=\"Placing my PDF in Zotero\" src=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.58.44-PM-1024x214.png\" alt=\"Placing my PDF in Zotero\" width=\"640\" height=\"133\" srcset=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.58.44-PM-1024x214.png 1024w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.58.44-PM-300x62.png 300w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-9.58.44-PM.png 1151w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p>What&#8217;s cool is that I can then search for text right from my Zotero search box. See? 
It found the word &#8220;human&#8221; in my PDF!<\/p>\n<p><a href=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-10.00.50-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-685\" title=\"Finding text using Zotero\" src=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-10.00.50-PM-1024x219.png\" alt=\"Finding text using Zotero\" width=\"640\" height=\"136\" srcset=\"https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-10.00.50-PM-1024x219.png 1024w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-10.00.50-PM-300x64.png 300w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-10.00.50-PM-940x198.png 940w, https:\/\/miriamposner.com\/blog\/wp-content\/uploads\/2011\/03\/Screen-shot-2011-03-05-at-10.00.50-PM.png 1155w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p>As a couple of workshop attendees noted, this is a pretty unwieldy process. I&#8217;d love to see someone make a more streamlined piece of software. <a href=\"http:\/\/jasonpuckett.net\/\">Jason Puckett<\/a> pointed out that <a href=\"http:\/\/www.zotero.org\/support\/commons\">Zotero Commons<\/a> actually goes a long way toward this: it can upload your documents to the <a href=\"http:\/\/www.archive.org\/index.php\">Internet Archive<\/a>, which will then run OCR on them for you. But you have to be sure that your documents can be made publicly available.<\/p>\n<p>What do you use to process your research photos?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today at THATCamp Southeast I helped organize a session (with Andrew Famiglietti from Georgia Tech) called Research Hacks. We brainstormed ways to use technology to enhance research, both at the archive and when examining born-digital sources. 
After I proposed the session, I had a moment of panic when I realized I didn&#8217;t really have any [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21,17,5],"tags":[],"class_list":["post-678","post","type-post","status-publish","format-standard","hentry","category-history-technology","category-research","category-tools"],"_links":{"self":[{"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/posts\/678","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/comments?post=678"}],"version-history":[{"count":15,"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/posts\/678\/revisions"}],"predecessor-version":[{"id":692,"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/posts\/678\/revisions\/692"}],"wp:attachment":[{"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/media?parent=678"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/categories?post=678"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/miriamposner.com\/blog\/wp-json\/wp\/v2\/tags?post=678"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}