Last updated May 15, 2013
There’s research, there’s writing, and then there’s that netherworld in between: wrangling all the digital files you gather over the course of your work. Digital files are often easier to deal with than stacks of paper, but they can also proliferate frighteningly quickly.
I teach a workshop on this topic, catchily titled Managing Research Assets (better names welcome). Below is a digital version of the workshop handout, followed by a link dump of my favorite posts about developing and refining digital research workflows. You can also download a PDF version of my handout, or a Word version if you’d like to modify it.
Jump to tools for:
- bibliographic management
- file renaming and management
- indexers and “everything buckets”
- annotation tools
- optical character recognition
Preserving your digital assets
The Library of Congress offers guidelines for preserving digital material, including photographs, audio, video, email, digital records, and websites. For a more technical and specific discussion of digital formats, see the Library of Congress’s “Sustainability of Digital Formats.”
In general, the Library of Congress recommends that you:
- Identify: Make an audit of what you have.
- Decide: Choose which of your assets you want to keep and which you don’t need.
- Organize: Give your assets descriptive filenames, organize them into a logical file structure, and write down your organizational scheme.
- Copy: Keep copies in several different locations. Every few years, check your copies to see whether you need to export them to a newer format.
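If your files live in folders on your own computer, a few lines of Python can handle the first step, the audit. This is a minimal sketch using only the standard library (the function name `audit` is my own, not part of any tool mentioned here): it counts files and their total size per file extension, which gives you a quick picture of which formats you’d need to preserve.

```python
from collections import Counter
from pathlib import Path


def audit(folder):
    """Count files and total bytes per extension under `folder`, recursively."""
    counts = Counter()  # extension -> number of files
    sizes = Counter()   # extension -> total size in bytes
    for path in Path(folder).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(none)"  # treat .JPG and .jpg alike
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return counts, sizes
```

For example, `counts, sizes = audit("/path/to/research")` followed by a loop over `counts.most_common()` lists your most common file types first.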
Developing a digital research workflow
There’s no “right” research workflow. The practice that makes sense for you will depend on your own research habits and the kinds of material you work with. As you investigate tools, think about:
- Capturing sources. Do you do most of your research online, in an archive, or at the library? You’ll need a tool (or tools) that suits the way you actually work and easily captures the data you need in a preservable format, ideally one that also keeps that data organized.
- Metadata. Few things are more frustrating than locating just the information you need but not being able to determine its origin. That’s why it’s important to think about how you’re capturing information about each asset you gather, like its source and its importance to your research.
- Searching and retrieving. None of this does you any good if you can’t get your hands on the data you need when you need it. Metadata will help you find the right stuff, but you may also want to think about tools for OCR (optical character recognition) and for “fuzzy” searching.
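“Fuzzy” searching tolerates typos and spelling variants. Tools like DEVONthink have their own, far more sophisticated approaches; purely as a rough illustration of the idea, Python’s standard `difflib` module can find titles that approximately match a query. The function name `fuzzy_find` is hypothetical.

```python
import difflib


def fuzzy_find(query, titles, limit=5, cutoff=0.6):
    """Return up to `limit` titles that approximately match `query`, best first."""
    # Compare lowercase versions, but hand back the titles as originally written.
    by_lower = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(query.lower(), list(by_lower), n=limit, cutoff=cutoff)
    return [by_lower[m] for m in matches]
```

A misspelled query like `"leter from mary smith 1862"` should still surface a title such as “Letter from Mary Smith, 1862”. Raising `cutoff` makes matching stricter; lowering it returns looser matches.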
You should also be thinking about whether and how you can export your data. That may seem boring now, but it won’t when the tool you’re using becomes obsolete!
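One low-tech way to keep your metadata exportable is to make sure you can dump it to a plain CSV file, which nearly any spreadsheet or database program can read. Here’s a minimal sketch using Python’s standard `csv` module; the field names are hypothetical placeholders for whatever metadata you actually record (source, importance, and so on).

```python
import csv

# Hypothetical metadata fields; replace them with whatever you actually record.
FIELDS = ["filename", "source", "importance", "notes"]


def export_metadata(records, out):
    """Write metadata records (dicts keyed by FIELDS) to an open text file as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)  # missing fields are written as empty cells
```

To write to disk, open the file with `newline=""`, as the `csv` documentation recommends: `with open("metadata.csv", "w", newline="", encoding="utf-8") as f: export_metadata(records, f)`.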
Tools to Consider
This list is not comprehensive. Instead, it reflects my understanding of the tools my colleagues are actually using at the moment. Prices reflect educational discounts, if applicable. Am I missing something important? Please let me know in the comments!
Backup Tools
- Time Machine (Mac, already installed on your computer, automatically backs up your data to a hard drive at scheduled intervals)
- Windows Backup and Restore (Windows, already installed on your computer, backs up your data to a hard drive at scheduled intervals)
- Mozy (Mac and Windows, $5.99/month, backs up your data remotely at scheduled intervals)
- Backblaze (Mac and Windows, $5/month, backs up your data remotely at scheduled intervals)
- SpiderOak (Mac and Windows, free or $100/year, backs up your data remotely)
- Dropbox (Mac and Windows, free or $10–$20/month, backs up your data remotely)
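These services handle backups for you, but the “check your copies” advice from the Library of Congress is also easy to follow by hand. This is a minimal sketch, not how any of these products work internally: it copies a folder to a second location with Python’s `shutil.copytree` and then compares SHA-256 checksums of each original and its copy. It assumes Python 3.8 or later (for `dirs_exist_ok`), and the function names are my own.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, destination):
    """Copy `source` into `destination`, then list files whose copies don't match."""
    source, destination = Path(source), Path(destination)
    # dirs_exist_ok (Python 3.8+) lets you rerun the backup into the same folder.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    mismatches = []
    for original in source.rglob("*"):
        if original.is_file():
            copy = destination / original.relative_to(source)
            if not copy.is_file() or sha256_of(copy) != sha256_of(original):
                mismatches.append(original.relative_to(source))
    return mismatches
```

An empty list means every copy matched its original byte for byte.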
Bibliographic Management
There are a lot of good options out there for saving, sorting, and citing your sources. The key point is that you really should be using some kind of bibliographic management system. You’ll regret it if you don’t.
- Zotero (Mac and Windows, free)
- EndNote (Mac and Windows, $249.95)
- Mendeley (Mac, Windows, and Linux, free)
- Sente (Mac and iPad, $89.95)
- Bookends (Mac, $69)
- Papers (Mac and iPhone, $47.40)
File Renaming and Organization
If, for example, you take a lot of photos in an archive, you probably come home with tons of files with totally unintelligible names. Several tools can help you organize these assets and give them human-readable names.
- NameDropper (Windows, $10, batch renamer that allows you to set patterns)
- Belvedere (Windows, free, allows you to set rules to rename and organize files)
- Dropbox Automator (Windows and Mac, free, allows you to automatically perform actions on files in a Dropbox folder)
- Hazel (Mac, $21.95, allows you to set rules to rename and organize files)
- Automator (Mac, already installed on your computer, allows you to perform many actions on your files)
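If you’d rather script the renaming yourself, here’s a minimal Python sketch of pattern-based batch renaming. The naming pattern (`<prefix>_001.jpg`, `<prefix>_002.jpg`, and so on) and the function name are hypothetical, not how any of the tools above work. It numbers files in the sort order of their current names, which for most cameras matches the order the photos were taken, and it defaults to a dry run so you can check the plan before anything is renamed.

```python
from pathlib import Path


def batch_rename(folder, prefix, extension=".jpg", dry_run=True):
    """Rename files to <prefix>_001<ext>, <prefix>_002<ext>, ... and return the plan.

    Files are numbered in the sort order of their current names. With
    dry_run=True (the default) nothing is renamed; you just get the
    (old_path, new_path) pairs so you can check them first.
    """
    folder = Path(folder)
    ext = extension.lower()
    photos = sorted(p for p in folder.iterdir()
                    if p.is_file() and p.suffix.lower() == ext)
    plan = [(old, folder / f"{prefix}_{i:03d}{ext}")
            for i, old in enumerate(photos, start=1)]
    # Refuse to proceed if a new name would overwrite some other existing file.
    for old, new in plan:
        if new.exists() and not new.samefile(old):
            raise FileExistsError(f"{new} already exists")
    if not dry_run:
        for old, new in plan:
            if old != new:
                old.rename(new)
    return plan
```

For example, `batch_rename("/path/to/photos", "box3_folder2")` returns the planned renames; call it again with `dry_run=False` to apply them.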
Indexers and “Everything Buckets”
Depending on how you work, you may find it important to grab and tag things — from the Internet or from “real life” — quickly and easily. There are some very good tools for this. Be careful, though: It’s not enough to grab something. You have to be able to find it again, too!
- EverNote (Windows, Mac, Android, and iPhone; free or $45/year; captures and tags Web pages, photos, and other documents)
- Yojimbo (Mac and iPhone, $38.99, capture and tag notes and documents)
- VoodooPad (Mac and iPhone, $39.96, capture and tag notes and documents)
- SOHO Notes (Mac and iPhone; $39.99; capture, tag, and organize notes and documents and create custom forms)
- DEVONthink (Mac and iPhone; $49.95 for the personal edition; indexes your files, allows you to organize them and add notes and metadata, offers “fuzzy” searching)
Annotation Tools
This is one of the murkier categories, because many other kinds of tools have annotation capabilities built in: Zotero, EverNote, SOHO Notes, and VoodooPad, to name a few. But some of these solutions might be too much tool if you’re just in the market for annotation.
Tools to annotate websites:
- AnnotateIt (take notes directly on any webpage and share those notes if you want; Windows and Mac, free)
- Crocodoc (annotate and share webpages, PDFs, images, and Word docs online; Windows and Mac, free)
- Diigo (collect, highlight, annotate, and share websites; Windows, Mac, iPhone, iPad, and Android, free)
- A.nnotate (annotate and share websites and PDFs; Windows and Mac, free with limited capabilities)
Tools to annotate PDFs and other documents:
- GoodReader (annotate, highlight, comment on a wide range of files; iPhone and iPad, $4.99)
- iAnnotate (annotate, highlight, comment on a wide range of files; iPhone, iPad, and Android, $9.99)
- PDF Expert (annotate, highlight, comment on PDFs; iPad, $9.99)
- Digitate (annotate images; iPad and iPhone, free)
- ClipNotes (annotate video, designed by UCLA TFT prof Stephen Mamber; iPad, $1.99)
Optical Character Recognition (OCR)
Your sources become much more findable when you run OCR on them. Depending on the kinds of sources you gather, though, OCR may be imperfect (or impossible), although you can sometimes improve your results with OCR post-correction and enhancement tools. Find a more comprehensive list of open-source OCR tools here (thanks to Clemens Neudecker).
- ABBYY FineReader (Windows and Mac, $49.99 and $99.99, respectively)
- Adobe Acrobat Pro (Windows and Mac, $404.10)
- OCRopus (Mac and Linux, free)
- PDF Scanner (Mac, $14.99, scans documents and performs OCR)
- EverNote (see above; EverNote automatically performs OCR on your documents)
- DEVONthink (see above; DEVONthink automatically performs OCR on your documents)
Databases
An “everything bucket” is a database, of course, but sometimes you need a tool that structures your data, too. Structure is great, but you should also be honest with yourself about whether the tool will fit easily into your workflow.
If you are contemplating building a database for your research, I strongly recommend that you first read Mark Merry’s “Designing and Using Databases for Historical Research” (you’ll need to register, but it’s worth the trouble). Merry lays out some basic principles of database design that will serve you well as your research progresses and your database grows.
- askSam (Windows, $149.95, designed to make database-creation quick and easy)
- Microsoft Access (Windows, $139.99)
- FileMaker Pro (Windows, Mac, $179.00)
- Bento (a lighter-weight version of FileMaker, Mac, $49.00)
- Base (free, part of the OpenOffice suite, Windows, Mac, Linux)
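To give a flavor of the structure a database imposes, here’s a small sketch using SQLite, which ships with Python’s standard library. The two-table schema and function names are hypothetical, not taken from Merry’s guide, but they illustrate one common design principle: record each source (archive and collection) once and have each document point to it, rather than retyping the archive details for every record.

```python
import sqlite3

# Hypothetical schema: each document refers to the source it came from.
SCHEMA = """
CREATE TABLE IF NOT EXISTS source (
    id INTEGER PRIMARY KEY,
    archive TEXT NOT NULL,
    collection TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS document (
    id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES source(id),
    title TEXT NOT NULL,
    doc_date TEXT,  -- ISO 8601 (YYYY-MM-DD) so dates sort correctly as text
    filename TEXT
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, archive, collection, title, doc_date=None, filename=None):
    """Insert a document, reusing the source row if it already exists."""
    row = conn.execute("SELECT id FROM source WHERE archive = ? AND collection = ?",
                       (archive, collection)).fetchone()
    if row is None:
        source_id = conn.execute("INSERT INTO source (archive, collection) VALUES (?, ?)",
                                 (archive, collection)).lastrowid
    else:
        source_id = row[0]
    conn.execute("INSERT INTO document (source_id, title, doc_date, filename) VALUES (?, ?, ?, ?)",
                 (source_id, title, doc_date, filename))
    conn.commit()
    return source_id


def documents_from(conn, archive):
    """List (title, doc_date) for every document from one archive, oldest first."""
    return conn.execute(
        "SELECT d.title, d.doc_date FROM document d JOIN source s ON d.source_id = s.id "
        "WHERE s.archive = ? ORDER BY d.doc_date", (archive,)).fetchall()
```

Because archive details live in one row, correcting a collection name later means changing one record rather than hundreds.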
Links on Developing a Digital Research Workflow
I’m constantly adding links to research tools and methods here.
My starting point for finding digital research tools is the DiRT (Digital Research Tools) wiki.
William J. Turkel, at the University of Western Ontario, is a master of digital research methodology, and his “Workflow for Digital Research Using Off-the-Shelf Tools” is an invaluable resource. If off-the-shelf tools are no longer enough for you, The Programming Historian is a wonderful, accessible way to learn programming techniques that will immediately enhance your research.
Profhacker, at the Chronicle of Higher Education, regularly publishes great advice on research tools.
I frequently check Lifehacker for excellent advice and tool recommendations.
The University of Amsterdam hosts the Digital Methods Initiative, which offers this extensive and useful tools wiki.
A number of scholars write frequently about their own research methods.
- Shane Landrum documents his methods at History Research Hacks and on his blog. I also love Shane’s “Camera, Laptop, and What Else?: Hacking Better Tools for the Short Archival Research Trip.”
- William J. Turkel often blogs about his digital methods.
- Chad Black also blogs about his work.
- Marta S. Rivera Monclova writes about digital tools, too.
- Shannon Mattern has a great series of posts related to her graduate methods class, including this post on notetaking and abstracting.
- Kalani Craig describes her research process.
- Will Howarth has spoken about the digital tools he uses, including a great many for the iPad.
- Shawn Graham created this amazing diagram of his own research workflow.
- Caleb McDaniel often writes about the digital tools he uses in his work.
- Into the Archives is a new-ish blog about digital research tools for historians.
- Over at HASTAC, Mary Caton Lingold started a great discussion about research and studying tools.
- P. Kerim Friedman’s “Reading Fast, Reading Slow” is a good, tool-focused post that distinguishes between scanning and deep-reading.
- Aleh Cherp, an environmental science professor at Central European University, maintains Academic Workflows on Mac, a very useful blog.
- Lennart Olsen maintains a Delicious stack of research tools.
- The Text Pistols, a group blog run by UCLA History Ph.D.s, has a great set of application reviews and recommendations.
Digital Preservation
Smithsonian Institution: Born Digital Video Preservation: A Final Report (PDF)
Library of Congress Digital Preservation Program
Scholars who use DEVONthink are often evangelical about it.
- Steven Johnson, “Tool for Thought” and a related blog post
- AcademHack posts on DEVONthink
- GigaOm, DEVONthink vs. EverNote
- Rachel Leow has a wonderful series of posts on using DEVONthink for historical research: I, II, and III
- Shane Landrum has written about how he uses DEVONthink
- Chad Black gives a terrifically detailed description of how he uses DEVONthink
- Douglas Barone is less enthusiastic about DEVONthink
I’ve written about how I use Automator to batch-process research photos
Gina Hiatt has written about how she uses EverNote
Kalani Craig has also written about EverNote for historical research
Brian Croxall did us all a great service by writing this definitive comparison of Zotero and EndNote
Tips on Taking Photos in Archives
An excellent guide from the University of Illinois, Urbana-Champaign.