Last week, I taught the image-mining portion of the Images and Texts in Medical History workshop at the National Library of Medicine. I am far from an expert on OpenCV, the open-source computer-vision library. But as usual, that didn’t stop me from attempting to teach it.
The materials I created for the workshop include detailed instructions on how to use OpenCV to extract images from scanned journal pages (using a script written by Chris Adams), as well as a step-by-step breakdown of how to use OpenCV's Python bindings to take the average color of an image. I've also included links to my favorite resources on OpenCV and computer vision in general. (In my experience, there are a lot of really terrible tutorials out there, so I've tried to link only to those that are actually helpful.)
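For a rough idea of what "taking the average color" involves, here is a minimal sketch. It is not the workshop's code. It assumes the image has already been loaded as a NumPy array of shape height × width × 3, which is what `cv2.imread` returns, with channels in BGR order. The function name `average_color` is hypothetical.

```python
import numpy as np


def average_color(image: np.ndarray) -> tuple:
    """Return the mean value of each channel of an H x W x 3 image.

    Assumes BGR channel order, as produced by OpenCV's cv2.imread,
    so the result is (blue, green, red).
    """
    # Flatten rows and columns into one list of pixels, then average
    # each channel independently.
    means = image.reshape(-1, image.shape[-1]).mean(axis=0)
    return tuple(float(m) for m in means)


if __name__ == "__main__":
    # Synthetic 2x2 image: two pure-blue and two pure-red pixels (BGR).
    img = np.array([[[255, 0, 0], [0, 0, 255]],
                    [[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
    print(average_color(img))  # (127.5, 0.0, 127.5)
```

With a real file you would load it first with `cv2.imread("page.jpg")` (a placeholder filename), which returns `None` rather than raising an error if the file can't be read. OpenCV also provides `cv2.mean`, which returns the per-channel means as a four-element tuple whose last entry is unused for a three-channel image.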
Ben Schmidt taught the text-mining portion of the workshop, and his materials are really great. His handouts in particular are concise, opinionated rundowns of the strengths and weaknesses of various forms of text analysis.
In preparation for the workshop, Ben and I created a virtual machine, provisioned via Vagrant with all the dependencies and data the participants needed. If you'd like to install the VM, it has everything you need for both Ben's and my portions of the workshop, and the instructions should be pretty clear. (The VM is based on one that Andrew Goldstone created for his Literary Data class.)
The process of getting the VM installed on participants' own computers was … complicated. We learned many things about Vagrant and VirtualBox, including the fact that Windows 7 and 8 don't ship with an SSH client.
It was definitely the most technically complex workshop either of us has attempted (to a group of about 50!!). It wasn't hitch-free, but it was really satisfying to see participants get excited about computer vision and to talk about ways they might use these techniques in their own research.