Google Fusion Table Basics with IU’s Cushman Collection

I’ve used Indiana University’s Cushman Collection of photographs before, in my Palladio tutorial. Google Fusion Tables, though, is a slightly simpler way for people to get started with data visualization. So here’s a quick tutorial that uses the same data to create a map and some simple charts.

You can also download this tutorial as a PDF or a Word document (in case you’d like to modify it).

Here’s a preview of the map we’ll make:


A fun way to introduce DH students to dataviz

As a teacher, I’ve always operated on the assumption that students are primarily interested in each other. Here’s a fun activity that takes advantage of that interest to teach students a little about data visualization. It’s an extremely unscientific Cosmo-style quiz, designed to show students which interests they have in common with each other. It’s just an introductory lesson, but it gives you a fun dataset to play with. You’ll probably want to split this among a few class sessions, since students will need at least one full class just to get familiar with Gephi.

Of course, it’s also a good chance to talk about how authoritative graphs like these can look, and whether the data they contain actually means much at all. (Probably not!)

Make a questionnaire for your students

wpid1844-media_1429464822280.png

I’d do this about a week before you do the dataviz lesson. I used Google Forms for this. Just to make things more fun, I called it the Mysterious DH Questionnaire. I asked five questions, each of which had five options. The possible answers were literally the first options that occurred to me.

Of course, you can choose whatever you want; just be sure you have a constrained list of choices (no write-ins).

Make your spreadsheet into a two-mode edge list

wpid1845-media_1429465322984.png

Now that you have your data, you want it in three different formats: 1) raw; 2) an edge list for a two-mode network graph; and 3) an edge list for a one-mode network graph. To get your two-mode list, use OpenRefine to transpose columns across rows. The idea is to go from the layout shown in the above screenshot to …

wpid1846-media_1429465687328.png

… this one. It’s the same data, just rearranged into two columns.
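If you’d rather script this step than click through it, here’s a minimal sketch of the same rearrangement in Python. The column names and answers are invented for illustration, and this is not what OpenRefine does internally; it just produces the same two-column shape.

```python
import csv
import io

# Hypothetical questionnaire export: one row per student, one column per question.
RAW_CSV = """Name,Favorite book,Favorite movie
Ada,Beloved,Alien
Ben,Dracula,Alien
Cy,Beloved,Clueless
"""


def wide_to_edge_list(csv_text, id_column="Name"):
    """Return (student, answer) pairs, one pair per answered question."""
    edges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        student = row[id_column]
        for question, answer in row.items():
            # Skip the name column itself and any blank answers.
            if question != id_column and answer:
                edges.append((student, answer))
    return edges


edges = wide_to_edge_list(RAW_CSV)
for source, target in edges:
    print(f"{source},{target}")
```

Each student now appears once per answer, which is exactly what a two-mode graph needs: one kind of node for people, another for the things they picked.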

Make your spreadsheet into a one-mode edge list

wpid1847-media_1429466525579.png

Then, if you want (you don’t have to, but it can help students see the difference between one-mode and two-mode network graphs), you can project your two-mode edge list into a one-mode edge list, using Gephi and this tutorial from Shawn Graham.
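To make the idea of projection a little more concrete, here’s a rough Python sketch. It is not the Gephi workflow from the tutorial, just the underlying logic: two students get linked once for every answer they share. The names and answers are made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical two-mode edge list: (student, answer) pairs.
two_mode_edges = [
    ("Ada", "Beloved"), ("Ada", "Alien"),
    ("Ben", "Dracula"), ("Ben", "Alien"),
    ("Cy", "Beloved"), ("Cy", "Clueless"),
]


def project_to_one_mode(edges):
    """Return a Counter mapping (student, student) pairs to shared-answer counts."""
    students_by_answer = {}
    for student, answer in edges:
        students_by_answer.setdefault(answer, set()).add(student)

    weights = Counter()
    for students in students_by_answer.values():
        # Every pair of students who chose this answer gains one shared interest.
        for pair in combinations(sorted(students), 2):
            weights[pair] += 1
    return weights


one_mode = project_to_one_mode(two_mode_edges)
for (first, second), weight in sorted(one_mode.items()):
    print(f"{first},{second},{weight}")
```

The answers disappear from the result; only student-to-student links remain, weighted by how much each pair has in common.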

Make an alluvial diagram

wpid1848-media_1429467279291.png

You can do this with the class. Use RAW to make alluvial diagrams from the raw dataset, experimenting with different categories. It’s fun to see the various relationships between, say, book and movie preferences.
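If students ask what the ribbons in an alluvial diagram actually represent, it can help to show the counting behind them. This is a small Python sketch with invented responses and hypothetical field names (“book” and “movie”); it is not how RAW is implemented, only the tally that a two-category alluvial diagram is drawing.

```python
from collections import Counter

# Hypothetical raw responses, one dict per student.
responses = [
    {"book": "Beloved", "movie": "Alien"},
    {"book": "Dracula", "movie": "Alien"},
    {"book": "Beloved", "movie": "Clueless"},
    {"book": "Beloved", "movie": "Alien"},
]


def flow_counts(rows, left, right):
    """Count how many rows share each (left value, right value) combination."""
    return Counter((row[left], row[right]) for row in rows)


flows = flow_counts(responses, "book", "movie")
for (book, movie), count in flows.most_common():
    print(f"{book} -> {movie}: {count}")
```

The bigger the count for a pair, the wider the ribbon connecting those two categories.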

Make network graphs

wpid1849-media_1429467435887.png

When the class is ready, move on to using the datasets to show which students have the most in common. Here’s a tutorial I prepared for students to use with this dataset (names have been blurred out). (And here’s a Word version of the Gephi tutorial, in case you’d like to alter it.)

Start with the two-mode network diagram, and when the class is ready, move on to the one-mode. Students really enjoyed seeing who had the most in common, examining the communities Gephi was able to detect, and comparing those communities to their own groups of friends.

Getting started with Palladio

NOTE: Scroll down to get to the tutorial itself!

Updated November 2015 for Palladio 1.1. If you’d like to use this tutorial in the classroom, or if you want to alter it and make it your own, there’s a version on Github you can do whatever you want with.

Palladio, a product of Stanford’s Humanities + Design Lab, is a web-based visualization tool for complex humanities data. Think of Palladio as a sort of Swiss Army knife for humanities data. It’s one package that includes a number of tools, each of which allows you to get a different angle on the same data.

Palladio is relatively new and still under active development, which means that you will almost certainly encounter bugs! Still, it’s a very useful tool for getting a handle on a complicated dataset.

When Might Palladio be the Right Tool for You?

You have structured data.
Here, “structured data” means “data in a spreadsheet”: categorized, sorted, and stored in an Excel document or some other kind of spreadsheet application.

You’re interested in time, space, and relationships.
That’s where Palladio excels: showing you how various entities are connected across time and space.

Your data has many attributes.
Palladio’s really good at helping you uncover relationships among disparate attributes over time and space. For example, it can help you see that a diarist was especially interested in trees as he traveled through North Carolina, and especially interested in bats as he traveled through Arizona. Palladio allows you to drill down through your data using faceted browsing.

When Might Palladio Not be the Right Tool for You?

You have unstructured data.
If you’re trying to analyze a long text, like a poem or a novel, Palladio won’t help you much. You’ll want to look for text analysis tools, like Voyant (http://voyant-tools.org/).

You just want to count things.
If you just want to make relatively simple charts and graphs, like a bar or pie chart, Palladio is too much tool for you! Instead, try using Excel’s built-in functions, or check out tools like Plot.ly or Tableau.

You want to present an interactive visualization.
One big limitation of Palladio is that you can’t embed or share the visualizations you create, except in static form. So while Palladio can help you explore and understand your data, it’s not great for presentation, at least not yet. Instead, try Google Fusion Tables, ManyEyes, or Tableau.

You want to create complex, fine-tuned maps and network graphs.
While Palladio can produce maps and network graphs, you can’t customize them to any great extent, and you can’t perform sophisticated network analysis, such as calculating various measures of centrality. Instead, you might consider more sophisticated mapping tools, such as CartoDB or ArcGIS, and more sophisticated network analysis tools, such as Gephi and Cytoscape.

You hate bugs.
Palladio is still a baby, and you will almost certainly encounter some bugs. If you prefer not to use unstable software, you might investigate Google Fusion Tables or Tableau.

With that out of the way, we’re almost ready to get started using Palladio. First, though, a quick note that this tutorial does not cover some important features of Palladio, specifically its ability to link multiple data tables together, its timespan feature, and a feature that allows you to use multiple basemaps. Perhaps these will be the subject of a later tutorial!

A word on the dataset we’ll use, which you can find here.

This is a spreadsheet that contains the metadata for a portion of the Charles Weever Cushman Collection of photographs, located at Indiana University. The full Cushman Collection contains more than 14,500 Kodachrome photographs, taken between 1938 and 1969. Indiana University’s archivists were forward-thinking enough to place this data on Github, which is how we’re able to use it.

In order to make this data a little easier to work with, I’ve limited this spreadsheet to photographs taken between 1938 and 1955. I’ve also removed the “End Date” field to prevent confusion, changed the format of the date field, and added geocoordinates so that we can map the data more easily. For a great introduction to how to do some of this data manipulation on your own data, see this handout, developed by Owen Stephens on behalf of the British Library, which explains how to use the data-cleaning application OpenRefine.
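For the curious, here’s a rough Python sketch of the kind of cleanup described above: keeping only a range of years, dropping the end date, and rewriting the date. The column names, the original date format, and the YYYY-MM-DD target are all assumptions for illustration, not the actual layout of the Cushman metadata. Geocoding is left out, since it depends on an outside lookup.

```python
import csv
import io
from datetime import datetime

# Hypothetical stand-in for the metadata; real column names and formats may differ.
SAMPLE = """Title,Start Date,End Date
Barn,1941-06-02T00:00:00,1941-06-02T00:00:00
Bridge,1960-03-15T00:00:00,1960-03-15T00:00:00
"""


def prepare(csv_text, first_year=1938, last_year=1955):
    """Filter rows to a year range, drop End Date, and simplify the date."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        taken = datetime.strptime(row["Start Date"], "%Y-%m-%dT%H:%M:%S")
        if not first_year <= taken.year <= last_year:
            continue  # outside the range we want to keep
        row.pop("End Date", None)
        del row["Start Date"]
        row["Date"] = taken.strftime("%Y-%m-%d")
        cleaned.append(row)
    return cleaned


rows = prepare(SAMPLE)
print(rows)
```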

A reminder that Palladio is still under development, so it can be buggy and slow! Some tips:

  • Work slowly. Wait for an option to finish loading before you click it again or click something else.
  • Do not refresh the page. You’ll lose your work.
  • On a related note: To start over, refresh the page.
  • Clicking on the Palladio logo will bring you to the Palladio homepage, but it won’t erase your work.

Navigate to Palladio.

wpid1798-media_1415771170331.png

Go to palladio.designhumanities.org and click on Start.

Upload your spreadsheet.

wpid1799-media_1415771240406.png

Click on the Load Spreadsheet or CSV tab and drag your spreadsheet onto the tab. (If you have an Excel spreadsheet, save it as a .csv file before uploading it.) Then press Load.

Hey, you imported your data!

wpid1800-media_1415771395672.png

As you can see, each column in your spreadsheet is a different category of data. If you look closely, you’ll see that Palladio has automatically categorized your data as different datatypes: “IU Archives Number” is a number, for example, while “PURL” is a URL. And if you scroll down, you’ll see that “Geocoordinates” is Latlong.

Tell Palladio what kind of data you have.

wpid1809-media_1416796422140.png

One of your data categories is a date, but Palladio hasn’t figured that out right away. We need to tell it, so that it treats this particular category as temporal data.

Click on the Date category. In the window that pops up, select Date from the Data type dropdown menu. Looks good! Click Done.

Hide some data

media_1447903129139.png

We have a lot of categories here, and Palladio runs a little faster if it has fewer of them to deal with. (Plus it’s easier to see what you have.) Let’s hide some categories we won’t be using by clicking on the tiny eye to the right of the category name. I hid Archive Date, Description from Slide Note, Image Note, and Slide Condition. You can always go back and reveal these if you decide you want them after all.

Map your data!

media_1447903367595.png

Click on the Map tab at the top of the window to go to the maps view of your data. Before we go on, let’s talk about what you see in the Map layers pane that appears in this window.

Palladio expects you to map your data in layers. This means that you can not only map one kind of thing, like photos, but also layer other kinds of things on top of that data. For example, it might be cool to have a layer of Cushman’s photos and a layer of interstate road networks, to see if Cushman traveled on highways. Palladio lets you do that!

But for the time being, we only have one layer: Cushman’s photos. So we’ll stick with that.

Map your data! (2)

media_1447903785544.png

Let’s tell Palladio what we want in our layer. We can name the Layer whatever we want. I’ll call it Photos.

Keep the map type as Points. If you happened to have data that depicted the movement of objects from place to place, you could do a point-to-point map. But we don’t have that kind of data.

If you click on the Places box, you should be able to choose Geocoordinates from the dropdown.

The Tooltip Label, which controls the label you see when your cursor hovers over a point, can be anything you want. I’ll call mine Genre 1, since that gives me some sense of what’s in the photo.

When you’ve done all this, press Add layer.

You have a map!

media_1447903865326.png

Looking good! If you hover over a map point, you should get a tooltip.

Combine your map with a timeline.

wpid1811-media_1416797659940.png

The ability to put data on a map is cool, but the real power of Palladio is the ability it gives you to explore the relationships of various features of your data through Facets and Timelines. Let’s start with a timeline, which is pretty much what it sounds like: a visualization of the distribution of your data over time.

Start by clicking on the Timeline tab at the bottom of your screen. Group your data by Genre 1. Now you can see the distribution of photos over time. That’s interesting: looks like Cushman took a lot of photos in 1952.
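Under the hood, a grouped timeline is just a count of records per time period and category. Here’s a minimal Python sketch of that tally. The four records are invented, the dates are assumed to be YYYY-MM-DD strings, and this is my own illustration rather than Palladio’s code.

```python
from collections import Counter

# Hypothetical records with an assumed YYYY-MM-DD date format.
photos = [
    {"Date": "1940-05-01", "Genre 1": "Landscapes"},
    {"Date": "1952-07-04", "Genre 1": "Cityscapes"},
    {"Date": "1952-08-12", "Genre 1": "Landscapes"},
    {"Date": "1952-09-30", "Genre 1": "Landscapes"},
]


def timeline_counts(rows, group_by="Genre 1"):
    """Count records per (year, group value)."""
    return Counter((row["Date"][:4], row[group_by]) for row in rows)


counts = timeline_counts(photos)

# Collapse the groups to find the busiest year overall.
per_year = Counter()
for (year, _genre), total in counts.items():
    per_year[year] += total
busiest_year, busiest_total = per_year.most_common(1)[0]
print(busiest_year, busiest_total)
```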

Filter your data by date.

wpid1801-media_1415772593401.png

On the bottom graph, use the crosshairs to drag (slowly!) from 1940 to 1942. A blue box appears to indicate that you’re filtering your data by date. You’ll notice that the points on the map repopulate to correspond with the timespan. You can even select multiple spans of time and see them visualized simultaneously!

If you want to temporarily collapse your timeline so that you can see the map better, click on the downward-pointing arrow on the right of the timeline pane. To get rid of the date filter, click on the pink “x” next to the datespan above the graph.

Note: If you’re unable to “grab” your timeline in order to filter it, it may help to lengthen your browser window.
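Conceptually, a date filter keeps any record that falls inside at least one of the spans you’ve selected. Here’s a small Python sketch of that logic with invented photo records; the spans are treated as inclusive, which is an assumption on my part and not something I’ve confirmed about Palladio.

```python
from datetime import date

# Hypothetical photo records.
photos = [
    {"Title": "Barn", "Date": date(1939, 6, 2)},
    {"Title": "Harbor", "Date": date(1941, 3, 15)},
    {"Title": "Orchard", "Date": date(1948, 9, 1)},
    {"Title": "Depot", "Date": date(1953, 1, 20)},
]


def filter_by_spans(rows, spans):
    """Keep rows whose date falls inside any (start, end) span, inclusive."""
    return [
        row for row in rows
        if any(start <= row["Date"] <= end for start, end in spans)
    ]


# Two selections at once, like dragging two separate boxes on the timeline.
spans = [
    (date(1940, 1, 1), date(1942, 12, 31)),
    (date(1952, 1, 1), date(1954, 12, 31)),
]
selected = filter_by_spans(photos, spans)
print([row["Title"] for row in selected])
```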

Add a facet to further refine your data.

wpid1812-media_1416797768383.png

You’ve now narrowed your data down to 1940–1942. Now let’s try filtering and visualizing your data using other attributes. We can do this with a Facet filter.

Click on the Facet tab. (You’ll probably want to compress your Timeline window by clicking on the downward-pointing arrow that appears on the upper right-hand corner of the pane.)

Click on the Dimensions menu.

Now select Genre 1, Topical Subject Heading 1, and Topical Subject Heading 2. (Actually, you can select whatever you want; I just think these are fun ones to try.)

Explore your facets.

wpid1813-media_1416797854183.png

Working from left to right, the facet dimensions gradually narrow down the data displayed on the map. For example, in the image above, the map will show where Cushman took landscape photographs that contain both trees and shrubs. (Only on the East Coast and Great Lakes! Wonder why.)
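One way to think about facets is as a chain of filters, each one applied to whatever the previous one left behind. This Python sketch shows that narrowing with three invented records. The dimension names match the ones chosen above, but the values and the function are hypothetical.

```python
# Hypothetical records using the three dimensions selected above.
photos = [
    {"Title": "Ridge", "Genre 1": "Landscapes",
     "Topical Subject Heading 1": "Trees", "Topical Subject Heading 2": "Shrubs"},
    {"Title": "Meadow", "Genre 1": "Landscapes",
     "Topical Subject Heading 1": "Trees", "Topical Subject Heading 2": "Clouds"},
    {"Title": "Main Street", "Genre 1": "Cityscapes",
     "Topical Subject Heading 1": "Trees", "Topical Subject Heading 2": "Shrubs"},
]


def apply_facets(rows, selections):
    """Apply (dimension, value) selections left to right, narrowing each time."""
    remaining = list(rows)
    for dimension, value in selections:
        remaining = [row for row in remaining if row.get(dimension) == value]
    return remaining


selections = [
    ("Genre 1", "Landscapes"),
    ("Topical Subject Heading 1", "Trees"),
    ("Topical Subject Heading 2", "Shrubs"),
]
matches = apply_facets(photos, selections)
print([row["Title"] for row in matches])
```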

Try playing with some other facets and altering your timeline. Find any interesting relationships?

(You might wonder about the Timespan tab, which is greyed out when we use Palladio with our dataset. If our records had start dates and end dates, the timespan function would display those dates as “lifespans.” Take a look at this video for an explanation: https://vimeo.com/101672780.)

Explore your data as a gallery.

wpid1802-media_1415773502546.png

Maps are fun, but galleries can be useful, too, especially when you’re working with images. First, delete your time and facet filters by clicking on the tiny pink garbage can that appears at the lower right-hand corner of each pane. (You can also delete them by clicking on the pink X’s at the top of the filters pane.)

Now, click on the Gallery tab at the top of your window.

Change the categories your gallery displays.

wpid1803-media_1415773713790.png

So far, not very useful. Let’s change the categories your gallery is displaying. For Title, choose City and State. For Subtitle, choose Genre 1. For Text, choose Description from Notebook. For Link URL, choose PURL. For Image URL, choose Image URL. If you’d like, you can sort your gallery by Date.

(Actually, you can put whatever you want on these gallery cards, but these are some categories I think are interesting.)

Filter your gallery by date and other attributes.

wpid1804-media_1415774239529.png

You can filter your gallery in the same way that you filter your map. For example, in the above image, I’m looking at pictures taken in Chicago that contain both clouds and buildings.

View your data as a network diagram.

wpid1805-media_1415774406486.png

Network diagrams are good for showing the relationships among entities. Often, those entities are people or objects, but we can use subject headings as our entities, too.

To view your data as a network diagram, get rid of your filters and then click on Graph. (Palladio is using the term “Graph” the way computer scientists do, to mean exclusively a network graph.)

Set the parameters of your network diagram.

wpid1806-media_1415774737046.png

In order to create a network diagram, you need to tell Palladio which two attributes of your data you want to explore. For Source, choose Genre 1; for Target, choose Genre 2. Now you can see which genres tend to co-occur in Cushman’s photographs. You can click and drag the nodes (the circles) to explore your diagram.

To highlight one kind of node in order to distinguish between the two, click on the Highlight checkbox. To size nodes according to the number of objects they represent, click on the Size nodes checkbox.

And you can filter your diagram in the same way you filtered your map and gallery.
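If you’re wondering what the diagram is built from, here’s a minimal Python sketch: every record contributes one link from its Genre 1 value to its Genre 2 value, and a node’s size reflects how many records touch it. The records are invented, and both the sizing rule and the decision to skip records with a blank genre are my assumptions, not documented Palladio behavior.

```python
from collections import Counter

# Hypothetical records; the last one has no second genre.
photos = [
    {"Genre 1": "Landscapes", "Genre 2": "Cityscapes"},
    {"Genre 1": "Landscapes", "Genre 2": "Cityscapes"},
    {"Genre 1": "Portraits", "Genre 2": "Cityscapes"},
    {"Genre 1": "Landscapes", "Genre 2": ""},
]


def build_graph(rows, source="Genre 1", target="Genre 2"):
    """Return (edge weights, node sizes) for a source-to-target network."""
    edges = Counter()
    node_sizes = Counter()
    for row in rows:
        start, end = row.get(source), row.get(target)
        if not start or not end:
            continue  # assumption: records missing either value add no link
        edges[(start, end)] += 1
        node_sizes[start] += 1
        node_sizes[end] += 1
    return edges, node_sizes


edges, node_sizes = build_graph(photos)
for (start, end), weight in edges.most_common():
    print(f"{start} -> {end}: {weight}")
```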

Share your work.

wpid1807-media_1415774914386.png

Unfortunately, you can’t embed interactive Palladio diagrams on webpages, but you can produce static images, either by taking a screenshot or clicking on the Download link, which allows you to download an SVG file. An SVG is an image, and you can post it or share it as you like.

Download your work

wpid1808-media_1415775046077.png

Palladio doesn’t save your data, but you can export your data model (the way you configured your data) and upload it again later. This will save you the trouble of configuring your dataset the next time you want to work with it.

To do this, click on Download. This will download a file with the extension .json. The next time you use Palladio, you can upload this file (on the Palladio homepage) in order to open your project where you left off.

Other cool things Palladio can do

media_1447918883898.png

Palladio has some other cool capabilities we haven’t discussed here. The image above shows one that I like: the ability to use other georeferenced maps (in this case an old railroad map from the New York Public Library) as basemaps. Here’s a tutorial on how to do that.

Other cool things you can do with Palladio:

  • work with multiple tables of data, connected relationally
  • export lists of data using the same filtering mechanisms we used for visualizations
  • create point-to-point maps
  • visualize spans of time with the timespan feature

Finished? Awesome! Now is a good time to see if anyone else in the room needs a hand. While you’re looking for people to help out, see if you can answer the following questions by visualizing the dataset:

  • When did Cushman take the most photographs?
  • Where did Cushman take the most photographs?
  • Can you connect travels or photographs with events in Cushman’s life? You can read about him here.
  • When and where did Cushman take photographs of landscape features, like trees, clouds, and the sky?
  • When and where did Cushman tend to take photographs of people?
  • Can you map Cushman’s travels to a particular road or interstate highway? How would you do this?
  • What other information would you need to fully understand this data? How might you obtain that information?

And check out the way in which my undergraduate students used the Cushman dataset as the basis for their final project!

Here and There: Creating DH Community

Thanks a million to the University of North Texas’s Spencer Keralis for inviting me to come speak at Digital Frontiers, a great conference in Northern Texas! I’m having an excellent time. Here’s the talk I gave today.

Around springtime, when universities are making offers for jobs that start in the fall, I tend to get a few similar emails. I’m junior enough that I know a lot of people just leaving grad school (whether it’s library school, a Ph.D. program, or a master’s program) and as universities continue to build DH centers, these people are getting snapped up to help spark DH activity elsewhere. So around May, they’re emailing me (and probably a lot of other people, too) to ask: Where do I start? What do I need to know?

I’ve been frank, as you may know, about what I think of taking someone fresh out of grad school, giving her a temporary gig, and expecting her to be the sole torchbearer for some amorphous DH initiative. In brief, it’s a bad idea, for a lot of different reasons. It’s not fair to the person you’re hiring, who will spend her entire tenure trying desperately to impress you at this impossible task so she can keep her job. And it’s not fair to your university community, which deserves continuity, focus, and the attention of someone who cares about the big picture.

But a number of people have good gigs that involve an element of community-building. And there are also a lot of people who’ve been working in libraries or other units for some time and are newly tasked with the responsibility of building interest in and capacity for digital humanities on their campus.

So for a while now, I’ve had a mental list of things that I tell my friends who are getting started on the job of starting a DH initiative on their campus. If at all possible, I try to do it over a drink. This work is not easy, and it’s very sensitive, and I’ve only learned what I know by making terrible mistakes.

So in a minute, I’ll give you that list of suggestions for building and sustaining a digital humanities community at a university.

How Did They Make That? The Video!

After I wrote my original “How Did They Make That?” post, on some common types of DH projects, I got to thinking about whether there might be ways to help people reverse-engineer digital projects on their own. I used a talk I gave at CUNY as an excuse to think of some of these ways. This presentation, a modified version of that talk, is the result.

Special thanks to my all-star cast: Rachel Deblinger, Moya Bailey, and Elijah Meeks; and to Matt Gold at CUNY for inviting me to give the talk.

Incidentally, I propose a drinking game: whenever you see my tiny Skype avatar taking a sip of coffee, take a drink.

Erratum: The Negro Travelers’ Green Book is a project of the University of South Carolina Libraries, not the University of Southern California, as I keep saying. Also, just a note that while I focus on the mapping elements of that project, they’ve also done a beautiful job digitizing the book itself.

Reflections on my digital materiality and labor class

DH150 on the roof of One Wilshire. Photo by Craig Dietrich.

I was really glad to get the chance to teach a special topics course on Digital Labor, Materiality, and Urban Space last quarter. I’ve been thinking about this class for years, and the syllabus is the (imperfect) culmination of lots and lots of reading and thinking.

In the event, the class was terrifically generative and fulfilling — for me, and, I hope, for the students. While the memory of the class is still fresh, I wanted to jot down a few notes about some new-ish (for me) elements I introduced into this class, and how well I thought they worked.


How Did They Make That? at CUNY, March 27, 2014

Here’s a list of links for my talk at the CUNY Graduate Center, for the audience members who’d like to follow along:

My original “How Did They Make That?” post (with Dot Porter’s Zotero library!)

UCLA Digital Humanities 101

Ben Schmidt, A Year of Ships

University of South Carolina Digital Libraries, Negro Travelers’ Green Book Map

Radu Suciu, Medical Case Studies on Renaissance Melancholy

Kieran Healy, A Co-Citation Network for Philosophy

Rachel Deblinger, Memories/Motifs

Stephanie Evans and Moya Bailey, SWAG Diplomacy

Stanford University Library, Kindred Britain

 

Commit to DH people, not DH projects

We’ve seen digital humanities in terms of “projects” since Roberto Busa indexed Thomas Aquinas. But lately it seems to me that the imperative to continuously produce something is getting in the way of how people actually think and grow. What if we viewed digital methods as a contribution to the long arc of a scholar’s intellectual development, rather than tools we pick up in the service of an immediately tangible product? Perhaps we’d come up with better ways of investing in people’s long-term potential as scholars.

It’s natural for DH centers, especially newish ones, to want to spread the word about digital humanities. But increasingly I suspect that issuing a faculty call for projects is not the way to do it.


Advanced Scroll Kit Techniques: The Parallax Effect

My Digital Labor, Urban Space, and Materiality class will be using the drag-and-drop framework Scroll Kit to create multimedia “device narratives.” Here’s the tutorial I’ve created to teach them to use Scroll Kit. You’re welcome to download these instructions as a PDF or as a Word document, in case you’d like to modify them.

This is my second Scroll Kit tutorial; the first covers Scroll Kit basics.
