network of eight trains

Eight Trains depicts a man’s regular Tuesday journey to and from work in rural Japan and the various people he encounters. The story is told entirely from his perspective: he leaves for work at 6:12 AM, takes four different trains to get there, spends five or so hours at work, then takes the same four trains back until he finally arrives home at 8:05 PM, where he drinks coffee and smokes cigarettes. We don’t know much about the narrator, but we learn a few of his quirks and mannerisms as he lets us in on his personal thoughts and introspections about the seemingly faceless people he comes across.

I had wanted to create a two-tiered system of nodes with the narrator at the center, nodes for the different locations extending outward from him, and nodes for the different characters extending outward from each location. But I couldn’t figure out how to do that in Google Fusion Tables, so I instead made two graphs to represent the same information.
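For what it's worth, the two-tiered layout I had in mind can be sketched as a plain edge list, which is the two-column shape most network tools accept. This is only a rough Python illustration, not something I built: the function name and the `sightings` list are my own assumptions, and the list holds just the two character connections mentioned in this post rather than the full cast.

```python
NARRATOR = "narrator"

# (location, character) pairs. Only the pairs named in the post are listed;
# a full version would need every character at every location.
sightings = [
    ("train 2", "vain schoolgirls"),
    ("train 6", "vain schoolgirls"),
    ("Moka station platform", "homeless man"),
    ("home", "homeless man"),
]


def build_two_tier_graph(sightings, center=NARRATOR):
    """Return edges: center -> location (tier 1), location -> character (tier 2)."""
    edges = []
    seen_locations = []
    for location, character in sightings:
        # Link each location to the center only once.
        if location not in seen_locations:
            seen_locations.append(location)
            edges.append((center, location))
        edges.append((location, character))
    return edges


edges = build_two_tier_graph(sightings)
for source, target in edges:
    print(f"{source} -> {target}")
```

Because a character who appears in two places points back to the same node name, the shared passengers would show up as one node with two incoming edges.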

screen-shot-2016-11-14-at-11-17-25-am

The graph above illustrates the different locations that the narrator frequents every Tuesday, including the 8 trains, his work, his home, and the Moka station platform. This graph clearly represents the story’s point of view: the narrator (and the reader) is on the inside, looking out at the world from different directions, taking different paths to get to each place. Trains 1 and 2 have bigger nodes to represent the larger number of characters noticed and discussed at each. However, this graph conceals the fact that trains 1 and 8, 2 and 7, 3 and 6, and 4 and 5 are actually the exact same train moving in opposite directions, and that each pair (more or less) has the same passengers on it.

screen-shot-2016-11-14-at-11-07-56-am

The graph above displays the different characters that the narrator sees during his day at all 10 different locations. While the previous graph could not depict that certain trains had the same passengers on them, this graph clearly can, as seen in the connection between the vain schoolgirls on trains 2 and 6, and the homeless man at the Moka station platform and in the narrator’s home.

While both graphs display the narrator’s whereabouts and the people situated around him, neither is able to convey his emotional environment (meaning here the ways in which he feels at each location because of the people who surround him). This limitation is a significant one because his changing sentiments throughout the day are the entire point of his story. For example, as he leaves work and moves toward his first train back home (and his fifth train of the day), he thinks “Returning is always sad […] To go is always to go somewhere; returning, you return to nowhere. That’s the way it is.” But his increasing boredom and fading fascination with the strangers cannot be felt by looking at these nodes. Therefore, the graphs fail to illuminate the narrator’s true connections to the people around him.

mapping london’s past

This week I chose to explore the project Locating London’s Past. The project allows users to search through a variety of records from six different databases in order to map different data types on five contrasting base maps: (1) a GIS-compliant version of John Rocque’s 1746 map of London, (2) the 1869-80 Ordnance Survey map, (3) a modern-day Google map, (4) a satellite view map, and (5) a blank map. Using different base maps to map the same data allows users to compare an eighteenth-century representation of London to the first OS map and to current-day Google Maps. Below are pictures of Google Maps, the OS map, and Rocque’s map.

screen-shot-2016-11-07-at-11-56-34-am

1869-1880 Model of London on Locating London's Past

John Rocque's 1746 Map of London on Locating London's Past

This project is unique because it provides users with empty maps, asking the users to map the data that they find relevant and important. To create my map, I searched through data from the Old Bailey Proceedings data set, which contains accounts of trials that took place at the Old Bailey courthouse. Users can find incredibly detailed and specific records from this data set, as the data types include the defendant’s home/crime location and gender, the victim’s gender, the offense category and subcategory, the verdict category and subcategory, the punishment category and subcategory, and the years that the case was on trial. I chose to look at records on those imprisoned for murder, and then I mapped the locations of these murders. Clicking on one hit (seen below in red) creates a pop-up that provides users with more information about the case and suggests links for further investigation. From here, users can add more data on top of the existing map, so I chose to add population densities (seen in green) to view the number of murders relative to how many people were living in each area.

screen-shot-2016-11-07-at-11-15-15-am

In Maps are Territories, David Turnbull says “a map is always selective […] the mapmaker determines what is, and equally importantly, what is not included in the representation.” This idea that maps are inherently subjective is especially true for this particular project, because a user’s final map reflects not only what data he or she found important, but also which records and which data sets the workers on this project found important AND what data types recorders in seventeenth- and eighteenth-century London found important to document in the first place. My final map, therefore, is a culmination of deliberate decisions from three parties, all completely separated by time and space, about what should and should not be included in a representation of London’s past. All possible maps consequently reflect the values and ontologies of governing bodies and archaeologists, of the scholars who built the website, and of the users themselves. In my map pictured above, for example, the darker green areas indicate larger population sizes. The statistics on these population numbers come from the Bills of Mortality, or burial records. Clearly, the workers on this project found old burial records to be a reasonable, or at least the best-known, measure of population numbers. Maybe this is because, from their perspective, every burial = death, so every death = burial. Or maybe it’s because, in 17th and 18th century London, every single dead person was buried. But for cultures today that participate in, say, cremation, a death would not automatically = burial, and their differing ontologies would create a disconnect for the project’s mapping abilities.

 

Data Visualization: Poverty Statistics

I explored the data set on poverty statistics, found here, which details the birth and death rates, infant mortality rates, life expectancy rates of men and women, and GNP of 97 countries in 1990. One of the first things I noticed was that each country had been assigned a specific “region,” indicated by a number 1-6. Eastern European countries, such as Albania and Romania, were assigned region 1. South American countries, such as Brazil and Colombia, were assigned region 2. Region 3 consisted mostly of Western European countries such as France and Germany, but also included, interestingly, North America (U.S.A. and Canada) and Japan. Middle Eastern countries such as Turkey and Israel were assigned region 4, and Asian countries, excluding Japan, made up region 5. Lastly, region 6 contained African countries, such as Kenya and Uganda.

I predicted that regions 3, 4, and 5, which covered Western Europe, the Middle East, Asia, and North America, would likely have the highest GNP, while regions 1, 2, and 6, which covered Eastern Europe, South America, and Africa, would likely have the lowest GNP. I made many different visualizations to show the GNP of each region, but ultimately decided to use Raw to create a scatter plot. While all of the other visualizations I made were “cooler” looking, I chose this one because Nathan Yau said it was most important to choose a visualization that had the right visual cue. In the plot below, the GNP is located on the y-axis and the regions are located along the x-axis.

screen-shot-2016-10-24-at-8-57-10-pm

After seeing this, I took it a step further and estimated that the countries with the highest GNP (regions 3, 4, and 5) likely had the highest life expectancy rates, and that the countries with the lowest GNP (regions 1, 2, and 6) likely had the highest birth, death, and infant mortality rates. I used Google Fusion Tables to create data visualizations to see if my predictions were correct.

screen-shot-2016-10-24-at-9-15-27-pm

The graph above shows the average birth rates, death rates, and infant mortality rates across the regions, with the average rates located on the y-axis and the regions located along the x-axis. The visualization shows that region 6, which contains the countries with the lowest GNPs, clearly has a higher average infant mortality rate and a considerably larger average birth rate, but does not have a notably larger average death rate. In fact, the regions do not vary much in average death rate. Region 5, which has the third-highest average GNP, actually has some of the highest birth, death, and infant mortality rates, which I was not expecting.
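The per-region averages behind a chart like this come from grouping the countries by region number and averaging each column. Here is a minimal Python sketch of that step, assuming the data has already been read into rows; the country labels and every number in `rows` are invented placeholders, not values from the actual data set, and `average_by_region` is a name I made up.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows: (country, region, birth rate, death rate, infant mortality).
# These values are made up purely to show the grouping step.
rows = [
    ("Country A", 3, 12.0, 10.0, 8.0),
    ("Country B", 3, 14.0, 8.0, 6.0),
    ("Country C", 6, 46.0, 14.0, 100.0),
    ("Country D", 6, 44.0, 16.0, 110.0),
]


def average_by_region(rows):
    """Map each region to (avg birth rate, avg death rate, avg infant mortality)."""
    grouped = defaultdict(list)
    for _country, region, birth, death, infant in rows:
        grouped[region].append((birth, death, infant))
    return {
        region: tuple(mean(column) for column in zip(*values))
        for region, values in sorted(grouped.items())
    }


averages = average_by_region(rows)
for region, (birth, death, infant) in averages.items():
    print(f"region {region}: birth={birth}, death={death}, infant mortality={infant}")
```

Averaging like this weights every country equally regardless of population, which is worth keeping in mind when comparing regions.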

screen-shot-2016-10-24-at-9-17-37-pm

The graph above shows average male and female life expectancies across regions, with the regions located on the y-axis and the ages located along the x-axis. This graph also makes region 6 stand out, but this time for its low life expectancies. Region 3 has a noticeably higher life expectancy than the rest of the regions, but isn’t too far ahead of region 1. This also surprised me, because region 1 has the second lowest GNP.

While looking at the data set, I assumed that I would be able to guess which countries had which rates with fairly high accuracy, but after looking at the data visualizations I can see that the lines are not so clearly drawn. It is clear, however, that in 1990, those living in Western Europe, North America, and Japan had much higher life expectancy rates and far lower death, birth, and infant mortality rates than those in other countries, while those living in Africa had almost the exact opposite.

Data Analysis: Gender Breakdown by Department

Gender Breakdown of City Works by Department documents the percentage of male and female full-time employees in 2015 across the various Departments of Los Angeles, including city planning, fire, and sub-departments of public works, such as engineering and sanitation. The data set also reports the employee count and total payroll per department, the number of males and females in each department, and what percentage of the department is male and female. In addition, it breaks down the male and female total salary within departments, the average salaries of males and females within departments, and the percentage of the payroll given to males and to females.
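Most of these columns can be derived from four raw numbers per department: the male and female head counts and the male and female total salaries. The sketch below shows that arithmetic in Python. The `Department` class, the `summarize` function, and the example figures are all hypothetical and are not taken from the Controller's data; the sketch also assumes every department has at least one male and one female employee.

```python
from dataclasses import dataclass


@dataclass
class Department:
    name: str
    male_count: int
    female_count: int
    male_payroll: float    # total salary paid to male employees
    female_payroll: float  # total salary paid to female employees


def summarize(dept):
    """Derive the percentage and average-salary columns from the raw totals."""
    total_count = dept.male_count + dept.female_count
    total_payroll = dept.male_payroll + dept.female_payroll
    return {
        "percent_female": 100 * dept.female_count / total_count,
        "avg_male_salary": dept.male_payroll / dept.male_count,
        "avg_female_salary": dept.female_payroll / dept.female_count,
        "percent_payroll_female": 100 * dept.female_payroll / total_payroll,
    }


# Invented figures: a department that is mostly female but pays men more on average.
example = Department("Hypothetical Dept", 30, 70, 3_000_000.0, 5_600_000.0)
summary = summarize(example)
for column, value in summary.items():
    print(f"{column}: {value:,.1f}")
```

Comparing the share of employees with the share of payroll in one row makes it easy to spot departments where the two diverge.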

This dataset was created by the Los Angeles City Controller’s Office. I believe Wallack and Srinivasan would identify this dataset’s ontology as a comparison between employee gender and salary within and between government departments. This data set is very easy to navigate, and there’s a tool guide that allows viewers to make data visualizations for even easier juxtaposition and comparison.

screen-shot-2016-10-17-at-10-12-20-am

The line graph above, for example, shows average female salary in navy and average male salary in orange across the various departments. This data is very straightforward: on average, men make more money than women in 37 (out of the 40) departments, with women making more only in the Library, Recreation and Parks, and Public Works – Street Lighting Departments.

On the ground level, grassroots coalitions and social justice organizations, particularly feminist advocacy groups, would find this data very useful. Pulling up these statistics could have a big impact on arguments for women’s rights or affirmative action. Seeing as Los Angeles is one of the most liberal and diverse major cities in California and in the entire United States, one could use these numbers to argue that there are still massive inequalities in the workforce today. At a higher level, this ontology also makes sense for policy makers and those in the City Planning and City Ethics Commission Departments who: (1) (hopefully) want equal and just opportunities for women, and (2) want to appear as though they are working towards equal and just opportunities for women.

While the numbers state the “what” in this gender breakdown, there is no “why” to explain the reasons behind them. In the fire department, for example, 92.8% of the full-time employees were male whereas only 7.2% were female. I assume this disparity has less to do with discrimination and more to do with the fact that fewer women want to be firefighters. Nevertheless, this could certainly lead to further social science analyses to explain this kind of information that has been left out of the data set.

If I were to start over with data collection, I would attempt to describe the ontology of males holding leadership positions at higher rates than females. In the current data set, in the City Administrative Officer Department, almost 70% of the employees are female, and yet the average female salary is about $34,000.00 less than the average male salary. This is (also hopefully) because males hold more of the leadership/managerial roles than females in this department, and not because males are making more money for the same work. By including columns stating how many males/females in each department hold leadership positions, and how many males/females in each department make over/under $50,000.00, the spreadsheet could produce different narratives based on a different ontology described by the data.

Exploring the George Meyer Simpsons Script Files

This week I chose to explore the Finding Aid for the George Meyer Simpsons script files, which is held at the Charles E. Young Research Library in UCLA’s Performing Arts Special Collections. The finding aid details the organization of the drafts of scripts for seasons two through six of The Simpsons penned by George Meyer, writer and producer of the longest-running animated television series. The 78 boxes also contain script annotations and other story notes recorded by Meyer.

The records in this inventory have been gathered by archivists in the hopes of documenting all the bits and pieces of available, relevant, and recorded data that describe the initial visions behind the second to sixth seasons of The Simpsons. The finding aid describes the pieces of data, but without a narrative to string the data together, we are left without any sort of storyline to this history. Given the extensive biography of George Meyer in this finding aid, and seeing as the collection documents the man who wrote the show more than the show itself, one could imagine a narrative of a Harvard graduate who found comfort in writing for television comedies, switching from show to show and building up a large network of people “in the business” until he finally landed a writer/producer spot on The Simpsons. While the documents themselves are neutral in nature, they were deliberately judged “relevant” by the scholars who added them to the collection in the first place. This bias, along with the potentially varying interpretations of the documents, could lead viewers to imagine different narratives.

It would be difficult for someone who had no prior knowledge of the television show to come up with a narrative based solely on the records in this collection. First, the scripts are organized alphabetically in boxes based on script titles. This makes it much harder to trace any sort of linear timeline for a narrative. A sequential arrangement of files would also have provided insight into Meyer’s personal growth as a writer and comedian, since the collection attempts to describe him through his scripts in the first place. While the records do list the date that the documents were written, grouping the files according to a timeline rather than alphabetically would give scholars an easier and more efficient way of searching and examining the collection. Second, the descriptions of files in this finding aid mostly consist of the title of the script, the date it was written, and the author(s) of the work. Without additional notes or categories for the files (e.g. types of comedy, pop-culture references, etc.), it is much harder to analyze the data.

screen-shot-2016-10-10-at-1-34-25-pm

Deconstruction of Early African American Film

This week, I chose to deconstruct the DH project Early African American Film. The database describes and discusses the history of silent race films from a vast gathering of information drawn from both primary and secondary sources. The team defined a silent race film as one that was made up of African American cast members, was produced by an independent, African American-owned production company, and was advertised as a race film in the press. The project details information not only on the films, but also on the actors, directors, production companies, and paraphernalia created by the race film industry, including posters and theater programs. The team that worked on this database intended to demonstrate the craft behind African American silent filmmaking while also providing insight into the community as a whole in order to promote awareness of this era of film history.

The project only included data on films that were created between 1909 and 1930 and were intended for African American audiences. The data was gathered from various African American film collections and archives located in the United States that both documented the history and culture of African American film traditions and contained rare moving images and recordings of silent reels. The team also amassed data from scholarly essays and texts that traced the emergence of African American filmmaking and analyzed the filmography and role of race films in the early 1900s.

The team assembled a database of all of the people that were associated with the silent race film industry. As the connections between the people grew stronger and more complex, the team began to process the data using visualizations that (1) exhibited the number of silent race films and the year that they premiered, (2) showed a network of people associated with the films, including actors, directors, writers, and others that were somehow involved in the industry, and (3) displayed the locations of African American production companies and the year in which they were founded.

screen-shot-2016-10-03-at-1-40-33-pm
The data that detailed the number of race films was presented as a histogram, showing that the peak of race film production (51 documented premieres) took place in 1921.

screen-shot-2016-10-03-at-1-43-39-pm
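A histogram like this one boils down to counting how many films premiered in each year. The following Python sketch shows that counting step under my own assumptions: the `premiere_years` list is a tiny made-up sample (the real project documents far more films), and the text bars are just a stand-in for the project's chart.

```python
from collections import Counter

# Placeholder premiere years, one entry per film; not the project's real data.
premiere_years = [1919, 1920, 1921, 1921, 1921, 1923]

counts = Counter(premiere_years)

# The peak is the year with the most premieres.
peak_year, peak_count = max(counts.items(), key=lambda item: item[1])

# Print one bar per year, including years with no premieres.
for year in range(min(counts), max(counts) + 1):
    print(f"{year} {'#' * counts[year]}")

print(f"peak: {peak_year} with {peak_count} premieres")
```

Iterating over the full range of years, rather than only the years that appear in the data, keeps empty years visible as gaps in the histogram.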

The network of people in the silent race film industry was displayed in two different diagrams. Connections were shown by an edge if two people worked together on at least one film, and were darker if they worked together more than once. Nodes, which signified individual people, were larger if one person had many connections with other people in the industry. The second diagram indicated the ways in which people were connected, displaying which film connected any two people together.

screen-shot-2016-10-03-at-1-44-17-pm

screen-shot-2016-10-03-at-1-44-23-pm
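The rules behind these two diagrams can be written out fairly compactly. In the Python sketch below, an edge exists between two people if they share at least one film, the edge weight counts shared films (what the darker lines show), the degree counts each person's distinct connections (what the larger nodes show), and the film list per edge corresponds to the second diagram. I don't know how the project team actually computed their network, so this is only my reading of the description; the `credits` dictionary and all names in it are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical film credits: film title -> people who worked on it.
credits = {
    "Film X": ["Person A", "Person B", "Person C"],
    "Film Y": ["Person A", "Person B"],
    "Film Z": ["Person C", "Person D"],
}


def build_network(credits):
    """Return (edge weights, films per edge, connections per person)."""
    edge_weights = Counter()
    edge_films = {}
    for film, people in credits.items():
        # Every pair of people on the same film gets an edge.
        for pair in combinations(sorted(set(people)), 2):
            edge_weights[pair] += 1
            edge_films.setdefault(pair, []).append(film)
    degree = Counter()
    for person_a, person_b in edge_weights:
        degree[person_a] += 1
        degree[person_b] += 1
    return edge_weights, edge_films, degree


edge_weights, edge_films, degree = build_network(credits)
for pair, weight in edge_weights.items():
    print(pair, "films together:", weight, edge_films[pair])
for person, connections in degree.items():
    print(person, "connections:", connections)
```

Sorting each pair before counting means that "A worked with B" and "B worked with A" land on the same edge.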

The data that presented the locations of the African American production companies was exhibited in one moving and one still time map. In the moving time map, production companies appear as pulsing dots as the time key moves year-by-year in their geographical location on a map of the United States, while the still map is one image that shows the expansions of the companies.

eaa-time-map-768x566