Course blog

CAPT[A]VATING INTERPRETATION WITH A COVER

Credit: invisibleaustralians.org

Don’t judge a book by its cover.

This platitude resurfaces more and more with covers such as the one above.  The title of the “experimental browser” evokes a notion of  what if: “the real face of white australia.”  That is, what if the history of an established (super)nation were to be exposed and recreated with data mining and excavation from the past that tells a different story of its people?

I will return to the humanistic interpretation shortly, but first I would like to discuss the visual itself.  The wall of faces we see was produced with a face detection script.  According to Tim Sherratt, the collection was compiled from a harvest of 7,000 photographs that were then handed to the script.  Current applications such as iPhoto and Google already perform face detection on photographs and make it simple to use.  For Sherratt, the experiment of manually connecting the dots and displaying faces fascinated him for reasons other than just providing data in a quantitative manifestation: “you look at their faces and you simply want to know more. Who are they? What were their lives like?”

So what does the visual do for us at first glance?  It triggers a question: who are they?  And, what does Australia have to do with it?

Well, as we know, many nations were once inhabited by people who do not necessarily look like the majority of their inhabitants now.  By showcasing faces different from the ones we are accustomed to seeing, the site generates inquiries about the nation’s past.  Judging from the website’s cover, we can see ethnic diversity ranging across Indonesia, the Philippines, Papua New Guinea, Asia proper, and other neighboring regions.  These faces do not exemplify exclusive ties to one nation.  The faces are portraits of hybridity.  Which takes me to this next cover:

Credit: chandraprasad.com

The cover above came to mind because faces dominate the space.  Chandra Prasad’s anthology showcases an array of fiction writers who depict themes, characters, and notions of hybrid identity.  A work from one of my favorite authors, Ruth Ozeki, beautifully tells the story of a character who identifies as a being from two races: white and oriental.  Now, narratives aside, the two covers I put forth mirror one another. However, one is used as an instrumental tool incorporating data/capta, while the other employs the cover solely to highlight and trigger interpretation.  As I mentioned before, the former cover still evokes inquiries that relate to interpretation based solely on its surface.  Whether or not one cover is better than the other is not the basis of debate or inquiry.  Rather, the point is to illuminate how both fields, the humanities and the sciences, are working together to display visuals for different purposes, yet still produce similar ways of interpreting them.

The Real Face of White Australia and Ancestry.com


When I attempt to explain digital humanities to my friends who ask about it, I simply say it is learning how to bring the humanities into the modern, digital world.  I tell them it is a way to represent history, literature, and art, among other things, on digital platforms like websites.  “The Real Face of White Australia” is a great example of a digital humanities project I might show my friends, because it is clearly presented and brings to light a part of history that is not commonly known.

Before exploring this website, I had always thought of Australia as a country that was home to Aborigines, much like our Native Americans, and then settled by white, British settlers. I had thought there must have been some cultural diversity in Australia of course, but it never occurred to me that there would be a large Asian population, despite knowing how close it is to Asia. One of the things that makes this presentation so interesting and compelling is that instead of presenting merely the statistics about what races were living in Australia, it presents images of the documents themselves.  Being able to click on and see an actual document from so long ago is something really special about digital humanities projects.

Another website that I believe represents the field of digital humanities well, and one of my favorite examples to give people when I explain the field, is the well-known Ancestry.com. Much as the historians researched the historical records in Australia to learn about its diversity, I have been able to research my own family history through records on Ancestry.com. I did a free trial of Ancestry.com today to test it out and was very impressed by all the information I could see after just entering my email.  For example, I found the record of my grandfather and his family in a census.  Having the image of the handwriting really brings the history to life.  It would have been very cool if pictures were also included on Ancestry.com, and I am sure that is possible after seeing how pictures were used on “The Real Face of White Australia”.

Projects like these are some of my favorites because I am so interested in history, but I also appreciate how digital humanities can benefit the public when shared on the internet, and not just the people who study the specific topic.  With digital humanities, anyone who is even slightly interested in the topic can benefit from the project.

Can digital humanities change history?

Andrew Smith’s article “The Promise of Digital Humanities” starts with a doubtful question: Sure, data mining through the machine analysis of text can potentially close the gap between the humanities and the hard sciences by “allowing us to subject historical texts to quantitative analysis”; but can it actually extract impactful information? The criticism stems from the large investments that have been made in the digital humanities technology that makes data mining possible, and from the unsurprising, “we already knew that” results of research done at multiple universities. However, among the less-than-extraordinary findings, some research projects do manage to find information that “fundamentally undermines the scholarly consensus about a particular history topic”, and William Turkel’s project, Data Mining with Criminal Intent, was just that.

This umbrella project, which required loading 127 million words into a database for data mining, included The Old Bailey project, for which the researchers digitized and transcribed records of 198,000 trials that took place between 1674 and 1913 in The Old Bailey, the central criminal court in London. The result was a surprising pattern that historians had not previously identified:

  • There was an unusual increase in the number of guilty pleas and very short trials after 1825 (by 1850, one-third of all cases involved guilty pleas)
  • In the 1700s there were nearly equal numbers of male and female defendants, but in the 1800s men outnumbered women by nearly 10 to 1

These findings contradict the general historical understanding that the mid-1700s was “the turning point in the development of the modern adversarial system of justice in England and Colonial America, with defense lawyers and prosecutors facing off in court”.

Turkel’s project supports the hypothesis of digital humanities researchers that data mining through machine text analysis is the key to digging through history again for more data-backed findings. My conclusion is that these are the projects in which the fundamentals of research play a critical role: creating a hypothesis that provides value to scholars and our society; doing extensive secondary research on the topic, the thesis, and its possible outcomes; and creating a specific guideline for performing the research efficiently, so as to minimize the cost in time, effort, and especially funding if the hypothesis is proven wrong. The growing field of digital humanities needs more examples like The Old Bailey project in order to receive more attention and funding, and each project can make a huge impact on the field.

 


Humanities Approaches to Graphical Display

For many years now, data has been placed into simplistic visualizations for its readers to view. The problem with this is the word simplistic. Although it is useful for readers and viewers to be able to see a statistical data set in just a chart, the chart leaves a lot out. There is no way that a massive, complex research project can be boiled down into just a few charts or pie graphs. And here lies the problem with visualized data: it can be misleading.

[Image: terry-schiavo-misleading-graph.jpg]

 

Even when research and complexity are not involved, graphs are often misleading in how they present public opinion or statistics about our country. Take, for example, this graph displaying how many Republicans, Democrats and Independents agreed with a court ruling. In the graph, the bar for the Democrats appears to be about three times larger than the bars for the Republicans and the Independents. This makes it seem that three times more Democrats supported the decision than either the Republicans or the Independents. However, this is not the case at all, because the left axis doesn’t start at zero. Instead, it only goes from 50 to 64 percent, which makes any deviation seem much larger. Had the graph’s axis begun at zero, the difference in support would appear to be much smaller. This is what I mean by data misrepresentation: people who make graphs can alter them so that they display very different meanings than the actual data represents.
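The arithmetic behind the truncated axis can be sketched in a few lines of Python. The two percentages below are hypothetical, chosen only so that the effect matches the "about three times larger" impression described above; they are not read off the actual graph, and the function name is my own.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights for values a and b
    when the vertical axis begins at axis_start instead of zero."""
    if axis_start >= min(a, b):
        raise ValueError("axis must start below both values")
    return (a - axis_start) / (b - axis_start)


# Hypothetical percentages, for illustration only.
high, low = 62.0, 54.0

truncated = apparent_ratio(high, low, axis_start=50.0)
honest = apparent_ratio(high, low, axis_start=0.0)

print(f"axis starting at 50: taller bar looks {truncated:.2f}x as high")
print(f"axis starting at 0:  taller bar looks {honest:.2f}x as high")
```

With these made-up numbers, an eight-point gap is drawn as a threefold difference once the axis starts at 50, while the same bars drawn from zero differ by only about 15 percent.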

While this may not seem like a large problem in society, it can turn out to be one. When reliable news stations release graphics, people tend to accept them without question, even when the data may be completely misrepresented. Most people in this day and age don’t question what they are being told; they just accept it, especially if it comes from the government or a source that they have relied on and trusted for a very long time. With new technology it is important that people actually pay attention to what they are looking at.

 

“Misleading Graphs: Real Life Examples.” Statistics How To. N.p., n.d. Web. 02 Nov. 2014.

Drucker, Johanna. “Humanities Approaches to Graphical Display.” DHQ: Digital Humanities Quarterly. N.p., n.d. Web. 03 Nov. 2014.

Week 5–Invisible Australians

This week’s reading features a web-based data visualization that accompanies the project Invisible Australians. A research collaboration between Dr. Kate Bagnall and Dr. Tim Sherratt, Invisible Australians was created to identify and reveal the true face of the so-called White Australia during the early 20th century. During this time in White Australia, thousands of non-Europeans residing in the country faced discriminatory laws and policies that denied their rights as Australians. Although shunned and marginalized as minorities, these non-Europeans, including Chinese, Indians, Japanese, Syrians, and Malays, were ironically well-documented through government records. Bagnall and Sherratt have taken advantage of these extensive records in order to develop a database intended to commemorate and identify the thousands of non-white Australians who made up the true face of “White Australia”.

While browsing The Real Face of White Australia, I also kept in mind Johanna Drucker’s distinction between capta and data. She emphasizes the need for humanists to utilize conceptual tools like capta and stray away from tools and methods developed from fundamentally epistemological disciplines. This distinction is definitely a core concept to understand when approaching any kind of archived information. When data is presented in a way to prove a certain opinion or thesis, this is essentially converting this hard, cold data into capta. Capta, in Drucker’s definition, carries a constructed interpretation of the data it came from. The Real Face of White Australia is a great example of humanizing data and presenting it in a way to prove a point. The simple composition of “the real face of white australia” at the header of the browser, accompanied by a grid layout of all the collected images of these discriminated-against Australians, is for the most part self-explanatory. A visitor first sees the header and is immediately presented with Bagnall and Sherratt’s findings. With the documented identities that Bagnall and Sherratt have found, they are ultimately advancing the idea that the true face of Australia was formed by the non-white residents of that time.

Drucker’s distinction between capta and data is a definite step in the right direction for how people typically analyze so-called data visualizations. I feel that we have come to a point where we give the term data too much authority. Too many times have we trusted a visualization that claims to be based on a found set of data, only to later find that certain kinds of data were omitted, duplicated, etc. We need to realize that data is extremely vulnerable to being skewed and constructed to fit the data miner’s own opinionated agenda or perspective.

One project that came to mind was photographer Giles Revell and graphic designer Matt Willey’s collaborative project called “Photofit: Self-Portraits”. Using a now outdated and disregarded Penry Facial photofit kit from the 1970s, which was used for constructing police sketches of suspects, the creative duo called upon a number of test subjects to use the kit to compose their own identikit image purely from memory. These participants put together the tactile components of the kit, which include paper strips of various facial features. The results of these participant interviews and Photofit assemblies reveal the complexity of the participants’ relationship with their self-identity, suggesting that their distorted compositions of themselves show more about each subject’s personality than a straightforward photograph would.

Here the photograph could be considered the straightforward data, and the test subjects’ photofit compositions of themselves are the capta. Each participant’s own personal perspective on their appearance converts the data of their facial topography into the capta of what they view themselves as.

http://gilesrevell.com/files/photofit2.pdf

Week 5: The Importance of Presentation for Text Analysis


In regard to “The Real Face of White Australia” project by Tim Sherratt, I saw a few of my fellow classmates discuss categorization, and how it includes not only ideas, places or things, but also people. This is an aspect that I also find incredibly interesting, but I decided for this particular blog post to focus on a different aspect: the power and importance of choosing the right format to present your data, especially text analysis.

Upon clicking the link to Sherratt’s site, a collage of faces appears with the title of the project as the header text. When you click on a photo, you can see the form for each individual that immigrated to Australia. The site also includes only two exhibits, ‘home’ and ‘about’. After scrolling and navigating the site even more, I concluded that this was an incredibly powerful way to present all of this document information. What if the creator of this site had simply listed the immigrants’ names, and you had to click on each name to see each photo? For the most part, I think people can agree that another format for this site would not have been as effective. It would not have been as engaging for the user. And the creator chose this format for specific reasons.

This made me realize just how thoughtful of a process it is when creating the “face” of your project, the presentation. Especially for a digital humanities project on text analysis, text can be a turn-off for readers, so how can we keep them interested? These are things all of us working on a digital humanities project have to consider. We have to go through the pros and cons of what to include on our sites, and take ourselves away from the project to consider what may be clear to the creators, but not be clear to our users.

“The Real Face of White Australia” was very engaging and it definitely kept me interested. In fact, it made me want to learn more. Now I can assume that Sherratt wanted to keep the site at its bare minimum where it gave me the information I would need, and would leave me wanting to learn more. At the same time, as a user, I wish there was more on the site so that I didn’t have to click on more links to learn more or search for more information myself. I wish there was more information or maybe even another form of categorization on the site that would allow me to filter and see specifically where each immigrant was from.

I guess this just puts into perspective that the person creating a project like this is doing it for a greater purpose, especially if the information is being published online publicly. It is important to keep the user in mind, to help them understand what the project is about and what its purpose is, and to give them the information they may need to fully understand it, especially if they know nothing about the subject matter.

Week 5: Visual Connections

I have always been a visual learner. Flash cards with bright colors and images were not only beneficial for my academic pursuits, but were an outlet for me to creatively express myself through my studies. I pursued my artistic abilities early in high school through the outlets of photography and music, further extending myself into the realm of the arts, while pushing farther and farther away from the left side of my brain. In reading Humanities Approaches to Graphical Display, I was reminded of the growing disparity between humanities scholars and the “scientific world.” Although it seems that collaboration between the two fields could lead to greater findings and realizations, the fact of the matter is that the learning styles and ontologies of these fields are so different because of a fundamental dissimilarity in the basics of understanding capta. The creation of data visualizations attempts to bridge this gap and provide a balance of art and algorithm, so that both kinds of end-users find this data a helpful tool in scholarly research.

Constructing data visualizations allows individuals to interpret data in an alternative form that has the potential to heavily influence its users. Data is unbiased in nature; however, the inherent assumptions humanities scholars make when constructing these graphs and visualizations place a considerable bias on the interpretation of the result. Regardless of the unavoidable biases, data visualizations help to create more context and evidence for a multitude of purposes, from historical studies to competitive analysis to even determining the factors that attract individuals to certain food truck locations (the latter is a real example from our final project).

After looking at the various visualizations provided in this week’s readings, I thought back to a friend of mine who is currently a student at Stanford in the Product Design & Engineering school, as well as an active member of the Design for America program. Her website promotes her unique thinking process, her design skills, her marketing and communication skills, her successful work experience, and of course, her charming personality. I selected a segment of her website that highlights her design thinking and saw a variety of clean and aesthetically pleasing visualizations that she had created for a project about the homeless in California. Not only has she created visualizations for various projects and job experiences, but she also creates logos, pamphlets and other designs for her other extracurriculars. I am beyond impressed with Katie’s ability to communicate with people so clearly and effectively through the medium of data visualizations and design, and she gives me hope that the world of humanities will continue to become a more prominent and important part of our society.

 

Datamining and Criminal Intent Project

  

Looking up court cases can get very confusing and overwhelming when you are trying to find the right case. As a political science major, I can relate: when looking up cases, it is nice to have a concrete, reliable website that can help guide you to specific supporting cases. This week’s reading guided me to a website that does just this.

     This website, criminalintent.org, goes through a tutorial of a new project that uses data mining to bring together three online resources: Old Bailey Online, Zotero, and TAPoR. Old Bailey Online is an online resource that uses controlled vocabulary to search through 127 million words of trial accounts. Users can query this website through a dedicated API. Zotero is an information management tool, and TAPoR uses analytical tools like Voyeur.

  When I came across the word datamining, I thought it could have something to do with sorting data in a deeper, more analytical way. I decided to look up the definition and landed on a website by the UCLA Anderson School of Management. It stated that data mining “is a process of analyzing data from different perspectives and summarizing it into useful information.” They describe it as a way to look at data through different dimensions and angles to find patterns and trends.

   The Criminal Intent website is unique because it does just this. This new, user-friendly website allows users to easily look up information on any case by keyword, gender, date range, verdict, crime, etc. The website will automatically filter the results based on your selections and tell you how many cases were found and how many “hits” there were for the terms you selected.
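To make the idea of filtering by several fields and counting "hits" concrete, here is a minimal Python sketch. It is not the Old Bailey API and does not use its vocabulary: the `Trial` record, the `search` function, and the three sample trials are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical record layout, not the Old Bailey data model.
    year: int
    gender: str
    verdict: str
    text: str


def search(trials, keywords=(), gender=None, verdict=None, years=None):
    """Return (matching trials, total keyword hits) for the chosen filters."""
    terms = [k.lower() for k in keywords]
    matches, hits = [], 0
    for t in trials:
        if gender and t.gender != gender:
            continue
        if verdict and t.verdict != verdict:
            continue
        if years and not (years[0] <= t.year <= years[1]):
            continue
        words = t.text.lower().split()
        count = sum(words.count(term) for term in terms)
        if terms and count == 0:
            continue  # keywords were given but none occur in this trial
        matches.append(t)
        hits += count
    return matches, hits


# Invented sample data, for illustration only.
trials = [
    Trial(1750, "female", "guilty", "the prisoner took a silver watch"),
    Trial(1830, "male", "guilty", "pleaded guilty to theft of a watch and a second watch"),
    Trial(1835, "male", "not guilty", "no evidence of theft was offered"),
]

found, hits = search(trials, keywords=["watch"], gender="male")
print(len(found), "case(s),", hits, "hit(s)")
```

Each keyword is matched as a separate word, so a phrase such as "international law" passed through `"international law".split()` would be searched as two independent terms in this sketch.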

  What I found interesting about this website is how I can connect it to the International Law course I am taking at UCLA. For me, though, this is where it started getting confusing and less specific. For example, if I typed international law into the keyword text box, it would search separately for international and then for law. This is when I learned that the website is directed more at users looking only for cases pertaining to London between 1674 and 1913. However, as I continued playing with the search engine on the Old Bailey API website, I was able to create a new Voyeur Tools corpus from the result set of female defendants by clicking on “send to Voyeur”.

  This was the most fun part of my adventure through this website. There are about 20 different graphical tools available in Voyeur to visualize your data. The defaults are Cirrus (a word cloud visualization), Summary (an overview of the corpus including word counts and aggregate trends) and Reader (a scalable text reader that can be used to scroll very large documents). You can change or add more visuals to sort your data. It is possible to export a corpus by clicking on the “skin export” icon and choosing a skin builder. This is a very interactive and personal way of sorting case data.
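A word cloud like Cirrus sizes each word by how often it occurs, so the underlying computation is a word-frequency count. The Python sketch below shows that counting step only; it is not Voyeur's code, and the stopword list and sample sentence are my own assumptions.

```python
import re
from collections import Counter

# A tiny, assumed stopword list; real tools use much longer ones.
STOPWORDS = {"the", "a", "of", "and", "to", "was", "in"}


def top_terms(text, n=5):
    """Return the n most frequent non-stopword terms with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)


# Invented sample text, for illustration only.
sample = "The prisoner stole a watch. The watch was found in the house of the prisoner."
print(top_terms(sample, 2))
```

The terms that come out on top are the ones a word cloud would draw largest.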

 

http://www.oldbaileyonline.org/obapi/

http://criminalintent.org/getting-started/

http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm

Week 5: Infographics and Data Visualizations

“Infographics of infographics of infographics…”

Like many other new digital humanities minors, I struggled in the beginning to give other students a precise definition of what the subject entails. Yet in Smith’s blog post, there seems to be a general consensus that the greatest potential in the field lies in the analysis of text. I realized that I, too, had often used this as an example of using digital tools to investigate and present data in a different way for clarity and research purposes.

Because this is a relatively new field of academic scholarship, there are bound to be projects that use these digital tools to analyze evidential texts only to prove the obvious. I believe that some critics are too harsh and should give the field room to grow and time to flesh out creative ways to approach, mine, and present data. As a proponent of a more humanistic approach to graphical display, Johanna Drucker raises awareness about approaching data differently, as capta, which is “taken” rather than “given.” Thus, capta allows for a more interpretive aspect to making visualizations, whereas data that is given is recorded and observed in a more fixed manner.

In the article, the differences between figures 16 and 17 are not only found in the purposes of those charts, but also in the level of complexity and what they’re set out to measure. Drucker illustrates that there is a lot more information visualization can display as soon as one uses a more humanistic approach to analyzing the capta. It is important to address multiple questions when attacking an issue and analyze the data through all sorts of dimensions such as time and space as well as other categories such as gender and age.

These articles reminded me of another popular type of visualization, the infographic, which is making its rounds in social media and being deployed as a marketing tool. In this blog post, Vincey adamantly criticizes the public’s habit of using the terms infographic and data visualization interchangeably. The blog makes some interesting points, especially that some infographics pile on so much information, emphasizing qualitative data over quantitative data, that the message gets lost in the chaos. Although I do agree with some of his opinions, the bottom half of the post seems to be arguing for more simplicity and clarity when compiling data. This alludes to the idea that there is a fuzzy boundary between what is too much, too little, or just the right amount of qualitative information to display in a data visualization.

Works Cited: http://insights.qunb.com/why-we-hate-infographics-and-why-you-should/

http://pastspeaks.com/2011/08/21/the-promise-of-digital-humanities/

http://digitalhumanities.org/dhq/vol/5/1/000091/000091.html

The National Sex Offenders Registry Database

Primary Source: familywatchdog.us/default.asp

Reading through the Criminal Intent and Old Bailey projects got me interested in the documentation of crime. The nature of the crime, the verdict in the trial, and even the behavior of the accused do not matter: anyone who was tried at the Old Bailey Courthouse between 1674 and 1913 is included in the database. Take the case of Thomas Poddy, who in 1710 was tried for “intent to sodomize” another man. Although the evidence against Poddy was not strong enough for conviction, he is included in the database. However, this cannot really harm Poddy, as he died many years ago and the availability of this information cannot affect his life in a negative way.

This got me thinking about modern criminal registries. Not only do we have newspaper police logs, which are usually available online, we also have nationwide criminal registries. In comparison to the Old Bailey, many of these do not include details of the crime committed, which can lead to misconceptions about the criminals one may read about online. I searched for the National Sex Offender registry and was directed to familywatchdog.us/default.asp, a “free service to help you locate sex offenders in your neighborhood”. I plugged in UCLA’s zip code, 90024. There are two ways you can view the results. The first is a map, which shows the addresses of sex offenders and other types of criminals as boxes that are color coded to indicate the type of crime committed. There are also white boxes indicating schools and playgrounds, and if you click these you can see offenders who live within 1000 feet of the location and offenders within half a mile. The second is a list, which gives the name and current address of each offender, with those who are closest to your designated location at the top. There are 8 offenders and 16 non-mappable offenders, which means either that no full address is available for the offender or that the map does not recognize the address. Clicking on the list, I noticed there are 2 offenders living in 90024, and one who actually has “UCLA” included in his address. However, neither the type of crime nor the date of the crime is available for these offenders. Anyone who committed a sex crime after 1997 is included in the database, which can lead to unnecessary panic if you see that a sex offender is registered in your area.

In comparison, I searched for my home zip code, 03458, corresponding to my home town in New Hampshire. There are six sex offenders in my neighborhood, and one of them is actually my neighbor! Although of course there are more sex offenders living in Los Angeles, they are simply random names in a sea of thousands of apartment buildings and houses.  Seeing the names of people you know in the sex offender registry is extremely frightening. Although my neighbor’s crime could have occurred long ago and might well not have anything to do with pedophilia or violent assault, it does make me slightly apprehensive about this man. Although I should let his personality speak for itself, seeing him on a sex offender registry changes my opinion about him, and I probably won’t be so keen if my parents want to invite him over for dinner this Thanksgiving.