Course blog

Data Mining Pros and Cons

In “The Promise of Digital Humanities”, Andrew Smith writes about the rise of digital humanities as a field, driven by growing interest in the machine analysis of text. With data mining, we can rapidly work through hundreds of thousands or even millions of words or texts in the hope of arriving at an unprecedented conclusion. However, Smith points out that all these investments in digital humanities technology have not produced many enlightening or eye-opening results. Instead, many data mining endeavors reach conclusions that state the obvious. I thought that this was a valid point to consider because we are so ready to believe that greater technological prowess lets us achieve so many new things. Even though technological advancements let us do things faster and more efficiently, we will not arrive at new, profound information every time. Sometimes we simply arrive at previously known information or analysis at a faster rate. I thought that this was important to remember, especially as students in digital humanities attempting research projects, because we might be tempted to draw conclusions from our projects that are obvious. As we learn to use new digital humanities technologies, we might be quick to present our findings as new and exciting when in fact these technologies may be blinding us to past conclusions that were based on years of thorough research and analysis.

http://cacm.acm.org/magazines/2014/6/175163-the-power-of-social-media-analytics/abstract

With that warning aside, data mining is potentially a very powerful tool, not only for historians to gain new understandings of the past but also for businesses to gain new understandings of their markets and services. Synthesio is a social-media monitoring and research company that tracks customer opinions and posts on travel websites. Synthesio created a data mining tool to analyze the online reputations of thousands of hotels. By pinpointing keywords that contributed to negative reviews and feedback, Synthesio was able to help companies realize what they needed to improve. For example, many customers of one hotel were unsatisfied because their room keys were being demagnetized by their smartphones. Synthesio was able to capture this information and relay it to the company. Then, the hotel was able to address the problem by dealing with the room-key supplier. After fixing the problem, the hotel saw a growth in positive feedback from online posts by customers. I thought that this was such an ingenious example of data mining being used to improve services for businesses. This shows us that data mining and digital humanities technologies can indeed advance our understanding of the past and improve the way businesses are run in the real world.
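To make the idea of pinpointing keywords in negative reviews a little more concrete, here is a minimal Python sketch. It is only my own illustration and not Synthesio's actual tool: the reviews, the rating cutoff, the stopword list, and the function name `top_complaint_terms` are all made up for the example. It counts how many low-rated reviews mention each word and reports the most common ones.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real project would use a much fuller one.
STOPWORDS = {"the", "a", "an", "my", "i", "it", "to", "and", "not", "had", "after"}

def top_complaint_terms(reviews, max_rating=2, n=3):
    """Count how many low-rated reviews mention each word.

    `reviews` is a list of (rating, text) pairs. Reviews rated at or
    below `max_rating` are treated as negative (an assumed cutoff).
    """
    counts = Counter()
    for rating, text in reviews:
        if rating <= max_rating:
            # Use a set so each review counts a word at most once.
            words = set(re.findall(r"[a-z]+", text.lower()))
            counts.update(words - STOPWORDS)
    return counts.most_common(n)

# Invented sample data, loosely modeled on the room-key example above.
reviews = [
    (1, "My room key stopped working twice."),
    (2, "The key would not open the door after I kept it near my phone."),
    (5, "Lovely staff and a great breakfast."),
    (1, "Key demagnetized again, had to go back to the front desk."),
]

top_terms = top_complaint_terms(reviews)
print(top_terms[0])  # the word mentioned in the most negative reviews
```

Even in this toy version, the word "key" surfaces only because someone chose the rating cutoff and the stopwords, which fits Smith's warning that the tool does not interpret the results for us.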

Turning a Blind Eye to Racial Categorization

none-of-the-above-428x181[1]

The categorization of race has always been a controversial topic throughout history. The website titled “The Real Face of White Australia” highlights one instance of this controversy: the policies put in place in the early 20th century that essentially excluded those who were not white from Australia. These policies excluded even people who had lived in Australia for their entire lives, just because of their families’ origins and the color of their skin. They “found themselves at odds with the nation’s claim to be white” and were confronted with discriminatory laws preventing them from leading their normal (or at least what used to be normal) Australian lives. For those confined by this “White Australia” who had never lived anywhere else, this seemed rash and unjust, as they saw themselves as a part of the country just as much as the next person. This specific issue parallels the discrepancy between perceived and personal racial identification, both in the past and today.

In an article that I recently read (source 1 below), the history of the process of collecting census data is outlined. For nearly 200 years, before mail-in survey methods were used, the government collected data by sending out a representative to evaluate households based on various categories (including race). The article states that the government workers, or “census enumerators,” did not let people characterize themselves. Instead, race was based on appearance and ultimately determined by the census enumerator. This reduced race to a matter of appearance rather than identity, and raised the same question seen on “The Real Face of White Australia” website of what the idea of race actually means at its core. When mail-in surveys became the method of collecting census data, the number of people identifying themselves as certain races changed drastically. This highlights the difference between personal identification and the misclassification by others that has been occurring around the world for ages.

This error of classifying someone’s race solely based on his or her appearance can also be a problem for any large database management system. Databases cannot include personal thoughts or identification in their analysis of someone’s face or body without being given the appropriate additional information. When sorting solely through images, a system may resort to placing people into different buckets based on physical features or attributes that it finds to be common or similar. While this may be appropriate in some settings, a person’s own outlook and opinion are easily overlooked. In the article “Humanities Approaches to Graphical Display”, Johanna Drucker touches on the concept of “humanistic interpretation” relating to the expression of digital information and the need for “a co-dependent relation between observer and experience”. I believe this directly applies to the idea of racial categorization and that there is an evident need to analyze all relevant potential factors before attempting to classify a human being.

 

Sources:

  1. http://www.psmag.com/culture/census-data-collection-changed-race-in-america-57221/
  2. Tim Sherratt, The Real Face of White Australia. http://invisibleaustralians.org/
  3. Johanna Drucker, “Humanities Approaches to Graphical Display,” Digital Humanities Quarterly 5, no. 1 (2011). http://digitalhumanities.org/dhq/vol/5/1/000091/000091.html.
  4. http://1.bp.blogspot.com/-7zTnJIf_wow/UzxE9H5Nv5I/AAAAAAAAo_Y/pLzHrIEd5Tw/s1600/none-of-the-above-428x181%5B1%5D.jpg

Week 5

This week’s reading “The Image of Absence: Archival Silence, Data Visualization, and James Hemings” presents one of digital humanities’ greatest areas for concern, and also for opportunity. The information that we gather about a society or a period of time is entirely dependent on the few sources we have available. In the case of this article, we could never have known the entirety of James Hemings’ relationship without alternate sources to support any claims. Often, we are not so lucky. For example, the vast majority of the Incan Empire’s books and documentation were burned and destroyed by Spanish settlers. Because of this, historians do not have nearly as much information about this prosperous and advanced society. Since our primary source information is limited, the Incan Empire is mainly represented by the Spanish colonizers’ documentation. The Inca are now also a culture defined by their disappearance rather than their achievements.


This Incan “mystery” is an example of the dangers of digitally archiving the past. Since we do not have a wealth of information published electronically, the general public will not know many important details about Incan culture. This theme of unclear representation continues throughout ethnographic studies, and spreading such representations has become much faster and easier as technologies have developed. It is increasingly easy to give misinformation to the masses, and easier for valid information to get lost among webpages. However, there is an upside to this situation.

 

Technological advances have made it possible for cultures to represent themselves. Those who have alternate sources, such as oral traditions, can make their own information known. They can also find misinformation and attempt to correct it. Websites such as Wikipedia have already embraced aspects of this idea, but the scholarly possibilities present great opportunities. Utilizing these features will prevent cultures from being viewed as frozen in their own history. And although we will not always be able to miraculously reproduce burned books, we will be able to add to the information that we do know about cultures in an attempt to give them a more well-rounded representation.

Week Five: Humanities Approaches

In “Humanities Approaches to Graphical Display,” Johanna Drucker stresses the importance of remembering that the “humanities are committed to the concept of knowledge as interpretation, and, second, that the apprehension of the phenomena of the physical, social, cultural world is through constructed and constitutive acts, not mechanistic or naturalistic realist representations of pre-existing or self-evident information.” This is especially important to remember when using digital visualization tools in digital humanities endeavors. Many data visualizations seem to argue for an exclusive and narrow way of viewing the world, based on assumptions about knowledge thought to be shared by the groups of people viewing the visualization. In turn, these assumptions usually dilute the complexities of the external and ‘real’ world in order to answer large questions with easily digestible graphics. This kind of visualization not only removes layers of complexity from the capta presented, but also assumes a role of neutral knowledge that is deceitful. In this form, many differing kinds of knowledge can become hidden and the presented narrative can become static, in direct violation of the type of discourse the humanities disciplines seek to encourage.

While not deliberate, an example can be seen in Katie Leach-Kemon’s article “Visualizing the surprisingly massive toll of suicide worldwide.” A data visualization titled “Top 20 causes of premature death in females, 2010” lists self-harm as number three in both 1990 and 2010, with the mean rank increasing in 2010. What is not immediately apparent is how exactly a female is defined as a female. In this example, it seems a female can only be between the ages of fifteen and forty-nine to be considered, which raises the question: are females below or above this age range considered female? What characterizes a female? The ability to procreate? Or is this data merely restricting the age set in order to get a smaller result? Why would this be the case if one wanted to know how many females committed an act of “self-harm”? How is female defined differently from male in this data sample? As seen in Drucker’s paper, “the assumption that gender is a binary category, stable across all cultural and national communities, is an assertion, an argument. Gendered identity defined in binary terms is not a self-evident fact, no matter how often Olympic committees come up against the need for a single rigid genital criterion on which to determine difference.” What is also unclear is what exactly is defined as “self-harm”. Labeled under “injuries”, “self-harm” is included with “road injury,” “fire,” “interpersonal violence,” “drowning,” and “forces of nature.” If a female between the ages of fifteen and forty-nine set herself on fire and died, would this be included under self-harm or fire? As Drucker stated, “the more profound challenge we face is to accept the ambiguity of knowledge.” We must constantly repeat her “refrain–that all data is capta.”

Drucker, Johanna. “Humanities Approaches to Graphical Display,” Digital Humanities Quarterly 5, no. 1 (2011)

Leach-Kemon, Katie. “Visualizing the Surprisingly Massive Toll of Suicide Worldwide.” Humanosphere. 18 Aug. 2014. Web.

 

Week 5: Andrew Smith and Text Analysis

Andrew Smith’s “The Promise of Digital Humanities” brings up some very interesting points on the impact that digital scholarship and the increasing incorporation of technology have on the humanities fields. I tend to think of the digital humanities field as one that allows traditional humanities research to expand and become more accessible through technology; however, not all digital humanities initiatives accomplish this end. Smith makes the valid observation that some projects use digital components to restate the obvious and then applaud themselves for incorporating technology to support their findings. Apart from this pitfall, I think that the use of data mining to examine the veracity of humanities research opens up exciting avenues for scholarly thought. Because analysis tools are becoming more widespread, there is an added incentive to use data that bears on the trend being examined or written about, rather than ignoring it and furthering a specific thesis with hand-picked anecdotes. Analyzing text in this quantitative manner opens up new avenues for researching and promoting ideas across the humanities and even into the scientific fields.

After finishing this commentary, I was drawn to a related linked article of his, “So what’s text analysis actually good for?” From Smith’s first article, it seems like text analysis can open a lot of doors for innovative questions and methods but that it’s taking a little longer for researchers to utilize these resources to their fullest. The video (embedded below and also on the article page) gives some insight into what current scholarship is doing to integrate digital tools. This research incorporates traditional lines of humanities research inquiry but uses search tools to facilitate the process by allowing the researcher to not get caught up in the vast number of sources to sift through.

This second article made me realize that using digital resources for a project is only the first step in making a real difference in digital scholarship. Digital humanities research is most effective when technology is used to significantly advance a realm of scholarship; the maximum impact is made with innovative ideas combined with effective use of digital resources. This is something we should all keep in mind as we continue our research for this course and for our future endeavors; incorporating technology should support our end goal, not be the end goal.

 

Week 5: Alfred Korzybski and Johanna Drucker

When I used Google Maps to gauge the distance to my destination, I felt assured that I would arrive as scheduled. I had read in “Science and Sanity”, a book by Alfred Korzybski, that “the map is not the territory”. I knew that a printed map does not correspond with real time, whereas Google Maps can calculate traffic conditions and different paths as I travel. However, I kept in mind that, as omnipresent as technology is and no matter how carefully we plan, unpredicted events can still arise and be overlooked. Johanna Drucker, in “Humanities Approaches to Graphical Display”, explains the concepts of “data” and “capta”. Drucker explains that “Capta is ‘taken’ actively while data is assumed to be a ‘given’ able to be recorded and observed”. The time at which I will reach my destination is calculated from the data that can be given to Google Maps. However, certain circumstances, such as an accident or construction on the road, cannot always be taken into account by Google Maps. It had been reliable for me until I used it on the unpredictable roads of Los Angeles. Google Maps will insist that I will get to my destination at a certain time, but I have to add my own calculations to arrive on time. Human beings, not machines (at least not yet), are capable of seizing information. Truth is moving and alive. Drucker further explains that “Humanistic inquiry acknowledges the situated, partial, and constitutive character of knowledge production, the recognition that knowledge is constructed, taken, not simply given as a natural representation of pre-existing fact.” We seize knowledge through comprehension, interpretation, and the application of an idea that is the product of this process. “Data” and “Capta” are the participles of Latin verbs that can be literally translated as “having been given” and “having been seized”.
The use of “Capta” by Drucker reminds me of “Carpe Diem” by Horace, where we try to seize a day as unpredictable and transient as tomorrow itself. Google Maps, although useful, is an “abstraction” of the territory, as Korzybski would have it. It really just tells me where to turn and what streets to look for, and I hardly look at the screen. When I reached my destination, I realized that even Google Maps “is not the territory”. The following video shows Korzybski talking about “illusions” and “distractions”, reality as we perceive it.

 

 

 

Work Cited:

 

www.youtube.com/watch?v=A-7zYBKgzfs. Alfred Korzybski – The World is NOT an Illusion. Online video clip. Youtube. Youtube, Sep 27, 2012. Web. 2 November 2015

Drucker, Johanna. http://digitalhumanities.org/dhq/vol/5/1/000091/000091.html. Humanities Approaches to Graphical Display. Digital Humanities Quarterly 5, no. 1 (2011). Web. 2 November 2015.

Data Visualizations

Johanna Drucker’s article “Humanities Approaches to Graphical Display” discusses the importance and usage of data visualizations in Digital Humanities, and in particular why we need a humanities approach to the “graphical expression of interpretation.” Visualizing data allows the viewer to actually see and understand data instead of just looking at numbers on a spreadsheet. Drucker continues by explaining the difference between data and capta. Drucker wants us to “reconceive all data as capta…Capta is ‘taken’ actively while data is assumed to be a ‘given’ able to be recorded and observed.” This distinction is important because it shapes how we represent and display this capta. Drucker explains, “The representation of knowledge is as crucial to its cultural force as any other facet of its production. The graphical forms of display…and the common conception of data in those forms need to be completely rethought for humanistic work.”

top-ten-salaries-google_large_mini

One issue with data visualization is the desire for it to “look cool”. The person creating an infographic can become so sidetracked by the hope of creating an interesting-looking image that they forget to actually convey any information. In the image above, “Top 10 Salaries at Google,” the graphic designer attempted to display the salary ranges for those who work at Google. In trying to make the data appear more interesting with a pie-chart format and crazy colors, the designer actually obscured the data, making it inaccessible. Although a table format is boring, it is more efficient, in this case, at making the data understandable for everyone. Making the data appear more interesting doesn’t necessarily make it easier to understand.

SmashingMag1_ChartRedesign-03_large_mini

Tiffany Farrant-Gonzalez, in her article “All That Glitters Is Not Gold: A Common Misconception About Designing With Data,” gives an alternate example of what the data could look like. Her version is a much better representation. Farrant-Gonzalez explains, “With the linear organization, the viewer can understand at a glance what the data is showing, without having to work too hard. In stark contrast to the original, this graph makes the data instantly accessible, allowing for easy comparison between the jobs.” Her version might not be as visually appealing as the original, but it definitely makes a lot more sense.

Works Cited:

Johanna Drucker, “Humanities Approaches to Graphical Display,” Digital Humanities Quarterly 5, no. 1 (2011)

http://www.smashingmagazine.com/2013/07/29/common-misconception-designing-data/

Week 5: The Changing Definition of “Risk” as Capta

 

Back in the winter of 2013, I was a research assistant for Dr. Matt O’Hara of the University of California, Santa Cruz’s department of History and Latin American and Latino Studies. His research centered on the examination of the history of time in colonial Mexico and the Spanish Inquisition.

After reading Dr. Johanna Drucker’s text “Humanities Approaches to Graphical Display,” I was immediately reminded of many of the sources I encountered during my time as a research assistant under Dr. O’Hara. One article in particular, “Uncertain Times: the Notion of ‘Risk’ and the Development of Modernity” by Gerda Reith, stuck in my mind as being relevant to Drucker’s article.

In her essay, Reith explores the “ways in which understandings of uncertainty have evolved during the development of modernity, and in particular, how they are expressed in the notion of ‘risk.’” Through a temporally heterogeneous analysis ranging from the medieval to the modern, Reith demonstrates how the definition and idea of what it means to take risks are embedded in socio-economic contexts and grounded in particular temporal orientations, specifically as expressed in notions of determinism and indeterminism.

If the humanistic approach that Drucker discusses is centered on the experiential, in which “capta” are taken actively while “data” are assumed to be given, then capta will reflect the divergent experiences of each generation over time. After rereading Reith’s article, I found that the way humans perceive risk in many ways reflects how Drucker suggests digital humanists today should interpret and display data.

In her article Reith states that “so, as long as human actors who perceive and think and respond are involved in the probability equation, there can be no such thing as ‘objective’ risk.” Here Reith both highlights Drucker’s theory of data as capta by pointing to each generation’s different interpretation of the notion of risk, and also shows us that the idea behind risk itself is rooted in the impossibility of a singular, real, or objective datum. In other words, both the perception and the later observation of risk are changeable and open to human interpretation. In her article, Drucker asserts that every era’s representations of knowledge are distinct and divergent from the next. Reith explains that during the medieval period and other periods where technological control over the natural world was limited, uncertainties were expressed and managed through a range of religious or magical concepts such as luck or fate, but around the seventeenth century, dramatic developments in social, intellectual and economic life transformed ideas about uncertainty, the future and human agency through the growth of economic systems such as capitalism, credit, and insurance. During the early modern era, perceptions of risk shifted and allowed for greater human intervention as opposed to a reliance on faith or luck for an explanation of teleology.

I think Professor Drucker’s approach to the study of digital humanities with “capta” is honorable because it holds true to the traditions of what it means to conduct scholarly research within the sphere of the humanities, while still employing new forms of technology to support those inquiries.

View Reith’s article here: http://tas.sagepub.com/content/13/2-3/383.full.pdf

Week Five: Information Visualization, Continued; Text Analysis


Johanna Drucker’s article Humanities Approaches to Graphical Display discusses the prevalent challenges and approaches to visualizing and understanding data. First, she calls attention to the distinction between ‘data’ and ‘capta’. Acknowledging the difference between the two is of key importance – as they delineate “constructivist and realist approaches” (Drucker). Data is the information that surrounds us – existing independently and without human interpretation. Capta is the data that we interpret – “taken and constructed” (Drucker). The distinction between data and capta acts as the basis for a more comprehensive construction of data visualization. Instead of viewing bar charts and maps as absolute truth, we take capta as a caveat – this information is mediated.

Drucker discusses how much is at stake in this challenge. She writes, “If we don’t engage with this challenge, we give the game away in advance, ceding the territory of interpretation to the ruling authority of certainty established on false claims of observer-independent objectivity in the ‘visual display of quantitative information’” (Drucker). As we move into this stage of a digital world where scholars contribute work, we have to confront this issue head on. Drucker suggests that capta display “ambiguity and complexity”. This is an important step towards greater clarity in data presentation. Drucker explains, “Nothing in intellectual life is self-evident or self-identical, nothing in cultural life is mere fact, and nothing in the phenomenal world gives rise to a record or representation except through constructed expressions” (Drucker). This is all to say that any information we view must be mediated through a humanistic approach.

A keen example of Drucker’s argument is Julia Belluz’s infographic for the online article “The Truth about the Ice Bucket Challenge.” The data visualization, titled “Where We Donate vs. Diseases That Kill Us,” uses color-coordinated circles to compare the amount of money donated to causes with the deadliest diseases in the country. However, a blog post on Cool Infographics by Randy Krum points out that the sizes of the circles do not accurately depict the proportional values. Krum warns, “Designers make the mistake of adjusting the diameter of circles to match the data instead of area, which incorrectly sizes the circles dramatically. It takes some geometry calculations in a spreadsheet to find the areas and then calculate the appropriate diameters for each circle” (Krum). Krum proves his point by actually correcting the infographic. The result is much less impactful, as the sizes of the circles in each column level out considerably.
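The geometry behind Krum's warning can be sketched in a few lines of Python. This is only my own illustration, with made-up values and hypothetical function names, not Krum's spreadsheet. Because a circle's area grows with the square of its diameter, a value that is 4 times larger needs a diameter only 2 times larger (the square root of 4), not 4 times larger.

```python
import math

def area_scaled_diameter(value, reference_value, reference_diameter):
    """Diameter that makes the circle's AREA proportional to the value."""
    return reference_diameter * math.sqrt(value / reference_value)

def diameter_scaled_diameter(value, reference_value, reference_diameter):
    """The mistaken approach: make the DIAMETER proportional to the value."""
    return reference_diameter * (value / reference_value)

# Made-up example: one cause receives 4 times the donations of another,
# and the smaller circle is drawn 10 units wide.
small_value, large_value = 1.0, 4.0
base_diameter = 10.0

correct = area_scaled_diameter(large_value, small_value, base_diameter)
mistaken = diameter_scaled_diameter(large_value, small_value, base_diameter)

# How many times bigger each large circle LOOKS (ratio of areas).
correct_area_ratio = (correct / base_diameter) ** 2
mistaken_area_ratio = (mistaken / base_diameter) ** 2

print(correct, correct_area_ratio)    # 20.0 4.0
print(mistaken, mistaken_area_ratio)  # 40.0 16.0
```

In this example the mistaken circle looks 16 times bigger even though the value is only 4 times bigger, which is why the corrected infographic looks so much less dramatic.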

Belluz’s article has since been corrected by the website it ran on, Vox Media, but its mistake offers an insight into Drucker’s argument. In her “polemic call to humanists to think differently about the graphical expression in use in digital humanities” (Drucker), Drucker asks that capta shift its terms from “certainty” to “interpretive complexity.” For example, who donates to these causes and why? Who are the people who die of these leading diseases, and what are their stories? Although this is daunting, Drucker argues that it is all the more enlightening for the humanist approach to knowledge and understanding.

Krum, Randy. “False Visualizations: Sizing Circles in Infographics.” Web log post. Cool Infographics. N.p., 29 Aug. 2014. Web. 2 Nov. 2014.

Week 5: Comparing Graphical Display to a BODY SHOP Advertisement

Johanna Drucker, in “Humanities Approaches to Graphical Display,” talks about how digital visualization tools act as an “intellectual Trojan Horse,” a vehicle that carries in assumptions about what makes up information. As I was reading this I realized that graphical displays are not the only forms that act as Trojan Horses. Advertisements have become just as sneaky, playing on consumer assumptions.

The advertisement that came to mind was the one below:

Body Shop ad

 

I saw this post float around Tumblr for a bit with the caption, “Who’s Alex? Billboard demonstrating gender stereotypes as most people automatically assume that Alex is the boy.”

I sure as hell did. I’m sure the people reading this blog post assumed that Alex was the boy, too. I even thought it was a clever advertisement, a company riding on the strongly rising movement for gender equality. It was clever that they were forcing people to face their stereotypes and their assumptions. It’s clever because we think that it’s doing exactly what Drucker has been saying. The advertisement is saying “Look, you assumed! That’s okay though. You can change. Buy our products.” Or something to that effect.

This is where it kind of gets Inception-like.

We’ve been fooled. We’ve made the assumption that this ad played off of our gender assumptions and stereotypes, when in fact the advertisement just used advertising and design tricks. Tumblr user Urulokid demonstrates that Body Shop appears to catch us red-handed only because the little boy is the focal point, so we immediately assume he’s Alex. Furthermore, Urulokid argues that the ad is fallacious because English readers’ eyes scan from left to right. The first thing we read is “MEET ALEX” and then our eyes go immediately to the right of the words, to the boy.

This is a fallacious confirmation bias, as anyone looking at it will assume Alex is the focal point (i.e. The Boy) and then if they’re perceptive they’ll notice the words at the bottom. Aha! Those damn gender stereotypes gotcha again! Except no, because the ad literally forces you to read it as “Alex is the boy” by the visual language and lines of sight. 

She goes on to create a less deceptive advertisement using a stock photo. You can see it here.

Overall, the point is that Drucker was right that digital visualization tools present their output as assertions and not interpretations, which is deceptive. But what would she say about interpretive advertisements that are themselves deceptive?

Johanna Drucker – Humanities Approaches to Graphical Display

Tumblr – Urulokid post about Body Shop advertisement