A theme that resonated with me from this week’s readings was information loss, both through miscommunication between reader and content, and through the way a lack of voice in history translates into a lack of voice in documentation, which can also be reversed: a lack of voice in documentation leads to an assumed lack of voice in history. I thought it was important that Drucker noted in “Humanities Approaches to Graphical Display” that “the history of knowledge is the history of forms of expression of knowledge.” To me this meant that history is only as much as how it was documented and interpreted, and in that sense the miscommunications caused by information loss become very dangerous. This week we’ve been learning about different data visualization techniques to use in our Final Project, and these readings emphasize the importance of being smart about our techniques and tools. The first step is our data: choosing what to gather, how to gather it, then gathering it, and then thinking about how the reasoning behind why we gathered it can translate into knowledge through a visualization. Our data for our project is metadata about the most popular LA food trucks, looking at categories such as common words, food types, ingredients, names, and prices. We want to take this metadata, visualize it, and then use it to support and analyze our findings about the success and trends of food trucks. One of the first steps of visualizing could be a word cloud that makes common words bigger and then links them to the words they are most commonly paired with. This would give insight into the main items or catchphrases that food trucks are using to drive business, and also into what consumers are consuming the most of. From here, I would take the most common ones and do further analysis on them, in order to get specific, accurate, detailed data.
This could be a timeline, incorporating time frames into the visualization to show the rise of trucks and when the trends shown in the word cloud were realized. I think two different types of visualizations would allow for flexibility and accuracy with our data. It would also encourage readers to interact with the data more and figure out how the two relate to one another. Of course, this could be a problem in itself. To make sure readers don’t assume too much, our graphs would have short descriptions for accuracy, and then a further detailed “about” paragraph. In addition, we would address any data that doesn’t quite fit into the map, such as uncertainties. Beyond addressing visualizations, Drucker’s article also points out some faults with “data.” Data presents itself as black-and-white fact that can be plotted onto a visualization, when in reality there are many uncertain pieces of data that don’t quite fit in. In order not to omit these pieces of data, which would leave the reader thinking they simply don’t exist, a visualization tool has to be created with these humanities issues in mind. How our tool expresses the data defines it. The representation of knowledge is just as important as the knowledge itself.
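The word-cloud and word-pairing step described above could be sketched in a few lines of Python. The menu descriptions here are invented placeholders, not our actual food-truck data:

```python
from collections import Counter
from itertools import combinations

# Invented stand-ins for food-truck menu descriptions.
descriptions = [
    "korean bbq taco truck",
    "fusion bbq taco",
    "gourmet grilled cheese truck",
    "korean fusion bbq",
]

# Word frequencies would drive the size of each word in the cloud...
word_counts = Counter(word for d in descriptions for word in d.split())

# ...and pair frequencies would drive the links between commonly paired words.
pair_counts = Counter(
    pair
    for d in descriptions
    for pair in combinations(sorted(set(d.split())), 2)
)

print(word_counts.most_common(3))  # the biggest words in the cloud
print(pair_counts.most_common(2))  # the strongest links between words
```

From counts like these, the most frequent words and pairs would be the ones worth singling out for further analysis.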
When I read the description behind the website The Real Face of White Australia, it struck me how frankly it explained the shortcomings of its use of a face detection script. While the creators have tried to weed out most of the inconsistencies, faces of white people have managed to escape their notice. I was eager to see if I could spot one, and sure enough, after a few minutes of scrolling and exploring, I came across a Customs documentation portrait of a white man named Tom Solomon Toby. Even research projects of this scale have deficiencies in their data visualization. The problem does not lie with the data itself; it has to do with the computer’s processing of the data. As we have learned in class, the world of the humanities is too complex to be completely and fully processed by a computer, and this serves as an example of how that issue can transfer into problems with Digital Humanities projects.
This reminded me of what Francesca warned us about in lab on Friday. The data visualization programs we learned about (Many Eyes, Tableau, and Palladio) may not correctly process our data. Therefore, we must be on the lookout for inconsistencies between our data and its visualization, and be prepared to either find a way around them or explain why the irregularities have occurred.
The inconsistencies between an item search on a website and the wide variety of products that come up serve as an example of discrepancies between what is listed in the database and what is represented in the visualization of that data. For example, on Etsy, an online marketplace for independent merchants, when one searches for a “computer case,” many different items pop up. You can see the results for this search here: https://www.etsy.com/search?q=computer%20case&ref=auto1
In addition to actual laptop cases, the results included laptop stickers, messenger bags, cosmetic bags, travel tags, and even a faux-crocodile handbag. There is nothing wrong with Etsy’s database; it is in the processing of this data by a search engine, to visualize it on the website, that problems come into existence. Etsy could use a controlled vocabulary to better align the representation of its database with its search engine, minimizing the use of ambiguous terms like “computer case” and thus streamlining the searching process. Again, computers process things in a very strategic way that leaves out the possibility of handling people’s tendencies toward multiple vocabularies and complex ideas.
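A controlled vocabulary of the kind suggested above can be as simple as a lookup table that collapses sellers’ free-text tags into preferred terms. This is only an illustrative sketch; the vocabulary entries are invented, not Etsy’s:

```python
# Invented controlled vocabulary: each ambiguous free-text tag maps to one
# preferred term, so a search for one concept returns one kind of item.
CONTROLLED_VOCAB = {
    "computer case": "laptop sleeve",
    "laptop case": "laptop sleeve",
    "laptop sticker": "laptop decal",
    "sticker": "laptop decal",
}

def normalize_tags(tags):
    """Replace each seller-supplied tag with its preferred term, if one exists."""
    return [CONTROLLED_VOCAB.get(tag.lower(), tag.lower()) for tag in tags]

# Two sellers tagging the same kind of product now land on the same term.
print(normalize_tags(["Computer Case", "Laptop Sticker", "handbag"]))
```

With tags normalized this way before indexing, a search for “laptop sleeve” would no longer surface stickers and handbags alongside actual cases.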
A comment raised in the Data + Design article that really stuck with me was the notion that “data is around us and always been” and that “only recently have we had the technology to efficiently surface these hidden numbers, leading to greater insight into our human condition.” Given that the human condition is perceived as an unalterable part of humanity, tied to our tendency toward error and fallibility, it is interesting to imagine instances where we might conceivably quantify such intangible concepts, let alone gain insight into them. This standpoint is especially notable given the humanities’ general aversion to the need to quantify everything in the world and see it in black and white.
This reminded me of the computational knowledge system Wolfram Alpha, which “takes the world’s facts and data” and computes across a range of topics. I went to their demonstrations website (where people can showcase the projects they have been working on) and found an interesting collection of projects, including a fair amount in the legal field. These include the projects linked above; The Appeals Court Paradox, in particular, takes into account the probability that each judge votes correctly, and factors in whether the judges vote independently, to determine the likelihood of a “correct” ruling being delivered.
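To get a feel for the kind of calculation such a demonstration performs, here is a simplified sketch, not the Wolfram demonstration’s actual code: assuming a panel where each judge independently votes correctly with probability p, it sums the probabilities of all outcomes in which a majority votes correctly.

```python
from itertools import product

def majority_correct_probability(p, judges=3):
    """Probability that a panel's majority ruling is correct, assuming each
    judge votes correctly with probability p, independently of the others."""
    total = 0.0
    for votes in product([True, False], repeat=judges):
        if sum(votes) > judges / 2:  # a majority voted correctly
            outcome_prob = 1.0
            for correct in votes:
                outcome_prob *= p if correct else (1 - p)
            total += outcome_prob
    return total

# With three judges each correct 80% of the time, the panel does better than
# any single judge: p^3 + 3p^2(1-p) = 0.896.
print(majority_correct_probability(0.8))
```

The interesting modeling questions, of course, are exactly the ones the post goes on to raise: whether judges really vote independently, and what “correct” even means.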
The projects demonstrate a more pressing, overarching issue in legal rulings and procedure: judges’ bias, however reprehensible, is difficult to identify and allege, and seems to be an inescapable part of the decision-making process. Especially after the Hobby Lobby case and other recent decisions split 5-4, we now understand decisions as also a product of judges’ personal ideology or political affiliation. This has resulted in a notable drop in public confidence in the supposed objectivity the judicial system is expected to deliver, such that rulings seem more a result of chance than of law.
Ignoring the assumptions made in deriving any such numbers for the initial calculation, the Wolfram Alpha project therefore seems capable of reconciling statistics with the need for grey areas and in-between spaces (as opposed to black and white) by calculating probability.
Then again, there seems to be something unsettling about basing the present on the past: gathering data from past occurrences and extrapolating from it to predict the future. Problems also arise because the data set of choice is conceptually fuzzy: what is the “correct” decision in relation to the law? If the notion of correctness is tied to our personal beliefs, how then might we represent it in an empirical data set?
At present, although data can be useful in representing non-contentious information, it remains to be seen whether it can assist us in illuminating controversial topics in the realm of ethics and law, both of which are underpinned by the human condition.
Since when did a coat become more than just a coat? When scholars and researchers begin to unpack and explore fields of nascent inquiry, we are presented with information seen through a different lens.
The nexus described above highlights the distinct functions of an article of clothing. According to Dagmar Steffen, the function of a product such as a coat is twofold: practical functions (warmth, protection from rain, chemicals, etc.) and product language functions (style, aesthetics, fashion statements, etc.). What Steffen explores and explains is the function of language and how we interpret categories. A coat falls under the broad category of clothing. Under that ambiguous, yet somewhat specific, category we risk losing traction, since coats could also be categorized under particular seasonal lines: fall, winter, spring, and summer. Michael Castello clearly simplifies the ways in which data collection, databases, and visualizations are conceptualized and visualized. The free online “guide” lays out everything from the “how-tos” to the “ta-da, we’re done.” Using an analogy to a supermarket and food, he explains how we measure data with categories such as nominal, ordinal, interval, and ratio. The seasonal fashion lines I mentioned are an example of nominal data.
As with the grocery food categories, the coats under each season could be expressed as percentages, not averages, to give us a representation.
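As a quick sketch of that idea, using an invented coat inventory: for nominal data like seasonal lines, counts and percentages are meaningful, while an “average season” would be nonsense.

```python
from collections import Counter

# Invented coat inventory, tagged by seasonal line (nominal data).
coats = ["winter", "winter", "fall", "spring", "winter", "fall", "summer"]

counts = Counter(coats)
total = len(coats)

# Percentages are a valid summary for nominal categories; an average is not.
percentages = {season: round(100 * n / total, 1) for season, n in counts.items()}
print(percentages)  # e.g. winter accounts for the largest share of the line
```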
Here’s the background: I am an art history major whose hobby is collecting interior design and craft project pins on Pinterest, as well as a marketing and branding enthusiast who holds certificates in market research and marketing with a concentration in social media and web analytics. I am highly design- and user-experience oriented and analytical, and I love to teach and explain concepts in a visual manner. That has led me to become an aspiring brand designer and creative director whose forte is a strategic and analytical background. That being said, I am not a designer, or at least not yet. Working as a marketing strategist and brand manager in a number of startups with great ideas and a lack of manpower, I’ve taught myself how to use Photoshop, Illustrator, InDesign, Muse, Final Cut Pro, iMovie, and even Microsoft Office Publisher to turn the images and concepts that are vivid in my head into tangible works. It takes a great deal of time and frustration for a strategist to deal with highly detail-oriented design work, and that’s the reason Data + Design grabbed my attention like no other.
The concept of learning data visualization is fascinating. With an ever-growing amount of data, there is a need to fill the gap between collecting and analyzing data and explaining and creating results with it. As complicated as data can be, backed with statistical sources full of numbers and graphs, the most efficient way to explain it has been visualization of the data and information, such as infographics and interactive web design. Based on my personal experience, the communication gap between data collectors and graphic designers is often too large, leaving both parties convinced that they belong at opposite poles of the world, lost in translation. Data + Design, in collaboration with Infoactive (whose landing page had an error, so I couldn’t research it further), provides simple steps for collecting and analyzing data and building a visual summary of it, which will be a quintessential guide to data scientists and designers alike; its format as an open-source site and the growing size of its community only prove the need for the combination of data and visual sources in today’s world.
Data + Design: A Simple Introduction to Preparing and Visualizing Information
The book Data + Design was an extremely interesting read. I enjoyed its clear and cohesive approach to drawing out the relationship between good design and presenting information. I love the fact that Coale not only encourages us to “present information” but to design an “information experience.” This is important in understanding that data cannot be completely objective, as the very act of organizing and visualizing it emphasizes certain elements and brings attention to select parts of the research through design. He goes on to describe, in his chapter “Importance of Font, Color and Icons,” principles of design such as minimalist approaches and the utilization of color theory that lead to more effective absorption of visualized data. An example shown in the book that inspired me was Florence Nightingale’s “Diagram of the Causes of Mortality.” The beautiful and cohesive design of the Coxcomb diagram is an iconic method of displaying data that is still used in a contemporary environment. It goes to show that good design is universally recognized; a language that, although not everyone speaks it, any pair of eyes will understand. Another interesting point he made was his reference to the fact that humans have been using icons and pictograms to visualize data as far back as the Neolithic age. This is interesting to me because it shows that we have always had the ability to visually communicate information and the inclination to express ourselves with something more than words.
Taryn Simon is an American artist who is fascinated by categorization and classification. Her work often involves extensive research to gather data, which she then formalizes in the medium of photography, text, or graphic design. Her project Image Atlas was something I immediately thought of upon looking into visualizing data. Her website http://www.imageatlas.org/ “interrogates the possibility of a universal visual language and questions the supposed innocence and neutrality of the algorithms upon which search engines rely,” as described on her portfolio. The structure of her image atlas is interesting, as it compiles the top results of an image search using the same keyword from different countries that use local search engines. For example, if I wanted to compare the images a user in China and a user in Korea would see after typing “food,” Image Atlas would compile the top results of their local search engines (Baidu and Naver, respectively). Although this is not an entirely scientific method of acquiring insight into the topic of search, it is an interesting way of communicating differences in exposure to data and cultural iconography within each country. My favorite part of this website is the fact that North Korea is listed but has zero results no matter what the search is.
In the online guide Data + Design, various authors collaborated to discuss the complexity of comprehending and organizing various forms of data. Alistair Croll’s piece on data aggregation was particularly interesting to me because it was the first document I had ever seen that grouped and explained the different ways data can be combined and explicitly delineated the logic behind the rules of those combinations. Particularly striking was the piece’s definition of “summable multiseries data”: a group of data connected by their representation of a larger statistic, whose “subgroups” are often trickier to identify and arrange than they seem. Using the example of coffee consumption, a statistic on how many cups were served to men and a statistic on how many regular cups of coffee were sold cannot be compared, because their basic subgroups (for a visual aid, think of subgroups as “graph axes”) are not the same: one breaks down consumption by gender, the other by the kind of coffee purchased. Even further, as Croll demonstrates, these figures cannot be leveraged against each other to “back into” the statistics of another subgroup. For example, just because you know 36.7% of cups were sold to women DOES NOT mean that 36.7% of regular cups of coffee were sold to women; those two figures do not correlate with each other and thus do not indicate anything about one another. Thus, data and context are equally important in statistics.
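Croll’s point can be made concrete with a small sketch (the coffee numbers here are invented): two series can sum to the same total yet slice it along different axes, so neither can be derived from the other without a cross-tabulation.

```python
# Two one-dimensional breakdowns of the same (invented) 100 cups of coffee.
by_gender = {"women": 37, "men": 63}
by_type = {"regular": 55, "decaf": 45}

# Both series are "summable": each adds up to the same grand total...
assert sum(by_gender.values()) == sum(by_type.values()) == 100

# ...but they slice that total along different axes. The share of *regular*
# cups sold to women is a separate fact, recoverable only from a cross-tab:
cross_tab = {
    ("women", "regular"): 20, ("women", "decaf"): 17,
    ("men", "regular"): 35, ("men", "decaf"): 28,
}

# The cross-tab's marginals agree with both one-dimensional series...
women_total = sum(v for (g, _), v in cross_tab.items() if g == "women")
regular_total = sum(v for (_, t), v in cross_tab.items() if t == "regular")
assert women_total == by_gender["women"] and regular_total == by_type["regular"]

# ...yet women's share of regular cups (about 36%) is not the 37% overall
# share, and could not have been "backed into" from either series alone.
print(round(100 * cross_tab[("women", "regular")] / by_type["regular"], 1))
```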
However, as the article points out, subgroups and categories are strictly human constructs. While working on a set of important Excel data, I once made the mistake of selecting every piece of data to generate a graph, instead of the more specific set I intended to work with. As a result, I got an unintelligible tangle of lines instead of the orderly, legible graph I was expecting, similar to the image above. While I immediately registered the graph as incorrect, Excel never once issued an error signal. Thinking the graph was an accurate amalgamation of the data it was fed, Excel couldn’t tell that the data I selected did not make sense, and it proudly generated the tangled lines before me, slapping one line charting evaluation scores over time on top of another line plotting satisfaction per class. While Excel is very good at interpreting data, human logic is obviously a whole other ball game, one it is far from winning.
I was reading Trina Chiasson’s compiled online sourcebook Data + Design, and I just have to say it’s one of the coolest resources on the internet. I really enjoyed the final product, because I know compiling all this information and making it accessible and understandable to users can be a huge challenge. Within the past year, I’ve become really interested in how data visualization can make insurmountable amounts of data digestible and, surprisingly, enjoyable. It just looks so good. Whether through infographics or interactive visualizations, it is a great way to digest information in the 21st century.
As beautiful as this may look, Chiasson’s online book almost takes away some of the magic of data visualization: what I mean is, it is a lot of hard work. In my opinion, there are so many things that can go wrong when compiling and organizing the data. I have so much respect for the people who go through the process of creating these visualizations.
One of the best uses I’ve found for data visualization has been in journalism. The New York Times is my favorite in terms of how it creates data visualizations that can apply to any reader, and they are super interesting, too! The one I would like to talk about is their most popular data visualization of 2013: How Y’all, Youse, and You Guys Talk.
This data visualization takes more than 350,000 survey answers collected from August to October 2013 and creates a map of where different phrases are said within the United States. The interesting part is that you can take the 25-question quiz yourself, which tells you where your unique dialect derives from.
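A hypothetical miniature of the pipeline behind such a map might look like the sketch below; the responses are invented, and the real survey is of course far larger and more carefully modeled.

```python
from collections import Counter, defaultdict

# Invented miniature of dialect-survey responses: (state, phrase used).
responses = [
    ("Texas", "y'all"), ("Texas", "y'all"), ("Texas", "you guys"),
    ("New Jersey", "youse"), ("New Jersey", "you guys"), ("New Jersey", "youse"),
    ("California", "you guys"), ("California", "you guys"),
]

# Tally phrases per state, then keep each state's most common answer:
# the value a dialect map would shade for that region.
by_state = defaultdict(Counter)
for state, phrase in responses:
    by_state[state][phrase] += 1

dominant = {state: tally.most_common(1)[0][0] for state, tally in by_state.items()}
print(dominant)
```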
This NY Times visualization was based on the Harvard Dialect Survey conducted by Bert Vaux and Scott Golder, which actually began in 2002. After taking this quiz and seeing how personalized it can be, I can only imagine the number of steps needed to visualize this information. I wish they had shared exactly how they created and organized the data collected, instead of just a short “About this Quiz” section.
As someone who is becoming more interested in the digital humanities, I think transparency really matters when sharing data visualizations. Though The New York Times is a journalistic source and we can assume the information is accurate, it’s always good to share that information with readers who want to know more.
Data + Design: A Simple Introduction to Preparing and Visualizing Information, https://infoactive.co/data-design/titlepage01.html
When I started this week’s reading, I kind of dragged my feet. I had a very singular image of what a database was and what it could do. For whatever reason, I was restricting myself to imagining databases as endless accumulations of data, minimalistic in presentation, which could only be decoded by people trained for such a job. The Companion to Digital Humanities did say that the database has an important place in humanist research, “whether it is the historian attempting to locate the causes of a military conflict, the literary critic teasing out the implications of a metaphor, or the art historian tracing the development of an artist’s style,” but it was still difficult to imagine using a database except for dedicated research.
It wasn’t until I started to casually browse the New York Public Library’s Articles & Databases page (itself something of a database) that I realized the number of purposes databases could serve. Many stored immigration or genealogy information (similar to the Transatlantic Slave Trade Database); others were archives of printed articles. Many of these were interesting enough, but they fit the descriptions of databases provided by our readings. I wasn’t really surprised by what I found until I saw the listing for the Internet Movie Database (IMDb), which I recognized. Although IMDb claims to be a source for entertainment news, most people use it to figure out if they really do know an extra in a movie from something else.
It is possible that the majority of databases are used as directories or otherwise in the pursuit of research. To some extent, I have not reconciled my preconceptions about the utility of databases, but I can see how it is possible for a database to function more as a search tool in everyday life. I noticed that a lot of people see iTunes as something of a personal database that they use frequently, which makes me wonder if there are any more traditional databases that could be considered overlooked.
After spending so much time and focus on metadata, I believe I have a pretty good understanding of it. So it was interesting to read a book about the presentation of data using the metadata we have just studied. Talking about data can often be very confusing, but the food preparation analogy made things a lot clearer and helped me understand what each step was and how it contributed to the overall representation of the data. Of all the steps, I found the section on visualizing the data to be the most interesting.
I have made many charts and graphs in my time, and I am sure I will continue to make even more in the digital humanities minor as well as in the future, so these tips and tricks from the data visualization section will be very helpful. Most of the information in Chapter 14, “Anatomy of a Graphic,” consisted of tips I had heard before, but the one thing that really stood out was the point about not cluttering a graph. When trying to make a graph informative, I have a tendency to over-clutter it, because I worry I am not putting enough information on it otherwise. Although this is a point I am very happy to have addressed now, the chapter that really interested me was Chapter 15.
In Chapter 15, the subject is the importance of color, fonts, and icons. I had always enjoyed playing around with colors, fonts, and icons when making visual representations of information, but I had not realized how important these aspects actually are. One of the first things that stood out to me was that the authors acknowledged that color is important, but stressed that the data must all be laid out first, before adding color. Another idea that was extremely interesting to me, and that I hadn’t thought about before, was the importance of white space. When shown the comparison between charts and graphs with white margins and those without, I was impressed by how much cleaner and clearer the representations with margins looked. With this knowledge, I decided to search the internet for interesting graphs that demonstrated some of the techniques I had just studied.
This is an example of a representation that did not use color well. There is no color differentiation between any of the descriptions near the top, which makes the chart confusing to understand. Although it is a silly chart, the coloring muddles the very point its maker is trying to make.
This other fun chart is clear because color is utilized to show overlap and to differentiate between attributes. In this visual representation, the color only strengthens the chart.