Course blog

Week 4: Databases-The Push of a Button

This morning, I turned on my laptop, went to my folder labeled “Fall 2014,” clicked on another folder, “Astronomy 3,” and finally opened a Word document named “Ast 3 Lec 10-24-14” to review my notes from the last Astronomy 3 lecture that I went to. It’s so easy for the average person to create a database, no matter how small, and store their files on their computer. The internet makes this even easier by letting us post texts, pictures, videos, etc. in an open space where others can view them with the click of a mouse.

Databases are helpful in many ways, especially when it comes to immortalizing historical texts. In Computer Databases and Aboriginal Knowledge, Michael Christie talks about the Larrakia, an Aboriginal people from Darwin in Australia’s Northern Territory, and how some of their women want to put their elders’ knowledge into a database so that the younger generations can have that information even after the elders pass. Young people are now constantly on their devices, whether a phone, laptop, or tablet, and are constantly fed information through websites like the Yahoo home page, social networks, and plain research, so it makes perfect sense to put the Aboriginal elders’ knowledge in databases that the younger generations can access.

Databases are essentially virtual archives. The word “archive” derives from the Greek word “Arkhe,” which names both the “commencement” and the “commandment,” as described in Jacques Derrida’s Archive Fever: A Freudian Impression.  As the commencement, archives describe nature and history as the origin where things commence. As the commandment, they show how men command, because archives are a man-made creation. When looking at archives with these two principles in mind, we can truly appreciate their importance and creation. By applying this to the Aboriginal people from Darwin, we can see the significance of entering their elders’ knowledge into a database.

The Aboriginal elders can create a database for their children and grandchildren, and once they have, they’ve done their job when it comes to bringing them the information. The old saying “You can lead a horse to water but you can’t make him drink” comes into play here. If the children do not show interest in using the database to learn about their history and culture, then that is their choice. The real magic happens when someone reads the information and uses it or tells someone else about it. Text is nothing more than text until someone reads it. At that point, it becomes knowledge that someone can use and teach. Databases are great because they offer us information that we can tell others about.

Works Cited

http://www.cdu.edu.au/centres/ik/pdf/CompDatAbKnow.pdf

http://books.google.com/books?hl=en&lr=&id=6KNJmNkE11UC&oi=fnd&pg=PA4&dq=archive+fever&ots=lrKZ2mSmXe&sig=CIde5g-wdIhhSFQIxgXcj9aZzWE#v=onepage&q=archive%20fever&f=false

Week 4: Wardrobes and Relational Databases

image

This chart is an example wardrobe plan from the book New Image for Men: Color and Wardrobe by Marge Swenson and Gerrie Pinckney (published in 1983). It shows all of the pieces of an imaginary wardrobe and puts them into categories according to level of formality and type of clothing. There are pieces for business, dress, and casual wear, and they include suits, sport coats, shirts, pants, ties, jewelry, belts, shoes, socks, sweaters, and jackets/coats. Each piece has attributes such as color, pattern, and material. Also, there are a number of pieces for each type of clothing at each level of formality, such as five shirts and three ties that match a suit for dress wear. This plan results in a flexible, efficient wardrobe that makes it easy to put outfits together and avoids extraneous or redundant pieces that clutter up your closet.

In the chapter “Databases,” Stephen Ramsay discusses relational databases, which are based on the idea that a database can be “a set of relations.” If all of the outfits that you can put together comprise a database, a wardrobe plan is analogous to a database design. A simple, old-fashioned tabular database would mean that each piece in an outfit is only used for that outfit. If you had 18 outfits that included black Louboutin pumps, you would actually have 18 pairs instead of one, which is an improbable situation. A relational database describes the reality of wardrobes much better, since a single pair of shoes can be used in many outfits (what Ramsay calls a one-to-many or 1:M relationship), thereby minimizing redundancy. In a relational database, each outfit would be a record or entity with its own primary key, and the black pumps and other pieces in the various categories (tables) would be referred to via foreign keys that can be reused in other records. Furthermore, the ways that pieces are mixed and matched, indicated here by horizontal lines that separate the levels of formality, would be described by entity relationship diagrams. However, as Ramsay notes in regard to real-world data, actual wardrobes are more complex than this idealized wardrobe plan.
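To make the pumps example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (shoes, outfits, shoe_id) are my own assumptions for illustration, not anything from Ramsay's chapter or the wardrobe plan. It only shows the basic idea: the pair of shoes is stored once, and each outfit refers to it through a foreign key.

```python
import sqlite3

# Hypothetical schema: the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shoes (
        shoe_id     INTEGER PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE outfits (
        outfit_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        shoe_id   INTEGER NOT NULL REFERENCES shoes(shoe_id)
    );
""")

# The black pumps are stored exactly once...
conn.execute("INSERT INTO shoes VALUES (1, 'black pumps')")

# ...and 18 outfits point to that single row through the foreign key.
conn.executemany(
    "INSERT INTO outfits (name, shoe_id) VALUES (?, ?)",
    [(f"outfit {n}", 1) for n in range(1, 19)],
)

shoe_rows = conn.execute("SELECT COUNT(*) FROM shoes").fetchone()[0]
outfits_with_pumps = conn.execute(
    """
    SELECT COUNT(*)
    FROM outfits JOIN shoes ON outfits.shoe_id = shoes.shoe_id
    WHERE shoes.description = 'black pumps'
    """
).fetchone()[0]

print(shoe_rows, outfits_with_pumps)  # prints: 1 18
```

A flat table would instead repeat the description of the pumps in all 18 outfit rows, which is the "18 pairs" problem described above.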

Just like databases, a person’s wardrobe reveals things about him or her. The particular items of clothing they buy and the way that they make outfits can give clues about a person’s tangible and intangible characteristics such as body type, their “color season,” personality, age, occupation, socioeconomic background, etc. Likewise, what data goes into a database and what is left out, and how the database is designed, reveals the ideology of the people who made it.

Week 4- Incan Databases

Reading Stephen Ramsay’s article Databases and also the Data + Design book really got me thinking about the way that data is visualized.  In both readings, the database, or specifically the computerized database, is described as a complex system in which to store and sort information. Specifically, Ramsay describes the digital humanities database as a series of relationships.  He describes these relationships as being able to “hold out the possibility not merely of an increased ability to store and retrieve information, but of an increased critical and methodological self-awareness.”  This got me thinking about the origins of non-digital databases: what kinds of relationships were they created to represent?

Inca_Quipu

The quipu (alternate spelling Khipu) is an artifact of the Incan empire (1400-1532 AD).  Quipus were used by the Incas to record information. As the Incas did not utilize a written language or numerical system, quipus were used to document numerical information, historic myths, and imperial decrees.  Quipus consisted of several long strings.  Each string would hold its own pattern, spacing, and style of knots representing the recorded information.  Although full knowledge of the Quipu system is lost to the modern western world, it is known from contemporary accounts that Quipus were used for highly complex tasks, not unlike modern databases.

This got me thinking about different, or perhaps non-western, ideas for organizing the database.  In the Incan context, the quipu relied heavily on the knowledge of the “reader” and also heavily on the notion of relationships.  From what little is known about the Quipu, it is clear that information is not recorded in a direct manner.  A specific kind of knot does not correspond directly to a letter or a word; it is highly contextual and is perhaps intended as a type of mnemonic device for the reader.  This, to me, seemed to be exactly what Ramsay was referring to when he said that, for the digital humanist, the real purpose of the database lies in the relations produced.  Moreover, the physical structure of the quipu brings up questions of data presentation.

Screen Shot 2014-10-26 at 2.19.52 PM

Moreover, I thought that it was an interesting aside that Harvard is now creating its own database about Quipus.  The database will record all of the data presented on Quipus that exist today.  Even cooler is the fact that this database has modeled its data scheme on the Quipu, calling it the “khipu data scheme.”  The website for the project explains the data structuring as a “branching network in which the number of branching levels is highly variable, but in which components at every level share certain characteristics.” Moreover, the computer database will look at interpreting the physical nature of the Quipu, focusing on: “the interlocking relationships between khipu components, the branching or tree-like structure of khipu, the similarity of certain components, and the multi-dimensionality of khipu variables.”  I thought this was a fascinating instance of mediation and also of episteme! The quipu uses its own unique system to address how it structures and presents information.  The fact that this system, while seemingly foreign, so easily maps onto a computer database is fascinating to me.  Perhaps this speaks to a universality of databases?  I am intrigued and curious if anyone else has instances of early databases!
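The “branching network” in that quote can be pictured as a tree in which every cord, at any level, has the same shape. Below is a rough sketch of that idea in Python. The class name Cord, its fields, and the sample colors and knot labels are all my own assumptions for illustration; this is not the actual schema of the Harvard project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cord:
    """One khipu cord. Every level of the tree reuses this same shape."""
    color: str
    knots: List[str] = field(default_factory=list)        # illustrative labels
    children: List["Cord"] = field(default_factory=list)  # cords tied to this one

    def depth(self) -> int:
        """Number of branching levels at or below this cord."""
        return 1 + max((child.depth() for child in self.children), default=0)

    def count(self) -> int:
        """Total number of cords in this branch, including this one."""
        return 1 + sum(child.count() for child in self.children)


# A made-up khipu: a primary cord with two pendants, one of which
# carries a subsidiary cord of its own.
primary = Cord("brown", children=[
    Cord("white", knots=["single", "long"]),
    Cord("red", knots=["figure-eight"], children=[
        Cord("white", knots=["single"]),
    ]),
])

print(primary.depth(), primary.count())  # prints: 3 4
```

Because a cord can hold any number of child cords, the number of branching levels is variable, while the components at every level still share the same characteristics (color, knots, children).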

 

 

An addendum to “Classics and the Computer: An End of the History”

 

 

 

These images are examples of a roll and a codex. The ancients transcribed their written works on scrolls made of papyrus. Eventually, codices made of parchment were used to transcribe classical texts. Writing on scrolls was difficult in terms of space. Scrolls were also inconvenient as references, since they had to be completely unrolled. They were subject to fast deterioration as well. Therefore, authorship was lost through literary corruption, deterioration, and misplacement of scrolls. When codices were developed, perhaps to address these problems, they introduced new ways of organizing written work and new ways of reading. The transition from scrolls to codices, like the one from tapes to CDs, introduced a much more efficient way to circulate knowledge. Greg Crane notes in A Companion to Digital Humanities that “The adoption of electronic methods thus reflects a very old impulse within the field of classics.” Classicists have an obsession with truth, prompted in no small part by the loss of great works through deterioration and manuscript corruption. The transition from scrolls was not without consequences. Certain authors and works were not transcribed into codices and were lost. Crane also notes, “Many non-classicists from academia and beyond still express surprise that classicists have been aggressively integrating computerized tools into their field for a generation.” Perhaps this effort is meant to ensure that the transition to digital media is complete and that no work is lost? Computers spurred a new way of circulating knowledge reminiscent of codices. My Classics professors, for instance, use digital dictionaries and grammar books for reference. In some ways, the way we read these works now is more authoritative than the way the ancient Romans themselves read them. A number of factors, such as education being limited to the upper class and the limitations of papyrus, restricted the understanding of certain works to only a few readers.
Classicists now use digital tools to easily navigate through these works, to learn ancient languages, and to inspire new questions by looking at these texts from a different perspective in a way allowed by computers.

 

This is a lemmatization of a Latin piece. Such a visualization allows readers to gain not only a better understanding of the piece, but also new insights and questions. Crane ends with this note: “Our history now lies with the larger story of computing and academia in the twenty-first century.” Perhaps Classicists today are not learning digital tools simply to increase their employability, but are part of a new transition.

 

Citations:

A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004.

http://www.digitalhumanities.org/companion/

From Scroll to Codex. http://courses.educ.ubc.ca/etec540/July03/batchelorj/researchtopic/. Image. Web. 25 October 14

Disambiguation and Lemmatisation of Automatically Computed Texts. http://wiki.hudesktop.hucompute.org/index.php/Lemmatisation/Disambiguation. 14 October 2014. Image. Web. 25 October 14

Week 4 – 12 Graphs and Charts that Perfectly Illustrate What It’s Like Trying to Get Ready in the Morning

enhanced-7340-1400009601-5

This week I was drawn to the book, Data and Design, by Trina Chiasson, Dyanna Gregory and many other contributors. I flipped through a lot of the pages of this book and was impressed not only by the helpful information that was presented, but also by the pleasing visual display. Immediately, I saw a lot of similar terms and ideas as I had seen in my Stats 10 book. I had just taken my first midterm for that class this week so all the different definitions of qualitative and quantitative data were fresh in my mind. I could personally attest to the premise of this book that data visualization and data in general can be overwhelming and confusing for “non-math” people. My brain is math oriented, but I had never taken a stats class before and I haven’t taken any sort of math class in over a year, so I was a little rusty. Both my stats book and Data and Design have helped me to see how certain types of graphs better display different types of data. For example, a pie chart is better for categorical data and a histogram is a more appropriate data visualization for numerical data.
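Here is a small sketch of why the two chart types fit different kinds of data, written in plain Python so it does not depend on any charting library. The sample values are made up for illustration. A pie chart needs each category's share of the whole, while a histogram needs numbers grouped into ranges (bins).

```python
from collections import Counter

# Made-up sample data for illustration.
breakfasts = ["coffee", "cereal", "coffee", "toast", "coffee"]  # categorical
minutes_to_get_ready = [12, 18, 25, 31, 33, 47]                 # numerical

# Pie chart input: one slice per category, sized by its share of the total.
counts = Counter(breakfasts)
shares = {category: n / len(breakfasts) for category, n in counts.items()}

# Histogram input: values grouped into 10-minute bins (10-19, 20-29, ...).
bin_width = 10
bins = Counter((m // bin_width) * bin_width for m in minutes_to_get_ready)

print(shares)
print(sorted(bins.items()))
```

The categories have no natural order or distance between them, so shares are all there is to show. The minutes do have an order, which is what the bins preserve.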

All this talk about graphs and data visualization reminded me of some of the fun Buzzfeed graphs I had come across when I was procrastinating one day. Typically, we think of graphs and charts as boring and inapplicable to daily life, but these graphs demonstrate that data visualization can also be humorous. My favorite set of graphs and charts from Buzzfeed is “12 Graphs and Charts that Perfectly Illustrate What It’s Like Getting Ready in the Morning” by Adam Ellis, Buzzfeed Staff.

http://www.buzzfeed.com/adamellis/graphs-and-charts-that-perfectly-illustrate-what-its-like

These graphs depict information that we can all identify with and do it in a fun, whimsical way. This article would not be as effective if it were just called “12 Things We Can All Relate To When Getting Ready In The Morning” and if, for #5 Types of Breakfast, Ellis had just written that there are different types of breakfasts depending on how the morning is going, but there is almost always coffee. These graphs take data visualization to a whole new level because they include actual pictures that make the data more appealing and memorable for the reader. Although these graphs and charts do not have vital or completely accurate information, I think the authors of Data and Design would very much appreciate their design and creativity.

original-29965-1400005596-11

Week 4: GIS, the Internet and Databases

Currently I am taking a course on GIS which requires us to use a system called QGIS. Using shapefiles, we create maps with layers of “data” applied to them. When I looked at the database of names on the African Slave Trade website, it looked almost exactly like the attribute tables that we use for GIS. Indeed, these are databases, except they are organized and linked in such a way as to be applied to a map. With the right equipment, mathematics can be done on the nominal data. Another example of a database would be our course catalog. It holds information about class descriptions, IDs, professors, units, requirements, time, and place. All students are familiar with using this database, and its organization ensures easy selection of classes for the coming quarter. The odd thing about this system is that classes leave and enter the database based on availability. While there is a database with every class, most students will only ever see the ones being offered. The database is therefore redesigned each quarter based on the need for certain classes. Data is taken out of the complete database and formulated into a comprehensible catalog with only the necessary information. From this perspective, the database is specific to the problem being addressed: finding classes for next quarter.
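The idea that the catalog is a filtered slice of a larger database can be sketched in a few lines of Python. Everything here is hypothetical: the Course fields, the course IDs, and the offered_next_quarter flag are my own inventions for illustration, not how the registrar's system actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Course:
    course_id: str
    title: str
    units: int
    offered_next_quarter: bool


# The "complete" database: every class, whether or not it is being offered.
# These records are invented for the example.
ALL_COURSES = [
    Course("GIS 101", "Intro to Mapping", 5, True),
    Course("HIST 20", "Archives and Memory", 4, False),
    Course("STAT 10", "Intro to Statistics", 5, True),
]


def catalog(courses):
    """Return only what a student needs: ID and title of classes on offer."""
    return [(c.course_id, c.title) for c in courses if c.offered_next_quarter]


next_quarter = catalog(ALL_COURSES)
print(next_quarter)
```

The full list never changes shape; the catalog students see is just a view of it that keeps the rows and columns relevant to one problem, picking classes for next quarter.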

Also, reading the Kissinger article, I am struck by the age of the comment itself. Nowadays if something is online or in the “cloud” it remains forever. A good example would be nude celebrity photos that are leaked into public space. Anything on the internet will exist forever. I recall a quote from the movie The Social Network, “the internet is written in ink.” I believe Kissinger is wrong. A paper can be destroyed, but the internet keeps everything, and anything on it can be retrieved at any given point. Ultimately the internet is just a large database of sites and information that can be lost or maintained. In Web of Science, students search for key terms that are stored in the database and referenced in the metadata. With extended searches, you can even narrow down your focus by limiting the search to certain file types and subjects. Oddly enough, it seems that every search engine is just a program/method for finding something in a large database using key terms. I figure every internet site must therefore have a database behind it if there is to be any hope of organization or storage of information.

http://www.qgis.org/en/site/

http://www.theguardian.com/commentisfree/2014/sep/01/celebrity-naked-photo-leak-2014-nude-women

http://www.registrar.ucla.edu/schedule/schedulehome.aspx

Week 4: Databases Visualized

“If [this] data were published in books, a bookshelf 450 miles long would be required to hold them” (Kroenke & Auer). This quote from “Database Concepts” made me think about different ways of representing the data in a database. It could be written out in physical books, stored in limitless tables online, or it could be visualized. Since, according to this article, the “largest databases are those that track behavior,” I wanted to find a metadata visualization that could communicate that type and that volume of information.

Data Paris (http://dataparis.io/#) is a visualization of the city of Paris in the form of metadata. At first glance there are a lot of different buttons that I’m not sure what to do with, and this is a problem that I’m assuming one runs into when trying to turn so much data into a simple graphic. It would make sense to translate the idea of rows and columns (from a traditional database) into this visual, because the logic of such a structure would be easy to understand. I began to understand the website after playing around with it for a while, but the context of “Paris” was lost on me because I am not familiar with the area. I did, however, find patterns in the metadata that I wouldn’t have been able to detect as easily without visuals. I started by looking at areas with the fewest single people. I noticed that these areas also had the most married people, the most retirement-aged people, the lowest population density, the most homeowners, and the highest home prices. It made sense that all of these statistics would go together, so it was cool to click on metadata categories and predict which areas would light up. I had to make these data connections myself, but the visuals confirmed the predictions I had made based on what I had already gathered from this metadata visualization.

Another metadata visualization source is http://create.visual.ly/, which allows anyone to create visualizations based on their own or chosen metadata. For example, you can log into Facebook and, if you have a Page, you can see basic stats about page fans such as demographics and geography, how your page is doing in terms of shares, views, and clicks, and data about use over all time vs. the last 30 days. Another visualization on this website allows you to log into Twitter and search any hashtag to see metadata about its lifetime, common sources, and the Twitter accounts with the most influence on the hashtag. These visualizations are great ways to show relationships between gathered, available data. They put metadata into context because they are very specific and relevant. This also means, however, that these visualizations stay very basic. They can only give you access to a limited amount of metadata and in a very specific context, but they still give a nice simple visualization timeline that provides insight through contextualized knowledge about the data.

Overall, visualizations are a great way for people to make sense of databases and turn data into knowledge. They provide a seemingly simple process and are an enjoyable way for users to learn.

Week 3- Metadata as Content Shaper

NetflixDVD

Reading Alexis Madrigal’s article “How Netflix Reverse Engineered Hollywood” really blew my mind.  The amount of metadata that is created for every program on Netflix is simply astounding.  Looking at the graphs provided, for most common “adjective,” etc., brought me to think about the “new” direction of Netflix: Netflix not only as a server/stream for content, but also as a platform for producing new content.  Since the 2012 release of the TV series Lilyhammer, Netflix has presented itself as a platform for releasing previously unseen content.  Recently, there has been a lot of publicity about Netflix expanding to releasing first-run movies as well.

The amount of news attention brought to Netflix’s original releases, as alluded to by the links above, is massive.  A Google search for “Netflix Original Series” produces articles from every major news publication.  Almost all of these articles report with a sense of skepticism on Netflix’s expansion practice.  All of this commotion, and my newfound understanding of Netflix’s use of metadata, brought me to question how Netflix decides what original content it will seek to produce.  Netflix recently made a deal with Adam Sandler to produce 4 straight-to-Netflix movies.  The article makes a simple assumption that the deal is based on the high traffic that Adam Sandler movies receive on Netflix.  However, with knowledge of Netflix’s use of categorization, it seems that these deals are deeply rooted in Netflix’s highly complex classification system.

images

For instance, look at Netflix’s most recent Original Series release, Bojack Horseman.  The cartoon series, which follows the adventures of an anthropomorphic washed-up 90s sitcom star, has been met with mixed reviews.  The reviews seem to fall into two camps: those who praise the undeniable influence of shows such as Bob’s Burgers, 30 Rock, and Archer, to name a few, and those who condemn the show for not offering the viewer anything particularly new.  The resemblance of Bojack Horseman to other shows, which are very popular on Netflix, made me question the motives for producing this show.  Bojack Horseman was renewed for a second season shortly after the first season was released, even with these mixed reviews.  This is highly unusual for a renewal, since production is costly. Does this suggest something about viewing habits? Are people more inclined to watch something that falls into the same specific genre, or are they repelled by this?  I am curious as to how well Netflix can produce content to fit into its “viewer recommendations.”  Whereas other shows, such as House of Cards and Orange is the New Black, have been highly successful, it feels like the jury is still out on whether “Bojack” is able to establish itself as both a similar and a unique show.

Screen Shot 2014-10-19 at 12.55.01 PM

For this post I desperately wanted to get a screen shot showing Bojack Horseman under a highly specified category with other similar shows.  Of course, Murphy’s law hindered my search, and the best I could get was one without a specified classification. Thinking about Netflix after this article makes me question when categorization gets too narrow.  In the article, it is mentioned that often only a few films will exist in a highly specified classification.  As humans seeking entertainment, do we want to stay exactly in the classification of the predecessor, or is the key to move to a slightly different classification? (Dark dramas about pigs as opposed to dark dramas about house pets, for an obtuse example.) Moreover, it brought up the question of what happens when the classifier (Netflix) produces its own content for the classification system. Can this be done without bias?  Does bias of the information provider even matter? Does it jeopardize the efficiency of the system or does it make things easier to find? I guess in the case of Netflix, only time will tell.

On the Netflix Quantum Theory, or, Microtags

In his article “How Netflix Reverse Engineered Hollywood,” Alexis Madrigal explores the categories that divide the genres and subgenres of movies and TV shows on Netflix, and how the company structured what amounts to over 76,000 “micro tags,” as they call them. These tags were built by people working together with an algorithm: professionally trained movie-watchers (with a 36-page packet on how to watch and rate movies) tagged each movie using commonly repeated adjectives, and the program arranged those tags in a systematic order, by date of production, names of the producers, actors, target audience, and many more specific subgenres.
So, confronting this unprecedented, jarring number of tags that categorize the movies and TV shows we see (or will see) on Netflix, we naturally ask ourselves questions: What was the reason behind Netflix creating such a system? How does this system affect users’ viewing experiences on Netflix? And furthermore, how does this innovation reflect on the present and future of our cultures and societies?
When Todd Yellin, the Vice President of Product Innovation at Netflix, came up with this tagging system, he had one goal in mind: “Tear apart content!” The team then tentatively named the system the “Netflix Quantum Theory” and created a guideline that “spelled out ways of tagging movie endings,” as well as the “social acceptability” of the lead character. They also created a rating system for each genre, on a scale of 1 to 5. The tagging and rating system continues to a much deeper, much more specific level, covering how happy or sad the ending is, the plot, the lead character’s job, movie locations, and everything we can compare and categorize about movies. Once the base of this tag pyramid was built, their team of engineers created a syntax for the genres based on these microtags to create the alt genres, combining the human-built system of hand-tagging objects with a machine-based program to categorize them.
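
The “syntax” step can be pictured as filling slots in a fixed order with hand-assigned tags. The following is only a toy sketch of that idea: the three slots, their order, and the sample tags are assumptions made up for illustration, not Netflix's actual grammar or data.

```python
from itertools import product

# Hypothetical microtags, as if assigned by human taggers.
ADJECTIVES = ["Dark", "Feel-Good"]
GENRES = ["Dramas", "Comedies"]
SUBJECTS = ["About Pets", "Set in Europe"]


def alt_genres(adjectives, genres, subjects):
    """Combine one tag from each slot, in a fixed order, into a genre name."""
    return [" ".join(parts) for parts in product(adjectives, genres, subjects)]


names = alt_genres(ADJECTIVES, GENRES, SUBJECTS)
print(len(names))  # prints: 8
print(names[0])    # prints: Dark Dramas About Pets
```

Even with two tags per slot, the machine produces eight genre names from six human-made tags, which hints at how a modest tagging vocabulary could multiply into tens of thousands of categories.
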
Netflix’s microtag system shows us a process of not only categorizing the objects around us but also going much deeper into their specifics, getting as close as it can to the human brain’s process of explaining and dividing things. Beyond that, it builds an algorithm on that process to understand us better, which fundamentally changes the way we communicate, interact, sell, and purchase objects in society.
Ever since the industrial revolution, when products began to be produced in mass quantities, businesses have been developing systems to understand the market’s behaviors and thought processes in order to make products that sell and create marketing tactics that affect customers’ decisions among myriad products. As unprecedented as it was to categorize movies based on tags and sub-tags, dividing and subdividing down to the level of microtags, we can understand that combining the categorizing systems we have used to archive data since ancient times with today’s technology for automatically systematizing data gives us access to understanding and, better, predicting human behavior. We are living in intriguing times, able to witness the mix of humanities and digitalization develop, to understand how we ourselves work, to change the way we live based on the system that we have developed ourselves, and to open more doors to the future.
References:
Alexis C. Madrigal, “How Netflix Reverse Engineered Hollywood,” The Atlantic, January 2, 2014

Week 3: Suggesting the Right Thing

  1. Peter Lunenfeld wrote in his book The Secret War Between Uploading and Downloading: Tales of the Computer as Culture Machine, “All animals download, but only a few upload anything besides shit and their own bodies.” Through the use of suggestion algorithms such as the one that Netflix designed, users are essentially uploading information automatically every time they select a movie or post a rating.

    My Facebook is able to tell me who I may and may not know. My Twitter is able to show me news based on interests or previous activity on my handle. My web browser predicts what I’m going to search for and gives me options to choose from before I am even done typing. My computer basically seems to know me better than I do.

    Different social media platforms have different functions to me. Facebook is a social tool that I primarily use to connect to people, whereas Twitter is more of a place where I can think aloud in 140 characters. Through the perception that each platform can provide something different for me, my persona becomes depicted differently in each situation. In turn the things that appear on my news-feed or dashboard across each account are vastly different. Suggestions as a feature allow users to explore content that they may be interested in based on previous navigation, but arguably also restrict and discourage freely surfing other topics. Some apps and programs have taken this to the next level by completely abolishing the search function, and replacing it with ‘programmed content’ based on its relevance to you.

    The app Yeti is an example of an app that shows you content, not based on search, but location. Evolving from its conceptual predecessor “At the Pool”, Yeti utilizes one’s location to generate a feed of content generated from other users nearby.

    Sources:
    http://yeti.ai/#firstPage/discover
    Peter Lunenfeld- The Secret War Between Uploading and Downloading