October 2014 – Page 5 – DH101: Fall 2014

Week 4: GIS, the Internet and Databases

October 24, 2014 amymchan

Currently I am taking a course on GIS which forces us to use a system called QGIS. Using shapefiles, we create maps with layers of “data” applied to them. When I looked at the database of names on the African Slave Trade website, it looked almost exactly like the attribute tables that we use for GIS. Indeed, these are databases, except that they are organized and linked so that they can be applied to a map. With the right equipment, mathematics can be done on the nominal data. Another example of a database would be our course catalog. It holds information about class descriptions, IDs, professors, units, requirements, time, and place. All students are familiar with using this database, and its organization ensures easy selection of classes for the coming quarter. The odd thing about this system is that classes leave and enter the database based on availability. While there is a database with every class, most students will only ever see the ones being offered. The database is therefore redesigned each quarter based on the need for certain classes. Data is taken out of the complete database and formulated into a comprehensible catalog with only the necessary information. From this perspective, the database is specific to the problem being addressed: finding classes for next quarter.
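To make the idea of a quarterly catalog as a filtered view of a complete database more concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The table name, columns, and rows are all invented for illustration; the real registrar system is certainly organized differently.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classes ("
    "class_id TEXT PRIMARY KEY, title TEXT, professor TEXT, "
    "units INTEGER, offered_this_quarter INTEGER)"
)
conn.executemany(
    "INSERT INTO classes VALUES (?, ?, ?, ?, ?)",
    [
        ("DH101", "Intro to Digital Humanities", "Prof. A", 5, 1),
        ("GEOG7", "Intro to GIS", "Prof. B", 5, 1),
        ("HIST99", "Archived Seminar", "Prof. C", 4, 0),
    ],
)


def quarterly_catalog(connection):
    """Return only the classes a student would see this quarter."""
    rows = connection.execute(
        "SELECT class_id, title, units FROM classes "
        "WHERE offered_this_quarter = 1 ORDER BY class_id"
    )
    return rows.fetchall()


catalog = quarterly_catalog(conn)
print(catalog)
```

The complete table still holds every class; the catalog is just a query over it, which is one way to read the claim that the database is shaped by the problem being addressed.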

Also reading the Kissinger article, I am struck by the age of the comment itself. Nowadays if something is online or in the “cloud” it remains forever. A good example would be nude celebrity photos that are leaked into public space. Anything on the internet will exist forever. I recall a quote from the movie The Social Network, “the internet is written in ink.” I believe Kissinger is wrong. A paper can be destroyed, but the internet keeps everything, and it can be retrieved at any given point. Ultimately the internet is just a large database of sites and information that can be lost or maintained. In Web of Science, students search for key terms that are stored in the database and referenced in the metadata. With extended searches, you can even narrow down your focus by limiting the search to file types and subject. Oddly enough, it seems that every search engine is just a program or method for finding something in a large database using key terms. I figure every internet site must therefore have a database behind it if there is to be any hope of organization or storage of information.
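The idea of a search engine as a method for finding something in a large database using key terms can be sketched with a tiny inverted index. This is only an illustration in Python with made-up records; it is not how Web of Science or any real search engine is implemented.

```python
from collections import defaultdict

# Hypothetical records standing in for database entries and their metadata.
records = {
    1: "slave trade voyages database",
    2: "GIS shapefile attribute table",
    3: "course catalog database",
}


def build_index(docs):
    """Map each key term to the set of record ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of records containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


index = build_index(records)
print(search(index, "database"))
print(search(index, "course database"))
```

Adding more terms to the query narrows the result, which is roughly what an extended search does when it limits by subject or file type.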

http://www.qgis.org/en/site/

http://www.theguardian.com/commentisfree/2014/sep/01/celebrity-naked-photo-leak-2014-nude-women

http://www.registrar.ucla.edu/schedule/schedulehome.aspx

Week 4: Databases Visualized

October 24, 2014 dparnanen

“If [this] data were published in books, a bookshelf 450 miles long would be required to hold them” (Kroenke & Auer). This quote from “Database Concepts” made me think about different ways of representing the data in a database. It could be written out in physical books, stored in limitless tables online, or it could be visualized. Since, according to this article, the “largest databases are those that track behavior,” I wanted to find a metadata visualization that could communicate that type and that volume of information.

Data Paris (http://dataparis.io/#) is a visualization of the city of Paris in the form of metadata. At first glance there are a lot of different buttons that I’m not sure what to do with, and this is a problem that I’m assuming one runs into when trying to turn so much data into a simple graphic. It would make sense to translate the idea of rows and columns (from a traditional database) into this visual, because the logic of such a structure would be easy to understand. I began to understand the website after playing around with it for a while, but the context of “Paris” was lost on me because I am not familiar with the area. I did, however, find patterns in the metadata that I wouldn’t have been able to detect as easily without visuals. I started by looking at areas with the fewest single people. I noticed that these areas also had the most married people, the most retirement-aged people, the lowest population density, the most homeowners, and the highest home prices. It made sense that all of these metadata statistics would go together, so it was cool to click on metadata categories and predict which areas would light up. I had to make these data connections myself, but the visuals confirmed the predictions I had made based on information that I had already gathered from this metadata visualization.

Another metadata visualization source is http://create.visual.ly/, which allows anyone to create visualizations based on their own or chosen metadata. For example, you can log into Facebook and, if you have a Page, see basic stats about page fans such as demographics and geography, how your page is doing in terms of shares, views, and clicks, and data about use over all time vs. the last 30 days. Another visualization on this website allows you to log into Twitter and search any hashtag to see metadata about its lifetime, common sources, and the Twitter accounts with the most influence on the hashtag. These visualizations are great ways to show relationships between gathered, available data. They put metadata into context because they are very specific and relevant. This also means, however, that these visualizations stay very basic. They can only give you access to a limited amount of metadata and in a very specific context, but they still give a nice simple visualization timeline that provides insight through contextualized knowledge about the data.

Overall, visualizations are a great way for people to make sense of databases and turn data into knowledge. They provide a seemingly simple process and are an enjoyable way for users to learn.

Week 3- Metadata as Content Shaper

October 22, 2014 Torischmitt

Reading Alexis Madrigal’s article “How Netflix Reverse Engineered Hollywood” really blew my mind. The amount of metadata which is created for every program on Netflix is simply astounding. Looking at the graphs provided, for most common “adjective” etc., brought me to think about the “new” direction of Netflix: Netflix not only as a server/stream for content, but also as a platform for producing new content. Since 2012, with the release of the TV series Lilyhammer, Netflix has presented itself as a platform for releasing previously unseen content. Recently, there has been a lot of publicity about Netflix expanding to releasing first-run movies as well.

The amount of news attention brought to Netflix’s original releases, as alluded to by the links above, is massive. A Google search for “Netflix Original Series” produces articles from every major news publication. Almost all of these articles report with a sense of skepticism on Netflix’s expansion practice. All of this commotion, and my newfound understanding of Netflix’s use of metadata, brought me to question how Netflix decides what original content it will seek to produce. Netflix recently made a deal with Adam Sandler to produce four straight-to-Netflix movies. The article makes a simple assumption that the deal is based on the high traffic that Adam Sandler movies receive on Netflix. However, with knowledge of Netflix’s use of categorization, it seems that these deals are deeply rooted in Netflix’s highly complex classification system.

For instance, look at Netflix’s most recent Original Series release, Bojack Horseman. The cartoon series, which follows the adventures of an anthropomorphic washed-up 90s sitcom star, has been met with mixed reviews. The reviews seem to fall into two camps: those who praise the undeniable influence of shows such as Bob’s Burgers, 30 Rock, and Archer, to name a few, and those who condemn the show for not presenting the viewer with anything particularly new. The resemblance of Bojack Horseman to other shows that are very popular on Netflix made me question the motives for producing this show. Bojack Horseman was renewed for a second season shortly after the first season was released, even with these noted mixed reviews. This is highly unusual for a renewal, since production is costly. Does this suggest something about viewing habits? Are people more inclined to watch something that falls into the same specific genre, or are they repelled by this? I am curious as to how well Netflix can produce content to fit into its “viewer recommendations.” Whereas other shows, such as House of Cards and Orange is the New Black, have been highly successful, it feels like the jury is still out on whether “Bojack” is able to establish itself as both a similar and a unique show.

For this post I desperately wanted to get a screenshot showing Bojack Horseman under a highly specified category with other similar shows. Of course, Murphy’s law hindered my search, and the best I could get was one without a specified classification. Thinking about Netflix after this article makes me question when categorization gets too narrow. In the article, it is mentioned that often only a few films will exist in a highly specified classification. As humans seeking entertainment, do we want to stay exactly in the classification of the predecessor, or is the key to move to a slightly different classification? (Dark dramas about pigs as opposed to dark dramas about house pets, for an obtuse example.) Moreover, it brought up the question of what happens when the classifier (Netflix) produces its own content for the classification system. Can this be done without bias? Does bias of the information provider even matter? Does it jeopardize the efficiency of the system or does it make things easier to find? I guess in the case of Netflix, only time will tell.

On the Netflix Quantum Theory, or, Microtags

October 20, 2014 susiemielekim

In his article “How Netflix Reverse Engineered Hollywood”, Alexis Madrigal explores the categories that divide the genres and subgenres of movies and TV shows on Netflix, and how the company structured what amounts to over 76,000 “micro tags,” as they call them. These tags were built through a combination of people and an algorithm: professionally trained movie-watchers (trained with a 36-page packet on how to watch and rate movies) tagged each movie using commonly repeated adjectives, and the program arranged those adjectives in a systematized order alongside the date of production, the names of the producers, the actors, the targeted audience, and many more specific subgenres.

So, confronting this unprecedented, jarring number of tags that categorize the movies and TV shows we see (or will see) on Netflix, we naturally ask ourselves questions: What was the reason behind Netflix creating such a system? How does this system affect the users’ viewing experiences on Netflix? And furthermore, how does this innovation reflect on the present and future of our cultures and societies?

When Todd Yellin, the Vice President of Product Innovation at Netflix, came up with this tagging system, he had one goal in mind: “Tear apart content!” The team then tentatively named the system the “Netflix Quantum Theory” and created a guideline that “spelled out ways of tagging movie endings”, as well as attributes such as the “social acceptability” of the lead character. They also created a rating system for each genre, on a scale of 1 to 5. The tagging and rating system continues to a much deeper, much more specific level: the happiness or sadness of the ending, the plot, the lead character’s job, movie locations, and everything else we can compare and categorize about movies. Once the base of this tag pyramid was built, their team of engineers created a syntax for the genres based on these microtags to create the alt-genres, combining the human-built system of hand-tagging objects with a machine-based program to categorize them.
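As a rough illustration of what a “syntax” for combining hand-applied tags into a genre name could look like, here is a small Python sketch. The slot names, their order, and the example tags are my own assumptions for the sake of the example; Netflix’s actual grammar and data are not public in this form.

```python
# Hypothetical slot order and connecting words, assumed for illustration.
SLOT_ORDER = ["region", "adjectives", "noun_genre", "based_on", "set_in", "era", "about"]
SLOT_PREFIX = {
    "based_on": "Based on ",
    "set_in": "Set in ",
    "era": "from the ",
    "about": "About ",
}


def build_alt_genre(tags):
    """Join whichever tag slots are filled, in a fixed order, into one title."""
    parts = []
    for slot in SLOT_ORDER:
        value = tags.get(slot)
        if value:
            parts.append(SLOT_PREFIX.get(slot, "") + value)
    return " ".join(parts)


genre = build_alt_genre(
    {"adjectives": "Dark", "noun_genre": "Dramas", "era": "1960s", "about": "Royalty"}
)
print(genre)
```

The human taggers supply the values and the program supplies the ordering, which is the hybrid of hand-tagging and machine categorization described above.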

Netflix’s microtag system shows us the process of not only categorizing the objects around us but also going much deeper into their specifics, getting as close as it can to the way the human brain explains and divides things, and beyond that, building an algorithm based on it to understand us better, which fundamentally changes the way we communicate, interact, sell, and purchase objects in society.

Ever since the industrial revolution, when products began to be produced in mass quantity, businesses have been developing systems to understand the market’s behaviors and thought processes in order to make products that sell and create marketing tactics that affect the customers’ decisions among myriad products. As unprecedented as it was to categorize movies based on tags and sub-tags, dividing and subdividing down to the level of microtags, we can understand that combining the categorizing systems we have used to archive data since ancient times with today’s technology for automatically systematizing data gives us access to understanding and, better, predicting human behaviors. We are living in intriguing times, to be able to witness the development of the mix of humanities and digitalization to understand how we ourselves work, to change the way we live based on the system that we have developed ourselves, and to open more doors to the future.

References:

Alexis C. Madrigal, “How Netflix Reverse Engineered Hollywood,” The Atlantic, January 2, 2014

Week 3: Suggesting the Right Thing

October 20, 2014 nathanchan95

Peter Lunenfeld wrote in his book The Secret War Between Uploading and Downloading: Tales of the Computer as Culture Machine: “All animals download, but only a few upload anything besides shit and their own bodies.” Through the use of suggestion algorithms such as the one that Netflix designed, users are essentially automatically uploading information every time they select a movie or post a rating.

My Facebook is able to tell me who I may and may not know. My Twitter is able to show me news based on interests or previous activity on my handle. My web browser predicts what I’m going to search and gives me options to choose them before I am even done typing. My computer basically seems to know me better than I do.

Different social media platforms have different functions to me. Facebook is a social tool that I primarily use to connect to people, whereas Twitter is more of a place where I can think aloud in 140 characters. Through the perception that each platform can provide something different for me, my persona becomes depicted differently in each situation. In turn the things that appear on my news-feed or dashboard across each account are vastly different. Suggestions as a feature allow users to explore content that they may be interested in based on previous navigation, but arguably also restrict and discourage freely surfing other topics. Some apps and programs have taken this to the next level by completely abolishing the search function, and replacing it with ‘programmed content’ based on its relevance to you.

The app Yeti is an example of an app that shows you content, not based on search, but location. Evolving from its conceptual predecessor “At the Pool”, Yeti utilizes one’s location to generate a feed of content generated from other users nearby.

Sources:
http://yeti.ai/#firstPage/discover
Peter Lunenfeld- The Secret War Between Uploading and Downloading

The Importance of Language in Classification: Netflix and Reddit

October 20, 2014 fmanto

I really enjoyed Alexis C. Madrigal’s “How Netflix Reverse Engineered Hollywood” article. My curiosity is now somewhat satisfied because I have a better understanding of how Netflix has been so darn good at making recommendations for me. It’s amazing how capable the site is of categorizing 76,897 unique ways to describe types of movies. That number for me is hard to wrap my mind around. In many ways, the purpose of categorization is to help make finding information easier, but can it ever be too much where the opposite is done and the purpose of categorization is lost?

The author of the article, Madrigal, mentioned something that I found interesting. We also touched upon it in class, and that is: the importance of language in categorization aka controlled vocabulary. Madrigal states in the article, “Netflix created a vocabulary” that was used in determining how alt-genres would be categorized on the website. To even begin categorization, there needs to be some sort of agreement made by everyone who will be contributing to the categorization. So those at Netflix had to come up with a specific vocabulary that they could understand and that the audience could understand, no matter where they were from. And like Madrigal stated, they did quite a good job in pinpointing the best terms to use. I mean, “20th Century Period Pieces Based On Real Life”? It’s almost scarily too accurate, which is why Netflix is so good at making recommendations for its viewers.

This also reminded me of the extent to which other websites try to manage the vast number of categories on the web, and how sometimes it does not work well. Netflix, as a private company, created the language used on the site, which users have to agree with. But that is not the case on forum-like websites where the users create and agree on the common language. Reddit, a very popular site that I’m sure many of my fellow college students know all too well, has, for the most part, successfully organized a large number of sub-categories. There are about 6000 active subreddits online. The title of each subreddit has, in some way, been agreed upon by the users. For example, the subreddit /r/aww features pictures, videos, and stories of all things cute (mostly animals). It might seem like a minute change of detail, but what if someone were to search for /r/awww instead (with an added ‘w’ in the word ‘aww’)?

The official /r/aww subreddit:

The very similar /r/awww subreddit:

There are users on both subreddits and both are active, meaning posts have been submitted within the past 24 hours, though one has far more users than the other. Those using the subreddit /r/awww may not know of the common agreement that /r/aww is the more popular subreddit, and that separates them from being able to get the same information.

Reddit has acknowledged the differences between subreddits and has begun to direct users to the other, more popular subreddits, minimizing the number of categorizations on the site.
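One way to picture how a site could point a user from a near-duplicate name to the more popular category is fuzzy string matching. The sketch below uses Python’s standard difflib module; the subscriber counts are invented, and this is only an assumption about how such a redirect might work, not a description of Reddit’s actual mechanism.

```python
import difflib

# Hypothetical subscriber counts, invented for illustration.
subreddits = {"aww": 7000000, "awww": 20000, "pics": 9000000}


def suggest_canonical(name, known, cutoff=0.8):
    """Among similarly spelled names, return the one with the most subscribers."""
    candidates = difflib.get_close_matches(name, list(known), n=5, cutoff=cutoff)
    if not candidates:
        return None
    return max(candidates, key=lambda c: known[c])


print(suggest_canonical("awww", subreddits))
```

Here the typed name /r/awww is close enough in spelling to /r/aww that the more popular of the two is suggested, while an unrelated name returns nothing.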

Source: Alexis C. Madrigal, “How Netflix Reverse Engineered Hollywood,” The Atlantic, January 2, 2014

Week 3: Netflix and Jeopardy!

October 20, 2014 kaylee

The article on Netflix was particularly interesting because it showed in practice the extreme level of sophistication and specificity Netflix incorporated into its genre-ization algorithm to create specialized profiles for each Netflix user. Allotting for a variety of combinations of periods, genres, and even actors, Netflix attempts to precisely categorize every film ever made, even if that category has just one film. With a set of generated genres attributed to each viewer, Netflix has the capacity not only to describe what each viewer watches, but ultimately to predict it. A case in point is Netflix’s educated purchase of “House of Cards”, a show that lined up perfectly with Netflix’s average profiles. However, as the article points out, “the data can’t tell them how to make a TV show, but it can tell them what they should be making.” The automation of something as emotional and complex as movie-making/movie-watching seems to have been reduced and perhaps even mocked by Netflix’s algorithm. Yet, the opposite is true. “House of Cards”, and Netflix for that matter, are not successful because the human capacity to enjoy films can be trivialized to an algorithm but because the preferences related to enjoyment can be more accurately communicated via metadata. Thus, the creation of the media in response to these results remains an essential, anthropological product.

Watson: i.kinja-img.com/gawker-media/image/upload/s–Gbchunvr–/18mhcmpj5aul1png.png

As an avid fan of “Jeopardy!”, this article reminded me of IBM’s Watson. A supercomputer put up against “Jeopardy!’s” greatest champions, Watson easily won the contest with its huge data storage and processing capacities, as well as precise “buzzing” within a millisecond to be the first to “question” the answer given. Its speed and accuracy improving with every clue, Watson also had a “learning” algorithm to remember combinations of answers that proved evident throughout the contest. Remarkably obvious, however, was Watson’s inability to “pattern” human thought and speech. Things like puns and jokes in clues went unregistered by Watson, and it was unable to perceive answers to these trickier clues. Thankfully, Watson’s capacities were created with the medical field in mind, even though its stunning calculation abilities are hotly contested and marginalized by professionals in the medical community who worry about the economic and moral ramifications of automating medical practice (“The Robot Will See You Now”). Indeed, Watson is far from capable of understanding human gray areas like fear and morals to present proper diagnoses. However, perhaps medicine has nothing to fear: just as Netflix needs viewers and filmmakers, surely medicine requires both human patients and doctors.

Netflix: “www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/”

Watson: “www.theatlantic.com/magazine/archive/2013/03/the-robot-will-see-you-now/309216/”

Classification, Standards and Aristotle’s 4 Causes- the Case of the Withering Rose

October 20, 2014 samanthaong

Bowker and Star’s discussion on classification and standards in “Sorting Things Out” led me to think about Aristotle’s 4 Causes, which he sees as an effective way of understanding objects in the world. This appears to be a more abstract but flexible way of thinking about objects in context, which would subsequently aid in their classification. The equation is as follows:

Material Cause + Formal Cause + Efficient Cause = Final Cause

This is in contrast to a classification system with categories that are mutually exclusive; the article mentions that “a rose is a rose, not a rose sometimes and a daisy at other times”. Intuitively, this makes sense, as distinct categories enable us to better identify things by reference to their specific properties. However, this seems to me to pose a problem in an age where knowledge is in flux and we look to gather information about an object as it morphs over time. While this may not always be the case for historical and archiving purposes, it is interesting to examine how classification works for objects that are highly susceptible to the progression of time and across worlds.

For instance, a rose bud differs from a blossomed or withered rose in that they are all in varied stages of development. Each stage of development in a rose will have, appended to it, a specific set of properties that are different and distinct. Yet a classification system would not be able to capture this progression and would merely classify each stage under “rose”. This issue is exacerbated when placed in the context of the 4 Causes, as each type of rose would have a different material, formal and efficient cause, leading to a different final cause.

Material Cause: what constitutes the object (A rose consists of a thorny stem and veiny petals made of plant matter)

Formal Cause: the ratio or general form an object takes (A rose is 90% stem and 10% flower)

Efficient Cause: the thing that motivates creation/ change (pollination is the efficient cause of roses growing)

Final Cause: the aim or purpose it serves (A rose is an organ for plant reproduction and an ornamental object)

This sort of ambiguity is one caused by a definitive change in the properties of the object, as opposed to an ambiguity regarding the categories of classification a rose should fall into. This demonstrates that classification at its present stage is capable only of registering static information. While Aristotle’s system seems to account also for standards (as opposed to classification) that are able to withstand the test of time, his system is one that has standards relative to a specific object’s constituents, rather than treating those constituents as things that change over time.

Since a withered rose has a different material, formal and efficient cause from a rosebud or a blossoming rose, Aristotle’s system would still consider the blossoming rose and withered rose to be two different things. If this is the case, how would we be able to prove and track (via either method of classification) that it was the same rose that blossomed and withered? In broader terms, how are our systems of classification working to address the idea that two objects cannot be proven to be one and the same object when their properties and standards have morphed over time?

The funny thing, though, is that Aristotle’s 4 Causes continue to influence our notions of classification and standards now. This makes me curious as to the possibility of developing “real time” classification systems that can grow and track changes in the data of the object it describes.
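To make the idea of a “real time” classification a little more concrete, here is a minimal Python sketch of one possible approach: the object keeps a fixed identifier and category, while its changing properties are stored as dated entries. The class name, fields, and rose data are all hypothetical, and this is only one of many ways such a system could be designed.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedObject:
    """An object with a stable identity and a dated history of its states."""
    object_id: str
    category: str
    history: list = field(default_factory=list)

    def record(self, day, stage, properties):
        """Append an observation of the object on a given day."""
        self.history.append((day, stage, dict(properties)))

    def stage_on(self, day):
        """Return the most recent recorded stage on or before the given day."""
        current = None
        for recorded_day, stage, _ in sorted(self.history, key=lambda e: e[0]):
            if recorded_day <= day:
                current = stage
        return current


# Hypothetical observations of a single rose.
rose = TrackedObject(object_id="rose-001", category="rose")
rose.record(0, "bud", {"petals_open": False})
rose.record(7, "blossom", {"petals_open": True})
rose.record(21, "withered", {"petals_open": True, "color": "brown"})
print(rose.stage_on(10))
```

Because every observation hangs off the same identifier, the bud, the blossom, and the withered flower can be shown to be the same rose even though their properties differ.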

Personalization in the Digital World

October 20, 2014 christinebragg

http://www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/5/

https://www.facebook.com/about/privacy/advertising

The digital world is a unique place, unlike any other subject in society. It is unique because technology can be, and is being, streamed into every subject or interest that we know of. Whether it be exercising, education, social interaction, studying, or examining interests, everything is and will be turning digital. It can sound overwhelming and scary at times to think all the information in the world is floating around in a digital abyss. However, it is also very comforting at times to know that it cannot be burnt in a fire, lost in paperwork, or left at the bottom of the Indian Ocean, ruined and never recorded.

The digital world is becoming so advanced that not only are we no longer losing information that will never be published, we are recording everything we do. A lot of us try to do this ourselves, including myself, where I try to save websites on my desktop. Eventually these websites of shopping, funny videos, and articles I want to read later become a blur of words and Safari icons that I must organize one Sunday afternoon for hours and probably never see again.

Luckily for me and anyone else out there with an ego for wanting to organize interests all on our own, there are more sophisticated software and personnel doing this very same job for me in a more organized and personalized manner. Among these crafty institutions are Netflix and Facebook, two amazing companies that make a huge portion of the population’s lives a whole lot easier.

Generations prior have said that this new Tech Age is making our generation and everyone after it more attention deficit. Our heads are being filled with clutter and an overload of information from the Internet. What those people do not know is that the new age is teaching us new ways to filter that over-capacity of knowledge. For example, the innovators at Netflix created a way to do this by using underlying tagging data to personalize our movie interests. Todd Yellin, the VP of Product and creator of Netflix’s system, has used underlying tagging data to create 76,897 genres to categorize the movies on Netflix.

What interested me the most was how this system personalizes every individual subscriber’s specific interests based on what he/she has watched and how he/she has rated a movie. Netflix goes beyond just machine intelligence in recommending movies this way and uses a hybrid with human intelligence as well, by looking at how much romance, comedy or action is in each movie based on a 1-5 rating scale. They won’t tell you what that is, but will recommend it based on what they know about your preferences.

When I started thinking of personalization in movies I thought of Facebook, which uses this hybrid intelligence for advertisements. Facebook ads have gotten very political because people are afraid their personal information is being exposed or that they will be spammed if they click on an ad with a virus. I understand this fear; however, Facebook assures us in their “Data Use Policy” that they do not release any of our personal information, but work the opposite way by choosing ads from those they have partnered with and recommending them to individuals. They use the information we provide and links we have clicked on throughout Facebook to show more things that interest us. For me, I have found my favorite clothing websites, news articles, and restaurants through their personalized system. Now instead of spending time procrastinating on Facebook, I am led to 100s of other websites that can give me more substance and increase my interests rather than clicking through pictures of myself on the book.

Why the Silly Genres?

October 20, 2014 JulieEdwards

This week my post focuses on the “How Netflix Reverse Engineered Hollywood” article. Thinking about the n-dimensional classification scheme while reading this article was interesting. The classification Netflix uses isn’t n-dimensional because the genre title can only include so many characters. But it’s not even fourth or fifth or however many identifiers can be put in the genre title. From my understanding, each classifying tag is independent from the next. If you wanted to categorize “Night of the Living Dead” it would be Zombie movie from the 1960’s, but neither “zombie movie” nor “from the 1960’s” will be a branch under the other identifier. The Netflix classification scheme is almost an ultra-specific 1-dimensional classification scheme. I looked to the Netflix terms of use to see if their algorithm is mentioned at all. It’s not, but there is a clause that states you may not “engineer or disassemble any software or other products or processes accessible through the Netflix service”, which would include their genre system.

However, their terms do state that they are constantly updating all facets of their service, which would include their genre list. What people are watching, and the growing or dying popularity of a movie or TV show, can affect what genres are being included. This can also be applied in the opposite fashion. In the case of House of Cards, Netflix chose to create a Machiavellian political thriller because that was a popular genre. Another clause states that “The availability of movies & TV shows to watch will change from time to time, and from country to country.” The country of origin or setting genre tag is useful to Netflix because they can see if Danish movies are popular in America, and therefore provide American customers with more Danish movies, or if they are more popular in Japan, and then provide Japanese customers with more Danish movie genres.

It is for this reason that the super specific genre system that Netflix uses not only benefits the viewer but also benefits Netflix itself. If somebody rates romantic comedies 5 stars and coming-of-age movies 4 stars, they will get recommended romantic coming-of-age comedies. From Netflix’s perspective, they gained data that tells them that people who watch romantic comedies also like to watch coming-of-age movies. This data lets Netflix know the best way to group their movies and TV shows, and what type of movies they should spend money to get licensing for.
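The two sides of this benefit can be sketched in a few lines of Python. The rating threshold, genre names, and viewer data below are made up for illustration, and this is a simplified guess at the logic, not Netflix’s actual recommendation algorithm.

```python
from collections import Counter
from itertools import combinations


def liked_genres(ratings, threshold=4):
    """Genres a viewer rated at or above the threshold, in alphabetical order."""
    return sorted(g for g, stars in ratings.items() if stars >= threshold)


def recommend_combined_genres(ratings, threshold=4):
    """Viewer's side: suggest combinations of the genres they rated highly."""
    return [f"{a} + {b}" for a, b in combinations(liked_genres(ratings, threshold), 2)]


def co_liked_pairs(all_ratings, threshold=4):
    """Netflix's side: count how often two genres are liked by the same viewer."""
    counts = Counter()
    for ratings in all_ratings:
        counts.update(combinations(liked_genres(ratings, threshold), 2))
    return counts


# Hypothetical viewers and star ratings.
viewer_a = {"romantic comedy": 5, "coming-of-age": 4, "horror": 2}
viewer_b = {"romantic comedy": 4, "coming-of-age": 5, "documentary": 5}

print(recommend_combined_genres(viewer_a))
pair_counts = co_liked_pairs([viewer_a, viewer_b])
print(pair_counts.most_common(1))
```

The same ratings that produce a recommendation for one viewer also add to a running tally of which genres go together, which is the data that could inform grouping and licensing decisions.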

“Netflix Terms of Use.” Netflix. N.p., n.d. Web. 20 Oct. 2014.