Week 3: Data-produced Original Content

The article “How Netflix Reverse Engineered Hollywood” from this week’s readings discussed the author’s project to dissect Netflix’s genre system. He considered everything from the site’s tagging process to the syntax behind its famously niche genres. Though Alexis C. Madrigal was interested in Netflix’s data collection, his focus was largely on how the data contributed to Netflix’s unique categorization system. At one point in his story, Madrigal visits Netflix’s VP of Product, Todd Yellin, and although “he seems impressed at [Madrigal’s] nerdiness, he patiently explains that we’ve merely skimmed one end-product of the entire Netflix data infrastructure. There is so much more data and a whole lot more intelligence baked into the system than we’ve captured.” Madrigal’s focus was rather specific, fitting, considering he was analyzing genres known for their alarming specificity, but his conversation with Yellin hinted that Netflix is employing data in many innovative ways, including in the production of original content.

Netflix’s foray into original content is interesting because it has flouted many of the conventions of Hollywood filmmaking. For example, the website releases its content all at once instead of airing one episode each week, and its executives have refused to publish ratings because they are irrelevant to Netflix’s system. Such policies have produced many a think-piece about the television industry’s potential for change, many of which focus on how Netflix’s access to user data fuels the company’s original programming decisions. Although plenty of studios lean on statistics and ratings, Netflix has an unusually direct view of not only what people are watching but how. According to Yellin, Netflix knows if a user “plays one title, what did they play after, before, what did they abandon after five minutes?” (The Guardian). I’d be interested to learn if Netflix’s data is more helpful than that available to more traditional content creators, and if other media platforms feel pressured to adopt certain features of Netflix’s data-driven model.
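Yellin’s list of signals (what a user plays before and after a title, and what they abandon after five minutes) can be pictured as simple aggregation over viewing logs. Below is a minimal sketch; the record format and the five-minute threshold are my own assumptions for illustration, not Netflix’s actual schema.

```python
# Hypothetical viewing-log records: (user, title, seconds_watched, runtime_seconds).
# Everything here is invented to illustrate the idea, not Netflix's real data.
events = [
    ("u1", "House of Cards", 180, 3000),   # stopped inside the first 5 minutes
    ("u2", "House of Cards", 2900, 3000),
    ("u3", "House of Cards", 240, 3000),
    ("u1", "Top of the Lake", 2800, 3000),
]

def abandonment_rate(events, title, threshold=300):
    """Fraction of plays of `title` stopped within `threshold` seconds."""
    plays = [e for e in events if e[1] == title]
    if not plays:
        return 0.0
    abandoned = sum(1 for e in plays if e[2] < threshold)
    return abandoned / len(plays)

print(abandonment_rate(events, "House of Cards"))  # 2 of 3 plays were abandoned
```

The same log would also support the before/after questions Yellin mentions, by sorting each user’s events in time and looking at adjacent pairs.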

 

Source: http://www.theguardian.com/media/2014/feb/23/netflix-viewer-data-house-of-cards

Week 2: Meta Data of Life

The Evolutionary Tree of Life

Above is an image of the three largest branches of the Phylogenetic Tree of Life, which is far larger and more detailed than what is shown here. While reading “Classification and its Structures” by C.M. Sperberg-McQueen, I came across this passage: “Classification is, strictly speaking, the assignment of something to a class; more generally, it is the grouping together of objects into classes. A class, in turn, is a collection […] of objects which share some property.” Reading this, I instantly thought of the well-known classification system for all living organisms, the Evolutionary Tree of Life.

In seventh grade biology (or ninth grade, depending on the school you attended), you learn about a man named Charles Darwin, a British scientist from the 1800s who traveled on a five-year expedition aboard the HMS Beagle. After seeing species of animals whose traits differed from those of the other species they resembled, Darwin came to the conclusion that all life had one common ancestor and that, through natural selection, species branched out to form new species suited to their environments.

Why do I bring up Charles Darwin? Classification is a large part of his theory: much of his research went toward classifying living organisms, and by creating these classes, he gave us a map to discovering where we fall in the history of life.

We can also look at zoology, the study of animals, which examines a smaller branch of the Tree of Life. If you go to the Colorado State University Libraries website (a link is provided below), you will find a list of animals in alphabetical order by their common names, with each species shown to the right. Each animal species belongs to a genus, which in turn belongs to a family, and so on up the hierarchy until it becomes a matter of what is considered living and what is not.

If you look at the list, you can browse all the different animals and even re-sort it from alphabetical order of common names to alphabetical order of genus and species. We can take two animals of the same genus, the Mallard (Anas platyrhynchos) and the Hottentot Teal (Anas punctata), and notice that despite their closeness in the Tree of Life, they differ in characteristics like the shape of their heads: Hottentot Teals have very curvy heads while Mallards have bulb-like heads. Overall, classifying animals, and living organisms in general, makes it much easier for us to identify which species we are dealing with, and classification has the same effect on anything you want to organize, whether it be movies, food, or any other topic.
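The genus-to-species grouping described above can be mimicked in a few lines of Python. The species list below is just the pair of ducks from this post plus one invented contrast, not the CSU Libraries data.

```python
from collections import defaultdict

# Group species by the genus in their binomial name: species sharing a genus
# sit on neighboring branches of the tree. (Binomial names as given in the post;
# the House Sparrow is an extra example added for contrast.)
species = [
    ("Mallard", "Anas platyrhynchos"),
    ("Hottentot Teal", "Anas punctata"),
    ("House Sparrow", "Passer domesticus"),
]

by_genus = defaultdict(list)
for common_name, binomial in species:
    genus = binomial.split()[0]  # the first word of a binomial is the genus
    by_genus[genus].append(common_name)

print(dict(by_genus))
# {'Anas': ['Mallard', 'Hottentot Teal'], 'Passer': ['House Sparrow']}
```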

Works Cited:

Week 3: Netflix

“How Netflix Reverse Engineered Hollywood” by Alexis C. Madrigal really focused on how Netflix came up with such a large number of genres for users to pick from. As an avid Netflix user, I have witnessed these suggestions myself: right after watching The Walking Dead, I was suggested other zombie-like titles. And I have experienced this with numerous websites and apps!

I’m also an avid Instagram user and have been using it for the past 2+ years. Recently I have noticed that right after I click the “follow” button on a new user I like, a little drop-down section appears right under it with 3 “suggested users.” I found this extremely strange, and I wonder what criteria make these users of potential interest to me. If I follow a friend from high school, say, then a friend of that user might pop up in the “suggested user” space. This makes sense to me, as maybe they’ve tagged each other in pictures. That friend may have only 100 followers, or may follow only 100 people themselves. The pool from which to choose suggested users is much smaller.

But sometimes I follow users that have 80,000+ followers (maybe because they post pleasant nature pictures!) and who themselves follow maybe 500 people. The pool from which to choose suggested users is much greater. Where do these 3 users come from?

One possibility I can think of is similar hashtags. Maybe the suggested users use many of the same hashtags? But this seems unlikely, because if one person inputs #nature, there are probably thousands of entries for that hashtag. Another possibility is similarity between users. If there are multiple people who are into nature pictures, they will follow many users that post about nature. But without hashtags, how does Instagram know what the content of a picture is? Maybe it relies on a relevant caption or the geotag. It gets messy really fast. It is incredibly difficult for me to decipher, but like Netflix, Instagram must have an algorithm for everything.
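To make the guesswork concrete, here is a toy version of the heuristic I’m speculating about: score the new user’s follows by mutual follows and shared hashtags. Every account name and weight below is made up; Instagram’s real ranking is not public.

```python
# Toy social graph and hashtag usage. All data and weights are invented.
follows = {
    "me": {"alice"},
    "alice": {"bob", "carol", "dan"},
    "bob": {"alice", "carol"},
    "carol": {"alice"},
    "dan": {"eve"},
}
hashtags = {
    "me": {"nature", "hiking"},
    "bob": {"nature", "food"},
    "carol": {"fashion"},
    "dan": {"nature", "hiking"},
}

def suggest(user, just_followed, k=3):
    """Rank accounts the just-followed user follows, by overlap with `user`."""
    candidates = follows.get(just_followed, set()) - follows[user] - {user}
    def score(c):
        mutual = len(follows.get(c, set()) & follows[user])       # shared follows
        shared = len(hashtags.get(c, set()) & hashtags.get(user, set()))  # shared tags
        return 2 * mutual + shared  # the weighting is arbitrary
    return sorted(candidates, key=score, reverse=True)[:k]

print(suggest("me", "alice"))
```

In this toy graph, “bob” ranks first because he both follows someone I follow and uses one of my hashtags; the real system presumably blends many more signals.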

Another feature Instagram has is “explore.” I have gone to this feature multiple times, and each time it shows something different. Perhaps the first time it was full of nature pictures, but the second time it was full of make-up pictures. It is obviously trying to cater to my interests, but how does it know them? I soon realized it depends on my recently viewed photos.

Maybe someday someone will try to figure out Instagram’s algorithm, although I am sure it is not as crazy as Netflix’s!

For reference:

http://www.instagram.com

http://www.technobuffalo.com/2014/07/11/instagram-suggested-user-feature-quietly-rolls-out/

The Napoleon Dynamite Problem

http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html?pagewanted=all&_r=0

http://genresofnetflix.tumblr.com

http://www.netflixprize.com

As just one of the millions of Netflix subscribers and a self-diagnosed binger, I have definitely spent many long nights getting familiar with the altgenre system implemented in the streaming site. I’ve been avidly using Netflix since 2011, but I only really started taking notice of some of its extremely specific genres this year. With thousands of titles to sort through on Netflix, its personalized genres are definitely useful, maybe a bit absurd, but still useful. My personal favorites are “hidden gems” and “visually-striking movies,” categories where I can usually find many independent and quirky films that are difficult to describe.

It is apparent that there is a growing trend of implementing these personalizing algorithms into more and more media sources, including the likes of Spotify, Amazon, and SoundCloud. What I find most intriguing, and even slightly disturbing, about Netflix’s system is the crossover between human and machine intelligence. It has come to the point where you can probably learn a lot about a person’s interests simply by looking through their Netflix account. To achieve this, Netflix engineers had a strong hand in creating micro tags for these films based on the Netflix Quantum Theory, which makes me question how objective the system really is. An ideology similar to the “I know it when I see it” expression seems to be crossing into Netflix’s system.

Netflix has evolved past its earlier system, which was based more heavily on numerical values and user ratings, toward a more human method of introspection. Todd Yellin, VP of product innovation at Netflix, had this to say about the new approach:

“Predicting something is 3.2 stars is kind of fun if you have an engineering sensibility, but it would be more useful to talk about dysfunctional families and viral plagues. We wanted to put in more language.”

I think it was a very progressive approach for Netflix, and one that reveals some very interesting quirks about the relationship between categorizing systems and human nature. The Atlantic’s article on Netflix’s genre algorithm mentions the $1 million prize that the company offered back in 2006, which reminded me of the Napoleon Dynamite problem. Napoleon Dynamite seems to be the most difficult movie to pinpoint and recommend to Netflix users. The quirky film remains the most stubbornly unpredictable title: it attracts many users to rate it while still being hard to predict. This imbalance, while probably a headache for Netflix developers and engineers, is to me a very humorous quirk in the system that shows how difficult it is to categorize human interests and behaviors.

Bliss: Crafting a Successful Symbology

Stained glass art by Shirley McNaughton, called “Communication.” It’s composed of 10 Bliss symbols.

What is a sign? According to the American philosopher Charles Sanders Peirce, a sign is “something, which stands to somebody for something in some respect or capacity.” At least, that’s one of the definitions Professor Erkki Huhtamo of the Design/Media Arts program offered us in lecture last week. We broke it down further by dividing signs into two parts: the “signifier,” or the visible/interpretable form the sign takes, and the “signified,” or the idea the sign expresses. Essentially, signs are defined by humans. Nothing is a sign unless it can be interpreted through a shared culture or ontology.

This past summer, I found myself listening to a wonderful episode of the RadioLab podcast, entitled “Bliss.” One subsection of this episode focused on the story of a man named Charles Bliss, who created a system of signs called “Bliss Symbolics.” Like many of his time, Mr. Bliss was disillusioned with the dystopian chaos of the post-WWII world, and believed he could heal the miscommunication and destruction he saw around him through a universal system of iconic signs, which all humans would be able to use to understand and communicate with one another, regardless of language.

Unfortunately for Bliss, his system of symbols never took root in the global manner he had envisioned. However, in 1971 a nurse named Shirley McNaughton began using Bliss Symbolics to help children with cerebral palsy develop language skills. Eventually, these children were able to speak basic English after first developing written skills in Bliss Symbolics to communicate with their instructors. Over time, Bliss’ signs were adapted to meet the specific needs of children, and they eventually traveled around the world. In each place, the symbols would inevitably be tweaked to fit the rules and linguistic ontologies specific to that culture. Bliss Symbolics in Israel were written from right to left, because Hebrew is written from right to left. Bliss’ hope of a universal and unchanging semiotic language was a complete failure.

In many regards, the problem of “mismatched ontologies” presented by Wallack and Srinivasan links directly to the discussion of how the creation and reading of Bliss’ signs played into human culture, education, and communication. The localization and specification of Bliss’ signs to small groups of children around the world reminded me of the problems states face in developing broad ontologies that attempt to force large groups of diverse people together in a binary census. The signs appropriated from Bliss’ semiotics proved successful in teaching children to speak because they were modified to fit local customs and cultures. In their article, Wallack and Srinivasan point to this exact issue and explain how “any object, attribute, category or relation included within a local ontology could be included in a meta-ontology…there is no reason [Governments] could not also incorporate folkloric relations that guide community perceptions.” Inevitably, local communities know best what is required to successfully educate their children. If governments gave their citizens the ability to define themselves on a local level, as the many different groups using Charles Bliss’ symbols did, I think information loss and infrastructural dysfunction could be significantly diminished across the globe.

Check out the podcast here: http://www.radiolab.org/story/257194-man-became-bliss/

Week 3: Course Evaluation Forms and Mismatched Ontologies


This is a screencap of the webpage for the Evaluation of Instruction Program’s (EIP) course evaluations. Toward the end of each quarter, students at UCLA receive emails to complete and submit these forms, but instructors give the impression that few students actually fill them out. They always stress the importance of this feedback mechanism to improve the quality of instruction and to better serve their students.

This source relates to the discussion in “Local-Global: Reconciling Mismatched Ontologies in Development Information Systems” by Jessica Seddon Wallack and Ramesh Srinivasan. It illustrates how the school collects data on the academic activities of community members, in this case faculty and students, in order to make better decisions about policy. However, this evaluation form also illustrates the phenomenon of mismatched ontologies between students and administration. Whereas common academic problems for students may involve thoughts like “I don’t really get what this assignment is asking me to do and how I’m supposed to do it,” “My reading comprehension and note-taking skills need work,” “I don’t know how I should be preparing for the exam,” or “It takes a really long time to do all of the reading and writing assignments,” this form was not designed to address such issues, despite its role in improving the quality of education. As far as I know, there is currently no mechanism directed at collecting data on improving academic services, and unsurprisingly there is no Academic Skills Center that provides formal training in effective learning skills. This evaluation form demonstrates how the administration’s meta-ontology influences how it attempts to address community problems but has difficulty taking local knowledge into account, and it therefore incurs an information loss.

Wallack and Srinivasan make several recommendations for how meta-ontologies can incorporate local knowledge. The first is to develop collaborative and inclusive ontologies. This online form does not provide for student input on which questions are asked and which are the most important, though the technological capability does exist. The second recommendation is to allow the community to provide feedback on the data the administration has collected, and to help them make good critiques of that data through education and appropriate communication strategies. Currently, the contents of these evaluations are confidential, and students remain unaware of how other students felt about the course (outside of the usual gossip), or, more importantly, of how the administration understands the data. Finally, the third recommendation is to provide alternative means of communication and to decentralize decision-making to more local levels. The final comments section on the form allows for some flexibility regarding the former, though the response may not be relevant to the question, while the latter issue is beyond the scope of this form entirely.

Decoding Netflix compared to Plateau People’s Web Portal

Source: http://www.slashgear.com/wp-content/uploads/2009/06/apple_macbook_pro_13-inch_teardown_1.jpg

This is an image of all the bits and pieces that go into a Macbook computer. It reminds me of the difference between the Plateau People’s Portal and Madrigal’s Netflix exploration. Just as the computer must be built before it can be broken down, a website must be put together before the semantics that make up that website can be revealed.

The difference between creating content and cracking the method by which content is created is the difference between building up and breaking down. Thousands of memories and years of history go into creating the background of the Plateau People, just as thousands of directors and actors put together the movies that make up Netflix’s endless categorization. All those moments and all those movies were uploaded meticulously into a database, then made public for browsing and viewing. Breaking down that content, however, takes just a few people and some really good data-recovery software. This can be seen when comparing Madrigal’s “How Netflix Reverse Engineered Hollywood” and Washington State University’s “Plateau People’s Web Portal.”

Clicking to the “About” page on the Plateau People’s Web Portal reveals that “tribal administrators, working with their tribal governments, have provided information and their own additional materials to the portal as a means of expanding and extending the archival record.” Memories, artifacts, dates and events were used to create a comprehensive history of the Plateau people. The curators pulled out the most potent pieces of information, deciding what must be shown versus what can be thrown away. All this human effort prevents the website from showing random outliers.

To crack Netflix’s “alt-genre” movie categorization algorithm, Madrigal used a plethora of software and equations. He states the programs took over 20 hours to grab all of Netflix’s possible URLs and patterns, a feat that would have taken years to accomplish without them. Although interpreting and finding patterns in all the data could only have been done by a human, he relied heavily on technology to get the information he decoded.
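The brute-force pattern described here, a script stepping through candidate genre IDs and recording the ones that resolve to a real genre page, can be sketched as follows. The URL template and the stand-in fetch function are placeholders of mine, not Netflix’s actual endpoints.

```python
import time

# Placeholder URL template; the real crawl targeted Netflix's genre pages.
URL_TEMPLATE = "http://example.com/WiAltGenre?agid={}"

def fetch_genre_name(genre_id):
    # Stand-in for an HTTP request plus HTML scrape: returns the genre's
    # name, or None if the ID maps to nothing. The catalog here is fake.
    fake_catalog = {1: "Critically-acclaimed Movies", 3: "Cult Horror Movies"}
    return fake_catalog.get(genre_id)

def scrape_genres(max_id, delay=0.0):
    """Try every ID from 1 to max_id and keep the ones that resolve."""
    found = {}
    for gid in range(1, max_id + 1):
        name = fetch_genre_name(gid)
        if name is not None:
            found[gid] = name
        time.sleep(delay)  # a real crawl should throttle its requests
    return found

print(scrape_genres(5))
# {1: 'Critically-acclaimed Movies', 3: 'Cult Horror Movies'}
```

The 20-hour runtime Madrigal reports makes sense under this scheme: tens of thousands of IDs, each needing its own polite, rate-limited request.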

At the end of the article, Madrigal reveals the Perry Mason effect: there is an outstanding number of categories devoted to a person most Americans today cannot name, making it clear that the algorithm cannot decide which information is unimportant or an outlier.

Altogether, this shows that although equations and technology are both essential in cataloguing the information we use today, there is no substitute for human effort.

Week 3 – Netflix / 8tracks

When I saw the reading list for this week, I was immediately drawn to the article about Netflix, “How Netflix Reverse Engineered Hollywood” by Alexis C. Madrigal. I consider myself an avid Netflix binge-watcher, so I was intrigued to see what this blogger had to say. The other day my friend and I were talking about how there’s so much on Netflix, yet sometimes I still don’t know what to watch. The different categories can be overwhelming, to say the least. Madrigal’s article furthered this point and opened my eyes to the absurd number of movie categories Netflix has. This got me wondering if the genres that don’t even have any movies in them serve a purpose at all. Are they supposed to let Netflix know whether it should add more to the “Feel-good Romantic Spanish-Language TV Shows” genre because people were searching for it?

The way Netflix categorizes movies and TV shows reminded me of the “explore” feature on http://8tracks.com. 8tracks is similar to Pandora in that it’s a free Internet radio service, but the difference is that you look up playlists compiled by other users rather than stations seeded by an artist or song. When searching for a playlist on 8tracks, you can go to the “explore” tab and, from there, search through the 1,706,776 available playlists using preset tags or by searching for something specific.


8tracks’ system is different because you can add multiple tags. And the tags aren’t just based on genre; you can also find the right playlist by typing in “any mood, genre or activity.” For example, when I’m looking for playlists to study to, I normally start with the tag “indie” and then, depending on my mood, I usually add “chill” or sometimes “folk.”


When I’m working out, I start with “running” and then normally choose either “hip hop” or “pump up.” Even if you search for the same tags every time, you’ll find new playlists because they are constantly being added by other users. I think it would be helpful if Netflix had a similar system for searching through its database. Netflix has so many specific tags, but I always find it somewhat difficult to find something that fits what I want to watch exactly, unless I know I’m looking for something specific like “Grey’s Anatomy” or “Gossip Girl.”
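The tag-stacking search works like set intersection: each added tag narrows the results to playlists carrying all of the chosen tags. A small sketch with invented playlists:

```python
# Toy playlist catalog mapping names to tag sets. All entries are made up.
playlists = {
    "Rainy Day Study": {"indie", "chill", "study"},
    "Trail Mix": {"folk", "indie"},
    "Leg Day": {"hip hop", "pump up", "running"},
    "Morning Jog": {"running", "pump up"},
}

def explore(*tags):
    """Return playlists whose tag set contains every requested tag."""
    wanted = set(tags)
    return sorted(name for name, t in playlists.items() if wanted <= t)

print(explore("indie"))              # ['Rainy Day Study', 'Trail Mix']
print(explore("indie", "chill"))     # ['Rainy Day Study']
print(explore("running", "pump up")) # ['Leg Day', 'Morning Jog']
```

Netflix’s altgenres behave more like pre-composed phrases than freely stackable tags, which may be why its search feels less flexible than this.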

 

W3 – Data-mining, Classification, and Research

How a Math Genius Hacked OkCupid to Find True Love

This week’s readings reminded me of an interesting article about Chris McKinlay, a UCLA grad student who “hacked OkCupid to find the girl of his dreams.” Some friends shared it on Facebook months ago; apparently he was a TA in one of their lower-div math classes. It was interesting to read about his process, and the visualizations included in the article were striking as well.

 

Chris McKinlay used Python scripts to riffle through hundreds of OkCupid survey questions. He then sorted female daters into seven clusters, like “Diverse” and “Mindful,” each with distinct characteristics.

 

His mathematical approach to online dating reminds me of how Alexis C. Madrigal reverse engineered Netflix’s vocabulary and grammar in “How Netflix Reverse Engineered Hollywood.” Both McKinlay and Madrigal started their projects with data-mining scripts. Once they had a sizable data set, they looked for patterns and then ran tests to confirm or disprove their hypotheses. Before they could do this, however, they needed a classification system. In McKinlay’s case, this meant “seven statistically distinct clusters based on…[women’s] questions and answers.” Once the women were grouped into clusters such as “God,” “Tattoo,” and “Samantha” (the nomenclature was nonstandard), each with distinct characteristics, McKinlay could target women from a specific cluster with a profile tailored to their interests. For Madrigal, classification meant organizing genre descriptors into categories such as “Region,” “About…,” and “Based on….” A Netflix genre was a subset of these components that followed specific grammar rules.
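The clustering step can be illustrated with a bare-bones k-means: each dater’s survey answers become a numeric vector, and vectors are grouped into k clusters. The toy data, the first-k initialization, and k=2 are all my simplifications; the real method and data were far more elaborate.

```python
def kmeans(points, k, iters=10):
    """Minimal k-means: assign each point to its nearest center, then
    recompute each center as the mean of its cluster, and repeat."""
    centers = list(points[:k])  # simplistic initialization: the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old center if a cluster empties out
                centers[i] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centers, clusters

# Each vector: hypothetical answers to three survey questions, scaled 0-1.
answers = [(0.9, 0.1, 0.8), (0.8, 0.2, 0.9), (0.1, 0.9, 0.2), (0.2, 0.8, 0.1)]
centers, clusters = kmeans(answers, k=2)
print([len(c) for c in clusters])  # [2, 2]: two like-minded pairs
```

With hundreds of questions per dater, the vectors simply get longer; the assign-and-recompute loop stays the same.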

 

McKinlay’s and Madrigal’s situations were unusual in that both were hacking an established data set. Their data was pre-tagged, which made the process of classifying and pattern-hunting much easier. In Madrigal’s case, Netflix’s movie taggers broke down movie content into “quanta” or “microtags” that could be fed into computer algorithms. The 76,897 altgenres scraped by Madrigal’s script were the product of these algorithms. In this way, Madrigal was working with “metametadata,” or data about data about data. In contrast, the authors of the Plateau Peoples’ Web Portal had to build their dataset from the ground up. According to the “About” page of the site, they were faced with the daunting task of curating a diverse collection of Native peoples’ cultural materials with varying metadata. A classification standard was set that would allow for both consistency and flexibility throughout the collection: “There are nine main categories (users can use the browse section of the portal to view these) within the portal. Each tribe can then add their own subcategories refining the typology further to allow for greater precision and flexibility in searching.”

 

Although McKinlay’s and Madrigal’s classification processes may not have been as extensive as that of the Plateau Peoples’ Web Portal’s authors, their approach to metametadata was fascinating. I enjoyed reading about reverse-engineering large, cryptic datasets and using them in new ways.

Week Three: Classification, Continued; Research Techniques


Alexis C. Madrigal’s article “How Netflix Reverse Engineered Hollywood” was really fascinating to read. As an avid Netflix user, I used to take these genre titles at face value. I recognized that my watching patterns were probably noted by the Netflix system, which therefore suggested similar titles. I was shocked to find out about the back end of this categorization system. Not only does Madrigal’s unique research technique illustrate the complexity of rationalizing such a gigantic database, it also suggests the ideological effects of various systems of classification.

In Sorting Things Out, authors Geoffrey C. Bowker and Susan Leigh Star define classification as “a set of boxes (metaphorical or literal) into which things can be put to then do some kind of work – bureaucratic or knowledge production.” They identify three key characteristics of an ideal classification system: “There are unique classificatory principles in operation…These categories are mutually exclusive…The system is complete” (10-11). However, Bowker and Star continue their argument to say that “no real-world working classification system that we have looked at meets these ‘simple’ requirements and we doubt that any ever could” (11). The Netflix genre generator does indeed have literal “boxes,” which are checked on a rating system. Its classification system does produce knowledge for the company, informing it of its consumers’ likes and dislikes, an obvious advantage in gaining and retaining viewers.

Madrigal explains Netflix’s tagging system in layman’s terms: “Using large teams of people specially trained to watch movies, Netflix deconstructed Hollywood. They paid people to watch films and tag them with all kinds of metadata. This process is so sophisticated and precise that taggers receive a 36-page training document that teaches them how to rate movies on their sexually suggestive content, goriness, romance levels, and even narrative elements like plot conclusiveness…they even rate the moral status of characters” (Madrigal). While there is human input to this system, the Netflix genre generator acts as an unprecedented catalyst between man and machine. Madrigal observes, “There’s something in the Netflix personalized genres that I think we can tell is not fully human, but is revealing in a way that humans alone might not be.” In this way, Netflix is “a tool for introspection.” Its unique categorization system sheds light on humans’ reliance on machines to tell us even what we like. Can a machine capture the innate, complex human tendency to feel emotionally drawn to something?
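Madrigal describes a generator that assembles altgenre names from those tags in a fixed grammatical order. A toy reconstruction of that idea follows; the slot order and the tiny vocabularies are my own guesses, sampled from genre names quoted in these posts.

```python
import itertools

# Small sample vocabularies for four of the grammar's slots. The real system
# has many more slots and far larger vocabularies.
regions = ["Spanish-Language", "British"]
adjectives = ["Critically-acclaimed", "Visually-striking"]
genres = ["Dramas", "Documentaries"]
topics = ["About Royalty", "Based on Real Life"]

def generate_altgenres():
    """Yield every genre name the grammar can produce: one pick per slot."""
    for region, adj, genre, topic in itertools.product(regions, adjectives, genres, topics):
        yield f"{region} {adj} {genre} {topic}"

names = list(generate_altgenres())
print(len(names))   # 16: two options in each of four slots
print(names[0])
```

Combinatorics explains the scale Madrigal found: even modest vocabularies per slot multiply out to tens of thousands of possible genre names.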

A similar project that came to mind (and was actually mentioned in the article) is Pandora’s Music Genome Project. Much like Netflix, Pandora analyzed millions of songs “using up to 450 distinct musical characteristics by a trained musical analyst. These attributes capture not only the musical identity of a song, but also the many significant qualities that are relevant to understanding the musical preferences of listeners” (Pandora.com). Before reading anything about the Music Genome Project specifically, I suspected that categorizing music would be much harder than categorizing movies. Relatively speaking, movies tend to follow trends, while music has a longer history and many, many iterations. While the project tries to do something similar to Netflix’s personalized genres, it is much more ambitious of Pandora to distill this medium. For example, critics of the Music Genome Project pointed out the social aspect of music: “Music is traditionally a more collective experience…that aspect shows itself very powerfully in the way we consume music in society. We want what other people are having” (Wilkinson). Although Pandora is invested in advertising to its listeners in a way similar to Netflix, the medium of music definitely has its limitations.