Course blog

Netflix Recommendations – Netflix + Scandinavian Folklore

Everyone who watches Netflix knows how easy it is to be physically unable to stop watching Netflix. This is partly because of the solid recommendations it provides, but also due to how awesome it is streaming movie after movie via Xbox on a Saturday night accompanied by Lay’s potato chips and Diet Coke. Personally, I also noticed the strange genres that Netflix would come up with to classify the movie I just watched and a potentially compatible movie that is one click away. Users like myself really do take for granted all the work and effort put into creating the metadata that classifies all the movies, simply so they can watch one just like it in a matter of seconds. I also want to know where I can get a job that requires you to watch movies all day.

This recommendation feature on Netflix is very convenient for users, which made me wonder how it could be applied to other media or resources. I was reminded of the program used in Scandinavian C171, a class I am taking about Scandinavian folk narrative. The professor spent many years writing a book (Danish Folktales, Legends, and Other Stories) that includes a CD giving access to a digital database of thousands of mostly Danish folktales. This program uses metadata to classify the stories, and each story has a call number (e.g. DS_VII_505) that resembles those used in libraries. Metadata is also used for recommending other tales, much like how Netflix recommends, except without the goofy genre titles.

Screenshot 2014-10-20 01.22.18 Screenshot 2014-10-20 01.22.28

As seen in the screenshots of the program above, the stories’ pages provide a vast amount of information. Not only do the pages provide the original manuscript transcription and translation, a map showing the story’s origin, and the date it was told, but they also include sections dedicated to associated keywords (blue), story indices (green), and recommended stories (red). These recommended stories, much like Netflix recommendations, let the user continue reading without stopping, which is completely possible because the recommended stories have their own recommended stories, which have their own recommended stories, and so on. Since this use of metadata for recommendation, as seen on Netflix, can also be applied to Scandinavian folktales, there seems to be no limit to how other media can be grouped together and recommended.

Example keywords: mound dweller, troll, ghosts, mares, coins, bottle, toad
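To make the keyword idea concrete, here is a minimal sketch of how shared keywords could drive recommendations. It is only a guess at the general technique, not how the folktale program actually works: every call number other than DS_VII_505 is made up, and the keyword sets assigned to each story are invented for illustration.

```python
# Hypothetical catalog: the keyword sets (and every call number except
# DS_VII_505) are invented here and are not taken from the real database.
STORIES = {
    "DS_VII_505": {"mound dweller", "troll", "coins"},
    "EX_001": {"troll", "coins", "bottle"},
    "EX_002": {"ghosts", "mares"},
    "EX_003": {"toad", "bottle", "coins"},
}


def jaccard(a, b):
    """Share of keywords two stories have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(call_number, catalog, top_n=2):
    """Rank the other stories by keyword overlap with the given story."""
    target = catalog[call_number]
    scored = [
        (jaccard(target, keywords), other)
        for other, keywords in catalog.items()
        if other != call_number
    ]
    # Highest overlap first; ties are broken alphabetically by call number.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Stories with no keywords in common are never recommended.
    return [other for score, other in scored[:top_n] if score > 0]


print(recommend("DS_VII_505", STORIES))  # ['EX_001', 'EX_003']
```

Because every story gets its own ranked list, following one recommendation always leads to more recommendations, which is the chain of "recommended stories of recommended stories" described above.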

UCLA Dininghall metadata #yum

I absolutely loved the article about Netflix and felt like sharing it with everyone I know who watches Netflix. I do stand-up comedy and the joke photos were so funny; it’s such good material to make jokes about. But honestly, the work done by Alexis Madrigal, with the help of computer programs like AntConc, was incredible. Regarding Netflix, I sometimes didn’t like how specific the altgenres get. For example, say you babysit a little kid and watch a show with them: for the next month you get suggestions for little-kid TV shows, which I always thought was kind of dumb. Sometimes I want to see things that are completely new to me. But I do understand their approach and I think it has been very successful for the most part. It especially creates the conditions for binge watching, which is kind of an American epidemic.
Anyway, I was going to talk about how cool Pandora’s Music Genome Project is, but reading this article on Howstuffworks.com made me realize that I myself have been part of a metadata analysis group right here at UCLA.

I would still like to compare what I did to the Pandora project, and below is an excerpt from HowStuffWorks:

“Pandora relies on a Music Genome that consists of 400 musical attributes covering the qualities of melody, harmony, rhythm, form, composition and lyrics. It’s a project that began in January 2000 and took 30 experts in music theory five years to complete. The Genome is based on an intricate analysis by actual humans (about 20 to 30 minutes per four-minute song) of the music of 10,000 artists from the past 100 years. The analysis of new music continues every day since Pandora’s online launch in August 2005. As of May 2006, the Genome’s music library contains 400,000 analyzed songs from 20,000 contemporary artists.”

When I was living in the dorms my senior year, I became a part of the Distinguished Palate Committee, which pretty much meant I got to eat food for free at the dining halls and then rate the dishes and the overall atmosphere of the dining halls. At the dining hall Feast, they specifically targeted students of Asian descent so they could make sure each dish retained its authenticity, and the students were considered experts in their field, kind of like the experts in music theory mentioned in the quote above. They also had computers at the front of each restaurant where students got to rate dishes based on temperature, presentation, taste, etc. And because I worked as a taste tester, I got to learn how long it took them to develop the Bruin Plate menu: it actually took years because of the balance they had to strike between being healthy and being delicious.
I think the main thing to take away from all these articles and experiences is that it really takes a lot of work and data to make things such as a song selection or a plate of food look simple and easy.
http://computer.howstuffworks.com/internet/basics/pandora.htm

 

Week 3: DDC to Netflix

DDS

 

As we take a more in-depth look at methods of classification, I reminisced about a visit to my elementary school library, where I was first introduced to the Dewey Decimal System. When I was at the tender age of 11 or 12, the librarian sat us down in front of her and explained how the library used this relatively simple system to categorize its awe-inspiring collection of books. First established back in 1876, then revised and expanded through over 20 major editions, the Dewey Decimal Classification (known as DDC, link) is a system of numbering books based on content. Information is divided into ten broad areas, and from there these groups are broken up into smaller and more specific topics. Topics are given call numbers, which you can look up to see what books the library has on a topic. For example, “Tigers” are given the call number 599.756.
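The way a call number narrows from a broad area down to a specific topic can be sketched in a few lines of code. This is only an illustration under my own assumptions: the function name is made up, and the labels are my abbreviated paraphrases rather than official DDC captions.

```python
# Abbreviated, unofficial labels (assumed for illustration only).
LABELS = {
    "500": "Science",
    "590": "Animals",
    "599": "Mammals",
    "599.7": "Carnivores",
    "599.75": "Cats",
    "599.756": "Tigers",
}


def broader_classes(call_number):
    """List the classes containing a call number, broadest first."""
    whole, _, decimals = call_number.partition(".")
    # The three whole-number digits give the broad area and its two subdivisions.
    levels = [whole[0] + "00", whole[:2] + "0", whole]
    # Each digit after the decimal point narrows the topic one more step.
    for i in range(1, len(decimals) + 1):
        levels.append(whole + "." + decimals[:i])
    # Drop repeats, such as "500" appearing three times for the number "500".
    unique = []
    for level in levels:
        if level not in unique:
            unique.append(level)
    return unique


for number in broader_classes("599.756"):
    print(number, LABELS.get(number, "(label not listed)"))
```

Reading the printed list from top to bottom retraces the path from one of the ten broad areas down to the tiger shelf, one digit at a time.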

 

I enjoyed all of this week’s articles, but “How Netflix Reverse Engineered Hollywood” definitely stood out for me from the selection. Paired with my nostalgia involving my elementary school library, I couldn’t help but think of how far classification has progressed. The article featured how Netflix creates obscure, but helpfully user-specific genres for its subscribers. The site uses a “real combination: machine-learned, algorithms, algorithmic syntax” (link). The hybrid human and machine intelligence implemented by this system shows the development of classification as the world gravitates toward a digital focus. Netflix partially abandoned a system that depended solely on numerical values, like ratings, broadening their scope to involve a bit of human introspection.

 

While topics in the DDC are very broad, like “500 Math and Science” or “800 Literature,” this article highlighted the outside-the-box methods used by Netflix, such as “quanta” and “microtags,” to classify its film collection and personally tailor recommendations for its users. Other user-friendly digital media sites have come to prominence in recent years, especially in the music industry. For example, Pandora’s Music Genome Project has attempted a similar formula to achieve what Netflix has, but it hasn’t yet reached the success of its movie-streaming counterpart. 8tracks also comes to mind with its widespread selection of ‘tags,’ where you can find a playlist tailored especially for a certain activity, such as “classical + studying” or “electronic + gym.” It’ll be interesting to see who branches out next and tries to add their own personal spin to classification.

The Tiers of Categorization

Food Chain

All animals and living organisms are classified within a specific tier of the food chain. These classifications have been established and molded for centuries, and help to define the general flow of survival. In this week’s reading titled Sorting Things Out, the two concepts of classification and standards are broken down into concrete definitions. In my opinion, standards serve as the foundation that allows various forms of classification to occur. Without a specific set of standards or guidelines to associate with animals or objects, classification is essentially meaningless. The article states, “a standard spans more than one community of practice… it has temporal reach as well in that it persists over time”. The scope of the standards linked to the food chain has been transformed over the years and altered to allow new classifications to be possible as groundbreaking discoveries continue to be made regarding new species. All communities have accepted the set of standards that are tied to the multitude of unique food chains.

 

The food chain has become an accepted way of ranking superiority in our world. While the sun is often seen as the main cog that turns this wheel of life, different forms of food chains can be broken down and applied to more focused groups. This version of classification within certain pre-conceived categories helps to further specify and define the different levels of consumers and producers in our ecosystems. Without the ability to use intricate categorization, ensuring all aspects of all species involved within a food chain are hashed out, it is hard to tell where the public’s level of general knowledge about other species and organisms would be. It is true that different communities may view the categorization of some species in separate ways, but as Sorting Things Out mentions, in practicing classification and the implementation of standards, objects must be “able to both travel across borders and maintain some sort of constant identity”. The overarching layout of the general food chain and its subcategories has become embedded in today’s society. The standards that have been developed over the years will continue to change with unpredictable discoveries and worldview changes. Certain classifications may seem to be set in stone and unarguable, but there is always potential that the standards could be slightly altered with time. Categorization under the boundaries set in place by certain standards is absolutely necessary to compartmentalize society and analyze its specificities, but the way people think and process information will never stop changing and will always have a direct effect on the categorization process.

 

Sources:

Selections from Bowker and Star, Sorting Things Out (Cambridge, MA: MIT, 1999).

 

http://education-portal.com/cimages/multimages/16/Trophiclevels.jpg

 

“Network thinking in ecology and evolution”. http://eeb19.biosci.arizona.edu/Faculty/Dornhaus/courses/materials/papers/Proulx%20Promislow%20Phillips%20networks%20ecol%20evol.pdf

Week 3: Netflix and Metadata

The article about how Netflix reverse engineered Hollywood was extremely interesting because not only was it relevant to what we have been learning about in class, but also to my everyday life. I love watching Netflix and most of my friends do as well, but most of my friends do not have the insight into Netflix categorization and metadata that I do. I have noticed the extended categories on Netflix before, but didn’t make the connection that this was metadata until reading this article. It makes sense that they want to categorize the movies as specifically as possible.

One of the first things the article mentioned that we talked about in class was the use of controlled vocabularies. In order to correctly categorize movies, Netflix had to pick certain words and phrases to use and the order in which these words should be put. The article spends a lot of time dissecting the controlled vocabulary of the lengthy category descriptions and figuring out how all 76,897 categories were formed. It turned out that this metadata was not all of the metadata used for categorizing the movies; it did not even come close to scratching the surface. When the author met with the man who made the actual categorizations, it became apparent that the metadata that went into making the classifications was much more complicated than some controlled vocabulary tags. This metadata was made up of categories that rated each movie on its main characters, romance, likeability, and main actors. All of this metadata feeds into what is chosen for the public metadata categories.
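A controlled vocabulary with a fixed word order can be sketched roughly as follows. The slot names, their order, and the approved terms are all invented for illustration and are not Netflix’s actual vocabulary or grammar; the point is only that a category name is assembled from approved terms placed in a set order.

```python
# Assumed slot order and approved terms, invented for illustration only.
SLOT_ORDER = ["adjective", "region", "genre", "subject"]
VOCABULARY = {
    "adjective": {"Quirky", "Dark", "Feel-good"},
    "region": {"British", "Scandinavian"},
    "genre": {"Comedies", "Dramas", "Documentaries"},
    "subject": {"about Friendship", "about Royalty"},
}


def build_category(**tags):
    """Join approved tags in the fixed slot order to form a category name."""
    parts = []
    for slot in SLOT_ORDER:
        if slot not in tags:
            continue  # A category does not have to fill every slot.
        term = tags[slot]
        if term not in VOCABULARY[slot]:
            raise ValueError(f"{term!r} is not an approved {slot} term")
        parts.append(term)
    return " ".join(parts)


# The tags can be supplied in any order; the output order is always the same.
print(build_category(genre="Dramas", adjective="Dark", region="Scandinavian"))
# Dark Scandinavian Dramas
```

Rejecting unapproved terms is what makes the vocabulary "controlled": two taggers describing the same movie cannot drift into different spellings or synonyms.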

This metadata is what makes Netflix so successful at not only keeping subscribers, but also getting new ones. It makes sense that they would advertise similar movies next to movies that people are watching and appeal to what people want. Most people already understand this, but something more surprising that I learned from this article is that they also use all of this information when they are creating shows. Wildly successful shows right now such as Orange is the New Black and House of Cards were created by people at Netflix who had already been studying what people want. Through these shows they give people the elements of television that they have observed to be the most popular. After learning so much about Netflix’s system of categorization through metadata, I am very curious to learn more about the metadata of my other favorite websites like Facebook, Spotify, and Pandora.

Netflix Home Page

Week 3: Data-produced Original Content

The article “How Netflix Reverse Engineered Hollywood” from this week’s readings discussed the author’s project to dissect Netflix’s genre system. He considered everything from the site’s tagging process to the syntax behind its famously niche genres. Though Alexis C. Madrigal was interested in Netflix’s data collection, his focus was largely on how the data contributed to Netflix’s unique categorization system. At one point in his story, Madrigal visits Netflix’s VP of Product, Todd Yellin, and although “he seems impressed at [Madrigal’s] nerdiness, he patiently explains that we’ve merely skimmed one end-product of the entire Netflix data infrastructure. There is so much more data and a whole lot more intelligence baked into the system than we’ve captured.” Madrigal’s focus was rather specific, fitting considering he was analyzing genres known for their alarming specificity, but his conversation with Yellin hinted that Netflix is employing data in many innovative ways, including in the production of original content.

Netflix’s foray into original content is interesting because it has flouted many of the conventions of Hollywood filmmaking. For example, the website releases its content all at once instead of making one episode each week and its executives have refused to publish ratings because they are irrelevant to Netflix’s system. Such policies have produced many a think-piece about the television industry’s potential for change, many of which focus on how Netflix’s access to user data fuels the company’s original programming decisions. Although plenty of studios lean on statistics and ratings, Netflix has a pretty honest view of not only what people are watching but how. According to Yellin, Netflix knows if a user “plays one title, what did they play after, before, what did they abandon after five minutes?” (The Guardian) I’d be interested to learn if Netflix’s data is more helpful than that available to more traditional content creators and if other media platforms feel pressured to adopt certain features of Netflix’s data-driven model.

 

Source: http://www.theguardian.com/media/2014/feb/23/netflix-viewer-data-house-of-cards

Week 2: Meta Data of Life

The Evolutionary Tree of Life

Above is an image of the three largest branches of the Phylogenetic Tree of Life, which is much larger and more detailed than what is shown above. In “Classification and its Structures” by C.M. Sperberg-McQueen, I read, “Classification is, strictly speaking, the assignment of something to a class; more generally, it is the grouping together of objects into classes. A class, in turn, is a collection […] of objects which share some property.” Reading this, I instantly thought of the well-known classification system for all living organisms, the Evolutionary Tree of Life.

In seventh grade biology (or ninth grade biology, depending on the school you attended), you learn about a man named Charles Darwin, a British scientist from the 1800s who traveled on a five-year expedition aboard the HMS Beagle. After seeing species of animals whose traits differed from those of other species they resembled, Darwin came to the conclusion that all life had one common ancestor and that, through Natural Selection, species branched out to form new species based on their environment.

Why do I bring up Charles Darwin? Classification is a large part of his theory, and much of his research went towards classifying living organisms. By creating these classes, he gave us a map for discovering where we fall in the history of life.

We can also look at zoology, the study of animals, which looks at a smaller branch of the Tree of Life. If you go to the Colorado State University Libraries website (a link is provided below), you will find a list of animals listed in alphabetical order by their common names and their species is shown to the right. Each animal species belongs to a genus that in turn belongs to a family and so on until it becomes a matter of what is considered living and what is not.

If you look at the list, you can look at all the different animals and even change the list from alphabetical order of common names to alphabetical order of genus and species. We can take two animals of the same genus, the Mallard (Anas platyrhynchos) and the Hottentot Teal (Anas punctata), and notice that despite their closeness in the Tree of Life, they have differences in characteristics like the shape of their heads. Hottentot Teals have very curvy heads while Mallards have bulb-like heads. Overall, classifying animals and living organisms in general makes it much easier for us to identify what species we are dealing with, and it has the same effect with anything you want to classify, whether it be movies, food, or any other topic.

Works Cited:

Week 3: Netflix

“How Netflix Reverse Engineered Hollywood” by Alexis C. Madrigal really focused on how Netflix came up with such a large number of genres that the users could pick from. As an avid Netflix user, I have myself witnessed movie suggestions: after watching The Walking Dead I was suggested other zombie-like movies right after. And I have experienced this with numerous websites/apps!

I’m also an avid Instagram user and have been using it for the past 2+ years. Recently I have noticed that right after I click the “follow” button when I come upon a new user I like, a little drop-down section appears right under it with 3 “suggested users.” I found this extremely strange, and I wonder what criteria make these users of potential interest to me. If I follow a friend from high school, let’s say, a friend of that user whom I recognize may pop up in the “suggested user” space. This makes sense to me, as maybe they’ve tagged each other in pictures. This friend may have only 100 followers, or may follow only 100 people themselves, so the pool from which to choose those suggested users is much smaller.

But sometimes I follow users that have 80,000+ followers (maybe because they post pleasant nature pictures!) and who themselves follow maybe 500 people. The pool from which to choose the suggested users is much greater. Where do these 3 users come from?

One possibility I can think of is similar hashtags. Maybe the suggested users use many of the same hashtags? But this seems unlikely, because if one person inputs #nature, there are probably thousands of entries for that hashtag. Another possibility is the similarity in users. If there are multiple people who are into nature pictures, they will follow many users that post about nature. But without hashtags, how does Instagram know what the content of the picture is? Maybe through a relevant caption or the geotag. It gets messy really quickly. It is incredibly difficult for me to decipher, but like Netflix, Instagram must have an algorithm for everything.
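Just to test my friend-of-a-friend guess, here is a small sketch of how suggestions could be drawn from the accounts that the people I follow also follow. This is purely my own assumption and not Instagram’s actual algorithm; all the usernames and the follow graph are made up.

```python
from collections import Counter

# Hypothetical follow graph: each user maps to the set of accounts they follow.
FOLLOWS = {
    "me": {"hs_friend", "classmate_a"},
    "hs_friend": {"me", "classmate_a", "classmate_b", "nature_pics"},
    "classmate_a": {"hs_friend", "classmate_b"},
    "classmate_b": {"hs_friend"},
}


def suggest(user, graph, top_n=3):
    """Suggest accounts that the people `user` follows also follow."""
    already = graph[user] | {user}
    counts = Counter()
    for followed in graph[user]:
        for candidate in graph.get(followed, set()):
            if candidate not in already:
                counts[candidate] += 1
    # Most shared connections first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_n]]


print(suggest("me", FOLLOWS))  # ['classmate_b', 'nature_pics']
```

With a small circle of friends the candidates are few and familiar. With an account that has tens of thousands of followers the candidate pool explodes, which is probably why extra signals like hashtags, captions, or geotags would be needed to pick just 3.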

Another feature Instagram has is “explore.” I have gone to this feature multiple times, and each time it shows something different. Perhaps the first time it was full of nature pictures but the second time it was full of make-up pictures. It is obviously trying to cater to my interests, but how does it know my interests? I soon realized it depends on recently viewed photos.

Maybe someday someone will try to figure out Instagram’s algorithm, although I am sure it is not as crazy as Netflix’s!

For reference:

http://www.instagram.com

http://www.technobuffalo.com/2014/07/11/instagram-suggested-user-feature-quietly-rolls-out/

The Napoleon Dynamite Problem

http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html?pagewanted=all&_r=0

http://genresofnetflix.tumblr.com

http://www.netflixprize.com

As just one of the millions of Netflix subscribers and a self-diagnosed binger, I have definitely spent many long nights getting familiar with the altgenre system built into the streaming media site. I’ve been avidly using Netflix since 2011, but I only really started taking notice of some of its extremely specific genres this year. With thousands of titles to sort through on Netflix, their personalized genres are definitely useful, maybe a bit absurd, but still useful. My personal favorites are “hidden gems” and “visually-striking movies,” categories in which I can usually find many independent and quirky films that are difficult to describe.

It is apparent that there is a growing trend of implementing these personalizing algorithms in more and more media sources, including the likes of Spotify, Amazon, and Soundcloud. What I find most intriguing and even slightly disturbing about Netflix’s system is the crossover of human and machine intelligence. It has come to the point where you can probably learn a lot about a person’s interests by simply looking through their Netflix account. In order to achieve this, Netflix engineers definitely had strong input in creating microtags for these films based on the Netflix Quantum Theory, which makes me question how objective the algorithm system really is. There seems to be an ideology similar to the whole “I know it when I see it” expression that is crossing into Netflix’s system.

Netflix has evolved their past system that was based more heavily on numerical values and user ratings to a more human method of introspection. Todd Yellin, VP of product innovation at Netflix had this to say about their new approach:

“Predicting something is 3.2 stars is kind of fun if you have an engineering sensibility, but it would be more useful to talk about dysfunctional families and viral plagues. We wanted to put in more language,”

I think it was a very progressive approach for Netflix that also reveals some very interesting quirks about the relationship between categorizing systems and human nature. The Atlantic’s article on Netflix’s genre algorithm system mentions the $1 million prize that the company offered back in 2006, which reminded me of the Napoleon Dynamite problem. As a film, Napoleon Dynamite seems to be the most difficult movie to pinpoint and recommend to Netflix users. The quirky film remains the most stubbornly unpredictable movie: it attracts many ratings from users, yet those ratings are still hard to predict. This imbalance, while probably a headache for Netflix developers and engineers, is to me a very humorous quirk in the system that shows how difficult it is to categorize human interests and behaviors.

Bliss: Crafting a Successful Symbology

Stained glass art by Shirley McNaughton, called “Communication.” It’s composed of 10 Bliss symbols.

What is a sign? According to the American philosopher Charles Sanders Peirce, signs are: “something, which stands to somebody for something in some respect or capacity.” At least, that’s one of the definitions Professor Erkki Huhtamo of the Design/Media Arts program offered us in lecture last week. We broke it down further by dividing up signs into two categories: the “signifier,” or the visible/interpretable form the sign takes, and the “signified,” or the idea the sign expresses. Essentially, signs are defined by humans. Nothing is a sign unless it can be interpreted through a shared culture or ontology.

This past summer, I found myself listening to a wonderful episode of the RadioLab podcast, entitled “Bliss.” One of the subsections of this episode focused on the story of one man named Charles Bliss, who created a system of signs entitled “Bliss Symbolics.” Like many of his time, Mr. Bliss was disillusioned with the dystopian chaos of the post-WWII world, and in turn believed he could heal the miscommunication and destruction he saw around him through a universal system of iconic signs, which all humans would be able to use in order to understand and communicate with one another, regardless of language.

Unfortunately for Bliss, his system of symbols never took root in the global manner that he had envisioned. However, in 1971 a nurse named Shirley McNaughton began using Bliss Symbolics to help children with cerebral palsy develop language skills. Eventually, these children were able to speak basic English by developing written skills using Bliss Symbolics to communicate with their instructors. Over time, Bliss’ signs developed to meet the specific needs of children and eventually traveled around the world. In each place, the symbols would inevitably become tweaked to fit the rules and linguistic ontologies specific to that culture. Bliss Symbolics in Israel were written from right to left, because Hebrew is written from right to left. Bliss’ hope of a universal and unchanging semiotic language was a complete failure.

In many regards, the problem of “mismatched ontologies” presented by Wallack and Srinivasan links directly to the discussion of how the creation and reading of Bliss’ signs played into human culture, education and communication. The localization and specification of Bliss’ signs to small groups of children around the world reminded me of the problems states face in developing broad ontologies that attempt to force large groups of diverse people together in a binary census. The signs that were appropriated from Bliss’ semiotics proved successful in teaching children to speak because they were modified to fit local customs and cultures. In their article, Wallack and Srinivasan point to this exact issue and explain how “any object, attribute, category or relation included within a local ontology could be included in a meta-ontology…there is no reason [Governments] could not also incorporate folkloric relations that guide community perceptions.” Inevitably, local communities know best what is required to successfully educate their children. If governments give their citizens the ability to define themselves on a local level, as many different groups did using Charles Bliss’ symbols, I think information loss and infrastructural dysfunction could be significantly diminished across the globe in the future.

Check out the podcast here: http://www.radiolab.org/story/257194-man-became-bliss/