Course blog

The Importance of Language in Classification: Netflix and Reddit

I really enjoyed Alexis C. Madrigal’s article “How Netflix Reverse Engineered Hollywood.” My curiosity is now somewhat satisfied because I have a better understanding of how Netflix has been so darn good at making recommendations for me. It’s amazing that the site can sort movies into 76,897 unique categories. That number is hard for me to wrap my mind around. In many ways, the purpose of categorization is to make finding information easier, but can there ever be so many categories that the opposite happens and the purpose of categorization is lost?

The author of the article, Madrigal, mentioned something that I found interesting and that we also touched upon in class: the importance of language in categorization, also known as controlled vocabulary. Madrigal states in the article, “Netflix created a vocabulary” that was used in determining how alt-genres would be categorized on the website. To even begin categorization, everyone who will be contributing to it needs to reach some sort of agreement. So those at Netflix had to come up with a specific vocabulary that both they and the audience could understand, no matter where they were from. And as Madrigal stated, they did quite a good job of pinpointing the best terms to use. I mean, “20th Century Period Pieces Based On Real Life”? It’s almost scarily accurate, which is why Netflix is so good at making recommendations for its viewers.

This also reminded me of the lengths other websites go to in managing the vast number of categories on the web, and how sometimes it does not work well. Netflix, as a private company, created the language used on the site, which users have to accept. But that is not the case on forum-like websites, where the users create and agree on the common language. Reddit, a very popular site that I’m sure many of my fellow college students know all too well, has for the most part successfully organized a large number of subcategories. There are about 6,000 active subreddits. The title of each subreddit has, in some way, been agreed upon by the users. For example, the subreddit /r/aww features pictures, videos, and stories of all things cute (mostly animals). It might seem like a minute detail, but what if someone were to search for /r/awww (with an added ‘w’) instead?

The official /r/aww subreddit:

Screen Shot 2014-10-20 at 1.12.29 PM

 

The very similar /r/awww subreddit:

Screen Shot 2014-10-20 at 1.12.46 PM

Both subreddits are active, meaning posts have been submitted within the past 24 hours. And as you can see, both have users, though one has far more. Those using /r/awww may not know of the common agreement that /r/aww is the more popular subreddit, and that keeps them from getting the same information.

Screen Shot 2014-10-20 at 1.41.24 PM

Reddit has acknowledged the overlap between such subreddits and has begun to direct users to the more popular ones, minimizing the number of categories on the site.
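This kind of redirection works like a small controlled vocabulary: variant names point to one agreed-upon term. Here is a minimal sketch of that idea. The alias table and the function name `canonical_subreddit` are made up for illustration, and I am not claiming this is how Reddit actually implements it.

```python
# Hypothetical alias table: variant spellings mapped to the agreed-upon name.
# The entries are illustrative, not taken from Reddit itself.
ALIASES = {
    "awww": "aww",
}


def canonical_subreddit(name: str) -> str:
    """Return the preferred subreddit name for a variant spelling."""
    key = name.strip().lower()
    if key.startswith("/r/"):
        key = key[3:]  # drop the "/r/" prefix before the lookup
    # Names without an alias entry are returned unchanged.
    return "/r/" + ALIASES.get(key, key)


print(canonical_subreddit("/r/awww"))  # variant spelling is redirected
print(canonical_subreddit("/r/aww"))   # preferred name stays as it is
```

The point of the sketch is that the users of the variant name never have to learn the "agreed" term themselves; the lookup does the agreeing for them.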

Source: Alexis C. Madrigal, “How Netflix Reverse Engineered Hollywood,” The Atlantic, January 2, 2014

Week 3: Netflix and Jeopardy!

The article on Netflix was particularly interesting because it showed in practice the extreme level of sophistication and specificity Netflix built into its genre-ization algorithm to create specialized profiles for each user. Allowing for a variety of combinations of periods, genres, and even actors, Netflix attempts to precisely categorize every film ever made, even if a category ends up holding just one film. With a set of generated genres attributed to each viewer, Netflix has the capacity not only to describe what each viewer watches, but ultimately to predict it. A case in point is Netflix’s educated purchase of “House of Cards”, a show that lined up perfectly with Netflix’s average profiles. However, as the article points out, “the data can’t tell them how to make a TV show, but it can tell them what they should be making.” Something as emotional and complex as movie-making and movie-watching seems to have been reduced and perhaps even mocked by Netflix’s algorithm. Yet the opposite is true. “House of Cards”, and Netflix for that matter, are successful not because the human capacity to enjoy films can be trivialized into an algorithm but because the preferences related to enjoyment can be more accurately communicated via metadata. Thus, the creation of the media in response to these results remains an essentially human product.

Watson: i.kinja-img.com/gawker-media/image/upload/s–Gbchunvr–/18mhcmpj5aul1png.png

As an avid fan of “Jeopardy!”, I was reminded by this article of IBM’s Watson. A supercomputer put up against the greatest champions of “Jeopardy!”, Watson easily won the contest with its huge data storage and processing capacities, as well as its precise “buzzing” within a millisecond to be the first to “question” the answer given. Watson also had a “learning” algorithm that remembered patterns in the answers that emerged throughout the contest, so its speed and accuracy improved with every clue. Remarkably obvious, however, was Watson’s inability to “pattern” human thought and speech. Things like puns and jokes in clues went unregistered by Watson, and it was unable to work out answers to these trickier clues. Thankfully, Watson’s capacities were created with the medical field in mind, even though its stunning calculation abilities are hotly contested and marginalized by professionals in the medical community who worry about the economic and moral ramifications of automating medical practice (“The Robot Will See You Now”). Indeed, Watson is far from capable of understanding human gray areas like fear and morals well enough to present proper diagnoses. However, perhaps medicine has nothing to fear: just as Netflix needs viewers and filmmakers, surely medicine requires both human patients and doctors.

Netflix: “www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/”

Watson: “www.theatlantic.com/magazine/archive/2013/03/the-robot-will-see-you-now/309216/”

 

Classification, Standards and Aristotle’s 4 Causes: The Case of the Withering Rose

Bowker and Star’s discussion of classification and standards in “Sorting Things Out” led me to think about Aristotle’s 4 Causes, which he sees as an effective way of understanding objects in the world. This appears to be a more abstract but flexible way of thinking about objects in context, which would subsequently aid in their classification. The equation is as follows:

Material Cause + Formal Cause + Efficient Cause = Final Cause

This is in contrast to a classification system with categories that are mutually exclusive. The article mentions that “a rose is a rose, not a rose sometimes and a daisy at other times”. Intuitively, this makes sense, as distinct categories enable us to better identify things by reference to their specific properties. However, this seems to me to pose a problem in an age where knowledge is in flux and we look to gather information about an object as it morphs over time. While this may not always matter for historical and archival purposes, it is interesting to examine how classification works for objects that change over time and across worlds.

For instance, a rosebud differs from a blossomed or withered rose in that each is at a different stage of development. Each stage of development in a rose has, appended to it, a specific set of properties that are different and distinct. Yet a classification system would not be able to capture this progression and would merely classify every stage under “rose”. This issue is exacerbated when placed in the context of the 4 Causes, as each stage of the rose would have a different material, formal and efficient cause, leading to a different final cause.

Material Cause: what constitutes the object (A rose consists of a thorny stem and veiny petals made of plant matter)

Formal Cause: the ratio or general form an object takes (A rose is 90% stem and 10% flower)

Efficient Cause: the thing that motivates creation/change (pollination is the efficient cause of roses growing)

Final Cause: the aim or purpose it serves (A rose is an organ for plant reproduction and an ornamental object)

This sort of ambiguity is caused by a definitive change in the properties of the object, as opposed to ambiguity about which category a rose should fall into. This demonstrates that classification at its present stage is capable only of registering static information. While Aristotle’s system seems to account also for standards (as opposed to classification) that are able to withstand the test of time, its standards are relative to a specific object’s constituents at one moment, rather than treating those constituents as things that change over time.

Since a withered rose has a different material, formal and efficient cause from a rosebud or a blossoming rose, Aristotle’s system would still consider the blossoming rose and the withered rose to be two different things. If this is the case, how would we be able to prove and track (via either method of classification) that it was the same rose that blossomed and withered? In broader terms, how do our systems of classification address the problem that two objects cannot be proven to be one and the same object when their properties and standards have morphed over time?

The funny thing, though, is that Aristotle’s 4 Causes continue to influence our notions of classification and standards now. This makes me curious about the possibility of developing “real time” classification systems that can grow and track changes in the data of the objects they describe.
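One way I can picture such a “real time” system is to give each object a fixed identifier and attach a dated history of states to it, so that the bud and the withered flower are recorded as the same rose. This is only a sketch of the idea. The class name `TrackedObject`, the stage labels, and the dates are my own assumptions, not anything taken from “Sorting Things Out” or from Aristotle.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TrackedObject:
    """An object with a stable identity and a dated history of states."""
    object_id: str  # stays the same for the object's whole life
    category: str   # the static class, e.g. "rose"
    history: list = field(default_factory=list)  # (date, stage) pairs

    def record(self, when: date, stage: str) -> None:
        """Add a newly observed stage without discarding earlier ones."""
        self.history.append((when, stage))

    def current_stage(self) -> str:
        """Return the stage with the latest date (needs at least one record)."""
        return max(self.history)[1]


# Hypothetical example data.
rose = TrackedObject(object_id="rose-001", category="rose")
rose.record(date(2014, 10, 1), "bud")
rose.record(date(2014, 10, 10), "blossomed")
rose.record(date(2014, 10, 20), "withered")

print(rose.category, rose.current_stage())
print([stage for _, stage in sorted(rose.history)])
```

The category never changes, but the history does, which is what lets the record claim that the rose that blossomed and the rose that withered are one and the same.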

 

Personalization in the Digital World

http://www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/5/

https://www.facebook.com/about/privacy/advertising

The digital world is unlike any other domain in society. It is unique because technology can be, and is being, streamed into every subject or interest that we know of. Whether it be exercise, education, social interaction, studying, or exploring interests, everything is turning digital. It can sound overwhelming and scary at times to think that all the information in the world is floating around in a digital abyss. However, it is also very comforting at times to know that it cannot be burnt in a fire, lost in paperwork, or left ruined and unrecorded at the bottom of the Indian Ocean.

The digital world is becoming so advanced that not only are we no longer losing information that would otherwise never be published, we are recording everything we do. A lot of us, including myself, try to do this on our own; I try to save websites on my desktop. Eventually these websites for shopping, funny videos, and articles I want to read later become a blur of words and Safari icons that I must spend hours organizing one Sunday afternoon and will probably never see again.

Luckily for me and anyone else out there with an ego for wanting to organize our interests all on our own, there are more sophisticated software and personnel doing this very same job for us in a more organized and personalized manner. Among these crafty institutions are Netflix and Facebook, two amazing companies that make the lives of a huge portion of the population a whole lot easier.

Prior generations have said that this new Tech Age is giving our generation, and everyone after it, shorter attention spans. Our heads are being filled with clutter and an overload of information from the Internet. What those people do not know is that the new age is teaching us new ways to filter that overload of knowledge. For example, the innovators at Netflix created a way to do this by using underlying tagging data to personalize our movie interests. Todd Yellin, the VP of Product and creator of Netflix’s system, has used this tagging data to create 76,897 genres to categorize the movies on Netflix.

What interested me the most was how this system personalizes recommendations to every individual subscriber’s specific interests based on what he or she has watched and how he or she has rated each movie. Netflix goes beyond pure machine intelligence in recommending movies and uses a hybrid that includes human intelligence as well, with people rating how much romance, comedy or action is in each movie on a 1-5 scale. They won’t tell you what those ratings are, but they will recommend movies based on what they know about your preferences.

When I started thinking about personalization in movies, I thought of Facebook, which uses this hybrid intelligence for advertisements. Facebook ads have become very contentious because people are afraid that their personal information is being exposed or that they will be spammed if they click on an ad with a virus. I understand this fear; however, Facebook assures us in its “Data Use Policy” that it does not release any of our personal information, but works the opposite way by choosing ads from companies it has partnered with and recommending them to individuals. It uses the information we provide and the links we have clicked on throughout Facebook to show more things that interest us. Through this personalized system, I have found my favorite clothing websites, news articles, and restaurants. Now, instead of spending time procrastinating on Facebook, I am led to hundreds of other websites that give me more substance and broaden my interests, rather than clicking through pictures of myself on “the book.”

Why the Silly Genres?

This week my post focuses on the “How Netflix Reverse Engineered Hollywood” article. Thinking about the n-dimensional classification scheme while reading this article was interesting. The classification Netflix uses isn’t n-dimensional because the genre title can only include so many characters. But it isn’t even four- or five-dimensional, or however many identifiers can fit in the genre title. From my understanding, each classifying tag is independent of the next. If you wanted to categorize “Night of the Living Dead”, it would be a zombie movie from the 1960s, but neither “zombie movie” nor “from the 1960s” would be a branch under the other identifier. The Netflix classification scheme is almost an ultra-specific 1-dimensional classification scheme. I looked to the Netflix terms of use to see if their algorithm is mentioned at all. It’s not, but there is a clause that states you may not “engineer or disassemble any software or other products or processes accessible through the Netflix service”, which would include their genre system.
However, their terms do state that they are constantly updating all facets of their service, which would include their genre list. What people are watching, and the growing or dying popularity of a movie or TV show, can affect which genres are included. This can also be applied in the opposite direction. In the case of House of Cards, Netflix chose to create a Machiavellian political thriller because that was a popular genre. Another clause states that “The availability of movies & TV shows to watch will change from time to time, and from country to country.” The country-of-origin or setting genre tag is useful to Netflix because they can see whether Danish movies are popular in America, and should therefore be offered more to American customers, or whether they are more popular in Japan, and should therefore be offered more to Japanese customers.
It is for this reason that the super-specific genre system that Netflix uses benefits not only the viewer but also Netflix itself. If somebody rates romantic comedies 5 stars and coming-of-age movies 4 stars, they will get recommended romantic coming-of-age comedies. From Netflix’s perspective, they have gained data that tells them that people who watch romantic comedies also like to watch coming-of-age movies. This data lets Netflix know the best way to group their movies and TV shows, and what type of movies they should spend money to license.
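To make this concrete, here is a rough sketch of how independent tags and star ratings could combine into a recommendation. The movie titles, tags, ratings, and the 4-star threshold are all invented for illustration. Netflix’s real algorithm is not public, so this should be read as my guess at the general idea and nothing more.

```python
# Hypothetical catalog: each title carries a flat set of independent tags.
# No tag is a "branch" under another one.
CATALOG = {
    "Movie A": {"romantic", "comedy"},
    "Movie B": {"coming-of-age"},
    "Movie C": {"romantic", "comedy", "coming-of-age"},
    "Movie D": {"zombie", "1960s"},
}

# Hypothetical star ratings (1-5) from one viewer.
RATINGS = {"Movie A": 5, "Movie B": 4}


def liked_tags(ratings, catalog, threshold=4):
    """Collect the tags of every title rated at or above the threshold."""
    tags = set()
    for title, stars in ratings.items():
        if stars >= threshold:
            tags |= catalog[title]
    return tags


def recommend(ratings, catalog, threshold=4):
    """Suggest unrated titles whose tags are all among the viewer's liked
    tags, listing the titles that combine the most tags first."""
    liked = liked_tags(ratings, catalog, threshold)
    candidates = [
        title for title, tags in catalog.items()
        if title not in ratings and tags <= liked
    ]
    return sorted(candidates, key=lambda t: (-len(catalog[t]), t))


print(sorted(liked_tags(RATINGS, CATALOG)))
print(recommend(RATINGS, CATALOG))
```

Because the tags are a flat set rather than a tree, “zombie” and “1960s” sit side by side on the same title, and a viewer who likes romantic comedies and coming-of-age movies is pointed to the one title that carries all three tags.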

“Netflix Terms of Use.” Netflix. N.p., n.d. Web. 20 Oct. 2014.

Classifications of Personal Involvement

An ontology is a shared understanding of a given interest and its subdivisions; in the case of Netflix, it is a set of microgenres divided up to produce the ultimate personalized watch list. The article “How Netflix Reverse Engineered Hollywood”, written by Alexis C. Madrigal, focuses on the engineering of the categories or subgenres, and on the features that have emerged as a product of this complicated algorithm. Netflix is able to break these films down to the nitty-gritty, collect the information into a vat of cinema knowledge, and then apply it in a productive manner. Shows like House of Cards and Orange Is the New Black have been strategically created based on data pulled from viewers’ preferences; Netflix created shows based on what its users like. Personally, I think they are geniuses. What they have done makes total sense, but there is so much more complication that exists within a system like this. Honestly, my brain cannot fully wrap around the amount of work that went into developing this program. The company has created a system that uses local knowledge from its user community to create personalized genres, which helps avid video streamers like myself develop a cinema ontology just for personal experience.

Classifications are the basis of the Netflix organization, but they are also involved in many constructions of everyday life. We categorize our lives, from our stores, animals, food, and jobs to many more things, as standard practice. Classifications exist as product variations within our local knowledge. Just like our minds, the Internet uses information like cookies to create a personal surfing experience. For example, Facebook uses information that you post to create your ad preferences. It pairs with organizations like the DAA (Digital Advertising Alliance) to provide ads that are customized to users, using information like age, location, liked pages, and other shared data. This information actually leaves me slightly unsettled. Facebook is tracking my Internet presence as a way of gaining resources. I did know that they were doing this, and it’s awesome that they are attempting to please users and give them an intimate Facebook experience, but I do not like that I am being Facebook-stalked by Facebook. Netflix’s approach seems to be more for the user, but this is because there is a membership fee, whereas Facebook gets its money from ads so we can use the site for free. It just leaves me with an unsettling feeling: how much do they really know?

Ontologies and Individuals

Reading “Local-Global: Reconciling Mismatched Ontologies in Development Information Systems” by Jessica Seddon Wallack and Ramesh Srinivasan reminded me of people’s struggle to reconcile their identity within a system of classifications. After the resource day at UCLA, I realized that there were too many organizations that I should join. I picked three organizations that best represented my interests and identity. However, I ended up devoting myself to only one organization. My other two interests had to be set aside for the meantime, and are therefore lost to the organization that I chose. If my understanding of the reading is correct, an organization representing a single demographic or interest is a mismatch for what defines an individual. Wallack and Srinivasan note that “While any group’s ontology is unlikely to match that of every individual within the group, the extent of mismatch tends to increase with the scale of the group and the differences between the purpose of individual and group ontologies.” Ideally, an individual should not be split among three interests, but should have one organization that addresses his or her interests in their entirety. Instead of choosing an organization that fits one criterion and leaves out the rest of my interests, I chose the organization that was the most diverse in an attempt to keep my interests broad.

I searched the web for a visual example of an ontology related to the reading and found this simple visualization beginning with a lion and an antelope. The diagram of the two animals resembles the way classifications are divided and the way they relate. The oversimplified diagram of the two animals only leads to further classifications. The problem addressed by Wallack and Srinivasan is that when information is not “inclusive” or “collaborative” with respect to the community, a mismatch of information takes place. For instance, to further develop this animal ontology, one could create a way for people to add more information about lions and antelopes. The classification process does not really tell us much about what a lion or an antelope really is. This visual ontology is supposed to represent lions and antelopes, but because of their classification, information that defines a lion and an antelope is lost. The ontology, therefore, does not always match reality. Each organization has its own ontology that best represents that organization’s goals. However, since most organizations specialize in serving the interests of a specific demographic, an individual with a multitude of interests will struggle to reconcile his or her conflicting interests.

 

Sources:

Wallack, Jessica Seddon, and Ramesh Srinivasan. “Local-Global: Reconciling Mismatched Ontologies in Development Information Systems.” 42nd Hawaii International Conference on System Sciences, 2009. http://rameshsrinivasan.org/wordpress/wp-content/uploads/2013/03/18-WallackSrinivasanHICSS.pdf. Web. 20 Oct. 2014.

 

web. image. http://www.scientific-computing.com/features/feature.php?feature_id=37

Week Three: Netflix and Facebook

What struck me most about the article “How Netflix Reverse Engineered Hollywood” was how many comments lamented the fact that, despite the prevalence of ultra-specific altgenres, many users are given the same suggested movies over and over. Because the function of these altgenres is to intimately personalize the film selections for a highly specific viewer, viewers are given only a select number of options by Netflix’s algorithm, limiting the immediate scope of their film watching. One user commented, “[This] explains why Netflix has steadily made its search function harder and harder to use. It really does not want to empower end-users, it wants to effectively program content for you… Some must be more profitable than others; hence those are the ones you are spammed with… The missing element is how profitable each and every stream might be.” While I am not sure about the factual accuracy of this comment, it does remind me of a similar site that attempts to create personalized content to enhance revenue: Facebook.

The similarities between Netflix and Facebook lie in the “design of the software system that supports them. How that software functions is the result of decisions made by programmers and leaders within the company behind the website” (Grosser). Netflix is designed to suggest films that you would want to watch based on your watch history. This leads to personalized streams and, most likely, increased revenue for Netflix. Facebook is structured in a similarly personalized way, but while many “personalities” can use one Netflix account, Facebook’s interface forces users to portray themselves online as realistically as they would in real life. It requires the use of a real name, location information, schools attended and jobs held, and the music and movies one likes. According to Grosser:

This ideological position of singular identity permeates the technological design of Facebook, and is partially enforced by the culture of transparency the site promotes. The more one’s personal details are shared with the world, the harder it is to retrieve or change them without others noticing—and thus being drawn to the contradictions such changes might create. This is further enforced by the larger software ecosystem Facebook exists within, such as search engines, that index, store, and retain those personal details in perpetuity (Blanchette et al., 2002).

The personal details Facebook collects create a data-mining trove and allow Facebook to target the user with personalized ads, much in the same way Netflix uses previously watched films to recommend movies a viewer will most likely watch. The software behind both websites is what allows them to be so successful in marketing to specific interests, but it also limits the variety of “interests” displayed, thereby regurgitating the same limited types of objects, be it movies or ads.

Grosser, Ben. “How the Technological Design of Facebook Homogenizes Identity and Limits Personal Representation.” Hz-Journal. Hz-Journal, 2014. Web. 20 Oct. 2014.

Netflix and Society

Before reading Alexis Madrigal’s article “How Netflix Reverse Engineered Hollywood”, I wasn’t aware that tens of thousands of microgenres even existed. Moreover, I was skeptical that Netflix really recommended movies based on all the different films and TV shows that we had previously watched. I just thought that Netflix recommended the same movies to almost everybody and only claimed to tailor recommendations as some sort of marketing scheme. I didn’t think there would be a group of people who sat down, analyzed, and tagged all the different films to put such a vast project together. Therefore, after reading Madrigal’s article, my appreciation and respect for Netflix grew tenfold. It reminded me of the years and years of hard work that Pandora employees had to invest in order to create the Music Genome Project, which is similar to Netflix’s system in that it endeavored to analyze and tag every song, artist, and album in order to recommend music to individuals. Another example would be Yelp, which endeavors to compile a list of restaurants and other businesses across the service sector and tags each establishment in order to recommend, for example, places to eat or where one should take a car for maintenance or repair. In this post, however, I want to focus on the system of classification and how recommendations and reviews on Netflix lead the masses in society to watch similar movies and shows.

Although Netflix has 76,897 ways to describe a movie through genre tagging, only a very small fraction of those genres appear on one’s personalized Netflix home page. The genres that you see are most likely the most popular ones. Moreover, one huge determining factor in the selection of movies shown on the front page is the review or star process. The movies and shows that we see on the front page are usually the ones with great reviews or the most stars. This is one reason why I see so many similar recommended movies and shows. For example, my Netflix home page and my friend’s home page share many of the same movies. In fact, sometimes our home pages look almost identical. Although personalization does come into play and we do see some differences in the recommended movies, for the most part it seems as if Netflix showcases movies that are highly rated and popular. Since people only want to watch the best and top-quality movies, as a society we end up watching movies from a pool that is essentially not that vast at all. This makes me think about the future: if this trend of classifying and recommending continues as it has for movie services such as Netflix and music services such as Pandora and Spotify, then we as individuals in one nation might prove to be strikingly similar to one another.

Week 3: Plateau Peoples’ Web Portal & Christopher Columbus

Plateau Peoples’ Web Portal is a brilliant and elaborate site that serves as a “gateway to Plateau peoples’ cultural materials” held in multiple historical preservation establishments. Tribal administrators (working with their tribal governments) provided information and their own materials to expand the archives. This website is crucial to celebrating diversity within Indigenous Peoples’ cultures rather than grouping them together singularly as “Native American.” The website provides each tribe’s cultural materials digitally, along with a map so that one can see exactly where each item used to be.

The six tribes featured are only six of the 560 federally recognized tribes that exist in the United States alone. Most of the cultural materials provide insight into the devastating imperialism and cultural genocide inflicted on these tribes. 50 million people had been living, thriving, existing in America before the voyage of 1492. The Plateau Peoples’ Web Portal is a terribly real reminder that the land we live on today is occupied illegally and that Indigenous Peoples’ Day should be federally recognized instead of Columbus Day. Replacing Columbus Day with Indigenous Peoples’ Day would be a huge step toward recognizing that Christopher Columbus did not discover anything. One cannot discover land that is already inhabited by millions of people. On top of that, Columbus is the notorious catalyst for the genocide of millions of North American inhabitants and the cultural annihilation of hundreds of diverse cultures. The Plateau Peoples’ Web Portal is an homage to the tragically destroyed cultures and classifies them individually, giving them the recognition they are due.

 

Indigenous Peoples’ Day is not a new phenomenon, just an incredibly unrecognized one. Minneapolis passed a resolution this April to rename the second Monday of October Indigenous Peoples’ Day. Hopefully this is just the start of a revolution in historical observances. Going through this website and viewing the pictures available is a huge reality check.

(Note: The Indigenous People of the Plateau include some parts of Canada, not just the U.S.)

Sources:

Plateau Portal

Indian Country Today

City Pages Blog