Course blog

Week 3: Course Evaluation Forms and Mismatched Ontologies

image

This is a screencap of the webpage for the Evaluation of Instruction Program’s (EIP) course evaluations. Toward the end of each quarter, students at UCLA receive emails to complete and submit these forms, but instructors give the impression that few students actually fill them out. They always stress the importance of this feedback mechanism to improve the quality of instruction and to better serve their students.

This source is related to the discussion in “Local-Global: Reconciling Mismatched Ontologies in Development Information Systems” by Jessica Seddon Wallack and Ramesh Srinivasan. It illustrates how the school collects data on the academic activities of community members, in this case faculty and students, in order to make better decisions about policy. However, this evaluation form also illustrates the phenomenon of mismatched ontologies between students and administration. Whereas common academic problems for students may involve thoughts like “I don’t really get what this assignment is asking me to do and how I’m supposed to do it,” “My reading comprehension and note-taking skills need work,” “I don’t know how I should be preparing for the exam,” or “It takes a really long time to do all of the reading and writing assignments,” this form was not designed to address such issues despite its role in improving the quality of education. As far as I know, there is currently no mechanism directed at collecting data on improving academic services, and unsurprisingly there is no Academic Skills Center that provides formal training in effective learning skills. This evaluation form demonstrates how the administration’s meta ontology influences how it attempts to address community problems, but has difficulty taking local knowledge into account and therefore incurs an information loss.

Wallack and Srinivasan make several recommendations for how meta ontologies can incorporate local knowledge. The first is to develop collaborative and inclusive ontologies. This online form does not provide for student input on what questions are asked and which ones are the most important, though the technological capability does exist. The second recommendation is to allow the community to provide feedback on the data that the administration has collected, and to help them make good critiques of that data through education and appropriate communication strategies. Currently, the contents of these evaluations are confidential, and students remain unaware of how other students felt about the course (outside of the usual gossip), or more importantly how the administration understands the data. Finally, the third recommendation is to provide for alternative means of communication and to decentralize decision-making to more local levels. The final comments section on the form allows for some flexibility regarding the former, though the response may not be relevant to the question, while the latter issue is beyond the scope of this form entirely.

Decoding Netflix compared to Plateau People’s Web Portal

Source: http://www.slashgear.com/wp-content/uploads/2009/06/apple_macbook_pro_13-inch_teardown_1.jpg

This is an image of all the bits and pieces that go into a MacBook computer. It reminds me of the difference between the Plateau People’s Portal and Madrigal’s Netflix exploration. Just as the computer must be built before it can be broken down, a website must be put together before the semantics that make up that website can be revealed.

The difference between creating content and cracking the method by which content is created is the difference between building up and breaking down. Thousands of memories and years of history go into creating the background of the Plateau People, just as thousands of directors and actors put together the movies that make up Netflix’s endless categorization. All those moments and all those movies were uploaded meticulously into a database, then made public for browsing and viewing. Breaking down that content, however, takes just a few people and some really good data recovery software. This can be seen when comparing Madrigal’s “How Netflix Reverse Engineered Hollywood” and Washington State University’s “Plateau People’s Web Portal”.

Clicking to the “About” page on the Plateau People’s Web Portal reveals that “tribal administrators, working with their tribal governments, have provided information and their own additional materials to the portal as a means of expanding and extending the archival record.” Memories, artifacts, dates and events were used to create a comprehensive history of the Plateau people. The curators pulled out the most potent pieces of information, deciding what must be shown versus what can be thrown away. All this human effort prevents the website from showing random outliers.

To crack Netflix’s “alt-genre” movie categorization algorithm, Madrigal used a plethora of software and equations. He states the programs took over 20 hours to grab all of Netflix’s possible URLs and patterns, a feat that would have taken years to accomplish in the absence of said programs. Although interpreting and finding patterns in all the data could only have been done by a human, he relied heavily on technology to get the information he decoded.

At the end of the article, Madrigal reveals the Perry Mason effect, where there is an outsized number of categories for a person most Americans today cannot name, making it clear that the algorithm cannot decide which information is unimportant or an outlier.

Altogether, this shows that although equations and technology are both essential in cataloguing the information we use today, there is no substitute for human effort.

Week 3 – Netflix / 8tracks

When I saw the reading list for this week, I was immediately drawn to the article about Netflix, “How Netflix Reverse Engineered Hollywood” by Alexis C. Madrigal. I consider myself to be an avid Netflix binge-watcher, so I was intrigued to see what this blogger had to say about Netflix. The other day my friend and I were talking about how there’s so much on Netflix, but sometimes I still don’t know what to watch. The different categories can be overwhelming to say the least. Madrigal’s article furthered this point I had and opened my eyes to the absurd number of movie categories Netflix has. This got me wondering if the genres that didn’t even have any movies in them served a purpose at all. Was it supposed to let Netflix know if they should add more to the “Feel-good Romantic Spanish-Language TV Shows” genre because people were searching for it?

The way Netflix categorizes movies and tv shows reminded me of the “explore” feature on http://8tracks.com. 8tracks is similar to Pandora in the fact that it’s a free Internet radio, but the difference is that you look up playlists compiled by other users rather than stations by an artist or song. When searching for a playlist on 8tracks, you can go to the “explore” tab and from there, you search through the 1,706,776 playlists available using preset tags or by searching for something specific.

Screen Shot 2014-10-17 at 10.00.52 AM

8tracks’ system is different because you can add multiple tags. And the tags aren’t just based upon genre; you can also find the right playlist by typing in “any mood, genre or activity”. For example, when I’m looking for playlists to listen to when I study, I normally start with the tag “indie” and then from there depending on my mood, I normally either go with “chill” next or sometimes “folk”.

Screen Shot 2014-10-17 at 10.01.04 AM

Screen Shot 2014-10-17 at 10.01.24 AM

Screen Shot 2014-10-17 at 10.01.11 AM

When I’m working out, I start with “running” and then normally either choose “hip hop” or “pump up”. Even if you search for the same tags every time, you’ll find new playlists because they are constantly being added by other users. I think it would be helpful if Netflix had a similar system for searching through their database. Netflix has so many specific tags, but I always find it somewhat difficult to find something that fits what I want to watch exactly unless I know that I’m looking for something specific like “Grey’s Anatomy” or “Gossip Girl”.
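
The tag-stacking search described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how 8tracks actually works: the playlist names, their tags, and the function name find_playlists are all made up for the example. A playlist matches when it carries every tag you have picked.

```python
# Hypothetical sample data: playlist names and tags are invented for illustration.
SAMPLE_PLAYLISTS = {
    "Rainy Day Study": {"indie", "chill", "study"},
    "Campfire Songs": {"indie", "folk"},
    "Morning Miles": {"running", "hip hop"},
}


def find_playlists(playlists, tags):
    """Return the names of playlists that carry every requested tag."""
    # Normalize the requested tags so "Indie" and "indie" behave the same.
    wanted = {tag.lower() for tag in tags}
    # A playlist matches when the requested tags are a subset of its own tags.
    return [
        name
        for name, playlist_tags in playlists.items()
        if wanted <= playlist_tags
    ]


# Adding a second tag narrows the results instead of widening them.
print(find_playlists(SAMPLE_PLAYLISTS, ["indie"]))
print(find_playlists(SAMPLE_PLAYLISTS, ["indie", "chill"]))
```

Each extra tag can only shrink the list, which is why stacking “indie” and then “chill” feels like zooming in on a mood rather than starting a new search.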

 

W3 – Data-mining, Classification, and Research

How a Math Genius Hacked OkCupid to Find True Love

This week’s readings reminded me of an interesting article about Chris McKinlay, a UCLA grad student who “hacked OK Cupid to find the girl of his dreams.” Some friends shared it on Facebook months ago; apparently he was a TA in one of their lower-div math classes. It was interesting to read about his process and the visualizations included in the article were striking as well.

 

Chris McKinlay used Python scripts to riffle through hundreds of OkCupid survey questions. He then sorted female daters into seven clusters, like “Diverse” and “Mindful,” each with distinct characteristics.

 

His mathematical approach to online dating reminds me of how Alexis C. Madrigal reverse engineered Netflix’s vocabulary and grammar in “How Netflix Reverse Engineered Hollywood.” Both McKinlay and Madrigal started their projects with data-mining scripts. Once they had a sizable data set, they looked for patterns and then ran tests to (dis)prove these hypotheses. However, before they could do this, they needed a classification system. In McKinlay’s case, this meant “seven statistically distinct clusters based on…[women’s] questions and answers.” Once grouped into seven clusters such as “God,” “Tattoo,” and “Samantha” (nomenclature was nonstandard) with distinct characteristics, McKinlay could target women from a specific cluster with a profile tailored to their interests. For Madrigal, classification meant organizing genre descriptors into categories such as “Region,” “About…,” and “Based on….” A Netflix genre was a subset of these components that followed specific grammar rules.
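
To make the idea of a genre grammar more concrete, here is a minimal Python sketch. It is only an illustration: the slot names, their order, and the sample values are assumptions made for this example, not Netflix’s actual rules or Madrigal’s script. The point is simply that a genre can be treated as a subset of components that are always assembled in the same order.

```python
# Hypothetical slots, listed in the order they appear in a genre name.
# The real grammar has more components; these four are assumptions.
SLOT_ORDER = ["region", "noun", "based_on", "about"]

# Some slots bring a fixed phrase with them.
PREFIXES = {"based_on": "Based on ", "about": "About "}


def build_genre(components):
    """Assemble a genre name from whichever slots are filled in."""
    parts = []
    for slot in SLOT_ORDER:
        value = components.get(slot)
        if value:
            # Empty slots are skipped; filled slots keep their fixed position.
            parts.append(PREFIXES.get(slot, "") + value)
    return " ".join(parts)


# The order of the keys does not matter; the grammar decides the word order.
print(build_genre({
    "about": "Friendship",
    "noun": "TV Shows",
    "region": "Spanish-Language",
}))
```

Because any slot can be left empty, a handful of components multiplies into a very large number of possible genre names, which gives a rough sense of how a list of altgenres could grow into the tens of thousands.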

 

McKinlay and Madrigal’s situation was unique because they were both hacking an established data set. Their data was pre-tagged, which made the process of classifying and pattern hunting much easier. In Madrigal’s case, Netflix’s movie taggers broke down movie content into “quanta” or “microtags” that could be fed into computer algorithms. The 76,897 altgenres scraped by Madrigal’s script were the product of these algorithms. In this way, Madrigal was working with “metametadata” or data about data about data. In contrast, the authors of Plateau Peoples’ Web Portal had to build their dataset from the ground up. According to the “About” page of the site, they were faced with the daunting task of curating a diverse collection of Native people’s cultural materials with varying metadata. A classification standard was set that would allow for both consistency and flexibility throughout the collection: “There are nine main categories (users can use the browse section of the portal to view these) within the portal. Each tribe can then add their own subcategories refining the typology further to allow for greater precision and flexibility in searching.”

 

Although McKinlay and Madrigal’s classification process may not have been as extensive as that of the authors of Plateau Peoples’ Web Portal, their approach to metametadata was fascinating. I enjoyed reading about reverse-engineering large, cryptic datasets and using them in new ways.

Week Three: Classification, Continued; Research Techniques

Picture8

Alexis C. Madrigal’s article How Netflix Reverse Engineered Hollywood was really fascinating to read. As an avid Netflix user, I used to take these genre titles at face value. I recognized that my watching patterns were probably noted by the Netflix system and therefore suggested similar titles. I was shocked to find out the back-end of this categorization system. Not only does Madrigal’s unique research technique illustrate the complexity of rationalizing such a gigantic database, it also suggests the ideological effects of various systems of classification.

In Sorting Things Out, authors Geoffrey C. Bowker and Susan Leigh Star define classification as “a set of boxes (metaphorical or literal) into which things can be put to then do some kind of work – bureaucratic or knowledge production”. They identify the three key characteristics of an ideal classification system as “There are unique classificatory principles in operation…These categories are mutually exclusive…The system is complete” (10-11). However, Bowker and Star continue their argument to say that “no real-world working classification system that we have looked at meets these ‘simple’ requirements and we doubt that any ever could” (11). The Netflix genre generator does indeed have “literal” boxes, which are checked on a rating system. Its classification system does produce knowledge for the company, informing it of its consumers’ likes and dislikes, an obvious advantage in gaining and retaining viewers.

Madrigal explains Netflix’s tagging system in layman’s terms, “Using large teams of people specially trained to watch movies, Netflix deconstructed Hollywood. They paid people to watch films and tag them with all kinds of metadata. This process is so sophisticated and precise that taggers receive a 36-page training document that teaches them how to rate movies on their sexually suggestive content, goriness, romance levels, and even narrative elements like plot conclusiveness…they even rate the moral status of characters” (Madrigal). While there is a human input to this system, the Netflix genre generator acts as an unprecedented catalyst between man and machine. Madrigal observes, “There’s something in the Netflix personalized genres that I think we can tell is not fully human, but is revealing in a way that humans alone might not be”. In this way, Netflix is “a tool for introspection”. Its unique categorization system sheds light on humans’ reliance on machines to even tell us what we like. Can a machine capture the innate, complex human tendency to feel emotionally drawn to something?

A similar project that came to mind (which was actually mentioned in the article) is Pandora’s Music Genome Project. Much like Netflix, Pandora analyzed millions of songs “using up to 450 distinct musical characteristics by a trained musical analyst. These attributes capture not only the musical identity of a song, but also the many significant qualities that are relevant to understanding the musical preferences of listeners” (Pandora.com). Before really reading anything about the Music Genome Project specifically, I had a thought that the categorization of music would be much harder than movies. Relatively, movies tend to follow trends, while music has a longer history and many, many iterations. While it tries to do something similar to the Netflix personalized genres, it is much more ambitious of Pandora to distill this medium. For example, critics of the Music Genome Project pointed out the social aspect of music, “Music is traditionally a more collective experience…that aspect shows itself very powerfully in the way we consume music in society. We want what other people are having” (Wilkinson). Although Pandora is invested in advertising to its listeners in a similar way to Netflix, the medium of music definitely has its limitations.

Week 3: Information-Loss Through UX

While reading Wallack and Srinivasan’s piece on information miscommunication, I was reminded of miscommunications that happen in a website-user interaction. The website should be made for the person using it, but often the website’s structure and content are the only focus and the user is left unconsidered.

A local community can’t fully control their problems or resulting data, so it is up to the state to help, document, and guide in a way that makes sense to the community. Since administration is in a position of power, it is responsible for its people and for using its power in a helpful, accessible way. Similarly, it is not the user’s responsibility to adapt to a way of learning different websites. User experience should be designed in a way that takes into consideration who needs to use the website, for what purposes, and in what context.

An example I thought of was elderly people on the web. User research points out that seniors are slower and less comfortable using the web than the average user, so relevant websites need to be designed in a way that is accessible to seniors and does not exclude them. Priceline.com is an example of a relevant website. Data shows that travel websites have significant traffic from older users, so accommodating them on Priceline through UX is important. Older users need larger font sizes, fast response time, fewer actions per page, and ample white space between clickable objects. They mainly use tablet and desktop, so using a horizontal scroll feels natural. They prefer to read information, so limiting video content is optimal. They are uncomfortable trying new things for fear of failure, so making tasks straightforward with on-boarding guidance will encourage them. Seniors often blame themselves if they can’t figure out how to use a website. However, when designing, the user is always right. It is important to design with the specific user in mind, in this case including senior citizens, because if the user can’t use your website, it is useless and poor design.

A designer has power over the usability of their product, and they have the responsibility of making relevant websites easier and faster for, in this case, older people to use. Similarly, if administrative power were used in ontologies to tailor to specific community needs, less information would be lost along the way. People in a position of power have the responsibility of considering who needs their services, why, and in what context in order to achieve efficient and accurate communication.

http://www.nngroup.com/articles/usability-for-senior-citizens/

http://www.priceline.com/

Week 3: Geeks Make Art

I define art as any creation that reflects something about the artist or creator whether it be an interest, a feeling, a fact about their life, their dreams, etc. Art can be a painting, a film, a book, or even a drawing that you give your mother for mother’s day. Just the concept of using resources around you to create something that, without you, would not exist is impressive to me.

While reading Alexis Madrigal’s article in The Atlantic, “How Netflix Reverse Engineered Hollywood,” I found myself intrigued and amazed by his quest to make a list of all the altgenres (a term Madrigal uses to describe the various genres found on Netflix) that the streaming service has to offer. However, there was one sentence that caught my eye in which he describes a conversation between himself and Todd Yellin, Netflix’s VP of Product. Madrigal says, “‘It’s a real combination: machine-learned, algorithms, algorithmic syntax,’ Yellin said, ‘and also a bunch of geeks who love this stuff going deep.'” I also have a definition for geeks: anyone who is very passionate about something whether it be movies, video-games, data, or any other field of interest.

This quote resonated with me, reminding me of a conversation I had with one of my friends a few days ago when I showed him a video on Youtube by a Poketuber (someone who makes videos about the media franchise, Pokemon), Nathan Smith, who goes by the name of “natewantstobattle.” Smith has over 200,000 subscribers on Youtube which he has garnered through his “Let’s Plays” and song parodies. Below is one of his parody videos called “Hoenn’s Out,” based on the song “Love Runs Out” by American pop band OneRepublic.

Smith’s song, “Hoenn’s Out” talks about the excitement Pokemon fans felt upon hearing that Pokemon Omega Ruby and Alpha Sapphire were confirmed for the Nintendo 3DS family of systems. Smith has used what many would call a “geeky” interest to create these parody videos that many people enjoy listening to and watching. Other Youtube users leave comments expressing how they feel about his parodies. Here are a few:

TyranitarTube: “I’m gonna jam to this as I drive to BestBuy in November.”

Taylor Nordman: “I heard the original song for the first time today, and in comparison to this, I thought it sucked.”

Blaze The Mincraftian: “Cant stop listening to this song.”

When Yellin mentioned the people who watched the movies for Netflix to stream and rated the movies based on violence, the level of romance, etc. to create altgenres for Netflix, he described people who were passionate about the subject and committed to doing the work. Even Madrigal, who followed the road they left behind by giving each genre a certain title, was passionate enough about creating a list and data to work until it was complete. Whether you call these people geeks in a good way or a bad way, I call them artists for creating something they believe in.

The Problem isn’t in the Data

While I understand the argument for improved data connection between state and community, I doubt that it is the only problem. Mismatched Ontologies argues that with improved connection, problems will be fixed, citing India as an example. I disagree. Data does not plague these nations and it is the least of their concerns. Every government in the world is corrupt and is essentially nothing but a group of rich citizenry. For example, Alaska has tried many times, and succeeded sometimes, to build a road to nowhere, despite the cost and uselessness of the project.

http://www.nytimes.com/2005/08/20/opinion/20lende.html

http://www.latimes.com/opinion/op-ed/la-oe-babbitt-road-to-nowhere-alaska-20140311-story.html

In this case, there is data yet no improvement. The main problem with governments is self-interest and improvement in power and wealth. The governments in most “third world” countries do not answer directly to the people. Even if the people had a problem, there would be no need to listen. Powers are not motivated to act unless there is a threat behind the complaint. For instance, my apartment is brand new and yet the dishwasher was installed so that it only opens halfway. Due to lack of communication and intelligence, the dishwasher hits the stove. Even when I make a complaint, maintenance refuses to fix the mistake. When my father, who is the one who pays, makes the complaint, people are more inclined to listen. Still, the problem has not been fixed. It is the same with every problem in these countries.

Even if the government wanted to listen, growing population and increasing national debt discourage any infrastructure improvement. Most “third world” countries are so heavily in debt to their former colonizers that they cannot afford to fix their own problems. Instead, the poor economic system allows foreign countries to manipulate the poorer workers into working for less. Very few outside organizations will help improve the infrastructure as it is their source of cheap labor.

The fact is, the average citizen has so much working against him that reporting his problems in the correct manner is useless. The only way I can see change in the flooding of India is if a major factory or foreign company becomes water-logged. Only with influence from a rich source can improvements be made; either that or one enormous rebellion. I can only see data collection as solving problems in moderately small communities in wealthy countries: neighborhoods in a city, for example, or cities in a county. These are places where citizens might actually come face to face with the person in charge of their well-being.

What Doesn’t Belong?

what doesnt belong

http://www.demotivers.com/5412/Who-Doesnt-Belong-Here

I was struck when reading Madrigal’s article by the phenomenon at the end which he dubbed the “Perry Mason effect.” It instantly made me think of these humor posters that ask which of these things doesn’t belong. It was incredible that in a categorization system with literally tens of thousands of genres, such a strange little hiccup could occur in what one would consider a relatively important category: most popular actors. Plus, this weird occurrence was not linked to recommendations made to Netflix customers, nor did it indicate that tons of people were watching Perry Mason episodes or movies featuring Raymond Burr. In fact, it was just something that happened during the process of using human preferences, fed into a computer, to create these altgenres. There is really no explanation for the Perry Mason effect. Yet when extrapolating this to wider fields in Digital Humanities, I think this occurrence of computational serendipity may be one of the reasons that humanists are so drawn to analyzing their data with machines. The strange feed line of research to computational model or analysis, back to human presentation, elucidates incredibly interesting “Perry Mason effects” which the researcher alone would not have seen. However, unlike Madrigal, I believe that in some cases of research the explanatory reasoning behind the “something in the code and data” can be traced and found incredibly useful by the researcher.

For instance, archaeologists have been feeding information, spatial and quantitative data about artifacts, into databases and mapping programs to show distribution patterns over a whole site or region. Often, nothing strange happens in the translation of the data back to human presentation (the final map for instance), and it shows generally what it was expected to. But in some instances, new spatial relationships, groupings, etc. come to light during this final stage which were not readily apparent, either in the field or straight out of the field notes. Because these computer systems/programs are mechanical, they help the human researcher to investigate the data without our inherent biases and expectations (though those might still be present in the data itself), and let us see things that we would not have otherwise. Usually in these cases, once the “Perry Mason” effect has been identified, it is possible for the archaeologist to retrace how/why/where this might have happened, and to outline something about the site or culture that may otherwise have gone unnoticed.

Archiving Documents: Preservation of History and Reference For The Future

The best part of visiting the libraries is walking through the stacks full of books at a fast pace, running my finger along the small, white call number stickers and looking for the exact call number I have scribbled down on a small piece of paper. Going through such a vast yet narrow space full of information makes me feel like I’m on a treasure hunt.

In a world that is turning increasingly digital, with an astonishing amount of data collected and saved everywhere every day, it is important to have a system that allows us to navigate through the pile of information in an efficient and effective manner. Moreover, it is also highly significant that we follow that system not only to find things but also when we are archiving them. Julia Gaffield’s search through the Atlantic countries for Haiti’s Declaration of Independence, which had gone missing for centuries, is an example of the significance of the archiving systems that connect our past to our present and future.

The main problem in locating the Declaration of Independence of Haiti wasn’t only that the people who archived the document did not follow the same archiving system that we use today, nor was it that the people searching for it had no evidence of where to look for the document. It was more that, while digging through the history, people were thinking in today’s terms rather than in those of the time when the document was archived. Centuries ago when the declaration left Haiti, the world wasn’t divided into countries as we know them today, but was more intertwined and connected through different colonies, with people constantly traveling through and between them, transporting and trading goods. Gaffield was able to find the missing document that the historians and government officials could not locate for years because she did not use today’s archival search as it is but understood how it was built and dug through the archiving system’s history along with the missing document’s history.

Navigating through data will only grow more complicated with an ever-growing amount of information pouring in and storage running out of space, but at the same time it will grow easier and more efficient as we adjust to the system. The National Archives of Malaysia has not only built an official portal for government servants, students, researchers, and the general public to find documents easily on its website but also offers consulting, virtual tours, and seminars on how to use the archive. The United States’ National Archives portal also allows the public to look for documents by people, places, foreign policy, events, etc. along with its founding documents. The use of data and technology has brought us closer than ever to our history, and it will only continue to do so.

 

References: