Week 6: Twitter Network Analysis

In his blog post, Scott Weingart explains how a network works. He reduces networks to their fundamentals and describes them in terms that even readers with little to no background in math can understand. He acknowledges the flexibility of the network as a tool that can be applied to almost any kind of data. However, he warns that the tool should not be abused and should be used selectively. He also cautions about how data is appropriated for network analysis, especially in the humanities, where scholars are “often dealing with the interactions of many types of things, and so the algorithms developed for traditional network studies are insufficient for the networks we often have.”

Weingart breaks any network down into what is ultimately just “stuff and relationships.” These components are interdependent, and neither makes a network without the other. He then walks through the formation of a very simple network, starting with books as nodes and then connecting them through their various attributes to form relationships, or edges.
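To make “stuff and relationships” concrete for myself, here is a minimal Python sketch of that idea. It is only an illustration under my own assumptions: the book and author names are placeholders rather than examples from Weingart’s post, and the `neighbors` helper is a hypothetical name I made up.

```python
# "Stuff": the nodes. These titles and authors are hypothetical placeholders.
nodes = {"Book A", "Book B", "Author X", "Author Y"}

# "Relationships": the edges, each one linking an author to a book.
edges = {
    ("Author X", "Book A"),
    ("Author X", "Book B"),
    ("Author Y", "Book B"),
}

def neighbors(node, edges):
    """Return every node that shares an edge with `node` (edges treated as undirected)."""
    result = set()
    for a, b in edges:
        if a == node:
            result.add(b)
        elif b == node:
            result.add(a)
    return result

print(sorted(neighbors("Book B", edges)))  # ['Author X', 'Author Y']
```

Even in a toy example like this, an edge only makes sense if both of its endpoints exist as nodes, which is the interdependence described above.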

After reading Weingart’s post, I came across an impressive preliminary study of Twitter’s billion-scale network by Masaru Watanabe and Toyotaro Suzumura of the Tokyo Institute of Technology. Reading through their process and results, I was pleasantly surprised that I could at least identify the key terms in their study thanks to Weingart’s explanation. This made the study much easier to approach, and I found the write-up fairly straightforward overall. The process behind it, however, is far from straightforward and seems highly complex given the amount of data they collected on 469 million users between July and October 2012. They stored the data, which included follower-friend information, in two formats, XML and CSV, and used Apache Hadoop and later HyperANF to compute the degree of separation.

Their work was inspired by a Facebook network analysis by Lars Backstrom, who used the graph analysis tool HyperANF to compute a degree of separation and arrived at a surprisingly low 4.74. Facebook’s network structure is friend-based and resembles the way human relationships work in the real world, while a social graph like Twitter’s is based more on interests and differs from Facebook’s because it is a directed graph. In a directed graph like Twitter’s, anyone can follow anyone else freely, while in an undirected graph like Facebook’s, a connection requires approval. Because Facebook and Twitter have different network structures, Watanabe and Suzumura analyzed Twitter’s network to compute its degree of separation and diameter. Interestingly, both measures describe how far apart the users in a network are. The degree of separation is the average shortest-path length over all pairs of users, while the diameter is the maximum. For the data collected between July and October 2012, the authors found a degree of separation of 4.59 in the Twitter network. After reading a simplified explanation of what makes a network and how it is analyzed, it was interesting to see these ideas applied to a large-scale network like Twitter.
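To check my understanding of the two measures, I sketched them in Python on a tiny made-up follow graph. This is not the authors’ method: they used HyperANF, which approximates these values at a scale where exact computation is impractical, while the sketch below uses a plain breadth-first search that only works for small graphs. The user names and function names are hypothetical.

```python
from collections import deque

# Hypothetical directed follow graph: "ann follows bob" is stored as ann -> bob.
follows = {
    "ann": ["bob"],
    "bob": ["cat"],
    "cat": ["ann", "dan"],
    "dan": ["ann"],
}

def shortest_paths_from(source, graph):
    """Breadth-first search: hop counts from `source` to every reachable user."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        for followed in graph.get(user, []):
            if followed not in dist:
                dist[followed] = dist[user] + 1
                queue.append(followed)
    return dist

def separation_and_diameter(graph):
    """Average and maximum shortest-path length over all reachable ordered pairs."""
    lengths = []
    for source in graph:
        for target, hops in shortest_paths_from(source, graph).items():
            if target != source:
                lengths.append(hops)
    return sum(lengths) / len(lengths), max(lengths)

average, diameter = separation_and_diameter(follows)
print(average, diameter)  # 1.75 3
```

Because the graph is directed, the path from ann to dan (three hops) is longer than the path from dan to ann (one hop), which is exactly the kind of asymmetry that separates a Twitter-style network from a Facebook-style one.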

http://www2013.wwwconference.org/companion/p531.pdf