In the online guide “Data + Design”, various authors collaborated to discuss the complexity of comprehending and organizing various forms of data. Alistair Croll’s piece on data aggregation was particularly interesting to me because it was the first document I had ever seen that grouped and explained the different ways data can be combined and explicitly laid out the logic behind the rules of these combinations. Particularly striking was the piece’s definition of “summable multiseries data”: a group of figures connected by the fact that they add up to a larger statistic. These “subgroups” are often trickier to identify and arrange than they seem. Using Croll’s example of coffee consumption, a statistic on how many cups were served to men and a statistic on how many regular cups of coffee were sold cannot be directly compared because their basic subgroups (for a visual aid, think of subgroups as “graph axes”) are not the same: one breaks down consumption by gender, the other by the kind of coffee purchased. Even further, as Croll demonstrates, these figures cannot be leveraged against each other to “back” into the statistics of another subgroup. For example, just because you know 36.7% of cups were sold to women DOES NOT mean that 36.7% of regular cups of coffee were sold to women. The two breakdowns were recorded separately, so neither figure tells you anything about the other. Thus, data and context are equally important in statistics.
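To convince myself of this point, I put together a small Python sketch. Only the 36.7% figure comes from Croll's example; the total of 1,000 cups, the regular/decaf split, and every individual cell count are numbers I made up for illustration. The two hypothetical tables have identical totals by gender and by coffee type, yet they disagree on how many regular cups went to women.

```python
# Two hypothetical breakdowns of 1,000 cups of coffee.
# Only the 36.7% share for women comes from the article; every
# other number here is invented for illustration.
table_a = {
    "women": {"regular": 220, "decaf": 147},
    "men": {"regular": 380, "decaf": 253},
}
table_b = {
    "women": {"regular": 100, "decaf": 267},
    "men": {"regular": 500, "decaf": 133},
}


def marginals(table):
    """Return (cups per gender, cups per coffee type) for a table."""
    by_gender = {gender: sum(row.values()) for gender, row in table.items()}
    by_type = {}
    for row in table.values():
        for kind, cups in row.items():
            by_type[kind] = by_type.get(kind, 0) + cups
    return by_gender, by_type


def share_of_type_sold_to(table, gender, kind):
    """Fraction of all cups of one coffee type that went to one gender."""
    total_of_kind = sum(row[kind] for row in table.values())
    return table[gender][kind] / total_of_kind


if __name__ == "__main__":
    for name, table in (("A", table_a), ("B", table_b)):
        by_gender, by_type = marginals(table)
        share = share_of_type_sold_to(table, "women", "regular")
        print(name, by_gender, by_type, f"{share:.1%}")
```

Both tables report 367 cups sold to women (36.7% of 1,000) and 600 regular cups, but the share of regular cups sold to women differs between them. Knowing the two summaries alone is not enough to recover the cross-breakdown.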
However, as the article points out, subgroups and categories are strictly human constructs. While working on a set of important Excel data, I once made the mistake of selecting every piece of data to generate a graph, instead of the more specific set of data I intended to work with. As a result, I got an unintelligible tangle of lines instead of the orderly, legible graph I was expecting, similar to the image above. While I immediately registered the graph as incorrect, Excel never once issued an error. Treating the graph as an accurate amalgamation of the data it was fed, Excel couldn’t tell that the data I selected did not make sense together and proudly generated the tangled lines I had before me, slapping one line charting evaluation scores over time on top of another line plotting satisfaction per class. While Excel is very good at processing data, human logic is a whole other ball game that it is far from winning.