James Cameron’s famous movie Titanic (1997) has brought many people to tears when the “unsinkable ship” hit an iceberg and sank to the bottom of the Atlantic Ocean, ending the short-lived romance between Jack and Rose. In the film, many people lose their lives, and those lucky enough to climb into the lifeboats are mostly women and children. To examine the proportions of women and men who survived, I chose the Titanic Dataset. This dataset has 4 variables (class, age, sex, and survive) and 2,201 records. All four are coded categorical variables, which means that the numerical values are simply codes for categories rather than quantities. For class, 0 = crew member, 1 = first class, 2 = second class, and 3 = third class. Age, sex, and survive are binary dummy variables: for age, 1 = adult and 0 = child; for sex, 1 = male and 0 = female; and for survive, 1 = yes and 0 = no.

According to Nathan Yau, the ingredients of a data visualization are visual cues, coordinate system, scale, and context. Because every variable is categorical, the only quantitative measure is the number of records in each category, so the chart can only compare counts across categories. Therefore, for visual cues, I combined length and color with a Cartesian coordinate system to create a side-by-side stacked bar chart in Tableau Public.
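
The coding scheme above can be sketched in Python as a set of lookup tables. This is a minimal illustration, assuming each record is a tuple in the column order class, age, sex, survive; the names `decode` and the `*_LABELS` tables are hypothetical and are not part of the dataset or of Tableau.

```python
# Code books as described above: the numbers are category labels, not quantities.
# The tuple order (class, age, sex, survive) is an assumption for this sketch.
CLASS_LABELS = {0: "crew", 1: "first", 2: "second", 3: "third"}
AGE_LABELS = {1: "adult", 0: "child"}
SEX_LABELS = {1: "male", 0: "female"}
SURVIVE_LABELS = {1: "yes", 0: "no"}


def decode(record):
    """Turn one coded (class, age, sex, survive) tuple into readable labels."""
    cls, age, sex, survive = record
    return {
        "class": CLASS_LABELS[cls],
        "age": AGE_LABELS[age],
        "sex": SEX_LABELS[sex],
        "survive": SURVIVE_LABELS[survive],
    }


if __name__ == "__main__":
    # Hypothetical example row: a third-class adult male who did not survive.
    print(decode((3, 1, 1, 0)))
    # {'class': 'third', 'age': 'adult', 'sex': 'male', 'survive': 'no'}
```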

[Figure: side-by-side stacked bar chart of the Titanic Dataset by survival, class, and sex, made in Tableau Public]

The data has been divided by class, gender, and survival. The side-by-side chart lets us compare the group that survived the voyage on the Titanic with the group that did not, and within each group, compare the classes. Each class bar is also split by gender (blue for males, red for females).
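
Behind each segment of the stacked bars is a simple count. As a rough sketch (assuming records are `(class, age, sex, survive)` tuples; `bar_segments` is a hypothetical helper, not how Tableau computes the chart internally), the aggregation groups the records by survival, class, and sex and counts each combination:

```python
from collections import Counter

CLASS_LABELS = {0: "crew", 1: "first", 2: "second", 3: "third"}
SEX_LABELS = {1: "male", 0: "female"}


def bar_segments(records):
    """Count records for each (survive, class, sex) combination.

    Each count corresponds to one colored segment of the chart:
    panes by survive, bars by class, stacked segments by sex.
    """
    counts = Counter()
    for cls, _age, sex, survive in records:  # age is not used in this chart
        counts[(survive, CLASS_LABELS[cls], SEX_LABELS[sex])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical mini-sample, not real Titanic records.
    sample = [(1, 1, 0, 1), (1, 1, 0, 1), (1, 1, 1, 0), (3, 1, 1, 0), (0, 1, 1, 1)]
    for (survive, cls, sex), n in sorted(bar_segments(sample).items()):
        print(f"survive={survive} class={cls:<6} sex={sex:<6} count={n}")
```

Run over all 2,201 records, the counts in each (survive, class) bar would add up to that bar's total length.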

More people did not survive than survived, as the average line in each pane indicates. Among the survivors, crew members outnumbered the members of any single passenger class, including first class. Within each passenger class, more females than males seem to have survived. There were, however, barely any females among the crew, which explains why the crew survivors were disproportionately male.
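
The average line can be approximated the same way. The sketch below assumes the reference line in each pane is the mean of the four class bars, with a class that has no records in a pane counting as a bar of height 0; `pane_averages` is a hypothetical name, and Tableau's reference lines can be configured with other scopes and aggregations.

```python
from collections import Counter
from statistics import mean

CLASSES = (0, 1, 2, 3)  # crew, first, second, third


def pane_averages(records):
    """Mean bar height (record count per class) in each survive pane.

    Assumes records are (class, age, sex, survive) tuples and that the
    reference line averages the four class bars within a pane.
    """
    totals = Counter((survive, cls) for cls, _age, _sex, survive in records)
    # Counter returns 0 for missing keys, so empty bars count as height 0.
    return {
        survive: mean(totals[(survive, cls)] for cls in CLASSES)
        for survive in (0, 1)
    }


if __name__ == "__main__":
    # Hypothetical mini-sample, not real Titanic records.
    sample = [(0, 1, 1, 0), (0, 1, 1, 0), (3, 1, 1, 0),
              (3, 1, 0, 0), (1, 1, 0, 1), (1, 1, 1, 0)]
    print(pane_averages(sample))  # {0: 1.25, 1: 0.25}
```

Because every pane contains the same four class bars, the pane with the higher average is simply the pane with more records, which is why the line can be read as a comparison of non-survivors with survivors.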

[Figure: Tableau Public screenshot of the chart]

It is interesting to note that among those who did not survive (survive = 0), the crew members (class 0) and the third-class passengers (class 3) lost the most lives, while the first-class group lost the fewest people. The third class also lost more female lives than any other class. This suggests that social class may have played a large part in determining the passengers’ survival.
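
Raw counts partly reflect how large each group was; the crew, for instance, was almost entirely male. One way to check the class pattern more directly is to compute the share of each class and sex group that survived. This is a sketch of that calculation, not something the chart itself displays; `survival_rates` is a hypothetical helper, and records are again assumed to be `(class, age, sex, survive)` tuples.

```python
from collections import Counter

CLASS_LABELS = {0: "crew", 1: "first", 2: "second", 3: "third"}
SEX_LABELS = {1: "male", 0: "female"}


def survival_rates(records):
    """Share of each (class, sex) group coded survive = 1."""
    totals = Counter()
    survivors = Counter()
    for cls, _age, sex, survive in records:
        group = (CLASS_LABELS[cls], SEX_LABELS[sex])
        totals[group] += 1
        survivors[group] += survive  # survive is coded 1 = yes, 0 = no
    return {group: survivors[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Hypothetical mini-sample, not real Titanic records.
    sample = [(3, 1, 0, 1), (3, 1, 0, 0), (3, 1, 1, 0), (3, 1, 1, 0), (1, 1, 0, 1)]
    for group, rate in sorted(survival_rates(sample).items()):
        print(group, f"{rate:.0%}")
```

Comparing rates rather than counts keeps a large group, such as the mostly male crew, from dominating the comparison simply because of its size.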

Because the data is stored as numeric codes, it is hard to see any pattern or relationship in it without a visual representation. It was helpful that the codes were defined in the dataset, but it is still difficult to make meaning out of these numeric codes without data visualization. Binary codes and dummy variables are useful for recording data quickly and efficiently, but data visualization puts the data in context, making it possible for people to read and understand it. Charts and graphs such as this one show us what the data is trying to tell us. In this Titanic Dataset, the data shows us the proportions of passengers and crew who did or did not survive, based on their age, gender, and class.