Data visualization helps viewers spot patterns that may not be apparent when looking at a dataset in table form. This week, I looked at the “Body Fat” dataset from the Journal of Statistics Education website. It compiles percent body fat and several other body-size measurements for 252 men.
To visualize the data, I chose to use RAW. Using its scatter plot option, I set “weight” as the x-axis, “bodyfat” as the y-axis, and “IDNO” and “age” as labels (separated by a comma and denoted on the graph as “IDNO, age”).
The result looks like this:

Previously, I predicted that the more a person weighs, the higher their body fat percentage. This graph shows an upward trend, which generally supports that hypothesis. The visualization also reveals outliers and other extreme values that would be hard to spot in a spreadsheet of the data.
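The scatter plot shows the trend visually. One common way to put a single number on it is the Pearson correlation coefficient. Below is a minimal sketch that assumes the “weight” and “bodyfat” columns have already been loaded into two Python lists of numbers; the function name `pearson_r` is just illustrative. Values near +1 indicate a strong positive linear relationship and values near 0 indicate little linear relationship. Because r is built from squared deviations, a few outliers like the ones visible in the chart can pull it noticeably.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and each variable's squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson_r(weights, bodyfats)` would summarize the weight vs. body fat trend in one number. Here `weights` and `bodyfats` are hypothetical lists loaded from the dataset.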
Next, keeping “bodyfat” on the y-axis, I set the x-axis to “height”:

Besides showing how body fat varies with height, this visualization reveals other useful information. For example, most of these men appear to be between 65 and 75 inches tall (5’5”-6’3”), with a median around 5’8”. That would have been harder to see just by looking at the dataset.
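Since a foot is 12 inches, 65 inches works out to 5’5” and 75 inches to 6’3”. A small helper like the sketch below does that arithmetic. It assumes the “height” values are in inches and rounds to the nearest whole inch. The helper names are hypothetical; `statistics.median` comes from Python's standard library.

```python
import statistics


def inches_to_feet_inches(total_inches):
    """Format a height given in inches as feet and inches, e.g. 68 -> 5'8"."""
    # round() goes to the nearest whole inch (exact halves round to the even number)
    feet, inches = divmod(round(total_inches), 12)
    return f"{feet}'{inches}\""


def median_height(heights_in_inches):
    """Median of a list of heights in inches, formatted as feet and inches."""
    return inches_to_feet_inches(statistics.median(heights_in_inches))
```

Calling `median_height` on the full “height” column would give the median directly, instead of estimating it from the chart.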
To add another dimension, I changed the x-axis to “neck,” kept “bodyfat” on the y-axis, color-coded the points by “age,” and used only “IDNO” as the label. Each color covers one age group (roughly 5-10 ages per group): 22-29 yo (red), 30-39 yo (purple), 40-49 yo (green), 50-58 yo (hot pink), 60-69 yo (orange), and blue for the remaining ages of 70, 72, 74, and 81.

This graph suggests that neck measurements tend to increase as body fat percentage increases. The 40-49 yo group (green) is more widely spread out and larger in number than the other age groups.
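The color grouping can be written out as a lookup, which also makes it easy to count how many men fall into each group. That is one way to check the observation that the 40-49 group is the largest. This is a sketch under a few assumptions: ages are integers, and each group covers a full decade, so a 59-year-old would also be hot pink even though the legend says 50-58 (presumably the ages actually present in the data). The names `AGE_GROUPS`, `age_color`, and `group_sizes` are made up for illustration.

```python
from collections import Counter

# (lowest age, highest age, color), following the color groups described in the post.
# Assumption: each group covers a full decade, and the last group is everyone 70 or older.
AGE_GROUPS = [
    (22, 29, "red"),
    (30, 39, "purple"),
    (40, 49, "green"),
    (50, 59, "hot pink"),
    (60, 69, "orange"),
    (70, float("inf"), "blue"),
]


def age_color(age):
    """Return the color for an integer age, or raise if no group covers it."""
    for low, high, color in AGE_GROUPS:
        if low <= age <= high:
            return color
    raise ValueError(f"age {age} is not covered by any group")


def group_sizes(ages):
    """Count how many ages fall into each color group."""
    return Counter(age_color(age) for age in ages)
```

With the full “age” column, `group_sizes(ages).most_common(1)` would name the largest group.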
Switching the x-axis to each of the other body measurements shows how each body part relates to percent body fat. For the most part, the trend remains increasing: a body part's measurement tends to grow as percent body fat increases. This supports the dataset's purpose as a reference for estimating body fat percentage from specific body measurements: “the goal is a regression model that will allow accurate estimation of percent body fat, given easily obtainable body measurements.”
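As a minimal illustration of the regression idea in that quote, the sketch below fits an ordinary least-squares line that predicts percent body fat from a single measurement, such as “neck” or “weight.” This is a deliberate simplification, not the dataset authors' model. A model using several measurements at once would be a multiple regression, for instance fitted with `numpy.linalg.lstsq`. The function names are hypothetical, and the inputs are assumed to be two equal-length lists of numbers.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Estimate y (e.g. percent body fat) for a new measurement x."""
    return intercept + slope * x
```

A positive `slope` corresponds to the increasing trend seen in the charts.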
(Apologies for the blurry snippets of the graphs; trying to embed the images kept crashing my browser. )’: )