This week I decided to write my weekly blog post on the Gender Breakdown of City Workers by Department, listed on the Controller Data website for Los Angeles. It is a visualized list of career fields and departments and their pay, along with the gender breakdown of the people who work in each one: that is, what percentage of the workforce in a given field is male and what percentage is female.

It is a very well-presented data set and a good visualization of the question it wishes to answer. One of the more interesting tidbits that can be taken from the list is how employee salaries split between men and women. With some exceptions, the data set seems to confirm the notion that women, on average, are paid somewhat less than men working in the same department. That said, the list doesn't delve into the specific jobs within any one field or department, so in reality it is not a good data set to use when trying to make specific claims about gender-pay inequality. Also not listed is the seniority or experience of the employees. If the more senior, higher-paid employees are more often men because of historical circumstances, that could explain to some degree the lower average pay of women.
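To make the point about missing job-level detail concrete, here is a minimal sketch in Python. Every number in it is made up for illustration and none of it comes from the Controller's data; the job titles, salaries, and the `mean_by` helper are my own assumptions. It shows how a department can have a gender gap in average pay even when men and women in the same job are paid identically, simply because of who holds which job.

```python
from collections import defaultdict

# Hypothetical rows: (job title, gender, annual salary).
# These figures are invented for illustration, not taken from the real dataset.
rows = [
    ("Engineer", "M", 100_000),
    ("Engineer", "M", 100_000),
    ("Engineer", "M", 100_000),
    ("Engineer", "F", 100_000),
    ("Clerk", "M", 50_000),
    ("Clerk", "F", 50_000),
    ("Clerk", "F", 50_000),
    ("Clerk", "F", 50_000),
]

def mean_by(rows, key):
    """Average salary for each group, where key(job, gender) names the group."""
    groups = defaultdict(list)
    for job, gender, salary in rows:
        groups[key(job, gender)].append(salary)
    return {k: sum(v) / len(v) for k, v in groups.items()}

# Department-level view: group by gender only.
dept_avg = mean_by(rows, lambda job, gender: gender)

# Job-level view: group by job title and gender.
job_avg = mean_by(rows, lambda job, gender: (job, gender))

print(dept_avg)
print(job_avg)
```

In this toy department the overall averages differ by gender, yet within each job title the averages are equal. The real data could of course look different; the sketch only shows why a department-level number cannot settle the question by itself.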

Regardless, it is a dataset that does what it sets out to accomplish and does it effectively. Perhaps a more in-depth version of this same dataset could be made that provides more detail on specific occupations within each department. Another dataset that would be interesting, and that would help answer some of my earlier questions, is one that compares the number of years worked against the level of seniority of employees in each department. For example, it would be interesting to see whether, on average, it takes women longer to be seen as highly respected employees and to reach higher positions, or whether the reverse is true and it takes men longer to rise in their career field.

Ultimately the dataset, while its data should not be taken at face value, is great in the sense that it prompts educated readers to ask a lot of further questions. But readers should be careful, as they should be when reading any dataset, not to jump to conclusions before asking those questions.