Class Blog – Digital Humanities 101

Blog post 3 dataset

For my third blog post I decided to explore the Los Angeles City Payroll Calendar dataset. Each entry includes columns for year, month, month number, day, date, and description. The description is what distinguishes one entry from another. It identifies the type of day within the payroll calendar, such as the start or end of a pay period or a holiday, so that employers and employees know which dates to account for in the pay cycle.

Each record represents a single day in the payroll cycle. The five types of days that matter in a payroll calendar are holidays, paydays, excess sick pay days, no-deduction paydays, and the day marking the end of the pay period. There are 255 records, running from January 2013 through December 2016.

Wallack and Srinivasan argue that misinterpretation creates a divide between communities and the state, because each emphasizes different aspects of an issue. As a result, the state may represent an issue very differently from how the people affected by it experience it. They argue that ontologies are a shared platform through which individuals become part of a larger group. At the same time, this means that ontologies create exclusion between groups, which produces divides in how each group understands or interprets an event. In this case, the government or employers will understand this data differently than employees or the lower class will.

The data is organized chronologically by year, then month, then day within the month, with each entry labeled by its payroll type. Using this definition, I would say that this dataset's ontology is based on organizing financial information for companies so that their payroll expenses are easy to reference and verify.

This ontology seems to make the most sense to an accountant. The chronological order of dates, with the significant days in the payroll cycle marked, is most relevant to an accountant. An accountant needs to know when pay periods begin and end and works through the dates in that sequence.

The dataset does not say much about which companies follow this payroll schedule. It also leaves out any numerical values that would let you compare pay periods and analyze trends across them. I would also assume that companies with varying religious practices, or organizations that are not typical corporations, would not all follow this same payroll cycle. What this dataset does tell me is the generic payroll cycle for a company in LA and where the main control panel for organizing important dates is.

If you wanted to look at this from an economist's perspective, you could organize the payroll calendar by how much money was paid in each pay period. You could also organize it by company type and company name, listing each company's total payroll expense for each pay period. That would show which months or weeks tend to have the most pay and which have the least. This information could be used to balance a company's budget.
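To make the economist's version concrete, here is a minimal Python sketch. It assumes you had payment records with dollar amounts attached. The payroll calendar itself has none, so every figure, company name, and the `totals_by_period` helper below are hypothetical. The sketch sums payments per pay period and picks out the periods with the most and least pay.

```python
from collections import defaultdict

# Hypothetical payment records: (pay period end date, company, amount paid).
# The real Payroll Calendar dataset has no dollar amounts; these are invented.
payments = [
    ("2015-01-10", "Company A", 120_000),
    ("2015-01-10", "Company B", 80_000),
    ("2015-01-24", "Company A", 125_000),
    ("2015-01-24", "Company B", 60_000),
]

def totals_by_period(records):
    """Sum the amount paid across all companies for each pay period."""
    totals = defaultdict(int)
    for period, _company, amount in records:
        totals[period] += amount
    return dict(totals)

totals = totals_by_period(payments)
most_pay = max(totals, key=totals.get)    # period with the largest total payout
least_pay = min(totals, key=totals.get)   # period with the smallest total payout
print(totals)
print("Most pay:", most_pay, "| Least pay:", least_pay)
```

Keying the totals on `(company, period)` instead of `period` alone would give the per-company breakdown described above.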

Analyzing Los Angeles City Payroll by Department

The dataset I chose to look at is the All City Departments by Payroll dataset for Los Angeles in 2015. It contains the data types of string, integer, and double (float). A record in this dataset consists of Department Title, Year, Job Class Title, Projected Annual Salary, Q1 Payments, Q2 Payments, Q3 Payments, Q4 Payments, Payments Over Base Pay, % Over Base, and Total Payments.

Based on the Wallack and Srinivasan reading, ontologies are “systems of categories and their interrelations by which groups order and manage information about the people, places, things, and events around them”; essentially, they are ways of defining realities surrounding an individual or group. In the case of this dataset, it seems the ontology is concerned with categorizing the data, namely salaries, based on city departments.

This data would be most useful to someone whose main concern is finance, as that is what the dataset really highlights. For example: How much is the city spending on paying workers in total? Which department is spending the most money? How do these figures compare to other cities of a similar size, or to figures from past years? Specifically, this dataset focuses on the differences in total salaries for the workers in each department, so someone interested in the distribution of the city's money may also find this data illuminating.

I think one phenomenon this dataset shows is that LAPD salaries take up almost ¼ of the total amount paid to city workers. This raises many questions. How does this percentage compare to other cities? Why is it such a high percentage of the total amount spent on salaries? Overall, I think this dataset really highlights the stark differences in total salaries across city departments. However, the dataset leaves out something I think is critical: the size of each department, i.e., how many workers each department has. Without this information, it is impossible to tell what exactly is causing the discrepancies in total salary between departments. Does the LAPD have many more workers than the other departments, or are police officers paid more than the average city worker?
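The point about department size can be shown with a little arithmetic. The Python sketch below uses made-up totals and headcounts, not figures from the dataset (the headcounts are exactly what the dataset lacks). It shows that two departments can each account for a quarter of total payroll while paying their employees very different averages. The `payroll_summary` helper is hypothetical.

```python
def payroll_summary(departments):
    """Return {name: (share of total payroll, average payment per employee)}.

    `departments` maps a department name to (total_payments, employee_count).
    """
    grand_total = sum(total for total, _ in departments.values())
    summary = {}
    for name, (total, headcount) in departments.items():
        share = total / grand_total      # fraction of all city payroll
        average = total / headcount      # mean payment per employee
        summary[name] = (share, average)
    return summary

# Made-up figures, NOT from the dataset: A and B have the same total payroll,
# but B spreads it across twice as many employees.
example = {
    "Department A": (100_000_000, 1_000),
    "Department B": (100_000_000, 2_000),
    "Department C": (200_000_000, 2_000),
}

for name, (share, average) in payroll_summary(example).items():
    print(f"{name}: {share:.0%} of payroll, ${average:,.0f} per employee")
```

Departments A and B each take 25% of the payroll. However, A pays an average of $100,000 per employee and B pays $50,000. The totals alone cannot tell these two situations apart.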

If I were starting over with data collection under a different ontology, I would again focus on finance and the salaries of city workers. However, I would organize the data by position within each department rather than by department. Instead of comparing departments against each other, the dataset would compare the salaries of those in higher positions in a department with those in lower positions. It would also show how total salaries are distributed between these groups.

Exploring Los Angeles Payroll

I chose to explore the City Payroll Data of Los Angeles, which provides the payroll information for all the city departments since 2013. The information is organized under various data types and is put into a spreadsheet, which is updated after payments for each quarter are made.

The data types include year, department title, payroll department, record number, job class title, employment type, hourly or event rate, projected annual salary, Q1 payments, Q2 payments, Q3 payments, Q4 payments, payments over base pay, % over base pay, total payments, base pay, permanent bonus pay, longevity bonus pay, temporary bonus pay, lump sum pay, overtime pay, other pay & adjustments, other pay (payroll explorer), MOU, MOU title, FMS department, job class, pay grade, average health cost, average dental cost, average basic life, average benefit cost, benefits plan, and job class link. Details about the data types can be found by hovering over the information button under the column headings. Each row of the spreadsheet is the record for one city department employee.

According to Wallack and Srinivasan, communities and states organize information about the people, places, things, and events around them in the form of ontologies, which are essentially “systems of categories and their interrelations”. This dataset is a meta ontology. It is a state-created, large-scale dataset with numerous quantitative indicators, covering payroll for the Los Angeles city departments.

This data would be most useful and illuminating to city department employees, enabling them to explore and compare the pay of employees in other job titles and departments. It would be easier for them to work through the information, since they understand the financial data types and terms used in the dataset. I think it's also important for residents to explore this dataset. Doing so would give them a better understanding of the payroll of city employees and of the imbalance in how taxpayers' contributions are distributed among the departments.

If I had to reorganize the dataset, I would focus on the annual salary and the hourly rate, ordering them from highest to lowest. It would also be interesting to organize the data by job title. Reorganizing the dataset this way could help identify which departments are favored in terms of payroll. As for data collection, I would include gender and race indicators. These would give insight into how these factors influence payroll for city department jobs and how sexism and racism persist in the workforce.

Blog Post, Week 3

I begin by analyzing the dataset All City Departments by Payroll, which includes the following data types: Department Title, Year, Job Class Title, Projected Annual Salary, Q1 to Q4 payments, Payments over Base Pay (Including Bonuses and Payouts), % over Base Pay, and Total Payments. There are 56 records in the dataset, matching the number of departments in City Hall. While examining the dataset, I noted a few discrepancies and inconsistencies. For instance, the dataset is summarized as “Payroll information for all Los Angeles City Departments since 2013. Data for calendar years, updated on a quarterly basis by the Los Angeles City Controller’s Office.” However, the datasheet only includes information from 2015 and no other year. Next, the category “Job Class Title” seems to be mislabeled. Every other category has a description attached and appears to be self-explanatory. The “Job Class Title” column, on the other hand, consists of numbers rather than job titles. I cross-checked it against the affiliated dataset “Payroll by Job Class,” which uses the same category name for position names. I concluded that the column was mislabeled and actually indicates the total number of employees in each department.

After reading Wallack and Srinivasan’s analysis of ontologies, I determine the dataset to be a ‘meta-ontology,’ a state-created information system rather than a community ontology representing local needs. This is apparent from the mission of the collecting agency, the City Controller’s Office, and from its choice of categories and descriptions. For instance, the Office describes the Projected Annual Salary in terms of Budgeted Pay Amount, used for pension contribution calculations. This indicates that the dataset's purpose is budgeting and administrative efficiency rather than representing community concerns. As a result, there is significant information loss due to the mismatch between this meta-ontology and the community ontology it could have been.

Although this meta-ontology is most useful to governing and administrative bodies, the dataset is accompanied by a visualization portraying outcomes that would be of interest to local communities. The chart (as shown below) reveals that the Los Angeles Police Department has the highest payroll expenses. I can imagine various local community groups using this information to advance their mission or advocate for certain reforms. For example, some groups or public servants may use this information on the LAPD to highlight how the city prioritizes public safety. Other groups may interpret it as an example of tax money being spent on over-surveillance of communities rather than on other services, such as health and human services.

[Screenshot: chart of payroll expenses by department]

However, while this state meta-ontology includes data of interest to the public, it suffers from significant information loss and does not depict the full picture. This allows for manipulation of the data, as depicted below. For the information to be most useful, I would broaden data collection beyond the budget and administrative viewpoint to one representing community ideals. As it stands, the dataset makes it difficult to judge why payroll in certain departments is significantly higher than in others and what this means for individual employees. To counteract this, I would include information such as: average employee salary; the largest income discrepancies between employees of the department; demographics of employees, including ethnicity, gender, experience, and education level; the percentage of payroll paid by tax dollars; and historical trends in payroll expense and number of employees, to show any significant increases or decreases by department. Including such information would make the dataset more relevant to community groups who want to analyze the social and economic outcomes of payroll expenses.

Blog 3: Neighborhood Council Expenditures

For this week’s blog post, I chose to explore the Neighborhood Council Expenditures for fiscal year 2014, published by the L.A. Controller’s Office. I thought this would be an extremely interesting section to explore, as I would get to know the “behind the scenes” aspect of what types of things Neighborhood Councils invest in.

The data types in the expenditure dataset include the name of the neighborhood council, date of purchase, description of purpose, vendor, spending category, task, and the amount spent (expenditure). Each record has a value under each of these data types.

Srinivasan and Wallack’s article highlights the discrepancy too often found between the ontologies of the state and those of local communities. By their definitions, this dataset clearly reflects a state ontology. As they describe, “State data systems are the infrastructure of administration” (Srinivasan, 1). Accordingly, the data is highly structured, with only 7 categories used to explain neighborhood council expenditures. This ontology makes the most sense from the state’s point of view, since the information can be easily found through “spending categories.” From this dataset, I can only see a cut-and-dried version of what goes into the upkeep of a neighborhood. A community ontology is notably absent. The dataset provides no history of specific neighborhoods and no account of the conditions unique to each one that would explain the “why” behind each expenditure. For example, an “ANC-Narconon Drug Prevention and Education” program is purchased and filed under the simple “Task” of non-profit. There is no information about why this program was brought in or what influence it had on the particular neighborhood. In addition, several “operational expenses” offer only the vague description “MISC PERSONAL SERVICES” without any detail about what these services are. Overall, this ontology reflects the state’s need to be concise and to the point, without offering any community perspective that could highlight the diversity of each neighborhood.

If I were starting over with data collection, I would create an ontology based on the community’s experience, and therefore on its unique reality and surroundings. I would make sure to include the history of each neighborhood and more detailed reasons for purchasing specific things. Most importantly, I would highlight both the immediate and the long-lasting impact each purchase has on the members of its neighborhood. Thus, the state would come closer to having what Srinivasan and Wallack describe as an “effective engagement with communities” that would prevent citizens’ concerns from being overshadowed.

Examining Police Expenditure

Before writing my own blog post for this week, I read the ones my classmates had already posted about the different datasets they chose from the L.A. Controller’s Office platform. I especially enjoyed one that used the concept of the divide discussed by Wallack and Srinivasan as a lens to evaluate the platform itself. It noted how the popularity or value of each set (measured by number of views) could reflect the “mismatched ontologies between the state and individuals.”

[Screenshot: drop-down menu from the L.A. Controller panel]

Taking this notion into account, I used the drop-down menu on the left to view the popularity of the different sets over the past week, month, and year. As the police force continues to be a large part of national discussion, I decided to choose the Police Expenditures dataset. I think it is particularly reflective of public sentiment (and thus possibly of community ontology) that this dataset was one of the most viewed for the year, although not for the week or month.

As the title indicates, this dataset contains information about the police force’s expenditures from June 2011 to January 2014. There are 25 data types consisting of ID Number, Fiscal Year, Department Name, Vendor Name, Transaction Date, Dollar Amount, Authority, Business Tax Registration Certificate, Government Activity, Fund Group Name, Fund Type, Fund Name, Account Name, Transaction ID, Expenditure Type, Settlement/Judgment, Fiscal Month Number, Fiscal Year-Month, Quarter, Calendar Month #, Calendar Month Year, Calendar Month, Data Source, Authority Name, and Authority Link. A record in this dataset is a single purchase (expenditure) made by a department within the L.A. Police Force.

Wallack and Srinivasan first discuss the importance of ontology by stating, “Ontologies represent reality, but this representation of information may in turn become the basis for actions that in turn shape reality” (3). They then delineate the differences between meta ontologies and localized, community ontologies, and the consequences when the two diverge. Based on their definitions, I would say the Police Force Expenditures dataset is a meta ontology, judging by its data types. For example, ID Number, Business Tax Registration Certificate, Fund Group Name, Fund Type, and Transaction ID make little to no sense to local citizens. These types are understood only by government and city officials who work with the particular taxing and funding protocols. It should be noted, however, that other data types, such as Dollar Amount and Vendor Name, can be understood by people outside government.

This dataset can tell you about the money spent by the police department. Each record indicates a specific expenditure made by a department, its dollar amount, and the vendor it was paid to. However, the item or service purchased is not specified. The dataset gives only the “Government Activity” it was used for. This is rather broad, considering that nearly all records list “Protection of Persons and Property” as the activity. Thus, I think this dataset is informative only for quantifying purchases across departments, not for determining exactly what each department spends its money on.

If I were to create a local, community ontology for Police Expenditures, the data types would not look entirely different. Rather, I would remove a few data types and add others. I think the general public would primarily want an itemized list of the specific goods and services the department purchased from each vendor, along with a more informative reason for each purchase. I would still collect the same data on dollar amount, vendor name, department, and fiscal time periods. Tax registration and identification numbers, however, are not essential for a community ontology.

Control Panel: Gender Breakdown

The Gender Breakdown of City Workers by Department dataset consists of integer and character values for an “analysis of 2015 full-time employee earnings by gender across the various Departments of the City of Los Angeles”. Its columns include year, department title, employee, total payroll, #female, #male, %female, %male, female total salary, male total salary, female average salary, male average salary, and others. According to Wallack and Srinivasan in “Local-Global: Reconciling Mismatched Ontologies in Development Information Systems”, a dataset ontology represents reality, “but this representation of information may in turn become the basis for actions that shape reality”. The authors note that this dynamic is especially problematic because states typically have more power to affect communities than the other way around. States act on meta ontologies that may be inadequate or incomplete.

In the case of the Gender Breakdown Dataset, those most likely to use this data as an empirical representation of reality are social scientists, policy makers, and employers. This dataset makes a good case for the gender pay gap, in which “women” or “females” are paid substantially less than “men” or “males” across vocations in 2015. However, it is predicated on the gender binary and does not necessarily account for individuals who have made gender transitions, potentially excluding an ontology that would include gender non-conforming or transgender individuals. When we talk about gender pay gaps, we should also talk about how differently the gap, and the sexism at its root, affects women of different races, classes, and citizenship statuses, among other social categories, in the workplace. Another absence is that the dataset includes statistics only from 2015. This limits how we can read the current state of the gender pay gap in relation to past years.

It should be noted that the dataset has a tab titled “discuss,” which makes space for comments and discussion around the dataset but has not been used at all. Following “Local-Global,” one way to mitigate the disparities between official portrayals and statistics about the gender pay gap and workers’ understanding of that context (including everyday sexism in the workplace) would be to add a feature to the Control Panel’s Gender Breakdown dataset. That feature would explicitly request comments and updates from the people the data affects about how the data is portrayed and what might be missing. Making the raw data directly available to those it affects would be one way to go further.

Blog Post 3: Gender Breakdown of City Workers by Department

The “Gender Breakdown of City Workers by Department” data set is a collection of information on the proportions of males and females working in different city departments. It also includes data on total payroll per department, total male and female salaries, average salaries for men and for women, and the percentage of each department’s payroll that goes to males and to females. For each of these categories, the data can be displayed as pie charts, line graphs, tree maps, and other forms of visualization. Manipulating the data makes it easy to discover how many men versus women are employed overall and by department, and how their total and average salaries compare. A record in this data set refers to all of the information compiled for a particular city department.

[Screenshot: visualization of the Gender Breakdown data set]

In Wallack and Srinivasan’s terms, this data set is a meta-ontology. It was created by the city to monitor the hiring demographics of each department by gender and to compare salaries, in order to ensure that there are no alarming disparities that might point to gender discrimination. However, the data gives no context about the culture within each city department or whether that culture affects the gender ratios and salaries of the department’s employees.

A city policewoman might claim that women who want to be police officers face discrimination (28% women in the police department), while a fireman might claim that men have a difficult time getting hired in safer and more stable departments (finance and city attorney are both majority women) and are instead shunted into departments that are more dangerous, low-paying and labor-intensive (sanitation, fire and building safety are all overwhelmingly male).

From the point of view of someone advocating against gender discrimination in the workplace, this data set contains a lot of useful data but also leaves out a lot of information. This person would be concerned that men’s average salary is higher than women’s in very nearly every city department, but other factors could be skewing the data. For example, one of the greatest disparities between male and female average salaries appears in the Department of Convention and Tourism Development, with $145,000 for men and $54,000 for women. That department has only fifteen employees. With so few employees, a handful of individual salaries can drive the averages, which suggests that the disparity might be significantly smaller if there were more employees to balance out the data.
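A short Python sketch shows why a small department’s averages deserve caution. The salaries below are invented for illustration and are not the actual Convention and Tourism Development figures. The `average_gap` helper is hypothetical. In a department this small, removing a single high earner changes the gap dramatically, while the median is much less affected than the mean.

```python
from statistics import mean, median

def average_gap(male_salaries, female_salaries):
    """Return (male mean, female mean, gap as a fraction of the male mean)."""
    m = mean(male_salaries)
    f = mean(female_salaries)
    return m, f, (m - f) / m

# Invented salaries for a hypothetical nine-person department.
men = [300_000, 60_000, 60_000, 60_000]
women = [55_000, 55_000, 55_000, 55_000, 55_000]

m, f, gap = average_gap(men, women)
print(f"All employees: men ${m:,.0f}, women ${f:,.0f}, gap {gap:.0%}")

# Drop the single highest-paid man and recompute.
m2, f2, gap2 = average_gap(sorted(men)[:-1], women)
print(f"Without top earner: men ${m2:,.0f}, women ${f2:,.0f}, gap {gap2:.0%}")

# Medians are less sensitive to one outlier than means.
print("Medians:", median(men), median(women))
```

Here one salary moves the male average from $120,000 to $60,000. This is why averages from a fifteen-person department are weak evidence on their own.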

For this person, data on the number of discrimination complaints lodged annually in each department would be useful, because it could be compared with the gender disparity in each department to see whether there is a correlation. Additionally, data showing the proportion of males and females in leadership positions in each department would be useful, as it might provide more social context.
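If complaint counts were collected, a first check for a correlation could be as simple as the Python sketch below. It computes a Pearson correlation coefficient by hand. The complaint counts and pay-gap values are hypothetical, because neither the complaint data nor this pairing exists in the dataset. With only a few dozen departments, a correlation would be suggestive at best. It would not, by itself, show that discrimination causes the gap.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (spread_x * spread_y)

# Hypothetical values for five departments: annual discrimination complaints
# and the gender gap in average salary (as a fraction of the male average).
complaints = [1, 3, 2, 5, 4]
pay_gap = [0.02, 0.08, 0.05, 0.12, 0.07]

print(f"r = {pearson(complaints, pay_gap):.2f}")
```

A value of `r` near 1 would mean departments with more complaints also tend to have larger gaps. A value near 0 would mean no linear relationship.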

Payroll by Job Class – Blog Post 3

The Payroll by Job Class dataset from the L.A. Controller’s Office includes 34 different data types: Year, Department Title, Payroll Department, Record Number, Job Class Title, Employment Type, Hourly or Event Rate, Projected Annual Salary, Q1 Payments, Q2 Payments, Q3 Payments, Q4 Payments, Payments Over Base Pay, % Over Base Pay, Total Payments, Base Pay, Permanent Bonus Pay, Longevity Bonus Pay, Temporary Bonus Pay, Lump Sum Pay, Overtime Pay, Other Pay & Adjustments, Other Pay (Payroll Explorer), MOU, MOU Title, FMS Department, Job Class, Pay Grade, Average Health Cost, Average Dental Cost, Average Basic Life, Average Benefit Cost, Benefits Plan, and Job Class Link. A record in this dataset consists of a row in the spreadsheet, which shows every data type for one individual employee of a Los Angeles City Department.

Wallack and Srinivasan differentiate between the ontologies of “state-created information systems,” or meta ontologies, and local communities’ ontologies. Because the Payroll by Job Class dataset does not appear to take local contexts into account in organizing and presenting the data, it seems to be a meta ontology. The data is organized alphabetically by Department Title, but the data within each department does not seem to be in any particular order. This makes it difficult to compare the data within each department in order to discover differences among those with the same or different Job Class Titles. For instance, the first record in the dataset has the job class title, General Manager of Aging Department. The twelfth record has the same job class title and appears identical to the first record in several of the data types, but differs in data types like Q2 Payments and Base Pay.

The difference between the first and twelfth records likely has a clear explanation for officials at the L.A. Controller’s Office. The dataset is probably most intelligible to city employees, as it incorporates department and job titles, as well as financial terms, which they encounter on a daily basis. Someone who works for the city government likely approaches the dataset in pursuit of very particular information that they already understand on a basic level, such as the projected salary of the General Manager of the Aging Department. Even though there is such a great quantity of data and it is organized alphabetically, a city employee knows the context well enough to find the information.

However, it seems more difficult for those outside the city government to place the data in the context of their daily lives. For instance, how are L.A. residents supposed to discern the difference between the first and twelfth records when they are so similar to the untrained eye? How can they decide if the hourly rate for each position is adequate compensation for the work, or if a certain supervisor is justified in earning twice his/her subordinate’s salary? Are they likely to look through the entire dataset, or will they accept the first set of records (for the Aging Department) as representative of the following records? Even though the dataset claims to lend insight into payroll by job class, it is surprisingly difficult to discern the cause or meaning of the salary differences. It might help community members to interpret the data in social terms if information like the race or gender of the city employees were included with their financial information.

I think many L.A. residents might find this data interesting because it provides information about which city employees earn the most and the least. This in turn provides insight into the distribution of taxpayers’ contributions to various departments and individuals within those departments, which could prove controversial. If I were organizing the data around this ontology, I would reorganize the hourly rates and projected salaries so that they appeared in descending order, from highest to lowest. During data collection, I would also attempt to trace where the funds for each salary were obtained and to determine who decided which funds would be diverted to each department. This might allow an L.A. resident to determine whether the funds are distributed fairly or whether they believe some departments or individuals are unfairly favored over others.
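Sorting from highest to lowest is straightforward once the data is exported. The Python sketch below assumes a CSV export whose column names match the dataset’s data types and whose dollar values are formatted like “$1,234.56”. That format, the rows, and the `to_number` and `rank_by` helpers are all hypothetical.

```python
import csv
import io

# Hypothetical CSV excerpt. Column names follow the dataset's data types, but the
# rows and the "$1,234.56" number format are invented for illustration.
SAMPLE = """Department Title,Job Class Title,Hourly or Event Rate,Projected Annual Salary
Dept X,Clerk,$30.50,"$63,684.00"
Dept X,Manager,$90.00,"$187,920.00"
Dept Y,Analyst,$38.25,"$79,866.00"
"""

def to_number(text):
    """Convert a currency string such as '$1,234.56' to a float."""
    return float(text.replace("$", "").replace(",", ""))

def rank_by(rows, column):
    """Return rows sorted from highest to lowest value in `column`."""
    return sorted(rows, key=lambda row: to_number(row[column]), reverse=True)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rank_by(rows, "Projected Annual Salary"):
    print(row["Job Class Title"], row["Projected Annual Salary"])
```

Passing `"Hourly or Event Rate"` as the column ranks the same rows by hourly rate instead.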

Blog Post Week 3

For my blog this week I chose to look at the data set titled “Payroll by Job Class.” The general purpose of this data set is to show the amount paid to different job types. This naturally lends itself towards a comparison between and analysis of what job categories get paid the most and why that may be.

There are 34 data types, including year, department title, payroll department, projected annual salary, payments over base pay, base pay, overtime pay, average health cost, pay grade, benefits plan, and several more. A record in this data set is therefore a row that includes an entry under each of these 34 data types.

In their paper, Wallack and Srinivasan explain how a lack of compatibility between the ontologies used by the government and those used by communities can have serious consequences. They also provide several strategies for combating these “mismatched ontologies.” After consulting their definitions, I believe this ontology was created from the state’s point of view and closely mimics government records and the ontology used to create them. This ontology makes the most sense from the government’s point of view, since it documents mostly quantitative measures of pay for different employees. The data set also accounts for the cost to the city of providing insurance and other benefits for each employee. This indicates that it would be most useful from the state’s point of view, as it addresses information the state would be interested in.

This data set attempts to explain several different phenomena. It looks at which kinds of roles get paid the most and which departments have the higher-paid employees. Information about which roles tend to cost the state the most in insurance and health benefits can also be gleaned from this data set. Furthermore, a person viewing this data set can clearly see which roles tend to receive more bonuses and extra pay.

However, while this ontology caters to the information needs of the government, it fails to provide some information that community members would find useful. Especially interesting data points would be employee demographics, including gender and ethnicity. The employee’s level of education would also be interesting, because viewers could then analyze the relationship between education level and pay. If I were to construct this data set using a completely different ontology, I would build it from the citizens’ viewpoint and address all the data types that were left out. In particular, I would emphasize the age and education levels of employees. I would also want a clearer idea of the proportion of taxes spent on paying government employees.