Class Blog – Page 19 – Digital Humanities 101

Wordle on the NYC Tenements

Creating visualizations is often hard when the data being used is tricky. This is exactly what happened with this week’s blog post. I decided to get a head start on our big project by using my own data, as it would give me an excuse to really look into the data and make a visualization. My data consists of about 1,100 photographs of New York City tenements taken by inspectors between 1934 and 1938. The issue with the data is that instead of a hyperlink for each photograph, there is a permalink that takes you to the collection website and shows you only that individual photograph (there is no scrolling function in the archive database). Additionally, because the label on all of them is “NYC Tenements” and there are only 5 different year options, I decided to use the notes, which had a lot more information that could actually be used to create a visualization (disclaimer: I am sure I can create better digital representations of my data once we have moved further along in the course).

In the notes, there was information about the picture itself, such as “baby sitting on a bed”, generic information about what the photograph showed, such as “storefront”, and even the address where the photograph was taken. I copied all of the notes and pasted them into Wordle. While I waited for Wordle to create a “word cloud” of the most common words found in the description notes of over 1,100 data entries, I expected to see words like “storefront”, “child”, or “st” (because of the addresses) be bigger than the rest. Instead, the result made me think about a whole other aspect of my data that I had not even considered exploring.

When the cloud arrived, these were the huge words: Manhattan, Brooklyn, and Bronx. That’s when I thought that maybe instead of focusing so much on what was in the picture, I could categorize the photographs according to where in New York they were taken. I already knew that those were boroughs that immigrants flooded into at that time, and thought that could have something to do with why the photos showed small enclosed spaces with big families, crowded storefronts on building corners, tall buildings with many windows signaling many apartments, etc. Thanks to this word cloud, I was able to see that most of these photographs were taken in 3 specific boroughs, where before I was too busy focusing on what each photograph contained. Now, with this new outlook on my data, I can attack it in a way that is organized and much easier to manage. In other words: Online Visualizations 1, Excel Sheet 0.
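A word cloud is, underneath, a word-frequency count: the more often a word appears, the bigger it is drawn. The sketch below shows that counting step in Python. The sample notes and the stop-word list are made up for illustration (the real collection has over 1,100 entries, and Wordle applies its own filtering), so this is only a rough approximation of what the tool does.

```python
import re
from collections import Counter

# Hypothetical sample notes, invented for illustration only.
notes = [
    "Baby sitting on a bed, 123 Example St, Manhattan",
    "Storefront on corner, 45 Sample Ave, Brooklyn",
    "Children on stoop, 9 Test St, Manhattan",
    "Storefront, 77 Demo St, Bronx",
]

# A small, assumed stop-word list; Wordle uses its own.
STOP_WORDS = {"a", "on", "the", "of", "and", "in"}


def word_counts(texts):
    """Count how often each non-stop word appears across all notes."""
    counts = Counter()
    for text in texts:
        # Lowercase the note and keep alphabetic runs only (drops house numbers).
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


counts = word_counts(notes)

# The most frequent words are the ones a word cloud would draw largest.
for word, n in counts.most_common(5):
    print(word, n)
```

Sorting the counts this way would also make it easy to tag each photograph by borough name before moving the data into a spreadsheet.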

Blog Post Analyzing Funds Related to Health, Environment, and Sanitation

This week I decided to examine the LA Controller’s Office’s dataset compiling the funds that the city of Los Angeles disburses to projects related to Health, Environment, and Sanitation. Given LA’s infamy for the cloud of pollutants that engulfs the city, I wanted to see how much the city council allocates to environmental projects.

The dataset includes 37 different “funds”, in which the total monies budgeted for environmental protection add up to roughly $390,000,000. While that number may seem large, the city of Los Angeles has nearly 4 million residents, and the overall revenue that the city collects through taxes is well over $390 million. The dataset further includes the type of fund, denoted by a three-digit fund number, the fund name, the department requiring the fund, and the fund’s purpose.

The LA Controller’s Office’s dataset exemplifies the issues discussed in the Wallack and Srinivasan article (Local-Global: Reconciling Mismatched Ontologies in Development Information Systems). The article examines the disconnect between the stated purpose of the monies and whether a program is actually executed as intended. For instance, nearly $19 million went to fund the amenities for Sunshine Canyon; the stated purpose of the fund is to “fund the amenities for the Sunshine Canyon landfill facilities”, which gives little information on how the amenities are used, why they consume a large portion of the budget, and whether the funds are actually implementing environmentally safe landfill management. Additionally, there is a $5 million fund allocated to Mobile Source Air Pollution Reduction, with a purpose of “for air pollution reduction projects”, but little is told about what those projects are, how effective they will be, or how they will be implemented.

I believe this ontology would make the most sense to the individuals writing the budget reports and the politicians disbursing the city’s money. They are the ones creating these budgets, and the lack of detail allows them to easily justify receiving large portions of funds.

This dataset demonstrates that the money collected by government may not always be used as efficiently as possible, and raises some questions as to why the descriptions are so strangely vague. It also brings into question the efficacy of these projects. Are the projects really solving the problems they are intended to solve? If not, to what extent are they remedying the environmental issues?

If I were not picking this dataset apart, and were instead, say, a resident of Bakersfield, I would be astonished by the sheer amount of money that the city of Los Angeles sets aside for environmental projects. However, Bakersfield is a lot smaller than Los Angeles, and one would need to take into account the relative amount of money, in addition to the severity of the environmental issues, in order to assess whether or not the city is realistically making an effort to curb its pollution.

Payroll by Department Dataset

For this blog post, I decided to investigate the Payroll by Department dataset from the L.A. Controller’s Office website. It contains payroll information for all the City Departments of Los Angeles since 2013 and is updated on a quarterly basis. The data types used for this dataset are department title, year, job class, projected annual salary, quarterly payment, payments over base pay, percent over base pay, and total payments. A record in this dataset refers to the above-mentioned payroll information specific to one of the 56 departments included in the dataset.

Wallack and Srinivasan define ontology as “systems of categories and their interrelations by which groups order and manage information about the people, places, things, and events around them.” Based on this definition, we can see that this dataset’s ontology looks at how total payments differ across departments and how those totals break down into specific categories such as quarters and payments over base pay. The people who will find this information the most illuminating are those interested in how much money in total is used to pay employees in each department of the city of Los Angeles. For example, those who are in charge of the city budget would find this categorization of data very useful. Other individuals who might be interested are those trying to compare pay for different departments across different cities within California or the US.

This dataset does a good job of giving total numbers for the money that goes into paying a city department. For example, when you click on the pie chart, the information you immediately get is the department name and the total payments for that department. However, these records don’t let you get into the specifics or even tell you how many payments are totaled in the calculation. For example, the LAPD section shows a total payment of $1,344,118,166.75, but we have no sense of how that number breaks down into a payment for an individual officer.

While this ontology might be very useful for budget planning, it isn’t as useful for those trying to get a sense of what the average total payment per person is in each department. These would be people potentially interested in working for the city and wanting to compare average salaries from different departments. This kind of ontology would include data types such as average projected annual salary, average quarterly payments, and average payment over base pay.
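To make the alternative ontology concrete, here is a minimal Python sketch of how average total payment per person could be computed if per-employee payment records were available. The department names and amounts are made up for illustration and are not taken from the Controller’s data.

```python
from collections import defaultdict

# Hypothetical rows: (department, total payments to one employee).
# These figures are invented; they do not come from the dataset.
rows = [
    ("Police", 100000.0),
    ("Police", 80000.0),
    ("Library", 50000.0),
    ("Library", 60000.0),
    ("Library", 70000.0),
]


def average_by_department(records):
    """Return the mean payment per employee for each department."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for department, payment in records:
        totals[department] += payment
        counts[department] += 1
    return {dept: totals[dept] / counts[dept] for dept in totals}


averages = average_by_department(rows)
for dept, avg in sorted(averages.items()):
    print(f"{dept}: {avg:,.2f}")
```

The same grouping step would work for average projected annual salary or average payments over base pay; only the column being averaged changes.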

Week 3: Funds Relating to Housing and Homelessness

This week, I will be examining the dataset titled Funds Relating to Housing and Homelessness. The dataset consists of all the funds in the Balance of City Funds that relate to housing and/or homelessness. It provides us with various kinds of information associated with each fund, such as the name, cash amount, source, associated department and contact person. An individual record in this dataset consists of one fund, with other informational categories attached.

The ontology of this dataset, defined by Wallack and Srinivasan as a system of categories and their interrelations by which groups order and manage information about the people, places, things and events around them, is one rooted in an administrative viewpoint of the various funds listed in the dataset. This is evident through the kinds of information that are being collected: information such as the purpose, cash amounts, and whether the fund is within budget will best help administrators to determine how they should allocate funds in the future. These could be current or future administrators, or they could be officials in other cities looking to implement similar programs. Such information could be helpful in planning for similar funds.

This dataset provides me with a top-down understanding of the various funds within the city budget that are targeted at housing and/or homelessness. I can see what issues these funds are aimed at, which department administers each one, and who is providing the funding for it. For instance, the U.S. Dept of Housing and Urban Development provides funding for 12 of the 39 funds listed in the dataset. I can also see that most of the funds fall under the Housing and Community Investment Department for LA City. I am also able to see, through the purpose field, which funds are specifically targeted to alleviate homelessness.

However, one pressing thing left out of this data is the impact that these funds have had on communities, as well as the success and failure stories in the execution of these funds. As some of them were instituted several years ago, it should be possible to discern some impact of these funds on the directly surrounding communities. Another way of looking at this, and another ontology, would be to focus on those who have benefited from these funds and how their lives have been impacted.

Week 3 Post

For this week, I decided to analyze Gender Breakdown of City Workers by Department, a dataset that contains payroll information for men and women across a list of jobs. Its aim is to give readers objective data on how men and women compare in terms of salaries, and to let readers analyze the inequalities depending on job description. The question that naturally arises from this data is, “Are women really getting paid less, and why?” A record in this dataset is the information collected from each department, which contains data about # of Employees, Total Payroll, # Female, # Male, Female/Male Total Salary, and Female/Male Average Salary.

Wallack and Srinivasan identify ontology as the “system of categories and their interrelations by which groups order and manage information about the people, places, things, and events around them.” In other words, an ontology’s duty is to relay information about the reality of a certain phenomenon, which may push communities toward change. This dataset in particular is a meta-ontology: state-sponsored data intended to give objective information to the public. In this dataset, the ontology compares the salaries of men and women in order to report possible income inequality between the sexes.

This dataset would be the most useful for equal rights activists, who could use it to get raw and objective information on how women’s and men’s salaries differ and to push for change to this injustice. This data is simple in that a record can be categorized into 3 subgroups: job title, number of male and female employees, and salary difference between men and women. Therefore, by looking at the table, the user can understand exactly which job employs more men or women, and what the salary difference is. The website also allows different visualizations of the data, for example bar graphs, pie charts, and treemaps, which allow users to digest and compare the information more effectively.

This dataset is great in that we can easily see the difference in salaries depending on gender and job. However, the dataset is too simplistic in that we do not know how many hours men and women work. In a society where the stereotype of women as housewives still exists, perhaps women work fewer hours because, among working parents, one parent usually has to take the kids to school or pick them up afterward, and women are often assumed to take this role. Thus, the difference in payroll could be because women are working fewer hours due to this social expectation. On the other hand, it is possible that men and women work the same number of hours; we would never know unless we had that information.

From a different person’s point of view, this dataset could be read as information about the gender distribution for each job. As each position has a different proportion of male and female workers, the graph shows which jobs are more popular with, or more geared towards, men or women. Questions that could arise from this point of view are why some jobs have more men than women or vice versa, and whether this is due to sexism, coincidence, or other reasons.

Week 3-Dataset “Payroll by Department”

I found Payroll by Department (aka All City Departments by Payroll) for Los Angeles in the year 2015 quite interesting. When you first click its icon from the homepage, the dataset appears as a chart that mainly indicates each department’s share of the total city government payroll. If you choose “view it as a table,” “view it as a rich list” or “view it as a single row,” you will see that the chart is a visualization of tabular data. A record in this dataset includes department title, year, job class title, projected annual salary, payments by quarter, payments over base pay, percentage over base pay, and total payments.

Wallack and Srinivasan define ontologies as “systems of categories and their interrelations by which groups order and manage information about the people, places, things, and events around them.”[1] According to those two scholars, ontologies register different elements and their interrelations within groups. In this dataset, government employees’ salaries are first categorized by employer department. Those subsections are then divided into four by quarter. Finally, the projected salaries are compared to actual expenditures within each department.

It seems that this dataset is most useful for government sections related to finance and human resources, such as the “Economic and Workforce Development Department” and “Personnel”, to track the payroll. They can detect any quarterly anomaly within one department in 2015 or investigate the discrepancies between salary budget and actual expenditure to make better plans for the future. Because of the bird’s-eye view this dataset offers, upper-level government management, such as the mayor’s office, would also benefit from this dataset to understand how different departments work financially.

The dataset, which is dedicated to expressing the differences in payments among departments, reveals to me that the city invested greatly in law enforcement, basic utilities, and fire prevention, given the prominent payments to LAPD, DWP, and LAFD. Those departments also have more job class titles and higher percentages over base pay. This could be an indication that jobs in those departments were more specialized, demanding, or dangerous. Meanwhile, most departments paid out the most in the third quarter and the least in the last quarter, which suggests the city was very busy in the summer and more relaxed in the winter. The outstanding percentage over base pay in the Employee Relations Board also caught my attention. I would like to know more about why this department had so few job class titles while its workers seemingly worked extra hard.

This dataset seems less meaningful for someone who is considering joining government service. He or she cannot find out which position pays more in which department, or how high his or her salary could be at the top of the department. He or she cannot even find the median salary in one department. Due to the different natures of tasks in different departments, it is hard to compare which job in which department is more rewarding. The existing data also needs more interpretation: why did the government spend so much on law enforcement? Compared with datasets sharing the same ontology, such as the one from last year or the one from New York City for 2015, did LA spend less or more?

If I were to design a new ontology, I would add the percentage increase in payments for every department to show the yearly change. Decision makers may need that information. I would also list the number of employees and the highest and lowest salary in each department, to show how the expenditure was distributed within the department and to inform those who are considering joining. It may not be a bad idea to merge this tabular data with “Payroll by Position”, which offers a more micro-level perspective.
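The yearly-change column proposed here is simple arithmetic: the difference between two years’ totals divided by the earlier total. A short Python sketch follows, using made-up department names and totals, since the real prior-year figures are not quoted in this post.

```python
def percent_change(previous, current):
    """Percentage change from the previous year's total to the current one."""
    if previous == 0:
        raise ValueError("previous total must be non-zero")
    return (current - previous) * 100.0 / previous


# Hypothetical total payments by department and year (invented figures).
totals_by_year = {
    "Dept A": {2014: 200.0, 2015: 210.0},
    "Dept B": {2014: 400.0, 2015: 380.0},
}

changes = {
    dept: percent_change(years[2014], years[2015])
    for dept, years in totals_by_year.items()
}

for dept, change in changes.items():
    print(f"{dept}: {change:+.1f}%")
```

A positive value means the department’s payroll grew from one year to the next; a negative value means it shrank.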

[1] Jessica Seddon Wallack and Ramesh Srinivasan. “Local-Global: Reconciling Mismatched Ontologies in Development Information Systems.” Proceedings of the 42nd Hawaii International Conference on System Sciences – 2009.

What We Buy

I examined the What We Buy dataset, which reveals what the city of Los Angeles buys for its residents using taxpayer money. The information is presented in the form of 15 datacards, grouping the dataset into relevant chapters: $12.3 million on 1 AW139 twin-turbine helicopter, $21,929 on 72 pairs of custom fit motorcycle patrol boots, $1,159,775 on leased golf carts, $8,549 on 6,670 soccer balls, $646,533 on 100 Radar Speed Signs, $6,797 on 2,723 basketball nets, $629,218 on 6,492,750 ballots, $4,638,600 on 4,339,676 lbs of thermoplastic marking material, $21,243 on Graffiti Buster, $530,238 on 5 Toro Groundmaster 5900 Rotary Riding mowers, $13,368 on Federal L.U.S.T. Tax, $1,348,566 on 7,617 fire hoses, $10,654 on 11,988 high visibility white traffic gloves, $161,628 on 30,685 wet mops, and $129,218 on 52,100 frozen rats. The datacards demonstrate that the city spends a lot of taxpayer money on recreational sports, policing, traffic systems, medical research, janitorial practices, gardening, petroleum spills, and fire emergencies.

Each datacard then goes into more detail about why the city invests in the object, by answering the following questions: “What’s this?”, “Why do we buy this?”, “Did you know?” In this way, the makers of the LA Control Panel microsite are able to directly communicate with their primary audience: Los Angeles taxpayers and government officials. The questions provide justification for tax money investment decisions by the city government.

From the dataset, the user can see what problems or situations Los Angeles is facing, as well as the city’s priorities and values. For example, the spending on soccer balls and basketball nets demonstrates that Los Angeles strives to create sports and recreational spaces. The city values building a sense of community through athletics.

There are definitely gaps in the data collection. The datacards are not transparent about which companies and brands these purchases are made from. It would be interesting to see how these objects and materials are distributed throughout the city. The taxpayer demographic is also unclear, besides the fact that they are Los Angeles residents. How old are they, and which neighborhoods do they live in?

Wallack and Srinivasan define a dataset’s ontology as follows: “Communities and states…[such as Los Angeles] represent the realities around them through distinct ontologies, or systems of categories and their interrelations by which groups order and manage information about the people, places, things, and events around them” (p. 1). The “What We Buy” dataset organizes the city’s purchases into a framework that is relevant to the intended audience and makes its content accessible.

If I were to start over with the data-collection process, I would be interested in focusing on instances where the city wastes taxpayer money, or makes investments that aren’t relevant to the people’s wishes. I’d juxtapose surveys of Angelenos about how they want their tax dollars spent with the expenditure decisions made by government officials. Every citizen has a different set of values and priorities for their community. Where is the overlap, and how do cities compromise on their spending decisions? Are there alternative ways of sourcing these things for the city? Perhaps the city could purchase used basketballs from the Lakers or local college basketball teams, rather than buying new ones.

L.A. Controller’s Office: Street Grades

This dataset from the L.A. Controller’s Office compiles Los Angeles County’s street pavement conditions according to a Pavement Condition Index. Each record is a numerical rating of the street’s condition, falling into a range of good, fair, or poor, and contains a street name, location, and date. The data is arranged on an interactive map, showing green, yellow, and red areas for good, fair, and poor road conditions, respectively. The map also highlights the neighborhood councils and council districts partitioning the city and allows a viewer to see the varying road repair plans across time by toggling by year.

This dataset’s ontology organizes data with the aim of understanding where and when street conditions have been suffering and where they have been improved. The options to toggle between time periods and view district boundaries imply that street conditions are meant to be compared across spatial and temporal contexts. This ontology would benefit any worker under the Bureau of Street Services, which is where this dataset and interactive site originate. It would be helpful in understanding the terrain of Los Angeles, the current conditions of the roads, and what work has been done in the past to remedy problem areas. It also shows what further work needs to be done on road conditions in certain areas, as some districts have predominantly green (good) conditions, while others are overwhelmingly red (poor). It can also give insight into where funds are being allocated within those districts, specifically how much funding is being invested in street pavement repair.

This dataset gives a lot of insight into what the road conditions are like in Los Angeles, and also shows the road repair plans. It communicates this data effectively through visual assets that strengthen the narrative it sets out to convey. With regard to what’s left out of this dataset, it does seem like the data paints an incomplete picture. Many of the roads in the San Fernando Valley are documented and categorized, but lots of areas are lacking in representation. West LA and Mid-City, for example, are sparse in data. Understanding the street conditions from the data presented in those areas would be more difficult and possibly misleading, as one may not be able to reach legitimate conclusions from just this data.

I think an interesting ontology to present this data in would be one that demonstrates some cultural/political information along with the street conditions. Incorporating information about the different council districts and their financial brackets or their budget breakdown would be interesting, so one would be able to make conclusions about the causes of the varying road conditions throughout the LA area.

Data Analysis: Gender Breakdown by Department

Gender Breakdown of City Workers by Department documents the percentage of male and female full-time employees in 2015 across the various Departments of Los Angeles, including city planning, fire, and sub-departments of public works, such as engineering and sanitation. The dataset also reports the employee count and total payroll per department, the number of males and females in each department, and what percentage of the department is male and female. Additionally, the data breaks down the male and female total salary within departments, the average salaries of males and females within departments, and the percent of the payroll given to males and to females.

This dataset was created by the Los Angeles City Controller’s Office. I believe Wallack and Srinivasan would identify this dataset’s ontology as a comparison between employee gender and salary within and between government departments. This dataset is very easy to navigate, and there’s a tool guide that allows viewers to make data visualizations for even easier juxtaposition and comparison.

[Figure: line graph of average female and male salary by department]

The line graph above, for example, shows average female salary in navy and average male salary in orange across the various departments. This data is very straightforward: on average, men make more money than women in 37 (out of the 40) departments, with women making more only in the Library, Recreation and Parks, and Public Works – Street Lighting Departments.

On the ground level, grassroots coalitions and social justice organizations, particularly feminist advocacy groups, would find this data very useful. Pulling up these statistics could have a big impact on arguments for women’s rights or affirmative action. Seeing as Los Angeles is one of the most liberal and diverse major cities in California and in the entirety of the United States, one could use these numbers to argue that there are still massive inequalities in the workforce today. At a higher level, this ontology also makes sense for policy makers and those in the City Planning and City Ethics Commission Departments who: (1) (hopefully) want equal and just opportunities for women, and (2) want to appear as though they are working towards equal and just opportunities for women.

While the numbers state the “what” in this gender breakdown, there is no “why” to explain the reasons behind them. In the fire department, for example, 92.8% of the full-time employees were male whereas only 7.2% were female. I assume this disparity has less to do with discrimination and more to do with the fact that fewer women want to be firefighters. Nevertheless, this could certainly lead to further social science analyses to explain the kind of information that has been left out of the dataset.

If I were to start over with data collection, I would attempt to build an ontology around the higher rates of males than females in leadership positions. In the current dataset, in the City Administrative Officer Department, almost 70% of the employees are female, and yet the average female salary is about $34,000.00 less than the average male salary. This is (also hopefully) because males hold more of the leadership/managerial roles than females in this department, and not because males are making more money for the same work. By including columns stating how many males/females in each department hold leadership positions, and how many males/females in each department make over/under $50,000.00, the spreadsheet could produce different narratives based on a different ontology described by the data.
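The extra columns proposed above could be derived from per-employee records, if such records were available. The Python sketch below counts leaders and salary bands by department and gender. The records, the field layout, and the way an employee is flagged as a leader are all assumptions made for illustration; only the $50,000 threshold comes from the paragraph above.

```python
from collections import Counter

# Hypothetical employee records: (department, gender, salary, is_leader).
# All values are invented; the real dataset has no per-employee rows.
employees = [
    ("Dept A", "F", 48000, False),
    ("Dept A", "F", 52000, False),
    ("Dept A", "F", 45000, False),
    ("Dept A", "M", 90000, True),
    ("Dept A", "M", 47000, False),
]

THRESHOLD = 50000  # salary cut-off suggested in the post


def summarize(records, threshold=THRESHOLD):
    """Count leaders and over/under-threshold earners per department and gender."""
    summary = Counter()
    for department, gender, salary, is_leader in records:
        if is_leader:
            summary[(department, gender, "leaders")] += 1
        band = "over" if salary > threshold else "under_or_equal"
        summary[(department, gender, band)] += 1
    return summary


summary = summarize(employees)
for key in sorted(summary):
    print(key, summary[key])
```

With columns like these, a gap in average salary could be read alongside the share of leadership roles held by each gender, rather than on its own.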

Funds Relating to Housing and Homelessness

I looked at the Funds Relating to Housing and Homelessness dataset. This dataset includes information on the different funds supporting housing initiatives for the homeless population. It includes financial data breaking down the individual funds into outstanding commitments, receivables, and liabilities. It has been organized into clear depictions of where each fund comes from, what it is eligible to be used for, and how that purpose is broken down.

A record in this dataset consists of the following major categories: Fund, Fund Name, Cash, Department Name, Fund Purpose, Sources of Funds, Eligible Uses, Fund Category, Ending Fund Balance, Assets, Liabilities, Grant Receivable, Other Assets, Current collected Revenue, Cash Disbursements, Outstanding Commitment, Date Fund Established, Fund Group Name and Fund Type Name.

Wallack’s and Srinivasan’s definition of ontology states that it is a system “of categories and their interrelations by which groups order and manage information about the people, places, things and events around them”. This definition is applicable to this dataset because the data establishes a relationship between the donors, the general public and the government systems operating the flow of funds to these housing initiatives. It provides a clear and transparent picture to see how donations and taxpayer dollars are allocated and what those funds are used for.

This ontology makes the most sense for the government and researchers in seeing how funds are broken down and distributed to aid housing for the homeless population. It makes sense because it shows what each individual fund is used for, as well as how much is left over. This puts the funds in a very logical and transparent order.

This dataset tells me that a lot of different funds have been set up for housing for the homeless. However, this funding is primarily for housing rehabilitation and housing preservation. Not as much funding is allocated to the building of new affordable housing for the homeless.

Details about the completion of homeless and housing projects are left out of this dataset. In addition, there is not much information regarding the livability and effectiveness of this housing for the underserved population. It is one thing to put something on paper, but it is another to see these housing initiatives in action.

If I were starting over with data collection, I would interview the occupants of the refurbished housing. I would ask whether the availability of housing has helped them get back on their feet. I would also ask whether these housing initiatives are sustainable solutions to their situation. In addition, I would ask whether the housing is suitably furnished, built, and located to meet basic living needs.