This week, I will be examining the dataset “Listing of Active Businesses” from the City of L.A. database. As the name suggests, it describes the businesses that are currently operating in the city of Los Angeles. The dataset was last updated a week ago and is typically updated once a month.
The “Listing of Active Businesses” is organized into 16 different fields, such as: Business Name, DBA Name, Mailing Address, Mailing Zip Code, North American Industry Classification System (NAICS), Primary NAICS Description, Council District, Location, and more. There are 496,904 rows of data, which is pretty comprehensive. The dataset tells us the following: where the business is located, what the business does, and when the business started.
This information is useful to a broad audience. Consumers can identify which businesses are nearby, and what services they offer, by looking at the mailing city, the zip code, or the latitude and longitude under “Location”. Filtering on these fields, one can easily pinpoint the location of a business and check it out. Moreover, existing businesses or new entrants can use this dataset to identify competition within a region by filtering on the “Primary NAICS Description” field for the industry they are going into, as well as on the city they are operating in.
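To make the competitor search concrete, here is a minimal sketch of that kind of filter in Python. It is only an illustration: the column headers ("Business Name", "Primary NAICS Description", "City") are my assumption about how the export is labeled, the function name `find_competitors` is made up, and the sample rows are invented rather than taken from the real dataset.

```python
import csv
import io


def find_competitors(rows, naics_description, city):
    """Return rows whose industry description and city both match (case-insensitive)."""
    wanted_desc = naics_description.strip().lower()
    wanted_city = city.strip().lower()
    return [
        row for row in rows
        # Column names below are assumptions about the export's headers.
        if (row.get("Primary NAICS Description") or "").strip().lower() == wanted_desc
        and (row.get("City") or "").strip().lower() == wanted_city
    ]


# Tiny invented sample standing in for the real CSV export.
SAMPLE_CSV = """Business Name,Primary NAICS Description,City
Example Studio A,"Independent artists, writers, and performers",LOS ANGELES
Example Cafe B,Full-service restaurants,LOS ANGELES
Example Studio C,"Independent artists, writers, and performers",VAN NUYS
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
matches = find_competitors(
    rows, "Independent artists, writers, and performers", "Los Angeles"
)
for row in matches:
    print(row["Business Name"])
```

With the real file, the same function could be fed from `csv.DictReader(open(path, newline=""))`, after adjusting the column names to whatever the actual header row uses.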
However, given the sheer size of the dataset and the amount of information it holds, I think it could be divided into further subcategories. Specifically, subcategories could be introduced to refine the nature of the business, on top of the “Primary NAICS Description”. For example, for “Independent artists, writers, and performers”, it would be useful to specify which services these artists offer, which could help potential consumers refine their search and identify the businesses they wish to visit. Moreover, in today’s digital age, it would also be helpful to include the email address or telephone number of these businesses.
For businesses looking to enter the industry, another helpful metric would be the size of the business. Currently, there is no way of finding out whether a business is small, medium, or large. Including data such as market size or market capitalization could be one way of providing that information.
If I were to change the ontology of this dataset, I would include the aforementioned information, i.e. a subcategory for the nature of the business, a telephone number, and market size, as well as the business’s website, Facebook page, or Yelp profile, so that viewers can get a better understanding of both the services offered and their quality.
Given that I did the same dataset, I liked seeing the difference between our analyses of its ontology. I went more in the direction of turning the site into Yelp, which maybe wasn’t necessarily fair. I liked that you suggested linking their Yelp page instead.
Hi Tenn Shaun,
Great job on this week’s blog post! I can tell you spent a lot of time coming up with your good insights! I completely agree that the amount of data is a lot but can definitely be simplified. I especially liked your idea of linking their website, Facebook page, or Yelp page to the dataset for a more comprehensive look. Perhaps it could even include the number of Yelp stars, discounted by the number of reviews, and create a cool visual from that!
Superb job!
Wanda
Hi-
I love how detailed this article is. I worked on the same dataset and agree that more subcategories are needed in order to optimize it. For instance, I think adding email addresses to the dataset is a great idea, as most business entities are trying to go paperless. It would be convenient for the Office of Finance to keep in contact with the businesses in an environmentally friendly way.
Hi,
It was really interesting that you said the additional information you mentioned would make this database more well rounded, and I agree. I also think that including some statistics on the employees, or some information on how long each business has been active, would be useful. I analyzed the same database and it was really interesting to read your analysis of it!
Hi,
Great analysis of the dataset! I like how you mentioned linking the business’s website as well as its Yelp page for the consumer’s convenience. However, do you think this would skew the dataset in any way if you bring in non-objective information like customers’ personal reviews?