Blog Post 3

1. Describe your dataset’s ontology
a. Looking through our class blog posts, I seem to have gone with a popular option by picking the “listing of active businesses” dataset. The dataset is owned by the Office of Finance, which updates it monthly. The site describes it as a “listing of all active businesses currently registered with the Office of Finance,” and goes on to explain that an “active business is defined as a registered business whose owner has not notified the Office of Finance of a cease of business operations.” In total, the dataset consists of 497,000 rows and 16 columns, with each column holding a different piece of information about the listed businesses. The column names are: Location Account #, Business Name, DBA (doing business as) Name, Street Address, City, Zip Code, Location Description, Mailing Address, Mailing City, Mailing Zip Code, NAICS (North American Industry Classification System) code, Primary NAICS description, Council District, Location Start Date, Location End Date, Location. The site also lets users view the data as various types of graphs, charts, and even calendars. It’s safe to say there is plenty of data here.
2. From whose point of view does this ontology make the most sense?
a. The first thing I thought of while studying this dataset was Yelp. I’m an avid Yelper myself, and I momentarily mistook this dataset for Yelp’s equivalent. That was wrong. Though the dataset does give the user a fair amount of information about each business, it’s data that the average Yelp user wouldn’t care about. I’m therefore led to believe that this dataset serves administrative or regulatory purposes. For example, this information could be useful to the police or other government agencies; columns like the NAICS code and the Council District point to this. Potential investors would likely also find this information useful.
3. What can this dataset tell you about the phenomenon it claims to describe?
a. Overall, the dataset describes LA’s economy, generally speaking of course. If there are nearly 500K active businesses in LA, then I think we’re doing alright. However, this question also points to what’s missing from the dataset: information relevant to everyday people.
4. What gets left out?
a. As mentioned above, I want to know all about the active businesses of LA – but I’m not talking about their NAICS codes and mailing addresses. Instead, I want to know what their peak business hours are, what their most popular products are, etc.
5. Imagine you’re starting over with data-collection and describe a completely different ontology, from someone else’s point of view.
a. To put it simply – if I were collecting this data, my 16 columns would be completely different. They would include hours of operation, type of products sold, average age of shoppers, things like that.

3 comments

  1. I very much agree with the changes you propose to this dataset. I think including things like hours of operation and type of products sold would be more useful to the average shopper. That way they can make rational and more culturally astute decisions instead of sifting through extraneous information they don’t really need to know much about.

  2. I think that approaching the data from the perspective you propose would offer interesting insights into each business as a whole. I imagine the dataset would then prove useful for marketing purposes. Businesses could track the kinds of shoppers that frequent certain kinds of stores!

  3. Hi,
    I agree that the dataset would be more holistic if it included information that is actually useful to users such as consumers. However, it is a government-collected and government-issued dataset, so its main purpose is bureaucratic. The limitations that result from this goal are inevitable. It would be a good idea to build a similar dataset for the general public’s use!
