Course blog

Untitled

This week’s readings immediately made me think of Spotify’s music library. Though it is not an index of multiple different types of data, it does hold immense amounts of information. As a Spotify user, I am able to browse different genres, artists, and playlists easily and efficiently in everyday life. If I remember a song from my childhood, or need to remember the lyrics of a song, I can instantly access that information. Spotify is an example of a database that I use every single day; however, the importance of these electronic databases extends past convenient music on demand.

As a World Arts and Cultures major, I have found databases to be a saving grace during finals weeks. Through museums such as the Fowler and Hammer, I am able to access art pieces based on the specific requirements I need. To elaborate upon the concept of databases being used to “help people keep track of things,” these museums’ databases give individuals an understanding of not only the specific dimensions of an artwork, but also the culture it comes from and the ideas behind the piece.

In a world that is constantly evolving and reinventing itself, databases ensure that we will not lose the very aspects of society that have come to define us. Historical documents, controversial articles, and photographs of protest help our present society to progress, while still considering the steps that were taken in the past to get to the present.

Just as Spotify helps me to discover new and old music, art databases help me to discover new cultures and new ways of thinking. I am able to learn about the artist, see the final product, and gain more insight into the piece. Both kinds of database help to support a constantly growing and changing society because they give society an opportunity to preserve the past, unaltered, as changes continue to develop. Essentially, databases can be snapshots of history, each documenting a different aspect of the human experience.

Week 4: Disparate Arrivals, Ellis Island and Slave Journeys

In Companion to Digital Humanities, Stephen Ramsay asserts that “The design of [databases] has been a mainstay of humanistic endeavor for centuries; the seeds of the modern computerized database being fully evident in the many text-based taxonomies and indexing systems which have been developed since the Middle Ages.” This analysis, coupled with Emory University’s Trans-Atlantic Slave Trade Database, got me thinking about the origins of my own family in California, and ultimately the United States.

Back in 2001, I took a trip to New York with my mom and dad on vacation. Unsurprisingly, we visited Ellis Island in hopes of discovering our long-lost relatives’ names on one of the ships’ manifests. If my memory serves correctly, I was able to find several passengers on my father’s side who emigrated from Ireland during the fin de siècle era.

After exploring the Emory database this weekend, I got curious not so much about the content of the registry at Ellis Island, but about how this data is displayed online today in comparison with that of the Trans-Atlantic Slave Trade Database. I registered myself as a user at http://www.libertyellisfoundation.org/. Obviously, both databases seek to tell the story of historical demographic shifts; however, the presentation of information in each reflects the starkly different stories told by the journeys recorded in the data. The Ellis Island database projects color, imagery, and an aura of excitement through its user interface, while the Emory database limits photos and presents African passengers in a much less human light than the stories highlighted in the former.

Screen Shot 2014-10-26 at 10.32.31 PM
Screen Shot 2014-10-26 at 10.57.31 PM

After I signed up, the Ellis Island database asked me to “Become a Member” by contributing just $50.00 per year, and adding a photo in order to “Honor [my] Ancestors, [my] Family and [myself].” The rhetoric of honoring my family and the very personal connection the database attempts to make with its audience contrasts starkly with that of the Emory database.

The very function of searching for someone by name, a crucial component of the stories and identities of many immigrants who entered the U.S. through Ellis Island during the early 20th century, is not even an option within the Trans-Atlantic Slave Trade Database. In the basic Ellis Island search, there isn’t even an input field other than first name, initial, or last name! I find it fascinating how today we can access and view information through digital databases that still overtly reflect the historical period or population on which they are based.

Screen Shot 2014-10-26 at 11.17.11 PM
Screen Shot 2014-10-26 at 11.18.06 PM


Databases at the Mall

When I went shopping this weekend with my sister, I didn’t expect to run into a situation that related to my digital humanities studies. After trying on a dress that was too small, I asked the woman in the store if they had the next size up for me to try. Instead of searching through hundreds of racks to find one particular item of clothing, all she had to do was walk over to the computer and scan the barcode on the dress, and up popped a screen that looked much like the image here:

deptstore2j

The woman located the item in the database and, from there, was able to tell me that the store was currently out of stock in that particular size, but that it was available to purchase at a few other specific malls close by. When I asked her how much longer the dress was in the larger size, she was able to expand the item’s details and tell me that exact information.


This store, like many department stores, uses a database to track all the items in its inventory. The rows list individual articles of clothing differentiated by an item number, while the columns contain data about the attributes of an article of clothing, such as its manufacturer, its department category, its product sub-category, its size, and its availability. Because the data uses a fixed vocabulary, it is easy to keep track of the same types of products without them being separated from each other. I believe this serves as an example of the Relational Model of database design, in which different data points can be related to one another and deleting one item does not delete entire sets of data about other items. The independence of one item from another is established by the primary key, “a unique value associated with each individual record in a table,” according to Ramsay. In the store’s database, the Item # serves as the primary key. When the barcode on a particular item of clothing is scanned, the system reads the code and matches it to the product listed in the database through its primary key.
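To make the idea of a primary key concrete, here is a minimal sketch in Python using the built-in sqlite3 module. I have no idea what software the store actually runs, so the table name, the column names, and the sample rows are all my own assumptions; the only thing taken from Ramsay is that each record gets a unique value.

```python
import sqlite3

# Hypothetical rows: (item_number, manufacturer, department, sub_category, size, in_stock)
SAMPLE_ITEMS = [
    (1001, "Acme Apparel", "Women", "Dresses", "S", 2),
    (1002, "Acme Apparel", "Women", "Dresses", "M", 0),
    (1003, "Northside Denim", "Men", "Jeans", "32", 5),
]


def build_inventory():
    """Create an in-memory inventory table keyed by item number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE inventory (
            item_number  INTEGER PRIMARY KEY,  -- unique value for each record
            manufacturer TEXT,
            department   TEXT,
            sub_category TEXT,
            size         TEXT,
            in_stock     INTEGER
        )
        """
    )
    conn.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ITEMS)
    return conn


def scan_barcode(conn, item_number):
    """Look up one record by its primary key, as a barcode scan might."""
    return conn.execute(
        "SELECT sub_category, size, in_stock FROM inventory WHERE item_number = ?",
        (item_number,),
    ).fetchone()


conn = build_inventory()
print(scan_barcode(conn, 1002))  # the size M dress, which has zero units in stock

# Deleting one record leaves every other record untouched.
conn.execute("DELETE FROM inventory WHERE item_number = ?", (1001,))
print(scan_barcode(conn, 1003))
```

Because the item number is declared as the primary key, the database would refuse a second row with the same number, which is what keeps one scan pointing at exactly one product.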


After seeing the use of a database in the clothing store, I noticed other databases in use throughout my day. The ordering system at a fast food restaurant, the Bruin Card scanner at the dining hall, my iTunes library, and internship search engines are just a few of the many databases I encounter on a day-to-day basis.


http://www.csharpkey.com/visualcsharp/adonet/forms/deptstore2j.gif

Disaster Prevention Through Database Utilization

The first sentence of David Kroenke’s piece, Database Concepts, provides the reader with a simple yet appropriate definition of what databases do: “help people keep track of things”. Database management systems lie at the core of all database systems and are responsible for keeping the database wheels turning. The databases themselves, self-describing “collections of related records,” hold any number of tables made up of like data that can be used later for analysis or simple reference. The number of instances in which databases can be useful is endless, and in today’s digital era they are becoming more crucial than ever.

An article I came across (the first source below) illustrated the increasing importance of databases tied to the exponential advances in technology. The recent outbreak of Ebola throughout West Africa has kept the world on its toes, with no one sure of where the next positive diagnosis might occur. Health specialists are in the process of developing effective methods to help track the geographical movement of the Ebola virus in order to prepare the areas that are in potential danger. The article stresses the key role that data and metadata collection and interpretation can play in the attempt to thwart the spread of the deadly disease. It specifically references Harvard’s HealthMap service, which gathers and analyzes millions of social media posts in order to track where potential “global disease outbreaks” are occurring in real time. This service uses a massive database to store information collected from around the world and processes all of it to geographically locate where certain key words associated with various diseases are appearing. Relief organizations are aiming to use similar technology to anticipate where diseases such as Ebola are heading, in order to be more prepared for immediate aid and support.

The collection of mass amounts of cell phone data, including both phone calls and text messages sent, has been one specific method of disease-related data collection that has provided promising results. Using the databases of cell providers, relief organizations have been able to see the patterns connected to disease outbreaks. Data collation and analysis by the cellular database management systems allow experts to see where high concentrations of emergency calls are being made, helping to pinpoint problem towns and regions. The article highlights how vital the development of large-scale data collection will be in the near future, in order to collect and sort through data in a timelier manner. If we are able to use databases, and the data and metadata they store, to their full potential, we will be able to track world disasters more efficiently and ensure that all those involved in helping the cause are up to date with what is occurring at all times.
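The idea of spotting “high concentrations of emergency calls” can be illustrated with a few lines of Python. This is only a toy sketch under my own assumptions: the article does not describe the actual analysis, and the record format, the town names, and the threshold below are all made up for illustration.

```python
from collections import Counter

# Hypothetical call records: (town, call_type). Real provider data would look different.
SAMPLE_CALLS = [
    ("Town A", "emergency"),
    ("Town A", "emergency"),
    ("Town A", "personal"),
    ("Town B", "emergency"),
    ("Town C", "personal"),
    ("Town A", "emergency"),
]


def emergency_counts(calls):
    """Count emergency calls per town, ignoring every other kind of call."""
    return Counter(town for town, call_type in calls if call_type == "emergency")


def hotspots(calls, threshold):
    """Return the towns whose emergency-call count meets the (assumed) threshold."""
    counts = emergency_counts(calls)
    return sorted(town for town, n in counts.items() if n >= threshold)


print(hotspots(SAMPLE_CALLS, threshold=2))
```

A real system would also have to account for population size and normal call volume, which this sketch ignores.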


Sources:

  1. Ebola Crisis and Big Data – http://recode.net/2014/10/24/the-ebola-crisis-and-where-big-data-can-help/
  2. David M. Kroenke and David J. Auer, Database Concepts (Upper Saddle
    River, N.J.: Pearson Prentice Hall, 2008), chapter one.


Infographics and Data Visualization

Many of us are familiar with how easily infographics let us pick up new knowledge. I know that when I encounter a hefty article online, I look to see if there are any visual aids to break up the monotony of reading through all that information or, better yet, an image that digests and summarizes the main point of the article.

Reading through the Data + Design online book, I began realizing just how much work goes into constructing an appealing infographic. In addition to actually making the graphic, there is so much thought that goes into how to effectively collect the data to display. The book dedicates pages, and even chapters, to describing some of the best modes of data collection for research; researchers have to be careful in the language they choose for questions and the format in which they present the questions and possible answers. Once the information has been collected, there is a long process of data cleaning and prepping (who knew data needed so much attention before going out into the big world!). This cleaning involves sorting out what is relevant to the research question or goal and figuring out a way to organize it nicely so that it can become the best visualization it can be.

Browsing through this lengthy process gave me so much more respect for the work that anonymous internet people do in making these informative images that are so easy to use and share. One of my favorite sites for really great infographics and visualized data is Brain Pickings. (The creator, Maria Popova, actually got a mention in the Data + Design foreword!) What I love about this site is that Popova does so much research on some very interesting topics, usually related to literature or art. In many of her articles she includes images from the books she is discussing, or hand-drawn visualizations of quotes from famous literary folk.

wendycamus

Reading a quote from someone is one thing, but having a way to interact with the author’s ideas in a new way gives it a level of engagement that continues to fascinate and bring me back to their words.

 sontagart

I recently came across an article on the Brain Pickings site about infographics and some key principles for keeping a visualization interesting and trustworthy. This article features designs that compile information on jazz musicians from the 1920s, junk emails, and the London Tube. This goes to show that data visualization does not always have to come in graph form or present information on groundbreaking sociopolitical topics. As with any presentation of data, though, there is the possibility that the data is incorrect or skewed in some way, which is always worth keeping in mind. With that said, here are two of my favorite images from this article, which organize two very different types of information – statistics about the world’s makeup and sleep habits of famous writers.

bai1

sleepproductivitywriters_500_2

Week 4: Data + Design: Surveys

survey-research-design

A lot of time and work is involved in developing a research project, accumulating the data, and then using that information to find clarity about the significance of the entire activity. Data + Design appeared to focus on describing the steps involved in a manner that would be easily understood and beneficial to the reader. While simple, the reading was very thorough and went into detail regarding data visualization, data organization, creating eloquent questions, homing in on the purpose of the research, and much more.

I found a lot of the material to be somewhat of a review, since I took a Sociology class that focused on quantitative research and an upper division stats course. Even so, I could see how helpful it would be for someone with no knowledge of such information to have all of it in one location. The review of measurements and research questions was a great refresher, and I enjoyed the explanations as to why these subtle aspects of research can be so significant. On the other hand, I found one area of the reading troubling because it did not describe its subject properly.

During the portion of the reading that described the various types of surveys that can be conducted, Ginette Law seemed to critically underplay the effectiveness of Administered Surveys. Although the author included pros and cons for each of the various forms of surveys, she did not go into any great length in describing them. Furthermore, she made it appear as if surveys conducted through the internet, over the phone, and by other indirect methods were just as viable and effective as Administered Surveys. Taking into account the information I was given in my previous classes, I would have to disagree and say that Administered Surveys are one of the best options for acquiring unbiased and diverse data.

For example, if a researcher is looking to sample a diverse population, the internet would not be a good way to go about it. Not only would certain ages (very young and very old) be unlikely to participate in the survey or poll, but there is also a high likelihood of bias. Say the Fox website displays a poll for viewers to participate in. Not only would an age group be neglected, but the results would most likely lean towards a right-wing, conservative view. This is because Fox happens to be a right-wing network, and most likely its audience is as well. Therefore, this would not be a proper method of acquiring data from a diverse population from which to draw any sort of conclusion.

While this is just one example, the other indirect survey methods run a great risk of biased results as well. All in all, I do understand that Law was trying to provide the reader with various ways to conduct research, but I feel that the importance of obtaining unskewed and accurate results cannot be stressed enough. It is especially important when the researcher intends to form statements and conclusions from the acquired data. Overall, I did enjoy the reading and found it to be very informative, but I thought that the survey portion could have been improved upon.


Sources: https://infoactive.co/data-design/titlepage01.html

Week Four: From Data to Database

Screen Shot 2014-10-26 at 9.33.15 PM

The most applicable and clear example of a database system I can think of from personal experience is UCLA’s Degree Audit Reporting System (DARS). DARS is “a document that evaluates your progress toward meeting UCLA graduation requirements in your major.” The system is “a critical tool you will use to select classes and plot your academic course” (admission.ucla.edu). From much first-hand experience, I can say that DARS is definitely a well-configured database system that presumably includes all four of the components Kroenke lists within his definition of database systems in Database Concepts: the database, database management system (DBMS), database application, and users.

The DARS database, a “self-describing collection of related records” (13), contains every course at UCLA. The records are most likely organized into tables labeled something like “Undergraduate General Education Courses”, “Lower Division Major Courses”, “Upper Division Major Courses”, etc. These are the bits of data that are pulled to create the audit report.

The most complicated element of any database system is the database management system (DBMS), which manages the database’s “related tables” and the rest of the system’s configuration. The DBMS is a complex computer program that “receives requests encoded in SQL (Structured Query Language) and translates those requests into actions on the database” (12). I was not surprised to learn that the companies that use database systems almost never write the DBMS programs; they almost always obtain them from an outside software vendor. Therefore, UCLA most likely did not write its own DBMS program. I tried to find out which company’s DBMS the university uses for DARS, but was unsuccessful.

Next, the application program has three functions. First, it creates and processes forms. Next, it processes user queries, meaning it responds to a user who needs to find a piece of information. Lastly, it formats the results of the user’s request as a report (16). This process is very clear-cut with regard to DARS. As a student user, I inquire about my current progress with my courses. I click a few options, including my expected graduation date, major, and minor, and nearly immediately am presented with a formatted report. It is clear that the application program is calling upon the related tables, by way of the DBMS, to determine what I have and have not completed thus far in my enrollment at UCLA.

Lastly, as the user, I am the final component of this database system. What is the point in making such a complex system? It seems so simple as I enter a few requisites that I sometimes take for granted how calculated and detailed DARS really is. Sure, one could make a list of all the courses at UCLA and simply check off the ones I have completed. However, in order to supply me with correct information, DARS employs much more contingent data, i.e. the courses I need to complete my major, minor, etc. Kroenke concludes his overview of database systems by explaining why we have database systems in the first place: “The purpose of a database is to help people keep track of things. Lists can be used for this purpose, but if a list involves more than one topic, problems occur when data are inserted, updated, or deleted” (19). I cannot imagine how difficult it would be to keep track of my progress (including inserting completed courses, updating my minor, etc.) without the advent of a database like DARS.
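To picture how related tables could produce an audit, here is a rough sketch in Python with the built-in sqlite3 module. It is purely a guess on my part: I do not know how DARS is actually built, and the table names, the course numbers, and the “DH minor” program are placeholders I invented. The point is only that courses, requirements, and completions each live in their own table, so updating one topic does not disturb the others.

```python
import sqlite3


def build_audit_db():
    """Create three small related tables: courses, requirements, completions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE course (
            course_id TEXT PRIMARY KEY,
            title     TEXT
        );
        CREATE TABLE requirement (
            program   TEXT,
            course_id TEXT REFERENCES course(course_id),
            PRIMARY KEY (program, course_id)
        );
        CREATE TABLE completed (
            student_id INTEGER,
            course_id  TEXT REFERENCES course(course_id),
            PRIMARY KEY (student_id, course_id)
        );
        -- Placeholder data, not real catalog entries.
        INSERT INTO course VALUES
            ('DH 101', 'Introduction to Digital Humanities'),
            ('DH 150', 'Advanced Topics'),
            ('STATS 10', 'Introduction to Statistics');
        INSERT INTO requirement VALUES
            ('DH minor', 'DH 101'),
            ('DH minor', 'DH 150');
        INSERT INTO completed VALUES
            (1, 'DH 101'),
            (1, 'STATS 10');
        """
    )
    return conn


def remaining_courses(conn, student_id, program):
    """List required courses for a program that the student has not completed."""
    rows = conn.execute(
        """
        SELECT r.course_id
        FROM requirement AS r
        WHERE r.program = ?
          AND r.course_id NOT IN (
              SELECT course_id FROM completed WHERE student_id = ?
          )
        ORDER BY r.course_id
        """,
        (program, student_id),
    ).fetchall()
    return [course_id for (course_id,) in rows]


conn = build_audit_db()
print(remaining_courses(conn, student_id=1, program="DH minor"))
```

Recording a newly finished course is then a single insert into the completed table, and the next audit reflects it without anyone editing the course list or the requirements.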

Everyday Databases

blog post 3

Last week, I was sitting in my apartment talking to my roommates about future jobs and why our friends who graduated last year all migrated north to San Francisco. We brought up one of our friends, Danny, who recently got a job with Google. We were looking through some questions Google asks in their interviews, and one was: “How would you explain a database in three sentences to your eight year old nephew?” An answer we found online and really enjoyed was:

“A database is a machine that remembers lots of information about lots of things. People use them to help remember that information. Go play outside.”

The point of this question is to see if the applicant can take a complex idea and translate it into simple language. While reading through “Databases,” by Stephen Ramsay, I realized it might actually take a lot more than three sentences to accurately portray the importance of databases. Ramsay defines the purpose of databases as “to store information about a particular domain,” and to give one the capability to “ask questions about the state of the domain.” The Relational Model, Ramsay notes, establishes relationships between individual data points, as opposed to just storing these sets of data. Under the header ‘database design,’ Ramsay uses American novels as the subject for his fictitious database. With the use of primary and foreign keys, links are formed between the various data points, which point the user toward the desired output.

I was scrolling through my iTunes this morning and noticed how it acted as a database for all my music, and how it could be categorized in various ways: song title, artist, genre, etc. Shortly after, I headed over to Trader Joe’s to pick up some groceries and realized that, as they scanned every item, the price was being looked up in a database based on the Universal Product Code. The UPC is the number encoded in the barcodes that stores use to track their items. I learned that every time I make a phone call, the caller ID information has to be retrieved from some sort of database. Even most of our cars have a little database inside that makes the light come on when it’s time to ‘Check Engine.’ These databases make our society function, and it’s hard to imagine how everyday life would run without their assistance.

Works Cited:

Business Insider

Stephen Ramsay: Databases

iTunes data

momouse
my library

The above image is a screenshot of my iTunes library. While the first two chapters of Kroenke’s book Database Concepts included many examples for each new vocabulary word, the iTunes library is a little easier for me to relate to. Each row contains data about one song entry. The columns contain data about attributes of that entity, such as who the artist is, how long the track is, or what genre it belongs to. When I download music, sometimes there are null values. Some musicians, like Beck, cannot fit into any particular genre because their body of work is so diverse and crosses many genres. This sometimes complicates things because it makes their work harder to find and classify. As of now, iTunes still does not have the ability to list something under two genres, and I’m sure music enthusiasts and DJs would really appreciate such a feature. The first chapter explains how tables can be more useful than lists: deleting an entry from a single list where everything is linked can destroy other information along with it, whereas a few related tables can be reconfigured and still maintain the data that is important. The text Databases by Stephen Ramsay states that “humanist inquiry reveals itself as an activity fundamentally dependent upon the location of pattern. Dealing with pattern necessarily implies the cultivation of certain habits of seeing; as one critic has averred ‘Recognizing a pattern implies remaining open to gatherings, groupings, clusters, repetitions, and responding to the internal and external relations they set up’.” With this being said, I feel like iTunes has already recognized many patterns but could always make room to improve.

iTunes works like a relational database table because you can look at data from different perspectives and see how things relate. For example, if the Rolling Stones are listed under “Rock n’ Roll” and you want to change the genre to “Rock,” you can do it in a way that does not delete all the other information. By making sure you use a restricted or set vocabulary, you make it easier to find different types of music. Because much of the vocabulary was so new to me, I am not sure what the SQL (Structured Query Language) of iTunes might be, and I am not sure what the normalization process for it might be. But I think the primary key could be the album, because then genre and artist are linked to it and have to agree to be part of the same album, though I am not completely sure. If anyone has any ideas about what the keys in iTunes might be, let me know in your comments.
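Here is one way the genre example could work if the library were stored in two related tables. This is a sketch built on my own assumptions, written in Python with the built-in sqlite3 module; I do not know how iTunes actually stores its data, and the table layout and the choice of a numeric track ID as the primary key are hypothetical.

```python
import sqlite3


def build_library():
    """Create a tiny music library with genres kept in their own table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE genre (
            genre_id INTEGER PRIMARY KEY,
            name     TEXT UNIQUE              -- restricted vocabulary: one row per genre
        );
        CREATE TABLE track (
            track_id INTEGER PRIMARY KEY,
            title    TEXT,
            artist   TEXT,
            genre_id INTEGER REFERENCES genre(genre_id)  -- NULL when no genre is set
        );
        """
    )
    conn.execute("INSERT INTO genre VALUES (1, ?)", ("Rock n' Roll",))
    conn.executemany(
        "INSERT INTO track VALUES (?, ?, ?, ?)",
        [
            (1, "Paint It Black", "The Rolling Stones", 1),
            (2, "Gimme Shelter", "The Rolling Stones", 1),
            (3, "Loser", "Beck", None),
        ],
    )
    return conn


def rename_genre(conn, old_name, new_name):
    """Change a genre label in one place; track rows are not touched."""
    conn.execute("UPDATE genre SET name = ? WHERE name = ?", (new_name, old_name))


def tracks_with_genre(conn):
    """Return (title, genre name) pairs, keeping tracks that have no genre."""
    return conn.execute(
        """
        SELECT t.title, g.name
        FROM track AS t
        LEFT JOIN genre AS g ON t.genre_id = g.genre_id
        ORDER BY t.track_id
        """
    ).fetchall()


conn = build_library()
rename_genre(conn, "Rock n' Roll", "Rock")
print(tracks_with_genre(conn))
```

Because each track only stores a reference to the genre, renaming the genre changes one row and every linked song follows along, while the Beck track keeps its null value.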

Databases and the Study of Stuff

http://www.abebooks.com/Maadi-Vol-Predynastic-Cemeteries-Wadi-Digla/9410760608/bd

  • Stephen Ramsay,  “Databases,” in Companion to Digital Humanities, edited by Susan Schreibman, Ray Siemens, and John Unsworth (Oxford: Blackwell Publishing Professional, 2004)

Archaeology is essentially the study of stuff – material culture remains, or artifacts, are studied in various ways to extrapolate information about a wider extinct society. Certain case studies in archaeology are incredibly well suited to being organized and then further examined with the use of a database.

Stephen Ramsay defines the purpose of a database (especially in the relational sense): “to store information about a particular domain, and to allow one to ask questions about the state of that domain.” He emphasizes that the particular usefulness of the Relational Model in database design is in the language; instead of simply storing large amounts of data, the Relational Model allows interaction between the individual data points. The example he uses is a database of American novels, and he demonstrates how primary and foreign keys can provide links between the data points. Almost like a game of bingo, the primary and foreign keys allow one to relate information across categories. For example, assuming the bingo call is B7, B would be the primary key of “Author,” and 7 the foreign key “name of work”; thus a search for B7 would tell us that Mark Twain had written Tom Sawyer.
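For readers who prefer to see the keys at work, here is a small sketch in Python using the built-in sqlite3 module. It is loosely modeled on the American novels example, but the table names, column names, ID numbers, and sample rows are my own assumptions rather than Ramsay’s actual schema. The author’s ID is the primary key in one table and reappears as a foreign key in the other, and that shared value is what links the two.

```python
import sqlite3


def build_novels_db():
    """Create an authors table and a works table linked by author_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (
            author_id  INTEGER PRIMARY KEY,   -- primary key: one value per author
            last_name  TEXT,
            first_name TEXT
        );
        CREATE TABLE works (
            work_id   INTEGER PRIMARY KEY,
            title     TEXT,
            author_id INTEGER REFERENCES authors(author_id)  -- foreign key
        );
        INSERT INTO authors VALUES
            (1, 'Twain', 'Mark'),
            (2, 'Melville', 'Herman');
        INSERT INTO works VALUES
            (1, 'Tom Sawyer', 1),
            (2, 'Huckleberry Finn', 1),
            (3, 'Moby-Dick', 2);
        """
    )
    return conn


def works_by(conn, last_name):
    """Follow the foreign key from works back to one author."""
    rows = conn.execute(
        """
        SELECT w.title
        FROM works AS w
        JOIN authors AS a ON w.author_id = a.author_id
        WHERE a.last_name = ?
        ORDER BY w.work_id
        """,
        (last_name,),
    ).fetchall()
    return [title for (title,) in rows]


def author_of(conn, title):
    """Go the other way: from a title to the author who wrote it."""
    return conn.execute(
        """
        SELECT a.first_name, a.last_name
        FROM authors AS a
        JOIN works AS w ON w.author_id = a.author_id
        WHERE w.title = ?
        """,
        (title,),
    ).fetchone()


conn = build_novels_db()
print(author_of(conn, "Tom Sawyer"))
print(works_by(conn, "Twain"))
```

The author’s name is stored only once, so a correction to it would show up for every one of that author’s works.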

This type of relational interaction of data can be extremely useful in the study of settlement layout and function, or even for mortuary archaeology. Imagine you have uncovered a graveyard with over 100 individual burials (which is actually a very modest data set). Within each burial you have specific data points such as the sex of the deceased, approximate age, health, location of the grave, and contents (did the person have burial goods? If so, what, and how many of each type?). By inputting all of the information into a Relational Model database, the investigator can begin to draw comparisons between relative wealth or status (quantity and quality of burial goods) and the age or sex of the individual. A pattern in these types of correlations can begin to elucidate the mechanisms of social hierarchy and status within a society, whether status is achieved or inherited (finding infant graves with a lot of wealth is a great example of inherited status), how the society works in terms of gender roles, etc.
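As a rough illustration of this kind of comparison, here is a sketch in Python with the built-in sqlite3 module. Everything in it is hypothetical: the burial table, its columns, and the five sample graves are invented for the example, and a real mortuary database would record far more than a simple count of goods.

```python
import sqlite3

# Hypothetical burials: (burial_id, sex, age_class, goods_count)
SAMPLE_BURIALS = [
    (1, "F", "adult", 3),
    (2, "M", "adult", 1),
    (3, "F", "infant", 8),
    (4, "M", "infant", 0),
    (5, "F", "adult", 5),
]

# Only these columns may be used for grouping, since the name is placed into the SQL text.
ALLOWED_GROUPS = {"sex", "age_class"}


def build_cemetery_db():
    """Create an in-memory table with one row per burial."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE burial (
            burial_id   INTEGER PRIMARY KEY,
            sex         TEXT,
            age_class   TEXT,
            goods_count INTEGER
        )
        """
    )
    conn.executemany("INSERT INTO burial VALUES (?, ?, ?, ?)", SAMPLE_BURIALS)
    return conn


def average_goods_by(conn, column):
    """Average number of burial goods for each value of the chosen column."""
    if column not in ALLOWED_GROUPS:
        raise ValueError(f"cannot group by {column!r}")
    rows = conn.execute(
        f"SELECT {column}, AVG(goods_count) FROM burial GROUP BY {column} ORDER BY {column}"
    ).fetchall()
    return dict(rows)


conn = build_cemetery_db()
print(average_goods_by(conn, "age_class"))
print(average_goods_by(conn, "sex"))
```

With only five invented graves the averages mean nothing archaeologically, but the same query over a full cemetery is the kind of comparison that could hint at whether wealth follows age, sex, or neither.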

These databases can also produce a picture of the larger society by relating the location of artifact finds in a settlement site to their function. For instance, if a database search demonstrates that there was a high occurrence of food waste materials in a certain location, it may have been a cooking area. This can then be cross-referenced with the location of any ovens or firepits at the site to further the argument.