Week 4: Data + Design: Surveys

Developing a research project, gathering the data, and then using that data to make sense of the whole undertaking takes a great deal of time and work.  Data + Design sets out to describe the steps involved in a way that is easy to understand and useful to the reader.  While simple, the reading was thorough and went into detail on data visualization, data organization, writing well-worded questions, homing in on the purpose of the research, and much more.

I found a lot of the material to be somewhat of a review, since I took a Sociology class that focused on quantitative research and an upper division stats course.  I could see how helpful it would be for someone with no background in the subject to have all of this information in one place.  Nonetheless, the review of measurements and research questions was a great refresher, and I enjoyed the explanations of why these subtle aspects of research can be so significant.  On the other hand, I found one area of the reading troubling because it did not describe its subject adequately.

In the portion of the reading that described the various types of surveys that can be conducted, Ginette Law seemed to seriously underplay the effectiveness of Administered Surveys.  Although the author included pros and cons for each form of survey, she did not describe them at any great length.  Furthermore, she made it appear as if surveys conducted through the internet, over the phone, and by other indirect methods were just as viable and effective as Administered Surveys.  Taking into account the information I was given in my previous classes, I would have to disagree and say that Administered Surveys are one of the best options for acquiring unbiased and diverse data.

For example, if a researcher is looking to sample a diverse population, the internet would not be a good way to go about it.  Not only would certain ages (very young and very old) be unlikely to participate in the survey/poll, but there is also a high likelihood of bias.  Say the Fox website displays a poll for its viewers to participate in.  Not only would an age group be neglected, but the results would most likely lean towards a right-wing, conservative view.  This is because Fox happens to be a right-wing network, and most likely its audience is as well.  Therefore, this would not be a proper method for acquiring data from a diverse population to draw any sort of conclusion from.
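To make the point concrete, here is a small arithmetic sketch of how a skewed pool of respondents shifts a poll result. All of the numbers and group sizes are invented for illustration; they do not come from the reading or from any real poll.

```python
def poll_estimate(groups):
    """Share of 'yes' answers, given (respondents, yes_answers) per group."""
    total = sum(respondents for respondents, _ in groups)
    yes = sum(yes_answers for _, yes_answers in groups)
    return yes / total

# Invented numbers: two equally sized groups in the full population.
# Group one answers 'yes' 30% of the time, group two 70% of the time.
population = [(500, 150), (500, 350)]

# A website whose audience is 90% group one, with the same answer rates.
website_sample = [(900, 270), (100, 70)]

print(poll_estimate(population))      # balanced sample
print(poll_estimate(website_sample))  # skewed sample
```

Nobody in either group changed their opinion between the two estimates. Only the mix of who answered changed, which is why a self-selected web audience can mislead.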

While this is just one example, the other indirect survey methods run a great risk of biased results as well.  All in all, I do understand that Law was trying to provide the reader with various ways to conduct research, but I feel that the importance of obtaining unskewed and accurate results cannot be stressed enough.  It is especially important when the researcher intends to draw statements and conclusions from the acquired data.  Overall, I did enjoy the reading and found it very informative; I just thought that the survey portion could have been improved.

 

 

Sources: https://infoactive.co/data-design/titlepage01.html

Week Four: From Data to Database

The clearest and most applicable example of a database system I can think of from personal experience is UCLA’s Degree Audit Reporting System (DARS). DARS is “a document that evaluates your progress toward meeting UCLA graduation requirements in your major.” The system is “a critical tool you will use to select classes and plot your academic course” (admission.ucla.edu). From first-hand experience, DARS is a well-configured database system that presumably includes all four of the components Kroenke lists in his definition of database systems in Database Concepts: the database, the database management system (DBMS), the database application, and the users.

The database of DARS, a “self-describing collection of related records” (13), contains every course at UCLA.  The records are most likely organized into tables labeled something like “Undergraduate General Education Courses”, “Lower Division Major Courses”, “Upper Division Major Courses”, etc. These are the bits of data that are pulled to create the audit report.

The most complicated element of any database system, the database management system, manages the “related tables” and other structures of the database. The DBMS is a complex computer program that “receives requests encoded in SQL (Structured Query Language) and translates those requests into actions on the database” (12). I was not surprised to learn that the companies that use database systems almost never write their own DBMS programs. They almost always obtain them from an outside software vendor. Therefore, UCLA most likely did not write its own DBMS program. I tried to find out which company’s DBMS the university uses for DARS, but was unsuccessful.

Next, the application program has three functions. First, it creates and processes forms. Second, it processes user queries, meaning it responds to a user who needs to find a piece of information. Lastly, it formats the results of the user’s request as a report (16). This process is very clear-cut in the case of DARS. As a student user, I inquire about my current progress with my courses. I click a few options, including my expected graduation date, major, and minor, and almost immediately I am presented with a formatted report. It is clear that the application program is calling, through the DBMS, upon the related tables in the database to determine what I have and have not completed thus far in my enrollment at UCLA.
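To picture what that query-then-report step could look like, here is a minimal sketch using Python's built-in sqlite3 module. I could not find out how DARS is actually built, so the table names, column names, and course rows are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical schema: none of these names come from DARS itself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requirement (course TEXT PRIMARY KEY, category TEXT);
    CREATE TABLE completed (student_id INTEGER, course TEXT);
    INSERT INTO requirement VALUES
        ('STATS 10', 'Lower Division Major Courses'),
        ('DH 101', 'Upper Division Major Courses'),
        ('DH 150', 'Upper Division Major Courses');
    INSERT INTO completed VALUES (1, 'STATS 10'), (1, 'DH 101');
""")

def audit(student_id):
    """Query step: return (course, category, status) rows for one student."""
    rows = conn.execute("""
        SELECT r.course, r.category,
               CASE WHEN c.course IS NULL THEN 'remaining' ELSE 'done' END
        FROM requirement AS r
        LEFT JOIN completed AS c
               ON c.course = r.course AND c.student_id = ?
        ORDER BY r.course
    """, (student_id,))
    return rows.fetchall()

# Report step: format the query results for the user.
for course, category, status in audit(1):
    print(f"{course:10} {category:30} {status}")
```

The LEFT JOIN keeps every requirement in the report even when the student has no matching completed course, which is what lets the audit show what is still missing.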

Lastly, as the user, I am the final component of this database system. What is the point of making such a complex system? Entering a few criteria is so simple that I sometimes take for granted how calculated and detailed DARS really is. Sure, one could make a list of all the courses at UCLA and simply check off the ones I have completed. However, in order to supply me with correct information, DARS draws on much more interdependent data, i.e. the courses I need to complete my major, minor, etc. Kroenke concludes his overview of database systems by explaining why we have database systems at all: “The purpose of a database is to help people keep track of things. Lists can be used for this purpose, but if a list involves more than one topic, problems occur when data are inserted, updated, or deleted” (19). I cannot imagine how difficult it would be to keep track of my progress (including inserting completed courses, updating my minor, etc.) without a database like DARS.

Everyday Databases

Last week, I was sitting in my apartment talking to my roommates about future jobs and why our friends who graduated last year all migrated north to San Francisco. We brought up one of our friends, Danny, who recently got a job with Google. We were looking through some questions Google asks in their interviews, and one was: “How would you explain a database in three sentences to your eight year old nephew?” An answer we found online and really enjoyed was:

“A database is a machine that remembers lots of information about lots of things. People use them to help remember that information. Go play outside.”

The point of this question is to see if the applicant can take a complex idea and translate it into simple language. While reading through “Databases,” by Stephen Ramsay, I realized it might actually take a lot more than three sentences to accurately portray the importance of databases. Ramsay defines the purpose of databases as “to store information about a particular domain,” while giving one the capability to “ask questions about the state of the domain.” The Relational Model, Ramsay notes, captures the relationships between individual data points, as opposed to just storing sets of data. Under the header ‘database design,’ Ramsay uses American novels as the subject for his fictitious database. With the use of primary and foreign keys, links are formed between the various data points, which point the user toward the desired output.
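Here is a rough sketch of how primary and foreign keys link data points, written with Python's built-in sqlite3 module. The table and column names are my own guesses and are not Ramsay's actual schema; the two sample rows are just well-known Mark Twain novels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- author_id is the primary key: one unique value per author.
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        name TEXT
    );
    -- novel.author_id is a foreign key: it points back at an author row.
    CREATE TABLE novel (
        novel_id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(author_id)
    );
    INSERT INTO author VALUES (1, 'Mark Twain');
    INSERT INTO novel VALUES
        (1, 'The Adventures of Tom Sawyer', 1),
        (2, 'Adventures of Huckleberry Finn', 1);
""")

def novels_by(name):
    """Follow the foreign key from novel to author to answer a question."""
    rows = conn.execute("""
        SELECT novel.title FROM novel
        JOIN author ON author.author_id = novel.author_id
        WHERE author.name = ?
        ORDER BY novel.title
    """, (name,))
    return [title for (title,) in rows]

print(novels_by("Mark Twain"))
```

The author's name is stored once, and each novel only carries the number that points to it. That link is what lets the database answer a question instead of just holding two separate lists.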

I was scrolling through my iTunes this morning and noticed how it acts as a database for all my music, which can be categorized in various ways: song title, artist, genre, etc. Shortly after, I headed over to Trader Joe’s to pick up some groceries and realized that as they scanned every item, the price was being looked up in a database keyed on the Universal Product Code. The UPC is the barcode that stores use to track their items. I learned that every time I make a phone call, the caller ID information has to be retrieved from some sort of database. Even most of our cars have a little database inside that makes the light come on when it’s time to ‘Check Engine.’  These databases make our society function, and it’s hard to imagine how everyday life would run without their assistance.

Works Cited:

Business Insider

Stephen Ramsay: Databases

iTunes data

momouse
my library

The above image is a screenshot of my iTunes library. While the first two chapters of Kroenke’s book Database Concepts included many examples for each new vocabulary word, the iTunes library is a little easier for me to relate to. Each row contains data about one song, which is the entity. Each column contains data about one attribute of that entity, such as the artist, the length of the track, or the genre it belongs to. When I download music, sometimes there are null values. Some musicians, like Beck, do not fit into any particular genre because their body of work is so diverse and crosses many genres. This complicates things because it makes their work harder to find and classify. As of now, iTunes still cannot list a song under two genres, and I’m sure music enthusiasts and DJs would really appreciate such a feature. The first chapter explains how tables can be more useful than lists: deleting an entry from a single list where everything is linked can erase information that is still needed, whereas a few related tables can be reconfigured and still keep the data that is important. The text “Databases” by Stephen Ramsay states that “humanist inquiry reveals itself as an activity fundamentally dependent upon the location of pattern. Dealing with pattern necessarily implies the cultivation of certain habits of seeing; as one critic has averred, ‘Recognizing a pattern implies remaining open to gatherings, groupings, clusters, repetitions, and responding to the internal and external relations they set up.’” With this being said, I feel like iTunes has already recognized many patterns but could always make room to improve.

iTunes behaves like a relational database because you can look at the data from different perspectives and the pieces relate to one another in defined ways. For example, if the Rolling Stones are listed under “Rock n’ Roll” and you want to change the genre to “Rock,” you can do it without deleting any of the other information. By making sure you use a restricted or set vocabulary, you can make it easier to find different types of music. Because much of the vocabulary was so new to me, I am not sure what SQL (Structured Query Language) iTunes might use, and I am not sure what the normalization process for it might be. But I think the primary key could be the album, because then genre and artist are linked to it and have to agree to be part of the same album, though I am not completely sure. If anyone has any ideas about what the keys in iTunes might be, let me know in the comments.
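As a guess at what a restricted genre vocabulary could look like in relational form, here is a small sketch with Python's built-in sqlite3 module. This is not how iTunes actually stores its library (I do not know that), and the table names, keys, and sample songs are only assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The genre table is the restricted vocabulary: each name appears once.
    CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE song (
        song_id INTEGER PRIMARY KEY,
        title TEXT,
        artist TEXT,
        genre_id INTEGER REFERENCES genre(genre_id)  -- NULL when unknown
    );
    INSERT INTO genre VALUES (1, 'Rock n'' Roll');
    INSERT INTO song VALUES
        (1, 'Paint It Black', 'The Rolling Stones', 1),
        (2, 'Gimme Shelter', 'The Rolling Stones', 1),
        (3, 'Loser', 'Beck', NULL);
""")

# Renaming the genre changes one row; no song rows are touched or deleted.
conn.execute("UPDATE genre SET name = ? WHERE name = ?",
             ("Rock", "Rock n' Roll"))

def songs_with_genre():
    """List every song with its genre name (None for a null genre)."""
    rows = conn.execute("""
        SELECT song.title, genre.name FROM song
        LEFT JOIN genre ON genre.genre_id = song.genre_id
        ORDER BY song.song_id
    """)
    return rows.fetchall()

print(songs_with_genre())
```

In this sketch the primary key is a per-song id rather than the album, which is only one possible design. Listing a song under two genres would need a third table linking songs to genres, which may be why that feature is harder than it sounds.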

Databases and the Study of Stuff

http://www.abebooks.com/Maadi-Vol-Predynastic-Cemeteries-Wadi-Digla/9410760608/bd

  • Stephen Ramsay,  “Databases,” in Companion to Digital Humanities, edited by Susan Schreibman, Ray Siemens, and John Unsworth (Oxford: Blackwell Publishing Professional, 2004)

Archaeology is essentially the study of stuff – material culture remains, or artifacts, are studied in various ways to extrapolate information about a wider extinct society. Certain case studies in archaeology are incredibly well suited to being organized and then further examined with the use of a database.

Stephen Ramsay defines the purpose of a database (especially in the relational sense): “to store information about a particular domain, and to allow one to ask questions about the state of that domain.” He emphasizes that the particular usefulness of the Relational Model in database design is in the language; instead of simply storing large amounts of data, the Relational Model allows interaction between the individual data points. The example he uses is a database of American novels, and he demonstrates how primary and foreign keys can provide links between the data points. Almost like a game of bingo, the primary and foreign keys allow one to relate information across categories: for example, assuming the bingo call is B7, B would correspond to the primary key for “Author” and 7 to the foreign key for “name of work,” so a search for B7 would tell us that Mark Twain wrote Tom Sawyer.

This type of relational interaction of data can be extremely useful in the study of settlement layout and function, or even for mortuary archaeology. Imagine you have uncovered a graveyard with over 100 individual burials (which is actually a very modest data set). Within each burial you have specific data points such as the sex of the deceased, approximate age, health, location of the grave, and contents (did the person have burial goods? If so, what, and how many of each type?). By entering all of the information into a Relational Model database, the investigator can begin to draw comparisons between relative wealth or status (quantity/quality of burial goods) and the age or sex of the individual. A pattern in these types of correlations can begin to elucidate the mechanisms of social hierarchy and status within a society, whether status is achieved or inherited (finding infant graves with a lot of wealth is a great example of inherited status), how the society works in terms of gender roles, etc.
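A comparison like this could be sketched as follows, using Python's built-in sqlite3 module. The schema and the three sample burials are entirely invented for illustration and do not describe any real excavation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE burial (
        burial_id INTEGER PRIMARY KEY,
        sex TEXT,
        age_group TEXT
    );
    -- Each grave good points back to the burial it was found in.
    CREATE TABLE grave_good (
        good_id INTEGER PRIMARY KEY,
        burial_id INTEGER REFERENCES burial(burial_id),
        kind TEXT,
        quantity INTEGER
    );
    INSERT INTO burial VALUES
        (1, 'F', 'adult'), (2, 'M', 'adult'), (3, 'unknown', 'infant');
    INSERT INTO grave_good VALUES
        (1, 1, 'pottery', 2),
        (2, 3, 'beads', 40),
        (3, 3, 'pottery', 1);
""")

def goods_by_age_group():
    """Return (age_group, number_of_burials, total_goods) rows."""
    rows = conn.execute("""
        SELECT b.age_group,
               COUNT(DISTINCT b.burial_id),
               COALESCE(SUM(g.quantity), 0)
        FROM burial AS b
        LEFT JOIN grave_good AS g ON g.burial_id = b.burial_id
        GROUP BY b.age_group
        ORDER BY b.age_group
    """)
    return rows.fetchall()

print(goods_by_age_group())
```

With this made-up data the single infant burial holds far more goods than the two adult burials combined, which is the kind of pattern that would prompt a closer look at inherited status. Swapping age_group for sex in the query gives the gender comparison.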

These databases can also produce a picture of the larger society by relating the location of artifact finds in a settlement site to their function. For instance, if a database search demonstrates that there was a high occurrence of food waste materials in a certain location, it may have been a cooking area. This can then be cross-referenced with the location of any ovens or firepits at the site to further the argument.

Week 4: Databases-The Push of a Button

This morning, I turned on my laptop, went to my folder labeled “Fall 2014,” clicked on another folder, “Astronomy 3,” and finally opened a Word document named “Ast 3 Lec 10-24-14” to review my notes from the last Astronomy 3 lecture that I went to. It’s so easy for the average person to create a database, no matter how small, and store their files on their computer. The internet makes this even easier by letting us post texts, pictures, videos, etc. in an open space where others can view them with the click of a mouse.

Databases are helpful in many ways, especially when it comes to immortalizing historical texts. In “Computer Databases and Aboriginal Knowledge,” Michael Christie talks about the Larrakia, an Aboriginal people from Darwin in Australia’s Northern Territory, and how some of their women want to put their elders’ knowledge into a database so that the younger generations can have that information even after the elders pass. Young people are now constantly on their devices, whether a phone, laptop, or tablet, and are constantly being fed information through websites like the Yahoo home page, social networks, and plain research, so it makes perfect sense to put the Aboriginal elders’ knowledge in databases that the younger generations can access.

Databases are essentially virtual archives. The word “archive” can be traced to the Greek word “Arkhe,” which names both the “commencement” and the “commandment,” as described in Jacques Derrida’s Archive Fever: A Freudian Impression.  As the commencement, archives describe nature and history as the origin where things commence. As the commandment, they show how men command, because archives are a man-made creation. When looking at archives with these two principles in mind, we can truly appreciate their importance and creation. By applying this to the Aboriginal people from Darwin, we can see the significance of entering their elders’ knowledge into a database.

The Aboriginal elders can create a database for their children and grandchildren, and with that they have done their job of bringing them the information. The old saying “You can lead a horse to water but you can’t make him drink” comes into play here. If the children do not show interest in using the database to learn about their history and culture, then that is their choice. The real magic happens when someone reads the information and uses it or tells someone else about it. Text is nothing more than text until someone reads it. At that point, it becomes knowledge that someone can use and teach. Databases are great because they offer us information that we can tell others about.

Works Cited

http://www.cdu.edu.au/centres/ik/pdf/CompDatAbKnow.pdf

http://books.google.com/books?hl=en&lr=&id=6KNJmNkE11UC&oi=fnd&pg=PA4&dq=archive+fever&ots=lrKZ2mSmXe&sig=CIde5g-wdIhhSFQIxgXcj9aZzWE#v=onepage&q=archive%20fever&f=false

Week 4: Wardrobes and Relational Databases

This chart is an example wardrobe plan from the book New Image for Men: Color and Wardrobe by Marge Swenson and Gerrie Pinckney (published in 1983). It shows all of the pieces of an imaginary wardrobe and puts them into categories according to level of formality and type of clothing. There are pieces for business, dress, and casual wear, and they include suits, sport coats, shirts, pants, ties, jewelry, belts, shoes, socks, sweaters, and jackets/coats. Each piece has attributes such as color, pattern, and material. Also, there are a number of pieces for each type of clothing at each level of formality, such as five shirts and three ties that match a suit for dress wear. This plan results in a flexible, efficient wardrobe that makes it easy to put outfits together and avoids extraneous or redundant pieces that clutter up your closet.

In the chapter “Databases,” Stephen Ramsay discusses relational databases, which are based on the idea that a database can be “a set of relations.” If all of the outfits that you can put together make up a database, a wardrobe plan is analogous to a database design. A simple, old-fashioned tabular database would mean that each piece in an outfit is only used for that outfit. If you had 18 outfits that included black Louboutin pumps, you would actually have 18 pairs instead of one, which is an improbable situation. A relational database describes the reality of wardrobes much better, since a single pair of shoes can be used in many outfits (what Ramsay calls a one-to-many or 1:M relationship), thereby minimizing redundancy. In a relational database, each outfit would be a record or entity with its own primary key, and the black pumps and other pieces in the various categories (tables) would be referred to via foreign keys that can be reused in other records. Furthermore, the ways that pieces are mixed and matched, indicated here by horizontal lines that separate the levels of formality, would be described by entity-relationship diagrams. However, as Ramsay notes in regard to real-world data, actual wardrobes are more complex than this idealized wardrobe plan.
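The outfit-as-record idea can be sketched in a few lines with Python's built-in sqlite3 module. The table names, column names, and sample pieces are my own inventions for illustration and are not taken from the wardrobe plan or from Ramsay.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shoes (shoes_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE suit (suit_id INTEGER PRIMARY KEY, description TEXT);
    -- Each outfit is a record that refers to its pieces by foreign key.
    CREATE TABLE outfit (
        outfit_id INTEGER PRIMARY KEY,
        formality TEXT,
        suit_id INTEGER REFERENCES suit(suit_id),
        shoes_id INTEGER REFERENCES shoes(shoes_id)
    );
    INSERT INTO shoes VALUES (1, 'black pumps');
    INSERT INTO suit VALUES (1, 'navy suit'), (2, 'gray suit');
    INSERT INTO outfit VALUES
        (1, 'business', 1, 1),
        (2, 'business', 2, 1),
        (3, 'dress', 1, 1);
""")

def outfits_using(shoe_description):
    """Find every outfit that reuses the given pair of shoes."""
    rows = conn.execute("""
        SELECT outfit.outfit_id, outfit.formality, suit.description
        FROM outfit
        JOIN shoes ON shoes.shoes_id = outfit.shoes_id
        JOIN suit ON suit.suit_id = outfit.suit_id
        WHERE shoes.description = ?
        ORDER BY outfit.outfit_id
    """, (shoe_description,))
    return rows.fetchall()

print(outfits_using("black pumps"))
```

Three outfits use the pumps, but the shoes table still holds only one row for them. The flat alternative would repeat the full description of the shoes inside every outfit that includes them.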

Just like a database, a person’s wardrobe reveals things about its owner. The particular items of clothing people buy and the way that they make outfits can give clues about their tangible and intangible characteristics, such as body type, “color season,” personality, age, occupation, socioeconomic background, etc. Likewise, what data goes into a database and what is left out, and how the database is designed, reveal the ideology of the people who made it.

Week 4- Incan Databases

Reading Stephen Ramsay’s article “Databases” and also the Data + Design book really got me thinking about the way that data is visualized.  In both readings, the database, or specifically the computerized database, is described as a complex system in which to store and sort information. Specifically, Ramsay describes the digital humanities database as a series of relationships.  He describes these relationships as being able to “hold out the possibility not merely of an increased ability to store and retrieve information, but of an increased critical and methodological self-awareness.”  This got me thinking about the origins of non-digital databases: what kinds of relationships were they created to represent?

The quipu (alternate spelling khipu) is an artifact of the Incan empire (1400-1532 AD).  Quipus were used by the Incas to record information. As the Incas did not use a writing system, quipus were used to document numerical information, historic myths, and imperial decrees.  Quipus consisted of several long strings.  Each string would hold its own pattern, spacing, and style of knots representing the recorded information.  Although full knowledge of the quipu system is lost to the modern western world, it is known from contemporary accounts that quipus were used for highly complex tasks, not unlike modern databases.

This got me thinking about different, or perhaps non-western, ideas for organizing the database.  In the Incan context, the quipu relied heavily on the knowledge of the “reader” and also heavily on the notion of relationships.  From what little is known about the quipu, it is clear that information is not recorded in a direct manner.  A specific kind of knot does not correspond directly to a letter or a word; it is highly contextual and is perhaps intended as a type of mnemonic device for the reader.  This, to me, seemed to be exactly what Ramsay was referring to when he said that, for the digital humanist, the real purpose of the database lies in the relations produced.  Moreover, the physical structure of the quipu brings up questions of data presentation.

I also thought it was an interesting aside that Harvard is now creating its own database about quipus.  The database will record all of the data presented on quipus that exist today.  Even cooler is the fact that this database has modeled its data scheme on the quipu, calling it the “khipu data scheme.”  The website for the project explains the data structuring as a “branching network in which the number of branching levels is highly variable, but in which components at every level share certain characteristics.” Moreover, the computer database will look at interpreting the physical nature of the quipu, focusing on “the interlocking relationships between khipu components, the branching or tree-like structure of khipu, the similarity of certain components, and the multi-dimensionality of khipu variables.”  I thought this was a fascinating instance of mediation and also of episteme! The quipu uses its own unique system to structure and present information.  The fact that this system, while seemingly foreign, maps so easily onto a computer database is fascinating to me.  Perhaps this speaks to a universality of databases?  I am intrigued, and curious whether anyone else has instances of early databases!
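To get a feel for what a “branching network” with a variable number of levels might look like as data, here is a tiny Python sketch. It is only my own toy model and not the actual khipu data scheme used by the Harvard project; the class, the field names, and the sample cords are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Cord:
    """One cord of a khipu; every level uses the same kind of component."""
    label: str
    knots: List[str] = field(default_factory=list)
    children: List["Cord"] = field(default_factory=list)

def depth(cord):
    """Number of branching levels at and below this cord."""
    return 1 + max((depth(child) for child in cord.children), default=0)

def count_cords(cord):
    """Total number of cords in this branch, including the cord itself."""
    return 1 + sum(count_cords(child) for child in cord.children)

# An invented khipu: a primary cord, two pendants, and one subsidiary cord.
primary = Cord("primary", children=[
    Cord("pendant 1", knots=["single", "long"]),
    Cord("pendant 2", knots=["figure-eight"], children=[
        Cord("subsidiary 2a", knots=["single"]),
    ]),
])

print(depth(primary), count_cords(primary))
```

Because a cord can hang from another cord to any depth, the same small definition covers both a simple khipu and a very elaborate one, which seems to match the quoted description of components sharing characteristics at every level.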

 

 

An addendum to “Classics and the Computer: An End of the History”

 

 

 

These images are examples of a roll and a codex. The ancients transcribed their written works on scrolls made out of papyrus. Eventually, codices made out of parchment were used to transcribe classical texts. Writing on scrolls was difficult in terms of space. Scrolls were also inconvenient as references, since they had to be unrolled completely, and they deteriorated quickly. As a result, works and their authorship were lost through textual corruption, deterioration, and misplacement of scrolls. When codices were developed, perhaps to address these problems, they introduced new ways of organizing written work and new ways of reading. The transition from scrolls to codices, like that from tapes to CDs, introduced a much more efficient way to circulate knowledge. Greg Crane notes in A Companion to Digital Humanities that “The adoption of electronic methods thus reflects a very old impulse within the field of classics.” Classicists have an obsession with truth, prompted in no small part by the loss of great works through deterioration and manuscript corruption. The transition from scrolls was not without consequences. Certain authors and works were not transcribed into codices and were lost. Crane also notes, “Many non-classicists from academia and beyond still express surprise that classicists have been aggressively integrating computerized tools into their field for a generation.” Perhaps this eagerness is meant to ensure that the transition to digital media is complete and that no work is lost? Computers spurred a new way of circulating knowledge reminiscent of codices. My Classics professors, for instance, use digital dictionaries and grammar books for reference. In some ways, how we read now is much more authoritative than how the ancient Romans themselves read. A number of factors, such as education being limited to the upper class and the limitations of papyrus, restricted the understanding of certain works to only a few readers. 
Classicists now use digital tools to easily navigate through these works, to learn ancient languages, and to inspire new questions by looking at these texts from a different perspective in a way allowed by computers.

 

This is a lemmatization of a Latin piece. Such a visualization allows readers not only to gain a better understanding of the piece, but also to arrive at new insights and questions. Crane ends with this note: “Our history now lies with the larger story of computing and academia in the twenty-first century.” Perhaps Classicists today are not learning digital tools simply to increase their employability, but are part of a new transition.

 

Citations:

A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004.

http://www.digitalhumanities.org/companion/

From Scroll to Codex. http://courses.educ.ubc.ca/etec540/July03/batchelorj/researchtopic/. Image. Web. 25 October 14

Disambiguation and Lemmatisation of Automatically Computed Texts. http://wiki.hudesktop.hucompute.org/index.php/Lemmatisation/Disambiguation. 14 October 2014. Image. Web. 25 October 14

Week 4 – 12 Graphs and Charts that Perfectly Illustrate What It’s Like Trying to Get Ready in the Morning

This week I was drawn to the book Data and Design, by Trina Chiasson, Dyanna Gregory, and many other contributors. I flipped through a lot of the pages of this book and was impressed not only by the helpful information that was presented, but also by the pleasing visual display. Immediately, I saw a lot of terms and ideas similar to those in my Stats 10 book. I had just taken my first midterm for that class this week, so all the different definitions of qualitative and quantitative data were fresh in my mind. I could personally attest to the premise of this book that data visualization and data in general can be overwhelming and confusing for “non-math” people. My brain is math oriented, but I had never taken a stats class before and I hadn’t taken any sort of math class in over a year, so I was a little rusty. Both my stats book and Data and Design have helped me to see how certain types of graphs better display different types of data. For example, a pie chart is better for categorical data, and a histogram is a more appropriate visualization for numerical data.
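To show why the two chart types suit different data, here is a short Python sketch of the numbers each chart is built from. The sample data (breakfast choices and minutes spent getting ready) is invented for illustration, and the function names are my own.

```python
from collections import Counter

def category_shares(values):
    """Fraction of the whole for each category: the slices of a pie chart."""
    counts = Counter(values)
    total = len(values)
    return {category: counts[category] / total for category in counts}

def histogram(values, bin_width):
    """Count of values per equal-width bin: the bars of a histogram."""
    bins = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(bins.items()))

# Invented sample data.
breakfasts = ["coffee", "cereal", "coffee", "toast"]  # categorical
minutes = [5, 12, 18, 22, 27, 41]                     # numerical

print(category_shares(breakfasts))
print(histogram(minutes, 10))
```

Categories have no natural order or distance between them, so all we can do is count them and compare shares. Numbers can be grouped into ranges along an axis, which is what a histogram does and a pie chart cannot show.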

All this talk about graphs and data visualization reminded me of some of the fun Buzzfeed graphs I had come across when I was procrastinating one day. Typically, we think of graphs and charts as boring and inapplicable to daily life, but these graphs demonstrate that data visualization can also be humorous. My favorite set of graphs and charts from Buzzfeed is “12 Graphs and Charts that Perfectly Illustrate What It’s Like Trying to Get Ready in the Morning” by Adam Ellis, Buzzfeed Staff.

http://www.buzzfeed.com/adamellis/graphs-and-charts-that-perfectly-illustrate-what-its-like

These graphs depict information that we can all identify with, and they do it in a fun, whimsical way. This article would not be as effective if it were just called “12 Things We Can All Relate To When Getting Ready In The Morning” and if, for #5, Types of Breakfast, Ellis had just written that there are different types of breakfasts depending on how the morning is going, but there is almost always coffee. These graphs take data visualization to a whole new level because they include actual pictures that make the data more appealing and memorable for the reader. Although these graphs and charts do not contain vital or completely accurate information, I think the authors of Data and Design would very much appreciate their design and creativity.
