Blog 1 – Reverse Engineering

The project I am going to reverse engineer in this blog post is In 500 Billion Words, a New Window on Culture. The project offers an online database in which you can type a phrase of up to five words and see how its use in literature has changed over time. The tool has been used by many PhD students and even middle schoolers. For example, searching for the word “women” shows that until the 1970s the word “men” appeared in literature far more often than “women,” which suggests that writing before then placed little emphasis on feminism.

The work’s algorithm can help you find information related to the sciences, math, and academic articles. It can also reveal things such as the fact that Jimmy Carter was a much more widely discussed topic than Mickey Mouse or Marilyn Monroe. The research started back in 2004, before Google Books existed. The Lieberman couple who originated the project talked about the pressure and long hours that went into it:

“We were exhausted,” Mr. Lieberman Aiden said. That painstaking work “was a total Hail Mary pass; we could have collected this data set and proved nothing.”

The couple’s painstaking work did pay off, however: 11 percent of all published books have been added to Google Scholar and Google Books. That amounts to over two trillion words within Google Books, so the number of phrases and word combinations that can be searched is enormous.

Sources: The project drew heavily on computational methods, because an algorithm had to be coded that was robust enough to power the site. The study of Old English and modern-day dictionaries also played a large part in analyzing words and phrases and in tracing their origins, stems, and meanings. The program was supported by the Foundational Questions in Evolutionary Biology Prize Fellowship and the Systems Biology Program (Harvard Medical School). Y.K.S. was supported by internships at Google. S.P. acknowledges support from NIH grant HD 18381. E.L.A. was supported by the Harvard Society of Fellows, the Fannie and John Hertz Foundation Graduate Fellowship, a National Defense Science and Engineering Graduate Fellowship, an NSF Graduate Fellowship, the National Space Biomedical Research Institute, and National Human Genome Research Institute grant T32 HG002295. This work was supported by a Google Research Award. The Program for Evolutionary Dynamics acknowledges support from the Templeton Foundation. Many other grants and awards also contributed to the funding and knowledge behind this project.

Processes: Culturomic analysis was used to study millions of books at once. The algorithm relies on usage frequency, which is computed by dividing the number of instances of an n-gram in a given year by the total number of words in the corpus in that year. For instance, in 1861, the 1-gram “slavery” appeared in the corpus 21,460 times, on 11,687 pages of 1208 books. The corpus contains 386,434,758 words from 1861; thus, the frequency is 5.5 × 10⁻⁵. The use of “slavery” peaked during the Civil War (early 1860s) and then again during the civil rights movement (1955–1968). The process used to build this algorithm focused on historical precedent rather than a new-age approach. History was the initial motivation for the project, and to create a factual, rigorous algorithm for Google Books, the teams worked tirelessly to combine new technology with knowledge of language. The site understands German, Chinese, French, Hebrew, and Russian. The evolution of grammar also had to be examined in order to match similar or identical words from Old English to modern-day language.
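To make the frequency calculation concrete, here is a minimal Python sketch of the idea described above. It is only an illustration and not the project's actual code: the function names `ngrams` and `usage_frequency` and the toy sentence are my own assumptions, while the 1861 figures for “slavery” come from the paragraph above.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def usage_frequency(ngram_count, total_words):
    """Instances of an n-gram in a year divided by all words in the corpus that year."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return ngram_count / total_words


# Figures quoted above for the 1-gram "slavery" in 1861.
slavery_1861 = usage_frequency(21_460, 386_434_758)
print(f"slavery, 1861: {slavery_1861:.2e}")

# Hypothetical toy "corpus" (not from the real data set) to show the counting step.
toy_tokens = "the war on slavery ended slavery".split()
toy_counts = Counter(ngrams(toy_tokens, 1))
toy_frequency = usage_frequency(toy_counts["slavery"], len(toy_tokens))
print(f"slavery, toy corpus: {toy_frequency:.3f}")
```

The first result comes out at roughly 5.55 × 10⁻⁵, which matches the 5.5 × 10⁻⁵ reported for 1861. Passing a larger `n` to `ngrams` would produce the two- to five-word phrases that the site lets you search.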

Additional Critique: This highly innovative algorithm was a smart move on Google’s part, since looking up the usage and origin of words is common practice. Paperback books are hard to come by in a digital age fueled by e-books, so digitizing books both old and new helps the academic process along by making more resources readily available to researchers and students alike.

DH101

Introduction to Digital Humanities
