I will first clean the dataset so that titles are not duplicated across the content types, using the clustering feature in the text facet. Our interest in this dataset lies in several directions: its historical relevance, what it reflects about entertainment history and politics during the period 1918-1930, the development of patent and usage rights, and how technology affected the history of sound recordings. To explore these topics, I will group relevant columns together so that I only look at the necessary data. The facet feature is very helpful for looking at a certain period of time or at one kind of featured instrument. Moreover, I split the column that contains the composer statement source into three separate columns so that I can see what kind of copyright distribution source each recording adopted and how the different kinds of sources overlap and interact with each other. I also wanted to move every ADP statement source type into one single column. Because every record can have multiple statement source types and they are logged in different orders, ADP currently ends up in different columns, which makes it hard to select only this one type to look at.
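
The split-and-consolidate step described above can be illustrated outside the facet interface with a minimal pandas sketch. This is not the tool or the exact procedure used for the dataset. The column name `composer_statement_source`, the `;` separator, the placeholder values `source X` and `source Y`, and the three sample rows are all assumptions made up for illustration. The idea is to split the multi-valued column into separate columns and then flag ADP no matter which column it landed in.

```python
import pandas as pd

# Made-up rows for illustration only; the column name, the ";" separator,
# and the non-ADP values are assumptions, not taken from the real dataset.
df = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C"],
    "composer_statement_source": [
        "ADP; source X",
        "source X; ADP; source Y",
        "source Y",
    ],
})

# Split the multi-valued column into separate columns (three at most here).
parts = df["composer_statement_source"].str.split(";", expand=True)
parts = parts.apply(lambda col: col.str.strip())
parts.columns = [f"source_{i + 1}" for i in range(parts.shape[1])]

# Flag ADP regardless of which of the split columns it ended up in.
df = df.join(parts)
df["has_adp"] = parts.eq("ADP").any(axis=1)

# Select only the records that list ADP as a statement source.
adp_records = df[df["has_adp"]]
print(adp_records[["title", "has_adp"]])
```

A single boolean column like `has_adp` makes it possible to filter on ADP in one step, while the split columns remain available for looking at how the other source types overlap.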