Published in: International Journal on Digital Libraries 4/2020

23-01-2020

The HathiTrust Digital Library’s potential for musicology research

Authors: J. Stephen Downie, Sayan Bhattacharyya, Francesca Giannetti, Eleanor Dickson Koehl, Peter Organisciak

Abstract

The HathiTrust Digital Library (HTDL) is one of the largest digital libraries in the world, containing seventeen million volumes from the collections of major academic and research libraries. In this paper, we discuss the HTDL’s potential for musicology research by providing a bibliometric analysis of the collection as a whole, and of the music materials in particular. A series of case studies illustrates the kinds of musicological research that may be conducted using the HTDL. We highlight several opportunities for improvement and discuss promising future directions for new knowledge creation through the processing and analysis of large amounts of retrospective data. The HTDL presents significant new opportunities for the study of music, and these will continue to expand as data, metadata and collection enhancements are introduced.

Footnotes
1
For example, the Bach digital database portal, designed to provide Bach researchers with “solid information on the works of Johann Sebastian Bach and other composers from the Bach family and their whereabouts,” http://www.bach-digital.de/content/infos.xml.
 
2
For example, Alexander Street’s Classical Scores Library, http://alexanderstreet.com/products/classical-scores-library-package.
 
3
For example, a virtual library of nineteenth-century California sheet music (from between the years 1852 and 1900), http://people.ischool.berkeley.edu/~mkduggan/neh.html.
 
7
See “What are the different copyright statuses of items in HathiTrust, and what do they mean?” https://www.hathitrust.org/help_copyright#RightsCodes.
 
9
We first converted the records we obtained from the HathiTrust into MARCXML (see: http://www.loc.gov/standards/marcxml/) using a Perl module (see: http://search.cpan.org/~gmcharlt/MARC-File-MiJ/lib/MARC/File/MiJ.pm). Then we used a Library of Congress XSLT stylesheet to convert the records to the Metadata Object Description Schema (MODS) format. The stylesheet was enhanced locally by consolidating information encoded in multiple MARC data and control fields, to reduce data loss and retain more detail about the conceptual characterizations of the items. Finally, a locally developed XSLT stylesheet was used to transform the records into Structured Query Language (SQL) insert statements for populating the customized MODS database tables.
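The last step of this pipeline, turning a MODS record into a database row, can be illustrated with a minimal sketch. It is not the authors' implementation: it uses Python's standard library rather than XSLT, and the table name `mods_record` and its three columns are hypothetical, since the footnote does not describe the customized schema. The sample record is built from reference 26 purely for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The MODS version 3 namespace used by Library of Congress MODS records.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Illustrative record only; the field values are taken from reference 26.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Our National War Songs</title></titleInfo>
  <originInfo>
    <publisher>S. Brainard's Sons</publisher>
    <dateIssued>1892</dateIssued>
  </originInfo>
</mods>"""


def mods_to_row(xml_text):
    """Extract (title, publisher, date_issued) from one MODS record."""
    root = ET.fromstring(xml_text)

    def first(path):
        # Return the text of the first matching element, or None if absent.
        node = root.find(path, MODS_NS)
        return node.text if node is not None else None

    return (
        first("mods:titleInfo/mods:title"),
        first("mods:originInfo/mods:publisher"),
        first("mods:originInfo/mods:dateIssued"),
    )


# Hypothetical table layout; the real customized MODS tables are not described.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mods_record (title TEXT, publisher TEXT, date_issued TEXT)"
)
# Parameterized insert, so quotes in titles or publisher names need no escaping.
conn.execute("INSERT INTO mods_record VALUES (?, ?, ?)", mods_to_row(SAMPLE_MODS))
rows = conn.execute(
    "SELECT title, publisher, date_issued FROM mods_record"
).fetchall()
print(rows)
```

A parameterized insert is used here instead of generated SQL text because values such as “S. Brainard's Sons” contain characters that would otherwise need escaping.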
 
10
Of the 14.6 million item records that we examined, 6,672,311 (46%) had Library of Congress Classification numbers and 2,461,361 (16.8%) had Dewey Decimal Classification numbers. 2,165,140 had both, leaving only 296,221 (2% of the total items) that had a Dewey Decimal number but not a Library of Congress classification number. Of the volumes with a recorded classification authority (approximately 50% of the total records), only a very small number had any authority other than Library of Congress or Dewey Decimal. Classification authorities were not ascertainable for the remaining volumes, primarily because the HTDL does not retain local call numbers that can help in determining classification information. Although the HTDL does not retain the local call number for a volume, it can be retrieved if needed via the holding record for the volume, using the contributing library’s local system number, which the HTDL does store.
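The counts in this footnote can be cross-checked with a few lines of arithmetic. The sketch below assumes a denominator of exactly 14,600,000, which is only the rounded figure quoted above, so the percentages it prints may differ from the reported ones in the last digit.

```python
# Counts quoted in the footnote.
TOTAL_RECORDS = 14_600_000  # assumption: "14.6 million" treated as an exact value
WITH_LCC = 6_672_311        # records with a Library of Congress Classification number
WITH_DDC = 2_461_361        # records with a Dewey Decimal Classification number
WITH_BOTH = 2_165_140       # records with both

# Records with a Dewey number but no Library of Congress number.
ddc_only = WITH_DDC - WITH_BOTH

# Inclusion-exclusion: records with at least one of the two classifications.
either = WITH_LCC + WITH_DDC - WITH_BOTH


def share(count: int) -> float:
    """Percentage of the (rounded) total represented by count."""
    return 100 * count / TOTAL_RECORDS


for label, count in [
    ("LCC", WITH_LCC),
    ("DDC", WITH_DDC),
    ("DDC only", ddc_only),
    ("LCC or DDC", either),
]:
    print(f"{label:<12}{count:>12,}  {share(count):5.1f}%")
```

Under that assumption, 6,968,532 records carry at least one of the two classifications, a little under half of the total, which is consistent with the “approximately 50%” stated above.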
 
11
An important source of inconsistency is the significant variation in the format of the date field across records, which originates in the differing local standards that contributing institutions use for their own bibliographic records. For example, some bibliographic records use wildcard characters in the date field, and different records use different wildcard characters. Ranges of years often appear, and their formats also differ frequently (for example, “1904–1924” and “between 1920 and 1950”). Some records use the character ‘u’, for ‘unknown’, in place of a digit, as in ‘18uu’ to denote a year that is not precisely known but falls in the nineteenth century; other records use a dash instead (for example, ‘198–’).
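One way to reconcile such variants is to map each date string to the earliest and latest year it could denote. The sketch below is an illustration only, not the procedure used for this paper. The function name `parse_year_range` is hypothetical, it covers only the patterns mentioned in this footnote, and it assumes that each wildcard character or trailing dash stands in for the unknown digits of a four-digit year.

```python
import re
from typing import Optional, Tuple

# Explicit ranges such as "1904–1924", "1904-1924" or "between 1920 and 1950".
RANGE_RE = re.compile(r"(?:between\s+)?(\d{4})\s*(?:[-–—]|and)\s*(\d{4})")

# Partially known years such as "18uu", "19??" or "198–", and plain years.
PARTIAL_RE = re.compile(r"(\d{1,4})[u?\-–—]*")


def parse_year_range(raw: str) -> Optional[Tuple[int, int]]:
    """Return (earliest, latest) year implied by a date string, or None."""
    text = raw.strip().lower()

    match = RANGE_RE.fullmatch(text)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        return (min(start, end), max(start, end))

    match = PARTIAL_RE.fullmatch(text)
    if match:
        known = match.group(1)
        missing = 4 - len(known)
        # Fill the unknown digits with 0s for the lower bound, 9s for the upper.
        return (int(known + "0" * missing), int(known + "9" * missing))

    # Anything else is left for manual review rather than guessed at.
    return None


for value in ["18uu", "198–", "1904–1924", "between 1920 and 1950", "n.d."]:
    print(repr(value), "->", parse_year_range(value))
```

Returning a range rather than a single year keeps the uncertainty in the original record visible, so later analysis can decide whether to include or exclude imprecisely dated volumes.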
 
12
See https://www.hathitrust.org/visualizations_dates for up-to-date chronological information.
 
14
The footprints of languages in the overall HTDL collection are shown in Fig. 3.
 
15
Consistent with the relative decline of German as a language of international scholarship starting in the 1920s, the proportion of German-language material in the music class of the HTDL collection declines from about 61% for texts that are considered to be in the public domain for researchers in the USA (which are mostly pre-1923 publications) to about 30% for texts that are in copyright (which are mostly post-1923).
 
16
Each distinct Library of Congress Subject Heading in the HathiTrust metadata explored in this paper is counted separately, so that “Piano Music,” “Vocal Music,” etc. are categories distinct from the category “Music.”
 
17
Tune books constituted a genre of early American music publication that had a pedagogical aim, and they usually contained an instructional preface. That so few tune books are classified as such by subject in the HTDL is probably due to their having often been cataloged by libraries as ‘hymns.’
 
18
‘Music’ and ‘Musical Scores’ are overlapping categories for a small number of items in the HTDL, which is an artifact of how these metadata are generated from the MARC bibliographic records.
 
19
Important early histories and reference works related to music and contained in the HTDL collection include Jean-Jacques Rousseau’s Dictionnaire de Musique (1768), Sir John Hawkins’s General History of the Science and Practice of Music (1776), and Charles Burney’s A General History of Music: From the Earliest Ages to the Present Period (1789).
 
20
This was the focus of a collaborative project carried out under the auspices of the HTRC in 2015; details can be found at: https://www.hathitrust.org/htrc_acs_awards_spring2015.
 
21
For simple examples of how comparison and contrast between two corpora created from the HTDL collection can be performed by using the algorithmic tools provided by the HathiTrust Research Center, see: ‘Workset Builder and Portal of the HathiTrust Research Center’. HathiTrust Research Center UnCamp. Ann Arbor, Michigan. 30–31 March 2015, http://bit.ly/1NF7QLi.
 
22
Sag [28] notes: ‘The HathiTrust aims to develop and facilitate the development of data mining and analysis of its digital collection. This activity would have qualified as “non-consumptive research” under the now defunct Amended Settlement Agreement [ASA]. “Non-consumptive research” as defined in the ASA is a form of non-expressive use...’
 
Literature
1.
Beers, S., Parker, B.: HathiTrust and the challenge of digital audio. IASA J. 36, 38–46 (2011)
12.
Fujinaga, I., Hankinson, A., Cumming, J.E.: Introduction to SIMSSA (single interface for music score searching and analysis). In: Proceedings of the 1st International Workshop on Digital Libraries for Musicology, pp. 1–3. ACM (2014). https://doi.org/10.1145/2660168.2660184
17.
Moretti, F.: Distant Reading. Verso, London (2013)
19.
Newcomer, N.L., Belford, R., Kulczak, D., Szeto, K., Matthews, J., Shaw, M.: Music discovery requirements: a guide to optimizing interfaces. Notes 69(3), 494–524 (2013)
20.
November, N.: Editing Beethoven’s middle-period quartets: performers, scholars and sources in dialogue. Ad Parnassum J. Eighteenth- Nineteenth-century Instrum. Music 12(24), 31–53 (2004)
22.
Rathey, M.: Bach’s Major Vocal Works: Music, Drama, Liturgy. Yale University Press, New Haven (2016)
23.
Ratliff, B.: Every Song Ever: Twenty Ways to Listen in an Age of Musical Plenty. Farrar, Straus and Giroux, New York (2016)
24.
Riley, J., Fujinaga, I.: Recommended best practices for digital image capture of musical scores. OCLC Syst. Serv. 19(2), 62–69 (2003)
25.
Romani, F.: I Due Figaro, Ossia Il Soggetto Di Una Commedia: Da Rappresentarsi Nel Ducale Teatro Di Parma La Primavera Del 1840. Filippo Carmignani (1840)
26.
Root, G.F.: Our National War Songs; A Complete Collection of Grand Old War Songs, Battle Songs, National Hymns, Memorial Hymns, Decoration Day Songs, Quartettes, Etc., with Accompaniment for Piano or Organ. S. Brainard’s Sons (1892)
27.
Rumsey, A.S.: When We Are No More: How Digital Memory Is Shaping Our Future. Bloomsbury, London, UK (2016)
28.
Sag, M.: Orphan works as grist for the data mill. Berkeley Technol. Law J. 27(3), 1503–1550 (2012)
30.
Sheer, M.: Dynamics in Beethoven’s late instrumental works: a new profile. J. Musicol. 16(3), 358–378 (1998)
31.
Solomon, M.: Reason and imagination: Beethoven’s aesthetic evolution. In: Historical Musicology: Sources, Methods, Interpretations, pp. 188–203. University of Rochester Press (2008)
32.
Tillett, B.B.: Authority control: state of the art and new perspectives. Cataloging Classif. Q. 38(3/4), 23–41 (2004)
Metadata
Title
The HathiTrust Digital Library’s potential for musicology research
Authors
J. Stephen Downie
Sayan Bhattacharyya
Francesca Giannetti
Eleanor Dickson Koehl
Peter Organisciak
Publication date
23-01-2020
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 4/2020
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-020-00283-7
