
2019 | OriginalPaper | Chapter

Curatr: A Platform for Semantic Analysis and Curation of Historical Literary Texts

Authors: Susan Leavy, Gerardine Meaney, Karen Wade, Derek Greene

Published in: Metadata and Semantic Research

Publisher: Springer International Publishing


Abstract

The increasing availability of digital collections of historical and contemporary literature presents a wealth of possibilities for new research in the humanities. The scale and diversity of such collections, however, present particular challenges in identifying and extracting relevant content. This paper presents Curatr, an online platform for the exploration and curation of literature with machine learning-supported semantic search, designed within the context of digital humanities scholarship. The platform provides a text mining workflow that combines neural word embeddings with expert domain knowledge to enable the generation of thematic lexicons, allowing researchers to curate relevant sub-corpora from a large corpus of 18th- and 19th-century digitised texts.
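
The workflow described in the abstract (embedding-based term recommendation, expert review, then sub-corpus selection) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Curatr's actual implementation: the toy three-dimensional vectors, the function names `recommend_terms` and `curate_subcorpus`, and the sample documents are all hypothetical, and a real system would use embeddings trained on the corpus itself.

```python
import math

# Hypothetical toy "embeddings" for illustration only; real vectors would be
# learned from the digitised corpus with a neural word embedding model.
TOY_VECTORS = {
    "famine": [0.9, 0.1, 0.0],
    "hunger": [0.8, 0.2, 0.1],
    "starvation": [0.85, 0.15, 0.05],
    "fever": [0.6, 0.5, 0.1],
    "carriage": [0.0, 0.1, 0.9],
    "ballroom": [0.1, 0.0, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_terms(seed_terms, vectors, top_n=3):
    """Rank words outside the seed set by similarity to the seed centroid."""
    seeds = [vectors[t] for t in seed_terms if t in vectors]
    if not seeds:
        return []
    dim = len(seeds[0])
    centroid = [sum(v[i] for v in seeds) / len(seeds) for i in range(dim)]
    scored = [
        (word, cosine(centroid, vec))
        for word, vec in vectors.items()
        if word not in seed_terms
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def curate_subcorpus(documents, lexicon, min_hits=1):
    """Return ids of documents containing at least min_hits lexicon terms."""
    lexicon = set(lexicon)
    selected = []
    for doc_id, text in documents.items():
        tokens = (tok.strip(".,;:!?") for tok in text.lower().split())
        hits = sum(1 for tok in tokens if tok in lexicon)
        if hits >= min_hits:
            selected.append(doc_id)
    return selected


if __name__ == "__main__":
    # Step 1: the researcher supplies seed terms; the model suggests neighbours.
    suggestions = recommend_terms(["famine"], TOY_VECTORS, top_n=2)
    # Step 2: expert review; here every suggestion is accepted for simplicity.
    lexicon = {"famine"} | {word for word, _ in suggestions}
    # Step 3: select the documents that match the curated lexicon.
    docs = {
        "a": "The famine brought hunger.",
        "b": "A carriage arrived at the ballroom.",
    }
    print(curate_subcorpus(docs, lexicon))
```

The expert-review step is reduced here to accepting every suggestion. In practice that step is where domain knowledge enters: a scholar would keep or discard each recommended term before the lexicon is used for retrieval.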


Footnotes
3
Ricorso: Database of Irish writers, http://www.ricorso.net.
 
4
Farjeon features more prominently in histories of Australian literature as he emigrated there.
 
Metadata
Title
Curatr: A Platform for Semantic Analysis and Curation of Historical Literary Texts
Authors
Susan Leavy
Gerardine Meaney
Karen Wade
Derek Greene
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-36599-8_31