2019 | OriginalPaper | Chapter

How Lexical Gold Standards Have Effects on the Usefulness of Text Analysis Tools for Digital Scholarship

Abstract

This paper describes how current lexical similarity and analogy gold standards are built to conform to certain ideas about what the models they evaluate are used for. Topical relevance has always been the most important target notion for information access tools and related language technologies. While this has proven a useful starting point for much of what information technology is used for, it does not always align well with other uses to which these technologies are being put, most notably use cases from digital scholarship in the humanities and social sciences. The paper argues for a more systematic formulation of requirements from the digital humanities and social sciences, and for a more explicit description of the assumptions underlying model design.
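For readers unfamiliar with how a lexical similarity gold standard is typically applied, the following is a minimal sketch, not the chapter's own method. It assumes the common setup in which a model's cosine similarities for word pairs are rank-correlated (Spearman) with human ratings. The word vectors, the word pairs, and the ratings are all invented for illustration and do not come from any published data set.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)

# Hypothetical toy "model": hand-made three-dimensional word vectors.
vectors = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.1],
    "road": [0.6, 0.5, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

# Hypothetical toy "gold standard": (word, word, human rating on a 0-10 scale).
gold = [
    ("car", "automobile", 9.0),
    ("car", "road", 4.0),
    ("car", "banana", 0.5),
]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in gold]
human_scores = [rating for _, _, rating in gold]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation with the toy gold standard: {rho:.3f}")
```

The score only tells us how well the model reproduces the ordering that the raters were asked to produce. If the ratings encode a relevance-oriented notion, a high correlation says little about whether the model suits a use case that depends on a different notion of lexical relation.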

Footnotes
1
Spärck Jones argues that this should be understood in terms of occurrence statistics rather than more elusive statistical notions. However, the target notion is a relevance-oriented one.
 
Literature
1. Baroni, M., Bernardi, R., Do, N.Q., Shan, C.C.: Entailment above the word level in distributional semantics. In: Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics. ACL (2012)
2. Baroni, M., Lenci, A.: How we BLESSed distributional semantic evaluation. In: Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics. Association for Computational Linguistics (2011)
3. Chiarello, C., Burgess, C., Richards, L., Pollock, A.: Semantic and associative priming in the cerebral hemispheres: some words do, some words don’t... sometimes, some places. Brain Lang. 38(1), 75–104 (1990)
4. Da, N.Z.: The computational case against computational literary studies. Crit. Inq. 45(3), 601–639 (2019)
5. Da, N.Z.: The digital humanities debacle—computational methods repeatedly come up short. The Chronicle of Higher Education (2019)
6. Finkelstein, L., et al.: Placing search in context: the concept revisited. In: Proceedings of the International Conference on World Wide Web. ACM (2001)
7. Fitzpatrick, K.: The humanities, done digitally. The Chronicle of Higher Education (2011)
8. Hill, F., Reichart, R., Korhonen, A.: Simlex-999: evaluating semantic models with (genuine) similarity estimation. Comput. Linguist. 41, 665–695 (2016)
9. Jänicke, S., Franzini, G., Cheema, M.F., Scheuermann, G.: On close and distant reading in digital humanities: a survey and future challenges. In: Eurographics Conference on Visualization (EuroVis), vol. 2 (2015)
10. Katz, S.M.: Distribution of content words and phrases in text and language modelling. Nat. Lang. Eng. 2(1), 15–59 (1996)
11. Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211–240 (1997)
12. Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (2013)
13. Moretti, F.: Distant Reading. Verso Books, London (2013)
14. O’Connor, B., Bamman, D., Smith, N.A.: Computational text analysis for social science: model assumptions and complexity. In: Second Workshop on Computational Social Science and the Wisdom of Crowds (2011)
15. Rubenstein, H., Goodenough, J.B.: Contextual correlates of synonymy. Commun. ACM 8(10), 627–633 (1965)
16. Schwartz, H.A., Gomez, F.: Evaluating semantic metrics on tasks of concept similarity. In: Proceedings of FLAIRS (2011)
17. Sparck Jones, K.: A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28(1), 11–21 (1972)
18. Underwood, T.: Dear Humanists: Fear Not the Digital Revolution. The Chronicle of Higher Education (2019)
Metadata
Title
How Lexical Gold Standards Have Effects on the Usefulness of Text Analysis Tools for Digital Scholarship
Author
Jussi Karlgren
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-28577-7_14