2016 | Original Paper | Book chapter

From Close to Distant and Back: How to Read with the Help of Machines

Authors: Rudi Bonfiglioli, Federico Nanni

Published in: History and Philosophy of Computing

Publisher: Springer International Publishing

Abstract

In recent years, digital humanities has seen a common trend: the adoption of text mining methods for the study of digital sources, often in opposition to traditional hermeneutic approaches. In this paper, we show that text mining methods will always need strong support from the humanist. On the one hand, we argue that humanities research involving computational techniques should be thought of as a three-step process: from close reading (identification of a specific case study, initial feature selection) to distant reading (text mining analysis) and back to close reading (evaluation, interpretation, and use of the results). We also highlight that failing to understand the importance of all three steps is a major cause of the mistrust of text mining techniques that has developed in the humanities. On the other hand, we observe that text mining techniques could be a very promising tool for the humanities, and that researchers should not abandon such approaches but should instead experiment with advanced methods, such as those belonging to the family of deep learning. In this sense, we remark that, especially in the field of digital humanities, exploiting the complementarity between computational methods and humans will be the most advantageous research direction.
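
The three-step process described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the toy corpus, the feature list, and the helper names `distant_read` and `passages_for` are hypothetical, and plain term counting stands in for the text mining analysis the chapter has in mind.

```python
from collections import Counter
import re

# Step 1 (close reading): the researcher chooses the case study and the
# features to track. Both the corpus and the feature list are hypothetical.
corpus = {
    "doc_a": "The machine reads the archive. The historian reads the machine.",
    "doc_b": "A historian selects sources; a machine counts words in sources.",
}
features = ["machine", "historian", "sources"]


def distant_read(texts, terms):
    """Step 2 (distant reading): count each chosen term in each document."""
    counts = {}
    for name, text in texts.items():
        tokens = Counter(re.findall(r"[a-z]+", text.lower()))
        counts[name] = {term: tokens[term] for term in terms}
    return counts


def passages_for(texts, name, term):
    """Step 3 (close reading again): return the passages behind a count,
    so that a human can evaluate and interpret them."""
    passages = re.split(r"(?<=[.;!?])\s+", texts[name])
    return [p for p in passages if term in p.lower()]


counts = distant_read(corpus, features)
evidence = passages_for(corpus, "doc_a", "machine")
```

The point of the sketch is the shape of the workflow: the numbers in `counts` only become findings once someone reads the passages returned in `evidence`.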

Metadata
Title
From Close to Distant and Back: How to Read with the Help of Machines
Authors
Rudi Bonfiglioli
Federico Nanni
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-47286-7_6
