
2014 | OriginalPaper | Book chapter

3. Automated Information Extraction

Written by: Claudio Cioffi-Revilla

Published in: Introduction to Computational Social Science

Publisher: Springer London


Abstract

Chapter 1 identified automated information extraction (also known as computational content analysis or media mining) as the first area of Computational Social Science (CSS). Chapter 3 takes a close look at this area, beginning with its roots in linguistics. Computational text mining has been the main application of this area of CSS, but audio, imagery, and social media data are also components of the expanding Big Data universe. Theory and research in automated information extraction are at the base of major social science discoveries, such as universal semantic spaces and the fundamental structure of human information processing. A major focus of this chapter is the methodology of automated information extraction, whose phases extend from the formulation of research questions through the selection of sources and preprocessing to analysis in a technical sense. Illustrative examples are provided, including some recent transformative breakthroughs in computational events data analysis and geospatial data structures. The material in this chapter has intrinsic value and is also instrumental for understanding the networks, complexity, and simulation modeling approaches of subsequent chapters.


Footnotes
1
Much of modern science is said to have roots in the ancient Greeks. This is quite true, but others before them may have contributed earlier scientific ideas contained in media that have been lost (manuscripts, inscriptions) due to the destruction of many large ancient libraries, such as those of Alexandria, Antioch, Baghdad, Córdoba, and Damascus, just to mention some of those in the Mediterranean world. India and China also experienced the destruction of many libraries during their early history.
 
2
By contrast, John von Neumann's (1958) computer model of the human brain-mind phenomenon turned out to be wrong. Unlike von Neumann's, the EPA-space model of the human mind is empirically validated, even if it still lacks deep theoretical explanation.
 
3
The predecessor of Surveyor was called Attitude, also developed by David Heise (1982). It was the first computer-based extractor of EPA ratings and replaced the paper-based forms that had been in use since the work of Charles E. Osgood and his collaborators.
 
4
Unfortunately, in social science the term “data mining” has quite a negative connotation, since it is understood as lacking in theoretical understanding and symptomatic of so-called “barefoot empiricism,” akin to “a fishing expedition.” CSS assigns high priority to theory—the basis of understanding—while recognizing the scientific value of inductive data mining.
 
5
Beyond its scientific value in CSS research, vocabulary analysis also appears in basic form in the popular media, which count the frequency of words used by politicians in inaugural addresses or similar major speeches. The value of such anecdotal uses is rather limited, and sometimes even misleading, since speechwriters and communication experts are well versed in the scientific principles of applied linguistics and human information processing, including a sophisticated understanding of semantic differentials and other affect control, marketing, and propaganda devices.
 
6
The operationalization of the NRR in terms of two standard deviations from the process mean was suggested to political scientist and events data pioneer Edward E. Azar [1938–1991] by the mathematician Anatol Rapoport [1911–2007]. It was first applied to international relations events data series to study protracted conflicts in the Middle East. Azar was founder and director of the Conflict and Peace Data Bank (COPDAB), founded at the University of North Carolina at Chapel Hill in the 1970s and moved to the Centre for International Development and Conflict Management (CIDCM) of the University of Maryland at College Park in the 1980s.
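The two-standard-deviation rule described in this footnote can be illustrated with a minimal sketch. It assumes that the NRR is treated as the band between the process mean minus and plus two standard deviations, and that observations outside that band are the ones of interest. The function names, the choice of the population standard deviation, and the sample scores are all hypothetical and are not taken from the chapter or from COPDAB.

```python
from statistics import mean, pstdev


def nrr_band(series, k=2.0):
    """Return (lower, upper): the process mean minus/plus k standard deviations.

    Assumption: the population standard deviation is used; the footnote
    does not say which estimator was applied.
    """
    mu = mean(series)
    sigma = pstdev(series)
    return mu - k * sigma, mu + k * sigma


def outside_nrr(series, k=2.0):
    """Return the indices of observations that fall outside the band."""
    lower, upper = nrr_band(series, k)
    return [i for i, x in enumerate(series) if x < lower or x > upper]


# Hypothetical event-intensity scores for twelve periods (illustrative only).
scores = [5, 6, 5, 4, 6, 5, 5, 6, 4, 5, 20, 5]

lower, upper = nrr_band(scores)
flagged = outside_nrr(scores)
print(f"band: [{lower:.2f}, {upper:.2f}]  flagged periods: {flagged}")
```

In this toy series only the period with score 20 falls outside the band. Note that a single extreme value also widens the band, since it inflates both the mean and the standard deviation.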
 
References
N. Agarwal, H. Liu, Modeling and Data Mining in Blogosphere (Morgan & Claypool, New York, 2009). Available free online
E.E. Azar, S. Lerner, The use of semantic dimensions in the scaling of international events. Int. Interact. 7(4), 361–378 (1981)
R. Feldman, J. Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data (Cambridge University Press, Cambridge, 2007)
M.D. Fischer, S.M. Lyon, D. Sosna, Harmonizing diversity: tuning anthropological research to complexity. Soc. Sci. Comput. Rev. 31(1), 3–15 (2013)
D.J. Gerner, P.A. Schrodt, Ö. Yilmaz, R. Abu-Jabr, The Creation of CAMEO (Conflict and Mediation Event Observations): An Event Data Framework for a Post Cold War World. Paper presented at the annual meeting of the American Political Science Association, San Francisco (2002)
M. Gorman, Simulating Science (Indiana University Press, Bloomington, 1992)
L.A. Grenoble, L.J. Whaley, Endangered Languages: Current Issues and Future Prospects (Cambridge University Press, Cambridge, 1998)
T. Hermann, H. Ritter, Listen to your data: model-based sonification for data analysis, in Proceedings of the ISIMADE'99, Baden-Baden, Germany (1999)
O.R. Holsti, Content Analysis for the Social Sciences and Humanities (Addison-Wesley, Reading, 1969)
D.J. Hopkins, G. King, A method for automated nonparametric content analysis for social science. Am. J. Polit. Sci. 54(1), 229–247 (2010)
W. Hsu, M.L. Lee, J. Wang, Temporal and Spatio-Temporal Data Mining (IGI Publishing, New York, 2008)
G. King, W. Lowe, An automated information extraction tool for international conflict data with performance as good as human coders: a rare events evaluation design. Int. Organ. 57, 617–642 (2003)
K. Krippendorff, Content Analysis: An Introduction to Its Methodology (Sage, Thousand Oaks, 2004)
K. Krippendorff, M.A. Bock (eds.), The Content Analysis Reader (Sage, Thousand Oaks, 2008)
P. Langley, Data-driven discovery of physical laws. Cogn. Sci. 5(1), 31–54 (1981)
P. Langley, Heuristics for scientific discovery: the legacy of Herbert Simon, in Models of a Man: Essays in Memory of Herbert A. Simon, ed. by M. Augier, J.G. March (MIT Press, Cambridge, 2004), pp. 461–471
D. Lazer, A. Pentland, L. Adamic, S. Aral, A.-L. Barabasi, D. Brewer, M. Van Alstyne, Computational social science. Science 323(5915), 721–723 (2009)
K. Leetaru, Data Mining Methods for the Content Analyst: An Introduction to the Computational Analysis of Content (Routledge, London, 2011)
B.L. Monroe, P.A. Schrodt (eds.), Special Issue: The Statistical Analysis of Political Text. Political Analysis 16(4), Autumn (2008)
I.-C. Moon, K.M. Carley, Modeling and simulation of terrorist networks in social and geospatial dimensions. IEEE Intell. Syst. 22(5), 40–49 (2007). Special Issue on Social Computing
C.E. Osgood, W.H. May, M.S. Miron, Cross-Cultural Universals of Affective Meaning (University of Illinois Press, Urbana, 1975)
R. Popping, Computer-Assisted Text Analysis (Sage, Thousand Oaks, 2000)
P.A. Schrodt, Short term prediction of international events using a Holland classifier. Math. Comput. Model. 12, 589–600 (1989)
P.A. Schrodt, Pattern recognition of international crises using hidden Markov models, in Political Complexity, ed. by D. Richards (University of Michigan Press, Ann Arbor, 2000)
H.A. Simon, Autobiography, in Nobel Lectures, Economics 1969–1980, ed. by A. Lindbeck (World Scientific, Singapore, 1992)
P.J. Stone, R.F. Bales, J.Z. Namenwirth, D.M. Ogilvie, The general inquirer: a computer system for content analysis and retrieval based on the sentence as a unit of information. Behav. Sci. 7(4), 484–498 (1962)
L. Tang, H. Liu, Community Detection and Mining in Social Media (Morgan & Claypool, New York, 2010). Available free online
J.J. Thomas, K.A. Cook (eds.), Illuminating the Path (IEEE Comput. Soc., Los Alamitos, 2005)
C. Williford, C. Henry, A. Friedlander (eds.), One Culture: Computationally Intensive Research in the Humanities and Social Sciences. A Report on the Experiences of First Respondents to the Digging into Data Challenge (Council on Library and Information Resources, Washington, 2012)
T. Zhang, C.-C.J. Kuo, Audio content analysis for online audiovisual data segmentation and classification. IEEE Trans. Speech Audio Process. 9(4), 441–457 (2001)
Metadata
Title
Automated Information Extraction
Written by
Claudio Cioffi-Revilla
Copyright year
2014
Publisher
Springer London
DOI
https://doi.org/10.1007/978-1-4471-5661-1_3