Published in: Information Systems and e-Business Management, Issue 3/2014

1 August 2014 | Original Article

On developing indicators with text analytics: exploring concept vectors applied to English and Chinese texts

Authors: Steven O. Kimbrough, Christine Chou, Yi-Ting Chen, Hilary Lin

Abstract

This paper investigates how high-quality, vocabulary-based classifiers, useful for competitive intelligence, can be found for relatively small corpora of publicly available documents. Two corpora of recent annual reports are examined and compared, one in English and one in Chinese. The paper tests whether vocabularies can predict whether firms are relatively innovative or not, examining vocabularies of both content words and function words. We find that indeed the tested vocabularies do produce effective indicators or classifiers and, surprisingly, that function words are especially effective. The paper also provides extensive conceptual and theoretical background to frame the investigation in the context of an EMCUT problematic, that of mapping entities to classification schemes using information derived from text.

Footnotes
1
We are aware of the distinction that is made in parts of the relevant literature between assignment and matching, where matching is assignment when the entities on all sides are players in a game. Such, for example, is the case in the two-sided matching problem (Gale and Shapley 1962). We note, however, that although we might substitute assignment for matching in the name of our problem, the resulting acronym is less felicitous, there is little risk of confusion in keeping the present name, and indeed there will often be an element of strategic interaction in the composition of the relevant texts.
 
References
Andrew JP, Manget J, Michael D, Taylor A, Zablit H (2010) Innovation 2010: a return to prominence—and the emergence of a new world order. Boston Consulting Group, Boston, MA
Bird S, Klein E, Loper E (2009) Natural language processing with Python. O’Reilly, Sebastopol, CA
Blair DC, Kimbrough SO (2002) Exemplary documents: a foundation for information retrieval design. Inf Process Manage 38(3):363–379
Blair DC, Maron ME (1985) An evaluation of retrieval effectiveness for a full-text document-retrieval system. Commun ACM 28(3):289–299
Bowman EH (1973) Corporate social responsibility and the investor. J Contemp Bus 2:21–43
Bowman EH (1984) Content analysis of annual reports for corporate strategy and risk. Interfaces 14(1):61–71
Breiman L, Friedman R, Olshen R, Stone C (1984) Classification and regression trees. CRC Press, Boca Raton, FL
Camiciottoli BC (2010) Discourse connectives in genres of financial disclosure: earnings presentations versus earnings releases. J Pragmat 42(3):650–663
Chen GT, Kimbrough S, Lee T (2004) A note on automated support for product application discovery. In: Dutta A, Goes P (eds) Proceedings of the fourteenth annual workshop on information technologies and systems (WITS2004), Washington, DC, pp 128–133
Chou CH, Sinha AP, Zhao H (2008) A text mining approach to Internet abuse detection. Inf Syst E-Bus Manage 6(4):419–439
D’Aveni RA, MacMillan IC (1990) Crisis and content of managerial communications: a study of the focus of attention of top managers in surviving and failing firms. Adm Sci Q 35:634–657
den Hertog P, van der Aa W, de Jong MW (2010) Capabilities for managing service innovation: towards a conceptual framework. J Serv Manage 21(4):490–514
Forsman H, Temel S (2011) Innovation and business performance in small enterprises: an enterprise-level analysis. Int J Innov Manage 15(3):641–665
Gale D, Shapley LS (1962) College admissions and the stability of marriage. Am Math Mon 69(1):9–15
Gebauer J, Tang Y, Baimai C (2008) User requirements of mobile technology: results from a content analysis of user reviews. Inf Syst E-Bus Manage 6(4):361–384
Gottschalk LA (1995) Content analysis of verbal behavior: new findings and clinical applications. Lawrence Erlbaum Associates, Hillsdale, NJ
Gottschalk LA, Gleser GC (1969) The measurement of psychological states through the content analysis of verbal behavior. University of California Press, Berkeley, CA
Gottschalk LA, Winget CN, Gleser GC (1969) Manual of instructions for using the Gottschalk-Gleser content analysis scales: anxiety, hostility, and social alienation—personal disorganization. University of California Press, Berkeley, CA
He ZL, Wong PK (2004) Exploration versus exploitation: an empirical test of the ambidexterity hypothesis. Organ Sci 15(4):481–494
Kabanoff B, Keegan J (2007) Studying strategic cognition by content analysis of annual reports: a validation involving firm innovation. In: Chapman R (ed) Proceedings of managing our intellectual and social capital: 21st ANZAM 2007 Conference, Sydney, Australia, pp 1–14
Kimbrough MR, Kimbrough SO, Murphy P (2011) On using text analytics for event studies. In: Proceedings of the 2011 international conference on artificial intelligence and law (ICAIL 2011)
Kimbrough SO, Lee TY, Oktem U (2012) On deriving indicators from texts. In: Dolk D, Granat J (eds) Modeling for decision support in network-based services, Lecture Notes in Business Information Processing, vol 42. Springer, Berlin, pp 196–225
Krippendorff K (2004) Content analysis: an introduction to its methodology, 2nd edn. Sage Publications, Thousand Oaks, CA
Li H, Cai Z, Graesser AC, Duan Y (2012) A comparative study on English and Chinese word uses with LIWC. In: Proceedings of the twenty-fifth international Florida artificial intelligence research society conference, Association for the Advancement of Artificial Intelligence, pp 238–243
Loewenstein J, Ocasio W, Jones C (2012) Vocabularies and vocabulary structure: a new approach linking categories, practices, and institutions. The Academy of Management Annals. Available online 13 March 2012. doi:10.1080/19416520.2012.660763
Lukas BA, Ferrell O (2000) The effect of market orientation on product innovation. J Acad Mark Sci 28(2):239–247
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge, UK
Mitchell T (1997) Machine learning. McGraw-Hill, New York, NY
Morris R (1994) Computerized content analysis in management research: a demonstration of advantages and limitations. J Manage 20(4):903–931
Muller E, Zenker A (2001) Business services as actors of knowledge transformation: the role of KIBS in regional and national innovation systems. Res Policy 30(9):1501–1516
Neuendorf KA (2002) The content analysis guidebook. Sage Publications, Thousand Oaks, CA
Newman ML, Pennebaker JW, Berry DS, Richards JM (2003) Lying words: predicting deception from linguistic styles. Pers Soc Psychol Bull 29(5):665–675
OECD-EUROSTAT (1997) Proposed guidelines for collecting and interpreting technological innovation data. Oslo Manual, 2nd edn. OECD-EUROSTAT, Paris
Oliveira MD, Murphy P (2009) The leader as the face of a crisis: Philip Morris’ CEO’s speeches during the 1990s. J Public Relat Res 21(4):361–80
Pennebaker JW (2011) The secret life of pronouns: what our words say about us. Bloomsbury Press, New York, NY
Prester J, Bozac MG (2012) Are innovative organizational concepts enough for fostering innovation? Int J Innov Manage 16(1):1250005
Raisch S, Birkinshaw J (2008) Organizational ambidexterity: antecedents, outcomes, and moderators. J Manage 34(3):375–409
Shadish WR, Cook TD, Campbell DT (2001) Experimental and quasi-experimental designs for generalized causal inference, 2nd edn. Wadsworth Publishing, New York, NY
Tausczik YR, Pennebaker JW (2010) The psychological meaning of words: LIWC and computerized text analysis methods. J Lang Soc Psychol 29(1):24–54
Turney PD, Pantel P (2010) From frequency to meaning: vector space models of semantics. J Artif Intell Res 37:141–188
Uotila J, Maula M, Keil T, Zahra SA (2009) Exploration, exploitation, and financial performance: analysis of S&P 500 corporations. Strateg Manage J 30(2):221–231
Walter F, Battiston S, Yildirim M, Schweitzer F (2012) Moving recommender systems from on-line commerce to retail stores. Inf Syst E-Bus Manage 10:367–393. doi:10.1007/s10257-011-0170-8
Wang HY, Liao C, Kao CH (2012) A credit assessment mechanism for wireless telecommunication debt collection: an empirical study. Inf Syst E-Bus Manage 1–19. doi:10.1007/s10257-012-0192-x
Weber RP (1990) Basic content analysis, 2nd edn. Sage Publications, Newbury Park, CA
Wei CP, Chen YM, Yang CS, Yang C (2010) Understanding what concerns consumers: a semantic approach to product feature extraction from consumer reviews. Inf Syst E-Bus Manage 8:149–167. doi:10.1007/s10257-009-0113-9
Wei CP, Lin YT, Yang CC (2011) Cross-lingual text categorization: conquering language boundaries in globalized environments. Inf Process Manage 47(5):786–804
Yang HC, Hsiao HW, Lee CH (2011) Multilingual document mining and navigation using self-organizing maps. Inf Process Manage 47(5):647–666
Yen CC, Lo LK, Chi DJ, Huang YJ (2009) The integrated methodology of classification and regression trees and random forest for information disclosure prediction: consideration of corporate governance indicator. In: Sixth conferences on operations research society of Taiwan. In Chinese. http://edoc.ypu.edu.tw:8080/paper/antai/2009%E5%B9%B4--%E7%AC%AC%E5%85%AD%E5%B1%86%E5%8F%B0%E7%81%A3%E4%BD%9C%E6%A5%AD%E7%A0%94%E7%A9%B6%E5%AD%B8%E6%9C%83%E7%90%86%E8%AB%96%E8%88%87%E5%AF%A6%E5%8B%99%E5%AD%B8%E8%A1%93%E7%A0%94%E8%A8%8E%E6%9C%83/(42)%E6%95%B4%E5%90%88%E5%88%86%E9%A1%9E%E8%BF%B4%E6%AD%B8%E6%A8%B9%E8%88%87%E9%9A%A8%E6%A9%9F%E6%A3%AE%E6%9E%97%E6%96%BC%E8%B3%87%E8%A8%8A%E6%8F%AD%E9%9C%B2%E9%A0%90%E6%B8%AC%E4%B9%8B%E7%A0%94%E7%A9%B6.pdf
Metadata
Title
On developing indicators with text analytics: exploring concept vectors applied to English and Chinese texts
Authors
Steven O. Kimbrough
Christine Chou
Yi-Ting Chen
Hilary Lin
Publication date
1 August 2014
Publisher
Springer Berlin Heidelberg
Published in
Information Systems and e-Business Management, Issue 3/2014
Print ISSN: 1617-9846
Electronic ISSN: 1617-9854
DOI
https://doi.org/10.1007/s10257-013-0228-x
