Computer-Assisted Text Analysis for Comparative Politics

Christopher Lucas; Richard A. Nielsen; Margaret E. Roberts; Brandon M. Stewart; Alex Storer; Dustin Tingley

doi:10.1093/pan/mpu019

Published online by Cambridge University Press: 04 January 2017

Christopher Lucas: Department of Government and Institute for Quantitative Social Science, Harvard University, 1737 Cambridge St., Cambridge, MA 02138, USA, e-mail: clucas@fas.harvard.edu
Richard A. Nielsen: Department of Political Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA, e-mail: rnielsen@mit.edu
Margaret E. Roberts: Department of Political Science, University of California, San Diego, 9500 Gilman Drive, #0521, La Jolla, CA 92093, USA, e-mail: meroberts@ucsd.edu
Brandon M. Stewart: Department of Government and Institute for Quantitative Social Science, Harvard University, 1737 Cambridge St., Cambridge, MA 02138, USA, e-mail: bstewart@fas.harvard.edu
Alex Storer: Graduate School of Business, Stanford University, 655 Knight Way, Stanford, CA 94305, USA, e-mail: astorer@stanford.edu
Dustin Tingley (corresponding author): Department of Government and Institute for Quantitative Social Science, Harvard University, 1737 Cambridge St., Cambridge, MA 02138, USA, e-mail: dtingley@gov.harvard.edu

Abstract

Recent advances in research tools for the systematic analysis of textual data are enabling exciting new research throughout the social sciences. For comparative politics scholars, who are often interested in non-English and possibly multilingual textual datasets, these advances may be difficult to access. This article discusses practical issues that arise in the processing, management, translation, and analysis of textual data, with a particular focus on how procedures differ across languages. These procedures are combined in two applied examples of automated text analysis using the recently introduced Structural Topic Model. We also show how the model can be used to analyze data that have been translated into a single language via machine translation tools. All the methods we describe here are implemented in open-source software packages available from the authors.

Type: Articles
Information: Political Analysis, Volume 23, Issue 2, Spring 2015, pp. 254–277

DOI: https://doi.org/10.1093/pan/mpu019
Copyright: Copyright © The Author 2015. Published by Oxford University Press on behalf of the Society for Political Methodology

Footnotes

Authors' note: Our thanks to Sam Brotherton and Jetson Leder-Luis for research assistance and Amy Catalinac for discussion about text analyses in comparative politics. We also thank Christopher Blattman, Dan Corstange, Macartan Humphreys, Amaney Jamal, Gary King, Helen Milner, Tamar Mitts, Brendan O’Connor, Arthur Spirling, and the Columbia University Comparative Politics Workshop for comments. Our software discussed in this article is open source and available.
