Model-based Clustering and Typologies in the Social Sciences

Published online by Cambridge University Press: 04 January 2017

John S. Ahlquist*
Affiliation: Department of Political Science, University of Wisconsin, Madison, 1050 Bascom Mall, Madison, WI 53706, and United States Studies Centre at the University of Sydney, Australia
Christian Breunig
Affiliation: Department of Political Science, University of Toronto, 100 St. George Street, Toronto, Ontario M5S 3G3, Canada. e-mail: c.breunig@utoronto.ca
* e-mail: jahlquist@wisc.edu (corresponding author)

Abstract

Social scientists spend considerable energy constructing typologies and discussing their roles in measurement. Less discussed is the role of typologies in evaluating and revising theoretical arguments. We argue that unsupervised machine learning tools can be profitably applied to the development and testing of theory-based typologies. We review recent advances in mixture models as applied to cluster analysis and argue that these tools are particularly important in the social sciences where it is common to claim that high-dimensional objects group together in meaningful clusters. Model-based clustering (MBC) grounds analysis in probability theory, permitting the evaluation of uncertainty and application of information-based model selection tools. We show that the MBC approach forces analysts to consider dimensionality problems that more traditional clustering tools obscure. We apply MBC to the “varieties of capitalism,” a typology receiving significant attention in political science and economic sociology. We find weak and conflicting evidence for the theory's expected grouping. We therefore caution against the current practice of including typology-derived dummy variables in regression and case-comparison research designs.

Type
Research Article
Copyright
Copyright © The Author 2011. Published by Oxford University Press on behalf of the Society for Political Methodology 

Footnotes

Edited by Jonathan N. Katz

Authors' note: Previous versions of this paper were presented at the 2008 meetings of the Society for Political Methodology, the 2009 10th Anniversary Conference for the University of Washington Center for Statistics and the Social Sciences, and colloquia at Duke, Purdue, and MPIfG. We thank Justin Grimmer, Martin Hoepner, Ryan Moore, Kevin Quinn, Adrian Raftery, and Mike Ward for comments and helpful conversation. Carlisle Rainey provided research assistance. Replication materials are available at the Political Analysis dataverse, Ahlquist's dataverse (http://dvn.iq.harvard.edu/dvn/dv/ahlquist), and Breunig's Web site http://individual.utoronto.ca/cbreunig/.
