Published in: Marketing Letters, Issue 3-4/2008

December 1, 2008

Challenges and opportunities in high-dimensional choice data analyses

Authors: Prasad Naik, Michel Wedel, Lynd Bacon, Anand Bodapati, Eric Bradlow, Wagner Kamakura, Jeffrey Kreulen, Peter Lenk, David M. Madigan, Alan Montgomery

Abstract

Modern businesses routinely capture data on millions of observations across subjects, brand SKUs, time periods, predictor variables, and store locations, thereby generating massive high-dimensional datasets. For example, Netflix has choice data on billions of movie selections, along with user ratings and geodemographic characteristics. Similar datasets emerge in retailing with the potential use of RFID, and in online auctions (e.g., eBay), social networking sites (e.g., MySpace), product reviews (e.g., Epinions), customer relationship marketing, internet commerce, and mobile marketing. We envision massive databases as four-way VAST matrix arrays of Variables × Alternatives × Subjects × Time in which at least one dimension is very large. Predictive choice modeling of such massive databases poses novel computational and modeling issues, and neglect of these issues by academic research will result in a disconnect from marketing practice and an impoverishment of marketing theory. To address these issues, we identify and discuss the challenges and opportunities for both practicing and academic marketers. We thereby offer an impetus for advancing research in this nascent area and for fostering collaboration across scientific disciplines to improve the practice of marketing in information-rich environments.
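
To make the four-way VAST layout concrete, the following is a minimal sketch, not the authors' implementation. It assumes a sparse, dictionary-backed store in which only observed cells are kept, since a dense Variables × Alternatives × Subjects × Time array is infeasible when any one dimension is very large. The names `Cell` and `SparseVAST`, and the toy ratings data, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """Index of one entry in a hypothetical four-way VAST array."""
    variable: str      # e.g. "rating" or "price"
    alternative: str   # e.g. a movie or a brand SKU
    subject: str       # e.g. a customer identifier
    time: int          # e.g. a week number


class SparseVAST:
    """Sparse Variables x Alternatives x Subjects x Time array.

    Only observed cells are stored; unobserved cells take no memory.
    """

    def __init__(self):
        self._cells = {}

    def set(self, variable, alternative, subject, time, value):
        self._cells[Cell(variable, alternative, subject, time)] = value

    def get(self, variable, alternative, subject, time, default=None):
        return self._cells.get(Cell(variable, alternative, subject, time), default)

    def shape(self):
        """Number of distinct observed levels along each of the four modes."""
        levels = [set() for _ in range(4)]
        for cell in self._cells:
            for mode, level in enumerate(cell):
                levels[mode].add(level)
        return tuple(len(s) for s in levels)

    def density(self):
        """Share of the implied dense array that is actually observed."""
        total = 1
        for n in self.shape():
            total *= n
        return len(self._cells) / total if total else 0.0


# Toy data (invented for illustration): three ratings by two users.
vast = SparseVAST()
vast.set("rating", "movie_A", "user_1", 1, 4.0)
vast.set("rating", "movie_B", "user_1", 2, 2.0)
vast.set("rating", "movie_A", "user_2", 2, 5.0)

print(vast.shape())    # (1, 2, 2, 2)
print(vast.density())  # 0.375
```

Even in this toy case only 3 of the 8 implied cells are observed; in realistic choice data the observed share is typically far smaller, which is one reason sparse storage is a natural starting point.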

Metadata
Title
Challenges and opportunities in high-dimensional choice data analyses
Authors
Prasad Naik
Michel Wedel
Lynd Bacon
Anand Bodapati
Eric Bradlow
Wagner Kamakura
Jeffrey Kreulen
Peter Lenk
David M. Madigan
Alan Montgomery
Publication date
December 1, 2008
Publisher
Springer US
Published in
Marketing Letters / Issue 3-4/2008
Print ISSN: 0923-0645
Electronic ISSN: 1573-059X
DOI
https://doi.org/10.1007/s11002-008-9036-3
