Published in: Quantitative Marketing and Economics 3/2013

September 1, 2013

Multi level categorical data fusion using partially fused data

Authors: Zvi Gilula, Robert McCulloch

Abstract

Data fusion poses challenging methodological issues for inferring the joint distribution of two random variables when the information available is mainly confined to the marginal distributions. When the variables are categorical, the challenges are even more severe. Applications of categorical data fusion are highly important in marketing, especially in advertising. Many categorical data fusion methods are confined to binary variables. In this paper we develop an innovative approach to categorical data fusion that extends previous methodologies and applies to categorical variables with any number of levels. We introduce a new concept of “evident dependence” that describes a variety of patterns of joint distributions given the marginals. Using information from partially fused data, our method smoothly accommodates a Bayesian approach based on mixtures of joint distributions constructed using evident dependence. The approach is illustrated using data from the advertising industry.
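
To make the underlying difficulty concrete, the following minimal Python sketch shows that fixed marginals are compatible with very different joint distributions, and that any mixture of such joints keeps the same marginals. It is an illustration only and does not reproduce the authors' method: the paper's "evident dependence" construction and its Bayesian mixture weights are not shown here. The function names, the example marginals `p` and `q`, and the mixing weight are all assumptions made for this sketch; the two joints used are the independence table and a standard maximally dependent coupling (north-west corner rule).

```python
from fractions import Fraction as F

def independent_joint(p, q):
    """Joint table under independence: cell (i, j) = p[i] * q[j]."""
    return [[pi * qj for qj in q] for pi in p]

def comonotone_joint(p, q):
    """Maximally positively dependent coupling of ordered levels,
    built with the north-west corner rule (not the paper's construction)."""
    table = [[F(0)] * len(q) for _ in p]
    p_left, q_left = list(p), list(q)
    i = j = 0
    while i < len(p) and j < len(q):
        mass = min(p_left[i], q_left[j])  # largest mass the cell can take
        table[i][j] = mass
        p_left[i] -= mass
        q_left[j] -= mass
        if p_left[i] == 0:
            i += 1  # row i is exhausted, move down
        else:
            j += 1  # column j is exhausted, move right
    return table

def mixture(weight, a, b):
    """Cellwise convex combination: weight * a + (1 - weight) * b."""
    return [[weight * x + (1 - weight) * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def marginals(table):
    """Return (row sums, column sums) of a joint table."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return rows, cols

# Hypothetical three-level marginals, chosen only for illustration.
p = [F(1, 2), F(3, 10), F(1, 5)]
q = [F(2, 5), F(2, 5), F(1, 5)]

indep = independent_joint(p, q)
como = comonotone_joint(p, q)
mix = mixture(F(1, 4), indep, como)  # assumed weight of 1/4 on independence

for name, table in [("independent", indep), ("comonotone", como), ("mixture", mix)]:
    print(name, marginals(table) == (p, q))
```

All three tables reproduce `p` and `q` exactly even though their cells differ, which is why additional information, such as the partially fused data the abstract refers to, is needed to choose among candidate joints.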

Metadata
Title
Multi level categorical data fusion using partially fused data
Authors
Zvi Gilula
Robert McCulloch
Publication date
September 1, 2013
Publisher
Springer US
Published in
Quantitative Marketing and Economics / Issue 3/2013
Print ISSN: 1570-7156
Electronic ISSN: 1573-711X
DOI
https://doi.org/10.1007/s11129-013-9136-0