Published in: Soft Computing 14/2018

22-05-2017 | Methodologies and Application

Fuzzy heaping mechanism for heaped count data with imprecision

Authors: Hye-Young Jung, Heawon Choi, Taesung Park



Abstract

In genetic association studies, the traits of interest are sometimes collected from self-reported data. Because subjects report exact responses, rounded responses, or both, the histogram of the data frequently exhibits spikes at particular values. This phenomenon, known as heaping, makes it difficult to perform association tests with standard modeling approaches. Several models have recently been proposed to recover the true, unobservable underlying distribution from heaped data. However, all of these methods depend on probabilistic assumptions about the heaping mechanism. Probabilistic models cannot represent heaped data effectively, because heaping can be caused by imprecisely reported values, and this type of imprecision differs from the probabilistic uncertainty that a probabilistic model describes well. In this paper, we propose a fuzzy heaping model to identify genetic variants associated with heaped count data. Our fuzzy model uses a mixture of likelihood functions for precisely and imprecisely reported data, treating heaped data as imprecise data represented by fuzzy sets. Moreover, since reported count data may include excess zeros as well as heaped data, we extend our fuzzy heaping model to handle excess zeros. Through simulation studies, we show that the proposed fuzzy heaping model controls type I error effectively and has high power to identify causal variants. We illustrate the proposed model with a study identifying genetic variants associated with the number of cigarettes smoked per day.
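The abstract describes the model only at a high level, so the following Python sketch illustrates the general idea under assumptions of our own; it is not the authors' implementation. We assume a Poisson count model, heaps at positive multiples of 10, and a triangular fuzzy number of half-width 5 for each possibly imprecise heaped report, whose probability is computed as Zadeh's probability of a fuzzy event (the sum of membership times probability mass). A mixing weight `pi_heap` combines the precise and imprecise likelihoods, a weight `omega` adds zero inflation in the usual zero-inflated Poisson form, and the genotype effect enters through a log-linear rate. All function and parameter names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability P(Y = k) for rate lam > 0, computed on the log scale."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def triangular_membership(x: int, center: float, half_width: float) -> float:
    """Membership of x in a symmetric triangular fuzzy number (1 at center, 0 at center +/- half_width)."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def fuzzy_event_prob(center: float, half_width: float, lam: float) -> float:
    """Zadeh's probability of a fuzzy event: sum over x of mu(x) * P(Y = x)."""
    lo = max(0, math.ceil(center - half_width))
    hi = math.floor(center + half_width)
    return sum(
        triangular_membership(x, center, half_width) * poisson_pmf(x, lam)
        for x in range(lo, hi + 1)
    )


def fuzzy_heaped_zip_likelihood(y, lam, pi_heap, omega, heap_unit=10, half_width=5.0):
    """Likelihood of one reported count y (hypothetical fuzzy heaping + zero-inflation sketch).

    - omega: assumed probability of a structural zero (zero-inflated Poisson form).
    - pi_heap: assumed probability that a report on a heap value is imprecise.
    - Positive multiples of heap_unit are treated as possibly imprecise and
      represented by a triangular fuzzy number with the given half-width.
    """
    if y == 0:
        # Structural zero, or a zero drawn from the Poisson count model.
        return omega + (1 - omega) * poisson_pmf(0, lam)
    precise = poisson_pmf(y, lam)
    if y % heap_unit == 0:
        # Mixture of the precise likelihood and the fuzzy-event likelihood.
        imprecise = fuzzy_event_prob(y, half_width, lam)
        count_lik = (1 - pi_heap) * precise + pi_heap * imprecise
    else:
        # Non-heap values are treated as precisely reported.
        count_lik = precise
    return (1 - omega) * count_lik


def log_likelihood(params, ys, genotypes):
    """Total log-likelihood with log(lam_i) = b0 + b1 * g_i, genotype g_i in {0, 1, 2}."""
    b0, b1, pi_heap, omega = params
    total = 0.0
    for y, g in zip(ys, genotypes):
        lam = math.exp(b0 + b1 * g)
        total += math.log(fuzzy_heaped_zip_likelihood(y, lam, pi_heap, omega))
    return total


if __name__ == "__main__":
    # Toy data, invented purely for illustration: reported counts and genotypes.
    ys = [0, 0, 3, 10, 20, 20, 7, 15, 0, 10]
    gs = [0, 1, 1, 2, 2, 1, 0, 2, 0, 1]
    print(log_likelihood((2.0, 0.2, 0.5, 0.2), ys, gs))
```

In a real analysis the parameters would be estimated by numerically maximizing `log_likelihood` (for example with a general-purpose optimizer), and the genotype effect `b1` could then be tested with a likelihood ratio test against the fit with `b1 = 0`. The membership shape, heap unit, and half-width here are illustrative modelling choices, not values taken from the paper.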


Metadata
Title: Fuzzy heaping mechanism for heaped count data with imprecision
Authors: Hye-Young Jung, Heawon Choi, Taesung Park
Publication date: 22-05-2017
Publisher: Springer Berlin Heidelberg
Published in: Soft Computing / Issue 14/2018
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-017-2641-4
