Published in: Knowledge and Information Systems 3/2016

01-03-2016 | Regular Paper

Analysis of incomplete and inconsistent clinical survey data

Authors: Suzan Arslanturk, Mohammad-Reza Siadat, Theophilus Ogunyemi, Kim Killinger, Ananias Diokno

Abstract

Clinical data from survey trials are commonly incomplete and inconsistent for several reasons. Inconsistent data occur when a subject answers more than one set of mutually exclusive alternative questions. One objective of this study was to identify and eliminate inconsistent data as an important data mining preprocessing step. We define three types of incomplete data: missing data due to skip pattern (SPMD), undetermined missing data (UMD), and genuine missing data (GMD). Identifying the type of missing data is another important objective, because not all missing data types can be treated the same way. This goal cannot be achieved manually on large data sets from complex surveys, since each subject must be processed individually. The analyses are carried out in a mathematical framework that exploits the graph-theoretic structure inherent in the questionnaire. An undirected graph is built from mutually inconsistent responses, along with its complement graph. Responses that are not in the largest maximal clique of the complement graph are considered inconsistent. This guarantees that as few responses as possible are removed so that the remaining ones are mutually consistent. Further, all potential paths in the questionnaire's graph are considered, based on the subjects' responses, to identify each type of incomplete data. Experiments were conducted on MESA data. The results show 15.4% GMD, 9.8% SPMD, 12.9% UMD, and 0.021% inconsistent data. The approach has further utility in using (a) the SPMD for data stratification and (b) the inconsistent data for noise estimation. The proposed method is a preprocessing prerequisite for any data mining of clinical survey data.
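
The clique step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the response labels, the conflict pairs, and the function names (`complement_graph`, `maximum_clique`, `find_inconsistent`) are hypothetical, and the Bron-Kerbosch search is simply one standard way to enumerate maximal cliques. The sketch builds the complement of the inconsistency graph for one subject, finds a largest clique in it, and flags every response outside that clique as inconsistent.

```python
from itertools import combinations


def complement_graph(responses, conflicts):
    """Adjacency sets of the complement graph: two responses are linked
    whenever they are NOT listed as mutually inconsistent."""
    conflict_set = {frozenset(pair) for pair in conflicts}
    adj = {r: set() for r in responses}
    for u, v in combinations(responses, 2):
        if frozenset((u, v)) not in conflict_set:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximum_clique(adj):
    """Bron-Kerbosch with pivoting; returns one largest maximal clique."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest so far.
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


def find_inconsistent(responses, conflicts):
    """Split responses into (kept, inconsistent) using the largest clique
    of the complement graph."""
    kept = maximum_clique(complement_graph(responses, conflicts))
    return kept, set(responses) - kept


# Hypothetical subject: "Q1=no" contradicts two follow-up answers.
responses = ["Q1=no", "Q2a=daily", "Q3=never", "Q4=yes"]
conflicts = [("Q1=no", "Q2a=daily"), ("Q1=no", "Q4=yes")]

kept, inconsistent = find_inconsistent(responses, conflicts)
print(sorted(kept))          # mutually consistent responses
print(sorted(inconsistent))  # responses flagged for removal
```

In this toy case the largest clique keeps three of the four responses and flags only `Q1=no`. When several cliques tie for the largest size, the sketch returns an arbitrary one; the abstract does not say how such ties are resolved. Exhaustive clique search is exponential in the worst case, so this is only practical when each subject's response graph is small.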

Metadata
Title
Analysis of incomplete and inconsistent clinical survey data
Authors
Suzan Arslanturk
Mohammad-Reza Siadat
Theophilus Ogunyemi
Kim Killinger
Ananias Diokno
Publication date
01-03-2016
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2016
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-015-0850-7
