Data Mining and Knowledge Discovery

1 July 2015

A relative similarity based method for interactive patient risk prediction

Authors: Buyue Qian, Xiang Wang, Nan Cao, Hongfei Li, Yu-Gang Jiang

Published in: Data Mining and Knowledge Discovery | Issue 4/2015

Abstract

This paper investigates patient risk prediction in the context of active learning with relative similarities. Active learning has been extensively studied and successfully applied to real problems. Active learning methods typically pose absolute questions. In a medical application where the goal is to predict patients' risk of a certain disease from Electronic Health Records (EHR), absolute questions take the form of “Will this patient suffer from Alzheimer’s later in his/her life?” or “Are these two patients similar or not?”. Because they demand extensive domain knowledge, such absolute questions are usually difficult to answer, even for experienced medical experts. In addition, active learning methods built on absolute questions are less stable, since incorrect answers occur often and can be detrimental to the risk prediction model. In this paper we instead focus on designing relative questions that domain experts can answer easily. The proposed relative queries take the form of “Is patient A or patient B more similar to patient C?”, which medical experts can answer with more confidence. These questions elicit relative rather than absolute information, and in some cases can even be answered by non-experts. We propose an interactive patient risk prediction method that actively queries medical experts about the relative similarity of patients. We evaluate our method on both benchmark and real clinical datasets and make several interesting discoveries, including that querying relative similarities is effective in patient risk prediction and can sometimes even yield better prediction accuracy than asking absolute questions.
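To make the relative query form concrete, the following is a minimal Python sketch of how a question such as “Is patient A or patient B more similar to patient C?” could be represented and selected. It is an illustration only and not the method proposed in the paper: the Euclidean distance, the "smallest distance gap" selection heuristic, and the function names `most_ambiguous_triplet` and `answer_to_constraint` are all assumptions introduced here.

```python
import itertools
import numpy as np


def distance(x, y):
    """Euclidean distance between two patient feature vectors (assumed metric)."""
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))


def most_ambiguous_triplet(features):
    """Return the triplet (a, b, c) with the smallest |d(a, c) - d(b, c)|.

    The triplet encodes the relative question
    "Is patient a or patient b more similar to patient c?".
    Picking the smallest gap is a generic uncertainty heuristic (an assumption).
    """
    best, best_gap = None, float("inf")
    n = len(features)
    for c in range(n):
        others = [i for i in range(n) if i != c]
        for a, b in itertools.combinations(others, 2):
            gap = abs(distance(features[a], features[c]) - distance(features[b], features[c]))
            if gap < best_gap:
                best, best_gap = (a, b, c), gap
    return best


def answer_to_constraint(triplet, a_is_closer):
    """Turn the expert's answer into an ordered constraint (closer, farther, reference)."""
    a, b, c = triplet
    return (a, b, c) if a_is_closer else (b, a, c)


# Toy data: four hypothetical patients described by a single feature.
patients = [[0.0], [2.0], [4.1], [10.0]]
query = most_ambiguous_triplet(patients)
constraint = answer_to_constraint(query, a_is_closer=False)
```

Here `query` holds the indices of the three patients to show to the expert, and `constraint` records the answer as an ordering that a similarity model could later be trained to respect. An absolute query, by contrast, would ask for a label or a yes/no similarity judgment on its own.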

Title: A relative similarity based method for interactive patient risk prediction
Authors: Buyue Qian
Xiang Wang
Nan Cao
Hongfei Li
Yu-Gang Jiang
Publication date: 1 July 2015
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery / Issue 4/2015
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-014-0379-5
