Published in: Data Mining and Knowledge Discovery 4/2015

1 July 2015

A relative similarity based method for interactive patient risk prediction

Authors: Buyue Qian, Xiang Wang, Nan Cao, Hongfei Li, Yu-Gang Jiang

Abstract

This paper investigates patient risk prediction in the context of active learning with relative similarities. Active learning has been studied extensively and applied successfully to real problems. Active learning methods typically query absolute questions. In a medical application where the goal is to predict patients' risk of a certain disease from Electronic Health Records (EHR), absolute questions take the form of “Will this patient suffer from Alzheimer’s later in his/her life?” or “Are these two patients similar or not?”. Because they demand extensive domain knowledge, such absolute questions are usually difficult to answer, even for experienced medical experts. In addition, active learning methods based on absolute questions are less stable, since incorrect answers occur often and can be detrimental to the risk prediction model. In this paper we instead focus on designing relative questions that domain experts can answer easily. The proposed relative queries take the form of “Is patient A or patient B more similar to patient C?”, which medical experts can answer with more confidence. These questions poll relative rather than absolute information, and in some cases can even be answered by non-experts. We propose an interactive patient risk prediction method that actively queries medical experts about the relative similarity of patients. We evaluate our method on both benchmark and real clinical datasets and make several interesting discoveries, including that querying relative similarities is effective in patient risk prediction and can sometimes yield better prediction accuracy than asking absolute questions.
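
To make the relative query form concrete, the following is a minimal sketch of how a question of the type “Is patient A or patient B more similar to patient C?” might be chosen and its answer recorded. It is not the method of the paper: the selection heuristic (pick the triplet whose two distances to C are closest to equal under plain Euclidean distance), the function names `most_ambiguous_triplet` and `answer_to_constraint`, and the toy feature vectors are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def most_ambiguous_triplet(points):
    """Return indices (a, b, c) for the query "is a or b more similar to c?".

    Assumed heuristic (not taken from the paper): choose the triplet for
    which a and b are closest to being equidistant from c, i.e. the one
    the current distance measure is least sure about.
    """
    best, best_gap = None, float("inf")
    for c in range(len(points)):
        others = [i for i in range(len(points)) if i != c]
        for a, b in combinations(others, 2):
            gap = abs(dist(points[a], points[c]) - dist(points[b], points[c]))
            if gap < best_gap:  # strict: the first minimal triplet wins ties
                best, best_gap = (a, b, c), gap
    return best


def answer_to_constraint(a, b, c, a_is_closer):
    """Encode the expert's answer as an (anchor, nearer, farther) constraint."""
    return (c, a, b) if a_is_closer else (c, b, a)


# Hypothetical two-dimensional patient feature vectors.
patients = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
a, b, c = most_ambiguous_triplet(patients)
print(f"Is patient {a} or patient {b} more similar to patient {c}?")
```

An answer such as "patient B" would then be stored with `answer_to_constraint(a, b, c, a_is_closer=False)`, giving a relative constraint that a similarity-learning step could consume. The exhaustive search over all triplets is cubic in the number of patients and is only suitable for a toy example.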

Metadata
Title
A relative similarity based method for interactive patient risk prediction
Authors
Buyue Qian
Xiang Wang
Nan Cao
Hongfei Li
Yu-Gang Jiang
Publication date
1 July 2015
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 4/2015
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-014-0379-5
