Published in: Information Systems Frontiers 4/2009

1 September 2009

Interactive survival analysis with the OCDM system: From development to application

Authors: Sebastian Klenk, Jürgen Dippon, Peter Fritz, Gunther Heidemann


Abstract

Medical data mining is actively pursued in computer science and statistical research, but not in medical practice. The reasons lie in the difficulty of handling and statistically analyzing medical data. We have developed a system that allows practitioners in the field to analyze their data interactively, without the assistance of statisticians or data mining experts. In this paper we introduce data mining of medical data and show how it can be achieved for survival data. We demonstrate how to solve common problems of interactive survival analysis by presenting the Online Clinical Data Mining (OCDM) system. The main focus is on similarity-based queries, a new method for selecting similar cases based on their covariates and the influence of those covariates on survival.


Footnotes
1
Online Analytical Processing (OLAP): a hypothesis-driven, dimensional approach to decision support (Kimball 1996).
 
2
The terms knowledge discovery in databases (KDD) and data mining (DM) are used in accordance with Fayyad et al. (1996). When referring to actual software systems, we use the terms data mining system and knowledge discovery system interchangeably.
 
3
In this paper we use the term data warehouse in a broader sense, as a storage system for a large amount of information from different sources.
 
4
A non-parametric method for estimating the conditional expectation of a random variable Y, given the value x of its covariate, by a locally weighted average of the observations Y_i in the vicinity of x. The weighting is governed by a kernel function and a bandwidth; see, e.g., Hastie et al. (2002).
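The locally weighted average described in this footnote can be illustrated with a minimal sketch. It assumes a Gaussian kernel and a single covariate; the footnote fixes neither choice, the function names are hypothetical, and the code ignores censoring, so it is not necessarily the estimator used in OCDM.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice of kernel function)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_regression(x, xs, ys, bandwidth):
    """Estimate E[Y | X = x] as a kernel-weighted average of the ys.

    Each observation (xs[i], ys[i]) receives the weight
    K((x - xs[i]) / bandwidth), so observations close to x dominate
    and the bandwidth controls how quickly the weights decay.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((x - xi) / bandwidth) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observation carries weight near x")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

A small bandwidth makes the estimate follow the nearest observations closely, while a large bandwidth pulls it towards the overall mean of the observations.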
 
References
Abe, H., Yokoi, H., Ohsaki, M., & Yamaguchi, T. (2007). Developing an integrated time-series data mining environment for medical data mining. In Seventh IEEE international conference on data mining workshops (ICDM workshops 2007) (pp. 127–132).
Ahmad, I., & Ran, I. (2004). Data based bandwidth selection in kernel density estimation with parametric start via kernel contrasts. Journal of Nonparametric Statistics, 16(37), 841–877.
Black, N. (2003). Using clinical databases in practice. BMJ, 326(7379), 2–3.
Brameier, M., & Banzhaf, W. (2001). A comparison of linear genetic programming and neural networks in medical data mining. IEEE Transactions on Evolutionary Computation, 5(1), 17–26.
Cherkassky, V. (2007). Learning from data, 2nd edn. New York: Wiley.
Cios, K. J., & Moore, G. W. (2002). Uniqueness of medical data mining. Artificial Intelligence in Medicine, 26(1–2), 1–24.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society Series B (Methodological), 34(3), 187–220.
Date, C. J. (2002). Introduction to database systems. Boston: Addison-Wesley Longman.
Delen, D., Walker, G., & Kadam, A. (2005). Predicting breast cancer survivability: A comparison of three data mining methods. Artificial Intelligence in Medicine, 34(3), 113–127.
Dippon, J., Fritz, P., & Kohler, M. (2002). A statistical approach to case based reasoning, with application to breast cancer data. Computational Statistics & Data Analysis, 40(3), 579–602.
Dyreson, C., Grandi, F., Käfer, W., Kline, N., Lorentzos, N., Mitsopoulos, Y., et al. (1994). A consensus glossary of temporal database concepts. ACM SIGMOD Record, 23(1), 52–64.
Eggebraaten, T. J., Tenner, J. W., & Dubbels, J. C. (2007). A health-care data model based on the HL7 reference information model. IBM Systems Journal, 46(1), 5–18.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17, 37–54.
Fung, G., Yu, S., Dehing-Oberije, C., Ruysscher, D. D., Lambin, P., Krishnan, S., et al. (2008). Privacy-preserving predictive models for lung cancer survival analysis. In Privacy-preserving workshop at the SIAM data mining conference 2008.
Ghannad-Rezaie, M., Soltanian-Zadeh, H., Siadat, M. R., & Elisevich, K. (2006). Medical data mining using particle swarm optimization for temporal lobe epilepsy. In IEEE congress on evolutionary computation (CEC 2006) (pp. 761–768).
Györfi, L., Kohler, M., Krzyzak, A., & Walk, H. (2002). A distribution-free theory of nonparametric regression. New York: Springer.
Han, J., & Kamber, M. (2001). Data mining. San Francisco: Morgan Kaufmann.
Hastie, T. J., Tibshirani, R. J., & Friedman, J. H. (2002). The elements of statistical learning, corrected print. edn. New York: Springer.
Hoover, D. R., & He, Y. (1994). Nonidentified responses in a proportional hazards setting. Biometrics, 50(1), 1–10.
Houston, A. L., Chen, H., Hubbard, S. M., Schatz, B. R., Ng, T. D., Sewell, R. R., et al. (1999). Medical data mining on the internet: Research on a cancer information system. Artificial Intelligence Review, 13(5–6), 437–466.
Inokuchi, A., Takeda, K., Inaoka, N., & Wakao, F. (2007). Medtakmi-cdi: Interactive knowledge discovery for clinical decision intelligence. IBM Systems Journal, 46(1), 115–133.
Kimball, R. (1996). The data warehouse toolkit. New York: Wiley.
Klein, J. P., & Moeschberger, M. L. (2005). Survival analysis, 2nd edn. New York: Springer.
Kleinbaum, D. G., & Klein, M. (2005). Survival analysis, 2nd edn. New York: Springer.
Lundin, J., Lundin, M., Isola, J., & Joensuu, H. (2003). Infopoints: A web-based system for individualised survival estimation in breast cancer. BMJ, 326(7379), 29.
McAullay, D., Williams, G., Chen, J., Jin, H., He, H., Sparks, R., et al. (2005). A delivery framework for health data mining and analytics. In ACSC ’05: Proceedings of the twenty-eighth Australasian conference on computer science (pp. 381–387). Darlinghurst: Australian Computer Society.
Meinicke, P., Brodag, T., Fricke, W. F., & Waack, S. (2006). P-value based visualization of codon usage data. Algorithms for Molecular Biology, 1, 10.
Mullins, I. M., Siadaty, M. S., Lyman, J., Scully, K., Garrett, C. T., Miller, W. G., et al. (2006). Data mining and clinical data repositories: Insights from a 667,000 patient data set. Computers in Biology and Medicine, 36(12), 1351–1377.
Ölund, G., Lindqvist, P., & Litton, J. E. (2007). Bims: An information management system for biobanking in the 21st century. IBM Systems Journal, 46(1), 171–182.
Pedersen, T. B., & Jensen, C. S. (1998). Research issues in clinical data warehousing. In SSDBM ’98: Proceedings of the 10th international conference on scientific and statistical database management (pp. 43–52). Washington, DC: IEEE Computer Society.
Pedersen, T. B., & Jensen, C. S. (1999). Multidimensional data modeling for complex data. In ICDE ’99: Proceedings of the 15th international conference on data engineering (p. 336). Washington, DC: IEEE Computer Society.
R Development Core Team (2008). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. http://www.R-project.org. ISBN 3-900051-07-0.
Radespiel-Tröger, M., Rabenstein, T., Schneider, H. T., & Lausen, B. (2003). Comparison of tree-based methods for prognostic stratification of survival data. Artificial Intelligence in Medicine, 28(3), 323–341.
Russell, S. J., & Norvig, P. (2003). Artificial intelligence, 2nd edn. Englewood Cliffs: Prentice Hall.
Metadata
Title
Interactive survival analysis with the OCDM system: From development to application
Authors
Sebastian Klenk
Jürgen Dippon
Peter Fritz
Gunther Heidemann
Publication date
1 September 2009
Publisher
Springer US
Published in
Information Systems Frontiers, Issue 4/2009
Print ISSN: 1387-3326
Electronic ISSN: 1572-9419
DOI
https://doi.org/10.1007/s10796-009-9152-5
