
2019 | OriginalPaper | Chapter

Generating Reward Functions Using IRL Towards Individualized Cancer Screening

Authors : Panayiotis Petousis, Simon X. Han, William Hsu, Alex A. T. Bui

Published in: Artificial Intelligence in Health

Publisher: Springer International Publishing


Abstract

Cancer screening can benefit from individualized decision-making tools that decrease overdiagnosis. The heterogeneity of cancer screening participants underscores the need for more personalized methods. Partially observable Markov decision processes (POMDPs), when defined with an appropriate reward function, can be used to suggest optimal, individualized screening policies. However, determining an appropriate reward function can be challenging. Here, we propose the use of inverse reinforcement learning (IRL) to form reward functions for lung and breast cancer screening POMDPs. Using experts' (physicians') retrospective screening decisions for lung and breast cancer screening, we developed two POMDP models with corresponding reward functions. Specifically, the maximum entropy (MaxEnt) IRL algorithm with an adaptive step size was employed to learn rewards more efficiently, and was combined with a multiplicative model to learn state-action pair rewards for a POMDP. The POMDP screening models were evaluated on their ability to recommend appropriate screening decisions before the diagnosis of cancer. The reward functions learned with the MaxEnt IRL algorithm, when combined with POMDP models of lung and breast cancer screening, demonstrated performance comparable to that of the experts. The Cohen's kappa score of agreement between the POMDPs' and physicians' predictions was high for breast cancer and showed a decreasing trend for lung cancer.
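The core MaxEnt IRL loop mentioned in the abstract — matching expert feature expectations by gradient ascent, with a step size that adapts over iterations — can be illustrated in miniature. The sketch below is not the chapter's screening model: the toy chain MDP, one-hot state features, example trajectories, and the simple decaying step-size schedule are all assumptions made for illustration only.

```python
import numpy as np

# Toy deterministic MDP: 3 states in a chain, actions {0: stay, 1: right}.
# The "expert" trajectories tend toward state 2, so MaxEnt IRL should
# assign state 2 the highest reward. Features are one-hot on states.
n_states, n_actions, horizon = 3, 2, 4

def step(s, a):
    return min(s + 1, n_states - 1) if a == 1 else s

features = np.eye(n_states)                        # phi(s): one-hot
expert_trajs = [[0, 1, 2, 2], [1, 2, 2, 2], [1, 1, 2, 2]]  # illustrative demos

# Empirical feature expectations from the demonstrations.
f_expert = np.mean([features[t].sum(axis=0) for t in expert_trajs], axis=0)

def expected_feature_counts(theta):
    """Soft (MaxEnt) backward pass for a stochastic policy, then a
    forward pass for expected state visitation counts over the horizon."""
    r = features @ theta
    v = np.zeros(n_states)
    for _ in range(horizon):                       # soft value iteration
        q = np.array([[r[s] + v[step(s, a)] for a in range(n_actions)]
                      for s in range(n_states)])
        m = q.max(axis=1, keepdims=True)           # stabilized log-sum-exp
        v = (m + np.log(np.exp(q - m).sum(axis=1, keepdims=True))).ravel()
    policy = np.exp(q - v[:, None])                # P(a | s)
    # Forward pass from the empirical start-state distribution.
    d = np.bincount([t[0] for t in expert_trajs], minlength=n_states).astype(float)
    d /= d.sum()
    counts = np.zeros(n_states)
    for _ in range(horizon):
        counts += d
        d_next = np.zeros(n_states)
        for s in range(n_states):
            for a in range(n_actions):
                d_next[step(s, a)] += d[s] * policy[s, a]
        d = d_next
    return features.T @ counts

theta, lr = np.zeros(n_states), 0.5
for it in range(200):
    grad = f_expert - expected_feature_counts(theta)  # likelihood gradient
    theta += (lr / (1 + 0.01 * it)) * grad            # decaying (adaptive) step
```

After convergence, `theta` orders the states by how strongly the expert demonstrations favor them; the chapter's adaptive-step-size variant and the multiplicative model for state-action rewards are more involved than this decay schedule.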


Metadata
Title
Generating Reward Functions Using IRL Towards Individualized Cancer Screening
Authors
Panayiotis Petousis
Simon X. Han
William Hsu
Alex A. T. Bui
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-12738-1_16
