Skip to main content

2020 | OriginalPaper | Buchkapitel

Towards Capturing Sonographic Experience: Cognition-Inspired Ultrasound Video Saliency Prediction

verfasst von : Richard Droste, Yifan Cai, Harshita Sharma, Pierre Chatelain, Aris T. Papageorghiou, J. Alison Noble

Erschienen in: Medical Image Understanding and Analysis

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

For visual tasks like ultrasound (US) scanning, experts direct their gaze towards regions of task-relevant information. Therefore, learning to predict the gaze of sonographers on US videos captures the spatio-temporal patterns that are important for US scanning. The spatial distribution of gaze points on video frames can be represented through heat maps termed saliency maps. Here, we propose a temporally bidirectional model for video saliency prediction (BDS-Net), drawing inspiration from modern theories of human cognition. The model consists of a convolutional neural network (CNN) encoder followed by a bidirectional gated-recurrent-unit recurrent convolutional network (GRU-RCN) decoder. The temporal bidirectionality mimics human cognition, which simultaneously reacts to past and predicts future sensory inputs. We train the BDS-Net alongside spatial and temporally one-directional comparative models on the task of predicting saliency in videos of US abdominal circumference plane detection. The BDS-Net outperforms the comparative models on four out of five saliency metrics. We present a qualitative analysis on representative examples to explain the model’s superior performance.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
In our implementation, for numerical stability, we compute \(log(\hat{s}^t_i)\) with a log-softmax function instead of computing the softmax and logarithm sequentially.
 
Literatur
1.
Zurück zum Zitat Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. In: NIPS - Deep Learning Symposium (2016) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. In: NIPS - Deep Learning Symposium (2016)
2.
Zurück zum Zitat Bak, C., Kocak, A., Erdem, E., Erdem, A.: Spatio-temporal saliency networks for dynamic saliency prediction. IEEE Trans. Multimed. 20(7), 1688–1698 (2018)CrossRef Bak, C., Kocak, A., Erdem, E., Erdem, A.: Spatio-temporal saliency networks for dynamic saliency prediction. IEEE Trans. Multimed. 20(7), 1688–1698 (2018)CrossRef
3.
Zurück zum Zitat Ballas, N., Yao, L., Pal, C., Courville, A.: Delving deeper into convolutional networks for learning video representations. In: ICLR (2016) Ballas, N., Yao, L., Pal, C., Courville, A.: Delving deeper into convolutional networks for learning video representations. In: ICLR (2016)
4.
Zurück zum Zitat Baumgartner, C.F., et al.: SonoNet: real-time detection and localisation of fetal standard scan planes in freehand ultrasound. IEEE Trans. Med. Imag. 36(11), 2204–2215 (2017)CrossRef Baumgartner, C.F., et al.: SonoNet: real-time detection and localisation of fetal standard scan planes in freehand ultrasound. IEEE Trans. Med. Imag. 36(11), 2204–2215 (2017)CrossRef
5.
Zurück zum Zitat Bazzani, L., Larochelle, H., Torresani, L.: Recurrent mixture density network for spatiotemporal visual attention. In: ICLR (2017) Bazzani, L., Larochelle, H., Torresani, L.: Recurrent mixture density network for spatiotemporal visual attention. In: ICLR (2017)
6.
Zurück zum Zitat Bylinskii, Z., Judd, T., Oliva, A., Torralba, A., Durand, F.: What do different evaluation metrics tell us about saliency models? IEEE Trans. Pattern Anal. Mach. Intell. 41(3), 740–757 (2019)CrossRef Bylinskii, Z., Judd, T., Oliva, A., Torralba, A., Durand, F.: What do different evaluation metrics tell us about saliency models? IEEE Trans. Pattern Anal. Mach. Intell. 41(3), 740–757 (2019)CrossRef
8.
Zurück zum Zitat Cai, Y., Sharma, H., Chatelain, P., Noble, J.A.: SonoEyeNet: standardized fetal ultrasound plane detection informed by eye tracking. In: ISBI (2018) Cai, Y., Sharma, H., Chatelain, P., Noble, J.A.: SonoEyeNet: standardized fetal ultrasound plane detection informed by eye tracking. In: ISBI (2018)
9.
Zurück zum Zitat Cai, Y., Sharma, H., Chatelain, P., Noble, J.A.: Multi-task sonoeyenet: detection of fetal standardized planes assisted by generated sonographer attention maps. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11070, pp. 871–879. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00928-1_98CrossRef Cai, Y., Sharma, H., Chatelain, P., Noble, J.A.: Multi-task sonoeyenet: detection of fetal standardized planes assisted by generated sonographer attention maps. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11070, pp. 871–879. Springer, Cham (2018). https://​doi.​org/​10.​1007/​978-3-030-00928-1_​98CrossRef
10.
11.
Zurück zum Zitat Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: EMNLP (2014) Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: EMNLP (2014)
12.
Zurück zum Zitat Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. In: NIPS (2014) Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. In: NIPS (2014)
13.
Zurück zum Zitat Clark, A.: Whatever next? predictive brains, situated agents, and the future of cognitive science. Behav. Brain Sci. 36(03), 181–204 (2013)CrossRef Clark, A.: Whatever next? predictive brains, situated agents, and the future of cognitive science. Behav. Brain Sci. 36(03), 181–204 (2013)CrossRef
14.
Zurück zum Zitat Droste, R., et al.: Ultrasound Image Representation Learning by Modeling Sonographer Visual Attention. Accepted at IPMI (2019) Droste, R., et al.: Ultrasound Image Representation Learning by Modeling Sonographer Visual Attention. Accepted at IPMI (2019)
15.
Zurück zum Zitat Gal, Y., Ghahramani, Z.: A theoretically grounded application of dropout in recurrent neural networks. In: NIPS (2016) Gal, Y., Ghahramani, Z.: A theoretically grounded application of dropout in recurrent neural networks. In: NIPS (2016)
16.
Zurück zum Zitat Gao, Y., Alison Noble, J.: Detection and characterization of the fetal heartbeat in free-hand ultrasound sweeps with weakly-supervised two-streams convolutional networks. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10434, pp. 305–313. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66185-8_35CrossRef Gao, Y., Alison Noble, J.: Detection and characterization of the fetal heartbeat in free-hand ultrasound sweeps with weakly-supervised two-streams convolutional networks. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10434, pp. 305–313. Springer, Cham (2017). https://​doi.​org/​10.​1007/​978-3-319-66185-8_​35CrossRef
17.
Zurück zum Zitat Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef
18.
Zurück zum Zitat Huang, W., Bridge, C.P., Noble, J.A., Zisserman, A.: Temporal heartnet: towards human-level automatic analysis of fetal cardiac screening video. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10434, pp. 341–349. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66185-8_39CrossRef Huang, W., Bridge, C.P., Noble, J.A., Zisserman, A.: Temporal heartnet: towards human-level automatic analysis of fetal cardiac screening video. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10434, pp. 341–349. Springer, Cham (2017). https://​doi.​org/​10.​1007/​978-3-319-66185-8_​39CrossRef
19.
Zurück zum Zitat Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML (2015) Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML (2015)
20.
Zurück zum Zitat Jetley, S., Murray, N., Vig, E.: End-to-end saliency mapping via probability distribution prediction. In: CVPR (2016) Jetley, S., Murray, N., Vig, E.: End-to-end saliency mapping via probability distribution prediction. In: CVPR (2016)
21.
22.
Zurück zum Zitat Sharma, H., Droste, R., Chatelain, P., Drukker, L., Papageorghiou, A., Noble, J.A.: Spatio-temporal partitioning and description of full-length routine fetal anomaly ultrasound scans. Accepted at IEEE ISBI 2019 (2019) Sharma, H., Droste, R., Chatelain, P., Drukker, L., Papageorghiou, A., Noble, J.A.: Spatio-temporal partitioning and description of full-length routine fetal anomaly ultrasound scans. Accepted at IEEE ISBI 2019 (2019)
23.
Zurück zum Zitat Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: NIPS (2014) Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: NIPS (2014)
24.
Zurück zum Zitat Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
26.
Zurück zum Zitat Ulyanov, D., Vedaldi, A., Lempitsky, V.: Instance Normalization: The Missing Ingredient for Fast Stylization. arxiv:1607.08022 (2016) Ulyanov, D., Vedaldi, A., Lempitsky, V.: Instance Normalization: The Missing Ingredient for Fast Stylization. arxiv:​1607.​08022 (2016)
27.
Zurück zum Zitat Wang, W., Shen, J., Guo, F., Cheng, M.M., Borji, A.: Revisiting video saliency: a large-scale benchmark and a new model. In: CVPR (2018) Wang, W., Shen, J., Guo, F., Cheng, M.M., Borji, A.: Revisiting video saliency: a large-scale benchmark and a new model. In: CVPR (2018)
28.
Zurück zum Zitat Wu, Y., He, K.: Group normalization. In: ECCV (2018) Wu, Y., He, K.: Group normalization. In: ECCV (2018)
Metadaten
Titel
Towards Capturing Sonographic Experience: Cognition-Inspired Ultrasound Video Saliency Prediction
verfasst von
Richard Droste
Yifan Cai
Harshita Sharma
Pierre Chatelain
Aris T. Papageorghiou
J. Alison Noble
Copyright-Jahr
2020
DOI
https://doi.org/10.1007/978-3-030-39343-4_15