Elsevier

Medical Engineering & Physics

Volume 31, Issue 9, November 2009, Pages 1049-1055
Swallow segmentation with artificial neural networks and multi-sensor fusion

https://doi.org/10.1016/j.medengphy.2009.07.001

Abstract

Swallow segmentation is a critical precursory step to the analysis of swallowing signal characteristics. In an effort to automatically segment swallows, we investigated artificial neural networks (ANN) with information from cervical dual-axis accelerometry, submental mechanomyography (MMG), and nasal airflow. Our objectives were (1) to investigate the relationship between segmentation performance and the number of signal sources and (2) to identify the signals or signal combinations most useful for swallow segmentation. Signals were acquired from 17 healthy adults in both discrete and continuous swallowing tasks using five stimuli. Training and test feature vectors were constructed with variances from single or multiple signals, estimated within 200 ms moving windows with 50% overlap. Corresponding binary target labels (swallow or non-swallow) were derived by manual segmentation. A separate 3-layer ANN was trained for each participant–signal combination, and all possible signal combinations were investigated. As more signal sources were included, segmentation performance improved in terms of sensitivity, specificity, accuracy, and adjusted accuracy. The combination of all four signal sources achieved the highest mean accuracy and adjusted accuracy of 88.5% and 89.6%, respectively. A–P accelerometry proved to be the most discriminatory source, while the inclusion of MMG or nasal airflow resulted in the least performance improvement. These findings suggest that an ANN-based multi-sensor fusion approach to segmentation is worthy of further investigation in swallowing studies.
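The feature construction described above (signal variances in 200 ms moving windows with 50% overlap, combined across one or more signals) might be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the 1 kHz sampling rate, and the synthetic channels are all assumptions made for the example.

```python
import numpy as np

def windowed_variance(signal, fs, window_s=0.2, overlap=0.5):
    """Variance of `signal` in moving windows (200 ms, 50% overlap by default)."""
    signal = np.asarray(signal, dtype=float)
    win = int(round(window_s * fs))                    # samples per window
    hop = max(1, int(round(win * (1.0 - overlap))))    # samples between window starts
    if len(signal) < win:
        return np.empty(0)
    n_windows = 1 + (len(signal) - win) // hop
    return np.array([np.var(signal[i * hop:i * hop + win]) for i in range(n_windows)])

def fuse_features(signals, fs):
    """One row per window, one column per signal (feature-level fusion by stacking)."""
    return np.column_stack([windowed_variance(s, fs) for s in signals])

# Synthetic stand-ins for four channels; the sampling rate is an assumed value.
rng = np.random.default_rng(0)
fs = 1000
channels = [rng.normal(scale=k + 1, size=2 * fs) for k in range(4)]
features = fuse_features(channels, fs)
print(features.shape)
```

Each row of `features` would then be paired with the binary label (swallow or non-swallow) of the corresponding window.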

Introduction

In swallowing research, several non-invasive signal modalities have been investigated for swallow assessment (e.g. pulse oximetry [1], neck vibrations [2], [3], cervical auscultation [4], surface electromyography [5], and nasal airflow [6]). Such signal modalities are attractive due to their low cost, easy-to-attach sensors, and portability. Swallowing studies involving non-invasive signals have mainly focused on understanding the relationship between signal characteristics and swallowing function. However, to analyze the information that a signal carries about a particular swallow, a crucial precursory step is swallow segmentation. Although manual segmentation is always an option, automatic swallow segmentation is required for real-time swallow analysis or when dealing with massive volumes of data. In fact, automatic signal segmentation is a critical part of most computerized diagnostic systems, such as medical imaging equipment [7], [8]. Automatic segmentation has also been investigated with electrocardiograms (ECG) (e.g. [9]), phonocardiograms (PCG) (e.g. [10]), and speech signals (e.g. [11]). Signal segmentation isolates specific segments of interest from a continuous stream of time series data. The segmented data are critical for informing diagnostic decisions. The segments must capture the physiological phenomena under scrutiny while minimizing contributions from unwanted sources such as motion artifacts.

This study focuses on three non-invasive signal modalities, namely dual-axis accelerometry, submental mechanomyography (MMG), and nasal airflow, for automatic swallow segmentation. A brief background of these modalities is provided below, followed by a brief introduction to the two technical pillars of this study, namely multi-sensor fusion and the artificial neural network (ANN).

The measurement of neck vibrations is motivated by cervical auscultation, which is based on the claim that normal and abnormal swallows exhibit audible differences [12]. Although a microphone can be used, the combination of an accelerometer and digital signal processing has been the recent focus of research and termed swallowing accelerometry [2], [13], [14], [15]. One accelerometry study has shown a significant correlation between peak neck vibration and maximum hyolaryngeal excursion [3], which is a vital biomechanical component of airway protection during swallowing [16]. Although most prior work in this area has examined single-axis accelerometry signals, it has recently been shown that the anterior–posterior and superior–inferior axes of dual-axis accelerometry contain distinct information about swallowing [13].

Deglutition (i.e. swallowing) is a sequence of well-coordinated muscle activations. More than 25 pairs of muscles in the oral cavity, pharynx, larynx, and esophagus participate in deglutition [17]. In particular, the submental muscles, including the mylohyoid, geniohyoid, and styloglossus muscles, are part of the group of muscles that contract first in deglutition, known as the swallowing leading complex [18]. Thus, the submental musculature has been extensively studied in relation to swallowing. Although electromyography (EMG) is the most common method of measuring muscle activity, MMG offers several advantages in swallowing studies such as tolerance to variations in electrode location and robustness to perspiration and food spillage [19], [20].

Respiration and deglutition must be coordinated precisely in order to avoid airway invasion during swallowing [21], because the pharynx is a shared passageway for both air and food. The cessation of breathing during bolus transport is called swallowing apnea (SA), and the absence or presence as well as the timing of SA can disclose crucial information about airway protection [22], [23]. Because the mouth is occupied with mastication (i.e. chewing) and bolus formation during food intake, nasal respiration is the usual option for airflow monitoring in swallowing studies.

In a wide range of applications, multi-sensor fusion methodologies have performed well with both complementary and redundant data [24], [25]. Although multi-sensor fusion has been applied to segmentation before, such efforts have mainly focused on computer vision applications (e.g. [26], [27]). Decisive advantages of multi-sensor fusion include reduced uncertainty, robustness to noise and measurement error in individual sensors, tolerance to single sensor failure, and resolved ambiguity [28].

ANNs have been deployed in image segmentation problems (e.g. [29], [30]). The ANN is a versatile nonlinear function approximator that can be utilized in either regression or classification. A 3-layer ANN is capable of mapping any input–output relationship, given sufficient hidden units and training cases as well as suitable nonlinear activation functions [31]. The input–output relationship can be learned automatically, which is particularly helpful if the relationship is too complex to be described analytically. At least one study has investigated ANN classifiers based on multiple bio-sensors for emotion recognition [32].
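To make the idea of a 3-layer ANN (input, one hidden layer, output) used as a binary classifier concrete, a minimal NumPy sketch is given below. It is illustrative only: the tanh and sigmoid activations, the number of hidden units, the learning rate, the full-batch gradient descent on cross-entropy, and the toy two-feature data are all assumptions, not details taken from this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, n_hidden=4, lr=0.5, epochs=500, seed=0):
    """Train an input-hidden-output network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                       # hidden-layer activations
        p = sigmoid(H @ W2 + b2)                       # estimated P(class 1) per sample
        d_out = (p - y) / len(y)                       # cross-entropy gradient at the output
        d_hid = np.outer(d_out, W2) * (1.0 - H ** 2)   # back-propagated through tanh
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    """Threshold the network output at 0.5 to obtain binary labels."""
    W1, b1, W2, b2 = params
    return (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) >= 0.5).astype(int)

# Toy, well-separated two-class data standing in for window-level features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.3, (100, 2)), rng.normal(1.0, 0.3, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])
params = train_mlp(X, y)
accuracy = np.mean(predict(params, X) == y)
print(accuracy)
```

In a real segmentation setting the accuracy would of course be estimated on held-out data rather than on the training samples used here.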

To the best of our knowledge, a multi-sensor fusion ANN has never been investigated for swallow segmentation. Furthermore, the combination of dual-axis accelerometry, submental MMG, and nasal airflow signals is novel in the field of swallowing research.

Section snippets

Objectives

The two primary objectives of this study were:

  • to investigate the relationship between swallow segmentation performance and the number of employed signal sources, and

  • to determine the signals or signal combinations which yield the most accurate swallow segmentation.
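As an illustration of what examining every signal combination involves, the sketch below enumerates all non-empty subsets of four signal sources. The short labels are placeholders chosen for this example (the two accelerometry axes, MMG, and nasal airflow); they are not identifiers from the study.

```python
from itertools import combinations

# Placeholder labels for the four signal sources.
SOURCES = ["AP", "SI", "MMG", "airflow"]

def all_combinations(sources):
    """Every non-empty subset of `sources`, ordered by subset size."""
    return [c for k in range(1, len(sources) + 1) for c in combinations(sources, k)]

combos = all_combinations(SOURCES)
print(len(combos))  # 2**4 - 1 = 15
```

Grouping the subsets by size is what allows performance to be compared against the number of sources employed, as in the first objective.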

Signal acquisition

Signals were acquired from 17 (8 male) healthy adults with no history of dysphagia (i.e., swallowing disorder) or neurological impairments. The mean age was 46.9±23.8 years. Each participant’s swallowing health was confirmed via a standardized oral mechanism examination and a water swallow screening test conducted by a registered speech-language pathologist. This study was approved by the ethics committees at the Toronto Rehabilitation Institute, Bloorview Kids Rehab, and University of Toronto.

Effects of multi-sensor fusion

Fig. 4 illustrates how segmentation performance changed as more signal sources became available for ANN training. The means and standard deviations are shown with dots and bars, respectively, while the dashed lines are the logistic regression fits. All performance measures resulted in positive regression coefficients: 0.1147 (p=0.6066), 0.4302 (p=0.022), 0.4022 (p=0.0345), and 0.2993 (p=0.1368) for sensitivity, specificity, accuracy, and adjusted accuracy, respectively. These results indicate

Discussion

As more signal sources were considered, segmentation performance improved with respect to sensitivity, specificity, accuracy, and adjusted accuracy. In binary classification settings, there is a tradeoff between sensitivity and specificity as described by the receiver operating characteristic (ROC) curve [42]. Therefore, the simultaneous gain in both sensitivity and specificity in this study implies that the overall performance increase was truly due to multi-sensor fusion. Also, as shown in

Conclusions

We have shown in this paper that swallow segmentation can be improved by utilizing information from multiple sensors. The combination of all four signal sources resulted in the highest mean accuracy and adjusted accuracy. A–P accelerometry was deemed to be the most essential component of swallow segmentation, while the inclusion of MMG and nasal airflow was less critical.
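The performance measures named above can be computed from window-level binary labels as sketched below. The definition of adjusted accuracy is not stated in this excerpt, so the sketch assumes it is the mean of sensitivity and specificity; that definition, the function name, and the toy labels are assumptions made for illustration.

```python
import numpy as np

def segmentation_metrics(y_true, y_pred):
    """Window-level metrics for binary labels (1 = swallow, 0 = non-swallow)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)      # swallow windows correctly detected
    tn = np.sum(~y_true & ~y_pred)    # non-swallow windows correctly rejected
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    sensitivity = tp / (tp + fn)      # undefined if there are no swallow windows
    specificity = tn / (tn + fp)      # undefined if there are no non-swallow windows
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / y_true.size,
        # ASSUMPTION: adjusted accuracy taken as the mean of sensitivity and specificity.
        "adjusted_accuracy": (sensitivity + specificity) / 2,
    }

# Toy labels: 3 swallow windows and 7 non-swallow windows.
metrics = segmentation_metrics([1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
                               [1, 1, 0, 0, 0, 0, 0, 0, 0, 1])
print(metrics)
```

Because non-swallow windows typically outnumber swallow windows, a measure that weights both classes equally is less sensitive to class imbalance than plain accuracy.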

Conflict of interest statement

There is no conflict of interest.

Acknowledgements

The authors would like to thank the Natural Sciences and Engineering Research Council of Canada, University of Toronto, Bloorview Kids Rehab, and Canada Research Chairs Program for funding and support. The authors also acknowledge the support of the Toronto Rehabilitation Institute. Equipment and space have been funded with grants from the Canada Foundation for Innovation and the Province of Ontario. This research has been generously supported by a grant from the Ontario Ministry of Health and

References (44)

  • W. Selley et al.

    Respiratory patterns associated with swallowing. Part 2. Neurologically impaired dysphagic patients

    Age and Ageing

    (1989)
  • S. Vitulano, C. DiRuberto, M. Nappi

    Different methods to segment biomedical images

    (1997)...
  • R. Andreao et al.

    ECG signal analysis through hidden markov models

    IEEE Transactions on Biomedical Engineering

    (2006)
  • R. Lehner et al.

    A three-channel microcomputer system for segmentation and characterization of the phonocardiogram

    IEEE Transactions on Biomedical Engineering

    (1987)
  • R. Andre-Obrecht

    A new statistical approach for the automatic segmentation of continuous speech signals

    IEEE Transactions on Acoustics, Speech, and Signal Processing

    (1988)
  • S. Hamlet et al.

    Interpreting the sounds of swallowing: fluid flow through the cricopharyngeus

    Annals of Otology, Rhinology, and Laryngology

    (1990)
  • J. Lee et al.

    Time and time-frequency characterization of dual-axis swallowing accelerometry signals

    Physiological Measurement

    (2008)
  • T. Chau et al.

    Investigating the stationarity of paediatric aspiration signals

    IEEE Transactions on Neural Systems and Rehabilitation Engineering

    (2005)
  • J. Logemann

    Evaluation and treatment of swallowing disorders

    (1998)
  • A. Miller

    Deglutition

    Physiological Reviews

    (1982)
  • A. Jean

    Brain stem control of swallowing: neuronal network and cellular mechanisms

    Physiological Reviews

    (2001)
  • C. Orizio

    Muscle sound: bases for the introduction of a mechanomyographic signal in muscle studies

    Critical Reviews in Biomedical Engineering

    (1993)
Cited by (34)

    • Analysis of electrophysiological and mechanical dimensions of swallowing by non-invasive biosignals

      2023, Biomedical Signal Processing and Control
      Citation Excerpt :

      For instance, most works related to Acc analysis involved swallowing sounds recorded by a digital microphone placed on the neck [21], but there are clinical barriers that prevent its implementation at the consulting room [15], such as the requirement to be performed together with VFSS [15]. Other biosignals have also been used in combination with Acc, such as mechanomyography [16], nasal flow [46] and piezo-electric [47]. In fact, Papapanagiotou et al. [48] found that the late-fusion of information from Acc, photoplethysmography and a microphone inserted in the ear hook, improves the chewing detection compared to using each signal separately.

    • A comparison between swallowing sounds and vibrations in patients with dysphagia

      2017, Computer Methods and Programs in Biomedicine
      Citation Excerpt :

      The focus of these investigations can be categorized into several main topics, such as the physiological sources of the signals [4,5,13–17,27], the best placement site of microphones and accelerometers on the neck [1,7,27,28], the best preprocessing methods for signals [20,27,29–31], characterization of the recorded signals [1–3,6,8,9,12,18,21,23,27,32], segmentation of the swallowing signals [20,22,27,33–36], and classification of an abnormal swallow from a normal swallow [10,27,37–39]. Researchers have tried to characterize the swallowing sounds and vibrations separately by extracting different features in the various domains including time, frequency, and time-frequency [1–3,6–8,10,12,18,21–23,27,37]. The results have provided some evidence that using microphones or dual-axial accelerometers may be a credible approach for detecting some swallowing difficulties.

    • A comparative analysis of DBSCAN, K-means, and quadratic variation algorithms for automatic identification of swallows from swallowing accelerometry signals

      2015, Computers in Biology and Medicine
      Citation Excerpt :

      One method that has received significant attention is the neural network technique. The signal is windowed and then multiple time-varying features are calculated before being fed into the neural network [24–27]. After sufficient training this network should be able to differentiate between periods of time where swallowing activity is present or absent based on the values of the inputs [24–27].
