Swallow segmentation with artificial neural networks and multi-sensor fusion
Introduction
In swallowing research, several non-invasive signal modalities have been investigated for swallow assessment (e.g. pulse oximetry [1], neck vibrations [2], [3], cervical auscultation [4], surface electromyography [5], or nasal airflow [6]). Such signal modalities are attractive due to their low cost, easy-to-attach sensors, and portability. Swallowing studies involving non-invasive signals have mainly focused on understanding the relationship between signal characteristics and swallowing function. However, to analyze the information buried in a signal about a particular swallow, a crucial precursory step is swallow segmentation. Although manual segmentation is always an option, automatic swallow segmentation is required for real-time swallow analysis or when dealing with massive volumes of data. In fact, automatic signal segmentation is a critical part of most computerized diagnostic systems, such as medical imaging equipment [7], [8]. Automatic segmentation has also been investigated with electrocardiograms (ECG) (e.g. [9]), phonocardiograms (PCG) (e.g. [10]), and speech signals (e.g. [11]). Signal segmentation isolates specific segments of interest from a continuous stream of time series data. The segmented data are critical for informing diagnostic decisions. The segments must capture the physiological phenomena under scrutiny, while minimizing contributions from unwanted sources such as motion artifacts.
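To make the notion of segmentation concrete, the following minimal Python sketch converts a per-sample binary decision (1 for swallow, 0 otherwise) into start and end indices of contiguous segments. The function name `mask_to_segments` and the toy mask are illustrative assumptions, not part of the study's method.

```python
import numpy as np

def mask_to_segments(mask):
    """Return (start, end) index pairs for each run of True samples.

    The end index is exclusive. `mask` is a hypothetical per-sample
    swallow/non-swallow decision, not data from the study.
    """
    mask = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))

# Toy decision stream containing two candidate swallow segments.
mask = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
segments = mask_to_segments(mask)
print(segments)
```

Each returned pair can then be used to slice the original time series, so that later analysis sees only the samples attributed to a swallow.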
This study focuses on three non-invasive signal modalities, namely dual-axis accelerometry, submental mechanomyography (MMG), and nasal airflow, for automatic swallow segmentation. A brief background of these modalities is provided below, followed by a brief introduction about the two technical pillars of this study, namely, multi-sensor fusion and the artificial neural network (ANN).
The measurement of neck vibrations is motivated by cervical auscultation, which is based on the claim that normal and abnormal swallows exhibit audible differences [12]. Although a microphone can be used, the combination of an accelerometer and digital signal processing has been the recent focus of research and termed swallowing accelerometry [2], [13], [14], [15]. One accelerometry study has shown a significant correlation between peak neck vibration and maximum hyolaryngeal excursion [3], which is a vital biomechanical component of airway protection during swallowing [16]. Although most prior work in this area has examined single-axis accelerometry signals, it has recently been shown that the anterior–posterior and superior–inferior axes of dual-axis accelerometry contain distinct information about swallowing [13].
Deglutition (i.e. swallowing) is a sequence of well-coordinated muscle activations. More than 25 pairs of muscles in the oral cavity, pharynx, larynx, and esophagus participate in deglutition [17]. In particular, the submental muscles, including the mylohyoid, geniohyoid, and styloglossus muscles, are part of the group of muscles that contract first in deglutition, known as the swallowing leading complex [18]. Thus, the submental musculature has been extensively studied in relation to swallowing. Although electromyography (EMG) is the most common method of measuring muscle activity, MMG offers several advantages in swallowing studies such as tolerance to variations in electrode location and robustness to perspiration and food spillage [19], [20].
Respiration and deglutition must be coordinated precisely in order to avoid airway invasion during swallowing [21], because the pharynx is a shared passageway for both air and food. The cessation of breathing during bolus transport is called swallowing apnea (SA), and the absence or presence as well as the timing of SA can disclose crucial information about airway protection [22], [23]. Because the mouth is occupied with mastication (i.e. chewing) and bolus formation during food intake, nasal respiration is the usual option for airflow monitoring in swallowing studies.
In a wide range of applications, multi-sensor fusion methodologies have performed well with both complementary and redundant data [24], [25]. Although multi-sensor fusion has been applied to segmentation before, such efforts mainly focused on computer vision applications (e.g. [26], [27]). Decisive advantages of multi-sensor fusion include reduced uncertainty, robustness to noise and measurement error in individual sensors, tolerance to single sensor failure, and resolved ambiguity [28].
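One common form of multi-sensor fusion is feature-level fusion, in which features computed from each sensor are concatenated into a single vector per time window. The sketch below illustrates this idea under stated assumptions: the window length, hop size, choice of features (mean and standard deviation), and the random stand-in signals are all hypothetical and are not taken from the study.

```python
import numpy as np

def window_features(signal, win, hop):
    """Per-window mean and standard deviation of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - win + 1, hop)
    return np.array([[signal[s:s + win].mean(), signal[s:s + win].std()]
                     for s in starts])

def fuse_features(channels, win=4, hop=2):
    """Concatenate per-window features of equally long channels column-wise."""
    feats = [window_features(ch, win, hop) for ch in channels]
    return np.hstack(feats)

# Four synthetic channels standing in for the four signal sources
# (random numbers, not real recordings).
rng = np.random.default_rng(0)
channels = [rng.standard_normal(20) for _ in range(4)]
fused = fuse_features(channels)
print(fused.shape)  # one row per window, two features per channel
```

Dropping a channel from `channels` shrinks the fused vector accordingly, which is one simple way to compare segmentation performance across different sensor combinations.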
ANNs have been deployed in image segmentation problems (e.g. [29], [30]). The ANN is a versatile nonlinear function approximator that can be utilized in either regression or classification. A 3-layer ANN is capable of approximating any input–output relationship, given sufficient hidden units and training cases as well as suitable nonlinear activation functions [31]. The input–output relationship can be learned automatically, which is particularly helpful if the relationship is too complex to be described analytically. At least one study has investigated ANN classifiers based on multiple bio-sensors for emotion recognition [32].
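As a rough illustration of the 3-layer structure described above (input, one hidden layer, output), the sketch below implements the forward pass of a small binary classifier in NumPy. The layer sizes, the tanh and sigmoid activations, and the random untrained weights are assumptions chosen for illustration; the excerpt does not specify the network configuration used in the study.

```python
import numpy as np

def init_network(n_in, n_hidden, rng):
    """Small random weights for an input -> hidden -> output network."""
    return {
        "W1": rng.standard_normal((n_in, n_hidden)) * 0.1,
        "b1": np.zeros(n_hidden),
        "W2": rng.standard_normal((n_hidden, 1)) * 0.1,
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Forward pass: tanh hidden layer, sigmoid output in (0, 1)."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    logits = hidden @ params["W2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
params = init_network(n_in=8, n_hidden=5, rng=rng)  # sizes are hypothetical
X = rng.standard_normal((10, 8))                    # 10 windows, 8 fused features
probs = forward(params, X)
labels = (probs >= 0.5).astype(int)                 # 1 = swallow, 0 = non-swallow
```

In practice the weights would be fitted to labeled training windows (for example by backpropagation) before the thresholded output is used as a segmentation decision.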
To the best of our knowledge, a multi-sensor fusion ANN has never been investigated for swallow segmentation. Furthermore, the combination of dual-axis accelerometry, submental MMG, and nasal airflow signals is novel in the field of swallowing research.
Objectives
The two primary objectives of this study were:
- to investigate the relationship between swallow segmentation performance and the number of employed signal sources, and
- to determine the signals or signal combinations which yield the most accurate swallow segmentation.
Signal acquisition
Signals were acquired from 17 (8 male) healthy adults with no history of dysphagia (i.e., swallowing disorder) or neurological impairments. The mean age was years. Each participant’s swallowing health was confirmed via a standardized oral mechanism examination and a water swallow screening test conducted by a registered speech-language pathologist. This study was approved by the ethics committees at the Toronto Rehabilitation Institute, Bloorview Kids Rehab, and University of Toronto.
Effects of multi-sensor fusion
Fig. 4 illustrates how segmentation performance changed as more signal sources became available for ANN training. The means and standard deviations are shown with dots and bars, respectively, while the dashed lines are the logistic regression fits. All performance measures resulted in positive regression coefficients: 0.1147, 0.4302, 0.4022, and 0.2993 for sensitivity, specificity, accuracy, and adjusted accuracy, respectively. These results indicate
Discussion
As more signal sources were considered, segmentation performance improved with respect to sensitivity, specificity, accuracy, and adjusted accuracy. In binary classification settings, there is a tradeoff between sensitivity and specificity as described by the receiver operating characteristic (ROC) curve [42]. Therefore, the simultaneous gain in both sensitivity and specificity in this study implies that the overall performance increase was truly due to multi-sensor fusion. Also, as shown in
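For reference, sensitivity, specificity, and accuracy can be computed from binary swallow/non-swallow decisions as in the minimal sketch below. The label vectors are made-up toy values, and the sketch assumes both classes occur in `y_true`. Adjusted accuracy is omitted because its definition does not appear in this excerpt.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = swallow)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))    # swallows correctly detected
    tn = int(np.sum(~y_true & ~y_pred))  # non-swallows correctly rejected
    fp = int(np.sum(~y_true & y_pred))   # false alarms
    fn = int(np.sum(y_true & ~y_pred))   # missed swallows
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because sensitivity depends only on the swallow class and specificity only on the non-swallow class, a classifier can raise one at the expense of the other by shifting its decision threshold, which is the tradeoff the ROC curve describes.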
Conclusions
We have shown in this paper that swallow segmentation can be improved by utilizing information from multiple sensors. The combination of all four signal sources resulted in the highest mean accuracy and adjusted accuracy. A–P accelerometry was deemed to be the most essential component for swallow segmentation, while the inclusion of MMG and nasal airflow was less critical.
Conflict of interest statement
There is no conflict of interest.
Acknowledgements
The authors would like to thank the Natural Sciences and Engineering Research Council of Canada, University of Toronto, Bloorview Kids Rehab, and Canada Research Chairs Program for funding and support. The authors also acknowledge the support of the Toronto Rehabilitation Institute. Equipment and space have been funded with grants from the Canada Foundation for Innovation and the Province of Ontario. This research has been generously supported by a grant from the Ontario Ministry of Health and
References (44)
- et al. Measurements of acceleration during videofluorographic evaluation of dysphagic patients. Medical Engineering & Physics (2000)
- et al. Segmentation of medical images. Image and Vision Computing (1993)
- et al. Hybrid fuzzy logic committee neural networks for recognition of swallow acceleration signals. Computer Methods and Programs in Biomedicine (2001)
- et al. A self-contained, mechanomyography-driven externally powered prosthesis. Archives of Physical Medicine and Rehabilitation (2005)
- et al. Experiments in combining intensity and range edge maps. CGIP (1983)
- An introduction to ROC analysis. Pattern Recognition Letters (2006)
- et al. Assessment of dysphagia with the use of pulse oximetry. Dysphagia (1999)
- et al. A radial basis classifier for the automatic detection of aspiration in children with dysphagia. Journal of NeuroEngineering and Rehabilitation (2006)
- et al. Reliability and validity of cervical auscultation: a controlled comparison using videofluoroscopy. Dysphagia (2004)
- et al. Surface electromyographic characteristics of swallowing in dysphagia secondary to brainstem stroke. Dysphagia (1997)
- Respiratory patterns associated with swallowing. Part 2. Neurologically impaired dysphagic patients. Age and Ageing
- ECG signal analysis through hidden Markov models. IEEE Transactions on Biomedical Engineering
- A three-channel microcomputer system for segmentation and characterization of the phonocardiogram. IEEE Transactions on Biomedical Engineering
- A new statistical approach for the automatic segmentation of continuous speech signals. IEEE Transactions on Acoustics, Speech, and Signal Processing
- Interpreting the sounds of swallowing: fluid flow through the cricopharyngeus. Annals of Otology, Rhinology, and Laryngology
- Time and time-frequency characterization of dual-axis swallowing accelerometry signals. Physiological Measurement
- Investigating the stationarity of paediatric aspiration signals. IEEE Transactions on Neural Systems and Rehabilitation Engineering
- Evaluation and treatment of swallowing disorders
- Deglutition. Physiological Reviews
- Brain stem control of swallowing: neuronal network and cellular mechanisms. Physiological Reviews
- Muscle sound: bases for the introduction of a mechanomyographic signal in muscle studies. Critical Reviews in Biomedical Engineering