Swallow segmentation with artificial neural networks and multi-sensor fusion

doi:10.1016/j.medengphy.2009.07.001

Medical Engineering & Physics

Volume 31, Issue 9, November 2009, Pages 1049-1055

https://doi.org/10.1016/j.medengphy.2009.07.001

Abstract

Swallow segmentation is a critical precursory step to the analysis of swallowing signal characteristics. In an effort to automatically segment swallows, we investigated artificial neural networks (ANN) with information from cervical dual-axis accelerometry, submental mechanomyography (MMG), and nasal airflow. Our objectives were (1) to investigate the relationship between segmentation performance and the number of signal sources and (2) to identify the signals or signal combinations most useful for swallow segmentation. Signals were acquired from 17 healthy adults in both discrete and continuous swallowing tasks using five stimuli. Training and test feature vectors were constructed from the variances of single or multiple signals, estimated within 200 ms moving windows with 50% overlap. Corresponding binary target labels (swallow or non-swallow) were derived by manual segmentation. A separate 3-layer ANN was trained for each participant–signal combination, and all possible signal combinations were investigated. As more signal sources were included, segmentation performance improved in terms of sensitivity, specificity, accuracy, and adjusted accuracy. The combination of all four signal sources (anterior–posterior and superior–inferior accelerometry, MMG, and nasal airflow) achieved the highest mean accuracy and adjusted accuracy, 88.5% and 89.6%, respectively. Anterior–posterior (A–P) accelerometry proved to be the most discriminatory source, while the inclusion of MMG or nasal airflow yielded the smallest performance improvement. These findings suggest that an ANN-based, multi-sensor fusion approach to segmentation merits further investigation in swallowing studies.
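The feature and target construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the population variance is an assumed choice, and the rule that a window is labelled a swallow when more than half of its samples fall inside a manually segmented swallow is an assumption, since the text does not state how window labels were assigned. Only the 200 ms window length and 50% overlap come from the text; the sampling rate `fs` is left as a free parameter.

```python
from statistics import pvariance


def window_bounds(n_samples, fs, win_ms=200.0, overlap=0.5):
    """Yield (start, end) sample indices of windows of win_ms with the given overlap."""
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(win * (1.0 - overlap)))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must be at least one sample")
    start = 0
    while start + win <= n_samples:
        yield start, start + win
        start += hop


def variance_features(signals, fs, win_ms=200.0, overlap=0.5):
    """One feature vector per window: the variance of each signal within that window."""
    n = min(len(s) for s in signals)
    return [[pvariance(s[a:b]) for s in signals]
            for a, b in window_bounds(n, fs, win_ms, overlap)]


def window_labels(swallows, n_samples, fs, win_ms=200.0, overlap=0.5):
    """Label a window 1 (swallow) if most of its samples lie inside a manual segment.

    swallows: list of (onset, offset) sample indices from manual segmentation,
    offset exclusive. The majority rule is an assumption for illustration.
    """
    mask = [0] * n_samples
    for on, off in swallows:
        for i in range(max(on, 0), min(off, n_samples)):
            mask[i] = 1
    return [int(sum(mask[a:b]) * 2 > (b - a))
            for a, b in window_bounds(n_samples, fs, win_ms, overlap)]
```

With a toy rate of `fs = 100`, each window spans 20 samples and consecutive windows start 10 samples apart, so a 100-sample recording yields 9 windows, each described by one variance per signal.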

Introduction

In swallowing research, several non-invasive signal modalities have been investigated for swallow assessment (e.g. pulse oximetry [1], neck vibrations [2], [3], cervical auscultation [4], surface electromyography [5], or nasal airflow [6]). Such modalities are attractive because of their low cost, easy-to-attach sensors, and portability. Swallowing studies involving non-invasive signals have mainly focused on understanding the relationship between signal characteristics and swallowing function. However, before the information a signal carries about a particular swallow can be analyzed, that swallow must first be segmented. Although manual segmentation is always an option, automatic swallow segmentation is required for real-time swallow analysis or when dealing with large volumes of data. In fact, automatic signal segmentation is a critical part of most computerized diagnostic systems, such as medical imaging equipment [7], [8]. Automatic segmentation has also been investigated with electrocardiograms (ECG) (e.g. [9]), phonocardiograms (PCG) (e.g. [10]), and speech signals (e.g. [11]). Signal segmentation isolates specific segments of interest from a continuous stream of time series data, and the segmented data are critical for informing diagnostic decisions. The segments must therefore capture the physiological phenomena under scrutiny while minimizing contributions from unwanted sources such as motion artifacts.
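To make the output of segmentation concrete, the sketch below merges per-window swallow/non-swallow decisions into onset and offset times. It assumes windows of length `win_ms` starting every `hop_ms` (200 ms and 100 ms for a 50% overlap); the function name is hypothetical, and treating each run of consecutive positive windows as one segment is an illustrative choice rather than the method of this study.

```python
def decisions_to_segments(decisions, hop_ms, win_ms):
    """Merge consecutive positive windows into (onset_ms, offset_ms) segments.

    decisions: 0/1 per window; window k spans [k * hop_ms, k * hop_ms + win_ms).
    """
    segments = []
    start = None
    for k, d in enumerate(decisions):
        if d and start is None:
            start = k  # first positive window of a new segment
        elif not d and start is not None:
            # segment ends at the end of the last positive window (k - 1)
            segments.append((start * hop_ms, (k - 1) * hop_ms + win_ms))
            start = None
    if start is not None:  # a segment still open at the end of the recording
        segments.append((start * hop_ms, (len(decisions) - 1) * hop_ms + win_ms))
    return segments
```

For example, `decisions_to_segments([0, 0, 0, 1, 1, 0, 0, 1], hop_ms=100, win_ms=200)` returns `[(300, 600), (700, 900)]`.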

This study focuses on three non-invasive signal modalities, namely dual-axis accelerometry, submental mechanomyography (MMG), and nasal airflow, for automatic swallow segmentation. Brief background on these modalities is provided below, followed by an introduction to the two technical pillars of this study: multi-sensor fusion and the artificial neural network (ANN).

The measurement of neck vibrations is motivated by cervical auscultation, which is based on the claim that normal and abnormal swallows exhibit audible differences [12]. Although a microphone can be used, recent research has focused on combining an accelerometer with digital signal processing, an approach termed swallowing accelerometry [2], [13], [14], [15]. One accelerometry study showed a significant correlation between peak neck vibration and maximum hyolaryngeal excursion [3], a vital biomechanical component of airway protection during swallowing [16]. Although most prior work in this area has examined single-axis accelerometry signals, it has recently been shown that the anterior–posterior (A–P) and superior–inferior (S–I) axes of dual-axis accelerometry contain distinct information about swallowing [13].

Deglutition (i.e. swallowing) is a sequence of well-coordinated muscle activations. More than 25 pairs of muscles in the oral cavity, pharynx, larynx, and esophagus participate in deglutition [17]. In particular, the submental muscles, including the mylohyoid, geniohyoid, and styloglossus muscles, belong to the group of muscles that contract first in deglutition, known as the swallowing leading complex [18]. The submental musculature has therefore been studied extensively in relation to swallowing. Although electromyography (EMG) is the most common method of measuring muscle activity, MMG offers several advantages in swallowing studies, such as tolerance to variations in sensor placement and robustness to perspiration and food spillage [19], [20].

Respiration and deglutition must be precisely coordinated to avoid airway invasion during swallowing [21], because the pharynx is a shared passageway for both air and food. The cessation of breathing during bolus transport is called swallowing apnea (SA), and the presence or absence of SA, as well as its timing, can reveal crucial information about airway protection [22], [23]. Because the mouth is occupied with mastication (i.e. chewing) and bolus formation during food intake, airflow in swallowing studies is usually monitored nasally.
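A simple way to flag candidate swallowing apnea in a nasal airflow trace is to look for stretches where the airflow magnitude stays below a small threshold for a minimum duration. The sketch below is a hypothetical illustration: the function name, the threshold, and the 0.5 s minimum duration are placeholder choices, not parameters from this study. Practical detectors usually smooth the signal or work on its envelope so that brief zero crossings during normal breathing are not mistaken for apnea.

```python
def apnea_intervals(airflow, fs, threshold, min_dur_s=0.5):
    """Return (start, end) sample indices where |airflow| stays below threshold
    for at least min_dur_s seconds (end index exclusive).

    threshold and min_dur_s are hypothetical placeholders for illustration.
    """
    min_len = int(round(min_dur_s * fs))
    intervals = []
    start = None
    for i, x in enumerate(airflow):
        quiet = abs(x) < threshold
        if quiet and start is None:
            start = i  # airflow has just dropped below the threshold
        elif not quiet and start is not None:
            if i - start >= min_len:  # keep only sufficiently long pauses
                intervals.append((start, i))
            start = None
    if start is not None and len(airflow) - start >= min_len:
        intervals.append((start, len(airflow)))
    return intervals
```

With `fs = 10` and a 0.5 s minimum, a pause must last at least 5 samples to be reported; shorter dips are ignored.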

In a wide range of applications, multi-sensor fusion methods have performed well with both complementary and redundant data [24], [25]. Although multi-sensor fusion has been applied to segmentation before, such efforts have mainly focused on computer vision applications (e.g. [26], [27]). Key advantages of multi-sensor fusion include reduced uncertainty, robustness to noise and measurement error in individual sensors, tolerance to single-sensor failure, and resolution of ambiguity [28].
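In a window-based segmentation setting, the simplest form of fusion is at the feature level: the per-window features of each selected sensor are concatenated into one input vector for the classifier. The sketch below assumes the per-sensor features have already been computed window by window on a common time base; the function name and the dictionary layout are hypothetical.

```python
def fuse_features(per_sensor, selected):
    """Feature-level fusion: concatenate per-window features of the selected sensors.

    per_sensor maps a sensor name to a list of per-window feature vectors;
    selected lists the sensor names to fuse, in the desired order.
    """
    n_windows = {len(per_sensor[name]) for name in selected}
    if len(n_windows) != 1:
        raise ValueError("select at least one sensor; all must have the same number of windows")
    n = n_windows.pop()
    # Window k's fused vector is the concatenation of each sensor's window-k features.
    return [[v for name in selected for v in per_sensor[name][k]] for k in range(n)]
```

For example, `fuse_features({"A-P": [[1.0], [2.0]], "MMG": [[10.0], [20.0]]}, ["A-P", "MMG"])` returns `[[1.0, 10.0], [2.0, 20.0]]`. Concatenation keeps every sensor's information available to the classifier and leaves it to the classifier to weight redundant or complementary inputs.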

ANNs have been deployed in image segmentation problems (e.g. [29], [30]). The ANN is a versatile nonlinear function approximator that can be used for either regression or classification. Given sufficiently many hidden units and training cases, as well as suitable nonlinear activation functions, a 3-layer ANN (input, hidden, and output layers) can approximate any continuous input–output mapping to arbitrary accuracy [31]. Because the input–output relationship is learned from data, this approach is particularly helpful when the relationship is too complex to be described analytically. At least one study has investigated ANN classifiers based on multiple bio-sensors for emotion recognition [32].
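A 3-layer network (input, one hidden layer, output) for the binary swallow/non-swallow decision can be sketched in a few lines of NumPy. This is a minimal illustration under assumed choices, not the configuration used in the study: sigmoid activations, a single sigmoid output unit, binary cross-entropy loss, Gaussian weight initialization, and full-batch gradient descent with a fixed learning rate are all assumptions, and the class name is hypothetical.

```python
import numpy as np


class ThreeLayerANN:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def _forward(self, X):
        H = self._sigmoid(X @ self.W1 + self.b1)          # hidden activations (n, n_hidden)
        p = self._sigmoid(H @ self.W2 + self.b2).ravel()  # P(swallow) per window (n,)
        return H, p

    def predict_proba(self, X):
        return self._forward(np.asarray(X, dtype=float))[1]

    def loss(self, X, y):
        """Mean binary cross-entropy."""
        p = np.clip(self.predict_proba(X), 1e-12, 1 - 1e-12)
        y = np.asarray(y, dtype=float)
        return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

    def fit(self, X, y, lr=0.5, epochs=2000):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n = len(y)
        for _ in range(epochs):
            H, p = self._forward(X)
            d_out = (p - y)[:, None] / n                 # gradient at output pre-activation
            d_hid = (d_out @ self.W2.T) * H * (1 - H)    # backpropagated through hidden sigmoid
            self.W2 -= lr * H.T @ d_out
            self.b2 -= lr * d_out.sum(axis=0)
            self.W1 -= lr * X.T @ d_hid
            self.b1 -= lr * d_hid.sum(axis=0)
        return self
```

For example, `ThreeLayerANN(n_in=1, n_hidden=4).fit([[-1.0], [-0.5], [0.5], [1.0]], [0, 0, 1, 1])` learns a toy one-feature threshold; in a segmentation setting, each row would instead be a window's (fused) variance vector and each target that window's swallow label.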

To the best of our knowledge, a multi-sensor fusion ANN has never been investigated for swallow segmentation. Furthermore, the combination of dual-axis accelerometry, submental MMG, and nasal airflow signals is novel in the field of swallowing research.


Objectives

The two primary objectives of this study were:

• to investigate the relationship between swallow segmentation performance and the number of employed signal sources, and
• to determine the signals or signal combinations which yield the most accurate swallow segmentation.
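Both objectives require evaluating every subset of the available signals. With the four sources considered here (A–P and S–I accelerometry, submental MMG, and nasal airflow), there are 2^4 − 1 = 15 non-empty combinations, each of which receives its own ANN per participant. A short sketch of the enumeration follows; the labels and function name are illustrative.

```python
from itertools import combinations

# Illustrative labels for the four signal sources.
SOURCES = ("A-P accelerometry", "S-I accelerometry", "MMG", "nasal airflow")


def all_signal_combinations(sources=SOURCES):
    """All non-empty subsets of the signal sources, ordered by subset size."""
    return [combo for k in range(1, len(sources) + 1)
            for combo in combinations(sources, k)]


combos = all_signal_combinations()
print(len(combos))  # 15 = 2**4 - 1
```

Grouping the results by subset size (4, 6, 4, and 1 combinations of one, two, three, and four sources) gives the data needed for objective 1, while ranking individual combinations addresses objective 2.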

Signal acquisition

Signals were acquired from 17 (8 male) healthy adults with no history of dysphagia (i.e., swallowing disorder) or neurological impairments. The mean age was $46.9 \pm 23.8$ years. Each participant’s swallowing health was confirmed via a standardized oral mechanism examination and a water swallow screening test conducted by a registered speech-language pathologist. This study was approved by the ethics committees at the Toronto Rehabilitation Institute, Bloorview Kids Rehab, and University of Toronto.

Effects of multi-sensor fusion

Fig. 4 illustrates how segmentation performance changed as more signal sources became available for ANN training. The means and standard deviations are shown with dots and bars, respectively, while the dashed lines are the logistic regression fits. All performance measures yielded positive regression coefficients: 0.1147 ($p = 0.6066$), 0.4302 ($p = 0.022$), 0.4022 ($p = 0.0345$), and 0.2993 ($p = 0.1368$) for sensitivity, specificity, accuracy, and adjusted accuracy, respectively. These results indicate

Discussion

As more signal sources were considered, segmentation performance improved with respect to sensitivity, specificity, accuracy, and adjusted accuracy. In binary classification, there is a tradeoff between sensitivity and specificity, as described by the receiver operating characteristic (ROC) curve [42]. Therefore, the simultaneous gain in both sensitivity and specificity in this study indicates that the overall performance increase reflected a genuine benefit of multi-sensor fusion rather than a shift along the sensitivity–specificity tradeoff. Also, as shown in
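The window-level performance measures and the sensitivity–specificity tradeoff discussed in this section can be sketched as follows. The function names are hypothetical, and the definition of adjusted accuracy as the mean of sensitivity and specificity (balanced accuracy) is an assumption, since its formula is not given in this section. The threshold sweep shows the tradeoff that the ROC curve summarizes: raising the decision threshold on the classifier output lowers sensitivity and raises specificity.

```python
def segmentation_metrics(y_true, y_pred):
    """Window-level sensitivity, specificity, accuracy and (assumed) adjusted accuracy.

    Assumes both classes are present in y_true (otherwise a division by zero occurs).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Assumption: adjusted accuracy taken as the mean of sensitivity and specificity.
        "adjusted_accuracy": (sens + spec) / 2,
    }


def threshold_sweep(y_true, scores, thresholds):
    """Metrics at each decision threshold applied to continuous classifier outputs."""
    return [(th, segmentation_metrics(y_true, [int(s >= th) for s in scores]))
            for th in thresholds]
```

For example, with three swallow windows and five non-swallow windows, predictions that find two of the swallows and raise one false alarm give a sensitivity of 2/3, a specificity of 0.8, and an accuracy of 0.75.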

Conclusions

We have shown in this paper that swallow segmentation can be improved by utilizing information from multiple sensors. The combination of all four signal sources resulted in the highest mean accuracy and adjusted accuracy. A–P accelerometry proved to be the most important component for swallow segmentation, while the inclusion of MMG and nasal airflow was less critical.

Conflict of interest statement

There is no conflict of interest.

Acknowledgements

The authors would like to thank the Natural Sciences and Engineering Research Council of Canada, University of Toronto, Bloorview Kids Rehab, and Canada Research Chairs Program for funding and support. The authors also acknowledge the support of the Toronto Rehabilitation Institute. Equipment and space have been funded with grants from the Canada Foundation for Innovation and the Province of Ontario. This research has been generously supported by a grant from the Ontario Ministry of Health and

References (44)

N. Reddy et al.
Measurements of acceleration during videofluorographic evaluation of dysphagic patients
Medical Engineering & Physics
(2000)
R. Deklerck et al.
Segmentation of medical images
Image and Vision Computing
(1993)
A. Das et al.
Hybrid fuzzy logic committee neural networks for recognition of swallow acceleration signals
Computer Methods and Programs in Biomedicine
(2001)
J. Silva et al.
A self-contained, mechanomyography-driven externally powered prosthesis
Archives of Physical Medicine and Rehabilitation
(2005)
B. Gil et al.
Experiments in combining intensity and range edge maps
CGIP
(1983)
T. Fawcett
An introduction to ROC analysis
Pattern Recognition Letters
(2006)
B. Sherman et al.
Assessment of dysphagia with the use of pulse oximetry
Dysphagia
(1999)
J. Lee et al.
A radial basis classifier for the automatic detection of aspiration in children with dysphagia
Journal of NeuroEngineering and Rehabilitation
(2006)
P. Leslie et al.
Reliability and validity of cervical auscultation: a controlled comparison using videofluoroscopy
Dysphagia
(2004)
M. Crary et al.
Surface electromyographic characteristics of swallowing in dysphagia secondary to brainstem stroke
Dysphagia
(1997)
W. Selley et al.
Respiratory patterns associated with swallowing. Part 2. Neurologically impaired dysphagic patients
Age and Ageing
(1989)
S. Vitulano, C. DiRuberto, M. Nappi
Different methods to segment biomedical images
(1997)...
R. Andreao et al.
ECG signal analysis through hidden Markov models
IEEE Transactions on Biomedical Engineering
(2006)
R. Lehner et al.
A three-channel microcomputer system for segmentation and characterization of the phonocardiogram
IEEE Transactions on Biomedical Engineering
(1987)
R. Andre-Obrecht
A new statistical approach for the automatic segmentation of continuous speech signals
IEEE Transactions on Acoustics, Speech, and Signal Processing
(1988)
S. Hamlet et al.
Interpreting the sounds of swallowing: fluid flow through the cricopharyngeus
Annals of Otology, Rhinology, and Laryngology
(1990)
J. Lee et al.
Time and time-frequency characterization of dual-axis swallowing accelerometry signals
Physiological Measurement
(2008)
T. Chau et al.
Investigating the stationarity of paediatric aspiration signals
IEEE Transactions on Neural Systems and Rehabilitation Engineering
(2005)
J. Logemann
Evaluation and treatment of swallowing disorders
(1998)
A. Miller
Deglutition
Physiological Reviews
(1982)
A. Jean
Brain stem control of swallowing: neuronal network and cellular mechanisms
Physiological Reviews
(2001)
C. Orizio
Muscle sound: bases for the introduction of a mechanomyographic signal in muscle studies
Critical Reviews in Biomedical Engineering
(1993)

