The accurate estimation of respiratory rate (RR) is crucial for assessing the respiratory system’s health in humans, particularly during auscultation processes. Despite the numerous automated RR estimation approaches proposed in the literature, challenges persist in accurately estimating RR in noisy environments, typical of real-life situations. This becomes especially critical when periodic noise patterns interfere with the target signal. In this study, we present a parallel driver designed to address the challenges of RR estimation in real-world environments, combining multi-core architectures with parallel and high-performance techniques. The proposed system employs a nonnegative matrix factorization (NMF) approach to mitigate the impact of noise interference in the input signal. This NMF approach is guided by pre-trained bases of respiratory sounds and incorporates an orthogonal constraint to enhance accuracy. The proposed solution is tailored for real-time processing on low-power hardware. Experimental results across various scenarios demonstrate promising outcomes in terms of accuracy and computational efficiency.
Notes
Antonio J. Muñoz-Montoro, Juan Torre-Cruz, Francisco J. Canadas-Quesada and José Ranilla have contributed equally to this work.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1 Introduction
Breath sounds have served as a valuable diagnostic tool for assessing the overall health of the various components that constitute the respiratory system. These sounds offer crucial insights into a range of disorders, as recognized by the World Health Organization (WHO).1 Owing to the inherent periodicity associated with the act of breathing [1], it is possible to derive a vital metric known as the respiratory rate (RR). The RR serves as an essential indicator of the physiological well-being of this vital system. The resting RR typically falls within the range of 12–16 respiratory cycles per minute (rpm).2 Deviations from these established thresholds, whether above or below, can signal underlying health issues.
The estimation of RR based on the analysis of respiratory sounds has been studied during the last decades [2‐6] by means of source separation techniques such as low-rank tensor (LRT) [7, 8], synchrosqueezing transform (SST) [9], empirical mode decomposition (EMD) [10], convolutional neural networks (CNN) [11, 12], and long short-term memory (LSTM) [13]. The scientific community is making continuous research efforts to develop signal processing and artificial intelligence (AI) approaches applied to respiratory sounds. These efforts aim to automate specific biomedical sound tasks focused on apnea detection [14, 15], apnea severity [16‐18], asthma severity [19, 20], and adventitious respiratory sound classification [21‐23], primarily addressing wheezing detection [24‐26], wheezing classification [27‐29], crackle detection [30‐32], and crackle classification [33‐35]. Specifically, numerous scientific studies have established a direct correlation between RR and various organic processes, as well as critical health crises [2, 4]. These investigations have linked RR variations to diverse medical conditions, including, but not limited to, respiratory infections [36], chronic obstructive pulmonary disease (COPD) [37, 38], agonal breathing [39], drug overdose [40], COVID-19 [41], and several heart disorders [42, 43]. Despite these advancements in understanding the significance of RR, many medical professionals still rely on manual measurements of RR [44], even though this traditional approach remains the least accurate [45]. Remarkably, reliable, continuous monitoring and standardized RR assessment systems have not yet been deployed within healthcare institutions, even though such systems could significantly expedite the diagnostic process [46]. In this sense, estimating RR through breath sounds offers numerous advantages. It allows for non-invasive data collection, eliminating the need for intrusive procedures and expensive medical equipment.
Moreover, it is a promising avenue for real-time monitoring, enabling healthcare professionals to make timely clinical decisions.
While several techniques have been developed for RR estimation, some have emerged as reliable gold standards for further analysis, such as capnography [47] and respiratory inductance plethysmography (RIP) [48]. These technologies have proven their efficacy in delivering accurate RR measurements, making them essential in various clinical settings. However, the primary drawback of these techniques is the requirement for specialized instrumentation, often involving invasive procedures. Moreover, these advanced technologies may not be readily accessible in developing countries and economically disadvantaged regions, where access to state-of-the-art medical equipment is limited. Consequently, the need for non-invasive, cost-effective, and real-time RR monitoring solutions becomes even more critical.
The research landscape in pursuit of non-invasive RR estimation solutions has witnessed remarkable developments. Numerous audio-based approaches have been explored, each leveraging distinct methodologies and signal processing techniques. Some of these methods involve the analysis of the temporal envelope [49], autocorrelation [50], machine learning algorithms such as random forest [51] and k-nearest neighbors [52], entropy-based approaches [53], harmonic product spectrum analysis [54], convolutional neural networks [55], long short-term memory networks [56], fundamental frequency extraction employing adaptive thresholding [54], and the Hilbert transform [49], among others. Recently, nonnegative matrix factorization (NMF) has been applied in the field of RR estimation, although these approaches have not used audio signals [57, 58]. The versatility of NMF and its potential to extract meaningful information from complex data sources make it an intriguing avenue for further exploration.
However, one of the principal challenges in RR estimation resides in the susceptibility of audio-based methods to acoustic interference, which can significantly restrict their practical utility in diverse real-world scenarios. Noise removal techniques aim to mitigate this issue by selectively filtering out unwanted sound sources from the input signal while retaining the essential information of interest. Within the domain of auscultation sound recordings, various noise removal methods have been explored, including adaptive filters [59] and spectral subtraction [60]. These techniques have demonstrated their effectiveness in enhancing the signal quality in noisy environments. Other popular approaches use NMF for noise removal in auscultation recordings [61]. NMF has shown remarkable promise in this context, particularly in scenarios where the target sound source exists in a multi-channel setup, with the target sound present in only one of these channels [26]. This approach has gained significant traction, even in recent research focused on the auscultation of newborns [62]. NMF's capacity to learn and recognize the spectral characteristics of specific target sounds while disregarding unwanted interference makes it a robust tool for noise mitigation.
In this work, we propose a novel and efficient system for the real-time monitoring of RR. Our approach centers around a signal model based on NMF, specifically designed to effectively remove the typical noise encountered in real-world acoustic signals recorded during the process of auscultation. This system combines the power of multi-core architectures with parallel and high-performance techniques, aiming to deliver a low-latency RR estimation tool. Our proposal not only addresses the pressing need for accurate RR estimation but also caters to the demands of healthcare professionals and patients for reliable and real-time health monitoring. The promising outcomes of our research demonstrate the system's ability to achieve reliable real-time RR estimation, underscoring its potential to monitor this vital physiological parameter.
The structure of the remainder of this article is organized as follows: In Sect. 2, we delve into the problem formulation and introduce the traditional NMF algorithm as a background for our work. Section 3 provides a comprehensive overview of our proposed respiratory rate estimation system, shedding light on its innovative features and functionalities. The computational aspects, including the complexity of the algorithm developed in this study, are meticulously presented in Sect. 4. Section 5 shows the outcomes of our experiments, encompassing assessments of accuracy, execution time, and overall system efficiency. Finally, in Sect. 6, we provide the concluding remarks for this study.
2 Background
This section presents the problem formulation used in this study and describes the standard NMF algorithm as a foundation for our work.
2.1 Problem formulation
The problem considered in this work is the estimation of the RR from respiratory sounds recorded by a digital stethoscope. Typically, a stethoscope records these respiratory sounds mixed with the ambient noise surrounding the subject. Thus, the observed discretized signal x[n] can be expressed as
$$\begin{aligned} x[n] = r[n] + e[n], \end{aligned}$$
(1)
where r[n] denotes the respiratory sound signal and e[n] the ambient noise. In the magnitude time-frequency domain, obtained via the short-time Fourier transform (STFT), this mixture can be approximated as
$$\begin{aligned} x(f,t) \approx r(f,t) + e(f,t), \end{aligned}$$
(2)
where x(f, t), r(f, t) and e(f, t) represent the magnitude spectrograms of x[n], r[n] and e[n], respectively. Note that \(f \in [1, F]\) and \(t \in [1, T]\) denote the frequency bin and time index (or time frame), respectively. Collecting F frequency bins and T time frames, we define the magnitude spectrogram matrices \(\textbf{X} \in \mathbb {R}_+^{F\times T}\), \(\textbf{R} \in \mathbb {R}_+^{F\times T}\) and \(\textbf{E} \in \mathbb {R}_+^{F\times T}\), where \(\textbf{X} = \left[ \textbf{x}_1,\dots , \textbf{x}_t, \dots , \textbf{x}_T \right]\) and \(\textbf{x}_t = \left[ x(1,t),\dots , x(f,t), \dots , x(F,t) \right] ^\text {T}\), where \(^\text {T}\) denotes the transpose operator. The signals \(\textbf{r}_t\) and \(\textbf{e}_t\) follow the same definition as \(\textbf{x}_t\), while \(\textbf{R}\) and \(\textbf{E}\) are similarly defined to \(\textbf{X}\).
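As a minimal illustration of this formulation, the spectrogram matrix \(\textbf{X}\) can be assembled by windowing the signal, applying the FFT to each frame, and stacking the magnitude columns \(\textbf{x}_t\). The sketch below uses illustrative frame, hop, and FFT sizes, not necessarily the exact settings of this work:

```python
import numpy as np

def magnitude_spectrogram(x, frame_len=256, hop=128, n_fft=1024):
    """Magnitude spectrogram X in R_+^{F x T}: column t is the vector
    x_t = [x(1,t), ..., x(F,t)]^T described in the text."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop:t * hop + frame_len] * window
                       for t in range(n_frames)], axis=1)  # (frame_len, T)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=0))    # (F, T), F = n_fft//2 + 1

x = np.random.default_rng(0).standard_normal(4500)         # 1 s of signal at 4.5 kHz
X = magnitude_spectrogram(x)
print(X.shape)                                             # (513, 34)
```

Each column of X is one time frame t; the matrices \(\textbf{R}\) and \(\textbf{E}\) would be built the same way from r[n] and e[n].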
2.2 Nonnegative Matrix Factorization
NMF is a powerful mathematical technique [63] widely used in data analysis, signal processing, and machine learning. In the audio processing field, NMF typically aims to factorize a magnitude spectrogram, broadly denoted as \(\textbf{X}\in \mathbb {R}_+^{F\times T}\), into two nonnegative matrices, \(\textbf{B}\in \mathbb {R}_+^{F\times K}\) and \(\textbf{G}\in \mathbb {R}_+^{K\times T}\), such that \(\textbf{X} \approx \textbf{B}\textbf{G}\). Here, \(\textbf{B}\) represents the basis matrix whose K columns are meaningful elements called basis functions (or spectral patterns), and \(\textbf{G}\) corresponds to the activation matrix that shows the temporal activity for each individual basis function. Mathematically, this factorization can be formalized as an optimization problem as follows,
$$\begin{aligned} \min _{\textbf{B}, \textbf{G}} \Vert \textbf{X} - \textbf{B}\textbf{G} \Vert _F^2 \quad \text {s.t.} \quad \textbf{B} \ge 0, \; \textbf{G} \ge 0, \end{aligned}$$
(3)
where \(\Vert \cdot \Vert _F\) denotes the Frobenius norm, and \(\textbf{B} \ge 0\) and \(\textbf{G} \ge 0\) indicate nonnegativity constraints. Note that alternative divergences [64], such as the Kullback–Leibler distance, can be employed in place of the Euclidean distance used here, which is computed using the Frobenius norm.
In the original formulation of NMF [63], this optimization problem is minimized using an iterative approach based on the gradient descent algorithm. The multiplicative update rules are obtained by applying diagonal rescaling to the step size of the gradient descent algorithm. Thus, the multiplicative update rule for each parameter is given by
$$\begin{aligned} \textbf{B} \leftarrow \textbf{B} \odot \frac{\textbf{X}\textbf{G}^\text {T}}{\textbf{B}\textbf{G}\textbf{G}^\text {T}}, \end{aligned}$$
(4)
$$\begin{aligned} \textbf{G} \leftarrow \textbf{G} \odot \frac{\textbf{B}^\text {T}\textbf{X}}{\textbf{B}^\text {T}\textbf{B}\textbf{G}}, \end{aligned}$$
(5)
where \(\odot\) represents the Hadamard (element-wise) multiplication.
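The multiplicative rules above condense into a few lines of code. The following sketch is a textbook Euclidean NMF (Lee–Seung multiplicative updates), not the optimized BLAS-based implementation discussed later; the small eps guards against division by zero:

```python
import numpy as np

def nmf(X, K, n_iter=500, seed=0, eps=1e-9):
    """Euclidean NMF via multiplicative updates:
    B <- B * (X G^T) / (B G G^T),  G <- G * (B^T X) / (B^T B G)."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    B = rng.random((F, K)) + eps   # basis matrix (spectral patterns)
    G = rng.random((K, T)) + eps   # activation matrix
    for _ in range(n_iter):
        B *= (X @ G.T) / (B @ G @ G.T + eps)
        G *= (B.T @ X) / (B.T @ B @ G + eps)
    return B, G

# sanity check on an exactly rank-2 nonnegative matrix
rng = np.random.default_rng(1)
X = rng.random((30, 2)) @ rng.random((2, 40))
B, G = nmf(X, K=2)
rel_err = np.linalg.norm(X - B @ G) / np.linalg.norm(X)
```

On exactly low-rank nonnegative data like this toy example, the relative reconstruction error drops to a small value after a few hundred iterations.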
Note that while NMF is valuable, it poses significant computational demands, which have motivated the development of efficient algorithms to reduce its execution time (see [65‐69]).
3 Proposed method for RR estimation
In this work, we introduce a novel system for RR estimation based on NMF. Note that our approach builds upon prior work presented in [70]. With this starting point, the proposed approach entails a signal model enriched with pre-learned spectral patterns of respiratory sounds, enabling the system to effectively distinguish between respiratory and noisy sounds. Furthermore, within the proposed NMF framework, we incorporate an orthogonal constraint to enhance the learning and subsequent estimation of these spectral patterns, which we will refer to as orthogonal NMF (ONMF).
To estimate RR, we leverage the periodicity principle inherent in respiratory sounds. The proposed algorithm is outlined in Fig. 1, highlighting three main stages: signal preprocessing, signal factorization, and RR estimation. The subsequent subsections provide a comprehensive overview of each stage's core functions and processes.
As a result, we offer a software solution that fulfills two crucial requirements: mobility and real-time applicability. Consequently, our design considers the constrained memory resources and limited computational power of cost-effective handheld devices, making it an ideal candidate for implementation in healthcare services. This achievement is made possible by leveraging the capabilities of parallel architectures, enabling efficient and accurate real-time RR estimation.
Fig. 1
Block diagram of the proposed algorithm for RR estimation. The notation x[n] represents the input signal, while the magnitude spectrogram of x[n] is denoted by the variable \(\textbf{Y}\). The variable \(\textbf{G}_r\) is used to refer to the activation matrix of the respiratory signal, and the symbol \(\gamma\) denotes the estimation of the RR
3.1 Signal preprocessing
In this stage, the input signal is prepared for the subsequent sound separation process by following several steps. Firstly, as discussed in Sect. 2.1, we compute the magnitude spectrogram of the input signal using the STFT. This transformation provides the time-frequency representation of the input signal x[n]. Then, to ensure the robustness and independence of our algorithm from specific input spectrograms and observation intervals, we normalize the computed spectrogram \(\textbf{X}\) using the \(\ell _1\)-norm. This normalization step enhances the algorithm's adaptability and reliability, making it less sensitive to variations in the input data.
Next, we further refine the signal by implementing a band-pass filter. This filter restricts the frequency range of the spectrogram to a band spanning 300–2000 Hz, a spectral interval where most of the respiratory content can be found [71, 72], since cardiac and other body sounds lie below 300 Hz, while most lung sounds are found above this threshold.
Once these preprocessing steps are completed, the output magnitude spectrogram, denoted as \(\textbf{Y}\in \mathbb {R}_+^{F\times T}\), is adapted for the subsequent factorization stage.
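A compact sketch of these two preprocessing steps (\(\ell _1\) normalization followed by the 300–2000 Hz band restriction) is shown below; the sampling rate and FFT size are illustrative values used only to map frequency bins to Hz:

```python
import numpy as np

def preprocess(X, fs=4500, n_fft=1024, band=(300.0, 2000.0)):
    """l1-normalize the magnitude spectrogram X and keep only the bins
    whose center frequency lies in the 300-2000 Hz respiratory band."""
    Xn = X / (np.abs(X).sum() + 1e-12)          # l1 normalization
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)  # center frequency of each bin
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return Xn[mask, :]                           # Y, the band-limited spectrogram

Y = preprocess(np.ones((513, 10)))
print(Y.shape)                                   # (387, 10): bins inside 300-2000 Hz
```

The output keeps only the rows (frequency bins) inside the respiratory band, so the factorization stage operates on a smaller matrix.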
3.2 Signal factorization
As discussed in previous sections, any stethoscope captures respiratory sounds in conjunction with ambient noise, which significantly limits the effectiveness of audio-based RR estimation methods. In this stage of the proposed algorithm, we focus on decomposing the pre-processed signal \(\textbf{Y}\) using NMF to address the removal of noisy sounds. To this end, we propose the following NMF signal model:
$$\begin{aligned} \textbf{Y} \approx \textbf{B}\textbf{G} = \begin{bmatrix} \textbf{B}_r&\textbf{B}_e \end{bmatrix} \begin{bmatrix} \textbf{G}_r \\ \textbf{G}_e \end{bmatrix} = \textbf{B}_r\textbf{G}_r + \textbf{B}_e\textbf{G}_e, \end{aligned}$$
(6)
where \(\textbf{B}_r\in \mathbb {R}_+^{F\times K_r}\) and \(\textbf{B}_e\in \mathbb {R}_+^{F\times K_e}\) represent the basis matrices associated with respiratory and noisy sounds, respectively. In turn, \(\textbf{G}_r\in \mathbb {R}_+^{K_r\times T}\) and \(\textbf{G}_e\in \mathbb {R}_+^{K_e\times T}\) are the corresponding activation matrices. Note that \(K = K_r+K_e\).
In this work, we propose an initialization strategy for the basis matrix \(\textbf{B}_r\). To guide and inform the sound separation process, \(\textbf{B}_r\) is initialized with pre-learned spectral patterns from a randomly selected set of subjects. Thus, we ensure that the noisy spectral components are exclusively modeled by the basis matrix \(\textbf{B}_e\), which is initialized randomly. This initialization approach empowers the factorization process to efficiently disentangle respiratory sounds from ambient noise. Furthermore, given that NMF only guarantees signal-level reconstruction, without ensuring that the factorized spectral patterns or activations represent sound characteristics observed in the real world, we introduce an orthogonal constraint to enhance the factorization process. This constraint is designed to minimize redundancy among spectral patterns and to improve how faithfully these spectral patterns represent real-world sounds [73, 74]. The goal is to factorize the input sound signal into a collection of spectral patterns that truly represent their underlying characteristics. This approach reduces the number of patterns required for the factorization.
In this regard, with the inclusion of this constraint, the objective cost function to minimize can be formulated as follows:
$$\begin{aligned} \min _{\textbf{B}_e, \textbf{G}_r, \textbf{G}_e} \Vert \textbf{Y} - \textbf{B}_r\textbf{G}_r - \textbf{B}_e\textbf{G}_e \Vert _F^2 + \lambda \, \text {tr}\left( \textbf{B}_e^\text {T}\textbf{B}_e\textbf{O}\right) , \end{aligned}$$
(7)
where \(\lambda\) is a weighting parameter to control the influence of the orthogonality penalty and \(\text {tr}(\cdot )\) is the matrix trace operator, which can be computed as \(\text {tr}(A) = \sum ^M_{m=1} a_{mm}\). Note that the minimization is performed with respect to \(\textbf{B}_e\), \(\textbf{G}_r\) and \(\textbf{G}_e\); meanwhile, \(\textbf{B}_r\) remains fixed during this factorization process.
Thus, the multiplicative update rules for each parameter are defined as follows:
$$\begin{aligned} \textbf{B}_e&\leftarrow \textbf{B}_e \odot \frac{\textbf{Y}\textbf{G}_e^\text {T}}{\left( \textbf{B}_r\textbf{G}_r + \textbf{B}_e\textbf{G}_e\right) \textbf{G}_e^\text {T} + \lambda \textbf{B}_e\textbf{O}}, \end{aligned}$$
(8)
$$\begin{aligned} \textbf{G}&\leftarrow \textbf{G} \odot \frac{\textbf{B}^\text {T}\textbf{Y}}{\textbf{B}^\text {T}\textbf{B}\textbf{G}}, \end{aligned}$$
(9)
where \(\textbf{O}\in \mathbb {R}_+^{K_e\times K_e}\) denotes an all-ones square matrix of size \(K_e\). Note that, using the notation from Eq. (6), where \(\textbf{B}=[\textbf{B}_r \; \textbf{B}_e]\) and \(\textbf{G}=\begin{bmatrix} \textbf{G}_r \\ \textbf{G}_e \end{bmatrix}\), the update rule in Eq. (9) is identical to that in Eq. (5). Simultaneously, the rule in Eq. (8) can be viewed as equivalent to the rule in Eq. (4), considering only the corresponding \(K_e\) bases and including the additional terms derived from the orthogonal constraint. That is,
$$\begin{aligned} \textbf{B}_e \leftarrow \textbf{B}_e \odot \frac{\textbf{Y}\textbf{G}_e^\text {T}}{\textbf{B}\textbf{G}\textbf{G}_e^\text {T} + \lambda \textbf{B}_e\textbf{O}}. \end{aligned}$$
(10)
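One iteration of this constrained factorization can be sketched as follows. This is an illustrative reading of the scheme described above (\(\textbf{B}_r\) fixed, \(\textbf{B}_e\) penalized toward orthogonality through the all-ones matrix \(\textbf{O}\) weighted by \(\lambda\)), not the authors' exact implementation:

```python
import numpy as np

def onmf_step(Y, Br, Be, Gr, Ge, lam=0.1, eps=1e-9):
    """One multiplicative update: Br (pre-trained respiratory bases) stays
    fixed; Be (noise bases) is updated with an extra lam * Be @ O term in
    the denominator coming from the orthogonality penalty; the activations
    are updated as in standard NMF."""
    O = np.ones((Be.shape[1], Be.shape[1]))
    V = Br @ Gr + Be @ Ge                    # current model of Y
    Be *= (Y @ Ge.T) / (V @ Ge.T + lam * (Be @ O) + eps)
    V = Br @ Gr + Be @ Ge
    Gr *= (Br.T @ Y) / (Br.T @ V + eps)
    Ge *= (Be.T @ Y) / (Be.T @ V + eps)
    return Be, Gr, Ge

# iterating from a random start reduces the reconstruction error
rng = np.random.default_rng(0)
F, T, Kr, Ke = 50, 60, 4, 3
Br, Gr = rng.random((F, Kr)), rng.random((Kr, T))
Be, Ge = rng.random((F, Ke)), rng.random((Ke, T))
Y = rng.random((F, T))
err0 = np.linalg.norm(Y - Br @ Gr - Be @ Ge)
for _ in range(50):
    Be, Gr, Ge = onmf_step(Y, Br, Be, Gr, Ge, lam=0.01)
err1 = np.linalg.norm(Y - Br @ Gr - Be @ Ge)
```

Because the updates are multiplicative, the factors remain nonnegative throughout, and only the noise bases and the activations change across iterations.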
3.3 RR estimation
In this stage, we focus on the estimation of RR by leveraging the intrinsic periodic nature of the breathing signal. This periodicity can be directly extracted from the activation matrix \(\textbf{G}_r\) associated with respiratory sounds. The goal is to identify the most repetitive pattern within each activation vector \(\textbf{g}_r(k)\), where \(\textbf{G}_r = [\textbf{g}_r(1); \dots ;\textbf{g}_r(K_r)]\) and \(k\in [1,\dots ,K_r]\). To achieve this, we apply a smoothing window to each \(\textbf{g}_r(k)\), eliminating transients with lengths below 200 ms, which are unlikely to be related to breathing events. Subsequently, we compute the fast Fourier transform (FFT) of each activation vector \(\textbf{g}_r(k)\), providing us with the periodic information present in the respiratory activation matrix. Given the constraint on respiratory rate, confined within the range of 6 to 60 breaths per minute (bpm), we select the frequency component \(f_{RR}\) with the highest amplitude from the computed FFTs within the \([0.1-1]\) Hz band. This component serves to compute an initial estimation \(\gamma \in \mathbb {R}_+\) of RR in bpm as follows,
$$\begin{aligned} \gamma = 60 \, f_{RR}. \end{aligned}$$
(11)
However, a more precise estimation is necessary because two periodicity levels can be detected: the respiratory stage (isolated inspiration or expiration) and the respiratory cycle (inspiration followed by expiration). The presence of these multiple periodicities can lead to ambiguities in RR estimation, because the proposed algorithm might mistakenly interpret the entire cycle as an inspiration or expiration stage, potentially resulting in an RR estimate that is double or half of the true RR. To overcome this ambiguity, we propose to analyze the half frequency \(f_1=f_{RR}/2\) and the double frequency \(f_2=2\;f_{RR}\), associated with the predominant frequency \(f_{RR}\). Preliminary analysis indicated that: (i) when the amplitude of the component located at \(f_1\) is smaller than that located at \(f_2\), the value \(f_{RR}\) is considered correct; so, \(\gamma = 60 \;f_{RR}\); (ii) when the amplitude of the component located at \(f_1\) is higher than that located at \(f_2\), the value \(f_{RR}\) is updated as \(\frac{f_{RR}}{2}\); so,
$$\begin{aligned} \gamma = 60 \, \frac{f_{RR}}{2}. \end{aligned}$$
(12)
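The whole stage (FFT of an activation row, peak picking in the 0.1–1 Hz band, and the \(f_{RR}/2\) versus \(2f_{RR}\) disambiguation) can be sketched for a single row of \(\textbf{G}_r\) as follows; the frame rate and the synthetic activation signal are illustrative assumptions:

```python
import numpy as np

def estimate_rr(g, frame_rate, f_lo=0.1, f_hi=1.0):
    """Sketch of the RR estimation for one activation row g of Gr: take the
    dominant FFT component in the 0.1-1 Hz band and resolve the stage/cycle
    ambiguity by comparing the amplitudes at f_RR/2 and 2*f_RR."""
    spec = np.abs(np.fft.rfft(g - g.mean()))
    freqs = np.fft.rfftfreq(len(g), d=1.0 / frame_rate)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    f_rr = freqs[band][np.argmax(spec[band])]

    def amp(f):  # amplitude of the bin closest to frequency f
        return spec[np.argmin(np.abs(freqs - f))]

    if amp(f_rr / 2) > amp(2 * f_rr):  # subharmonic dominates: true cycle at f_rr/2
        f_rr /= 2
    return 60.0 * f_rr                 # gamma, in bpm

# illustrative activation: a 0.25 Hz breathing cycle (15 bpm) plus a weaker
# stage-level harmonic at 0.5 Hz, sampled at a hypothetical 16 frames/s
frame_rate = 16.0
t = np.arange(int(60 * frame_rate)) / frame_rate
g = 1.0 + np.sin(2 * np.pi * 0.25 * t) + 0.4 * np.sin(2 * np.pi * 0.5 * t)
print(estimate_rr(g, frame_rate))      # -> 15.0
```

In the full system, this procedure is applied to each of the \(K_r\) activation rows after the 200 ms smoothing step.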
4 Computational aspects
In this section, we focus on the computational aspects of our approach, presenting key design decisions and offering cost estimations for the stages detailed in Sect. 3.
Regarding the computational complexity of the preprocessing stage, it is primarily influenced by the STFT computation. Following [26], a coarse-grained strategy has been implemented for parallel STFT computation, resulting in the following complexity:
$$\begin{aligned} O \left( \frac{T F \log F}{p} \right) , \end{aligned}$$
(13)
where p denotes the total number of used cores. The remaining lower-level operations within this stage, such as the \(\ell _1\)-norm, are addressed using, whenever feasible, BLAS Level-1 subroutines [75].
The factorization stage based on the standard NMF involves the iterative application of update rules (9) and (10). The complexity of these rules is determined by matrix products, which have been implemented using BLAS Level-3 subroutines. Therefore, the computational complexity per iteration in the sequential version is:
$$\begin{aligned} O \left( TF(K+K_e)+T(K^2+K_e^2)+F(K^2+K_e^2)+TK+FK_e \right) . \end{aligned}$$
(14)
Note that matrix products are performed in the optimal way, i.e., in the direction that involves the fewest number of operations. Considering that T is at least one order of magnitude greater than F, and F is an order of magnitude higher than K, the previous expression can be approximated as follows:
$$\begin{aligned} O \left( TF(K+K_e) \right) . \end{aligned}$$
(15)
In this sense, when dealing with the proposed method, the computational difference compared to the standard NMF arises from the element-wise additive/multiplicative operations and the \(\textbf{B}_e\textbf{O}\) matrix-matrix product in Eq. (8). The \(\textbf{B}_e\textbf{O}\) product is treated as a matrix–vector product; therefore, the excess computational load is \(O(FK_e)\), which should be negligible compared to the dominant terms arising from matrix products. Consequently, the bound is equivalent to that of Eq. (14). Thus, the complexity of the sequential version for each iteration of both methods can be expressed as:
$$\begin{aligned} O \left( T F (K + K_e) \right) , \end{aligned}$$
(16)
while the complexity of the parallel version is:
$$\begin{aligned} O \left( \frac{T F(K + K_e)}{p} \right) . \end{aligned}$$
(17)
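The O(FK_e) cost attributed earlier to the \(\textbf{B}_e\textbf{O}\) term follows from the structure of \(\textbf{O}\): multiplying by an all-ones matrix replicates the vector of row sums of \(\textbf{B}_e\) across all \(K_e\) columns, so the product can be formed as a matrix–vector operation. A small numerical check:

```python
import numpy as np

# Since O is the all-ones Ke x Ke matrix, Be @ O is rank-one: every column
# equals the row-sum vector Be @ 1. Computing it that way costs O(F * Ke)
# flops instead of the O(F * Ke^2) of a full matrix-matrix product.
rng = np.random.default_rng(0)
F, Ke = 400, 15
Be = rng.random((F, Ke))
O = np.ones((Ke, Ke))

full = Be @ O                                                  # O(F * Ke^2)
cheap = np.repeat(Be.sum(axis=1, keepdims=True), Ke, axis=1)   # O(F * Ke)
print(np.allclose(full, cheap))                                # -> True
```

This is why the extra term from the orthogonal constraint does not change the asymptotic per-iteration bound.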
The RR estimation stage encompasses several steps as previously described in Sect. 3.3. Firstly, the smoothing process is applied to all the \(K_r\) activation rows in \(\textbf{G}_r\). Note that in this process, windows of size W are applied, generating \((T-W+1)\) chunks per activation row. Therefore, the computational cost of this step is:
$$\begin{aligned} O \left( (T-W+1)W K_r \right) . \end{aligned}$$
(18)
However, this processing can be parallelized, taking into account the independence of the processing of each row. Additionally, considering that W is three orders of magnitude smaller than T and that BLAS Level-1 subroutines are used, the computational cost of this step can be approximated by:
$$\begin{aligned} O \left( \frac{T K_r W}{p} \right) . \end{aligned}$$
(19)
Next, the FFT computation of each activation row is performed. Thus, the total cost of this process, using the same strategy as in the preprocessing stage, can be expressed as:
$$\begin{aligned} O \left( \frac{K_r T \log T}{p} \right) . \end{aligned}$$
(20)
Then, the search for the frequency component \(f_{RR}\) in the FFT matrix of \(\textbf{G}_r\) (see line 11 in Algorithm 1) has the following complexity:
$$\begin{aligned} O \left( \frac{T K_r}{p} \right) . \end{aligned}$$
(21)
Finally, the RR estimation (see line 12 in Algorithm 1) involves two additions of dimension \(K_r\), resulting in \(O\left( K_r \right)\).
Based on the previous equations and considering N as the number of iterations of the factorization algorithm, the overall complexity can be expressed as:
$$\begin{aligned} O \left( \frac{T F \log F}{p} + \frac{N \, T F (K + K_e)}{p} + \frac{T K_r W}{p} + \frac{K_r T \log T}{p} + \frac{T K_r}{p} + K_r \right) . \end{aligned}$$
(23)
As mentioned, T is at least one order of magnitude greater than F, which is an order of magnitude higher than K. Furthermore, considering that W is smaller than \(K_r\), and that N is of the same order of magnitude as F, Eq. (23) can be approximated by:
$$\begin{aligned} O \left( \frac{N \, T F (K + K_e)}{p} \right) . \end{aligned}$$
(24)
5 Experiments
In this section, we conduct an experimental evaluation of the proposed algorithm. The code for the system is made available.3
5.1 Experimental dataset
In this study, we used the RRinervasO dataset.4 The dataset comprises 155 audio recordings sampled at 4.5 kHz, each one minute in length. These recordings were obtained from 31 healthy participants, aged 18 to 25 years, using a prototype of a digital stethoscope [61, 70]. During the recordings, participants were instructed to maintain five different RR values, ranging from 8 to 20 breaths per minute (bpm).
The dataset was divided into two sets, one for training and the other for testing. The training dataset included 10 signals for each RR value. Note that this training dataset was used to initialize \(\textbf{B}_r\). Figure 2 illustrates the pre-trained bases generated by the training process, which is based on the same orthogonal-constraint NMF approach described in Sect. 3. The remaining sound excerpts were mixed with noisy signals, including sounds from ambulances, acoustic traffic signals for the visually impaired, alerts, drums, fire alarms, and street noise, at different signal-to-noise ratios (SNR) from -5 to 15 dB.
Fig. 2
Pre-trained bases of respiratory sounds used in the initialization of \(\textbf{B}_r\)
To intensify the computational load in the performance experiments of the proposed algorithm, multiple chunks were concatenated to create audio files with lengths of 300, 600, 900, 1800 and 3600 s.
For the comparative accuracy analysis, we provide results for the algorithm presented in [76], which has also served as a baseline in other studies, including [49], and for the HMBP algorithm presented in [77], which has shown promising performance for respiratory rate estimation in different environments.
5.2 Experimental setup
In this work, the time-frequency representation was obtained using a 1024-point STFT. The frame size was set to 56.9 ms with \(50\%\) overlap. Concerning the signal factorization approach, we conducted a preliminary analysis to determine the optimal values for the parameters involved in this stage. As a result, a total of 40 bases were chosen, with 25 dedicated to \(K_r\) and 15 to \(K_e\). Additionally, we observed that, for the proposed signal model, the reconstruction error converges after 100 iterations, and setting \(\lambda\) to 0.1 in Eq. (7) yields the most accurate results.
The computational experiments were carried out on an NVIDIA Jetson AGX Xavier development kit with Ubuntu Linux 18.04.1 LTS. This platform features an eight-core ARM v8.2 64-bit CPU, which supports different running modes (configurable with the NVPModel command tool). The number of active cores (1, 2, 4, 6 and 8) and the CPU frequency can both be set. Using the NVPModel tool, different configurations were tested in the experimentation, ranging from a single core at the lowest frequency (0.12 GHz) to 8 cores at the maximum frequency (2.27 GHz), enabling the simulation of different low-power systems such as smartphones or other embedded hardware and systems-on-chip (SoC). With regard to the software, the FFTW5 library and the GNU C Compiler 7 with OpenMP specification 4.5 were used. In relation to BLAS implementations, we conducted tests using OpenBLAS,6 BLIS [78], ARMPL,7 and the high-performance micro-kernels for general matrix multiplication described in [79].
5.3 Results
In this section, we present the results obtained, categorized into two main blocks: 1) the accuracy measurements of our algorithm compared to the baseline methods, AR and HMBP, and 2) the computational performance of the proposed algorithm.
5.3.1 Accuracy results
The metric used for the accuracy experiments is the absolute error \(\epsilon\) expressed in bpm, which is suitable for the nature of the estimation problem. It can be formulated as:
$$\begin{aligned} \epsilon = \left| \text {RR} - \gamma \right| , \end{aligned}$$
(25)
where RR denotes the ground-truth respiratory rate and \(\gamma\) its estimate.
The initial experiment aimed to evaluate the influence of pre-trained bases on the proposed system. Figure 3 illustrates the outcomes achieved by both the blind and pre-trained variants of the proposed algorithm. The significant reduction in absolute error across all tested RR underscores the substantial impact of the pre-training process on this dataset. These findings emphasize the importance of proper system initialization to attain reliable RR estimation results.
Fig. 3
Absolute error, expressed in bpm, obtained by the blind and the pre-trained variants of the proposed algorithm as a function of RR
Next, we conduct a comparative analysis between our pre-trained algorithm and the baseline algorithms, AR [76] and HMBP [77]. The absolute errors for different RR values are summarized in Table 1. Note that these results represent the average across all SNR levels in the dataset. The AR and HMBP algorithms exhibit poor performance, with average errors of 16.22 bpm and 5.58 bpm, respectively. This can be largely attributed to the inherently noisy nature of the tested mixtures. In contrast, our proposed system consistently outperforms the baseline algorithms, achieving an absolute error of less than 0.5 bpm for all tested RR values. This underscores the robustness of our system and its capability to provide accurate RR estimations in real-world environments, since the proposed ONMF approach shows promising performance in recovering repetitive temporal structures hidden in respiratory sounds.
Table 1
Absolute errors, measured in bpm, as a function of RR

RR (bpm)    8       10      12      18      20      Average
ONMF        0.02    0.03    0.23    0.05    0.05    0.07
AR          16.13   8.2     20.22   17.52   19.03   16.22
HMBP        5.98    4.99    4.57    5.81    6.54    5.58
Note that a similar experiment was conducted comparing our proposal with the standard NMF. The obtained results demonstrated that the ONMF algorithm exhibits lower average absolute errors than the NMF.
5.3.2 Performance results
Table 2
Execution times measured in seconds as a function of audio length, number of cores and clock frequency used

                1 core            2 cores           4 cores           8 cores
Dur. (s)        Fact     Total    Fact     Total    Fact     Total    Fact     Total
Max. freq.
60              1.03     1.07     0.62     0.65     0.37     0.39     0.27     0.28
300             4.93     5.11     2.84     2.95     1.59     1.67     1.01     1.06
600             9.91     10.25    5.57     5.76     3.07     3.19     1.86     1.93
900             14.80    15.36    8.36     8.67     4.50     4.70     2.67     2.80
1800            31.49    32.55    16.81    17.42    8.82     9.28     5.11     5.35
3600            63.71    65.98    35.05    36.28    18.17    18.96    10.26    10.79
Lowest freq.
60              22.01    23.33    12.95    13.55    7.87     8.27     6.11     6.42
300             100.81   105.88   58.57    61.56    32.32    34.90    20.23    21.92
600             200.52   208.34   114.50   121.83   62.13    66.80    37.11    38.47
900             297.75   310.67   169.75   176.65   91.60    97.07    51.88    54.43
1800            605.49   628.28   335.79   352.18   181.11   189.93   102.27   107.38
3600            1175.23  1223.23  667.84   692.08   358.43   369.64   200.95   209.68
Firstly, the performance of the proposed algorithm in terms of execution times is analyzed. Table 2 presents the obtained results as a function of audio length, number of cores and clock frequency used. The execution times for both the entire algorithm (Total) and the factorization stage (Fact) are included.
As discussed in Sect. 4, the computational cost of the factorization stage has a significant impact on the overall algorithm cost. This fact is evident in the reported results, where approximately 96% of the global cost of the algorithm stems from the factorization performed using the ONMF approach.
The execution times show a notable increase (by a factor of approximately 20) when reducing the clock frequency from 2.27 to 0.12 GHz, aligning with the expected scaling between these clock frequencies. Additionally, irrespective of the number of cores employed for estimation, the execution times are considerably lower than the input audio duration. Even in the least favorable scenario, using the lowest frequency and a single core, the system demonstrates its applicability in environments where rapid response time is crucial. For instance, for hour-long audio, estimation is completed in under 11 s using 8 cores at the maximum frequency.
Note that, after testing the aforementioned BLAS implementations, BLIS proved to be the most stable. Although it may not be the fastest in sequential or low-core scenarios, it exhibits superior scalability when multiple cores are used. Consequently, it was selected as the BLAS library supporting the subroutines that produced the results presented here.
Fig. 4  Evolution of the efficiency as a function of audio length, number of cores and clock frequency used
Figure 4 presents the empirical efficiency obtained for different audio lengths, numbers of cores, and clock frequencies. As can be observed, efficiency grows with the problem size, i.e., as the input audio duration increases. Overall, for large sizes, the efficiency obtained at the maximum clock frequency (see Fig. 4a) is marginally higher than that obtained at the minimum clock frequency (see Fig. 4b). Note also that the efficiency remains stable for large audio inputs in both cases.
Concerning the number of cores, efficiencies in the range of 80–95% for two cores, 70–87% for four cores, and 45–75% for eight cores were obtained. Note that for 60-second audio samples on eight cores, the achieved efficiency falls below 50%. This is attributed to the small matrix sizes, which prevent full performance from being extracted from the eight cores.
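These figures follow from the standard definition of parallel efficiency, E_p = T_1 / (p · T_p). A minimal sketch using the total times from Table 2 at the maximum clock frequency:

```python
def efficiency(t_serial, t_parallel, n_cores):
    """Parallel efficiency: speedup (t_serial / t_parallel) divided by cores."""
    return t_serial / (n_cores * t_parallel)

# Total times at the maximum clock frequency (from Table 2).
t1 = {60: 1.07, 3600: 65.98}   # 1 core
t8 = {60: 0.28, 3600: 10.79}   # 8 cores

for dur in (60, 3600):
    e = efficiency(t1[dur], t8[dur], 8)
    print(f"{dur:4d} s audio, 8 cores: efficiency = {e:.1%}")
```

The 60-second case yields roughly 48%, consistent with the sub-50% efficiency reported above for short inputs.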
Table 3  Performance loss in comparison with the theoretical peak performance

              # cores
Dur. (s)      1        2        4        8

Max. freq.
  60        64.8%    71.2%    75.3%    83.2%
  300       63.9%    68.7%    72.1%    78.0%
  600       64.2%    68.2%    71.2%    76.1%
  900       64.4%    68.5%    70.7%    75.3%
  1800      66.6%    68.7%    70.2%    74.3%
  3600      67.0%    70.0%    71.1%    74.4%

Lowest freq.
  60        67.8%    72.6%    77.5%    85.5%
  300       65.3%    70.2%    73.0%    78.4%
  600       65.2%    69.6%    71.9%    76.5%
  900       65.2%    69.5%    71.7%    75.0%
  1800      65.8%    69.2%    71.4%    74.7%
  3600      64.8%    69.0%    71.1%    74.3%
Finally, the performance degradation of the proposed algorithm has been computed by comparing the empirical results with the theoretical peak performance. Table 3 presents these results across varying audio lengths, numbers of cores, and clock frequencies. As can be observed, the performance degradation is considerable, irrespective of the parameters considered. This phenomenon stems from a limitation of BLIS, shared by other computational libraries, which show suboptimal efficiency in matrix products of this shape. The cause of this performance loss lies in the markedly rectangular structure of the matrices, whose dimensions differ greatly: the inner dimension of the matrix product is notably small (\(K_r=15\)). Note that, even for one-minute audio segments, \(K_r\) is two orders of magnitude smaller than F and T, and this difference becomes more pronounced as the audio duration increases.
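This bottleneck can be reproduced with a quick experiment: timing a matrix product whose inner dimension is fixed at \(K_r=15\). The sketch below uses NumPy with illustrative values of F and T; the actual spectrogram dimensions depend on the STFT configuration and audio length, so these numbers (and the idea of comparing against a nominal peak) are assumptions, not values from the paper's setup:

```python
import time
import numpy as np

# Illustrative tall-skinny matrix product, as in the ONMF update rules.
# F and T are assumed dimensions; only K_r = 15 comes from the text.
F, T, K_r = 1025, 6000, 15
W = np.random.rand(F, K_r)
H = np.random.rand(K_r, T)

start = time.perf_counter()
X = W @ H                      # (F x K_r) @ (K_r x T) -> (F x T)
elapsed = time.perf_counter() - start

# A matrix product costs 2*F*K_r*T floating-point operations.
gflops = 2 * F * K_r * T / elapsed / 1e9
print(f"Attained: {gflops:.1f} GFLOP/s")
# Performance loss = 1 - attained / theoretical_peak (peak is hardware-specific).
```

Because the inner dimension is so small, the attained GFLOP/s typically sits far below the peak achievable with large square matrices, which is precisely the degradation reported in Table 3.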
6 Conclusion
In this work, we have introduced an efficient driver for real-time RR monitoring. The proposed system addresses RR estimation through a three-stage process comprising preprocessing, factorization, and RR estimation. The factorization stage relies on an NMF approach featuring pre-trained bases and an orthogonal constraint. This stage proves crucial in making the application suitable for real environments, effectively mitigating the noise interference inherent in the acoustic scene during respiratory signal recording.
The efficacy of our proposal was assessed across diverse scenarios, demonstrating the robustness of combining respiratory bases initialized with respiratory sounds and ONMF. The evaluation yields promising results in terms of accuracy, with consistency across varying RR and SNR values, as well as efficient execution times. The proposed solution was designed for multi-core architectures and supports a wide range of devices. To the best of our knowledge, this contribution is the first NMF-based implementation that effectively addresses this problem, achieving a sound balance between result reliability and computational efficiency. This was accomplished through the intensive use of parallel and high-performance computing techniques.
For future work, it would be interesting to integrate this driver into biomedical applications, facilitating the estimation of additional critical parameters.
Acknowledgements
This work was supported by MCIN/AEI/10.13039/501100011033 under the project grants PID2020-119082RB-{C21,C22}, by the Ministerio de Ciencia, Innovación y Universidades (Gobierno de España) under the grants PID2023-146520OB-{C21,C22}, and by the Gobierno del Principado de Asturias under the grant AYUD/2021/50994.
Declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.