
Open Access 19.09.2023 | Research

Research on Hybrid Data Clustering Algorithm for Wireless Communication Intelligent Bracelets

Authors: Jian-zhao Sun, Kun Yang, Marcin Woźniak

Published in: Mobile Networks and Applications


Abstract

Wireless communication smart bracelet data include motion data, sleep time data, heart rate and blood pressure data, positioning data, etc. These data are diverse and highly complex, and the interconnections and interactions among them make clustering difficult. To this end, a new data clustering algorithm for wireless communication smart bracelets is studied. The K-medoids algorithm is used to calculate the intra-cluster, inter-cluster, or overall similarity to complete the initial clustering of the bracelet data, and clustering evaluation indexes are set to determine the optimal number of clusters. Data objects that are closely surrounded yet relatively dispersed are selected as the initial clustering centers and combined with the new index IXB to complete the improvement of the data clustering algorithm. The test results show that the accuracy, recall, and F1 of the proposed algorithm on the heart rate monitoring dataset, body temperature monitoring dataset, energy consumption dataset, and sleep monitoring dataset all exceed 97%, indicating that the algorithm clusters the data well.
Notes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

The wireless communication smart bracelet is a wearable smart device with powerful functions that attracts much user attention. It records real-time data on exercise, sleep, and diet in users' daily lives and synchronizes these data with mobile phones, tablets, and iPod touch devices through wireless sensor networks [1], guiding healthy living through data [2]. The widespread use of wireless communication smart bracelets has led to an increasing amount of bracelet data, containing information about the user's exercise trajectory, heart rate, and sleep quality. To better utilize these data, researchers have started to focus on cluster analysis of bracelet data to discover patterns and regularities. However, due to the complexity and diversity of bracelet data, traditional clustering algorithms face challenges such as high data dimensionality and data noise. Therefore, studying clustering algorithms for wireless communication smart bracelet data is of great research significance. After clustering, the data can be permanently recorded and analyzed, reducing the associated workload [3]. Data clustering divides the data into aggregation classes according to their intrinsic nature, where the elements in each aggregation class share characteristics as much as possible and the characteristics of different aggregation classes differ as much as possible [4]. The purpose of cluster analysis is to determine whether the data belong to independent groupings, so that the data in one group are similar to each other and different from the data in other groups.
Cluster analysis has potential advantages in the application of wireless communication smart bracelets, which can help discover the patterns of user behavior and habits, provide personalized health advice, and provide data support for health management and disease prevention.
Cluster analysis is therefore also called unsupervised learning. Its general approach is to group data objects into multiple classes or clusters, such that objects in the same cluster are highly similar while objects in different clusters differ more. Because of these characteristics, in many applications the data objects in a cluster can be treated as a whole after cluster analysis has been performed on the data set [5]. The K-medoids algorithm is a typical clustering algorithm in which each cluster center (medoid) is the object in the cluster whose total distance to the other cluster members is smallest. It completes a "coarse clustering" based on the distances between the data objects and the cluster centers and then, through repeated iterations, moves samples from their current cluster to a more suitable one. The core idea is to find k cluster centers so that the sum of squared distances of each data point to its nearest cluster center is minimized [6]. The method is easy to describe, simple, and fast, so it is widely used for wireless communication smart bracelet data clustering. The quality of the clustered data affects data utilization, the division of data categories, and the analysis of users' actual situations, so how to cluster the data collected by wireless communication smart bracelets is a hot research topic at home and abroad, and many classical clustering evaluation indices have been proposed [7], such as CH, DB, and SD.
Many related methods have also been studied and widely applied. For example, Reference [8] proposed a data clustering algorithm based on grey system theory: the causes of laboratory accidents in universities over the past six years were extracted, grey correlation analysis was applied, and the factors influencing laboratory safety behavior were ranked by correlation degree. Empirical analysis showed that organizational system, safety education, laboratory environment management, basic safety, and occupational safety influence laboratory safety in decreasing order, thus completing the clustering of laboratory accident data. Reference [9] proposed a data clustering algorithm based on the fuzzy c-means (FCM) clustering method and robust FCM clustering information fusion (RBI-FCM) to perform edge detection on acquired colony images; the detection results were compared with classical image edge detection operators such as the Roberts, Prewitt, and Canny operators, and the results verified the effectiveness of the adopted clustering method. However, both methods require relatively many iterations, and the clustering process may be affected by anomalous factors such as noise, leading to unsatisfactory results. Therefore, how to improve or design a more scientific and reasonable clustering algorithm is an important research direction in this field.
In this paper, a data clustering algorithm for wireless communication smart bracelets is studied to help researchers choose a suitable algorithm for bracelet data and to improve the accuracy and efficiency of data analysis. The K-medoids algorithm is used for preliminary clustering, and the optimal number of clusters is determined from the set clustering evaluation indicators. At the same time, by selecting closely surrounded yet relatively dispersed data objects as the initial clustering centers and combining them with the new indicator IXB, the data clustering algorithm is improved, raising the accuracy and completeness of data clustering. The main contributions of this work include the following aspects:
(1)
Preliminary clustering using the K-medoids algorithm: the bracelet data are clustered with the K-medoids algorithm, which groups similar data objects into the same cluster. K-medoids is a distance-based clustering algorithm that represents the data objects within a cluster by the cluster center and evaluates the quality of the clustering results by calculating intra-cluster, inter-cluster, or overall similarity.
 
(2)
Setting clustering evaluation indicators to determine the optimal number of clusters: to determine the optimal number of clusters, this method sets clustering evaluation indicators that measure the performance of clustering results in terms of compactness, separability, and stability.
 
(3)
Selecting closely surrounded and relatively dispersed data objects as the initial clustering centers: the improved data clustering algorithm considers both the tightness and the dispersion of the data objects when selecting the initial clustering centers. Choosing dense yet relatively dispersed data objects better balances the representativeness and dispersion of the centers and improves the accuracy and completeness of the clustering results.
 
(4)
Combining the new indicator IXB to improve the data clustering algorithm: this method proposes a new internal evaluation indicator, IXB, which combines the ratio of intra-cluster compactness to inter-cluster separation with its reciprocal to measure clustering quality. With IXB, the improved algorithm evaluates the compactness and separability of clustering results more comprehensively and has clear advantages in mixed data clustering tasks.
 

2 Improvement of data clustering algorithm

2.1 Original setting

Wireless communication smart bracelet data, including heart rate, step count, sleep quality, etc., are growing steadily and contain many similar records. To organize and manage the bracelet data effectively, the K-medoids algorithm is used to classify the data and principal component analysis is used to reduce its dimensionality. This divides similar data points into the same category and transforms the high-dimensional bracelet data into low-dimensional feature vectors, reducing the data dimensionality, simplifying data processing, and helping to visualize and understand the bracelet data.
The K-medoids algorithm is simple and effective [10]. Let the wireless communication smart bracelet training set be X, containing N categories \({C_1},{C_2},\dots,{C_N}\) and M samples in total. In the training phase of the K-medoids clustering algorithm, the training set X is first classified, principal component analysis is then used to reduce the dimensionality of its feature items, and each training sample is finally represented as a feature vector \({B_i}={\left\{ {{X_1},{X_2},\dots,{X_n}} \right\}^T}\left( {0< i \leq M} \right)\). In the classification phase, the bracelet data B to be classified is represented in the same way as a feature vector \(B={\left\{ {{X_1},{X_2},\dots,{X_n}} \right\}^T}\). The training set X comprises a heart rate monitoring data set, a body temperature monitoring data set, an energy consumption data set, and a sleep monitoring data set. From it, the K samples most similar to B are identified, \({B_i}={\left\{ {{X_1},{X_2},\dots,{X_n}} \right\}^T}\left( {0< i \leq K} \right)\), the categories of these K nearest neighbors are taken as candidate categories, and the membership of B in each candidate category is then calculated. The K-medoids algorithm proceeds as follows:
  • Step 1: Classify the training set of bracelet data using K-medoids clustering algorithm.
  • Step 2: Use principal component analysis to reduce the dimensionality of feature items in the bracelet training set data. This method projects the original data to a new coordinate system in a low dimensional space through linear transformation, which can effectively reduce the dimensions of the bracelet data, reduce the complexity of data processing, and retain more information.
  • Step 3: Represent the training set data as a feature vector and use it as the data to be processed.
  • Step 4: The processing of the data to be classified from Step 1 to Step 3 is carried out.
  • Step 5: Use the vector angle cosine formula to calculate the similarity between the bracelet data B to be classified and the bracelet training set data Bi. This measure has the following advantages:
    (1)
    Similarity measurement is simple and intuitive: The cosine formula of the included angle measures the angle between vectors. The closer the similarity value is to 1, the more similar the two vectors are, and the closer it is to 0, the less similar the two vectors are, making the similarity measurement intuitive and interpretable.
     
    (2)
    Applicable to high-dimensional space: The cosine formula of included angle can be applied to vector similarity calculation in high-dimensional space, and is suitable for high-dimensional features of bracelet data in this study.
     
    (3)
    Not affected by vector length: The angle cosine formula only focuses on the direction of the vector without considering its length, so it is not affected by the absolute range of changes in the data.
     
  • By calculating the cosine of the angle between two vectors to evaluate their similarity, the similarity measurement is intuitive and interpretable. The specific calculation formula is:
    $$\cos \left( {B,{B_i}} \right)=\frac{{\sum\limits_{{j=1}}^{n} {{X_j}{X_{ij}}} }}{{\sqrt {\sum\limits_{{j=1}}^{n} {X_{j}^{2}} } \sqrt {\sum\limits_{{j=1}}^{n} {X_{{ij}}^{2}} } }}$$
    (1)
    where \({X_j}\) denotes the jth feature of the bracelet data B and \({X_{ij}}\) denotes the jth feature of the bracelet training data Bi.
    Step 6: Select the data most similar to the bracelet data B to be classified, i.e., take the K largest values of \(\cos \left( {B,{B_i}} \right)\) as the nearest neighbors.
  • Step 7: Based on these K nearest neighbors, the affiliation of the bracelet data B to be classified in each category is calculated. The calculation formula is:
    $$p\left( {B,{C_m}} \right)=\sum\limits_{{i=1}}^{K} {\cos \left( {B,{B_i}} \right)\delta \left( {{B_i},{C_m}} \right)}$$
    (2)
    where \(\delta \left( {{B_i},{C_m}} \right)\) equals 1 if the nearest neighbor \({B_i}\) belongs to the category \({C_m}\) and 0 otherwise, i.e.
    $$\delta \left( {{B_i},{C_m}} \right)=\left\{ {\begin{array}{*{20}{c}} {1,}&{{B_i} \in {C_m}} \\ {0,}&{{B_i} \notin {C_m}} \end{array}} \right.$$
    (3)
    Step 8: Select the category \({C_m}\) with the highest affiliation and assign the bracelet data B to be classified to that category.
The above steps complete the initial clustering of smart bracelet data.
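As an illustration, the classification phase above (Steps 5 to 8) can be sketched in Python. This is a minimal sketch, not the authors' implementation; the function names and the toy data are our own assumptions, the angle-cosine similarity follows Eq. (1), and the membership score follows Eq. (2).

```python
import numpy as np

def cosine_similarity(b, bi):
    """Angle-cosine similarity between two feature vectors (Eq. (1))."""
    denom = np.sqrt((b ** 2).sum()) * np.sqrt((bi ** 2).sum())
    return float(b @ bi / denom) if denom > 0 else 0.0

def classify(b, train_vectors, train_labels, k=3):
    """Assign sample b to the category with the largest similarity-weighted
    membership over its k nearest neighbours (Steps 6-8, Eq. (2))."""
    sims = np.array([cosine_similarity(b, bi) for bi in train_vectors])
    nearest = np.argsort(sims)[-k:]              # indices of the k most similar samples
    membership = {}
    for idx in nearest:                          # accumulate p(B, C_m)
        label = train_labels[idx]
        membership[label] = membership.get(label, 0.0) + sims[idx]
    return max(membership, key=membership.get)   # Step 8: highest membership wins
```

For example, a sample whose feature vector points in nearly the same direction as the heart-rate training vectors would be assigned to the heart-rate category, regardless of the vectors' magnitudes.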

2.2 Indicator setting

To further optimize the clustering results, this section sets internal evaluation metrics for the clustering of the heart rate monitoring, body temperature monitoring, energy expenditure, and sleep monitoring datasets. Internal evaluation metrics measure the tightness and separation of clusters as well as their stability, helping to determine the optimal number of clusters and thus optimize the data clustering algorithm. They evaluate the merit of the clustering effect by calculating the intra-cluster, inter-cluster, or overall similarity, relying only on the dataset's own characteristics and metric values without involving any external information. The main idea of the established internal evaluation metrics is to use some form of ratio of the intra-cluster distance, the inter-cluster distance, or the average distance of the sample set [11, 12]; the ideal clustering result is tight within clusters and separated between clusters. Let the collected sample set of wireless communication smart bracelet data be \(X=\left\{ {{x_1},{x_2},\dots,{x_i},\dots,{x_N}} \right\}\) with \(\left| X \right|=N\), let each smart bracelet sample have p features, and let the sample set consist of k clusters, i.e., \(X=\left\{ {{C_1},{C_2},\dots,{C_k}} \right\}\), where cluster \({C_i}\) contains \({n_i}\) samples. c is the mean center of the sample set, \(V=\left\{ {{v_1},{v_2},\dots,{v_k}} \right\}\) \(\left( {k<N} \right)\) is the set of cluster centers, and d denotes distance. The common internal evaluation metrics and their characteristics are analyzed below:
(1)
DB index (Davies-Bouldin index): the DB index first takes the sum of the average distances between the samples and the cluster center in two clusters as the intra-cluster distance, and the distance between the two cluster centers as the inter-cluster distance; it then takes the maximum ratio of the two as the similarity of each cluster, and finally averages the similarities of all clusters to obtain the DB metric of the wireless communication smart bracelet data sample set [13], whose expression is:
$$DB\left( K \right)=\frac{1}{K}\sum\limits_{{i=1}}^{K} {\mathop {\hbox{max} }\limits_{{j \ne i}} \frac{{\frac{1}{{{n_i}}}\sum\limits_{{x \in {C_i}}} {d\left( {x,{v_i}} \right)} +\frac{1}{{{n_j}}}\sum\limits_{{x \in {C_j}}} {d\left( {x,{v_j}} \right)} }}{{d\left( {{v_i},{v_j}} \right)}}}$$
(4)
The smaller the DB indicator, the lower the similarity between clusters and the more ideal the clustering result. However, when the overlap of the dataset is large, such as when the data are distributed in a ring, it is difficult to evaluate the clustering results effectively because the centers of the clusters overlap.
 
(2)
CH metric (Calinski-Harabasz): the CH metric takes the sum of squared distances between each cluster center and the mean center of the wireless communication smart bracelet data sample set as the separation of the smart bracelet dataset, the sum of squared distances between each point in a cluster and its cluster center as the tightness within the cluster, and the ratio of separation to tightness as the final metric.
$$CH\left( K \right)=\frac{{\sum\limits_{{i=1}}^{K} {{n_i}{d^2}\left( {{v_i},c} \right)/\left( {K - 1} \right)} }}{{\sum\limits_{{i=1}}^{K} {\sum\limits_{{x \in {C_i}}} {{d^2}\left( {x,{v_i}} \right)/\left( {N - K} \right)} } }}$$
(5)
The larger the CH indicator, the higher the degree of dispersion between clusters and the tighter the clusters, indicating better clustering of the wireless communication smart bracelet data [14, 15]. From the functional expression of the CH indicator, when the number of clusters approaches the sample size N, each sample forms its own cluster whose center is the sample itself; the intra-cluster distance sum is then approximately 0 and the denominator becomes a minimal value, so the CH indicator tends to its maximum, and the clustering evaluation result at this point has no practical significance.
 
(3)
Fr indicator: to address the problem that the intra-cluster distance sum in the CH indicator may converge to 0, the original indicator is multiplied by the factor lg K to regulate the extreme values of CH as K approaches N.
$${F_r}=CH \cdot \lg K$$
(6)
where K ≥ 2, so lg K > 0. The trend of the Fr indicator is therefore similar to that of the CH indicator, i.e., the larger its value, the better the quality of the wireless communication smart bracelet data clustering.
 
(4)
Sil indicator (Silhouette Coefficient): this indicator evaluates clustering results by combining the degree of intra-cluster cohesion and inter-cluster separation. The specific formula is as follows:
$$Sil\left( k \right)=\frac{1}{k}\sum\limits_{{i=1}}^{k} {\left\{ {\frac{1}{{{n_i}}}\sum\limits_{{x \in {C_i}}} {\frac{{l\left( x \right) - a\left( x \right)}}{{\hbox{max} \left[ {l\left( x \right),a\left( x \right)} \right]}}} } \right\}}$$
(7)
where a(x) is the intra-cluster cohesion of the cluster to which sample x belongs, expressed as the average distance between x and all other elements in the same cluster; the cohesion a(x) is defined as
$$a\left( x \right)=\frac{1}{{{n_i} - 1}}\sum\limits_{{y \in {C_i},y \ne x}} {d\left( {x,y} \right)}$$
(8)
where l(x) is the inter-cluster separation of sample x from other clusters, expressed as the minimum average distance between x and the samples of each other cluster; the separation l(x) is defined as
$$l\left( x \right)=\mathop {\hbox{min} }\limits_{{j,j \ne i}} \left[ {\frac{1}{{{n_j}}}\sum\limits_{{y \in {C_j}}} {d\left( {x,y} \right)} } \right]$$
(9)
 
The range of the Sil indicator is [-1, 1]. The closer it is to 1, the more correctly the sample is assigned to a suitable cluster; the closer it is to -1, the more the sample should be assigned to another cluster; and when the Sil indicator approaches 0, the sample lies on the boundary between two adjacent clusters. When the intra-cluster distance decreases, a(x) decreases, which increases the numerator (l(x) - a(x)) and hence the Sil value, giving an ideal clustering result; therefore, to obtain the best clustering it is necessary to reduce the distances between points within a cluster while increasing the inter-cluster distances [16, 17].
The DB, CH, Fr, and Sil indices are selected as internal evaluation indicators because they evaluate the clustering effect using only the dataset's own characteristics and measurement values, without relying on external information. By comparing indicators such as intra- and inter-cluster distance, compactness, and separation, they provide a quantitative way to assess the quality of clustering results, helping to determine the optimal number of clusters and optimize the data clustering algorithm.
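To make the metrics concrete, the DB and CH indices of Eqs. (4) and (5) can be computed directly from a labelled sample set. The following Python sketch is illustrative only; the function names are our own, and the toy data in the usage below are assumptions.

```python
import numpy as np

def db_index(X, labels, centers):
    """Davies-Bouldin index (Eq. (4)); smaller values mean better separation."""
    K = len(centers)
    # average distance of each cluster's members to its center (intra-cluster scatter)
    scatter = [np.linalg.norm(X[labels == i] - centers[i], axis=1).mean()
               for i in range(K)]
    total = 0.0
    for i in range(K):   # maximum similarity of cluster i to any other cluster
        total += max((scatter[i] + scatter[j]) / np.linalg.norm(centers[i] - centers[j])
                     for j in range(K) if j != i)
    return total / K

def ch_index(X, labels, centers):
    """Calinski-Harabasz index (Eq. (5)); larger values mean better clustering."""
    N, K = len(X), len(centers)
    c = X.mean(axis=0)   # mean center of the whole sample set
    between = sum((labels == i).sum() * np.linalg.norm(centers[i] - c) ** 2
                  for i in range(K))
    within = sum((np.linalg.norm(X[labels == i] - centers[i], axis=1) ** 2).sum()
                 for i in range(K))
    return (between / (K - 1)) / (within / (N - K))
```

The Fr index of Eq. (6) is then simply the CH value multiplied by `np.log10(K)`.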

2.3 Hybrid data clustering based on improved K-medoids algorithm

To improve the performance and effectiveness of the clustering algorithm, the K-medoids algorithm [18] is improved so that the initial clustering centers are chosen more accurately, which in turn improves the clustering results. The new evaluation metric IXB allows a more comprehensive assessment of the quality of the clustering results, including tightness, separation, stability, and noise robustness. The specific improvements are described below.
The K-medoids algorithm may suffer from high computational complexity and slow convergence when dealing with large-scale datasets. Improving it can therefore optimize performance and raise the accuracy and efficiency of the clustering results. The improved algorithm can choose better initial cluster centers, adopt optimized distance calculations, provide accurate convergence criteria, and apply parallelization to accelerate clustering, achieving a faster convergence rate, higher clustering accuracy, and better scalability, especially on large-scale datasets.
Wireless communication smart bracelet data objects that are closely surrounded and relatively dispersed are selected as the initial clustering centers, so that the centroids are well separated while retaining high density [19]. The basic idea is as follows. First, the distance matrix is obtained by calculating the distance between every pair of bracelet samples. The ratio of the average distance of the sample set to the average distance of each sample is taken as the density of that sample, and the α fraction of samples with the highest density values is stored in the set of high-density bracelet data points H; α denotes the proportion of candidate representative points in the sample set, can be specified by the user, and is set to 30% in the experiments of this paper. From H, relatively scattered high-density points are selected as the initial centers of the bracelet data and stored in the set \(V=\left\{ {{v_1},{v_2},\dots,{v_k}} \right\}\). Each sample is then assigned to the corresponding cluster according to the minimum distance, and the process is repeated until the criterion function converges. The definitions and formulas of the algorithm are as follows:
  • Definition 1: The Euclidean distance between any two points in space is defined as
    $$d\left( {{x_i},{x_j}} \right)=\sqrt {\sum\limits_{{w=1}}^{p} {{{\left( {x_{i}^{w} - x_{j}^{w}} \right)}^2}} }$$
    (10)
    where \(i=1,2,\dots,N,j=1,2,\dots,N\) 
  • Definition 2: The average distance of the wireless communication smart bracelet data object xi is defined as the sum of the distances to all samples divided by the total number of sample sets, which is calculated as follows.
    $$Dist\left( {{x_i}} \right)=\sum\limits_{{j=1}}^{N} {d\left( {{x_i},{x_j}} \right)/N}$$
    (11)
    Definition 3: The average distance of the sample set is defined as the sum of the distances between each bracelet data object divided by the number of all permutations of any two objects selected from the sample set.
    $$DistMean=\sum\limits_{{i=1}}^{N} {\sum\limits_{{j=1}}^{N} {d\left( {{x_i},{x_j}} \right)/A_{N}^{2}} }$$
    (12)
    Definition 4: The density of the bracelet data object xi is defined as the ratio of the average distance of the sample set to the average distance of the data object xi.
  • $$Density\left( {{x_i}} \right)=DistMean/Dist\left( {{x_i}} \right)$$
    (13)
    Definition 5: The sum of the distances of the bracelet data object xi from each data object of the cluster to which it belongs is
  • $$DisSum\left( {{x_i}} \right)=\sum\limits_{{j=1}}^{{{n_i}}} {d\left( {{x_i},{x_j}} \right)}$$
    (14)
    where \({x_i},{x_j} \in {C_t},t=1,2,\dots,k\)
  • Definition 6: The intra-cluster distance and matrix of the cluster \({C_t}\) is
  • $$DisSum\left( {{C_t}} \right)=\left[ {\begin{array}{*{20}{c}} {DisSum\left( {{x_1}} \right)} \\ {DisSum\left( {{x_2}} \right)} \\ \cdots \\ {DisSum\left( {{x_n}} \right)} \end{array}} \right]$$
    (15)
    where \(t=1,2,\dots,k\) 
  • Definition 7: The conditions under which a bracelet data object xi is considered as a cluster center during cluster update are
  • $$DisSum\left( {{x_i}} \right)=\hbox{min} \left( {DisSum\left( {{C_t}} \right)} \right)$$
    (16)
    where \({x_i} \in {C_t},t=1,2,\dots,k\) 
  • Definition 8: The clustering error sum of squares E is defined as
  • $$E=\sum\limits_{{t=1}}^{k} {\sum\limits_{{j=1}}^{{{n_t}}} {{{\left| {{x_{tj}} - {v_t}} \right|}^{2}}} }$$
    (17)
    where \({x_{tj}}\) is the jth data object of the t-th cluster and\({v_t}\) is the center of the t-th cluster.
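Definitions 1 to 4 and the density-based seeding can be sketched as follows. This is a hedged illustration, not the paper's code: `initial_centers` is our own name, the greedy distance-product selection mirrors steps (3) and (4) of the model described later in this section, and α defaults to the 30% used in the paper's experiments.

```python
import numpy as np

def initial_centers(X, k, alpha=0.3):
    """Pick k relatively dispersed, high-density samples as initial centers
    (Definitions 2-4; density = DistMean / Dist(x_i))."""
    N = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances, Eq. (10)
    dist = D.sum(axis=1) / N                                   # Eq. (11): average distance of x_i
    dist_mean = D.sum() / (N * (N - 1))                        # Eq. (12): A_N^2 = N(N-1)
    density = dist_mean / dist                                 # Eq. (13)
    m = max(2, int(alpha * N))                                 # candidate pool size
    H = list(np.argsort(density)[-m:])                         # high-density candidates
    # start from the two most distant candidates in H
    v1, v2 = max(((a, b) for a in H for b in H if a < b), key=lambda p: D[p])
    V = [v1, v2]
    # greedily add the candidate maximising the product of distances to chosen centers
    while len(V) < k:
        rest = [h for h in H if h not in V]
        V.append(max(rest, key=lambda h: np.prod([D[h, v] for v in V])))
    return X[V]
```

On data with several well-separated groups, this seeding tends to place one initial center in each group rather than several in the densest one.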
When testing the clustering results of wireless communication smart bracelet data samples without prior knowledge, "tight within clusters and separated between clusters" is usually the key criterion for internal evaluation, and the distance between cluster centers is used as the inter-cluster distance. This may cause the clustering evaluation to fail when cluster centers overlap. This paper therefore proposes using the distance between the nearest boundary points of two clusters instead of the distance between cluster centers [20, 21]. To facilitate the analysis, consider a ring-distributed dataset consisting of three ring-shaped clusters whose centers are extremely close to each other. From the DB index formula, when calculating the inter-cluster distance of such a dataset, the inter-cluster distance is bound to approach 0 because the cluster centroids nearly overlap. Under the criterion "tight within clusters, separated between clusters", this implies an extremely high similarity between clusters, and from the perspective of cluster division the clusters should be merged to improve clustering quality. In fact, this erroneous division would merge the whole ring-distributed dataset into a single cluster, deviating clearly from the real structure.
Based on the above analysis, this paper uses the distance between the nearest boundary points of two clusters to represent the inter-cluster distance. This preserves the structural variability of each cluster in terms of geometric features, avoids an inter-cluster distance of zero, reflects the degree of similarity between clusters to the maximum extent, and overcomes problems such as the merging of clustering results of wireless communication smart bracelet data caused by overlapping ring centroids [22]. On this basis, the new index IXB is introduced, whose definitions and formulas are as follows:
  • Definition 9: Intra-cluster compactness (Compactness) is defined as the sum of the squared distances of each sample from the center of the cluster to which it belongs, i.e.:
  • $$Comp\left( k \right)=\sum\limits_{{i=1}}^{k} {\sum\limits_{{x \in {C_i}}} {{d^2}\left( {x,{c_i}} \right)} }$$
    (18)
  • Definition 10: Separation between clusters is defined as the product of the sum of the squares of the distances between the nearest boundary points and the number of samples in the cluster:
  • $$Sep\left( k \right)=n \cdot \mathop {\hbox{min} }\limits_{{i,j \ne i}} {d^2}\left( {{x_i},{x_j}} \right)$$
    (19)
    where xi and xj are the closest boundary points between clusters Ci and Cj.
  • Definition 11: The IXB metric is defined as the sum of the ratio of intra-cluster tightness to inter-cluster separation and its reciprocal, i.e.:
  • $$IXB\left( k \right)=\frac{{Sep}}{{Comp}}+\frac{{Comp}}{{Sep}}$$
    (20)
  • Definition 12: The optimal number of clusters k is defined as the number of clusters when IXB(k) takes its maximum value, i.e.:
  • $${k_{opt}}=\mathop {\arg \hbox{max} }\limits_{{{k_{\hbox{min} }} \leq k \leq {k_{\hbox{max} }}}} \left\{ {IXB\left( k \right)} \right\}$$
    (21)
In the definition of the IXB indicator, Sep/Comp measures inter-cluster separation relative to intra-cluster compactness: as the number of clusters k increases, the separation between clusters usually grows, so the distances between different clusters increase relative to the distances within them. Conversely, Comp/Sep measures intra-cluster compactness relative to inter-cluster separation: as k increases, the average distance between samples within a cluster decreases, so the intra-cluster term shrinks accordingly. Summing the ratio and its reciprocal allows IXB to reflect both effects when comparing different values of k.
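Under these definitions, Eqs. (18) to (20) can be evaluated with a few lines of Python. This is a sketch under our own naming; Sep uses the nearest pair of boundary points drawn from two different clusters, and we read the factor n in Eq. (19) as the total sample count, which is an assumption.

```python
import numpy as np

def ixb_index(X, labels, centers):
    """IXB = Sep/Comp + Comp/Sep (Eq. (20))."""
    k = len(centers)
    # Eq. (18): within-cluster sum of squared distances to each cluster center
    comp = sum((np.linalg.norm(X[labels == i] - centers[i], axis=1) ** 2).sum()
               for i in range(k))
    # Eq. (19): squared distance between the nearest boundary points of two
    # different clusters, scaled by the sample count n (our reading of n)
    sep = min(np.linalg.norm(x - y) ** 2
              for i in range(k) for j in range(i + 1, k)
              for x in X[labels == i] for y in X[labels == j])
    sep *= len(X)
    return sep / comp + comp / sep
```

Because Sep is taken between nearest boundary points rather than cluster centers, the index stays informative even when cluster centroids nearly coincide, as in the ring-distributed example above.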
So that patterns, trends, and correlations in the bracelet data can be better understood, a hybrid data clustering model for wireless communication smart bracelets is constructed to provide the basis for subsequent data analysis, prediction, and decision-making [23]. To obtain the best clustering model in terms of both the quality and the performance of the clustering results, the improved K-medoids algorithm is combined with the IXB index to construct the wireless communication smart bracelet hybrid data clustering model shown in Fig. 1, which is described as follows:
(1) Compute, in turn, the Euclidean distance between any two points of the whole bracelet-data sample set, the average distance of each data object, and the average distance of the sample set according to formulas (9) and (10).
(2) Compute the density of each bracelet data object according to formula (13), select the α samples with the highest density values, deposit them in the candidate initial-center set H, and set k = 2.
(3) According to formula (9), store the two most distant high-density points v1 and v2 of the set H in the initial-center set V.
(4) Select from H the point v3 that maximizes the product of its distances to v1 and v2, and store it in V. Continuing in this way yields an initial center set \(V=\left\{ {{v_1},{v_2},\dots,{v_k}} \right\}\) that is relatively dispersed and of high density.
(5) In the cluster-center update phase, obtain the intra-cluster distances and matrices according to Eqs. (14) and (15), select the new cluster centers according to Eq. (16), assign each sample to the corresponding cluster by minimum distance, and repeat until the criterion function of Eq. (17) converges.
(6) Use formula (20) to calculate the IXB index and evaluate the current clustering result for the wireless communication smart bracelet hybrid data; set k = k + 1 and repeat step (4) until k = kmax.
(7) Take the value of k at which IXB attains its maximum as the optimal number of clusters, according to formula (21).
At this point, the optimal clustering of smart bracelet mixed data is completed.
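Under stated assumptions, steps (1) to (7) above can be sketched as follows. The paper's exact density formula (13), distance Eqs. (14) to (16), and criterion function (17) are not reproduced here, so this sketch stands in simplified versions: density is counted as the number of samples closer than the mean pairwise distance, medoids are updated as the in-cluster point with the smallest total distance to its cluster-mates, and Comp/Sep use mean sample-to-medoid and mean medoid-to-medoid distances.

```python
import numpy as np
from itertools import combinations

def pick_initial_centers(D, k, alpha):
    """Steps (2)-(4): alpha high-density candidates, then k mutually
    distant ones chosen by the maximum-distance-product rule."""
    n = D.shape[0]
    mean_d = D.sum() / (n * (n - 1))
    density = (D < mean_d).sum(axis=1)        # stand-in for formula (13)
    H = list(np.argsort(-density)[:alpha])    # candidate set H
    # steps (3): two most distant high-density points start the set V
    i, j = max(combinations(H, 2), key=lambda p: D[p[0], p[1]])
    V = [i, j]
    while len(V) < k:                         # step (4): distance-product rule
        rest = [h for h in H if h not in V]
        V.append(max(rest, key=lambda h: np.prod([D[h, v] for v in V])))
    return np.array(V)

def cluster_with_ixb(X, k_min=2, k_max=5):
    """Steps (5)-(7): medoid-style clustering for each k in
    [k_min, k_max], keeping the k that maximizes IXB(k)."""
    X = np.asarray(X, float)
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # formula (9)
    best = (-np.inf, None, None)                         # (IXB, k, labels)
    for k in range(k_min, k_max + 1):
        medoids = pick_initial_centers(D, k, min(n, 3 * k))
        for _ in range(100):                             # step (5) iteration
            labels = np.argmin(D[:, medoids], axis=1)
            new = []
            for c in range(k):   # in-cluster point with least total distance
                members = np.where(labels == c)[0]
                new.append(members[np.argmin(D[np.ix_(members, members)].sum(axis=1))])
            new = np.array(new)
            if set(new) == set(medoids):
                break
            medoids = new
        comp = D[np.arange(n), medoids[labels]].mean()   # compactness
        sep = np.mean([D[a, b] for a, b in combinations(medoids, 2)])
        if comp > 0:                                     # Eq. (20)
            ixb = sep / comp + comp / sep
            best = max(best, (ixb, k, labels), key=lambda t: t[0])
    return best[1], best[2]                  # k_opt of Eq. (21), labels
```

On two well-separated groups of points, the scan prefers k equal to the true number of groups, since splitting a tight group shrinks Sep faster than Comp.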

3 Experimental analysis

A server configured with an Intel Core i7 processor, 16 GB RAM, and a 1 TB hard disk was used as the experimental environment. A user's PliBang smart bracelet, which met the requirements of this experiment, was selected as the source of experimental data. Four data sets collected over six months, covering heart rate monitoring, temperature monitoring, energy consumption, and sleep monitoring, were used as the test data; the details are shown in Table 1.
Table 1 Details of the data sets

| Data set category | Number of samples | Number of attributes | Number of standard clusters |
|---|---|---|---|
| Heart rate monitoring dataset | 180 | 14 | 4 |
| Temperature detection data set | 270 | 14 | 3 |
| Energy consumption data set | 570 | 31 | 3 |
| Sleep monitoring data sets | 360 | 35 | 3 |
Since the value of k has a large impact on the clustering results, the optimal value of k for the algorithm in this paper needs to be determined to ensure the data classification effect of the algorithm in this paper. The relationship between the value of k and IXB is tested, and the results are shown in Fig. 2.
The test results in Fig. 2 show that, as k increases, IXB rises unsteadily; over the range k = 14 to 20 it rises extremely fast, indicating an abnormal change in the IXB index at that point. At k = 27 IXB reaches its maximum and the clustering quality is best; as k increases further, IXB declines slowly, and at k = 32 the magnitude of the decline increases significantly. The IXB indicator ensures an optimal clustering partition by balancing the Sep/Comp and Comp/Sep terms: the larger the IXB, the better the clustering quality. Based on these results, the IXB value is highest at k = 27, reaching 2.74, where the clustering quality is best.
To verify the clustering effect of the proposed algorithm, accuracy, recall, F1, and number of iterations are selected as the experimental metrics, the algorithms of Reference [8] and Reference [9] are selected as comparison algorithms, and comparison experiments are set up on the smart bracelet data in Table 1. The experimental metrics are defined as follows.
(1) Accuracy is the proportion of all data classified by the automatic classification algorithm that agrees with the manual classification result; the higher the accuracy, the better the clustering effect.
(2) Recall is the proportion of all manually classified data that agrees with the automatic classification; the higher the recall, the better the clustering effect.
(3) Accuracy and recall are complementary, and simply improving one will lower the other; the F1 value index is therefore used to consider the two factors together.
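To make the three metrics concrete, one common way to score a clustering against a manual reference is the pairwise view: two samples "agree" when both partitions place them in the same cluster. The sketch below uses that pairwise formulation as an assumption; the paper's own accuracy and recall formulas are not reproduced and may differ.

```python
from itertools import combinations

def pairwise_accuracy_recall_f1(pred, manual):
    """Pairwise clustering metrics (an assumed formulation):
    accuracy = share of algorithm same-cluster pairs also grouped
               together in the manual reference,
    recall   = share of manual same-cluster pairs recovered by the
               algorithm,
    F1       = harmonic mean of the two."""
    idx_pairs = list(combinations(range(len(pred)), 2))
    same_pred = {p for p in idx_pairs if pred[p[0]] == pred[p[1]]}
    same_true = {p for p in idx_pairs if manual[p[0]] == manual[p[1]]}
    tp = len(same_pred & same_true)
    acc = tp / len(same_pred) if same_pred else 0.0
    rec = tp / len(same_true) if same_true else 0.0
    f1 = 2 * acc * rec / (acc + rec) if acc + rec else 0.0
    return acc, rec, f1
```

Because agreement is judged on pairs rather than labels, the score is invariant to how cluster IDs are numbered in either partition.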
Using the above metrics, the accuracy, recall, and F1 results of the three algorithms are obtained as shown in Table 2, and the numbers of iterations of the three algorithms are compared, with the results shown in Fig. 3.
Table 2 Test results of accuracy, recall rate, and F1 of the three methods

| Algorithm | Metric | Heart rate monitoring dataset | Temperature detection data set | Energy consumption data set | Sleep monitoring data sets |
|---|---|---|---|---|---|
| The proposed algorithm in this paper | Accuracy/% | 98.75 | 99.12 | 98.64 | 98.87 |
| | Recall rate/% | 97.33 | 98.54 | 97.86 | 99.03 |
| | F1 | 98.03 | 98.83 | 98.25 | 98.95 |
| The algorithm in Reference [8] | Accuracy/% | 90.24 | 90.46 | 91.05 | 90.49 |
| | Recall rate/% | 89.56 | 90.14 | 89.87 | 89.67 |
| | F1 | 89.90 | 90.30 | 90.46 | 90.08 |
| The algorithm in Reference [9] | Accuracy/% | 91.06 | 90.57 | 91.12 | 90.96 |
| | Recall rate/% | 90.56 | 91.47 | 90.43 | 90.08 |
| | F1 | 90.81 | 91.02 | 90.77 | 90.52 |
According to the test results in Table 2 and Fig. 3, across the heart rate monitoring, temperature monitoring, energy consumption, and sleep monitoring datasets the proposed algorithm achieves the best accuracy, recall, and F1, with all values higher than 97%, while the values of the two comparison algorithms are significantly lower. The proposed algorithm also completes the classification with fewer iterations, whereas the two comparison algorithms need more. This shows that the proposed algorithm better completes the clustering of wireless communication smart bracelet data: by improving the K-medoids algorithm and introducing the new IXB indicator, it reduces the number of iterations and demonstrates a significant advantage in clustering the bracelet data.
To further test the clustering results of the proposed algorithm, the improved IXB index was selected as the experimental evaluation index, and the cluster numbers given by the evaluation indexes before the improvement were counted for comparison; the final results are shown in Table 3.
Table 3 Clustering quality evaluation results

| Index | Heart rate monitoring dataset | Temperature detection data set | Energy consumption data set | Sleep monitoring data sets |
|---|---|---|---|---|
| Standard class number | 2 | 2 | 3 | 2 |
| DB | 5 | 3 | 6 | 5 |
| CH | 3 | 5 | 2 | 2 |
| Fr | 6 | 4 | 2 | 3 |
| Sil | 5 | 5 | 2 | 4 |
| IXB | 3 | 2 | 3 | 2 |
The results in Table 3 show that, after the proposed algorithm clusters the mixed wireless communication smart bracelet data, the numbers of clusters obtained with the improved IXB index are closest to the standard class numbers: of the four datasets, only the heart rate monitoring dataset shows a small difference, while the results for the other three datasets match the standard class numbers exactly. The gaps between the results of the other four indexes and the standard class numbers are considerably larger. The improved IXB index thus secures the optimal clustering division of the proposed algorithm, and a larger IXB means better clustering quality. The improved algorithm therefore achieves high clustering quality and completes the clustering of the mixed data: the improved IXB index and the optimized clustering algorithm together ensure an accurate cluster number and high-quality partitions, which is valuable for the analysis and application of wireless communication smart bracelet data.

4 Conclusion

The widespread use of wireless communication smart bracelets requires complete and accurate clustering of their multi-function data, and the quality of the clustered data is closely tied to how well the data can be utilized. An improved K-medoids algorithm is proposed: it takes the ratio of each sample's average distance to the average distance of the sample set as the sample's density parameter, and uses the maximum-distance-product method to select the k samples of higher density and greater mutual distance as the initial clustering centers, so that the centers are both representative and dispersed. Because commonly used internal evaluation indexes have limitations in cluster evaluation, the squared distance between the nearest points on the inter-cluster boundary is proposed as the separation degree of the whole sample set, and the internal evaluation index IXB is defined as the sum of the ratio of inter-cluster separation to intra-cluster compactness and its reciprocal; combined with the improved K-medoids algorithm, it completes the clustering of the mixed data. The experimental results show that the proposed clustering model provides an accurate range of cluster numbers for the mixed data of wireless communication smart bracelets and performs excellently on heart rate monitoring, temperature monitoring, energy consumption, and sleep monitoring data, effectively clustering different types of monitoring data and providing users with accurate and reliable results.
Future research can further optimize the computational efficiency of the algorithm and extend the applicability of the model, while exploring more evaluation indexes and clustering methods to improve the accuracy and practicality of data clustering.

Declarations

Foundations

This paper was not supported by any foundation.

Competing interests

The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://​creativecommons.​org/​licenses/​by/​4.​0/​.



References
1. Babu MV, Sekaran R, Kannan S et al (2023) CE2RV: commissioned energy-efficient virtualization for large-scale heterogeneous wireless sensor networks. Int J Commun Syst 36(9):e5480
2. Liu W, Liu Z (2022) User experience evaluation of Intelligent Sports Bracelet based on multi-factor fusion. Int J Prod Dev 26(26):102–116
3. Nayagi DS, Sivasankari GG, Ravi V et al (2023) Fault tolerance aware workload resource management technique for real-time workload in heterogeneous computing environment. Trans Emerg Telecommun Technol 34(3):e4703
4. Liu X, Fu L, Lin JCW, Liu S (2022) SRAS-net: low-resolution chromosome image classification based on deep learning. IET Syst Biol 16(3–4):85–97
5. Cui Y (2019) Quality evaluation of marine statistical data on the basis of clustering algorithms. J Coastal Res 98(sp1):151
6. Li X, Wu Z, Zhao Z, Ding F, He D (2021) A mixed data clustering algorithm with noise-filtered distribution centroid and iterative weight adjustment strategy. Inf Sci 577:697–721
7. Nandal P, Bura D, Singh M (2021) Efficient data clustering algorithm designed using a heuristic approach. Int J Data Anal Techniques Strategies 13(1–2):3–14
8. Chen L (2021) Research of the safety path of colleges and universities laboratory basing on the analysis of grey correlation degree. J Intell Fuzzy Systems: Appl Eng Technol 40(4):40
9. Li XD, Wei D, Wang JS, Guo QS, Chen L (2021) Colony image edge detection algorithm based on FCM and RBI-FCM clustering methods. IAENG Int J Comput Sci 48(2):356–364
10. Yan G, Woźniak M (2022) Accurate key frame extraction algorithm of video action for aerobics online teaching. Mob Networks Appl 27(3):1252–1261
11. Liu S, Xu X, Zhang Y, Muhammad K, Fu W (2023) A reliable sample selection strategy for weakly-supervised visual tracking. IEEE Trans Reliab 72(1):15–26
12. Elong N, Rahal SA (2021) The effect of clustering in filter method results applied in medical datasets. Int J Healthc Inform Syst Inf 16(1):38–57
13. Tarekegn AN, Michalak K, Giacobini M (2020) Cross-validation approach to evaluate clustering algorithms: an experimental study using multi-label datasets. SN Comput Sci 1(5):263
16. Zhao J (2021) Research on network security defence based on big data clustering algorithms. Int J Inf Comput Secur 15(4):343–356
17. Hadi AS (2022) A new distance between multivariate clusters of varying locations, elliptical shapes, and directions. Pattern Recogn 129:108780
18. Sathyamoorthy M, Kuppusamy S, Dhanaraj RK et al (2022) Improved K-means based Q learning algorithm for optimal clustering and node balancing in WSN. Wireless Pers Commun 122(3):2745–2766
19. Liu S, Huang S, Wang S, Muhammad K, Bellavista P, Ser JD (2023) Visual tracking in complex scenes: a location fusion mechanism based on the combination of multiple visual cognition flows. Inform Fusion 96:281–296
20. Gong C, Su ZG, Wang PH, Yang Y (2022) Distributed evidential clustering toward time series with big data issue. Expert Syst Appl 191:116279
21. Jiang YW (2019) Simulation of multi-dimensional discrete data efficient clustering method under big data analysis. Computer Simulation 36(02):205–208
22. Vovan T, Phamtoan D, Tuan LH, Nguyentrang T (2021) An automatic clustering for interval data using the genetic algorithm. Ann Oper Res 303(1/2):359–380
23. Draisbach U, Christen P, Naumann F (2019) Transforming pairwise duplicates to entity clusters for high-quality duplicate detection. J Data Inform Qual 12(1):1–30
Metadata
Title: Research on Hybrid Data Clustering Algorithm for Wireless Communication Intelligent Bracelets
Authors: Jian-zhao Sun, Kun Yang, Marcin Woźniak
Publication date: 19.09.2023
Publisher: Springer US
Published in: Mobile Networks and Applications
Print ISSN: 1383-469X
Electronic ISSN: 1572-8153
DOI: https://doi.org/10.1007/s11036-023-02249-w