Published in: Neural Processing Letters 2/2024

Open Access 01-04-2024

A Time-Series-Based Sample Amplification Model for Data Stream with Sparse Samples

Authors: Juncheng Yang, Wei Yu, Fang Yu, Shijun Li



Abstract

The data stream is a dynamic collection of data that changes over time, and predicting its class can be challenging due to sparse samples, complex interdependencies between data, and random fluctuations. Accurately predicting the data stream under sparse data therefore poses complex challenges. Owing to their incremental learning nature, neural networks are a suitable approach for data stream analysis; however, their high computational cost limits their applicability to high-speed streams, an issue not yet fully explored by existing approaches. To solve these problems, this paper proposes an end-to-end dynamic separation neural network (DSN) approach based on the characteristics of data stream fluctuations, which expands the static sample at a given moment into a sequence of sample streams in the time dimension, thereby amplifying the sparse samples. The Temporal Augmentation Module (TAM) addresses these challenges by augmenting the sparse data stream while reducing time complexity. Moreover, a Variance Detection Module (VDM) detects the variance of the input data stream through the network and dynamically adjusts the degree of differentiation between samples to enhance forecasting accuracy. The proposed method adds significant information to the sparse samples and lifts low-dimensional samples to higher-dimensional ones to overcome the sparse data stream problem. The preprocessed data are augmented in the TAM, and the resulting samples are transmitted to the VDM. The proposed method is evaluated using different types of data streaming datasets to predict the sparse data stream. Experimental results demonstrate that the proposed method achieves high prediction accuracy and exhibits strong robustness on data streams compared to other existing approaches.
Notes

Supplementary Information

The online version contains supplementary material available at https://doi.org/10.1007/s11063-024-11453-y.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

The data stream is the related data returned by the sensors in an engineering system after sampling the target object at a specific frequency in a time sequence. In general, a data stream is a collection of data that grows dynamically over time and is used in network monitoring, sensor networks, weather forecasting, financial services, and other fields. The streams correspond to the attribute values of the monitored samples in the engineering system. The purpose of analyzing the data stream is to find a specific state semantic pattern and then predict whether the data flow or the entire system will enter the corresponding state in the future, such as system anomaly indicators, specific attribute values exceeding the standard, or updated indicators after a system upgrade.
Predicting the state of the data stream in an engineering system is a difficult task for three reasons: (1) Sample sparsity [1–3]. Sparse samples inevitably exist in engineering systems. For example, when the relevant properties of samples are in a rare state and the system is in a special state, or when the system is disturbed by external factors, the number of such samples is sparse relative to the samples collected when the system is normal. This sparsity hinders the training of the prediction model. During model training, it is also necessary to keep the proportions of samples across categories consistent. Therefore, sparse samples need to be amplified appropriately. (2) There are complex interdependence features between data streams of different attributes [4, 5], especially in large-scale industrial systems. This interdependence determines the relational complexity of data streams and generates relatively high-level data semantic patterns. (3) The random volatility of the data stream. Engineering systems are susceptible to external interference such as weather, resonance, and misoperation. Such disturbance spreads through data association across multiple related attribute values, causing random fluctuations in sensor data [6, 7].
The performance of existing data stream analysis methods has not been satisfactory. To address this problem, a pair-wise multilayer network method was developed to learn multi-valued category features and enhance network robustness by learning interaction information within and between features [8, 9]. A Midway NN network was generated to perform automatic feature extraction on high-dimensional log data streams; it converts the input feature vectors of specific windows into dense intermediate features and caches them for incremental learning and predictive analysis. The number of substation samples was determined as 10,000, and the number of features obtained as 163 [10].
The Temporal Augmentation Module (TAM) is proposed to extend a static sample into a series of data streams by utilizing the data stream state transition method. The Variance Detection Module (VDM) detects the variance of the input data stream and adjusts the degree of variation among samples to improve accuracy. Although existing methods have shown good performance, they still face some issues. For example, the Sparse Representation-based Adaptive Graph Learning and Adaptive Weighted Cooperative Learning (SRAGL-AWCL) method fails to address the generalization of learning methods, and it cannot handle machine vibration data obtained under TBM conditions. In some cases, data augmentation methods are not efficient for time series data, and it is difficult to learn a unified sequence strategy across complicated domains, which leads to phase errors. These limitations are addressed by the proposed TAM, which is based on sparse sample augmentation and the characteristics of data stream fluctuations.
The TAM has attracted interest for several reasons. It enhances prediction accuracy when processing high-dimensional multi-attribute samples, achieving accuracy higher than that of the baseline algorithms. Large datasets can be processed using a distributed architecture, and small portions of the original data samples can be extracted. The DSN method is also proposed to monitor the variance of the input data stream, and the TAM overcomes issues encountered in sparse data streams.
Initially, the paper proposed a method to detect abnormal points in univariate time series based on extreme value optimization theory and optimized the threshold to predict abnormal points by fitting GPD (Generalized Pareto Distribution). Given the characteristics of sparse category data flow, this paper proposes solutions to these problems at both the data augmentation level with the Temporal Augmentation Module (TAM) (see Fig. 3) and the network architecture level with the Dynamic Separation Network (DSN) (see Fig. 4). Each state probability distribution \(\varphi_{i} \left( 0 \right)\) is applied vertically on the actual space with dimension i and applied horizontally on the auxiliary space with dimension \(\overline{i}\).
Figures 1 and 2 visually demonstrate how the DSN resolves the random fluctuations of the data stream. The DSN is trained by sequentially inputting four sets of data streams with progressively increasing variance. First, the VDM is closed and set as a constant, as shown in Fig. 1. Then, under the same conditions, the VDM is opened, as shown in Fig. 2. The two attribute data streams with the largest weights are retrained and extracted, and after unitizing their partial derivatives, the distribution patterns of the samples under these two attributes are examined. As shown in Fig. 1, the characteristics of the data stream are distributed into four groups based on sample variances ranging from small to large. The horizontal axis represents the coordinate value of the attribute with the largest sample weight, while the vertical axis represents the coordinate value of the attribute with the next largest sample weight. The heat map indicates the size of the partial derivative, where positive values indicate the direction of attracting sparse samples, while negative values represent the direction of repelling sparse samples. It can be observed that when the sample variance is small, the data streams of the two main attributes show a significant correlation pattern, and the boundary between attraction and repulsion is clear. However, as the variance increases, this boundary becomes complicated because the larger sample variance evenly spreads the two samples that were originally far apart, and the boundaries of the two samples overlap, making it difficult to distinguish between them.
As shown in Fig. 2, the variance factor will cause the dynamic separation loss function to more strictly distinguish between different samples on the gradient. After the variance at the corresponding position in Fig. 1 is affected by the variance detection factor, the distribution of important attributes presents a clearer pattern boundary once again. This is because the large variance detected by the variance detection factor influences the gradient of the loss function to different samples, making it more repulsive and overcoming the blurring of boundaries.

1.1 Novelty of Our Proposed Method

Recently, data streams have gained significant attention because they collect large amounts of data at high speed. Efficient data stream prediction provides advantages in several fields, such as financial data prediction, security prediction, and electricity demand prediction. The DSN approach is well suited to data streaming because of its learning capacity for solving such problems. However, this approach has a high computational cost in the presence of high-velocity streams. Therefore, the TAM is applied to minimize computational cost and time while maintaining high prediction accuracy.

1.1.1 DSN Approach

The DSN detects the variability of the input data stream through the network and dynamically adjusts the discrimination level of the loss function towards the samples according to the stochasticity of the input samples at the micro scale. The supervised learning approach of the DSN is used to generate label information in each block to classify the data stream. It monitors the variability of the input data stream and is also known as a deep convex network. The combination of TAM and DSN creates a prediction path for sparse data stream categories. The sparse data stream is augmented in the TAM, and the normal (non-sparse) data stream is then detected by the DSN.

1.1.2 Reduction of Computational Time

This paper proposes a method for amplifying samples in the time dimension. It uses the state transition properties of the data stream to probabilistically extrapolate the properties of static samples. It converts sparse features into denser ones while training the data to obtain a better model. Redundant training data is removed so that sparse data is handled more efficiently. This helps avoid time-complexity issues, overfitting, and performance degradation.

1.1.3 Performance Evaluation

The proposed method is evaluated using three datasets: EURUSD foreign exchange data, the CIFAR-10 dataset, and a professional dataset. The performance of the DSN approach is compared with existing approaches such as SRAGL-AWCL, SAR-ATR, OCNN-DAE, and M-RNN.

1.2 Contribution

The main contributions of this paper are as follows:
  • A new temporal amplification method is proposed to expand the static sample at a particular moment into continuous sample streams in the time dimension and realize the amplification of sparse samples using the data stream state transition method.
  • A dynamic separation neural network (DSN) is designed to monitor the variability of the input data stream through the VDM and dynamically adjust the strength of the difference between samples so that the network adapts to the random fluctuation of the current input sample at the micro level.
  • The pre-processed data are passed to the TAM for sample amplification, and the amplified samples are then sent to the VDM.
  • The test datasets are divided into two public datasets and a large-scale professional dataset to evaluate the prediction accuracy of the data stream.
  • Experiments show that this method achieves satisfactory results when compared to existing methods.
The rest of the paper is organized as follows: Sect. 2 reviews recent literary works related to data stream analysis, and Sect. 3 portrays the proposed time-series-based sample amplification methodology. Section 4 enumerates the experimental results and discussions. Finally, Sect. 5 concludes the paper with future scope.

2 Related Work

In this section, several existing works related to enhancing the performance of learning techniques for time series data are reviewed. The works mentioned in this section cover feature extraction, data augmentation, missing data validation, action recognition, data sparsity, and anomaly detection.
Tan et al. [11] introduced the Sparse Representation-based Adaptive Graph Learning and Adaptive Weighted Cooperative Learning (SRAGL-AWCL) technique to extract features from multi-view data and enhance diversity efficiently. This method overcomes the inadequacy of traditional techniques by improving the ability of learning techniques to represent internal geometric features. Thus, it guarantees better structural correlation among multiple views. However, it failed to consider the generalization ability of learning techniques. Agarwal et al. [12] presented a data augmentation method that utilizes sparse signal models to optimize the problem associated with enhancing the learning efficiencies of Synthetic Aperture Radar-based Automatic Target Recognition (SAR-ATR). Its performance was limited to a small neighborhood with a specified azimuth angle. Moreover, it struggles to learn the sequences of a unified strategy in the complicated domain, resulting in phase errors. Zhang et al. [13] discussed predicting the geological condition for tunnel boring machines by utilizing big operational data. They employed the Balanced Iterative Reducing and Clustering employing Hierarchies (BIRCH) technique to handle the big operational data. The K-means ++ algorithm was utilized to identify the potential rock mass classes with TBM operational data. The BIRCH method was applied to big TBM operational data to accurately detect, predict, and characterize the rock mass classes for efficient and safe tunneling. However, they did not utilize the machine vibration data for predicting the TBM boring conditions.
Yoon et al. [14] developed a Multi-directional Recurrent Neural Network (M-RNN) for validating missing data in temporal data streams. They employed medical datasets such as deterioration, biobank, MIMIC-III, UNOS-Lung, and UNOS-heart to determine the amount of missing data. Performance metrics such as RMSE, accuracy, and missing rate were used to evaluate the performance rate. The results showed improvements in all five datasets. Ullah et al. [15] presented an optimized convolutional neural network-based deep auto-encoder (OCNN-DAE) system for processing data streams in real-time. They used a CNN model to extract frame-level deep features and a DAE to learn temporal changes of action. They employed datasets such as UCF50, UCF101, HMDB51, and YouTube Actions for recognizing actions. Performance metrics such as accuracy, precision, recall, processing time, and F1-measure were used to measure the performance rate. The OCNN-DAE system showed the best outcomes for recognizing actions in surveillance data streams. However, the system was not applicable for multiple action recognition in the sequence of online video streams.
Zhong et al. [16] suggested using Random Erasing to randomly erase a rectangular area in a randomly selected image and replace pixel values with random values. Wen et al. [17] presented a concept to convert different period sequences into multiple time–frequency scales using the maximum overlap discrete wavelet transform. However, these data augmentation methods are not effective enough for time series data.
Ergen and Kozat [18] jointly trained and optimized the parameters of the LSTM architecture and the OC-SVM (or SVDD) algorithm for the first time to propose a decision function for an anomaly detector. Ting et al. [19] described population anomaly detection as point anomaly detection and used isolated distribution kernels to measure the similarity of two distributions. These methods have their advantages, but their effects on high-dimensional data streams are relatively unsatisfactory. This is because the data stream in the engineering system has already presented fairly high-level semantics, and it is hard to classify high-dimensional data streams by pattern clustering.
Wang and Qi [20] introduced CLSA to supervise retrieval queries for strong data augmentation by exploiting the distribution difference between weakly enhanced and strongly enhanced images on the feature basis. Generalization ability and data sparsity are complex problems, and the data stream generates random fluctuations among the data [21]. Existing methods such as SRAGL-AWCL, SAR-ATR, OCNN-DAE, and M-RNN do not satisfy the generalization ability across multiple features of the data stream [22]. The Adaptive Infinite Dropout (AIDropout) [23] and Approximate Linear Dependence (ALD) [24] methods increase the processing time, suffer from volatility issues, are complex to train, and involve noisy data that misleads the samples and causes sudden variations in the input data stream. Moreover, these state-of-the-art methods do not scale to large datasets, so complicated data streams can cause errors [25]. Data augmentation is not efficient for time series data and also increases the time complexity. The random fluctuations that occur in industrial systems are not stable and can change the variance [26]. In contrast, the proposed DSN method is suitable for processing complex samples and monitors the variance of the data stream through the VDM, which adjusts the degree of separation between different samples and corrects the fluctuations that occur in the samples. Azim et al. [27] introduced fully connected data description (FCDD) to replace the cluster centers in the data. They used a CNN to extract the feature map of the image to locate abnormal regions effectively. However, its effect on high-dimensional data streams is relatively unsatisfactory.
The twin support vector machine (TSVM) approach was developed by Gupta et al. [28] to deal with nonstationary data for financial time series prediction, since accurate time series prediction enhances financial decisions. They used several financial time series datasets to conduct the experiments, and evaluation measures were utilized to validate the effectiveness of the TSVM approach. As a result, the TSVM approach was computationally faster than other existing approaches. On the other hand, the cost of the model was very high.
Almeida et al. [29] discussed big data analysis because of its importance in supporting the decision-making process, providing knowledge, and optimizing the utilization of infrastructure, services, and resources. They developed a methodology to handle large amounts of data and provide valuable knowledge, and showed that it was superior when applied to real-time prediction and classification. However, dealing with small data was difficult.
For forecasting financial time series, Bousbaa et al. [30] established data stream mining approaches. They used FOREX historical financial data to predict future values using a Particle Swarm Optimization (PSO) approach. The sliding window was shortened when changes were detected and lengthened when the distribution was more stable. Consequently, superior performance was achieved by using flexible sliding windows to respond to and detect mode changes. On the other hand, the classification problem required more resources.
Various existing works related to data streams with sparse categories were compared. Some existing methods, such as SRAGL-AWCL, SAR-ATR, BIRCH, M-RNN, OCNN-DAE, and OC-SVM, were evaluated and attained good results. However, they have some limitations. Although these methods boost the ability of learning techniques with multiple views and improve performance, they do not consider the generalization capacity of the learning metrics and do not gain knowledge of complicated domains. For example, BIRCH did not utilize machine vibration data and thus could not accurately predict TBM boring conditions. In contrast, the proposed DSN method monitors the variance of the input data stream, adjusting for the variations and fluctuations of the input samples. It overcomes the issues observed in sparse data streams, and its generalization ability relative to the other methods is shown to be effective.

3 Proposed Methodology

The current advanced deep learning methods have mature and stable end-to-end performance in extracting natural signal semantic patterns and are suitable for processing samples with complex attribute relationships. However, they are not suitable for data with random fluctuations generated in industrial systems. The data flow generated in industrial systems has strong randomness, which is manifested in the abnormal fluctuation of some attribute values of the sample stream, resulting in a dramatic change in the sample variance during this period. Random fluctuations in input data make it difficult for deep neural networks to capture patterns. Therefore, this paper proposes a novel dynamic separation neural network (DSN) that monitors the variance of the input data stream through a variance detection module (VDM) and dynamically adjusts the strength of the distinction between samples so that the network adapts to the random fluctuation of the current input sample at the micro level. A brief elaboration of the proposed method is discussed as follows.

3.1 Augmentation of Data Streams

In engineering systems, samples are collected from sensors and input as a data stream. The data characteristics at each moment may differ, so amplification of sparse samples must consider the time factor. This paper proposes a method to amplify samples in the time dimension. It uses the state transition property of the data stream to probabilistically expand the attributes of static samples [31]. This expansion of static samples in the time dimension results in a sequence of sample streams, thus generating additional samples for sparse categories and achieving a balance in the number of training data categories.

3.1.1 Probabilization of Data Stream States

The sample attributes generated in industrial systems have various formats, and these formats need to be unified for easy processing and analysis. The data sources based on digital signals generate discrete data, so they need to be divided according to the value range and converted into different attribute value intervals. Each interval is called the state of a data stream [32]. The attribute values are divided into n subintervals with each one assigned a state vector flag.
Definition 1
The state vector of the data stream
$$ S = \left( {s_{1} ,s_{2} , \ldots ,s_{n} } \right). $$
(1)
\(S\) is the state vector of a certain attribute data stream, and the element \(s_{i} ,\,\,\,1 \le i \le n\) represents the i-th state of this data stream. If the data stream corresponds to numerical data, then \(s_{i}\) is the i-th division of the data stream; if the data stream corresponds to non-numeric data, then \(s_{i}\) is the one-hot vector of the i-th division among all divisions. The union of all elements in \(S\) is equivalent to the range of the property, so \(S\) is also the state space of this property.
Definition 2
State probability distribution of data stream
$$\varphi \left( t \right) = \left\{ {p\left[ {x\left( t \right) = s_{1} } \right],p\left[ {x\left( t \right) = s_{2} } \right], \ldots ,p\left[ {x\left( t \right) = s_{n} } \right]} \right\} = \left[ {\theta_{1} \left( t \right),\theta_{2} \left( t \right), \ldots ,\theta_{n} \left( t \right)} \right]$$
(2)
where \(\varphi \left( t \right)\) represents the probability distribution of the state vector at time t, \(p\left[ {x\left( t \right) = s_{i} } \right]\) denotes the probability that the data stream \(x\left( t \right)\) is in the state \(s_{i}\), and the element \(\theta_{i} \left( t \right),1 \le i \le n\) represents the probability of the data stream falling in state \(s_{i}\) at time t. Described as a probability distribution, it provides a formal interface for the subsequent random process analysis.
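To make Definitions 1 and 2 concrete, the following is a minimal NumPy sketch of how a numeric attribute stream could be discretized into n states and how its empirical state probability distribution could be estimated from statistical frequency; the equal-width binning, the state count, and all identifiers are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def discretize(stream, n_states):
    """Map a numeric attribute data stream onto n equal-width states (Definition 1)."""
    edges = np.linspace(stream.min(), stream.max(), n_states + 1)
    # interior edges give state indices 0 .. n_states - 1
    states = np.digitize(stream, edges[1:-1])
    return states, edges

def state_distribution(states, n_states):
    """Empirical state probability distribution phi(t) (Definition 2),
    estimated as the relative frequency of each state."""
    counts = np.bincount(states, minlength=n_states)
    return counts / counts.sum()

# hypothetical usage: one sensor attribute sampled over time
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
s, _ = discretize(x, n_states=8)
phi = state_distribution(s, 8)      # phi[i] ~ p[x(t) = s_i]
```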

3.1.2 State Transition of the Data Stream

Each attribute value in the data stream changes dynamically with time, so conditional probability can be used to describe this changing process.
Definition 3
State transition probability of data stream
$$ p_{ij} \left( t \right) = p\left[ {x\left( {0 + t} \right) = s_{j} |x\left( 0 \right) = s_{i} } \right]\,,\,t \ge 0,i,j \ge 0 $$
(3)
where \(p_{ij} \left( t \right)\) represents the probability that the data stream reaches the state \(s_{j}\) from the state \(s_{i}\) at time 0 after t time units. It describes the possibility that the data stream of a certain attribute changes dynamically between states, so it is a dynamic probability. From definition 1, it can be known that the data stream of each attribute has multiple states. To completely record the information on the mutual transfer of the data stream between all states, it is necessary to combine the state vector with the state transition probability. Therefore, this paper constructs a dynamic transfer matrix.
Definition 4
Dynamic state transfer matrix (DTM)
$$ P\left( t \right) = \left( {\begin{array}{*{20}c} {p_{11} \left( t \right)} & {p_{12} \left( t \right)} & \cdots & {p_{1n} \left( t \right)} \\ {p_{21} \left( t \right)} & {p_{22} \left( t \right)} & \cdots & {p_{2n} \left( t \right)} \\ \vdots & \vdots & \ddots & \vdots \\ {p_{n1} \left( t \right)} & {p_{n2} \left( t \right)} & \cdots & {p_{nn} \left( t \right)} \\ \end{array} } \right) $$
(4)
where \(P\left( t \right)\) represents the dynamic transfer matrix, and the meaning of its element \(p_{ij} \left( t \right)\) in row \(i\), column \(j\) is consistent with Definition 3. The dynamic transfer matrix describes the probability of the data stream transitioning between all possible states after \(t\) time units [16]. Viewed as a random process, the state probability distribution and the dynamic transfer matrix of attribute values are a priori and can be obtained from the statistical frequency of historical data. According to Definition 4, the state probability distribution of a data stream at any time point can be obtained from the distribution one time unit earlier, which is proved as follows:
$$\begin{aligned} \varphi \left( t \right) \cdot P\left( 1 \right) & = \left[ {\theta_{1} \left( t \right),\theta_{2} \left( t \right), \ldots ,\theta_{n} \left( t \right)} \right] \cdot \left( {\begin{array}{*{20}c} {p_{11} \left( 1 \right)} & {p_{12} \left( 1 \right)} & \cdots & {p_{1n} \left( 1 \right)} \\ {p_{21} \left( 1 \right)} & {p_{22} \left( 1 \right)} & \cdots & {p_{2n} \left( 1 \right)} \\ \vdots & \vdots & \ddots & \vdots \\ {p_{n1} \left( 1 \right)} & {p_{n2} \left( 1 \right)} & \cdots & {p_{nn} \left( 1 \right)} \\ \end{array} } \right) \\ & = \left[ {\sum\limits_{i = 1}^{n} {\theta_{i} \left( t \right) \cdot p_{i1} \left( 1 \right)} ,\sum\limits_{i = 1}^{n} {\theta_{i} \left( t \right) \cdot p_{i2} \left( 1 \right)} , \ldots ,\sum\limits_{i = 1}^{n} {\theta_{i} \left( t \right) \cdot p_{in} \left( 1 \right)} } \right] \\ & = \left[ {\theta_{1} \left( {t + 1} \right),\theta_{2} \left( {t + 1} \right), \ldots ,\theta_{n} \left( {t + 1} \right)} \right] = \varphi \left( {t + 1} \right) \end{aligned}$$
(5)
Equation (5) shows that the data stream divided by states exhibits the Markov property, so the state probability distribution of the data stream in discrete states constitutes a Markov chain. Equation (5) defines the recursive form of the state transition of the data stream, from which Theorem 1 can be deduced.
Theorem 1
Timing Augmentation of Data Stream State
$$\phi \left( {t + m} \right) = \phi \left( t \right) \cdot {\left[ {P\left( 1 \right)} \right]^m}$$
(6)
where \(\varphi \left( {t + m} \right)\) is the probability distribution of the data stream state at the time \(t + m\), and \(\left[ {P\left( 1 \right)} \right]^{m}\) represents the \(m\) power of the dynamic transfer matrix that has experienced 1-time unit. The proof is as follows:
$$\begin{aligned} \theta_{ij} \left( {t + m} \right) & = p\left\{ {x\left( {t + m} \right) = s_{j} \,|\,x\left( 0 \right) = s_{i} } \right\} \\ & = \sum\limits_{k = 1}^{n} {p\left\{ {x\left( {t + m} \right) = s_{j} ,x\left( t \right) = s_{k} \,|\,x\left( 0 \right) = s_{i} } \right\}} \\ & = \sum\limits_{k = 1}^{n} {p\left\{ {x\left( {t + m} \right) = s_{j} \,|\,x\left( t \right) = s_{k} ,x\left( 0 \right) = s_{i} } \right\}p\left\{ {x\left( t \right) = s_{k} \,|\,x\left( 0 \right) = s_{i} } \right\}} \\ & = \sum\limits_{k = 1}^{n} {p_{kj} \left( m \right)p_{ik} \left( t \right)} \end{aligned}$$
(7)
where the step from the second line to the third line applies the no-after-effect (memoryless) property of the Markov chain. \(s_{i} ,s_{j}\) can take all the elements in \(S\), so \(\theta_{ij} \left( {t + m} \right)\) corresponds to the element in row \(i\) and column \(j\) of \(\varphi \left( t \right)\left[ {P\left( 1 \right)} \right]^{m}\).
Theorem 1 gives a method for calculating the state distribution of the data stream arbitrarily far into the future. Only the dynamic transfer matrix over one time unit needs to be counted to implement the operation. This method can obtain the ideal (unaffected by external factors) distribution of the data stream state at a future time point t + m based on prior information alone.
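As an illustration of Definitions 3 and 4 and Theorem 1, the sketch below estimates the one-step dynamic transfer matrix P(1) by counting transitions in a historical state sequence and then extrapolates the state distribution m steps ahead; the frequency-counting estimator and the function names are assumptions consistent with the prior-statistics description above, not code from the paper.

```python
import numpy as np

def one_step_transfer_matrix(states, n_states):
    """Dynamic state transfer matrix P(1) (Definition 4), estimated from the
    frequency of one-step transitions s_i -> s_j in a historical state sequence."""
    P = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        P[a, b] += 1.0
    row_sums = P.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0        # leave rows of unvisited states as zeros
    return P / row_sums

def extrapolate(phi_t, P1, m):
    """Theorem 1: phi(t + m) = phi(t) . [P(1)]^m."""
    return phi_t @ np.linalg.matrix_power(P1, m)

# hypothetical usage with the states `s` and distribution `phi` from the previous sketch
# P1 = one_step_transfer_matrix(s, 8)
# phi_future = extrapolate(phi, P1, m=5)
```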

3.1.3 Sparse Samples of Augmented Data Streams

A sparse data augmentation stream is used to balance the training sets of time series data. Data streaming methods are typically used in combination with machine learning approaches built around an artificial neural network to achieve better results. Among all networks, the DSN provides the best performance through effective network management and advanced security. Sparse data does not contain actual observations within data analysis; it contains zero or empty values. It differs from missing data: missing data provides no value at all, whereas sparse data still yields a value. Sparsity is defined based on the number of zero and null values. Converting sparse features into dense ones while training the data yields a better model. Redundant training data is removed so that sparse data is handled more efficiently, which helps avoid time complexity, overfitting, and a reduction in performance. The timing enhancement method is designed according to Theorem 1. As shown in Fig. 3, the samples have attribute values, and each attribute is a data stream \(x_{\left( i \right)} \left( 0 \right),1 \le i \le a\,,a \in Z^{ + }\). The sample is passed into the Temporal Augmentation Module (TAM). In the TAM, the data stream \(x_{\left( i \right)} \left( 0 \right)\) is first converted into the corresponding attribute value state probability distribution \(\varphi_{i} \left( 0 \right)\), and then the temporal transfer probability from \(t = 1\) to \(t = t_{{k_{1} }}\) is applied through the temporal transfer matrix to generate the corresponding \(t_{{k_{1} }}\) estimated state distributions \(\varphi_{i} \left( {t_{k} } \right),1 \le i \le a,1 \le k \le k_{1} \,,\,k_{1} \in Z^{ + }\). By analogy, all the attribute data streams of the sample are passed into the TAM, the state corresponding to the maximum probability value is taken according to formula (7), and the resulting vector is then unitized as the output.
$$AS\left( {t_{i} } \right) = \frac{{\left[ {\arg \max \left( {\varphi_{1} \left( {t_{i} } \right)} \right),\arg \max \left( {\varphi_{2} \left( {t_{i} } \right)} \right), \ldots ,\arg \max \left( {\varphi_{a} \left( {t_{i} } \right)} \right)} \right]}}{{\left\| {\left[ {\arg \max \left( {\varphi_{1} \left( {t_{i} } \right)} \right),\arg \max \left( {\varphi_{2} \left( {t_{i} } \right)} \right), \ldots ,\arg \max \left( {\varphi_{a} \left( {t_{i} } \right)} \right)} \right]} \right\|}},\quad i \in \left[ {1,t_{{k_{1} }} } \right],$$
(8)
so that \(t_{{k_{1} }}\) unitized enhanced samples can be obtained, denoted as \(AS\left( {t_{i} } \right)\) when the source of the sample is not distinguished.
This paper stipulates that when the input sample belongs to the sparse category, the enhanced sample is denoted as \(\nu_{i} ,\,i = 1,2, \ldots ,n\). If the input sample is from the non-sparse category, the enhanced sample is denoted as \(u_{i} ,\,i = 1,2, \ldots ,n\). Amplifying samples in the time dimension serves two functions. The first is to supplement the number of sparse samples: high-density augmentation of categories with a small sample-size ratio balances the number of training samples per category. The second is to generate dynamic statistical features of the sample stream, and these statistical features are involved in regulating the ability of the dynamic loss function to separate similar negative samples.
After a sample x is input, it is processed by the TAM timing enhancement module, then passes through the DTM for data enhancement, and the attribute value v at the position corresponding to the maximum probability of each distribution is output.
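The following sketch outlines this TAM procedure and the form of Eq. (8): each attribute distribution is evolved k steps with its transfer matrix, the value of the most probable state is taken, and the resulting attribute vector is unitized. The use of a representative value per state (e.g., a bin center) and all identifiers are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def tam_augment(phi0_list, P1_list, state_values, k1):
    """Sketch of the Temporal Augmentation Module output AS(t_i), cf. Eq. (8).

    phi0_list[i]    : state probability distribution of attribute i at t = 0
    P1_list[i]      : one-step transfer matrix of attribute i
    state_values[i] : representative value of each state of attribute i (assumed, e.g. bin centers)
    k1              : number of augmented time steps t_1 .. t_{k1}
    """
    augmented = []
    for k in range(1, k1 + 1):
        sample = []
        for phi0, P1, values in zip(phi0_list, P1_list, state_values):
            phi_k = phi0 @ np.linalg.matrix_power(P1, k)   # evolve the distribution k steps
            sample.append(values[np.argmax(phi_k)])        # value of the most probable state
        sample = np.asarray(sample, dtype=float)
        augmented.append(sample / np.linalg.norm(sample))  # unitize the attribute vector
    return np.stack(augmented)                             # shape: (k1, number_of_attributes)
```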

3.2 Dynamic Separation Network

This section proposes a dynamic separation network structure, which detects the variance of the input data stream through the network and dynamically adjusts the degree of discrimination of the loss function towards the samples, so as to adapt to the random volatility of the input samples at the micro level. The DSN approach aims to provide high performance in data stream prediction, allowing the network to receive data at high speed. The DSN is a supervised learning approach that generates label information in each module to classify the data stream. It monitors the variance of the input data stream and is also called a deep convex network. The loss function of the network is determined by the dynamic separation function; the number of layers employed determines the depth of the network, and the prediction output is obtained once the training process is complete. The DSN applies sparse regularization within the deep learning approach. It is composed of an input layer, an output layer, and a number of modules. Every module has separate characteristics and is evaluated with various layers to adjust the strength of the module and adapt to random fluctuations. Figure 4 presents the framework of the dynamic separation network for data streams. As shown in Fig. 4, the DSN consists of three parts: a data input layer, a variance detection module, and a neural network structure.
In the data input layer, each batch consists of amplified sparse samples and the normal (non-sparse) samples of the corresponding time, forming a sequence of data pairs. The variance detection factor calculates the variance of the input samples and then passes the variance information to the dynamic separation loss function. Finally, the neural network outputs the corresponding results according to the discrimination of the dynamic separation loss function [33].

3.2.1 Dynamic Separation Loss Function

The variance of the sample directly reflects the random dispersion degree of the sample stream. The core of the dynamic separation network is the dynamic separation loss function, which contains a variance detection factor that can adjust the ability of the loss function to distinguish samples according to the detected sample stream variance. The dynamic separation loss function needs to give positive feedback to this random dispersion phenomenon [33].
Definition 5
The general form of the dynamic separation loss function
$$ E\left( {\nu_{i} } \right) = - \log \frac{{e^{{\frac{{sim\left[ {f\left( {\nu_{i} } \right),\,f\left( {\nu_{j} } \right)} \right]\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {f\left( {u_{t} } \right) - \hat{f}\left( {u_{t} } \right)} \right\|^{2} } } \right)}}{\lambda }}} }}{{\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left[ {f\left( {\nu_{i} } \right),\,f\left( {u_{j} } \right)} \right]\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {f\left( {u_{t} } \right) - \hat{f}\left( {u_{t} } \right)} \right\|^{2} } } \right)}}{\lambda }}} + e^{{\frac{{sim\left[ {f\left( {\nu_{i} } \right),\,f\left( {\nu_{j} } \right)} \right]\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {f\left( {u_{t} } \right) - \hat{f}\left( {u_{t} } \right)} \right\|^{2} } } \right)}}{\lambda }}} } }} $$
(9)
where \(E\left( {\nu_{i} } \right)\) represents the general form of the dynamic separation loss function; the independent variables \(\nu_{i} ,\,i = 1,\,2,\, \ldots ,\,N\) represent samples amplified from sparse samples, and the independent variables \(u_{j} ,\,j = 1,\,2,\, \ldots ,N\) represent amplified samples from non-sparse samples. The function \(f\left( * \right)\) performs feature extraction on the independent variable and outputs its representation vector. \(f\left( {\nu_{i} } \right)\) and \(f\left( {\nu_{j} } \right)\) take the features of the sparse amplified samples, \(f\left( {u_{t} } \right)\) takes the non-sparse sample features, and \(\hat{f}\left( {u_{t} } \right)\) takes the mean of the non-sparse sample features. The functions \(sim\left[ {f\left( {\nu_{i} } \right),f\left( {\nu_{j} } \right)} \right]\) and \(sim\left[ {f\left( {\nu_{i} } \right),f\left( {u_{j} } \right)} \right]\) compute the cosine similarity of two input representation vectors. \(\frac{{\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {f\left( {u_{t} } \right) - \hat{f}\left( {u_{t} } \right)} \right\|^{2} } }}{\lambda }\) is the variance detection factor, which measures the variance of the input samples, and \(\lambda\) represents the variance separation coefficient, used to adjust the quantitative relationship between the sample variance and the degree to which the loss function separates the samples.
Definition 6
The simplified form of the dynamic separation loss function
$$ l\left( {\nu_{i} } \right) = - \log \frac{{e^{{\frac{{sim\left( {\nu_{i} ,\,\nu_{j} } \right)\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} }}{{\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} + e^{{\frac{{sim\left( {\nu_{i} ,\,\nu_{j} } \right)\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } }} $$
(10)
where \(l\left( {\nu_{i} } \right)\) represents the simplified dynamic separation loss function. Here, the variance detection factor (referred to as the variance factor) is simplified to \(\frac{{\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } }}{\lambda }\), and the meanings of the other symbols are the same as those of Eq. 9.
Contrastive learning is used to enhance the performance of various networks by using samples whose attributes and data are similar. This paper makes use of the Dual-level Contrastive Learning Network (DCLN), which is formed by integrating cross-domain and intra-domain contrastive learning strategies. Contrastive learning is used to extract the dynamic representation of the learning approach. It generates a large amount of data and reduces the variations that occur between the data. It ignores similar data and retains only samples from different instances, and it also highlights the differences between low-dimensional and high-dimensional data samples. The dynamic separation loss function borrows ideas from the standard contrastive loss function: minimizing it closes the distance between sparsely amplified samples and keeps those sparsely amplified samples farther away from other non-sparse (original or amplified) samples. The dynamic separation loss function has no temperature coefficient; instead, it has a variance detection factor, which is a function of the non-sparse data streams. When the input data stream has large random fluctuations, the variance increases, and the variance detection factor outputs a dependent variable proportional to the variance. Among sparse samples, the augmented ones have a stronger mutual attraction in a high-variance environment; for non-sparse samples, once they tend to fall near the sparse samples, they experience a stronger repulsion and move away from the sparse samples.
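A minimal PyTorch sketch of the simplified dynamic separation loss in Eq. (10) is given below. It assumes feature vectors have already been extracted, a single positive pair of amplified sparse samples, and a batch of non-sparse features for the variance detection factor; it is one interpretation of the formula, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def dynamic_separation_loss(v_i, v_j, u, lam=10.0):
    """Simplified dynamic separation loss l(v_i) of Eq. (10) for one sparse pair.

    v_i, v_j : 1-D feature vectors of two amplified sparse samples (positive pair)
    u        : (N, d) feature matrix of non-sparse samples (negatives)
    lam      : variance separation coefficient lambda
    """
    # variance detection factor: mean squared deviation of the non-sparse
    # features from their mean, divided by lambda
    var_factor = ((u - u.mean(dim=0)) ** 2).sum(dim=1).mean() / lam

    pos = F.cosine_similarity(v_i, v_j, dim=0) * var_factor             # sim(v_i, v_j) * factor
    neg = F.cosine_similarity(v_i.unsqueeze(0), u, dim=1) * var_factor  # sim(v_i, u_j) * factor

    # -log( e^pos / (sum_j e^neg_j + e^pos) )
    return -(pos - torch.logsumexp(torch.cat([neg, pos.unsqueeze(0)]), dim=0))
```

For instance, with `v_i, v_j = torch.randn(128), torch.randn(128)` and `u = torch.randn(32, 128)`, the call returns a scalar loss that can be backpropagated.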

3.3 The Principle of Variance Detection Module (VDM)

To determine the influence mechanism of the variance detection factor on the loss function, this paper first calculates the gradient of \(l\left( {\nu_{i} } \right)\) with respect to the similarities, i.e., the derivatives of \(l\left( {\nu_{i} } \right)\) with respect to \(sim\left( {\nu_{i} ,\,\nu_{j} } \right)\) and \(sim\left( {v_{i} ,\,u_{j} } \right)\) respectively, deduces the reason for the formation of this mechanism, and obtains the quantitative relationship between \(l\left( {\nu_{i} } \right)\) and the sample similarity.
The derivative of \(l\left( {\nu_{i} } \right)\) with respect to \(sim\left( {\nu_{i} ,\,\nu_{j} } \right)\) is as follows:
$$ \begin{aligned} \frac{{\partial l\left( {\nu_{i} } \right)}}{{\partial sim\left( {\nu_{i} ,\,\nu_{j} } \right)}} & = - \frac{\partial }{{\partial sim\left( {\nu_{i} ,\,\nu_{j} } \right)}}\left[ \begin{gathered} \frac{{sim\left( {\nu_{i} ,\,\nu_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda } \hfill \\ - \log \left( {\sum\limits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} + e^{{\frac{{sim\left( {\nu_{i} ,\,\nu_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } } \right) \hfill \\ \end{gathered} \right] \\ & = - \frac{{\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda } + \frac{1}{{\sum\limits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} + e^{{\frac{{sim\left( {\nu_{i} ,\,\nu_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } }} \\ & \quad \left( {0 + \frac{{\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }e^{{\frac{{sim\left( {\nu_{i} ,\,\nu_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } \right) \\ & = - \frac{{\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }\left[ {\frac{{\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } }}{{\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } + e^{{\frac{{sim\left( {\nu_{i} ,\,\nu_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} }}} \right] \\ \end{aligned} $$
(11)
The derivative of \(l\left( {\nu_{i} } \right)\) with respect to \(sim\left( {\nu_{i} ,\,u_{j} } \right)\) is as follows:
$$ \begin{aligned} \frac{{\partial l\left( {\nu_{i} } \right)}}{{\partial sim\left( {\nu_{i} ,\,u_{j} } \right)}} & = - \frac{\partial }{{\partial sim\left( {\nu_{i} ,\,u_{j} } \right)}}\left[ \begin{gathered} \frac{{sim\left( {\nu_{i} ,\,\nu_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda } \hfill \\ - \log \left( {\sum\limits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} + e^{{\frac{{sim\left( {\nu_{i} ,\,\nu_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } } \right) \hfill \\ \end{gathered} \right]\, \\ & = - \left( \begin{gathered} 0 - \frac{1}{{\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} + e^{{\frac{{sim\left( {\nu_{i} ,\,\nu_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } }} \hfill \\ \,\,\,\,\,\,\,\frac{\partial }{{\partial sim\left( {\nu_{i} ,\,u_{j} } \right)}}\left( {\sum\limits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} + e^{{\frac{{sim\left( {\nu_{i} ,\,\nu_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } } \right) \hfill \\ \end{gathered} \right) \\ & = \frac{{\left( {\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)} \right)}}{\lambda }\left[ {\frac{{e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} }}{{\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} + e^{{\frac{{sim\left( {\nu_{i} ,\,\nu_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } }}} \right] \\ \end{aligned} $$
(12)
From Eqs. (11) and (12), it can be seen that the partial derivative with respect to the similarity of different samples is related not only to the variance factor but also to the similarity of the different samples themselves. The larger the sample variance or the greater the similarity, the larger the repulsion force pushing the non-sparse samples away from the sparse samples; the partial derivative with respect to the similarity of the same samples is related only to the variance factor. The larger the sample variance, the greater the attraction between the same samples. It can also be seen from the numerator of formula (12) that the absolute value of the partial derivative with respect to the similarity of different samples equals the absolute value of the partial derivative with respect to the similarity of the same samples. Formula (12) can be unitized by formula (8) to obtain the distribution of partial derivatives over the similarities of different samples.
From this, the relationship between the distribution entropy and the variance detection factor, and hence the influence of the variance factor on the distribution over different samples, can be obtained. H is defined as follows:
$$ \begin{aligned} H & = - \sum\nolimits_{j = 1}^{N} {\left( {\frac{{e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} }}{{\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } }}\log \frac{{e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} }}{{\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } }}} \right)} \\ & = - \frac{{\sum\nolimits_{j = 1}^{N} {\left( {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} \log \,e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } \right)} }}{{\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } }} \\ & \quad + \log \left( {\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } } \right) \\ \end{aligned} $$
(13)
The derivative of H with respect to the variance factor is as follows:
$$ \begin{aligned} & \frac{\partial H}{{\partial \left[ {\frac{{\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }} \right]}} = \frac{\partial }{{\partial \left[ {\frac{{\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }} \right]}} \\ & \quad \left[ {\frac{{\sum\nolimits_{j = 1}^{N} {\left( {e^{{\frac{{sim\left( {v_{i} ,u_{j} } \right)\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} \log e^{{\frac{{sim\left( {v_{i} ,u_{j} } \right)\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } \right)} }}{{\sum\nolimits_{j = 1}^{N} {e^{{^{{\frac{{sim\left( {v_{i} ,u_{j} } \right)\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} }} } }} + \log \left( {\sum\limits_{j = 1}^{N} {e^{{\frac{{sim\left( {v_{i} ,u_{j} } \right)\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } } \right)} \right] \\ \end{aligned} $$
(14)
$$ = \frac{\lambda }{{\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } }} \cdot \frac{{\sum\nolimits_{j = 1}^{N} {\left( {\sqrt {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } \,\log \left( {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } \right)} \right)^{2} } }}{{\left( {\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } } \right)^{2} }} $$
(15)
$$ \le \frac{\lambda }{{\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}\frac{{\left( {\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } } \right)^{2} - \left( {\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } } \right)^{2} }}{{\left( {\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,\,u_{j} } \right)\,\left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } } \right)}} = 0 $$
(16)
where H denotes the entropy of the partial-derivative distribution over the similarities of different samples. From Eq. (16), obtained by taking the partial derivative of this entropy, it can be seen that the entropy decreases monotonically with respect to the variance detection factor. When the sample variance increases, the loss function tends to make sparse samples with high similarity to normal samples experience a greater repulsion force, thereby offsetting the tendency of different samples to be drawn too close together by the increased variance. When the variance of the input samples is small, the opposite holds: the entropy becomes larger and the repulsive force between samples decreases, so that the different samples keep their original feature-distribution positions as much as possible (a small numerical illustration of this monotonicity is given after Eq. (17)). Finally, the dynamic separation loss function is assembled according to the set batch size to obtain the total loss function:
$$ L_{loss} = \frac{1}{M}\sum\limits_{i = 1}^{M} { - \log \frac{{e^{{\frac{{sim\left( {\nu_{i} ,\nu_{j} } \right) \cdot \left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} }}{{\sum\nolimits_{j = 1}^{N} {e^{{\frac{{sim\left( {\nu_{i} ,u_{j} } \right) \cdot \left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} } + e^{{\frac{{sim\left( {\nu_{i} ,\nu_{j} } \right) \cdot \left( {\frac{1}{N}\sum\nolimits_{t = 1}^{N} {\left\| {u_{t} - \overline{u}_{t} } \right\|^{2} } } \right)}}{\lambda }}} }}} $$
(17)
here, \(L_{loss}\) represents the total loss function and \(M\) is the amplified number of sparse samples.
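The monotone decrease of H with respect to the variance detection factor (Eqs. 13–16) can also be checked numerically. The short sketch below computes the entropy of the softmax distribution over \(e^{sim_{j} \cdot c}\) for a few increasing values of a factor c standing in for the variance detection factor; the similarity values are invented purely for illustration.

```python
import numpy as np

def derivative_distribution_entropy(sims, c):
    """Entropy H (cf. Eq. 13) of the softmax distribution over e^{sim_j * c},
    where c plays the role of the variance detection factor."""
    logits = sims * c
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-(p * np.log(p)).sum())

sims = np.array([0.9, 0.2, -0.4, 0.5, 0.1])     # hypothetical cosine similarities
for c in [0.5, 1.0, 2.0, 4.0, 8.0]:
    print(c, round(derivative_distribution_entropy(sims, c), 4))
# the printed entropy shrinks as c grows, matching the monotone decrease implied by Eq. (16)
```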
The combination of TAM and DSN constitutes the prediction path of the category-sparse data stream. The sparse data stream is augmented in the TAM and then undergoes self-supervised learning in the DSN together with the normal (non-sparse) data stream. The deep neural network (DNN) structure evaluates the high-level semantic patterns of the data stream and finally forms the prediction model of the data stream. The variance separation coefficient and the structures of the TAM and DSN are taken as input. The state vector is initialized to obtain the state probability distribution and the state transition probabilities, the data stream attributes are processed to validate the various measures, and the two input representations are used to calculate the dynamic separation loss function. The algorithmic process of the entire DSN is presented in Algorithm 1, and an illustrative outline is sketched below.
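For orientation only, the outline below sketches one possible training loop for the TAM + DSN path in the spirit of Algorithm 1, reusing the loss sketch from Sect. 3.2.1; the encoder, the batch structure, and the one-pair-per-step simplification (Eq. (17) averages over M sparse pairs) are assumptions, not the authors' implementation.

```python
import torch

def train_dsn(encoder, batches, lam=10.0, lr=1e-3, epochs=10):
    """Illustrative outline of the TAM + DSN prediction path (cf. Algorithm 1).

    encoder : any feature-extraction network standing in for f(*)
    batches : iterable of ((v_i_raw, v_j_raw), u_raw) tensors, where the sparse
              samples are assumed to be pre-amplified by the TAM and u_raw holds
              the corresponding non-sparse samples with shape (N, d_in)
    """
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(epochs):
        for (v_i_raw, v_j_raw), u_raw in batches:
            f_vi = encoder(v_i_raw.unsqueeze(0)).squeeze(0)   # features of the amplified sparse pair
            f_vj = encoder(v_j_raw.unsqueeze(0)).squeeze(0)
            f_u = encoder(u_raw)                              # non-sparse features for the VDM
            # dynamic_separation_loss is the sketch from Sect. 3.2.1; Eq. (17)
            # would average this quantity over the M sparse pairs of the batch
            loss = dynamic_separation_loss(f_vi, f_vj, f_u, lam=lam)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder
```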
Table 1 depicts various notations and their descriptions related to the data stream.
Table 1
Notations and description of the equations
Symbol
Description
\(S\)
State vector
\(\varphi \left( t \right)\)
Probability distribution of the state vector
\(p_{ij} \left( t \right)\)
Probability reaches the state
\(P\left( t \right)\)
Dynamic transfer matrix
\(\varphi \left( {t + m} \right)\)
Probability distribution of the data stream
\(\left[ {P\left( 1 \right)} \right]^{m}\)
\(m\) power of the dynamic transfer matrix
\(\theta_{ij} \left( {t + m} \right)\)
Element in row \(i\) and column \(j\)
\(x_{\left( i \right)} \left( 0 \right)\)
Data stream
\(\varphi_{i} \left( 0 \right)\)
State probability distribution
\(t_{{k_{1} }}\)
Unitized enhanced sample
\(E\left( {\nu_{i} } \right)\)
Dynamic separation loss function
\(f\left( * \right)\)
Feature extraction on the independent variable
\(l\left( {\nu_{i} } \right)\)
Simplified dynamic separation loss function
\(H\)
Entropy of the partial derivative distribution
\(L_{loss}\)
Total loss function
\(M\)
Amplified number of sparse samples

4 Experimental Results

Random volatility in the data stream manifests as variation in the data-generating distribution and is normally referred to as virtual drift: the conditional distribution of the target variable varies across cases while the overall data distribution does not change. Furthermore, the random volatility of the input samples is monitored without using any training dataset, which improves prediction accuracy. Because the volatility is estimated from the distribution of the data entering the stream, drifting data can be detected quickly without requiring real-time data. Random volatility is also used in the weighting strategy to limit the deterioration of accuracy. The experiments verify the effectiveness of the dynamic separation network from three aspects. Two public datasets, EURUSD and CIFAR-10, and a large-scale professional sensing dataset are used. Several existing approaches, namely SRAGL-AWCL, SAR-ATR, OCNN-DAE, and M-RNN, are applied to validate the effectiveness of the DSN approach.

4.1 Experimental Setup

The experiments are run on a server with Ubuntu 11.04, PyTorch 1.11.0, torchvision 0.12.0, torchaudio 0.11.0, and an NVIDIA GeForce RTX 3090 GPU.

4.2 Parameter Settings

The parameter settings used for the proposed and compared methods are described in Table 2.
Table 2
Parameter settings

Method     Parameter                             Values
DSN        \(\lambda\)                           10
           Learning rate                         0.001
           Number of fully connected layers      3
           Number of neurons                     [64, 128, 256]
           Activation function                   ReLU
           Output function                       Softmax function
           Substation samples                    10,000
           Charging samples                      20,000
SAR-ATR    Bandwidth                             80 MHz
           Activation function                   ReLU
           Gaussian interpolation (D)            12
           Operating frequency                   2.36 GHz
           Width of Gaussian \(\sigma_{G}\)      Constant
OCNN-DAE   First layer                           15,000 dimensions
           Learning rate                         0.001
           Number of neurons                     8000
           Number of iterations                  500
           Activation function                   Softmax
           DAE epochs                            4000
           \(\alpha\)                            0.05

4.3 Dataset Description

A total of eight different time series datasets, obtained from the UCR repository, are used in this study. The length of the data stream refers to the number of points in each individual instance of the data stream. The experimental datasets are divided into two public datasets and one large-scale professional dataset. Public dataset 1 consists of the stock trading portfolio data and the EURUSD foreign exchange trading data in the Bloomberg database, which contains nearly 100,000 trading samples as of December 23, 2021, 15:59. The period of each sample is one hour; the stock portfolio has 36 fixed attributes and the foreign exchange data has 21 attributes. Stock and transaction data are typical multi-dimensional data streams, but stocks are highly vulnerable to unexpected factors outside the system, whereas the EURUSD foreign exchange data is less affected by external factors and is relatively stable. The time series length of this dataset is 70.
Public dataset 2 is the CIFAR-10 dataset [23]. We use 50,000 images as the training set and 10,000 images as the testing set, and some CIFAR-100 samples, which do not share any classes with CIFAR-10, are used as sparse samples. LeNet [25] is used as the feature extractor, with batch normalization added. The time series length of this dataset is 1639.
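For reference, the snippet below sketches a LeNet-style feature extractor with batch normalization for 32x32 RGB inputs; the exact layer sizes follow the classical LeNet design and are an assumption, since the paper does not list them.

import torch.nn as nn

# LeNet-style convolutional feature extractor with batch normalization,
# sized for CIFAR-10 images (3 x 32 x 32); dimensions are illustrative.
lenet_bn = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5), nn.BatchNorm2d(6), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, kernel_size=5), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
)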
The professional dataset is the sensing dataset of the whole-network power monitoring system of China State Grid Jiangxi Province, involving multiple subsystems such as power production, power transmission, transformer distribution, and power supply. The sample size is 497,533, and the number of attributes varies from 31 to 429 according to the complexity of the subsystem. The Production subsystem has 182,500 samples with 1290 features, the Transmission subsystem 105,000 samples with 687 features, the Dispatch subsystem 7500 samples with 245 features, the Substation subsystem 10,000 samples with 163 features, and the Charging subsystem 20,000 samples with 31 features. The time series length of this dataset is 500.
The optimization method is SGD with a learning rate fixed at 0.001, and the neural network in DSN is a 3-layer fully connected network with 64, 128, and 256 neurons, respectively. ReLU is used as the activation function of the hidden layers, and the Softmax function is used as the output function.
For every dataset, the simulation is performed with 100 runs of 5000 timesteps under random initial conditions, and the prediction accuracy of the data stream is evaluated. The initial conditions are generated so that nodes start in different states. The model parameters are tuned by grid search. For classification, various parameters are employed in DSN; the batch size for the extraction process is fixed at 20. Stochastic gradient descent with the Adadelta update rule is used for training on the professional dataset. A minimal sketch of this configuration follows.
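The PyTorch sketch below reflects the configuration just described (three fully connected layers with 64, 128, and 256 neurons, ReLU activations, Softmax output, SGD with learning rate 0.001, batch size 20, and optionally the Adadelta update rule). The input dimension and number of classes are placeholders, not values from the paper.

import torch
import torch.nn as nn

input_dim, num_classes = 36, 2          # placeholder dimensions

# Three-layer fully connected backbone with Softmax output, as configured above.
backbone = nn.Sequential(
    nn.Linear(input_dim, 64), nn.ReLU(),
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, num_classes), nn.Softmax(dim=1),
)

optimizer = torch.optim.SGD(backbone.parameters(), lr=0.001)
# optimizer = torch.optim.Adadelta(backbone.parameters())   # professional dataset

x = torch.randn(20, input_dim)          # one batch of size 20
probs = backbone(x)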

4.4 Performance Analysis

This section tests the ability of DSN to predict sparse samples in data streams. Figure 5 depicts the comparative accuracy analysis of the existing and proposed methods. The existing SRAGL-AWCL, SAR-ATR, OCNN-DAE, and M-RNN methods attain prediction accuracies of 6%, 5.8%, 8.2%, and 7.7%, respectively, while the proposed DSN method achieves 8.8%. The results therefore show that, compared with the existing methods, the proposed method improves prediction accuracy and overall performance.
Table 3 compares the prediction accuracy of the proposed DSN method with that of the existing methods SRAGL-AWCL, SAR-ATR, OCNN-DAE, and M-RNN. The existing methods yield lower prediction accuracy, whereas the proposed method achieves the highest accuracy among the compared approaches.
Table 3
Comparative analysis table of prediction accuracy for the proposed method
Methods
Accuracy (%)
SRAGL-AWCL
6
SAR-ATR
5.8
OCNN-DAE
8.2
M-RNN
7.7
Proposed DSN
8.8
The existing methods suffer from long processing times, exhibit volatility issues, and are complex to train; they are also affected by noisy data, which misleads the samples and causes sudden variations in the input data stream. Moreover, these state-of-the-art methods do not scale to large datasets, so complicated data streams can cause errors, and their data augmentation is not efficient for time series data and increases the time complexity. In addition, the random fluctuations occurring in industrial systems are not stable and can change the variance. The proposed DSN method, in contrast, is suitable for processing complex samples: the VDM monitors the variance of the data stream and adjusts the separation strength between samples, so the fluctuations occurring in the samples are compensated. This monitoring of the samples maximizes the prediction accuracy and improves the performance.
In the testing phase, rare samples and normal samples are randomly selected from the test set, and the ratio of rare to normal samples after extraction is kept at 1:1000. The results are detailed in Table 4. As shown in Table 4, DSN performs well in prediction accuracy on each of the datasets. On the professional datasets, the classification accuracy of DSN in the multi-attribute (high-dimensional) setting is significantly higher than that of the other baseline algorithms; thus, DSN outperforms the other existing approaches with a better prediction rate of 0.11%. The performance of the proposed Dynamic Separation Network (DSN) on the different datasets is shown in Fig. 6.
Table 4
Dataset validation with existing and proposed methods

Data set      M-RNN   OCNN-DAE  BIRCH   SRAGL-AWCL  SAR-ATR  LSTM    FCDD    DSN
Stocks        0.652   0.573     0.769   0.793       0.687    0.738   0.809   0.834
EURUSD        0.689   0.648     0.781   0.843       0.714    0.751   0.812   0.861
Production    0.552   0.598     0.832   0.672       0.541    0.630   0.761   0.956
Transmission  0.627   0.653     0.845   0.836       0.550    0.641   0.862   0.961
Dispatch      0.705   0.725     0.893   0.878       0.689    0.733   0.792   0.950
Substation    0.769   0.740     0.927   0.963       0.702    0.885   0.691   0.964
Charging      0.918   0.892     0.948   0.942       0.711    0.904   0.918   0.949
CIFAR-10      0.734   0.823     0.576   0.829       0.847    0.815   0.950   0.961

4.4.1 Convergence

This section tests the convergence of DSN. One attribute of randomly selected samples serves as the label data stream. When the label data stream enters its 85% confidence interval, the state is set to the "normal" label, and the data streams of the other associated attributes at that moment are combined into a sample corresponding to the "normal" label; states that do not enter the interval are set to the "sparse" label. The training set and test set are divided 8:2. The label of each training sample is the state (sparse or normal) of the data stream at the calibration moment, and the sample data are the attribute values of all data streams at the previous moment. The sparse samples in the special state are amplified so that their number matches the number of normal samples. A hypothetical sketch of this labeling rule is given below.
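The sketch below illustrates one way such a labeling rule could be implemented, assuming a Gaussian-style 85% confidence interval around the mean of the label attribute; the interval construction, the synthetic data, and the 8:2 splitting code are assumptions rather than the paper's exact procedure.

import numpy as np

def label_stream(label_attr):
    # Values inside the (approximately) two-sided 85% confidence interval of
    # the label attribute are tagged "normal", values outside it "sparse".
    mu, sigma = label_attr.mean(), label_attr.std()
    z = 1.44                                   # approximate 85% two-sided z-score
    inside = np.abs(label_attr - mu) <= z * sigma
    return np.where(inside, "normal", "sparse")

rng = np.random.default_rng(0)
stream = rng.normal(size=1000)                 # synthetic label data stream
labels = label_stream(stream)

split = int(0.8 * len(stream))                 # 8:2 train/test division
train_x, test_x = stream[:split], stream[split:]
train_y, test_y = labels[:split], labels[split:]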
It can be seen from Fig. 6 that DSN shows good convergence, and the loss continues to decrease as the number of iterations increases. Without the VDM, the highly random fluctuations of the data stream cause instability in the convergence of the loss. Adding the VDM module significantly suppresses the volatility of the loss during training, indicating that the VDM stabilizes training on the data stream.
The results of the convergence loss analysis are shown in Fig. 7, which compares the proposed DSN with the existing methods SRAGL-AWCL, SAR-ATR, OCNN-DAE, and M-RNN in terms of loss over different numbers of iterations. The proposed method achieves the lowest convergence loss among the compared methods, which contributes to its accuracy and efficiency.
Figure 8 depicts the ablation results for prediction accuracy on the production, transmission, dispatch, substation, and charging subsystems when TAM or VDM is restricted. Restricting either module reduces the prediction accuracy of DSN.
Figure 9 shows the time complexity of the proposed and existing methods. The existing SRAGL-AWCL, SAR-ATR, OCNN-DAE, and M-RNN methods are evaluated on the same data streams, and the results show that their computation times are higher, whereas the proposed DSN method reduces the time complexity and thus improves performance.

4.4.2 Ablation Experiment

This section describes experiments that test the effect of removing the Temporal Augmentation Module (TAM) or the Variance Detection Module (VDM) from DSN. When the TAM is disabled, no temporal amplification is performed on the sparse samples; when the VDM is disabled, the variance factor is set to a constant value. We examine how the prediction accuracy of DSN changes when these modules are disabled. The experimental results on the power dataset are shown in Table 5.
Table 5
Results of ablation experiments

Subsystem     TAM− VDM+   TAM+ VDM−   TAM− VDM−   TAM+ VDM+
Production    0.674       0.803       0.613       0.956
Transmission  0.643       0.792       0.626       0.961
Dispatch      0.685       0.785       0.580       0.950
Substation    0.693       0.700       0.621       0.964
Charging      0.662       0.829       0.653       0.947
In Table 5, the minus sign indicates that the corresponding module is removed and the plus sign indicates that it is retained. After removing both TAM and VDM, the prediction accuracy of the model drops sharply from about 0.95 to around 0.6, which shows that TAM and VDM are crucial to the performance of the model. Removing only TAM or only VDM also significantly reduces the prediction accuracy of DSN, indicating that both the time-series amplification of samples and the dynamic separation loss function are key to the data stream prediction performance of DSN.

4.5 Statistical Analysis

The statistical results obtained from the experiments are analyzed with the Friedman test to investigate the hypothesis of the proposed method. The DSN approach leads the ranking with a large score difference over the other existing approaches SRAGL-AWCL, SAR-ATR, OCNN-DAE, and M-RNN. Since the p value (< 0.001) is less than the significance level (0.005), the null hypothesis is rejected. The results of the Friedman test are depicted in Table 6, and an illustrative sketch of such a test follows the table.
Table 6
Friedman test ranking
Methods
Value
SRAGL-AWCL
3.541
SAR-ATR
4.423
OCNN-DAE
5.367
M-RNN
6.160
Proposed DSN
1.056
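As an illustration of how such a test can be reproduced, the snippet below runs a Friedman test over the per-dataset accuracies from Table 4 for five of the compared methods using SciPy; the paper does not state exactly which measurements entered its test, so this is only a sketch.

from scipy.stats import friedmanchisquare

# Per-dataset accuracies taken from Table 4 (Stocks ... CIFAR-10).
sragl_awcl = [0.793, 0.843, 0.672, 0.836, 0.878, 0.963, 0.942, 0.829]
sar_atr    = [0.687, 0.714, 0.541, 0.550, 0.689, 0.702, 0.711, 0.847]
ocnn_dae   = [0.573, 0.648, 0.598, 0.653, 0.725, 0.740, 0.892, 0.823]
m_rnn      = [0.652, 0.689, 0.552, 0.627, 0.705, 0.769, 0.918, 0.734]
dsn        = [0.834, 0.861, 0.956, 0.961, 0.950, 0.964, 0.949, 0.961]

stat, p_value = friedmanchisquare(sragl_awcl, sar_atr, ocnn_dae, m_rnn, dsn)
print(f"Friedman statistic = {stat:.3f}, p = {p_value:.4g}")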

5 Conclusion

Data stream approaches have evolved largely because mass data cannot be collected and sorted in its entirety. Data streams with multiple time series are difficult to mine in applications such as medicine, finance, and environmental monitoring, and data stream prediction has become a significant task in many industries that must make real-time decisions based on incoming data. Neural networks are a well-suited technique for streaming visualization due to their incremental learning nature. Therefore, this paper proposes a novel dynamic separation neural network (DSN) for data stream analysis with sparse categories. The proposed method overcomes the limitations of existing methods and achieves better performance on various datasets. The DSN monitors the variance of the input data stream through the variance detection module (VDM), which dynamically adjusts the strength of the distinction between samples, while the temporal augmentation module (TAM) amplifies sparse samples along the time dimension to adapt to the random fluctuations in the input data. The ablation experiment confirms the importance of both TAM and VDM in achieving the high prediction accuracy of 8.8% reported for the DSN. These results demonstrate that the DSN approach adapts to changes in the data without the need for any explicit detection mechanisms, and the positive experimental results support the use of DSN in the streaming literature. However, data sparsity is still a challenging problem, and the generalization ability of the proposed method needs to be proven further. One possible future direction is to explore the unique attributes of the data and develop algorithms that can better handle sparse categories in various scenarios. Another is to improve the internal structure of the network to enhance its robustness and adaptability to different types of data streams. Moreover, the proposed DSN method can be extended to other applications, such as anomaly detection, prediction, and classification in various domains, and it can be combined with other advanced deep learning techniques such as reinforcement learning, transfer learning, and meta-learning to further improve its performance. In summary, this paper proposes a novel approach to address the challenges of sparse categories in data stream analysis, the experimental results demonstrate its effectiveness, and the proposed method opens up new avenues for future research with the potential to be applied in various fields.

Declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants.

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.
Informed consent was obtained from all individual participants included in the study.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://​creativecommons.​org/​licenses/​by/​4.​0/​.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix

Supplementary Information

Below is the link to the electronic supplementary material.
Literature
1.
Yang J, Cao J, Liu Y (2022) Deep learning-based destination prediction scheme by trajectory prediction framework. Secur Commun Netw 2022:1–8
2.
Li Y, Zheng L, Lops M, Wang X (2019) Interference removal for radar/communication co-existence: the random scattering case. IEEE Trans Wirel Commun 18(10):4831–4845
3.
Wen Y, Chen T, Wang J, Zhang W (2019) Pairwise multi-layer nets for learning distributed representation of multi-field categorical data. In: Proceedings of the 1st international workshop on deep learning practice for high-dimensional sparse data, pp 1–8
4.
Hu K, Wang J, Liu Y, Chen D (2019) Automatic feature engineering from very high dimensional event logs using deep neural networks. In: Proceedings of the 1st international workshop on deep learning practice for high-dimensional sparse data, pp 1–9
5.
Zeng Y, Cao H, Ouyang Q, Qian Q (2021) Multi-task learning and data augmentation for negative thermal expansion materials property prediction. Mater Today Commun 27:102314
6.
Iosifidis V, Ntoutsi E (2020) Sentiment analysis on big sparse data streams with limited labels. Knowl Inf Syst 62(4):1393–1432
7.
Kómár P, Kalinic M (2020) Denoising DNA encoded library screens with sparse learning. ACS Comb Sci 22(8):410–421
8.
Waheed H, Anas M, Hassan SU, Aljohani NR, Alelyani S, Edifor EE, Nawaz R (2021) Balancing sequential data to predict students at-risk using adversarial networks. Comput Electr Eng 93:107274
9.
Wang B, Niu H, Zeng J, Bai G, Lin S, Wang Y (2020) Latent representation learning model for multi-band images fusion via low-rank and sparse embedding. IEEE Trans Multimed 23:3137–3152
10.
Li T, Zuo R, Xiong Y, Peng Y (2021) Random-drop data augmentation of deep convolutional neural network for mineral prospectivity mapping. Nat Resour Res 30:27–38
11.
Tan J, Yang Z, Cheng Y, Ye J, Wang B, Dai Q (2021) SRAGL-AWCL: a two-step multi-view clustering via sparse representation and adaptive weighted cooperative learning. Pattern Recogn 117:107987
12.
Agarwal T, Sugavanam N, Ertin E (2020) Sparse signal models for data augmentation in deep learning ATR. In: 2020 IEEE radar conference (RadarConf20). IEEE, pp 1–6
13.
Zhang Q, Liu Z, Tan J (2019) Prediction of geological conditions for a tunnel boring machine using big operational data. Autom Constr 100:73–83
14.
Yoon J, Zame WR, van der Schaar M (2018) Estimating missing data in temporal data streams using multi-directional recurrent neural networks. IEEE Trans Biomed Eng 66(5):1477–1490
15.
Ullah A, Muhammad K, Haq IU, Baik SW (2019) Action recognition using optimized deep autoencoder and CNN for surveillance data streams of non-stationary environments. Future Gener Comput Syst 96:386–397
16.
Zhong Z, Zheng L, Kang G, Li S, Yang Y (2020) Random erasing data augmentation. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, no 07, pp 13001–13008
17.
Wen Q, He K, Sun L, Zhang Y, Ke M, Xu H (2021) RobustPeriod: robust time-frequency mining for multiple periodicity detection. In: Proceedings of the 2021 international conference on management of data, pp 2328–2337
18.
19.
Ting KM, Xu BC, Washio T, Zhou ZH (2020) Isolation distributional kernel: a new tool for kernel based anomaly detection. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 198–206
20.
Wang X, Qi GJ (2022) Contrastive learning with stronger augmentations. IEEE Trans Pattern Anal Mach Intell 45:5549–5560
21.
Lutfhi A (2022) The effect of layer batch normalization and droupout of CNN model performance on facial expression classification. JOIV Int J Inform Vis 6(22):481–488
22.
Xian X, Zhang C, Bonk S, Liu K (2021) Online monitoring of big data streams: a rank-based sampling algorithm by data augmentation. J Qual Technol 53(2):135–153
23.
Hilal A, Arai I, El-Tawab S (2021) DataLoc+: a data augmentation technique for machine learning in room-level indoor localization. In: 2021 IEEE wireless communications and networking conference (WCNC). IEEE, pp 1–7
24.
Zan X, Wang D, Xian X (2022) Spatial rank-based augmentation for nonparametric online monitoring and adaptive sampling of big data streams. Technometrics 65:1–14
25.
Nguyen H, Pham H, Nguyen S, Van Linh N, Than K (2022) Adaptive infinite dropout for noisy and sparse data streams. Mach Learn 111(8):3025–3060
26.
Coelho DN, Barreto GA (2022) A sparse online approach for streaming data classification via prototype-based kernel models. Neural Process Lett 54:1679–1706
27.
Azim E, Wang D, Fu Y (2023) Deep graph stream SVDD: anomaly detection in cyber-physical systems. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, Cham, pp 83–95
28.
29.
30.
Bousbaa Z, Sanchez-Medina J, Bencharef O (2023) Financial time series forecasting: a data stream mining-based system. Electronics 12(9):2039
31.
Shorten C, Khoshgoftaar TM, Furht B (2021) Text data augmentation for deep learning. J Big Data 8:1–34
32.
Masud M, Gao J, Khan L, Han J, Thuraisingham BM (2010) Classification and novel class detection in concept-drifting data streams under time constraints. IEEE Trans Knowl Data Eng 23(6):859–874
33.
Guan J, Meng M, Liang T, Liu J, Wu J (2022) Dual-level contrastive learning network for generalized zero-shot learning. Vis Comput 38(9–10):3087–3095
