Published in: The Journal of Supercomputing 17/2023

Open Access 06-06-2023

Temporal fusion transformer-based prediction in aquaponics

Authors: Ahmet Metin, Ahmet Kasif, Cagatay Catal



Abstract

Aquaponics offers a soilless farming ecosystem by merging modern hydroponics with aquaculture. Fish food is provided to the aquaculture unit, and the ammonia generated by the fish is converted by specialized bacteria into nitrate, an essential resource for the vegetation. Fluctuations in ammonia levels affect the generated nitrate levels and influence farm yields. Sensor-based autonomous control of aquaponics can offer a highly rewarding solution, enabling much more efficient ecosystems. In addition, manual control of the whole aquaponics operation is prone to human error. Artificial Intelligence-powered Internet of Things solutions can reduce human intervention to a certain extent, realizing more scalable environments to handle the food production problem. In this research, an attention-based Temporal Fusion Transformer deep learning model was proposed and validated to forecast nitrate levels in an aquaponics environment. An aquaponics dataset with temporal features and a large number of observations was employed for validation and extensive analysis. Experimental results demonstrate significant improvements of the proposed model over baseline models in terms of MAE, MSE, and Explained Variance metrics considering one-hour sequences. Utilizing the proposed solution can help enhance the automation of aquaponics environments.

1 Introduction

The increasing global population has led to an increase in food demand, putting a strain on traditional agriculture. This demand is further exacerbated by climate change, leading to challenges in water availability [1]. The adoption of chemical fertilizers and pesticides in traditional agriculture has helped increase crop yields but has also had negative impacts on the environment and human health. The excessive use of fertilizers and pesticides has led to soil degradation, water pollution, and health problems, leading to debates worldwide about the sustainability of agriculture. Organic agriculture proposes a healthier alternative to reduce the impact of pesticides but provides less food compared to traditional agriculture [2, 3]. In recent years, there has been a growing interest in sustainable farming practices that can address these challenges, such as the use of aquaponics.
Aquaponics is a relatively new farming method that has gained popularity due to its benefits over traditional agriculture. It eliminates the need for synthetic fertilizers and pesticides, making it an organic farming method. The aquaponics technique proposes a more complex yet highly rewarding alternative to conventional agriculture by combining hydroponics, fish farming, and bacteria. In aquaponics, fish waste provides nutrients to the plants, which in turn filter the water and return clean water to the fish. Fish waste produces ammonia, which is toxic to fish but can be converted into nitrate by beneficial bacteria. Figure 1 depicts a sample aquaponics ecosystem. Hydroponic farms need nutrient-rich water for the plants to grow; the ammonia in fish waste is collected through water pumps and turned into nitrate by beneficial bacteria.
The only input to the aquaponics ecosystem is fish food. Many existing aquaponics ecosystems require careful maintenance through human intervention, which can be time-consuming and labor-intensive. Keeping the pH within certain limits, feeding the fish on schedule, and maintaining the temperature [4] must all be checked periodically. This manual maintenance becomes even more challenging for larger aquaponics systems that require more frequent monitoring. With the developments in Internet of Things (IoT) technology, there is now an opportunity to reduce human intervention and improve food yields by implementing a smart environment capable of mass production.
To predict nitrate levels in an aquaponics environment accurately, highly accurate prediction models are needed. While several models are possible using traditional machine learning algorithms, their performance needs to be improved. Therefore, this study utilized recent machine learning and deep learning algorithms, including Long Short-Term Memory (LSTM), Encoder–Decoder LSTM, Attention LSTM, Extreme Learning Machine (ELM), and the Temporal Fusion Transformer (TFT). Since LSTM models are effective in capturing the temporal dependencies that are often present in time series data, they have been successfully used for several time series problems [5]. Recently, a new deep learning architecture named the Transformer has been developed, and several variations of it have been implemented [6]. For longer sequences with complex dependencies, transformer-based models provide relatively better performance than LSTM models. To the best of our knowledge, Temporal Fusion Transformers have not been applied to this problem before. Therefore, we aimed to utilize these algorithms in this research. Since the main objective is to achieve a highly accurate prediction model, state-of-the-art machine learning algorithms were applied and compared.
The contributions of the study are given as follows:
  • A novel deep learning model has been developed for forecasting nitrate levels in aquaponics. The model comprises a TFT-based solution.
  • The TFT network improves the hourly forecasting performance for the aquaponics environment over the previous works regarding Mean Absolute Error (MAE) and Explained Variance.
  • The high-performing forecasting accuracy provides opportunities for automated processes.
The paper is organized as follows: Sect. 2 reviews the related work. Section 3 explains the materials and methods, including the ELM, LSTM, Encoder–Decoder networks, the TFT technique, the dataset, and the evaluation metrics. Section 4 presents the experimental results and their discussion, and Sect. 5 concludes the paper.

2 Related work
In this section, previous studies and the current state of literature in the context of the maintenance of smart aquaponics systems based on Artificial Intelligence (AI) are discussed. Monitoring and maintaining self-sufficient smart aquaponics systems requires autonomous control through sensors. Arvind et al. developed a miniature smart aquaponics ecosystem through several IoT sensors and used the produced data to implement an autoML regressor [7]. The regressor is then utilized to create autonomous anomaly signals, which can be used to reduce the maintenance burden of the proposed ecosystem. Mehra et al. proposed an artificial neural network (ANN) to classify several anomalies, such as lack of nutrients and changes in levels of humidity or lighting, through sensor reports [8], though the system lacks reproducibility as the accuracy metrics are not provided. Hydroponics systems require a certain pH level to operate properly. One of the main factors affecting the pH level is the presence of heavy metals in the ecosystem. Dhal et al. proposed a real-time machine learning-based solution supported by a real-life application to monitor and detect anomalies in heavy metal levels [9]. The solution only analyzes calcium, sulfate, and phosphate and can be expanded to include other heavy metals such as iron, copper, and zinc. Another limitation of the work was that, while there was a high-dimensional feature space, the observation size was small, resulting in reduced prediction performance.
Advancements in modern cameras have improved industrial image quality over the last decade and enabled more powerful image processing techniques to be applied to the management of aquaponics. Handling these images through inference is still challenging, as these systems consist of several low-capacity IoT devices. An alternative requires server-based handling and fast communication technologies. With the adoption of 5G, improved data transfer speeds provide better opportunities to establish autonomous aquaponics systems. Kumar et al. provided an end-to-end aquaponics system to detect anomalies in the physical conditions of fish [10]. The fish tanks are periodically imaged, and the images are classified with a Bayesian classifier. The tanks are connected through 6LoWPAN, which provides the bandwidth for image transfer. Another advantage of vision systems is that the growth stage becomes monitorable; the study by Lauguico et al. showed that crop yield estimation can be assisted by Machine Learning (ML) based algorithms [11]. However, the analysis relies on a small number of observations and does not provide an autonomous handling and resolution strategy.
In recent years, studies have shown great opportunities for handling aquaculture, hydroponics, and aquaponics anomalies through time-series analysis and forecasting. Cardenas-Cartagena et al. proposed a solution based on Recurrent Neural Networks (RNNs) to forecast sudden changes in pH [12]. Thai-Nghe et al. conducted univariate time-series analysis to monitor water quality in real time [13]. The study showed that the LSTM algorithm can produce better results than baseline ML methods on univariate representations. Liu et al. implemented a water quality forecasting framework using a Bi-directional Stacked Simple Recurrent Unit (Bi-S-SRU) [14]. Compared with a vanilla RNN, the Bi-S-SRU framework shows improved forecasting accuracy on longer sequences while also providing good inference time. Both the fish and the plants need certain conditions for a healthy aquaponics environment; thus, nutrient-based analysis to detect input anomalies in the environment is an important problem. Dhal et al. provided the much needed research with proportional nutrient data analysis [15]. The researchers deployed an IoT-based aquaponics laboratory and collected inputs with a high-dimensional feature space. Still, the research lacked a proper number of observations and had to be supported by data aggregation techniques to enable AI-based assistance. The drawback of small datasets is investigated in other studies using baseline ML algorithms [16, 17]. While these studies offered a benchmarking viewpoint on small datasets, the general applicability of the results remained low.
Table 1 presents the relevant studies and the current research. It is clearly seen that most papers used proprietary datasets instead of public datasets.
Table 1
The relevant studies

Study | Dataset | Method | Metrics | Gaps
Proposed approach | Sensor-based aquaponics | TFT | MAE, RMSE | Multi-variate time-series analysis with multi-head attention-based TFT architecture
Arvind et al. [7] | PASCAL VOC 2017 and 2012 | AutoML | ROC, F1 | Non-temporal analysis
Mehra et al. [8] | Proprietary | ANN, Bayesian network | Accuracy | Small number of observations; non-temporal analysis
Lauguico et al. [11] | Proprietary | Logistic regression, KNN, L-SVM | F1 | Small number of observations; non-temporal analysis
Liu et al. [14] | Water quality dataset | Bi-S-SRU | MSE | Non-temporal analysis
Cardenas et al. [12] | Proprietary | MLP, LSTM, GRU | MSE | Method is susceptible to small-scale noise
Dhal et al. [15] | Proprietary | XGBoost, extra trees classifier | F-score | Small number of observations; non-temporal analysis
Thai-Nghe et al. [13] | Water quality—Tomales Bay | LSTM, SVM | RMSE | Only univariate analysis

3 Materials and methods

In this section, the proposed TFT model as well as the baseline techniques leading to its development (LSTMs, encoder–decoder networks, and the attention concept) are briefly presented. The proposed model is also compared with the Extreme Learning Machine (ELM) as another baseline method. ELM is a simple and fast-converging algorithm that can excel at representing complex datasets; still, the need to employ a high number of neurons makes it slower at inference time. Li et al. demonstrate the power of ELM against LSTM in the estimation of photovoltaic power, where the ELM algorithm is both more accurate and computationally more efficient [18]. LSTM is appropriate for modeling sequences with long-term dependencies, whereas encoder–decoder networks are appropriate for modeling complicated sequences with variable-length input and output. Encoder–decoder networks function better when attention mechanisms are used, and the Temporal Fusion Transformer is a neural network architecture that was specifically designed for time series forecasting tasks. It combines the strengths of LSTMs, attention mechanisms, and transformers to produce accurate and robust predictions for multivariate time series data. The inefficiency of simpler models such as LSTM or GRU in comparison with more sophisticated models like the TFT has also been demonstrated by research on energy consumption forecasting [19].
The quality of the data and how it is represented largely determine how well time series analysis turns out. The characteristics of the dataset and the procedures used to prepare it for time-series forecasting models are described in the dataset and data preparation sections. The section ends with an explanation of the evaluation metrics applied in the study. The entire workflow is illustrated in Fig. 2.
Aquaponics environments, when integrated with several sensors, produce a series of time-stamped measurements such as levels of nitrite, ammonia, and pH. A fusion-based transformer deep learning model to perform a precise forecast of nitrate levels for the upcoming time window using historical data is proposed. From a machine learning perspective, the problem can be classified as a regression problem. Classic machine learning techniques are inadequate for modeling the complexity of the problem due to the high input dimension; instead, more complex models are required.

3.1 Dataset

This study utilized the sensor-based aquaponics dataset proposed by Ogbuokiri et al. [20]. This dataset was selected because it is a recent dataset, has high-quality data points with reliable sensor measurements, includes several relevant parameters, is easily accessible, and has a suitable frequency of data collection. The prediction of nitrate levels in aquaponic systems was performed for the first time using a time-stamped dataset with a high-dimensional feature space. The dataset contains 6 parameters of water quality sensors (i.e., temperature, turbidity, dissolved oxygen, pH, ammonia, nitrate), time, and physical conditions of fish (i.e., length, width, population). The dataset’s default data collection interval is five seconds and contains sensor data for nine freshwater catfish ponds. Each sensor was initially calibrated in accordance with industry standards before being tested [20]. The trustworthiness of the data and the lack of a prior time-stamped investigation led to the selection of the proposed dataset. The basic statistical analysis of dataset parameters is depicted in Table 2.
Table 2
Dataset statistics for sensor-based attributes

Attribute | Definition | Average | SD
Time | Timestamp | – | –
Temperature | Temperature sensor (DS18B20) | 24.565268 | 0.899205
Turbidity | DF Robot turbidity sensor | 69.490202 | 43.233901
Dissolved oxygen | DF Robot dissolved oxygen sensor | 10.583218 | 10.673741
pH | DF Robot pH sensor V2.2 | 6.033098 | 2.949616
Ammonia | MQ-137 ammonia sensor | 229841283.6471 | 9104453144.3300
Nitrate | MQ-135 nitrate sensor | 699.520674 | 550.081504

3.2 Data preparation

Normalization is required when the features have drastically different value ranges. The aquaponics dataset shows a high variational difference between parameters, which leads to some features being dominant, as shown in Fig. 3. The features are therefore normalized to similar scales because they should be equally important for estimating the nitrate level in an aquaponic system. Without normalization, training could diverge and produce NaNs if the gradient update is too large. By defining an individual effective learning rate for each feature, optimizers such as Adagrad and Adam provide some protection against this problem. However, ELM is not a gradient-based algorithm, so this optimizer functionality cannot be utilized. Instead, high-variance input data causes input saturation in the ELM model, where the activation function saturates at spiked values, limiting the model's capacity to learn the underlying patterns in the data. This can be addressed by applying normalization to the dataset. According to studies [21, 22], min-max normalization performs better than its counterparts in time-series-based analysis. Each input is normalized to a value between 0 and 1, which minimizes the impact of noise and ensures that neural networks update parameters effectively, accelerating training. Therefore, min-max normalization, given in Eq. (1), is utilized in the study to normalize the features.
$$\begin{aligned} \widetilde{X}=\dfrac{x-\text {min}}{\text {max}-\text {min}} \end{aligned}$$
(1)
Here, x denotes the input value; min and max are the lowest and highest values present in the series, and \(\widetilde{X}\) indicates the normalized value. The dataset also contained missing values at a rate of around 0.001%; these rows have been removed from the dataset. The original dataset reports at an interval of 20 s, which leads to a noisy structure. Thus, the interval length has been set to 60 s for the study, effectively merging every three sensor reports with the arithmetic mean. The resulting series comprises 421140 data points in chronological order and is split into three subsets: 90% training data, 8% validation data, and 2% test data.
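The preparation steps above can be summarized in a short pandas sketch. This is a minimal illustration rather than the authors' exact pipeline: the file name and column names (e.g., "timestamp") are hypothetical placeholders for the dataset of [20], and fitting the min-max statistics on the training split only is an assumption made here to avoid information leakage.

```python
import pandas as pd

# Hypothetical file and column names for the sensor-based aquaponics dataset [20]
df = pd.read_csv("aquaponics_pond.csv", parse_dates=["timestamp"])
df = df.dropna()                                   # drop the ~0.001% missing rows

# Merge raw sensor reports into 60-second intervals with the arithmetic mean
df = df.set_index("timestamp").resample("60s").mean().dropna()

# Chronological 90/8/2 split, then min-max normalization (Eq. 1)
n = len(df)
train, val, test = df[:int(0.9 * n)], df[int(0.9 * n):int(0.98 * n)], df[int(0.98 * n):]
col_min, col_max = train.min(), train.max()
train_n = (train - col_min) / (col_max - col_min)
val_n = (val - col_min) / (col_max - col_min)
test_n = (test - col_min) / (col_max - col_min)
```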

3.3 LSTM

RNNs differ from ELM in that they are better at handling sequence data, and they are trained with backpropagation through time. The gradient's function is to update the recurrent neural network's weight values. If the weights are too small, the gradients vanish and the hidden layer's ability to learn diminishes; if the weights are too large, the gradients explode. RNNs also include feedback connections in the hidden layer units of their architecture. This capability allows them to process temporal information and learn sequences. The hidden layer functions as a memory and has the capacity to store sequential data.
The LSTM method was developed as an improved version of RNNs, where the vanishing gradient problem is addressed [23]. The term “gated” cell refers to this type of cell because it allows the user to choose whether to retain or disregard stored information. LSTM comprises three gates, namely an input gate, a forget gate, and an output gate (Fig. 4).
The forget gate selectively decides what information from earlier time steps should be retained, as given in Eq. 2.
$$\begin{aligned} f(t)= \sigma (x(t) U_{f} + h(t -1) W_{f}) \end{aligned}$$
(2)
In order to determine which information should be retained in the LSTM memory, this control gate employs a sigmoid function. The values of \(h(t-1)\) and x(t) are largely responsible for the selection. f(t) produces output values between 0 and 1. The values close to 0 denote the total loss of the previously acquired information, while the values close to 1 preserve the entire information.
The input gate determines which information from the most recent time step should be added. According to Eqs. 3, 4 and 5, this gate is made up of a sigmoid layer and a hyperbolic tangent (tanh) layer.
$$\begin{aligned} i_{1} = \sigma (x(t)U_{i} + h(t-1)W_{i}) \end{aligned}$$
(3)
$$\begin{aligned} i_{2} = \text {tanh}(x(t)U_{j} + h(t-1)W_{g}) \end{aligned}$$
(4)
$$\begin{aligned} i(t) = i_{1}(t) * i_{2}(t) \end{aligned}$$
(5)
A vector of new candidate values that will be added to the LSTM memory is represented by \(i_{2}\), and \(i_{1}\) represents whether the value needs to be modified or not. Following that, element-wise multiplication is applied to the tanh and sigmoid outputs. The cell state, given in Eq. 6, carries information through the entire sequence and serves as a representation of the network's memory.
$$\begin{aligned} C(t) = f(t) * C(t-1) + i(t) \end{aligned}$$
(6)
First, the forget gate's output is multiplied element-wise by the cell state from the previous time step. This makes it possible to discard values in the cell state when they are multiplied by values close to 0. Next, the input gate's output is added element-wise to the cell state. The result is the new cell state. The output gate, given in Eq. 7, decides the value of the output at the current time step.
$$\begin{aligned} o(t) = \sigma (x(t)U_{o} + h(t-1)W_{o}) \end{aligned}$$
(7)
$$\begin{aligned} h(t) = \text {tanh}(C_{t}) * o(t) \end{aligned}$$
(8)
This gate first determines which side of the LSTM memory contributes to the output using a sigmoid layer. The information that the hidden state should contain is ultimately determined by multiplying the tanh output by the sigmoid output in Eq. 8. The new hidden state is the output.
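As a concrete illustration of this baseline, the following is a minimal Keras sketch of an LSTM forecaster; the layer width, dropout rate, and window lengths are illustrative assumptions rather than the tuned values used in the experiments.

```python
from tensorflow import keras
from tensorflow.keras import layers

LOOKBACK, N_FEATURES, HORIZON = 60, 6, 15   # assumed input window, sensor count, forecast steps

model = keras.Sequential([
    layers.LSTM(64, input_shape=(LOOKBACK, N_FEATURES)),  # gates of Eqs. 2-8 handled internally
    layers.Dropout(0.2),
    layers.Dense(HORIZON),                                 # multi-step nitrate forecast
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# x_train: (samples, LOOKBACK, N_FEATURES), y_train: (samples, HORIZON)
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=50, batch_size=64)
```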

3.4 Encoder–decoder networks

The standard Encoder–Decoder model [24] is generally incapable of accurately handling long input sequences. The encoder processes the input sequence and compresses the information into a context vector of fixed length; only the last hidden state of the encoder RNN is used as the context vector for the decoder. This representation is expected to be a good summary of the full input sequence. In practice, however, the beginning of the sequence is largely forgotten by the time the encoder has processed the entire input. In the Encoder–Decoder model, an encoder reads the input sequence, a sequence of vectors \(x = (x_{1}\),...,\(x_{T})\), into a vector c. At each time step t, the hidden state \(h_{t}\) of the RNN is updated using Eqs. 9 and 10, where \(f\) and \(q\) are nonlinear activation functions.
$$\begin{aligned} h_{t} = f(x_{t},h_{t-1}) \end{aligned}$$
(9)
$$\begin{aligned} c = q(\{h_{1},\ldots ,h_{T}\}) \end{aligned}$$
(10)
The suggested model’s decoder is conditioned to produce the output sequence by anticipating the subsequent symbol \(y_{t}\) given the hidden state \(h_{t}\). Additionally, \(y_{t}\) and \(h_{t}\) are dependent on \(y_{t-1}\) and the input sequence’s summary \(c\).
$$\begin{aligned} h_{t} = f(h_{t-1}, y_{t-1},c) \end{aligned}$$
(11)
Consequently, the decoder’s hidden state at time \(t\) is computed.
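A minimal Keras sketch of this kind of encoder–decoder forecaster is given below. It is illustrative only, with assumed layer sizes and window lengths; the fixed-length context (the final encoder state) is repeated and fed to every decoder step.

```python
from tensorflow import keras
from tensorflow.keras import layers

LOOKBACK, N_FEATURES, HORIZON = 60, 6, 15        # assumed window sizes

inputs = keras.Input(shape=(LOOKBACK, N_FEATURES))
_, state_h, state_c = layers.LSTM(64, return_state=True)(inputs)      # encoder -> context vector c
decoder_in = layers.RepeatVector(HORIZON)(state_h)                    # repeat c for each output step
decoder_seq = layers.LSTM(64, return_sequences=True)(
    decoder_in, initial_state=[state_h, state_c])                     # decoder conditioned on c (Eq. 11)
outputs = layers.TimeDistributed(layers.Dense(1))(decoder_seq)        # one forecast value per step

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```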

3.5 Attention mechanism

Attention was initially introduced to address this main shortcoming of the Encoder–Decoder model and achieved great success. The attention mechanism was presented by Bahdanau et al. [25] and has had a profound influence on deep learning by letting the model focus on the most relevant parts of the input rather than a single fixed summary. The central idea of this layer is as follows: each time the model predicts an output word, it only uses the parts of the input where the most relevant information is concentrated, rather than the whole sequence. It only pays attention to the most relevant inputs, and the computation proceeds as follows:
The input sequence is mapped by an encoder to a sequence of annotations \( (h_{1}\), ...,\( h_{T}) \), which determine the context vector \(c_{i}\). Although each annotation \(h_{i}\) contains information about the whole input sequence, it focuses on a specific portion of the input. The context vector is subsequently calculated as a weighted sum of these annotations. First, an alignment score is computed:
$$\begin{aligned} e_{t,i} = a(s_{t-1},h_{i}), \end{aligned}$$
(12)
This is known as the alignment model. Based on how closely the input at position \(i\) and the output at position \(t\) match, the alignment model gives a score \(e_{t, i}\). The weights \(a_{t, i}\) are computed in Eq. 13 by applying a softmax operation to the previously computed alignment scores.
$$\begin{aligned} a_{t,i} = \dfrac{\text {exp} (e_{t,i})}{\sum _{j=1}^{T}\text {exp}(e_{t,j})} \end{aligned}$$
(13)
At each time step, the decoder receives a distinct context vector \(c_{t}\). It is calculated in Eq. 14 as the weighted sum of all the hidden states of the encoder.
$$\begin{aligned} c_{t} = \sum _{i=1}^{T}a_{t,i}h_{i} \end{aligned}$$
(14)
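The three equations above can be traced numerically with a few lines of NumPy. The sketch below uses an additive (Bahdanau-style) alignment model; the parameter matrices W_a and U_a, the vector v_a, and all sizes are illustrative assumptions rather than quantities defined in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                          # encoder steps and hidden size (assumed)
H = rng.normal(size=(T, d))          # encoder annotations h_1..h_T
s_prev = rng.normal(size=d)          # previous decoder state s_{t-1}
W_a, U_a, v_a = (rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                 rng.normal(size=d))

e = np.tanh(s_prev @ W_a + H @ U_a) @ v_a          # Eq. 12: alignment scores e_{t,i}
a = np.exp(e) / np.exp(e).sum()                    # Eq. 13: softmax weights a_{t,i}
c_t = (a[:, None] * H).sum(axis=0)                 # Eq. 14: context vector c_t

print(a.round(3), c_t.shape)                       # weights sum to 1, c_t has shape (d,)
```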

3.6 The temporal fusion transformer

The TFT provides a neural network architecture that combines the workings of a number of existing neural architectures, such as LSTM layers, encoder–decoders, and the attention heads used in transformers, as shown in Fig. 5 [26]. The transformer primarily consists of an encoder and a decoder, where the encoder part uses the time series data as input and the decoder part produces context-aware embeddings to predict future values. LSTM encoders and decoders summarize shorter patterns, whereas long-range relationships are left to the attention heads. The temporal multi-head attention block finds and prioritizes the most important long-range patterns that the time series may include. Each attention head can focus on a different temporal pattern.
The context vector is supplied to the Gate layer and then to the Add & Norm layer. The dropout layer is only used during training and helps prevent over-fitting of the network by randomly eliminating some weights at a rate set by the user. The gated layer regulates the bandwidth of information flow within a particular neuron, while self-attention gathers information from several different neurons. The Add & Norm layer first combines the weights from the gated layer with the residual connection weights. The dependency on batches is eliminated by then normalizing each input to a particular layer across all features; because of this, layer normalization is well suited for sequence models such as transformers and recurrent neural networks.
In terms of processing and predicting time series data, TFT models have proven to be more sophisticated than conventional LSTM models. By taking advantage of self-attention, this model offers a novel multi-head attention mechanism that, when analyzed, sheds further light on feature significance. Therefore, in contrast to other deep neural networks, the model is no longer regarded as a black box. The following is a list of the TFT's primary components:

3.6.1 Gated residual network (GRN)

GRNs are used to eliminate unnecessary and unimportant inputs. In order to avoid over-fitting, nodes can be dropped arbitrarily. A more sophisticated model does not always produce greater prediction performance for machine learning models. The ELU (Exponential Linear Unit) and GLU (Gated Linear Unit) activation functions assist the network in determining which input transformations are straightforward and which require more complex modeling. The output is passed through standard layer normalization. Additionally, the GRN has a residual connection, which enables the network to learn, if required, to ignore the input. The GRN has two different inputs: an optional context vector \(c\) and a primary input p, as depicted in Eqs. 15-17:
$$\begin{aligned} \text {GRN}_w(p,c) = \text {LayerNorm}(p+\text {GLU}_w(\eta _{1})) \end{aligned}$$
(15)
$$\begin{aligned} \eta _{1} = W_{1,w}\eta _{2} + b_{1,w} \end{aligned}$$
(16)
$$\begin{aligned} \eta _{2} = \text {ELU}(W_{2,w}p + W_{3,w}c + b_{2,w}) \end{aligned}$$
(17)
Where ELU is the activation function, \(\eta _{1} \in R^{d_{\text {model}}}, \eta _{2} \in R^{d_{\text {model}}} \) are intermediate layers, LayerNorm is standard layer normalization and the index \(w\) indicates weight sharing. Following is a description of the GLU:
$$\begin{aligned} \text {GLU}_{w}(\gamma ) = \sigma (W_{4,w}\gamma + b_{4,w}) \bigodot (W_{5,w}\gamma + b_{5,w}) \end{aligned}$$
(18)
If the input is \(\gamma \), the sigmoid activation function is represented by \(\sigma \). In addition, \(w \) and \(b\) represent weights and biases, respectively. The element-wise Hadamard product is \( \bigodot \). The model’s structure can be managed by GRN through the GLU, and extra layers can be disregarded. Because nonlinear contributions may be suppressed by having all the GLU’s outputs close to zero, this layer may be completely omitted if necessary (Fig. 6).
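A compact PyTorch sketch of Eqs. 15-18 is given below. It is a simplified illustration, not the library implementation: the primary input and the optional context are assumed to share the model dimension, and the variable-size handling of the original TFT is omitted.

```python
import torch
from torch import nn

class GatedResidualNetwork(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.w2 = nn.Linear(d_model, d_model)               # W_2, b_2
        self.w3 = nn.Linear(d_model, d_model, bias=False)   # W_3 (context projection)
        self.w1 = nn.Linear(d_model, d_model)               # W_1, b_1
        self.glu = nn.Linear(d_model, 2 * d_model)          # W_4/W_5 and b_4/b_5 packed together
        self.norm = nn.LayerNorm(d_model)

    def forward(self, p, c=None):
        eta2 = self.w2(p) + (self.w3(c) if c is not None else 0)
        eta2 = nn.functional.elu(eta2)                      # Eq. 17
        eta1 = self.w1(eta2)                                # Eq. 16
        gate, lin = self.glu(eta1).chunk(2, dim=-1)
        glu_out = torch.sigmoid(gate) * lin                 # Eq. 18: GLU
        return self.norm(p + glu_out)                       # Eq. 15: residual + LayerNorm

x = torch.randn(32, 16)                                     # a batch of primary inputs
print(GatedResidualNetwork(16)(x).shape)                    # torch.Size([32, 16])
```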

3.6.2 Variable selection network (VSN)

TFT’s variable selection networks are able to decide which input variables are suitable for each time step. Additionally, to enhance forecast accuracy, this module can remove the impact of irrelevant variables. TFT uses three instances of the Variable Selection Network (VSN) because there are three different input modalities. As a result, each instance has a distinct weight. Categorical variables are represented with entity embeddings, and continuous variables are represented with linear transforms. GRN is managed internally by the VSN for filtering. Following is a description of the VSN:
$$\begin{aligned} v_{x_{t}} = \text {softmax}(\text {GRN}_{v_{x}} (\Xi _{t},c_{s})) \end{aligned}$$
(19)
$$\begin{aligned} \tilde{\xi }_{t} = \sum _{i=1}^{m_{x}} v_{x_{t}}^{(i)} \tilde{\xi }_{t}^{(i)} \end{aligned}$$
(20)
$$\begin{aligned} \tilde{\xi }_{t}^{(i)} = \text {GRN}_{\tilde{\xi }(i)} (\xi _{t}^{(i)}) \end{aligned}$$
(21)
The flattened vector of all inputs from the corresponding lookback period, called \(\Xi _{t}\), is fed through a GRN unit and a softmax function at time t to produce a normalized vector of weights, denoted by \(v_{x_{t}}\). The context vector, abbreviated as \(c_{s}\), comes from a static covariate encoder. \(\tilde{\xi }_{t}^{(i)}\) is the output of a gated residual network and is calculated by feeding \(\xi _{t}^{(i)}\) through its own GRN.

3.6.3 Interpretable multi-head attention

The self-attention mechanism is used in this step to assist the model in learning long-range dependencies across various time steps. Contrary to the standard implementation, the novel multi-head attention mechanism proposed by the TFT provides feature interpretability. To project the input into different representation subspaces, the original architecture uses separate Query, Key, and Value weight matrices in each head. This method's disadvantage is that there is no common ground between the weight matrices, making them impossible to interpret. The TFT's multi-head attention adds a shared value matrix so that the various heads share some weights, which can then be interpreted, for example, in terms of seasonality analysis. Given keys \( K \in \mathbb {R}^{N \times d_{attn}} \) and queries \( Q \in \mathbb {R}^{N \times d_{attn}} \), attention mechanisms generally scale values \( V \in \mathbb {R}^{N \times d_{v}} \) as follows:
$$\begin{aligned} \text {Attention}(Q, K, V ) = A(Q, K)V \end{aligned}$$
(22)
\( A() \) is a function that normalizes data. The scaled dot-product for attention values is typically given as follows:
$$\begin{aligned} A(Q, K) = \text {Softmax}\left(\dfrac{QK^T}{\sqrt{d_{\text {attn}}}}\right) \end{aligned}$$
(23)
To improve the model’s capacity for fitting, the TFT employs a multi-head attention structure. The mathematical interpretation of Multi-head attention is given as follows:
$$\begin{aligned} \text {MultiHead}(Q, K, V) = \left[ H_{1},\ldots ,H_{m_{H}}\right] W_{H} \end{aligned}$$
(24)
$$\begin{aligned} H_{h} = \text {Attention}(QW_{Q}^{(h)}, KW_{K}^{(h)}, VW_{V}^{(h)}) \end{aligned}$$
(25)
Attention weights alone would not be a good indicator of the significance of a particular feature because of the different values used in each head. Therefore, a multi-head attention technique to share values across different heads and use additive head aggregation is utilized. This approach particularly improves the multi-feature representative capability of the proposed model. The characteristics of an interpretable multi-head are described as follows:
$$\begin{aligned} \text {InterpretableMultiHead}(Q, K, V) = {\tilde{H}}W_{H} \end{aligned}$$
(26)
$$\begin{aligned} \tilde{H} = \tilde{A}(Q,K)VW_{V} \end{aligned}$$
(27)
$$\begin{aligned} \tilde{H} = \left\{ \dfrac{1}{m_{H}} \sum _{h=1}^{m_{H}} A ( QW_{Q}^{(h)}, KW_{K}^{(h)})\right\} VW_{V} \end{aligned}$$
(28)
$$\begin{aligned} \tilde{H} = \dfrac{1}{m_{H}} \sum _{h=1}^{m_{H}} \text {Attention}(QW_{Q}^{(h)}, KW_{K}^{(h)}, VW_{V}) \end{aligned}$$
(29)
The result of interpretable multi-head attention is thus very similar to that of a single attention layer, with the main distinction being the way the attention weights \( \tilde{A}(Q,K) \) are produced. While attending to a common set of input features V, each head can learn different temporal patterns \(A ( QW_{Q}^{(h)}, KW_{K}^{(h)})\), and the combined matrix \( \tilde{A}(Q,K) \) can be understood as a simple ensemble over the per-head attention weights. Compared with a single attention matrix \( A(Q, K) \), the combined \( \tilde{A}(Q,K) \) successfully enhances the representation capacity.
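The shared value projection and the averaging of head weights can be sketched directly in PyTorch. The module below is a minimal illustration of Eqs. 26-29, not the implementation used in the experiments; for simplicity each head keeps the full model dimension.

```python
import torch
from torch import nn

class InterpretableMultiHead(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.d_attn = d_model
        self.w_q = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_heads))
        self.w_k = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_heads))
        self.w_v = nn.Linear(d_model, d_model)       # value projection shared across heads
        self.w_h = nn.Linear(d_model, d_model)       # output projection W_H

    def forward(self, q, k, v):
        scale = self.d_attn ** 0.5
        attn = [torch.softmax(wq(q) @ wk(k).transpose(-2, -1) / scale, dim=-1)
                for wq, wk in zip(self.w_q, self.w_k)]   # per-head A(QW_Q^h, KW_K^h)
        a_tilde = torch.stack(attn).mean(dim=0)          # Eq. 28: average over heads
        return self.w_h(a_tilde @ self.w_v(v)), a_tilde  # Eqs. 26-27

x = torch.randn(2, 10, 16)                 # (batch, time, d_model)
out, weights = InterpretableMultiHead(16, 2)(x, x, x)
print(out.shape, weights.shape)            # (2, 10, 16) and (2, 10, 10)
```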

3.7 ELM

Backpropagation-based training relies on gradients. This structural property gives gradient-based neural networks a high computational capacity to model complex problems, but it also makes them prone to getting stuck in local optima. Given the nature of the problem analyzed here, a gradient-free alternative is therefore also included in the study for comparative purposes.
The Extreme Learning Machine (ELM) offers a rapid and powerful alternative to both machine learning and deep learning-based solutions [27]. ELM is a training approach for a single hidden layer feed-forward neural network (SLFN). The architecture employs three layers: an input layer, a hidden layer, and an output layer. The hidden layer biases and input weights of the Extreme Learning Machine are determined at random and frozen during training; the ELM only optimizes the output layer weights. A single training iteration and random hidden layer weights enable faster convergence to the global optimum.
Mathematically, ELM can be formulated according to the following equation:
$$\begin{aligned} \sum _{i=1}^{\hat{N}} \beta _{i}\,g(w_{i} \cdot x_{j}+b_{i}) = o_{j}, \quad j=1,\ldots ,N \end{aligned}$$
(30)
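A minimal NumPy sketch of ELM training according to Eq. 30 is shown below; the hidden layer size, the tanh activation, and the use of the pseudo-inverse for the output weights are illustrative choices, not the exact configuration tuned in this study.

```python
import numpy as np

def elm_fit(X, y, n_hidden=256, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))     # random input weights w_i (frozen)
    b = rng.normal(size=n_hidden)                   # random biases b_i (frozen)
    H = np.tanh(X @ W + b)                          # hidden activations g(w_i x_j + b_i)
    beta = np.linalg.pinv(H) @ y                    # output weights beta solved in closed form
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# X_train: (n_samples, n_features) normalized inputs, y_train: nitrate targets
# W, b, beta = elm_fit(X_train, y_train); y_hat = elm_predict(X_test, W, b, beta)
```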

3.8 Experimental environment

The development language for both pre-processing and network implementation was Python version 3.7. The proposed LSTM networks were implemented using the Keras framework with version 2.9.0, and the TFT network was developed with Pytorch Lightning version 1.8.0.post1 and Pytorch Forecasting version 0.10.1.
The hyperparameter optimization was conducted at the B.T.U. High-Performance Clustering Laboratory (HPCLAB). The model was trained on Nvidia 3090 GPUs with CUDA version 8. Eight GPUs were used in parallel to accelerate the overall training process.
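To make the setup concrete, the following is an illustrative sketch of how such a TFT can be assembled with PyTorch Forecasting and PyTorch Lightning, using the tuned values reported later in Table 3. The synthetic dataframe, its column names, the single pond group, and the batch size are assumptions for demonstration, not the authors' actual data pipeline.

```python
import numpy as np
import pandas as pd
import pytorch_lightning as pl
from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer
from pytorch_forecasting.metrics import QuantileLoss

# Tiny synthetic frame standing in for the prepared 60-s aquaponics data;
# real column names in the dataset [20] may differ.
n = 500
train_df = pd.DataFrame({
    "time_idx": np.arange(n),
    "pond_id": "pond_1",
    "nitrate": np.random.rand(n).astype("float32"),
    "ammonia": np.random.rand(n).astype("float32"),
    "ph": np.random.rand(n).astype("float32"),
})

training = TimeSeriesDataSet(
    train_df,
    time_idx="time_idx",
    target="nitrate",
    group_ids=["pond_id"],
    max_encoder_length=60,            # one hour of history at 60-s resolution
    max_prediction_length=60,         # one-hour forecast window
    time_varying_unknown_reals=["nitrate", "ammonia", "ph"],
)

tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.0912,             # tuned values from Table 3
    hidden_size=13,
    attention_head_size=2,
    dropout=0.36411,
    hidden_continuous_size=12,
    loss=QuantileLoss(),
)

trainer = pl.Trainer(max_epochs=27, gradient_clip_val=0.03279)
# trainer.fit(tft, train_dataloaders=training.to_dataloader(train=True, batch_size=64))
```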

3.9 Evaluation metrics

Three measures, namely Mean Absolute Error (MAE), Explained Variance Score, and Mean Squared Error (MSE), were used to evaluate the proposed model's predictive ability. The Mean Absolute Error is the average absolute difference between the predicted and actual values. The MSE assesses the average squared difference between the observed and predicted values; squaring removes any negative signs. These error measures are defined in Eqs. (31) and (32).
$$\begin{aligned} \text {MAE} = \dfrac{1}{N} \sum _{L=1}^{N} \mid y_{p}-y_{a} \mid \end{aligned}$$
(31)
$$\begin{aligned} \text {MSE} = \dfrac{1}{N} \sum _{L=1}^{N} (y_{p}-y_{a})^2 \end{aligned}$$
(32)
The Explained Variance Score measures the discrepancy between a model's predictions and the actual data; in other words, it is the portion of the total variance that is explained by the model rather than attributable to error variance. Scores close to 1.0 are highly desirable. The metric is defined in Eq. (33), where the variance of the prediction errors and the variance of the actual values are denoted by \( Var(y_{a}-y_{p} ) \) and \( Var(y_{a}) \), respectively.
$$\begin{aligned} \text {Explained}\, \text {Variance} (y_{a},y_{p})= 1 - \dfrac{\text {Var}(y_{a}-y_{p})}{\text {Var}(y_{a})} \end{aligned}$$
(33)
The \( R^2 \) score is defined in Eq. (34), where N is the total number of forecast values, \(y_{p}\) is the predicted value, \(y_{a}\) is the original actual value, and \(y_{\text {average}}\) is the average of the original values. Perfect forecasting results in a value of 1, whereas a value of 0 means that the performance is identical to that of a simple model that always forecasts the mean of the data. A negative \( R^2 \) value indicates minimal association between the predictions and the dataset.
$$\begin{aligned} R^2(y_{a},y_{p})= 1 - \dfrac{\sum _{L=1}^{N} {(y_{a}-y_{p})^2}}{\sum _{L=1}^{N}{(y_{a}-y_{\text {average}})^2}} \end{aligned}$$
(34)
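All four measures of Eqs. 31-34 are available directly in scikit-learn, as the short sketch below shows; the two arrays are illustrative values rather than results from the study.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             explained_variance_score, r2_score)

y_true = np.array([0.42, 0.45, 0.47, 0.50])   # illustrative actual nitrate values
y_pred = np.array([0.40, 0.46, 0.45, 0.52])   # illustrative predictions

print("MAE:", mean_absolute_error(y_true, y_pred))            # Eq. 31
print("MSE:", mean_squared_error(y_true, y_pred))             # Eq. 32
print("EV :", explained_variance_score(y_true, y_pred))       # Eq. 33
print("R2 :", r2_score(y_true, y_pred))                       # Eq. 34
```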

4 Experimental results

4.1 Hyperparameter optimization

The selection of hyperparameters has a significant impact on how well deep learning models perform; fine-tuning therefore becomes crucial during the training stage to produce a successful model. This work uses the Bayesian optimizer of the Keras Tuner framework [28] together with a random search method to optimize the hyperparameters of the LSTM models. The ELM hyperparameters are optimized using a manually crafted search. The hyperparameters of the proposed TFT network for forecasting aquaponics systems have been determined using the Optuna [29] optimizer within the PyTorch framework. The resulting combinations are shown in Table 3. Random combinations of all accessible hyperparameters defining the search spaces are produced, with the number of combinations depending on the maximum number of trials and the number of models to be trained for each test. A model is trained for each of these combinations, and the one that performs best is saved as the best model.
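For the TFT, PyTorch Forecasting ships an Optuna-based tuner that searches over ranges of the kind listed in Table 3. The call below is an illustrative sketch: the dataloaders are assumed to come from TimeSeriesDataSet objects such as the one sketched in Sect. 3.8, and the trial count is an arbitrary example value.

```python
from pytorch_forecasting.models.temporal_fusion_transformer.tuning import (
    optimize_hyperparameters,
)

# train_loader / val_loader are assumed to be built beforehand, e.g.
# training.to_dataloader(train=True, batch_size=64) and the validation analogue.
study = optimize_hyperparameters(
    train_loader,
    val_loader,
    model_path="tft_optuna",
    n_trials=50,                                # illustrative number of sampled configurations
    max_epochs=400,                             # maximum training epochs per trial
    learning_rate_range=(0.00001, 30.0),        # search ranges mirroring Table 3
    dropout_range=(0.1, 0.5),
    gradient_clip_val_range=(0.01, 1.0),
    attention_head_size_range=(1, 4),
    hidden_size_range=(8, 128),
    hidden_continuous_size_range=(8, 128),
)
print(study.best_trial.params)                  # best combination, cf. Table 3
```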
Table 3
Hyperparameter search space and best hyperparameters

Hyperparameter | Search space | Selected value
Learning rate | 0.00001 – 30.0 | 0.0912
Epochs | 10 – 400 | 27
Dropout rate | 0.1 – 0.5 | 0.36411
Gradient clip value | 0.01 – 1.0 | 0.03279
Number of attention heads | 1 – 4 | 2
Hidden size | 8 – 128 | 13
Hidden continuous size | 8 – 128 | 12

4.2 Discussion of results

Table 4 and Fig. 7 present the error rates of the proposed study for the different methods with respect to the metrics RMSE, MAE, Explained Variance, and \(R^{2}\). The proposed TFT algorithm demonstrates remarkable improvements over the baseline models in all metrics. Higher Explained Variance values indicate that more of the variance of the original data space is captured in the generated data space.
The performance improvement is also consistent across all time windows. The proposed TFT algorithm performs better than all baseline methods over every period; however, as the forecasting period grows, the forecasting performance of all models gradually deteriorates, which keeps the practical forecasting window of the proposed model at about one hour.
Table 4
Performance comparison of the proposed model with baseline methods in different time windows

Method | Metric | 15 | 30 | 45 | 60 | 90 | 120
LSTM | RMSE | 0.03559 | 0.03904 | 0.04524 | 0.04830 | 0.04928 | 0.05732
LSTM | MAE | 0.02341 | 0.02686 | 0.03246 | 0.03495 | 0.03404 | 0.04329
LSTM | Explained variance score | 0.76193 | 0.72133 | 0.63764 | 0.58409 | 0.51928 | 0.43771
LSTM | \(R^{2}\) score | 0.70776 | 0.64832 | 0.52807 | 0.46250 | 0.44105 | 0.24616
Attention | RMSE | 0.03441 | 0.04220 | 0.04275 | 0.04416 | 0.04679 | 0.05352
Attention | MAE | 0.02132 | 0.03049 | 0.02941 | 0.02951 | 0.03033 | 0.03864
Attention | Explained variance score | 0.73827 | 0.68339 | 0.64295 | 0.02951 | 0.49679 | 0.44146
Attention | \(R^{2}\) score | 0.72683 | 0.58925 | 0.57854 | 0.55075 | 0.49607 | 0.34273
Encoder–Decoder | RMSE | 0.03553 | 0.04136 | 0.04466 | 0.04523 | 0.05119 | 0.05615
Encoder–Decoder | MAE | 0.02300 | 0.02962 | 0.03132 | 0.02906 | 0.03508 | 0.04111
Encoder–Decoder | Explained variance score | 0.75628 | 0.71764 | 0.63622 | 0.55144 | 0.43935 | 0.39943
Encoder–Decoder | \(R^{2}\) score | 0.70885 | 0.60529 | 0.54000 | 0.52870 | 0.39686 | 0.27651
ELM | RMSE | 0.04487 | 0.04808 | 0.04850 | 0.05113 | 0.05379 | 0.05730
ELM | MAE | 0.02757 | 0.03167 | 0.03131 | 0.03449 | 0.03591 | 0.03961
ELM | Explained variance score | 0.53905 | 0.48314 | 0.45864 | 0.40324 | 0.35113 | 0.28430
ELM | \(R^{2}\) score | 0.53550 | 0.46666 | 0.45758 | 0.39770 | 0.33413 | 0.24656
TFT | RMSE | 0.02556 | 0.02658 | 0.03171 | 0.03222 | 0.03792 | 0.04039
TFT | MAE | 0.01665 | 0.01850 | 0.02233 | 0.02261 | 0.02687 | 0.02899
TFT | Explained variance score | 0.81492 | 0.80844 | 0.70583 | 0.67483 | 0.55179 | 0.46643
TFT | \(R^{2}\) score | 0.81491 | 0.79612 | 0.69783 | 0.67474 | 0.52035 | 0.44584
The proposed model also shows clear improvements in terms of metric scores over the previous works. Table 5 shows a comparison of the proposed study with similar studies.
Table 5
Comparison of the proposed model with other studies

Study | Target | Methods | Metrics | Scores
[12] | pH | RNN | MSE | 0.0032
[14] | Dissolved oxygen | Bi-S-SRU | MSE | 0.1058
Proposed model | Nitrate | TFT | MSE | 0.0322
In the aquaponics nitrate forecasting study, having a large amount of training data can improve accuracy and produce more effective outcomes. The training, validation, and test ratios must be adjusted very carefully in order to prevent over-fitting; otherwise, the likelihood of inaccurate forecasts is high. Figure 7 displays the prediction plots generated by each method on the hourly nitrate data over the entire test set.
The baseline models lack short-term representational strength. Figure 7 shows that they can pick up certain aspects of the long-term structure. Although the dataset's features have been normalized, the internal structure of the dataset poses a complex modeling challenge for the baseline models: because of the sudden jumps it contains, these models fail to capture the temporal representation, so their look-ahead output is skewed and vulnerable to temporal drift. Shifts were always present in the forecasts, even though the ELM and LSTM models in Fig. 7 were able to detect the noise. The remaining algorithms produce forecasts with smoother transitions while not capturing the noisy data, which keeps their overall results reasonably accurate. Looking at Fig. 7e, it is clear that the TFT's predicted and actual nitrate levels overlap and that there is no excess in the deviations or variances between them. Given the closeness of the values and the similarity of the directional breaks, it is clear why this forecast is successful.
The results of the proposed model on the different temporal test sets are shown in Fig. 8. Even though the forecasts do not exactly match the data, the model estimates the noisy signal with a smooth transition. The model is quite good at long-term forecasting, but because of the large noise in short-term projections it cannot capture them exactly; nevertheless, the short-term forecasts still describe the pattern sufficiently well.
The computational performance of the models can be examined through their parameter sizes or the total time spent on training [14]. In terms of both parameter size and total training time (Table 6), the proposed TFT model is larger and slower than the baseline models. The Encoder–Decoder layers operate with far fewer parameters while still providing the model with improved learning capacity. The proposed model needs more epochs to train properly and requires more time in total. Its prediction time is the longest of all the methods, yet it is still short enough for production settings. Considering its predictive performance, the proposed TFT model therefore remains suitable for the performance needs of real-world settings.
Table 6
Computational complexity of models

Method | Param. size | Training time (s) | Prediction time (s)
Proposed TFT model | 31.9k | 12828.23 | 11.97
Attention LSTM | 34.2k | 147.65 | 0.40
Encoder–decoder LSTM | 4.5k | 102.73 | 0.24
LSTM | 31k | 215.05 | 0.38
ELM | – | 0.05 | 0.05

5 Conclusions

Soil usage in agriculture poses a great limitation in matching the increasing demand for food. The soilless farming proposed by aquaponics creates an ecosystem where the only input is fish food and fertilizers are removed from the system. Autonomous handling of aquaponics with less human intervention provides increased efficiency, higher throughput, and lower maintenance costs.
The baseline solutions for simulating aquaponics environments include LSTM, Encoder–Decoder architectures, and attention-based methods. As shown in Table 5, the proposed model achieved an MSE of 0.0322 for predicting the nitrate level. The RMSE, MAE, and Explained Variance scores per method are shown in Table 4. The TFT has a more intricate architecture and better learning capabilities than the other baseline deep learning algorithms. In addition, the TFT is capable of taking into account a variety of factors in datasets, including static inputs, known inputs, and observed inputs. In contrast, traditional deep learning architectures may overemphasize factors that are irrelevant to the target variable.
The TFT offers several improvements over these baseline methods by combining Encoder–Decoder LSTMs, which model short-term relations with good accuracy, with the feature weighting mechanism of attention matrices to capture long-term relations. The Transformer, a recently developed encoder–decoder model based on the attention mechanism, processes sequences accurately without the use of any recurrent neural networks. As a result, fewer parameters are needed to produce significantly better results. The architecture also improves the predictive performance of multivariate forecasting thanks to the masked multi-head attention used in the attentive layer. Thus, a transformer-based deep learning solution is utilized to forecast nitrate levels in an aquaponics environment using multiple input features.
The predictive performance of the proposed method shows clear improvements over the baseline models in terms of MAE, MSE, and Explained Variance when considering all sequences. For sequences longer than an hour, the memory needed to process the attention matrix makes the problem impractical, rendering the simulation infeasible with limited computational resources. Besides, multi-step forecasting performance deteriorates after a certain sequence length. The proposed method offers increased sequence modeling capacity, enabling longer sequences to be represented. Employing auto-regressive models along with the TFT could provide even better sequence handling, and Memory Transformers could be employed in the multi-head attention architecture to lower the memory requirement of the general approach. To further capture noisy data in short-term predictions, as the LSTM model does, additional LSTM or RNN layers might be added to the proposed model; short-term noise is expected to be learned properly in this manner.
The proposed solution improves the overall handling of aquaponics with better simulation performance. The obtained results can be used to handle anomalies in the ecosystem, such as fish diseases, pump failures, or bacterial problems. Combining real-life applications with the simulation can increase the effectiveness of the aquaponics architecture. Furthermore, the improvements obtained bring us closer to the reality of unmanned aquaponics agriculture.

Declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical Approval

Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Literature
1. Smit B, Smithers J (1993) Sustainable agriculture: interpretations, analyses and prospects. Can J Regional Sci 16(3):499–524
2. Reganold JP, Wachter JM (2016) Organic agriculture in the twenty-first century. Nat Plants 2(2):1–8
3. Kaşif A, Ortaç G, Esma İ, Bilgin TT (2020) Performing similarity analysis on organic farming crop data of Turkish cities. In: 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), pp 1–4. IEEE
4. Yanes AR, Martinez P, Ahmad R (2020) Towards automated aquaponics: a review on monitoring, IoT, and smart systems. J Clean Prod 263:121571
5. Siami-Namini S, Tavakoli N, Namin AS (2018) A comparison of ARIMA and LSTM in forecasting time series. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp 1394–1401. IEEE
6. Narayan A, Mishra BS, Hiremath PS, Pendari NT, Gangisetty S (2021) An ensemble of transformer and LSTM approach for multivariate time series data classification. In: 2021 IEEE International Conference on Big Data (Big Data), pp 5774–5779. IEEE
7. Arvind C, Jyothi R, Kaushal K, Girish G, Saurav R, Chetankumar G (2020) Edge computing based smart aquaponics monitoring system using deep learning in IoT environment. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI), pp 1485–1491. IEEE
8. Mehra M, Saxena S, Sankaranarayanan S, Tom RJ, Veeramanikandan M (2018) IoT based hydroponics system using deep neural networks. Comput Electron Agric 155:473–486
9. Dhal SB, Mahanta S, Gumero J, O'Sullivan N, Soetan M, Louis J, Gadepally KC, Mahanta S, Lusher J, Kalafatis S (2023) An IoT-based data-driven real-time monitoring system for control of heavy metals to ensure optimal lettuce growth in hydroponic set-ups. Sensors 23(1):451
10. Kumar NH, Baskaran S, Hariraj S, Krishnan V (2016) An autonomous aquaponics system using 6LoWPAN based WSN. In: 2016 IEEE 4th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW), pp 125–132. IEEE
11. Lauguico SC, Concepcion R, Alejandrino JD, Tobias RR, Macasaet DD, Dadios EP (2020) A comparative analysis of machine learning algorithms modeled from machine vision-based lettuce growth stage classification in smart aquaponics. Int J Environ Sci Dev 11(9):442–449
12. Cardenas-Cartagena J, Elnourani M, Beferull-Lozano B (2022) Forecasting aquaponic systems behaviour with recurrent neural networks models. In: Proceedings of the Northern Lights Deep Learning Workshop, vol 3
13. Thai-Nghe N, Thanh-Hai N, Chi Ngon N (2020) Deep learning approach for forecasting water quality in IoT systems. Int J Adv Comput Sci Appl 11(8):686–693
14. Liu J, Yu C, Hu Z, Zhao Y, Bai Y, Xie M, Luo J (2020) Accurate prediction scheme of water quality in smart mariculture with deep Bi-S-SRU learning network. IEEE Access 8:24784–24798
15. Dhal SB, Jungbluth K, Lin R, Sabahi SP, Bagavathiannan M, Braga-Neto U, Kalafatis S (2022) A machine-learning-based IoT system for optimizing nutrient supply in commercial aquaponic operations. Sensors 22(9):3510
16. Dhal SB, Bagavathiannan M, Braga-Neto U, Kalafatis S (2022) Can machine learning classifiers be used to regulate nutrients using small training datasets for aquaponic irrigation?: A comparative analysis. PLoS One 17(8):0269401
17. Dhal SB, Bagavathiannan M, Braga-Neto U, Kalafatis S (2022) Nutrient optimization for plant growth in aquaponic irrigation using machine learning for small training datasets. Artif Intell Agric 6:68–76
18. Li Q, Zhang X, Ma T, Jiao C, Wang H, Hu W (2021) A multi-step ahead photovoltaic power prediction model based on similar day, enhanced colliding bodies optimization, variational mode decomposition, and deep extreme learning machine. Energy 224:120094
19. Nazir A, Shaikh AK, Shah AS, Khalil A (2023) Forecasting energy consumption demand of customers in smart grid using temporal fusion transformer (TFT). Res Eng, 100888
20. Udanor C, Ossai N, Nweke E, Ogbuokiri B, Eneh A, Ugwuishiwu C, Aneke S, Ezuwgu A, Ugwoke P, Christiana A (2022) An internet of things labelled dataset for aquaponics fish pond water quality monitoring system. Data Brief 43:108400
21. Wu C, Chau KW, Fan C (2010) Prediction of rainfall time series using modular artificial neural networks coupled with data-preprocessing techniques. J Hydrol 389(1–2):146–167
22. Cao J, Li Z, Li J (2019) Financial time series forecasting model based on CEEMDAN and LSTM. Phys A Stat Mech Appl 519:127–139
23. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
24. Cho K, Van Merriënboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259
25. Chorowski JK, Bahdanau D, Serdyuk D, Cho K, Bengio Y (2015) Attention-based models for speech recognition. Adv Neural Inf Process Syst, 28
26. Lim B, Arık SÖ, Loeff N, Pfister T (2021) Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int J Forecasting 37(4):1748–1764
27. Huang G-B, Zhu Q-Y, Siew C-K (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501
29. Akiba T, Sano S, Yanase T, Ohta T, Koyama M (2019) Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Metadata
Title
Temporal fusion transformer-based prediction in aquaponics
Authors
Ahmet Metin
Ahmet Kasif
Cagatay Catal
Publication date
06-06-2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 17/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05389-8
