Open Access 21.03.2024 | Original Paper

Detailed analysis of Türkiye's agricultural biomass-based energy potential with machine learning algorithms based on environmental and climatic conditions

Written by: I. Pence, K. Kumas, M. Siseci Cesmeli, A. Akyüz

Published in: Clean Technologies and Environmental Policy


Abstract

In this study, the biomass and energy potential of each province of Türkiye was calculated for the years 2010–2021 using data from 15 different field crops and 16 different horticultural crops. The total theoretical energy potential obtained from field and garden products was calculated as 222,620 Terajoule (TJ) and 61,737 TJ for 2010 and 308,888 TJ and 77,002 TJ for 2021, respectively. The agricultural biomass potential for 2021 was then estimated with machine learning algorithms from environmental and climate data covering 2010–2020, an approach that has not been studied in the literature. Four machine learning methods were used to model the agricultural biomass potential of Türkiye: Random Forest, K-Nearest Neighbors (KNN), Gradient Boosting, and eXtreme Gradient Boosting Regressor (XGBR). The models were evaluated in a tenfold cross-validation analysis and in a prediction for 2021, using only climatic and agricultural area data. In addition, feature selection was applied to reduce the number of features used and to increase the success rate. Overall, the Random Forest algorithm achieved an R2 value of 0.9328 using all features in the tenfold cross-validation analysis and an R2 value of 0.9434 using four features in the prediction for 2021. Considering only the 2021 forecast, the KNN algorithm achieved the highest result, with an R2 value of 0.9560 using only four features. The Wilcoxon rank-sum test at p = 0.05 also shows no significant difference between the predictions and the actual values.

Graphical abstract

Notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Abbreviations
A: Availability
AEP: Available energy potential
CP: Annual yield
EU: European Union
GHG: Global greenhouse gas
GIS: Geographic information system
KNN: K-nearest neighbors
LHV: Lower heating value
M: Moisture content
MAE: Mean absolute error
R2: Coefficient of determination
RMSE: Root-mean-square error
RPR: Product residue ratio
SGD: Stochastic gradient descent
TBP: Theoretical biomass potential
TEP: Theoretical energy potential
TJ: Terajoule
TUIK: Turkish Statistical Institute
XGBR: Extreme gradient boosting regressor

Introduction

Energy is needed at every stage of modern life. Energy consumption is an essential parameter in determining a country's development level. Energy consumption in developed countries is higher than in developing countries (Khanlari et al. 2020; Can 2022; Ozturk et al. 2017). With the development of technology, the energy demand of developing countries is increasing rapidly to meet the needs. Energy use is both a cause and a consequence of economic growth and development. Energy is essential for most economic activities. Fossil fuels are the primary source of energy for worldwide energy needs (Sayin et al. 2005; Jayarathna et al. 2020). These energy sources' production, transportation, and use cause high emissions, damage to the atmosphere, and climate change now and in the future (Pence et al. 2023; Kaygusuz 2010). Climate change has become an essential global problem related to energy, the economy, the environment, and technology. Significant steps must be taken to reduce global greenhouse gas (GHG) emissions (Barbera et al. 2019; D'Adamo et al. 2019; Zheng et al. 2019). For this reason, it is necessary to determine greenhouse gas emission management in countries subject to international commitments, agreements, and national policies. In this context, it is aimed to increase the share of renewable energy resources with some agreements and protocols to prevent emissions worldwide (Zheng et al. 2019; Chang and Hu 2019). In the 2015 Paris Agreement, which Türkiye also signed, it was decided to reduce CO2 emissions by 45% compared to 2010 and zero them by 2050.
The European Union (EU) aims to increase the share of renewable energy to at least 32% by 2030, reduce greenhouse gas emissions by 55%, and increase energy efficiency by at least 32.5% (Guler et al. 2022; Filipović et al. 2022). Depending on this decision, reducing emissions caused by energy consumption, which is one of the essential parameters in emissions, is crucial. The economy of Türkiye and industry organization in this direction will strengthen the country's adaptation to the European Green Deal (Tanasa et al. 2020; Cekinir et al. 2022). Increasing renewable energy sources in all sectors is vital to reducing greenhouse gas emissions, especially in industry. Increasing the use of energy resources such as hydroelectric, wind, solar, and biomass will be beneficial in preventing emissions by taking steps toward energy independence and security (Tumen Ozdil and Caliskan 2022).
The largest share of electricity in Türkiye is obtained from thermal power plants that consume natural gas, oil, and imported coal. Although lignite and coal have an essential potential, oil and natural gas reserves are almost negligible compared to world reserves (Erat et al. 2021; Telli et al. 2021). While production increased by 45.21% between 2010 and 2020, it increased by 17.20% between 2015 and 2020. Türkiye's electricity production in 2020 is 306,703 GWh. The use rate of renewable energy in electricity generation is approximately a 3.7% annual average increase from 2012 to 2020, 7.8% in hydraulics, 20.50% in wind, 36% in geothermal, and 31% in biomass (Pence et al. 2023; Ocak and Acar 2021). On the other hand, according to Türkiye's renewable energy targets, renewable energy sources are expected to meet 20% of total energy consumption and 30% of electricity production by 2023 (Pence et al. 2023; Rincon et al. 2019).
Türkiye's electricity demand is expected to reach 424 TWh in 2023, and it is predicted that the demand will be approximately five times higher in 2030 compared to 2000 (Melikoglu and Menekse 2020; Yurtkuran 2021). Türkiye has great potential in renewable energy. Türkiye's geographical location has several advantages for the widespread use of most renewable energy sources. Due to these advantages, it is a suitable country for energy production from renewable sources such as wind, solar, hydroelectricity, and biomass. Hydroelectric power plants come to the fore in Türkiye among these energy sources. As of the middle of 2022, there are 750 hydroelectric power plants in Türkiye (Şenol et al. 2030; Bakay and Ağbulut 2021). Türkiye's total installed power as of the end of 2023 is 106,071 MW. About 55.3% of the power plants in operation are power plants that produce electricity from renewable sources. Türkiye's total electricity installed power consists of thermal power plants (44.7%), hydroelectric power plants (29.8%), wind (11%), solar energy (10.6%), biomass (2.3%), and geothermal (1.6%) (TETC 2024).
Biomass is one of the important renewable energy sources with many uses worldwide. Municipal waste, vegetable oil waste, agricultural waste, forest product waste, industrial waste sludge, and sewage sludge are sources of biomass (Guler et al. 2022). This energy source has many advantages over other renewable energy sources. Some of these advantages are that it is easily obtained from organic materials, easy to obtain energy, and has many uses, such as home and industrial sectors that concern a large part of society (Ozturk et al. 2017; Channi et al. 2022; Jayarathna et al. 2022). Biomass can be burned directly or converted into solid, gaseous, and liquid fuels with the help of conversion technologies such as fermentation to produce alcohol, anaerobic digestion to produce biogas, and gasification to produce natural gas substitutes (Samadi et al. 2020). From an economic and technical point of view, it is one of the most sustainable energy sources for the country. This resource is available in a stored form and can increase employment opportunities in rural areas. It can help to reduce the trade deficit by reducing the dependence of developing countries like Türkiye, which imports a large part of its energy needs, on energy imports (Knápek et al. 2020). Due to these advantages, biomass stands out compared to other renewable energy sources, and it is accepted that biomass is more efficient in terms of economic and technical feasibility (Cekinir et al. 2022; Asghar et al. 2022). Biomass energy can be converted into thermal and electrical energy. For these reasons, energy policies and plans that ensure food safety worldwide should be supported in using biomass for energy purposes locally, nationally, and globally (Toklu 2017; Singh 2016).
Aiming to use bioenergy in Türkiye, at the First Agricultural Congress in 1931, it was discussed that the fuels needed for agricultural machinery would be produced with domestic resources instead of imports. It was emphasized that the fuels obtained using local resources benefit the national economy, and the importance of biofuel production was discussed in various dimensions. The first official document regarding biofuels in Türkiye was signed in 1934. Biogas production, one of the bioenergy sources, was initiated by the Soil and Water Research Institute in the 1950s. In the 1960s, pilot facilities were established within the State Production Farms, and eight biogas facilities were established by the Eskişehir Soil Water Research Institute affiliated with the Ministry of Agriculture. However, the work has ended due to a lack of technical personnel and inadequate training of farmers. Especially with the oil crisis in the early 1980s, studies on establishing biogas units increased. After 1980, studies on biogas production in Türkiye gained momentum, and the work that started with establishing a 35 m3 facility in Muş Province expanded to establishing approximately 1000 facilities with state support. Studies in this field in Türkiye continued increasingly after 2000. By the end of 2023, there are a total of 212 bioenergy facilities in Türkiye, 199 of which produce electrical energy from bioenergy, eight of which produce biodiesel, and five of which produce bioethanol (RTMENR 2024; Kumaş et al. 2019; Hatunoğlu 2010).
Biomass residues are directly linked to crop yield at all stages of agricultural production. Because residues constitute a certain percentage of the crop, higher crop production results in more residues. During crop and fruit production, many residues are obtained from field and horticultural crops. Biomass residues are the stems, straws, leaves, branches, etc., left over after the main crop is harvested, together with the remnants of cutting and pruning (Tumen Ozdil and Caliskan 2022; Güney and Kantar 2020; Avcıoğlu et al. 2019). Biomass energy potential can be calculated based on these parameters. The potential also depends on factors such as crop yield, the agronomic development of the residues, climatic conditions, and soil structure. For this reason, even if the same amount of agricultural product is obtained, different amounts of agricultural residues usable as biomass are obtained in different countries (Zafar et al. 2021; Balsalobre-Lorente et al. 2019; Aydin 2019). Türkiye shows different characteristics depending on its geographical location and landforms. Therefore, it has a wide variety of agricultural products (Senocak and Guner 2022). Different indicators such as climatic data, harvest time, and geographical conditions affect the amount of agricultural products. In recent years, agricultural areas have also been affected by sudden meteorological events associated with climate change (Avcıoğlu et al. 2019; Zheng and Qiu 2020). In the agricultural sector, the climatic characteristics of a region are the primary consideration when making construction, operation, and production plans.
Climate and meteorological data are the absolute guiding factors in selecting the plant variety to be grown, tillage, planting, pruning, hoeing, irrigation, fertilization, spraying, harvesting, and micro-climatological environment planning. Climatic factors (temperature, relative humidity, dew, fog, precipitation, cloudiness, light, wind, snow, and frost) affect agricultural activities and cause severe problems if various applications are not made according to climatic factors (RTMAF 2022).
In recent years, studies have been carried out in Türkiye on using biomass resources such as hazelnut shells, agricultural waste, wheat straw, tea waste, and olive peel for energy purposes. It is imperative to focus on efficient production to meet the increasing energy demand and use of biomass energy to meet both traditional and modern fuel requirements (Yurtkuran 2021; Balat 2005). Studies have been carried out on the potential of biomass resources in the world and Türkiye. These studies, which investigated different results for calculating agricultural biomass residues and potentials, are as follows.
Singh (2016) developed a method for estimating India's biomass power potential from agricultural waste. It has been determined that 650 Mt of agricultural biomass is obtained annually in the country, and 1/3 of it can be used. It is stated that the equivalent of this biomass has an energy potential of 3.72 EJ, which is approximately equal to 23–35 GW of electrical power (Singh 2016).
On the other hand, Ozturk et al. (2017) compared the renewable energy and biomass potential by showing the future energy scenarios of Türkiye and Malaysia. By evaluating the biomass presence of the countries, their possible contributions to the economy of both countries were evaluated (Ozturk et al. 2017). Toklu (2017) examined Türkiye's biomass potential for different sources in 2010. The study stated that the total biomass energy potential of Türkiye is approximately 33 Mtoe, while the usable biomass potential is approximately 17 Mtoe (Toklu 2017). Bilandzija et al. (2018) examined Croatia's agricultural waste's biomass and energy potential. Biomass potentials of 3050.3 t, 1441.8 t, and 733.68 t were determined for different scenarios. The study calculated that 51.14 PJ, 24.06 PJ, and 12.18 PJ energy potentials could be obtained depending on this potential (Bilandzija et al. 2018). Ma et al. (2018) used the artificial neural network method to model and predict the production and consumption values of biomass and hydroelectric energy resources in the USA. In the study, data between 2009 and 2016, as well as LSTM and RNN estimation algorithms, were used (Ma et al. 2018). Avcıoğlu et al. (2019) investigated agricultural wastes' biomass and energy potential in Türkiye. The total amount of biomass obtainable from field and garden plant wastes was calculated as 9432 kt and 15,652 kt, respectively. The theoretical energy potential based on the amount of biomass was estimated as 908,119 Terajoule (TJ) from field crops and 90,354 TJ from horticultural crops, respectively (Avcıoğlu et al. 2019). Moustakas et al. (2020) investigated the biomass potential of the Thessaly Region, Greece. In the study, the biomass potential from agricultural waste is approximately 707,164 tons/year, as per the data. It has been stated that with the potential of biomass, a maximum of 619 GWh of electricity and 895 GWh of heat can be obtained per year (Moustakas et al. 2020). Knápek et al. (2020) examined the biomass potential of the Czech Republic for different scenarios. The GIS (geographic information system) method was used to determine the potential, and it was stated that the biomass potential would increase by 35% in energy production with the planning of the arable land. Considering the year 2040, it was stated that the biomass potential could increase by 42 PJ in total (Knápek et al. 2020). Samadi et al. (2020) estimated Iran's energy production by gasification technology of agricultural waste. It is stated that the total energy obtained from agricultural residues by gasification is 341.29 TJ, and the amount of electricity and heat is 66,075 and 399,112 TJ, respectively (Samadi et al. 2020). Tumen Ozdil and Caliskan (2022) determined the biomass potential and the associated reproducible electrical energy from agricultural wastes in Türkiye between 2008 and 2018. The total theoretical average biomass potential obtained from its plants was calculated as 522,875 kt and 51,359 kt, respectively. The potential of electricity produced from agricultural field crops is 994 × 10^9 kWh (Tumen Ozdil and Caliskan 2022). Senocak and Guner (2022) estimated the amount of animal and agricultural waste expected for the coming years and the energy potential for the Acıpayam district of Denizli, Türkiye, using artificial intelligence. In addition, spatial analyses and different scenario evaluations were carried out using GIS (Senocak and Guner 2022).
Due to climate and land conditions, Türkiye has an essential geographical position regarding energy crop cultivation. Although it has a high potential in terms of biomass resources potential and the energy that can be obtained from these resources, the desired levels have not been reached in terms of installed power in the country (Pence et al. 2023; Şenol et al. 2030; Avcıoğlu et al. 2019). Determining the amount and distribution of biomass resources as accurately as possible is crucial in making strategic decisions such as energy management policies (Tumen Ozdil and Caliskan 2022; Toklu 2017; Avcıoğlu et al. 2019). According to the literature studies, generally known basic statistical approaches were used to estimate the renewable energy potential in Türkiye. Adopting a variety of scenario approaches that also take into account uncertain factors can lead to better results. In addition, reliable, analytical, and flexible estimation methods are needed to determine the energy potential of biomass. By adopting such a systematic, integrated approach, energy management decisions can be made more effectively. Unlike the literature, this study proposes a decision support method that enables estimating the energy potential of the resources produced from biomass.
A machine learning algorithm is a subset of artificial intelligence that learns from data to improve performance on a given task without being explicitly programmed. A machine can learn from experience and improve its performance over time by recognizing patterns and making decisions based on them. Machine learning can be classified into supervised, unsupervised, semi-supervised, and reinforcement learning. In supervised learning for regression, a model is trained to predict a continuous numerical output based on a set of input features. In supervised learning, the model is provided with training examples that include both the input features and the corresponding target values, and the goal is to build a mapping function that can accurately predict the target value based on new, unseen input values (Goodfellow et al. 2016).
Image and speech recognition, natural language processing, recommendation systems, and predictive modeling are just a few of the many uses for machine learning algorithms. They are incredibly well suited to manual tasks that are too difficult or time-consuming for humans to complete. Machine learning can be used to develop models that can accurately predict the energy equivalent of the theoretically calculated biomass potential using the waste rate, production amount, average moisture of each crop, sub-calorific value, and percentage availability of each field and orchard crop. These machine learning models can be trained on large datasets of various climate data, agricultural land, and the corresponding biomass potential data, allowing them to learn the relationships between them. The models can save time and resources for direct measurements or calculations by being trained to predict biomass potential in new areas or at various times.
Türkiye is geographically located between the European and Asian continents. Due to its location, the country has climate transitions, different land structures, and diverse agricultural products. No study has been found in the literature that includes long-term and up-to-date data for countries with such characteristics. When the literature is examined, it is noted that there are few studies on agricultural biomass. Generally, only theoretical biomass calculations have been made in the literature, and other factors affecting it have not been examined. This study tried to close this gap in the literature by incorporating climate data, the effects of which have not been examined before, into machine learning models for agricultural biomass potential estimation.
Regression models are used to predict the value of a dependent variable. It is essential to use these algorithms to predict the value of a dependent variable and explain the relationship between variables. It was assumed that the amount of agricultural biomass potential depends on independent variables such as climate data, and modeling with regression algorithms was preferred for this purpose. Among the regression models, Random Forest, K-Nearest Neighbors (KNN), and Gradient Boosting algorithms, known as state-of-the-art in the literature, are widely used. The eXtreme Gradient Boosting Regressor (XGBR), which has become popular with its successful results in recent years, also stands out.
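To make the idea of regression on independent variables concrete, the following is a minimal sketch of a K-Nearest Neighbors regressor together with the coefficient of determination (R2). It is an illustration only and not the implementation used in the study: the function names `knn_predict` and `r2_score`, the choice of Euclidean distance, and the one-feature toy data are all assumptions made for this example.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a continuous target as the mean of the k nearest training targets."""
    # Euclidean distance from the query point to every training sample
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy data (NOT from the study): one input feature, one target.
train_X = [(1.0,), (2.0,), (3.0,), (4.0,)]
train_y = [10.0, 20.0, 30.0, 40.0]

prediction = knn_predict(train_X, train_y, (2.1,), k=2)
print(prediction)
```

With several features on different scales (for example, agricultural area and temperature), the inputs would normally be standardized first, because the distance is otherwise dominated by the feature with the largest numeric range.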
In this study, the agricultural biomass potential of each province in Türkiye for the years 2010–2021 was modeled using various climate data and agricultural land. Random Forest, KNN, Gradient Boosting, and XGBR, popular machine learning algorithms in the literature, were used for modeling.
The novelty and contribution are: (1) Agricultural biomass potential has been modeled with high success by machine learning methods using climate data; (2) with the feature selection, the prediction success was increased by using only the agricultural area, temperature, biomass type, and humidity; and (3) considering many years in terms of agricultural biomass, a model has been established for Türkiye.
This study consists of four parts. The first part examined literature studies in the world and Türkiye. In the second part, the mathematical calculation of the theoretical biomass potential and the methods to be used in estimation have been explained in detail. The findings obtained from the mathematical method and estimation were compared in the third part. Finally, in the fourth part, the results and recommendations were given.

Materials and methods

Türkiye is geographically located between 36° and 42° north latitudes and 26° and 45° east longitudes. It is located at an important point connecting the continents of Europe and Asia like a bridge. The surface area of Türkiye is 814,578 km2. Türkiye's population reached 84.5 million in 2022 (Pence et al. 2023). Mediterranean, Black Sea, and continental climate types are seen in Türkiye. The Mediterranean climate, with hot, dry summers and mild, rainy winters, is seen on the Aegean and Mediterranean coasts. In the Black Sea climate, it is rainy in all seasons. The continental climate is observed in the inner parts of Türkiye. Wheat, barley, and corn are the most widely grown products in Türkiye. In addition, products such as cotton, flax, sesame, and poppy, which have high economic returns, have been grown for a long time. Soybeans are grown in the Mediterranean region. A wide variety of fruits are grown in many parts of Türkiye. Therefore, various agricultural residues are a source of biomass energy (Can 2022; Tumen Ozdil and Caliskan 2022; Şenol et al. 2030). In Türkiye, cereal and other herbal products were expected to increase by approximately 14% in the second half of 2022 compared to 2021, while fruits, beverages, and spice plants were expected to increase by 3%. Production amounts in 2022 were approximately 70 million tons in cereals and other herbal products, 32 million tons in vegetables, and 26 million tons in fruits, beverages, and spice plants. Cereal production volumes increased by 21% in 2022 compared to 2021. Compared with the previous year, wheat production increased by 12% to 19.8 million tons, and corn production increased by 23% to 8.3 million tons. In addition, barley production increased by 48% to 8.5 million tons, rye production increased by 37% to 273 thousand tons, and oat production increased by 32% to 365 thousand tons.
The production of fruits, beverages, and spice plants increased by 3.8% in 2022 compared to the previous year and became approximately 25.8 million tons. Compared to 2021, fruit products increased by 5% in apples, 13% in grapes, 12% in peaches and nectarines, 8% in plums, 8% in strawberries, and 71% in olives. However, there was a decrease of 17% in tangerines, 31% in oranges, and 32% in lemons. There was an increase of 11.8% in hazelnuts, 99% in pistachios, 9% in figs, and 13% in bananas. Bean production decreased by 11.5% to 270 thousand tons, soybean production decreased by 15% to 155 thousand tons, and sunflower production increased by 5% to approximately 2.6 million tons. Tobacco production increased by 15% to 82 thousand tons, and sugar beet production increased by 7% to 19 million tons (TUIK 2022).

Data collection and theoretical calculation of biomass potential energy

In this study, the production amount of different field and garden products used for biomass production between 2010 and 2021 was taken from the Turkish Statistical Institute (TUIK). It was examined in two categories: 15 different plants from field products and 16 different plants from garden products. The total production amounts of the field and garden products used in the study, selected as an example, in Türkiye for the past 5 years are given in Table 1 (TUIK 2022).
Table 1
Production amount of field and garden products in 2017–2021 (tons/year)

| Agricultural crops | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|
| Wheat | 21,500,000 | 20,000,000 | 19,000,000 | 20,500,000 | 17,650,000 |
| Maize | 5,900,000 | 5,700,000 | 6,000,000 | 6,500,000 | 6,750,000 |
| Rye | 320,000 | 320,000 | 310,000 | 295,681 | 200,000 |
| Beans | 239,000 | 220,000 | 225,000 | 279,518 | 305,000 |
| Soybean | 140,000 | 140,000 | 150,000 | 155,225 | 182,000 |
| Groundnut | 165,330 | 173,835 | 169,328 | 215,927 | 234,167 |
| Canola or rapeseed seed | 60,000 | 125,000 | 180,000 | 121,542 | 140,000 |
| Sunflower | 1,964,385 | 1,949,229 | 2,100,000 | 2,067,004 | 2,415,000 |
| Rice | 900,000 | 940,000 | 1,000,000 | 980,000 | 1,000,000 |
| Sugar beet | 21,150,900 | 17,439,087 | 18,056,661 | 23,028,285 | 17,768,837 |
| Tobacco | 93,666 | 75,275 | 68,223 | 79,081 | 71,497 |
| Cotton | 2,450,000 | 2,570,000 | 2,200,000 | 1,773,646 | 2,250,000 |
| Barley | 7,100,000 | 7,000,000 | 7,600,000 | 8,300,000 | 5,750,000 |
| Grapes | 4,200,000 | 3,933,000 | 4,100,000 | 4,208,908 | 3,670,000 |
| Banana | 369,009 | 498,888 | 548,323 | 728,133 | 883,455 |
| Fig | 305,689 | 306,499 | 310,000 | 320,000 | 320,000 |
| Grapefruit | 260,000 | 250,000 | 249,185 | 238,012 | 249,000 |
| Lemon | 1,007,133 | 1,100,000 | 950,000 | 1,188,517 | 1,550,000 |
| Oranges | 1,950,000 | 1,900,000 | 1,700,000 | 1,333,975 | 1,742,000 |
| Mandarin | 1,550,469 | 1,650,000 | 1,400,000 | 1,585,629 | 1,819,000 |
| Apples | 3,032,164 | 3,625,960 | 3,618,752 | 4,300,486 | 4,493,264 |
| Pear | 503,004 | 519,451 | 530,723 | 545,569 | 530,349 |
| Apricot | 985,000 | 750,000 | 846,606 | 833,398 | 800,000 |
| Cherry | 627,132 | 639,564 | 664,224 | 724,944 | 689,834 |
| Peach | 771,459 | 789,457 | 830,577 | 892,048 | 891,857 |
| Almond | 90,000 | 100,000 | 150,000 | 159,187 | 178,000 |
| Hazelnut | 675,000 | 515,000 | 776,046 | 665,000 | 684,000 |
| Pistachio | 78,000 | 240,000 | 85,000 | 296,376 | 119,355 |
| Olive | 2,100,000 | 1,500,467 | 1,525,000 | 1,316,626 | 1,738,680 |

To obtain energy from biomass residues, the necessary structural and physical properties of the different field and garden plant species were determined. These properties are the residue-to-product ratio, the residue moisture content, and the energy value. Table 2 gives the residue types, moisture content, product residue ratio, lower heating value, and availability rates for field and horticultural crops that were taken from the literature and used in the mathematical calculation (Tumen Ozdil and Caliskan 2022; Avcıoğlu et al. 2019).
Table 2
Parameters used for field crops and horticultural crops calculations
M: moisture (%); RPR: ratio of product residue; LHV: lower heating value (MJ/kg); A: availability (%)

| Agricultural crops | Residue types | M Min | M Max | M Avg | RPR Min | RPR Max | RPR Avg | LHV Min | LHV Max | LHV Avg | A |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Wheat | Straws | 10 | 15 | 13 | 0.5 | 1.75 | 1.13 | 13.9 | 19.5 | 16.7 | 15 |
| Maize | Stalks | 15 | 17 | 16 | 1.5 | 2.25 | 1.88 | 15.5 | 18.5 | 17 | 60 |
|  | Cobs | 7 | 9 | 8 | 0.27 | 0.86 | 0.57 | 12.6 | 18.4 | 15.5 | 60 |
| Rye | Straws | 15 | 15 | 15 | 0.99 | 0.99 | 0.99 | 17.4 | 17.4 | 17.4 | 15 |
| Beans | Stems-leaves | 5 | 5 | 5 | 1.4 | 1.5 | 1.45 | 14.7 | 14.7 | 14.7 | 15 |
| Soybean | Straws | 15 | 15 | 15 | 0.76 | 3.5 | 2.13 | 14.9 | 19.4 | 17.2 | 60 |
| Groundnut | Shells | 8 | 8 | 8 | 0.2 | 0.52 | 0.36 | 11.2 | 16.9 | 14.1 | 80 |
|  | Straws-haulms | 15 | 15 | 15 | 2.1 | 2.3 | 2.2 | 14.4 | 15.2 | 14.8 | 80 |
| Canola or Rapeseed seed | Stalks | 25 | 25 | 25 | 1.6 | 1.8 | 1.7 | 17.1 | 17.1 | 17.1 | 15 |
| Sunflower | Stems-leaves | 14 | 40 | 27 | 0.7 | 3.5 | 2.1 | 13.2 | 17.9 | 15.6 | 60 |
| Rice | Straws | 10 | 25 | 18 | 0.45 | 1.75 | 1.1 | 8.8 | 16 | 12.4 | 60 |
|  | Husks | 10 | 13 | 12 | 0.2 | 0.27 | 0.24 | 12.9 | 19.9 | 16.4 | 80 |
| Sugar beet | Leaves | 75 | 75 | 75 | 0.12 | 0.14 | 0.13 | 15.5 | 17.7 | 16.6 | 15 |
| Tobacco | Stems | 85 | 85 | 85 | 2.27 | 2.27 | 2.27 | 16.1 | 16.1 | 16.1 | 60 |
| Cotton | Stalks | 6 | 12 | 9 | 1.1 | 3.5 | 2.3 | 14.6 | 18.2 | 16.4 | 60 |
| Barley | Straws | 11 | 15 | 13 | 1.08 | 1.36 | 1.22 | 17.5 | 19.5 | 18.5 | 15 |
| Grapes | Pruning | 40 | 50 | 45 | 0.39 | 0.45 | 0.42 | 16.8 | 19.2 | 18 | 80 |
| Banana | Stalk-peels | 85 | 85 | 85 | 2 | 2 | 2 | 13.1 | 13.1 | 13.1 | 80 |
| Fig | Pruning | 55 | 55 | 55 | 0.21 | 0.21 | 0.21 | 18 | 18.4 | 18.2 | 80 |
| Grapefruit | Pruning | 40 | 40 | 40 | 0.11 | 0.11 | 0.11 | 17.6 | 17.6 | 17.6 | 80 |
| Lemon | Pruning | 35 | 45 | 40 | 0.19 | 0.4 | 0.3 | 17.6 | 17.6 | 17.6 | 80 |
| Oranges | Pruning | 35 | 45 | 40 | 0.2 | 0.5 | 0.35 | 17.6 | 18.5 | 18.1 | 80 |
| Mandarin | Pruning | 35 | 45 | 40 | 0.17 | 0.4 | 0.29 | 17.6 | 17.6 | 17.6 | 80 |
| Apples | Pruning | 40 | 40 | 40 | 0.19 | 0.19 | 0.19 | 17.8 | 17.8 | 17.8 | 80 |
| Pear | Pruning | 35 | 40 | 38 | 0.14 | 0.3 | 0.22 | 18 | 18.4 | 18.2 | 80 |
| Apricot | Pruning | 40 | 40 | 40 | 0.19 | 0.19 | 0.19 | 19.3 | 20.8 | 20 | 80 |
| Cherry | Pruning | 40 | 40 | 40 | 0.19 | 0.19 | 0.19 | 21.7 | 21.7 | 21.7 | 80 |
| Peach | Pruning | 35 | 45 | 40 | 0.3 | 0.5 | 0.4 | 18 | 18.4 | 18.2 | 80 |
| Almond | Pruning | 35 | 40 | 38 | 0.6 | 0.61 | 0.6 | 18 | 18.4 | 18.2 | 80 |
| Hazelnut | Pruning | 40 | 40 | 40 | 3.34 | 3.34 | 3.34 | 19 | 19 | 19 | 80 |
| Pistachio | Pruning | 35 | 35 | 35 | 0.4 | 0.48 | 0.44 | 18 | 19 | 18.5 | 80 |
| Olive | Pruning | 35 | 45 | 40 | 1.14 | 1.25 | 1.2 | 18.1 | 18.8 | 18.5 | 50 |

Considering the residue types (stalk, stem-leaf, straw, bark, pruning, etc.) belonging to field and garden products and their relative moisture content, residue production rates, and lower heating values, separate usable energy potential values were calculated for each of the 81 provinces in Türkiye. Knowing the areas where the crops are grown, the type and characteristics of the residues, and their energy capacity is essential for choosing the location of possible biomass power plants and for the sustainability of energy supply. Unlike the studies conducted in Türkiye and the world, this study models agricultural biomass potential with machine learning algorithms based on climate data, an approach not found in the literature.
Theoretical biomass energy potential has been analyzed with a similar formulation (Eqs. 1–3) in the literature (Guler et al. 2022; Tumen Ozdil and Caliskan 2022; Avcıoğlu et al. 2019; Singh 2015). The same calculation method was used in this study.
Theoretical biomass potential (TBP) was calculated from Eq. (1), theoretical energy potential (TEP) from Eq. (2), and available energy potential (AEP) using Eq. (3). In the equations, CP(i) is the annual yield (tons); RPR(i) is product residue ratio (%); M(i) is moisture content (%); LHV(i) indicates the lower heating value (MJ/kg); and A(i) is availability. Availability is considered in the part converted into energy in the final product. Availability refers to the percentage of theoretical energy produced from waste facilities. The product residue ratio, moisture content, heating value, and usability percentage for each field and garden product included in the equations are taken from Table 2 (Tumen Ozdil and Caliskan 2022; Singh 2015; Riva et al. 2014; Hiloidhari and Baruah 2011). Theoretical energy potential mapping of field and horticultural crops is shown in Fig. 1.
$$\text{TBP} = \sum_{i=1}^{n} \text{CP}_{(i)} \times \text{RPR}_{(i)} \times \left[ \frac{100 - M_{(i)}}{100} \right]$$
(1)
$$\text{TEP} = \sum_{i=1}^{n} \text{TBP}_{(i)} \times \text{LHV}_{(i)}$$
(2)
$$\text{AEP} = \sum_{i=1}^{n} \text{TEP}_{(i)} \times A_{(i)}$$
(3)
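As a rough illustration of how Eqs. (1)–(3) chain together, the sketch below evaluates them for two crops. All input numbers are hypothetical placeholders and do not come from Table 2. It also assumes that the residue ratio is applied as a plain multiplier, that availability is given in percent, and that energy is reported in TJ (1 MJ/kg equals 1 GJ/ton, and 1000 GJ equals 1 TJ). The function names are illustrative only.

```python
# Illustrative sketch of Eqs. (1)-(3); every number below is a hypothetical placeholder.

def theoretical_biomass(cp_tons, rpr, moisture_pct):
    """Eq. (1) for one crop: dry residue mass in tons."""
    return cp_tons * rpr * (100.0 - moisture_pct) / 100.0


def theoretical_energy_tj(tbp_tons, lhv_mj_per_kg):
    """Eq. (2) for one crop: tons * MJ/kg gives GJ, divided by 1000 gives TJ."""
    return tbp_tons * lhv_mj_per_kg / 1000.0


def available_energy_tj(tep_tj, availability_pct):
    """Eq. (3) for one crop: usable share of the theoretical energy."""
    return tep_tj * availability_pct / 100.0


crops = [
    # (annual yield in tons, residue ratio, moisture %, LHV in MJ/kg, availability %)
    (1000.0, 1.2, 40.0, 18.5, 50.0),
    (2000.0, 0.5, 20.0, 17.0, 80.0),
]

tbp = tep = aep = 0.0
for cp, rpr, moisture, lhv, availability in crops:
    tbp_i = theoretical_biomass(cp, rpr, moisture)
    tep_i = theoretical_energy_tj(tbp_i, lhv)
    aep_i = available_energy_tj(tep_i, availability)
    tbp += tbp_i
    tep += tep_i
    aep += aep_i

print(f"TBP = {tbp:.1f} t, TEP = {tep:.2f} TJ, AEP = {aep:.2f} TJ")
```

Summing per-crop terms in a loop mirrors the summation over i in the three equations; a provincial total would repeat this over all field and garden products grown in that province.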

Machine learning processes

The theoretical calculation of the energy equivalent of the biomass potential is a complicated process because it needs specific information about each product in the vegetable and grain category, and this information is difficult to obtain for different geographies. It would therefore be easier to determine the energy equivalent of the biomass potential from more readily available climate data for a geographic location than from the residue rate, production amount, average moisture, lower heating value, and availability percentage of each field and orchard crop. Machine learning algorithms were used to establish such a model. Building the model involves several essential steps, from data preprocessing to the prediction output obtained after training and testing. The flowchart of the processes carried out in this study is shown in Fig. 2.
For 2010–2021, the energy equivalent of the biomass potential of the products in the vegetable and grain categories was calculated theoretically for every province in Türkiye, and a dataset was created. The energy equivalent of the biomass potential is the target value, while the input values are agricultural area, biomass type (vegetable or grain), year, temperature, maximum temperature, minimum temperature, humidity, wind speed, precipitation, soil temperature at 5 cm, and sunshine duration. Climatic data were obtained from the Turkish State Meteorological Service (Turkish State Meteorological Service 2022). The dataset contains a total of 1944 samples (972 vegetable and 972 grain) from 81 provinces over 12 years; 1782 samples are from 2010 to 2020, and 162 are from 2021. A data specification gives an overview of a dataset, including its source, format, variables, quality, and usage. Table 3 provides the data specification of the dataset used in this study.
Table 3 The data specification of the dataset used in this study

| Parameter | Description |
| --- | --- |
| Data source | Agricultural data from TUIK; meteorological data from the Turkish State Meteorological Service |
| Data format | Excel file |
| Temporal resolution | |
| - Yearly averages | Data have been aggregated into yearly averages |
| - Date range | From 2010 to 2021 |
| Data size | [1944 × 12] |
| Target variable | Agricultural biomass potential (Terajoule) |
| Input variables | |
| - Agricultural area | Area of land dedicated to agriculture (hectares) |
| - Biomass type | Type of biomass (vegetable or grain) |
| - Year | Year of the observation |
| - Temperature | Temperature in degrees Celsius |
| - Maximum temperature | Maximum temperature in degrees Celsius |
| - Minimum temperature | Minimum temperature in degrees Celsius |
| - Humidity | Humidity in percentage |
| - Wind speed | Wind speed in meters per second |
| - Precipitation | Precipitation in millimeters |
| - Soil temperature at 5 cm | Soil temperature at 5-cm depth in degrees Celsius |
| - Sunshine duration | Duration of sunshine in hours |
| Data quality | The dataset has been processed to handle missing values and variables in different ranges. Null values have been changed to 0, and the dataset has been normalized |
| Usage | The dataset can be used to predict the agricultural biomass potential based on input variables such as agricultural area, biomass type, meteorological factors, and year information |

Preprocessing the data

Before a machine learning model is trained, the data may need to be cleaned, transformed, and normalized so that it is in a suitable format. This can involve tasks such as filling in missing values or scaling the data to a standard range. Due to geographical conditions and other reasons, some products in the vegetable or grain categories are not grown in some provinces, so the generated dataset contains some null values. In the preprocessing stage, these null values were changed to 0. Normalization was then performed because each input variable covers a different range. Z-score normalization converts the values of a variable into standard scores (z-scores): the mean of the variable is subtracted from each value, and the result is divided by the variable's standard deviation. Because all input variables are thereby brought to the same scale, z-score normalization allows them to be compared during feature selection.
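The two preprocessing steps described above can be sketched in a few lines. This is a minimal illustration with made-up values; it assumes the population standard deviation is used (the study does not state which variant was applied), and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def fill_nulls(column, fill_value=0.0):
    """Replace missing entries (None) with a constant, as done for null values."""
    return [fill_value if v is None else float(v) for v in column]


def z_score(column):
    """Subtract the mean and divide by the standard deviation."""
    mean = fmean(column)
    std = pstdev(column)  # population standard deviation (assumption)
    if std == 0:
        return [0.0 for _ in column]  # a constant column carries no information
    return [(v - mean) / std for v in column]


# Hypothetical agricultural areas in hectares; None marks a crop that is not grown.
agricultural_area = [1200.0, None, 800.0, 400.0]

filled = fill_nulls(agricultural_area)
scaled = z_score(filled)
print(scaled)
```

After scaling, the column has zero mean and unit standard deviation, so it can be compared directly with other scaled inputs such as temperature or humidity.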

Validation

After preprocessing, the data should be divided into a training set and a test set. The training set is used to train the machine learning model, while the test set is used to evaluate its performance. Evaluating the model on data not used for training is important for assessing its generalizability and detecting overfitting. This can be done with various techniques, such as k-fold cross-validation or a time-ordered hold-out split. In this study, tenfold cross-validation was used for 2010–2020, while the year 2021 alone was held out as time-series test data.
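The year-based hold-out can be illustrated with a skeleton of the record index. The sketch below only enumerates (province, year, biomass type) combinations, using hypothetical province codes, and confirms the sample counts quoted for the dataset; it contains no real measurements.

```python
# Skeleton of the record index: 81 provinces x 12 years x 2 biomass types.
provinces = range(1, 82)        # hypothetical province codes 1..81
years = range(2010, 2022)       # 2010..2021 inclusive
biomass_types = ("vegetable", "grain")

records = [(p, y, b) for p in provinces for y in years for b in biomass_types]

# Time-ordered split: earlier years for training, the final year for testing.
train = [r for r in records if r[1] <= 2020]
test = [r for r in records if r[1] == 2021]

print(len(records), len(train), len(test))
```

Splitting by year rather than at random keeps every 2021 record out of training, which is what makes the 2021 evaluation a forecast rather than an interpolation.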
Tenfold cross-validation is a technique for assessing the effectiveness of a machine learning model. It divides the dataset into ten folds, trains the model on nine of them, and tests it on the remaining fold. This process is repeated ten times, with a different fold serving as the test set each time, and the model's performance is estimated as the average over all ten iterations. Because each observation is used exactly once for testing, the model is evaluated on the whole dataset rather than on a single split, which can provide a more reliable estimate of its performance. In this study, the random_state = 42 parameter of the Scikit-learn library was used so that the same test data were used for all validations in all machine learning models.
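The fold mechanics can be sketched without any library. The helper below is a hypothetical stand-in for a shuffled k-fold splitter; it uses Python's own random generator, so a seed of 42 here does not reproduce the folds that Scikit-learn would produce with random_state = 42.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=42):
    """Yield (train_indices, test_indices) pairs for shuffled k-fold validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # The first (n_samples % n_folds) folds receive one extra sample.
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    for fold in range(n_folds):
        test_idx = folds[fold]
        train_idx = [i for j, f in enumerate(folds) if j != fold for i in f]
        yield train_idx, test_idx


splits = list(k_fold_indices(1782))  # 1782 samples, as in the 2010-2020 subset
fold_sizes = [len(test_idx) for _, test_idx in splits]
print(fold_sizes)
```

Fixing the seed is what makes the comparison between algorithms fair: every model is scored on exactly the same ten test folds.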

Feature selection

Feature selection chooses a subset of pertinent features from the larger set of features in a dataset. It can improve the model's performance, interpretability, and capacity for generalization while reducing its complexity. It can also reduce the risk of incorporating noise or irrelevant features into the model, which can negatively impact its performance. By focusing only on the most essential features, the model can make future predictions more easily and generalize better to new data. Feature selection can be carried out by manual selection, wrapper methods, or filter methods. Wrapper methods are a family of supervised feature selection techniques that evaluate various subsets of features and choose the best one according to the score of a particular machine learning algorithm (Géron 2019). In this study, Stochastic Gradient Descent (SGD) was used as the feature scorer. SGD fits a linear function by minimizing a selected loss function; with a least-squares fit, the mean squared error serves as the loss function. The algorithm approximates the true gradient by considering one sample at a time and updates the model based on the gradient of the loss function. The learning rate of SGD was set to 0.01 and the number of iterations to 100. Fig. 3 shows the resulting importance scores of the features, ordered from high to low.
As shown in Fig. 3, the highest score belongs to the agricultural area attribute. After a preliminary study, it was decided that choosing the four attributes with the highest scores would be most appropriate, namely, agricultural area, temperature, biomass type, and humidity. The attributes ranked fifth and sixth are also related to temperature, and a temperature attribute had already been selected. In addition, it was observed that the regression success decreased when more than four features were selected.
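One way such SGD-based scores could be produced is sketched below. The study does not state how the scores in Fig. 3 were derived from the fitted model, so ranking features by the absolute value of their linear coefficients is an assumption here, as are the synthetic data, the three feature names, and the helper function. The learning rate of 0.01 and the 100 passes follow the values given in the text.

```python
import random


def sgd_linear_weights(X, y, learning_rate=0.01, n_epochs=100, seed=0):
    """Fit y ~ w.x + b with per-sample SGD on the squared error."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(X)))
    for _ in range(n_epochs):
        rng.shuffle(order)
        for i in order:
            pred = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            err = pred - y[i]
            # Gradient of 0.5 * err**2 with respect to each weight and the bias.
            for j in range(n_features):
                w[j] -= learning_rate * err * X[i][j]
            b -= learning_rate * err
    return w, b


# Synthetic, already standardized inputs (assumption): the target depends
# strongly on the first feature, weakly on the second, and not on the third.
rng = random.Random(1)
names = ["agricultural_area", "temperature", "wind_speed"]
X = [[rng.gauss(0, 1) for _ in names] for _ in range(200)]
y = [3.0 * x[0] + 0.5 * x[1] + rng.gauss(0, 0.1) for x in X]

weights, _ = sgd_linear_weights(X, y)
scores = {name: abs(w) for name, w in zip(names, weights)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Coefficient magnitudes are only comparable across features when the inputs share a scale, which is why the z-score normalization step matters for this kind of scoring.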

Machine learning methods

Several types of algorithms can be used for supervised regression. Of these, Random Forest, KNN, Gradient Boosting, and XGBR were used in this study. These methods are widely used in the literature and have proven effective on many problems. XGBR, in particular, has gained widespread popularity in the machine learning community due to its high performance and ease of use. It is often used in machine learning competitions and as a base model for solving real-world problems.
Random forest
Random Forest regression, which can be used to predict a continuous outcome variable based on one or more input features, was proposed by Breiman (Breiman 2001). It is an ensemble method that combines the predictions of multiple individual models to make a final prediction. Each individual model in a Random Forest is a decision tree. Decision trees are tree-like models that base their predictions on a sequence of decisions about the input feature values. Each branching point represents one decision, and the prediction is made at the tree's leaf nodes. Random Forest regression is a powerful and widely used method for various regression tasks, including predicting prices, temperatures, and quantities.
K-nearest neighbors
The KNN machine learning algorithm is straightforward but effective for classification and regression tasks. It predicts the outcome of an instance from its nearest neighbors in the feature space, based on the idea that similar instances typically have similar outcomes. The value k is the number of nearest neighbors used to make the prediction. For regression tasks, the prediction can be the average of the target values of the k nearest neighbors. KNN is a non-parametric method and therefore makes no assumptions about the underlying data distribution. Each prediction requires calculating the distances between the query instance and the instances in the training set. Because the method is sensitive to the scale of the input features, the data may need normalization or standardization (Géron 2019).
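A minimal sketch of KNN regression, assuming Euclidean distance and an unweighted average of the neighbors' targets, is given below. The training points and target values are invented for illustration.

```python
import math


def knn_predict(train_X, train_y, query, k=2):
    """Average the targets of the k training points closest to the query."""
    distances = sorted(
        (math.dist(x, query), target) for x, target in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical, already scaled two-feature inputs and target values.
train_X = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
train_y = [10.0, 20.0, 100.0, 120.0]

prediction = knn_predict(train_X, train_y, [0.4, 0.0], k=2)
print(prediction)
```

The query lies between the first two training points, so with k = 2 the prediction is the mean of their targets. With unscaled features, the one with the largest numeric range would dominate the distance, which is why the method is sensitive to feature scale.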
Gradient boosting
The Gradient Boosting machine learning technique can be used for regression tasks that predict a continuous outcome variable based on one or more input features. It is an ensemble method that combines the predictions of numerous distinct models to produce a final prediction. The individual models in a Gradient Boosting regression model are decision trees, which base their predictions on a sequence of decisions about the input feature values. The main distinction between Gradient Boosting and other ensemble methods, such as Random Forest, is that the individual decision trees are trained sequentially, with each tree attempting to correct the mistakes made by the previous trees. To do so, each new tree is fitted to the negative gradient of the loss function, which measures the difference between the predicted and actual values of the outcome variable (Friedman 2001).
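The sequential correction idea can be sketched with one-split trees (stumps) on a single feature. This is a simplified illustration under the squared-error loss, for which the negative gradient is simply the residual; it is not the configuration used in the study, and the data, tree depth, and learning rate of 0.1 are assumptions.

```python
def fit_stump(x, residuals):
    """Best single-split regression stump on a 1-D feature (squared error)."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_trees=50, learning_rate=0.1):
    """Fit stumps sequentially to residuals; return a predictor and the MSE history."""
    base = sum(y) / len(y)
    trees, history = [], []
    preds = [base] * len(y)
    for _ in range(n_trees):
        # For the loss 0.5 * (y - p)**2 the negative gradient is the residual y - p.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        preds = [pi + learning_rate * tree(xi) for xi, pi in zip(x, preds)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y))

    def predict(xi):
        return base + learning_rate * sum(tree(xi) for tree in trees)

    return predict, history


x = [float(i) for i in range(10)]
y = [xi ** 2 for xi in x]  # hypothetical nonlinear target
predict, history = gradient_boost(x, y)
print(history[0], history[-1])
```

The training error recorded in `history` shrinks from one round to the next because each stump is fitted to what the current ensemble still gets wrong, in contrast to Random Forest, where trees are grown independently.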
Extreme gradient boosting
eXtreme Gradient Boosting (XGBoost), proposed by Chen and Guestrin in 2016, provides a fast and efficient implementation of the Gradient Boosting algorithm for machine learning (Chen and Guestrin 2016). XGBoost is a tree-based ensemble method in which each weak learner is a decision tree built to correct the errors made by earlier trees. The final model is a weighted combination of the trees. During the training of an XGBoost tree, the split point is determined using a gain-based greedy algorithm. A second-order Taylor expansion is used when optimizing the objective function, and a regularization term is added to limit the complexity of the trees and reduce overfitting. The XGBR method is well suited to large-scale problems because of its speed, efficiency, and scalability in regression tasks.

Model evaluation

The model should be tested on the test set after it has been trained to see how well it performs and how generalizable it is. Model evaluation metrics can be used for this. In this study, the root-mean-squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R2), which are the most used metrics for regression analysis, were used. The equations for these metrics are given in Eqs. (4)–(6), respectively (Hajabdollahi Ouderji et al. 2023). In Eqs. (4)–(6), Y is the target value, \(\widehat{Y}\) is the predicted value, \(\overline{Y }\) is the mean of the target value, and n is the number of samples.
$${\text{RMSE}} = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {Y_{i} - \hat{Y}_{i} } \right)^{2} }$$
(4)
$${\text{MAE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {Y_{i} - \hat{Y}_{i} } \right|$$
(5)
$$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left( Y_{i} - \hat{Y}_{i} \right)^{2}}{\sum_{i=1}^{n} \left( Y_{i} - \overline{Y} \right)^{2}}$$
(6)
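Eqs. (4)–(6) translate directly into code. The sketch below is a plain-Python illustration with invented target and prediction values; the function names are chosen for this example.

```python
import math


def rmse(y_true, y_pred):
    """Eq. (4): root-mean-squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Eq. (5): mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Eq. (6): one minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]   # hypothetical target values
y_pred = [2.0, 5.0, 8.0, 9.0]   # hypothetical predictions

print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

RMSE and MAE are expressed in the units of the target, so they scale with the magnitude of the data, whereas R2 is dimensionless and easier to compare across datasets.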
The effectiveness of machine learning algorithms is also assessed using statistical tests, which indicate whether the predictions differ significantly from the actual values. The Wilcoxon rank-sum test is a non-parametric test. Unlike parametric tests, non-parametric tests do not assume that the samples follow a particular parametric distribution (Dao 2022). Here, the Wilcoxon rank-sum test is used to check for a difference between the model output values and the real values. The null hypothesis is that there is no significant difference between the predictions of the model and the target values. If the p-value for the model predictions is less than 0.05, the difference is considered significant, and the null hypothesis is rejected; otherwise, the null hypothesis is accepted.
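The decision rule can be illustrated with a small rank-sum implementation. The sketch below uses the large-sample normal approximation with average ranks for ties and without tie or continuity corrections; the study does not state which implementation or corrections were applied, so these choices and the sample values are assumptions.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided p-value
    return z, p


actual = [float(v) for v in range(1, 11)]      # hypothetical target values
close = list(actual)                           # predictions identical to targets
shifted = [v + 10.0 for v in actual]           # predictions with a large offset

for label, predicted in (("close", close), ("shifted", shifted)):
    z, p = rank_sum_test(predicted, actual)
    decision = "reject" if p < 0.05 else "accept"
    print(label, round(p, 4), decision)
```

A p-value below 0.05 leads to rejecting the null hypothesis of no difference, which is the outcome expected for the shifted predictions but not for the close ones.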

Analysis of agricultural biomass potential in Türkiye

In this study, the agricultural biomass potential of each province in Türkiye for the years 2010–2021 was modeled using machine learning algorithms. First, a tenfold cross-validation analysis was performed on the dataset for 2010–2020, and then the prediction for 2021 was made. The machine learning algorithms were trained both with all features (biomass type, year, agricultural area, temperature, maximum temperature, minimum temperature, humidity, wind speed, precipitation, soil temperature at 5 cm, and sunshine duration) and with only the four features (biomass type, agricultural area, temperature, and humidity) determined during the feature selection stage. The machine learning algorithms and data analysis were implemented in Python (version 3.8). The Scikit-learn library was preferred as the machine learning library because it offers a rich environment and is widely used in the data science community. Different hyperparameter values were tried for the machine learning algorithms: the minimum number of samples per leaf for Random Forest and the number of neighbors for KNN. All other parameters were left at the default settings of the relevant libraries.
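A setup along these lines could look as follows in Scikit-learn. This is a sketch on synthetic data rather than the study's dataset: the four columns merely stand in for the selected features, the shuffled KFold and the estimator-level random_state are assumptions, and XGBR is omitted because it lives in the separate xgboost package.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for four standardized input features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

# Hyperparameters varied in the text; everything else stays at library defaults.
models = {
    "Random Forest (leaf = 1)": RandomForestRegressor(min_samples_leaf=1, random_state=42),
    "Random Forest (leaf = 3)": RandomForestRegressor(min_samples_leaf=3, random_state=42),
    "KNN (k = 2)": KNeighborsRegressor(n_neighbors=2),
    "KNN (k = 5)": KNeighborsRegressor(n_neighbors=5),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

# Same ten folds for every model, fixed by random_state.
cv = KFold(n_splits=10, shuffle=True, random_state=42)
mean_r2 = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in mean_r2.items():
    print(f"{name}: mean test R2 = {score:.4f}")
```

Passing one shared `cv` object to every call keeps the comparison consistent, since each algorithm is then scored on identical test folds.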

Analysis of agricultural biomass potential in Türkiye for the years 2010–2020

The dataset created for the years 2010–2020 includes a total of 1782 samples in the vegetable and grain biomass types. Each sample contains the agricultural biomass potential as the target value, together with the climate data of a province in the relevant year. Tenfold cross-validation was used to validate the modeling success of the machine learning algorithms on the dataset, and the performance scores obtained are shown in Table 4.
Table 4 Modeling of agricultural biomass potential for 2010–2020

| Number of features | Algorithm | Param | Train RMSE | Train MAE | Train R2 | Test RMSE | Test MAE | Test R2 | Wilcoxon rank h | Wilcoxon rank p |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 11 | Random Forest | leaf = 1 | 444,547 | 192,974 | 0.9903 | 1,175,017 | 519,010 | 0.9328 | + | 0.064 |
| 11 | Random Forest | leaf = 3 | 690,184 | 286,129 | 0.9767 | 1,273,628 | 546,327 | 0.9210 | − | 0.043 |
| 11 | KNN | k = 2 | 705,126 | 307,614 | 0.9757 | 1,287,123 | 602,813 | 0.9193 | + | 0.053 |
| 11 | KNN | k = 5 | 1,085,099 | 524,093 | 0.9423 | 1,396,914 | 679,348 | 0.9050 | − | 0.000 |
| 11 | Gradient Boosting | Default | 838,456 | 509,298 | 0.9657 | 1,405,078 | 738,112 | 0.9038 | − | 0.004 |
| 11 | XGBR | Default | 67,471 | 44,555 | 0.9998 | 1,270,879 | 575,100 | 0.9213 | + | 0.288 |
| 4 | Random Forest | leaf = 1 | 467,307 | 204,352 | 0.9893 | 1,273,104 | 552,269 | 0.9211 | + | 0.057 |
| 4 | Random Forest | leaf = 3 | 797,415 | 344,105 | 0.9689 | 1,342,716 | 584,232 | 0.9122 | − | 0.043 |
| 4 | KNN | k = 2 | 725,775 | 323,909 | 0.9742 | 1,314,307 | 616,668 | 0.9159 | + | 0.455 |
| 4 | KNN | k = 5 | 998,293 | 486,228 | 0.9513 | 1,250,935 | 611,301 | 0.9238 | − | 0.028 |
| 4 | Gradient Boosting | Default | 946,409 | 554,566 | 0.9563 | 1,482,701 | 758,590 | 0.8929 | − | 0.009 |
| 4 | XGBR | Default | 163,441 | 100,465 | 0.9987 | 1,461,690 | 654,355 | 0.8959 | + | 0.304 |
In Table 4, "Param" represents the parameters, leaf is the minimum number of samples per leaf, k is the number of neighbors, and h is the statistical test result (“+”: accept and “−”: reject). The p-value of a test is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. Bold values represent the best performance results for the Test set for each number of features, and "Default" represents the default parameters of the Scikit-learn library.
As shown in Table 4, the best performance values are obtained with Random Forest (leaf = 1) using all the features. The Wilcoxon rank-sum test shows no significant difference between the results of this model and the actual values at the significance level of p = 0.05. When only four features are used, the Random Forest (leaf = 1) and KNN (k = 5) methods give the best results, with R2 values of 0.9211 and 0.9238, respectively. However, for the KNN (k = 5) method, the Wilcoxon rank-sum test rejects the null hypothesis that there is no significant difference between the results of this model and the actual values at the significance level of p = 0.05. This provides strong evidence for the alternative hypothesis that the results of the model differ significantly from the actual values. Therefore, it is concluded that the Random Forest algorithm is more successful and reliable in the cross-validation analysis. The comparison of the predictions of the Random Forest algorithm (leaf = 1), which gave the most successful result in the tenfold cross-validation analysis, with the theoretical calculations is shown in Fig. 4.
In Fig. 4, the sample represents the record of each province for each year between 2010 and 2020, while the value represents the agricultural biomass potential of the relevant record. Random Forest predictions are consistent with the original curve, and the results are similar in the box plot given as an inset plot.

Analysis of agricultural biomass potential in Türkiye for the year 2021

The dataset created for the year 2021 includes a total of 162 samples in the vegetable and grain biomass types. The target value for each sample is the agricultural biomass potential associated with a province's climate data for the year 2021. For this dataset, validation is based on the 2021 predictions of the machine learning algorithms trained with the data of the previous years. The performance scores obtained for the 2021 predictions of the algorithms are shown in Table 5. Bold values represent the best performance results for the Test set for each number of features in Table 5.
Table 5 Modeling of agricultural biomass potential for 2021

| Number of features | Algorithm | Param | Train RMSE | Train MAE | Train R2 | Test RMSE | Test MAE | Test R2 | Wilcoxon rank h | Wilcoxon rank p |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 11 | Random Forest | leaf = 1 | 405,619 | 182,453 | 0.9920 | 1,497,720 | 702,730 | 0.9206 | + | 0.320 |
| 11 | Random Forest | leaf = 3 | 652,995 | 271,316 | 0.9792 | 1,665,375 | 751,171 | 0.9019 | + | 0.341 |
| 11 | KNN | k = 2 | 695,129 | 303,128 | 0.9765 | 1,329,730 | 563,796 | 0.9374 | + | 0.284 |
| 11 | KNN | k = 5 | 1,028,640 | 504,380 | 0.9485 | 1,545,439 | 765,054 | 0.9155 | + | 0.066 |
| 11 | Gradient Boosting | Default | 839,443 | 503,528 | 0.9657 | 1,498,785 | 821,458 | 0.9205 | + | 0.323 |
| 11 | XGBR | Default | 75,674 | 49,196 | 0.9997 | 1,194,713 | 637,291 | 0.9495 | + | 0.510 |
| 4 | Random Forest | leaf = 1 | 461,767 | 199,767 | 0.9896 | 1,265,119 | 631,247 | 0.9434 | + | 0.573 |
| 4 | Random Forest | leaf = 3 | 774,460 | 336,041 | 0.9708 | 1,519,927 | 709,140 | 0.9182 | + | 0.540 |
| 4 | KNN | k = 2 | 723,121 | 319,897 | 0.9745 | 1,115,222 | 585,629 | 0.9560 | + | 0.769 |
| 4 | KNN | k = 5 | 987,096 | 483,010 | 0.9525 | 1,270,772 | 663,559 | 0.9429 | + | 0.421 |
| 4 | Gradient Boosting | Default | 967,671 | 558,748 | 0.9544 | 1,776,675 | 881,727 | 0.8883 | + | 0.392 |
| 4 | XGBR | Default | 162,332 | 100,506 | 0.9987 | 1,492,456 | 718,437 | 0.9212 | + | 0.885 |
The most successful result, an R2 value of 0.9560, was obtained with the KNN algorithm (k = 2) when four features were used. Feature selection thus increased the success of future predictions and allowed them to be made with fewer features. In addition, the XGBR method obtained an R2 value of 0.9495 when all features were used, while Random Forest (leaf = 1) obtained an R2 value of 0.9434 when four features were used. The Wilcoxon rank-sum test results show no significant difference at the significance level of p = 0.05 between the results of all methods and the actual values, both when all features and when four features were used. The comparison of the predictions of the KNN algorithm (k = 2 and feature = 4), which gave the most successful result in the analysis for 2021, with the theoretical calculations is shown in Fig. 5.
As shown in Fig. 5, the KNN predictions are consistent with the original curve, and residual errors appear relatively low except for a few samples. The box plots for the model's results and the actual values, given as an inset plot, are also similar. Although the KNN method appears more successful for the prediction of 2021, Random Forest (leaf = 1) is generally more stable and successful when the cross-validation analysis and the predictions for 2021 are considered together. The comparison of the predictions of the Random Forest algorithm (leaf = 1 and feature = 4), which gave the most successful result on average across the 2021 prediction and the cross-validation analysis, with the theoretical calculations is shown in Fig. 6.
As shown in Fig. 6, Random Forest predictions are consistent with the original curve, and the results are similar in the box plot given as an inset plot.

Discussion

Regarding the results obtained, the original data and KNN prediction values for 2021 were obtained as a total of 308,888 TJ and 327,122 TJ for field products and a total of 77,002 TJ and 74,395 TJ for garden products, respectively.
For field products, the minimum value of the original data for 2021 was 29 TJ, and the KNN estimate was 39 TJ, while the maximum value was 39,223 TJ in the original data and 39,363 TJ in the prediction.
For garden products, the minimum value of the original data for 2021 was 0.8 TJ, and the KNN estimate was 0.5 TJ, while the maximum value was 8079 TJ in the original data and 6908 TJ in the prediction.
In addition, regarding average values, the original value for 2021 field products is 3813 TJ, the KNN prediction is 4038 TJ, and the relative error is 5.9%. The average original value for garden products is 950 TJ, the KNN prediction is 918 TJ, and the relative error is 3.3%. Accordingly, the difference between the predicted values and the original values is slight.
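The error percentages quoted above can be reproduced from the averages, assuming they are relative errors with respect to the original values. The helper name below is illustrative.

```python
def relative_error_pct(actual, predicted):
    """Absolute difference as a percentage of the actual value."""
    return abs(predicted - actual) / actual * 100.0


field_error = relative_error_pct(3813.0, 4038.0)    # average field values in TJ
garden_error = relative_error_pct(950.0, 918.0)     # average garden values in TJ

print(f"field: {field_error:.2f}%, garden: {garden_error:.2f}%")
```

Under this assumption the field figure comes out at about 5.9%, and the garden figure at roughly 3.37%, so the 3.3% quoted in the text appears to be truncated rather than rounded, or to have been computed from unrounded averages.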
Similarly, Avcıoğlu et al. (2019) obtained the agricultural biomass potential for Türkiye as 298,955 TJ for total field crops and 65,491 TJ for horticultural crops for 2015. Compared with these 2015 figures, the original 2021 data calculated in this study are 3.3% higher for field crops and 17.5% higher for garden crops. Senocak and Guner Goren (2022) used 16 years of data from agricultural and animal biomass resources for a small selected region in Türkiye and estimated the amount of energy using SVR-GIS. The theoretical total energy in the 3 years was estimated as 27,088.095, 27,165.993, and 26,862.373 TEP/year, respectively. Tumen Ozdil and Caliskan (2022) calculated a theoretical energy potential of 2,825,932 × \(10^{12}\) J from field crops and 752,031 × \(10^{12}\) J from garden plants, using the biomass potential in Türkiye between 2008 and 2018.

Conclusions

Türkiye is rich in agricultural diversity due to its geographical location and climatic conditions. This diversity generates crop residues from fields and gardens, which can be a significant energy source. Agricultural residues in Türkiye are generally used as animal feed or directly incinerated for heating. The agricultural sector in Türkiye has developed over the years through modern agricultural methods, and its importance has increased in recent years as the surrounding countries are at war. Many different support mechanisms have been established for farmers in Türkiye to develop the sector, including seed, fuel, fertilizer, energy, tractor, agricultural credit, and irrigation support. With the sector's development, the amount of agricultural products has increased every year, so the amount of product residue, which is a source of biomass, has also increased. Agricultural residue management is critical in biomass energy production and use because such residues are an alternative and environmentally friendly energy source. This study calculated the energy production potential obtained from agricultural products for Türkiye between 2010 and 2021. The total theoretical energy potential obtained from field and garden products has been calculated as 222,620 TJ and 61,737 TJ for 2010, respectively, and 308,888 TJ and 77,002 TJ for 2021.
Using the SGD method, this study determined the most informative features to be agricultural area, temperature, biomass type, and humidity. Different models and hyperparameters were tested with cross-validation analysis using four features for 2010–2020, and the best model for agricultural biomass potential was determined. In addition, based on the environmental and climate data covering the 11 years from 2010 to 2020, the agricultural biomass potential for 2021 was estimated using machine learning algorithms. When the results of the tenfold cross-validation and the predictions for 2021 were examined, it was concluded that the Random Forest algorithm (leaf = 1) was successful and stable. Accordingly, the Random Forest algorithm (leaf = 1) obtained R2 values of 0.9211 and 0.9434 for the tenfold cross-validation and the 2021 prediction, respectively, using four features (agricultural area, temperature, biomass type, and humidity). In addition, when four features were used, the KNN algorithm (k = 2) achieved the highest R2 value for the 2021 prediction, 0.9560. When all features were used, the Random Forest algorithm (leaf = 1) reached an R2 of 0.9328 in the tenfold cross-validation analysis, while the XGBR method reached an R2 of 0.9495 in the 2021 predictions.
These results show that machine learning methods can be used to model agricultural biomass potential. Using environmental and climatic values instead of the many variables from many categories required by the theoretical calculation makes the estimation much easier to perform. In addition, feature selection reduced these variables to four, making the estimation even easier while maintaining high accuracy.
With the proposed model, web and mobile applications that estimate biomass potential can be created using agricultural area information and meteorological data published online. Such applications can support decisions on industrial production, on updating the installed capacity of the biomass power plants in Türkiye, and on the regions where the collection of agricultural waste should be concentrated. They can also contribute to Türkiye's renewable energy policies and to achieving its zero-emission target.
The model developed in this study was created only according to Türkiye's climatic parameters and estimated the agricultural biomass potential. The model can be recreated with data and parameters for another country or other biomass resources.
This study offers a detailed perspective on improving energy management practices in agriculture. The use of machine learning algorithms is also expected to contribute to the literature and to support institutions and researchers that promote sustainable development in the sector. The efficient use of renewable energy resources in agriculture and artificial intelligence-supported energy efficiency management practices can help achieve sustainable agriculture goals.

Acknowledgements

The authors would like to thank the Turkish State Meteorological Service for providing meteorological data.

Declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
Géron A (2019) Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: concepts, tools, and techniques to build intelligent systems. O’Reilly Media, Sebastopol
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, London
Hatunoğlu EE (2010) The Impacts of biofuel policies on Agriculture Sector. İktisadi Sektörler ve Koordinasyon Genel Müdürlüğü, Planning Expertise Thesis 2010; Ankara
Kumaş K, Akyüz AÖ, Temiz D, Güngör A (2019) Biomass to energy: the potential of biogas in Turkey and World. J Voc Sci 8:70–77
Riva G, Foppapedretti E, Carolis C (2014) Handbook on Renewable Energy Sources-Biomass. Ener Supply
Metadata
Title: Detailed analysis of Türkiye's agricultural biomass-based energy potential with machine learning algorithms based on environmental and climatic conditions
Authors: I. Pence, K. Kumas, M. Siseci Cesmeli, A. Akyüz
Publication date: 21 March 2024
Publisher: Springer Berlin Heidelberg
Published in: Clean Technologies and Environmental Policy
Print ISSN: 1618-954X
Electronic ISSN: 1618-9558
DOI: https://doi.org/10.1007/s10098-024-02822-1