Common methods used for topic modeling have generally suffered from problems of overfitting, leading to diminished predictive performance, as well as a weakness in reconstructing sparse topic structures that involve only a few critical words to …
With digitisation globally on the rise, corporates are compelled to better understand the usage of their websites. In doing so, corporates will be empowered to better understand consumers, and make necessary adjustments to ultimately improve the …
This study explores the application of data mining techniques to analyse factors influencing university choice and predict enrolment trends in Kazakhstan. For this purpose, methods of analysis (multiple correlation and regression analysis, factor …
Course recommendation (CD) is essential for success in a student’s educational journey. Due to variations in students’ knowledge systems, it might be difficult to select the course content from online educational platforms. This problem is …
A novel probability distribution, the Generalized Alpha Power Inverted Weibull (GAPIW) distribution, is derived from the generalization of the $\alpha$-power family and compounded with the inverted Weibull distribution. The researchers looked …
The xgamma distribution was first introduced by Sen et al. [1] as an alternative distribution to the exponential model. The xgamma distribution exhibits a bathtub-shaped hazard rate function, so it is suitable for many lifetime phenomena. In this …
Recognizing and reducing risk is a major part of Supply Chain Management (SCM). Several companies are invested in Supply Chain Risk Management (SCRM) and they have the knowledge about the procurement occupancies within their companies and take …
One of the key objectives of statistics is to provide a model compatible with the data generated by an unknown random process. Often, it happens that the unknown process is intractable, and no prior data or information associated with the unknown …
We live in a world where everything is connected to online social media platforms, and people use social media networks such as Facebook, Twitter, Instagram, and WhatsApp. In the present scenario, working women, celebrities, sports persons …
The advancement of technology has increased competitiveness, especially in the manufacturing industry. Alongside Statistical Process Control (SPC), capability indices are tools used to measure the quality of processes and are useful for establishing …
Two known characteristics of the distribution of stock returns (price fluctuations) and, more recently, the distribution of financial asset volumes are power laws and scaling. These power laws can be viewed as the asymptotic behaviour of …
In light of the rapid technological advancements witnessed in recent decades, numerous disciplines have been inundated with voluminous datasets characterized by multimodality, heavy-tailed distributions, and prevalent missing information.
Data science often employs discrete probability distributions to model and analyze various phenomena. These distributions are particularly useful when dealing with data that can be categorized into distinct outcomes or events. This study presents …
The rise of mobile technology has significantly transformed numerous aspects of our everyday lives, especially within food delivery services. The investigation aims to explore the food delivery mobile apps (FDMA) satisfaction (SAT) and the …
This paper presents an empirical study of the impact of supply chain management on small-scale integrated commercial agriculture, focusing on the moderating role of impediments and obligations to offer solutions for agricultural …
This paper introduces a Modified Lindley distribution using a convex combination of exponential and gamma distribution. The fundamental properties of the proposed distribution such as the shapes of the distribution, moments, mean, variance …
Hyperspectral image classification involves assigning pixels or regions within a hyperspectral image to specific classes or categories based on the spectral information captured across multiple bands. Traditional methods face several challenges …
This paper presents a novel nonlinear binary classification method, namely the kernel-free reduced quadratic surface support vector machine with 0-1 loss function and $L_p$-norm regularization ($L_p$-RQSSVM$_{0/1}$). It …
Nature-inspired algorithms (NIA) have proven to be potential tools for solving intricate optimization problems and for aiding the development of better computational techniques. In recent years, these algorithms have raised considerable interest to …
Advancements in genome sequencing technologies have significantly increased the availability of genomic data. The use of machine learning models to predict the pathogenicity or clinical significance of genetic mutations is crucial. However …
Developing effective methodologies for territory design and relativity estimation is crucial in auto insurance rate filings and reviews. This study introduces a novel approach utilizing fuzzy clustering to enhance the design process of territories …
Carbon emissions disclosure (CED) has become a pivotal aspect of corporate sustainability efforts, reflecting a company’s commitment to environmental responsibility and accountability. This study delves into the complex connection between CED and …
Partial label learning (PLL) is a particular problem setting within weakly supervised learning. In PLL, each sample corresponds to a candidate label set in which only one label is true. However, in some practical application scenarios, the …
In this paper, a nonparametric kernel method is introduced to estimate the well-known overlapping coefficient, Matusita $\rho(X,Y)$, between two random variables $X$ and $Y$. Due to the complexity of finding the formula …
In this paper we first define the class of Generalized Inflated Power Series Distributions (GIPSDs), which contains the inflated discrete distributions most often seen in practice as special cases. We describe the hitherto unknown exponential family …
Agriculture is the primary source of food, fuel, and raw materials and is vital to any country’s economy. Farmers, the backbone of agriculture, primarily rely on instinct to determine what crops to plant in any given season. They are comfortable …
With the widespread use of social networks, detecting the topics discussed on these platforms has become a significant challenge. Current approaches primarily rely on frequent pattern mining or semantic relations, often neglecting the structure of …
In this article, we propose applying the quadratic rank transmutation map approach to the shifted Lindley distribution to improve the existing distribution further. An additional skewness parameter $\lambda$ is incorporated to transmute the distribution.
Automated detection of plant diseases is crucial as it simplifies the task of monitoring large farms and identifies diseases at their early stages to mitigate further plant degradation. Besides the decline in plant health, reduced production …
In light of the escalating privacy risks in the big data era, this paper introduces an innovative model for the anonymization of big data streams, leveraging in-memory processing within the Spark framework. The approach is founded on the principle …
In the era of big data, with the increase in volume and complexity of data, the main challenge is how to use big data while preserving the privacy of users. This study was conducted with the aim of finding a solution to this challenge. In this …
In this paper, we propose a new model by adding an additional parameter to the baseline distributions for modeling claim and risk data used in actuarial and financial studies. The new model is called alpha power transformed exponential Poisson …
Alcohol's dehydrating effects can cause vocal cords to dry out, potentially causing temporary voice changes and increasing the risk of vocal strain or damage. Short-term changes in pitch, volume, and alcohol consumption can cause voice clarity …
Metric learning consists of designing adaptive distance functions that are well-suited to a specific dataset. Such tailored distance functions aim to deliver superior results compared to standard distance measures while performing machine learning …
The Inverse Rayleigh distribution has many applications in the area of reliability studies. It is regarded as a model for a lifetime random variable. It is essential to develop an efficient goodness-of-fit test for this distribution. In this …
The Medical Imaging Query Response System is among the most challenging concepts in the medical field. It requires a significant amount of effort to organize and comprehend the various representations of the human body. Additionally, the system …
In this work, we propose a novel hybrid method for the estimation of regression models, which is based on a combination of LASSO-type methods and smooth transition (STR) random forests. Tree-based regression models are known for their flexibility …
In recent years, generative artificial intelligence has been developing rapidly. In the image domain, image generation models based on deep learning have made remarkable achievements. Early frameworks for image generation models were dominated by …
Nowadays, with the growth of emerging technologies, increased attention has been paid to the classification of privacy-preserved medical data and development of various privacy-preserving models for the promotion of online medical pre-diagnosis …
In this study, we use a novel approach to explore possible connections between foreign exchange and stock returns using Turkish financial data from 2005 to 2022. Our method involves a two-stage technique. The first stage begins by decomposing …
Modernization in the healthcare industry is happening with the support of artificial intelligence and blockchain technologies. Collecting healthcare data is done through any Google survey from different governing bodies and data available on the …
Early detection of dementia is a great concern for physicians. That is why physicians make use of multimodal data to accomplish this. The baseline visit data of the patients are mainly utilized for this task. Modern …
Real estate significantly contributes to the broader stock market and garners substantial attention from individual households to the overall country’s economy. Predicting real estate trends holds great importance for investors, policymakers, and …
Generalized linear mixed effect models (GLMEMs) are widely applied for the analysis of correlated non-Gaussian data such as those found in longitudinal studies. On the other hand, the Cox (proportional hazards, PHs) and the accelerated failure …
A mathematical approach to developing new distributions is reviewed. The method, which consists of integration and the concept of a normalizing constant, allows for primitive interjection of new parameter(s) in an existing distribution to form new …
In the past decade, deep learning has greatly increased the complexity of industrial production intelligence by virtue of its powerful learning capability. At the same time, it has also brought security challenges to the field of industrial …
Search and recommendation are two essential features of any e-commerce website for finding and purchasing a specific product. Visual Search is a promising and quick method in comparison to a textual-based search method. Hence, the objective of …
In this article, a new method for the diagnosis of Alzheimer’s disease in the mild stage is presented, based on combining the characteristics of EEG signals and MRI images. The brain signal is recorded in four modes: closed-eyes, open eye …
Safer sexual practice is essential for improving women’s reproductive and sexual health outcomes. The goal of this study is to identify the contributing factors influencing safer sexual negotiations (SSN) through the application of machine …
This article introduces a three-parameter extension of the Generalized Rayleigh distribution called the half-logistic Generalized Rayleigh distribution, which has the Generalized Rayleigh and Rayleigh distributions as submodels. The proposed model is …
Data clustering is one of the main issues in optimization. It is the process of partitioning a set of items into several groups. Items within each group have the greatest similarity to one another and the least similarity to items in other groups.
Agriculture, engineering, public health, sociology, psychology, and epidemiology are just a few of the numerous disciplines that find analysis and modeling of zero-truncated count data to be of paramount importance. Very recently, researchers have …
In this paper, we propose the exponential ratio-type estimator for the elevated estimation of population mean, implying one auxiliary variable in stratified random sampling using the conventional ratio and Bahl and Tuteja exponential ratio-type …
In the era of big data, preserving data privacy has become paramount due to the sheer volume and sensitivity of the information being processed. This research is dedicated to safeguarding data privacy through a novel data sanitization approach …
Panel count data refers to the information collected in studies focusing on recurrent events, where subjects are observed only at specific time points. If these study subjects are exposed to recurrent events of several types, we obtain panel count …
In this research, we introduce an innovative automated resume screening approach that leverages advanced Natural Language Processing (NLP) technology, specifically the Bidirectional Encoder Representations from Transformers (BERT) language model …
Traditionally, in cognitive modeling for binary decision-making tasks, stochastic differential equations, particularly a family of diffusion decision models, are applied. These models suffer from difficulties in parameter estimation and …
This paper introduces a new family of distributions called the hyperbolic tangent (HT) family. The cumulative distribution function of this model is defined using the standard hyperbolic tangent function. The fundamental properties of the …
The main objective of this paper is to forecast the realized volatility (RV) of the Bitcoin futures (BTCF) market. To serve our purpose, we propose an augmented heterogeneous autoregressive (HAR) model to consider the information on time-varying jumps …
In this paper, we propose and investigate a novel approach for generating probability distributions. The novel method is known as the SMP transformation technique. By using the SMP transformation technique, we have developed a new model of the …
In finance, various stochastic models have been used to describe price movements of financial instruments. Following the seminal work of Robert Merton, several jump-diffusion models have been proposed for option pricing and risk management. In …
Many items fail instantaneously or early in life-testing experiments, mainly in electronic parts and clinical trials, due to faulty construction, inferior quality, or non-response to treatments. We record the observed lifetime as zero or near …
In reliability literature and engineering applications, stress-strength (SS) models are particularly important. This paper aims to estimate the SS reliability for an inverse Weibull distribution having the same shape parameters but different scale …