International Journal of Data Science and Analytics OnlineFirst articles

02-08-2022 | Regular Paper

Attention-like feature explanation for tabular data

A new method for local and global explanation of the machine learning black-box model predictions by tabular data is proposed. It is implemented as a system called AFEX (Attention-like Feature EXplanation) and consisting of two main parts. The …

Andrei V. Konstantinov, Lev V. Utkin

Open Access 30-07-2022 | Regular Paper

Optimizing graph layout by t-SNE perplexity estimation

Perplexity is one of the key parameters of dimensionality reduction algorithm of t-distributed stochastic neighbor embedding (t-SNE). In this paper, we investigated the relationship of t-SNE perplexity and graph layout evaluation metrics including …

Chun Xiao, Seokhee Hong, Weidong Huang

Open Access 25-07-2022 | Regular Paper

Data-driven versus a domain-led approach to k-means clustering on an open heart failure dataset

Domain-driven data mining of health care data poses unique challenges. The aim of this paper is to explore the advantages and the challenges of a ‘domain-led approach’ versus a data-driven approach to a k-means clustering experiment. For the …

A. Jasinska-Piadlo, R. Bond, P. Biglarbeigi, R. Brisk, P. Campbell, F. Browne, D. McEneaneny

24-07-2022 | Regular Paper

Semantic enhanced Markov model for sequential E-commerce product recommendation

To model sequential relationships between items, Markov Models build a transition probability matrix $$\mathbf{P}$$ of size $$n \times n$$, where n represents number of states (items) and each matrix entry $$p_{(i,j)}$$ …

Mahreen Nasir, C. I. Ezeife

20-07-2022 | Regular Paper

PARAS: a parameter space-driven approach for complete association rule mining

To enable efficient association rule mining, existing techniques prestore intermediate results as itemsets. However, the actual rule generation is still performed at query-time. The response time thus tends to remain unacceptably long for …

Xika Lin, Abhishek Mukherji, Elke A. Rundensteiner, Matthew O. Ward

Open Access 16-07-2022 | Regular Paper

Bounding open space risk with decoupling autoencoders in open set recognition

One-vs-Rest (OVR) classification aims to distinguish a single class of interest (COI) from other classes. The concept of novelty detection and robustness to dataset shift becomes crucial in OVR when the scope of the rest class is extended from the …

Max Lübbering, Michael Gebauer, Rajkumar Ramamurthy, Christian Bauckhage, Rafet Sifa

14-07-2022 | Regular Paper

ScholarRec: a scholars’ recommender system that combines scholastic influence and social collaborations in academic social networks

Identifying and recommending influential scholars is one of the leading applications of scholarly data analytic. The existing methods to identify influential scholars focus on scholastic influence or social collaborations. In the former approach …

Mitali Desai, Rupa G. Mehta, Dipti P. Rana

11-07-2022 | Regular Paper

Domain-specific text dictionaries for text analytics

We investigate the use of sentiment dictionaries to estimate sentiment for large document collections. Our goal in this paper is a semiautomatic method for extending a general sentiment dictionary for a specific target domain in a way that …

Andrea Villanes, Christopher G. Healey

Open Access 10-07-2022 | Regular Paper

Multilingual hope speech detection in English and Dravidian languages

Recent work on language technology has aimed to identify negative language such as hate speech and cyberbullying as well as improve offensive language detection to mediate social media platforms. Most of these systems rely on using machine …

Bharathi Raja Chakravarthi

18-06-2022 | Regular Paper

Sample-selection-adjusted random forests

A predictive model that is trained with non-randomly selected samples can offer biased predictions for the population. This paper discusses when non-random selection is a problem. For the applications in which it is a problem, this paper presents …

Jonathan Cook

14-06-2022 | Regular Paper

Data-driven analytics of COVID-19 ‘infodemic’

The rampant of COVID-19 infodemic has almost been simultaneous with the outbreak of the pandemic. Many concerted efforts are made to mitigate its negative effect to information credibility and data legitimacy. Existing work mainly focuses on …

Minyu Wan, Qi Su, Rong Xiang, Chu-Ren Huang

10-06-2022 | Review

A survey on event and subevent detection from microblog data towards crisis management

Social media data analysis is a popular research domain since the last decade. Detecting the events and sub-events from social media posts that require special attention is one of the key research problem in this domain with wide range of …

Shatadru Roy Chowdhury, Srinka Basu, Ujjwal Maulik

06-06-2022 | Regular Paper

Exo-SIR: an epidemiological model to analyze the impact of exogenous spread of infection

Epidemics like Covid-19 and Ebola have impacted people’s lives significantly. The impact of mobility of people across the countries or states in the spread of epidemics has been significant. The spread of disease due to factors local to the …

Nirmal Kumar Sivaraman, Manas Gaur, Shivansh Baijal, Sakthi Balan Muthiah, Amit Sheth

Open Access 27-05-2022 | Original Paper

COVID-19 and 5G conspiracy theories: long term observation of a digital wildfire

The COVID-19 pandemic has severely affected the lives of people worldwide, and consequently, it has dominated world news since March 2020. Thus, it is no surprise that it has also been the topic of a massive amount of misinformation, which was …

Johannes Langguth, Petra Filkuková, Stefan Brenner, Daniel Thilo Schroeder, Konstantin Pogorelov

26-05-2022 | Regular Paper

Using big data and federated learning for generating energy efficiency recommendations

Internet of Things (IoT) devices are becoming popular solutions for smart home and office environments and contribute the most to energy efficiency. The most common implementation of such solutions relies on smart home systems that are hosted on …

Iraklis Varlamis, Christos Sardianos, Christos Chronis, George Dimitrakopoulos, Yassine Himeur, Abdullah Alsalemi, Faycal Bensaali, Abbes Amira

26-05-2022 | Regular Paper

Interactive planning of revisiting-free itinerary for signed-for delivery

The trend of online shopping has given rise to the growth of signed-for delivery services. Signed-for delivery is a reliable way of getting proof of delivery that ensures your parcel must be signed for upon its arrival with the recipient. However …

Lo Pang-Yun Ting, Shan-Yun Teng, Szu-Chan Wu, Kun-Ta Chuang

14-05-2022 | Regular Paper

Urban fire station location planning using predicted demand and service quality index

In this article, we propose a systematic approach for fire station location planning. We develop machine learning models, based on Random Forest and Extreme Gradient Boosting, for demand prediction and utilize the models further to define a …

Arnab Dey, Andrew Heger, Darin England

Open Access 06-05-2022 | Regular Paper

An iterative topic model filtering framework for short and noisy user-generated data: analyzing conspiracy theories on twitter

Conspiracy theories have seen a rise in popularity in recent years. Spreading quickly through social media, their disruptive effect can lead to a biased public view on policy decisions and events. We present a novel approach for LDA-pre-processing …

Gillian Kant, Levin Wiebelt, Christoph Weisser, Krisztina Kis-Katos, Mattias Luber, Benjamin Säfken

Open Access 30-04-2022 | Regular Paper

Explainability of the COVID-19 epidemiological model with nonnegative tensor factorization

The world is witnessing the devastating effects of the COVID-19 pandemic. Each country responded to contain the spread of the virus in the early stages through diverse response measures. Interpreting these responses and their patterns globally is …

Thirunavukarasu Balasubramaniam, David J. Warne, Richi Nayak, Kerrie Mengersen

04-04-2022 | Regular Paper

Eigenvalue analysis of SARS-CoV-2 viral load data: illustration for eight COVID-19 patients

Eigenvalue analysis is an important tool in economics and nonlinear physics to analyze industrial processes and instability phenomena, respectively. A model-based eigenvalue analysis of viral load data from eight symptomatic COVID-19 patients was …

Till D. Frank