International Journal of Data Science and Analytics OnlineFirst articles

08.09.2022 | Regular Paper

What can scatterplots teach us about doing data science better?

A scatterplot is often the graph of choice for displaying the relationship between two variables. Scatterplots are useful for exploratory analysis, but can do much more than just identifying correlations. As data sets get larger and more complex …

Written by:
Wilson Wen Bin Goh, Reuben Jyong Kiat Foo, Limsoon Wong

07.09.2022 | Regular Paper

Performance measure for sparse recovery algorithms in compressed sensing perspective

The sparse signal recovery is of great interest in compressed sensed data recovery. Many sparse recovery algorithms were developed in the last decade. However, selection of an appropriate recovery algorithm is an important matter of concern in …

Written by:
V. Vivekanand, Deepak Mishra

Open Access 03.09.2022 | Regular Paper

Pull–push: a measure of over- or underpersonalization in recommendation

A recommender system imposes differences between users, by presenting to them different recommendation lists, which they respond to, resulting in different “reaction” lists. Comparison of the differences in the recommendation and reaction lists …

Written by:
Gebrekirstos G. Gebremeskel, Arjen P. de Vries

03.09.2022 | Regular Paper

Deep neural network-based spatiotemporal heterogeneous data reconstruction for landslide detection

Landslides can pose serious threats to life and cause property damage. In the landslide prediction system, environmental information can be collected through sensors to detect the possibility of landslide occurrences. However, the data collected …

Written by:
Darmawan Utomo, Liang-Cheng Hu, Pao-Ann Hsiung

02.09.2022 | Regular Paper

Reservoir consisting of diverse dynamical behaviors and its application in time series classification

Time series classification (TSC) has been tackled through a wide range of algorithms. Seminal reservoir computing (s-RC) is composed of a recurrent neural network with random parameters that serves as a dynamical memory and is a well-known end-to-end …

Written by:
Mohammad Modiri, Mohammad Mehdi Ebadzadeh, Mohammad Mehdi Homayounpour

01.09.2022 | Regular Paper

Dbias: detecting biases and ensuring fairness in news articles

Because of the increasing use of data-centric systems and algorithms in machine learning, the topic of fairness is receiving a lot of attention in the academic and broader literature. This paper introduces Dbias (https://pypi.org/project/Dbias/ …

Written by:
Shaina Raza, Deepak John Reji, Chen Ding

Open Access 31.08.2022 | Regular Paper

From data to interpretable models: machine learning for soil moisture forecasting

Soil moisture is critical to agricultural business, ecosystem health, and certain hydrologically driven natural disasters. Monitoring data, though, is prone to instrumental noise, wide-ranging extrema, and nonstationary response to rainfall where …

Written by:
Aniruddha Basak, Kevin M. Schmidt, Ole Jakob Mengshoel

Open Access 30.08.2022 | Regular Paper

Graph neural networks for multivariate time series regression with application to seismic data

Machine learning, with its advances in deep learning, has shown great potential in analyzing time series. In many scenarios, however, additional information that can potentially improve the predictions is available. This is crucial for data that …

Written by:
Stefan Bloemheuvel, Jurgen van den Hoogen, Dario Jozinović, Alberto Michelini, Martin Atzmueller

30.08.2022 | Review

Toward a taxonomy for 2D non-paired General Line Coordinates: a comprehensive survey

Multidimensional data visualization is one of the primary foundations supporting data analysis used for understanding the hidden relationships between items and dimensions of complex data. The line-based visualization techniques are a fundamental …

Written by:
Antonella S. Antonini, Leandro Luque, María Luján Ganuza, Silvia M. Castro

Open Access 23.08.2022 | Regular Paper

DeepTLF: robust deep neural networks for heterogeneous tabular data

Although deep neural networks (DNNs) constitute the state of the art in many tasks based on visual, audio, or text data, their performance on heterogeneous, tabular data is typically inferior to that of decision tree ensembles. To bridge the gap …

Written by:
Vadim Borisov, Klaus Broelemann, Enkelejda Kasneci, Gjergji Kasneci

23.08.2022 | Regular Paper

Improved robust nonparallel support vector machines

Nonparallel Support Vector Machine (NPSVM) is a binary classification approach that combines the advantages of both support vector machine (SVM) and Twin SVM (TWSVM). It finds two nonparallel hyperplanes by solving two optimization problems such …

Written by:
Ali Sahleh, Maziar Salahi

20.08.2022 | Regular Paper

The pattern frequency distribution theory: a mathematic establishment toward rational and reliable pattern mining

In big data science, classic frequent pattern mining is fundamental to various pattern mining applications. Extensive research on it has been undertaken for nearly 30 years, yet no reliable mining approach has emerged. One of the main …

Written by:
Tongyuan Wang

16.08.2022 | Regular Paper

A novel recommendation system comprising WNMF with graph-based static and temporal similarity estimators

User similarity plays a crucial role in Collaborative Filtering (CF)-based Recommendation Systems (RS). CF uses a user-item matrix to estimate this similarity. However, the user-item matrix-based similarity performs poorly during …

Written by:
Anshul Gupta, Pravin Shrinath

02.08.2022 | Regular Paper

Attention-like feature explanation for tabular data

A new method is proposed for local and global explanation of black-box machine learning model predictions on tabular data. It is implemented as a system called AFEX (Attention-like Feature EXplanation), consisting of two main parts. The …

Written by:
Andrei V. Konstantinov, Lev V. Utkin

Open Access 30.07.2022 | Regular Paper

Optimizing graph layout by t-SNE perplexity estimation

Perplexity is one of the key parameters of the dimensionality reduction algorithm t-distributed stochastic neighbor embedding (t-SNE). In this paper, we investigated the relationship between t-SNE perplexity and graph layout evaluation metrics including …

Written by:
Chun Xiao, Seokhee Hong, Weidong Huang

Open Access 25.07.2022 | Regular Paper

Data-driven versus a domain-led approach to k-means clustering on an open heart failure dataset

Domain-driven data mining of health care data poses unique challenges. The aim of this paper is to explore the advantages and the challenges of a ‘domain-led approach’ versus a data-driven approach to a k-means clustering experiment. For the …

Written by:
A. Jasinska-Piadlo, R. Bond, P. Biglarbeigi, R. Brisk, P. Campbell, F. Browne, D. McEneaneny

24.07.2022 | Regular Paper

Semantic enhanced Markov model for sequential E-commerce product recommendation

To model sequential relationships between items, Markov Models build a transition probability matrix $$\mathbf{P}$$ of size $$n \times n$$, where $$n$$ represents the number of states (items) and each matrix entry $$p_{(i,j)}$$ …

Written by:
Mahreen Nasir, C. I. Ezeife

14.07.2022 | Regular Paper

ScholarRec: a scholars’ recommender system that combines scholastic influence and social collaborations in academic social networks

Identifying and recommending influential scholars is one of the leading applications of scholarly data analytics. The existing methods to identify influential scholars focus on scholastic influence or social collaborations. In the former approach …

Written by:
Mitali Desai, Rupa G. Mehta, Dipti P. Rana

11.07.2022 | Regular Paper

Domain-specific text dictionaries for text analytics

We investigate the use of sentiment dictionaries to estimate sentiment for large document collections. Our goal in this paper is a semiautomatic method for extending a general sentiment dictionary for a specific target domain in a way that …

Written by:
Andrea Villanes, Christopher G. Healey

14.06.2022 | Regular Paper

Data-driven analytics of COVID-19 ‘infodemic’

The rampant spread of the COVID-19 infodemic has been almost simultaneous with the outbreak of the pandemic. Many concerted efforts have been made to mitigate its negative effect on information credibility and data legitimacy. Existing work mainly focuses on …

Written by:
Minyu Wan, Qi Su, Rong Xiang, Chu-Ren Huang