The VLDB Journal OnlineFirst articles

19-04-2024 | Regular Paper

Hyper-distance oracles in hypergraphs

We study point-to-point distance estimation in hypergraphs, where the query is parameterized by a positive integer s, which defines the required level of overlap for two hyperedges to be considered adjacent. To answer s-distance queries, we first …

Authors:
Giulia Preti, Gianmarco De Francisci Morales, Francesco Bonchi

17-04-2024 | Editorial

Special issue on “Machine learning and databases”

Authors:
Matthias Boehm, Nesime Tatbul

Open Access 12-04-2024 | Regular Paper

Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systems

Modern machine learning (ML) systems commonly use stochastic gradient descent (SGD) to train ML models. However, SGD relies on random data order to converge, which usually requires a full data shuffle. For in-DB ML systems and deep learning …

Authors:
Lijie Xu, Shuang Qiu, Binhang Yuan, Jiawei Jiang, Cedric Renggli, Shaoduo Gan, Kaan Kara, Guoliang Li, Ji Liu, Wentao Wu, Jieping Ye, Ce Zhang

12-04-2024 | Regular Paper

Data distribution tailoring revisited: cost-efficient integration of representative data

Data scientists often develop data sets for analysis by drawing upon available data sources. A major challenge is ensuring that the data set used for analysis adequately represents relevant demographic groups or other variables. Whether data is …

Authors:
Jiwon Chang, Bohan Cui, Fatemeh Nargesian, Abolfazl Asudeh, H. V. Jagadish

22-02-2024 | Editorial

Special issue: modern hardware

Authors:
Norman May, Spyros Blanas, Danica Porobic

16-02-2024 | Special Issue Paper

Speech-to-SQL: toward speech-driven SQL query generation from natural language question

Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets in our daily lives, since voice is the most popular and efficient way for human–computer interaction. This paper works toward designing more …

Authors:
Yuanfeng Song, Raymond Chi-Wing Wong, Xuefang Zhao

Open Access 13-02-2024 | Special Issue Paper

Assisted design of data science pipelines

When designing data science (DS) pipelines, end-users can get overwhelmed by the large and growing set of available data preprocessing and modeling techniques. Intelligent discovery assistants (IDAs) and automated machine learning (AutoML) …

Authors:
Sergey Redyuk, Zoi Kaoudi, Sebastian Schelter, Volker Markl

Open Access 13-02-2024 | Special Issue Paper

A learning-based framework for spatial join processing: estimation, optimization and tuning

The importance and complexity of spatial join operation resulted in the availability of many join algorithms, some of which are tailored for big-data platforms like Hadoop and Spark. The choice among them is not trivial and depends on different …

Authors:
Tin Vu, Alberto Belussi, Sara Migliorini, Ahmed Eldawy

Open Access 11-01-2024 | Special Issue Paper

Towards flexibility and robustness of LSM trees

Log-structured merge trees (LSM trees) are increasingly used as part of the storage engine behind several data systems, and are frequently deployed in the cloud. As the number of applications relying on LSM-based storage backends increases, the …

Authors:
Andy Huynh, Harshal A. Chaudhari, Evimaria Terzi, Manos Athanassoulis

Open Access 27-12-2023 | Special Issue Paper

DB-BERT: making database tuning tools “read” the manual

DB-BERT is a database tuning tool that exploits information gained via natural language analysis of manuals and other relevant text documents. It uses text to identify database system parameters to tune as well as recommended parameter values.

Author:
Immanuel Trummer

Open Access 22-12-2023 | Special Issue Paper

HPCache: memory-efficient OLAP through proportional caching revisited

Analytical engines rely on in-memory data caching to avoid storage accesses and provide timely responses by keeping the most frequently accessed data in memory. Purely frequency- and time-based caching decisions, however, are a proxy of the …

Authors:
Hamish Nicholson, Periklis Chrysogelos, Anastasia Ailamaki

01-12-2023 | Special Issue Paper

Morphtree: a polymorphic main-memory learned index for dynamic workloads

Modern database systems rely on indexes to accelerate data access. The recently proposed learned indexes can offer higher search performance with lower space costs than traditional indexes like B+-tree. We observe that existing main-memory learned …

Authors:
Yongping Luo, Peiquan Jin, Zhaole Chu, Xiaoliang Wang, Yigui Yuan, Zhou Zhang, Yun Luo, Xufei Wu, Peng Zou

29-11-2023 | Special Issue Paper

A multi-facet analysis of BERT-based entity matching models

State-of-the-art Entity Matching approaches rely on transformer architectures, such as BERT, for generating highly contextualized embeddings of terms. The embeddings are then used to predict whether pairs of entity descriptions refer to the same …

Authors:
Matteo Paganelli, Donato Tiano, Francesco Guerra

Open Access 23-11-2023 | Special Issue Paper

Givens rotations for QR decomposition, SVD and PCA over database joins

This article introduces FiGaRo, an algorithm for computing the upper-triangular matrix in the QR decomposition of the matrix defined by the natural join over relational data. FiGaRo’s main novelty is that it pushes the QR decomposition past the …

Authors:
Dan Olteanu, Nils Vortmeier, Đorđe Živanović

21-11-2023 | Special Issue Paper

Alfa: active learning for graph neural network-based semantic schema alignment

Semantic schema alignment aims to match elements across a pair of schemas based on their semantic representation. It is a key primitive for data integration that facilitates the creation of a common data fabric across heterogeneous data sources.

Authors:
Venkata Vamsikrishna Meduri, Abdul Quamar, Chuan Lei, Xiao Qin, Berthold Reinwald

Open Access 17-11-2023 | Special Issue Paper

AutoML in heavily constrained applications

Optimizing a machine learning pipeline for a task at hand requires careful configuration of various hyperparameters, typically supported by an AutoML system that optimizes the hyperparameters for the given training dataset. Yet, depending on the …

Authors:
Felix Neutatz, Marius Lindauer, Ziawasch Abedjan

16-11-2023 | Special Issue Paper

Efficient and robust active learning methods for interactive database exploration

There is an increasing gap between fast growth of data and the limited human ability to comprehend data. Consequently, there has been a growing demand of data management tools that can bridge this gap and help the user retrieve high-value content …

Authors:
Enhui Huang, Yanlei Diao, Anna Liu, Liping Peng, Luciano Di Palma

Open Access 14-11-2023 | Special Issue Paper

F-IVM: analytics over relational databases under updates

This article describes F-IVM, a unified approach for maintaining analytics over changing relational data. We exemplify its versatility in four disciplines: processing queries with group-by aggregates and joins; learning linear regression models …

Authors:
Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang