Skip to main content
Top

Dataset Similarity Estimation Using LLM-Based Metadata Embeddings

  • 2026
  • OriginalPaper
  • Chapter
Published in:

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

This chapter explores the use of large language models (LLMs) to estimate dataset similarity based on metadata embeddings. The study implements and evaluates several methods, including PromptEOL, LastToken, and AvgPooling, using various LLMs and encoder-based models. The research demonstrates that LLMs can generate high-quality metadata embeddings, improving the estimation of dataset similarity. The experimental results show that PromptEOL achieves the best performance across all models, indicating its effectiveness for this task. The study also analyzes the characteristics of the words generated by LLMs, providing insights into their behavior and potential for future improvements. The findings suggest that LLMs can be a valuable tool for enhancing the accessibility and usability of research artifacts, particularly datasets.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Business + Economics & Engineering + Technology"

Online-Abonnement

Springer Professional "Business + Economics & Engineering + Technology" gives you access to:

  • more than 130.000 books
  • more than 540 journals

from the following subject areas:

  • Automotive
  • Construction + Real Estate
  • Business IT + Informatics
  • Electrical Engineering + Electronics
  • Energy + Sustainability
  • Finance + Banking
  • Management + Leadership
  • Marketing + Sales
  • Mechanical Engineering + Materials
  • Surfaces + Materials Technology
  • Insurance + Risk


Secure your knowledge advantage now!

Springer Professional "Engineering + Technology"

Online-Abonnement

Springer Professional "Engineering + Technology" gives you access to:

  • more than 75.000 books
  • more than 390 journals

from the following specialised fileds:

  • Automotive
  • Business IT + Informatics
  • Construction + Real Estate
  • Electrical Engineering + Electronics
  • Energy + Sustainability
  • Mechanical Engineering + Materials
  • Surfaces + Materials Technology





 

Secure your knowledge advantage now!

Springer Professional "Business + Economics"

Online-Abonnement

Springer Professional "Business + Economics" gives you access to:

  • more than 100.000 books
  • more than 340 journals

from the following specialised fileds:

  • Construction + Real Estate
  • Business IT + Informatics
  • Finance + Banking
  • Management + Leadership
  • Marketing + Sales
  • Insurance + Risk



Secure your knowledge advantage now!

Title
Dataset Similarity Estimation Using LLM-Based Metadata Embeddings
Authors
Koichiro Ito
Shigeki Matsubara
Copyright Year
2026
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-95-4861-3_11
This content is only visible if you are logged in and have the appropriate permissions.
This content is only visible if you are logged in and have the appropriate permissions.

Premium Partner

    Image Credits
    Neuer Inhalt/© ITandMEDIA, Nagarro GmbH/© Nagarro GmbH, AvePoint Deutschland GmbH/© AvePoint Deutschland GmbH, AFB Gemeinnützige GmbH/© AFB Gemeinnützige GmbH, USU GmbH/© USU GmbH, Ferrari electronic AG/© Ferrari electronic AG