
Automatic Annotation of Clinicaltrials.gov Entities Using Large Language Models

  • 2025
  • OriginalPaper
  • Chapter


Abstract

This chapter explores the automatic annotation of clinical trial entities using large language models, focusing on the extraction of biomedical entities from the free-text eligibility criteria of ClinicalTrials.gov. It introduces a dataset of 497,812 documents containing over 4.7 million sentences and 4.6 million entities, categorized into types such as diseases, chemicals, and interventions. The annotations are generated by pseudo-labeling, and each carries a confidence score intended to make entity recognition more accurate and reliable. Label inconsistencies encountered during annotation were resolved with a structured mapping strategy that enforces uniform labels. The entity distribution and document-level analysis of the dataset are relevant to downstream tasks including named entity recognition (NER), information extraction, and text classification. The dataset is publicly available as a resource for biomedical NLP and for AI-driven healthcare data processing.
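
The abstract does not spell out the mapping strategy or the annotation schema, so the following is only a minimal sketch of how label normalization and confidence filtering could work in principle. The raw label variants, the field names (`text`, `label`, `confidence`), the `0.5` threshold, and the helper functions are all assumptions made for illustration; only the three canonical entity types come from the abstract.

```python
from collections import Counter

# Hypothetical mapping from raw, inconsistent model labels to canonical types.
# The canonical types (disease, chemical, intervention) are named in the
# abstract; the raw variants on the left are invented for this sketch.
LABEL_MAP = {
    "disease": "Disease",
    "diseases": "Disease",
    "disorder": "Disease",
    "chemical": "Chemical",
    "drug": "Chemical",
    "intervention": "Intervention",
    "procedure": "Intervention",
}


def normalize_label(raw):
    """Return the canonical label for a raw label, or None if it is unmapped."""
    return LABEL_MAP.get(raw.strip().lower())


def clean_annotations(annotations, min_confidence=0.5):
    """Keep annotations whose label maps to a canonical type and whose
    confidence is at least min_confidence (an assumed threshold)."""
    cleaned = []
    for ann in annotations:
        label = normalize_label(ann["label"])
        if label is None or ann["confidence"] < min_confidence:
            continue
        # Copy the annotation and overwrite only the label field.
        cleaned.append({**ann, "label": label})
    return cleaned


# Made-up pseudo-labels, not taken from the dataset.
pseudo_labels = [
    {"text": "type 2 diabetes", "label": "Diseases", "confidence": 0.93},
    {"text": "metformin", "label": " drug ", "confidence": 0.88},
    {"text": "bypass surgery", "label": "Procedure", "confidence": 0.41},
    {"text": "age", "label": "demographic", "confidence": 0.97},
]

cleaned = clean_annotations(pseudo_labels)
counts = Counter(ann["label"] for ann in cleaned)
print(counts)
```

In this sketch the low-confidence "bypass surgery" entry and the unmapped "demographic" label are both dropped, leaving one disease and one chemical entity. Whether the chapter discards or remaps such cases is not stated in the abstract.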


Title
Automatic Annotation of Clinicaltrials.gov Entities Using Large Language Models
Authors
Pouyan Nahed
Sepideh Farivar
Kazem Taghva
Copyright Year
2025
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-96-6929-5_35