
01.09.2018 | Regular Paper | Issue 4/2018 | Open Access

International Journal of Data Science and Analytics 4/2018

Inferring variable labels using outlines of data in Data Jackets by considering similarity and co-occurrence

Journal:
International Journal of Data Science and Analytics > Issue 4/2018
Authors:
Teruaki Hayashi, Yukio Ohsawa
Important notes
This paper is an extended version of “Matrix-based Method for Inferring Variable Labels Using Outlines of Data in Data Jackets,” given as a long presentation at PAKDD 2017.

Abstract

The Data Jacket (DJ) is a technique for sharing information about data by summarizing the data in natural language while the data themselves remain hidden. In DJs, variables are described by variable labels (VLs), which are the names or meanings of variables, and the utility of data is estimated through combinations of VLs. However, DJs do not always contain VLs, because the rules for describing DJs cannot compel data owners to enter all relevant information. When some DJs lack VLs, their combinations cannot be found through string matching of VLs, even if the DJs could in fact be combined. In this paper, we propose a method for inferring VLs in DJs from the text of their outlines. We focus on the similarity among DJ outlines and create two models for inferring VLs: one based on the similarity of the outlines and one based on the co-occurrence of the VLs. We implemented these models using a similarity matrix and a co-occurrence matrix, and applied the proposed method to two types of test data: DJs of public data and DJs of business data. Experimental results show that our method significantly outperforms a technique that uses only string matching of VLs.
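The abstract does not give the models' formulas, so the following Python sketch is only one plausible reading of "a similarity matrix and a co-occurrence matrix," not the authors' method. It assumes (hypothetically) that each DJ is an outline string plus a set of VLs. It uses bag-of-words cosine similarity between outlines as the similarity matrix and counts pairwise VL co-occurrence across DJs as the co-occurrence matrix. The two normalized scores are then mixed with a weight `alpha`. The names `infer_vls`, `alpha`, `top_k`, and the example DJs are invented for illustration.

```python
# Hypothetical sketch: NOT the paper's exact models, only an illustration of
# combining outline similarity with VL co-occurrence to rank candidate VLs.
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (simplifying assumption).
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(outlines):
    # S[i][j]: outline similarity of DJ i and DJ j; the diagonal is 0
    # so a DJ never votes for its own VLs.
    bows = [Counter(tokenize(o)) for o in outlines]
    n = len(bows)
    return [[0.0 if i == j else cosine(bows[i], bows[j]) for j in range(n)]
            for i in range(n)]


def cooccurrence(vl_sets):
    # C[(u, v)]: number of DJs that list both VL u and VL v (u != v).
    c = Counter()
    for vls in vl_sets:
        for u in vls:
            for v in vls:
                if u != v:
                    c[(u, v)] += 1
    return c


def normalize(scores):
    # Scale scores into [0, 1] by dividing by the maximum score.
    top = max(scores.values(), default=0.0)
    return {k: (v / top if top else 0.0) for k, v in scores.items()}


def infer_vls(target, outlines, vl_sets, alpha=0.5, top_k=3):
    S = similarity_matrix(outlines)
    C = cooccurrence(vl_sets)

    # Similarity model: a VL gains the outline similarity of every DJ listing it.
    sim = Counter()
    for j, vls in enumerate(vl_sets):
        for v in vls:
            sim[v] += S[target][j]

    # Co-occurrence model: propagate similarity scores to co-occurring VLs.
    co = Counter({v: 0.0 for v in sim})
    for (u, v), n in C.items():
        co[v] += sim[u] * n

    sim_n, co_n = normalize(sim), normalize(co)
    # Weighted mix of both models; VLs the target already has are skipped.
    combined = {v: alpha * sim_n[v] + (1 - alpha) * co_n.get(v, 0.0)
                for v in sim_n if v not in vl_sets[target]}
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [v for v, _ in ranked[:top_k]]


if __name__ == "__main__":
    outlines = [
        "hourly weather observations temperature rainfall in Tokyo",
        "daily weather records temperature and rainfall for Osaka",
        "retail sales transactions by store and product",
        "weather forecast temperature wind speed",
    ]
    vl_sets = [set(), {"temperature", "rainfall", "date"},
               {"store_id", "price", "date"}, {"temperature", "wind_speed"}]
    print(infer_vls(0, outlines, vl_sets))  # ['temperature', 'date', 'rainfall']
```

In this toy example the first DJ has no VLs, so string matching of VLs cannot link it to anything. The similar weather outlines nevertheless push `temperature` to the top, and the retail VLs stay out of the top three. Raw-token cosine is a deliberate simplification. A practical version would likely need stop-word handling (here "and" contributes to similarity) and a tokenizer suited to the language of the outlines.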

