2018 | OriginalPaper | Chapter

Fuzzy Semantic Labeling of Semi-structured Numerical Datasets

Authors: Ahmad Alobaid, Oscar Corcho

Published in: Knowledge Engineering and Knowledge Management

Publisher: Springer International Publishing

Abstract

SPARQL endpoints provide access to rich sources of data (e.g. knowledge graphs), which can be used to classify other less structured datasets (e.g. CSV files or HTML tables on the Web). We propose an approach to suggest types for the numerical columns of a collection of input files available as CSVs. Our approach is based on the application of the fuzzy c-means clustering technique to numerical data in the input files, using existing SPARQL endpoints to generate training datasets. Our approach has three major advantages: it works directly with live knowledge graphs, it does not require knowledge-graph profiling beforehand, and it avoids tedious and costly manual training to match values with types. We evaluate our approach against manually annotated datasets. The results show that the proposed approach classifies most of the types correctly for our test sets.
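The membership idea behind fuzzy c-means can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one cluster per candidate type, a single center per cluster taken as the mean of the training values, and raw one-dimensional values as the only feature. The names `fcm_memberships`, `rank_types`, `height_cm` and `weight_kg`, and the training numbers, are hypothetical; in the approach described above the training values would come from SPARQL endpoints. The membership formula follows the standard fuzzy c-means definition with fuzzifier `m`.

```python
from statistics import mean


def fcm_memberships(value, centers, m=2.0):
    """Standard fuzzy c-means membership of one value in each cluster."""
    dists = [abs(value - c) for c in centers]
    # A value lying exactly on a center belongs fully to that cluster
    # (shared equally if several centers coincide).
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exp for d_j in dists) for d_i in dists]


def rank_types(column, centers_by_type, m=2.0):
    """Average the per-value memberships and rank candidate types."""
    types = list(centers_by_type)
    centers = [centers_by_type[t] for t in types]
    per_value = [fcm_memberships(v, centers, m) for v in column]
    scores = {t: mean(row[i] for row in per_value) for i, t in enumerate(types)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical training values standing in for SPARQL query results.
training = {
    "height_cm": [160.0, 170.0, 180.0],
    "weight_kg": [60.0, 70.0, 80.0],
}
# Assumption: each cluster center is simply the mean of its training values.
centers_by_type = {t: mean(vals) for t, vals in training.items()}

# An unlabeled numerical column from an input CSV file.
ranking = rank_types([172.0, 168.0, 181.0], centers_by_type)
print(ranking)
```

Because the memberships of each value sum to 1, the averaged scores can be read as a soft ranking of candidate types rather than a single hard label, which is the sense in which the suggestion is fuzzy.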

Footnotes
2
We are not referring here to the gold standards that are built manually or the semantic models that are constructed by domain experts.
 
3
We use the same notation and variable names as Bezdek et al. (1984) [3].
 
6
Two files related to the class person are missing from the classification.
 
7
Which means weight in Spanish.
 
Literature
3.
Bezdek, J.C., Ehrlich, R., Full, W.: FCM: the fuzzy c-means clustering algorithm. Comput. Geosci. 10(2–3), 191–203 (1984)
4.
Cafarella, M.J., Halevy, A., Wang, D.Z., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. Proc. VLDB Endowment 1(1), 538–549 (2008)
5.
Calvanese, D., et al.: OBDA with the ontop framework. In: SEBD, pp. 296–303. Citeseer (2015)
8.
Goel, A., Knoblock, C.A., Lerman, K.: Exploiting structure within data for accurate labeling using conditional random fields. In: Proceedings on the International Conference on Artificial Intelligence (ICAI), The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), p. 1 (2012)
9.
Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. Proc. VLDB Endowment 3(1–2), 1338–1347 (2010)
10.
Mihindukulasooriya, N., Poveda-Villalón, M., García-Castro, R., Gómez-Pérez, A.: Loupe-an online tool for inspecting datasets in the linked data cloud. In: International Semantic Web Conference (Posters and Demos) (2015)
11.
Neumaier, S., Umbrich, J., Parreira, J.X., Polleres, A.: Multi-level semantic labelling of numerical values. In: Groth, P., et al. (eds.) The Semantic Web - ISWC 2016. ISWC 2016. Lecture Notes in Computer Science, vol. 9981, pp. 428–445. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46523-4_26
12.
13.
Priyatna, F., Alonso-Calvo, R., Paraiso-Medina, S., Padron-Sanchez, G., Corcho, O.: R2RML-based access and querying to relational clinical data with morph-RDB. In: SWAT4LS, pp. 142–151 (2015)
15.
Ritze, D., Lehmberg, O., Bizer, C.: Matching html tables to DBpedia. In: Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics, p. 10. ACM (2015)
17.
Syed, Z., Finin, T., Mulwad, V., Joshi, A.: Exploiting a web of semantic data for interpreting tables. In: Proceedings of the Second Web Science Conference, vol. 5 (2010)
18.
Taheriyan, M., Knoblock, C.A., Szekely, P., Ambite, J.L.: Learning the semantics of structured data sources. Web Semant. Sci. Serv. Agents World Wide Web 37, 152–169 (2016)
19.
Venetis, P., et al.: Recovering semantics of tables on the web. Proc. VLDB Endowment 4(9), 528–538 (2011)
21.
Zhang, M., Chakrabarti, K.: Infogather+: semantic matching and annotation of numeric and time-varying attributes in web tables. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 145–156. ACM (2013)
Metadata
Title: Fuzzy Semantic Labeling of Semi-structured Numerical Datasets
Authors: Ahmad Alobaid, Oscar Corcho
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-030-03667-6_2
