Skip to main content

2018 | OriginalPaper | Buchkapitel

Making Sense of Numerical Data - Semantic Labelling of Web Tables

verfasst von : Emilia Kacprzak, José M. Giménez-García, Alessandro Piscopo, Laura Koesten, Luis-Daniel Ibáñez, Jeni Tennison, Elena Simperl

Erschienen in: Knowledge Engineering and Knowledge Management

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

With the increasing amount of structured data on the web the need to understand and support search over this emerging data space is growing. Adding semantics to structured data can help address existing challenges in data discovery, as it facilitates understanding the values in their context. While there are approaches on how to lift structured data to semantic web formats to enrich it and facilitate discovery, most work to date focuses on textual fields rather than numerical data. In this paper, we propose a two level (row and column based) approach to add semantic meaning to numerical values in tables, called NUMER. We evaluate our approach using a benchmark (NumDB) generated for the purpose of this work. We show the influence of the different levels of analysis on the success of assigning semantic labels to numerical values in tables. Our approach outperforms the state of the art and is less affected by data structure and quality issues such as a small number of entities or deviations in the data.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Kacprzak, E., Koesten, L., Heath, T., Tennison, J.: Position paper: Dataset profling for un-linked data. In: Proceedings of the 3rd International Workshop (PROFILES), The 13th ESWC Conference (2016) Kacprzak, E., Koesten, L., Heath, T., Tennison, J.: Position paper: Dataset profling for un-linked data. In: Proceedings of the 3rd International Workshop (PROFILES), The 13th ESWC Conference (2016)
2.
Zurück zum Zitat Mitlöhner, J., Neumaier, S., Umbrich, J., Polleres, A.: Characteristics of open data CSV files. In: 2nd International Conference on Open and Big Data, OBD 2016, Vienna, Austria, 22–24 August 2016, pp. 72–79 (2016). https://doi.org/10.1109/OBD.2016.18 Mitlöhner, J., Neumaier, S., Umbrich, J., Polleres, A.: Characteristics of open data CSV files. In: 2nd International Conference on Open and Big Data, OBD 2016, Vienna, Austria, 22–24 August 2016, pp. 72–79 (2016). https://​doi.​org/​10.​1109/​OBD.​2016.​18
3.
Zurück zum Zitat Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. PVLDB 3(1), 1338–1347 (2010) Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. PVLDB 3(1), 1338–1347 (2010)
4.
Zurück zum Zitat Mulwad, V., Finin, T., Syed, Z., Joshi, A.: Using linked data to interpret tables. In: Proceedings of the First International Workshop on Consuming Linked Data. CEUR Workshop Proceedings, vol. 665 (2010) Mulwad, V., Finin, T., Syed, Z., Joshi, A.: Using linked data to interpret tables. In: Proceedings of the First International Workshop on Consuming Linked Data. CEUR Workshop Proceedings, vol. 665 (2010)
5.
Zurück zum Zitat Syed, Z., Finin, T., Mulwad, V., Joshi, A.: Exploiting a web of semantic data for interpreting tables. In: Proceedings of the 2nd Web Science Conference (2010) Syed, Z., Finin, T., Mulwad, V., Joshi, A.: Exploiting a web of semantic data for interpreting tables. In: Proceedings of the 2nd Web Science Conference (2010)
8.
Zurück zum Zitat Taheriyan, M., Knoblock, C.A., Szekely, P., Ambite, J.L.: A scalable approach to learn semantic models of structured sources. In: Proceedings of the International Conference on Semantic Computing (2014) Taheriyan, M., Knoblock, C.A., Szekely, P., Ambite, J.L.: A scalable approach to learn semantic models of structured sources. In: Proceedings of the International Conference on Semantic Computing (2014)
9.
Zurück zum Zitat Ritze, D., Lehmberg, O., Bizer, C.: Matching HTML tables to DBpedia. In: Proceedings of the International Conference on Web Intelligence, Mining and Semantics, pp. 10:1–10:6 (2015) Ritze, D., Lehmberg, O., Bizer, C.: Matching HTML tables to DBpedia. In: Proceedings of the International Conference on Web Intelligence, Mining and Semantics, pp. 10:1–10:6 (2015)
14.
Zurück zum Zitat Ritze, D., Lehmberg, O., Oulabi, Y., Bizer, C.: Profiling the potential of web tables for augmenting cross-domain knowledge bases. In: WWW, pp. 251–261. ACM (2016) Ritze, D., Lehmberg, O., Oulabi, Y., Bizer, C.: Profiling the potential of web tables for augmenting cross-domain knowledge bases. In: WWW, pp. 251–261. ACM (2016)
22.
Zurück zum Zitat Koesten, L.M., Kacprzak, E., Tennison, J.F.A., Simperl, E.: The trials and tribulations of working with structured data:-a study on information seeking behaviour. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, pp. 1277–1289 (2017). https://doi.org/10.1145/3025453.3025838 Koesten, L.M., Kacprzak, E., Tennison, J.F.A., Simperl, E.: The trials and tribulations of working with structured data:-a study on information seeking behaviour. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, pp. 1277–1289 (2017). https://​doi.​org/​10.​1145/​3025453.​3025838
23.
Zurück zum Zitat Goel, A., Knoblock, C.A., Lerman, K.: Exploiting structure within data for accurate labeling using conditional random fields. In: Proceedings of the 14th International Conference on Artificial Intelligence (ICAI) (2012) Goel, A., Knoblock, C.A., Lerman, K.: Exploiting structure within data for accurate labeling using conditional random fields. In: Proceedings of the 14th International Conference on Artificial Intelligence (ICAI) (2012)
Metadaten
Titel
Making Sense of Numerical Data - Semantic Labelling of Web Tables
verfasst von
Emilia Kacprzak
José M. Giménez-García
Alessandro Piscopo
Laura Koesten
Luis-Daniel Ibáñez
Jeni Tennison
Elena Simperl
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-030-03667-6_11