2025 | OriginalPaper | Chapter

Exploiting Distant Supervision to Learn Semantic Descriptions of Tables with Overlapping Data

Authors: Binh Vu, Craig A. Knoblock, Basel Shbita, Fandel Lin

Published in: The Semantic Web – ISWC 2024

Publisher: Springer Nature Switzerland

Abstract

Understanding the semantic structure of tabular data is essential for data integration and discovery. Specifically, the goal is to annotate columns in a tabular source with types and relationships between them using classes and predicates of a target ontology. Previous work that exploits the matches between entities in a knowledge graph and the table data does not perform well for tables with noisy or ambiguous data. A key reason for this poor performance is the limited amount of labeled data to train these methods. To address this problem, we propose a novel distant supervision approach that leverages existing Wikipedia tables and hyperlinks to automatically label tables with their semantic descriptions. Then, we use the labeled dataset to train neural network models to predict the semantic description of a new table. Our empirical evaluation shows that using the automatically labeled dataset provides approximately 5% improvement in column type prediction and 4.5% improvement in column relationship prediction in F1 scores over the state-of-the-art on a large set of real-world tables.
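As a rough illustration of the distant supervision idea, the sketch below labels a column with the majority class of the entities its cells hyperlink to. It is a toy example under stated assumptions, not the labeling procedure of the chapter: the `ENTITY_TYPES` lookup, the entity identifiers, the `label_column` function, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical toy knowledge graph: entity id -> ontology class.
ENTITY_TYPES = {
    "Q_paris": "City",
    "Q_lyon": "City",
    "Q_france": "Country",
    "Q_seine": "River",
}


def label_column(linked_entities, min_support=0.5):
    """Return the majority class of a column's hyperlinked entities, or None.

    `linked_entities` holds one entry per cell: the id of the entity the
    cell's hyperlink points to, or None when the cell has no hyperlink.
    The threshold `min_support` is an assumed knob, not a value from the paper.
    """
    types = [ENTITY_TYPES[e] for e in linked_entities if e in ENTITY_TYPES]
    if not types:
        return None  # no linked cell resolves to a known class
    cls, count = Counter(types).most_common(1)[0]
    # Support is measured over all cells, so sparsely linked columns stay unlabeled.
    return cls if count / len(linked_entities) >= min_support else None


print(label_column(["Q_paris", "Q_lyon", None, "Q_seine"]))
```

Columns labeled this way could then serve as automatically generated training examples, which is the role the labeled dataset plays in the abstract above.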

Footnotes
1. A plain table does not contain any markup such as hyperlinks.
2. We normalize a header by masking numbers, removing special characters, etc.
3. We use the pretrained all-mpnet-base-v2 model.
4. The p-value of the sign test [11, 38] on the accuracies of the two systems is 0.086.
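
The header normalization mentioned in footnote 2 might look roughly like the sketch below. The exact rules are not given here, so the lowercasing, the `#` mask token, the ASCII-only character filter, and the function name `normalize_header` are assumptions made for illustration.

```python
import re


def normalize_header(header: str) -> str:
    """Normalize a column header (illustrative rules, not the authors' exact ones)."""
    text = header.lower()                    # assumed: case is ignored
    text = re.sub(r"\d+", "#", text)         # mask each run of digits with '#'
    text = re.sub(r"[^a-z# ]+", " ", text)   # replace special characters with a space
    return re.sub(r"\s+", " ", text).strip() # collapse repeated whitespace


print(normalize_header("Population (2020)"))
```

With these rules, headers that differ only in a year or in punctuation map to the same string, which is one plausible reason to normalize before comparing headers. Note that the character filter above also drops non-ASCII letters, which a real implementation would probably want to keep.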
 
Literature
2. Bach, S.H., Broecheler, M., Huang, B., Getoor, L.: Hinge-loss Markov random fields and probabilistic soft logic. J. Mach. Learn. Res. 18(109), 1–67 (2017)
4. Chen, J., Jiménez-Ruiz, E., Horrocks, I., Sutton, C.: ColNet: embedding the semantics of web tables for column type prediction. AAAI 33(01), 29–36 (2019)
5. Chen, J., Jimenez-Ruiz, E., Horrocks, I., Sutton, C.: Learning semantic annotations for tabular data. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, California (2019). https://doi.org/10.24963/ijcai.2019/289
9. Dasoulas, I., Yang, D., Duan, X., Dimou, A.: TorchicTab: semantic table annotation with Wikidata and language models. In: CEUR Workshop Proceedings, pp. 21–37 (2023)
10. Deng, X., Sun, H., Lees, A., Wu, Y., Yu, C.: TURL: table understanding through representation learning (2020)
11. Dixon, W.J., Mood, A.M.: The statistical sign test. J. Am. Stat. Assoc. 41(236), 557–566 (1946)
12. Efthymiou, V., Hassanzadeh, O., Rodriguez-Muro, M., Christophides, V.: Matching Web Tables with Knowledge Base Entities: From Entity Lookups to Entity Embeddings. In: d’Amato, C., Fernandez, M., Tamma, V., Lecue, F., Cudré-Mauroux, P., Sequeda, J., Lange, C., Heflin, J. (eds.) ISWC 2017. LNCS, vol. 10587, pp. 260–277. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68288-4_16
13. Feng, Z.W., et al.: Automatic semantic modeling for structural data source with the prior knowledge from knowledge graph (2021)
15. Hassanzadeh, O., et al.: Results of SemTab 2023. In: CEUR Workshop Proceedings, vol. 3557, pp. 1–14 (2023)
18. Henriksen, E.G., Khorsid, A.M., Nielsen, E., Stück, A.M., Sørensen, A.S., Pelgrin, O.: Semtex: a hybrid approach for semantic table interpretation (2023)
19. Hulsebos, M., et al.: Sherlock: a deep learning approach to semantic data type detection. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’19, pp. 1500–1508. Association for Computing Machinery, New York, NY, USA (2019)
20. Huynh, V.P., Chabot, Y., Labbé, T., Liu, J., Troncy, R.: From heuristics to language models: a journey through the universe of semantic table interpretation with DAGOBAH. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2022)
21.
22. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2014)
23. Korini, K., Peeters, R., Bizer, C.: Sotab: The WDC schema.org table annotation benchmark. In: CEUR Workshop Proceedings, vol. 3320, pp. 14–19. RWTH Aachen (2022)
25. Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proceedings of the VLDB Endowment, vol. 3, pp. 1338–1347. VLDB Endowment (2010)
26. Liu, J., Chabot, Y., Troncy, R., Huynh, V.P., Labbé, T., Monnin, P.: From tabular data to knowledge graphs: a survey of semantic table interpretation tasks and methods. J. Web Semant. 76, 100761 (2023)
27. Luzuriaga, J., Munoz, E., Rosales-Mendez, H., Hogan, A.: Merging web tables for relation extraction with knowledge graphs. IEEE Trans. Knowl. Data Eng. 1 (2021)
31. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks (2019)
32. Ritze, D., Lehmberg, O., Bizer, C.: Matching HTML tables to DBpedia. In: Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics, pp. 1–6. No. Article 10 in WIMS ’15. Association for Computing Machinery, New York, NY, USA (2015)
34. Suhara, Y., et al.: Annotating columns with pre-trained language models. In: Proceedings of the 2022 International Conference on Management of Data. ACM, New York, NY, USA (2022)
35. Taheriyan, M., Knoblock, C.A., Szekely, P., Ambite, J.L.: Learning the semantics of structured data sources. J. Web Semant. 37–38, 152–169 (2016)
36. Vu, B., Knoblock, C., Pujara, J.: Learning semantic models of data sources using probabilistic graphical models. In: The World Wide Web Conference. WWW ’19, pp. 1944–1953. Association for Computing Machinery, New York, NY, USA (2019)
38.
39. Zhang, Z.: Effective and efficient semantic table interpretation using TableMiner+. Semant. Web 8(6), 921–957 (2017)
Metadata
Title: Exploiting Distant Supervision to Learn Semantic Descriptions of Tables with Overlapping Data
Authors: Binh Vu, Craig A. Knoblock, Basel Shbita, Fandel Lin
Copyright Year: 2025
DOI: https://doi.org/10.1007/978-3-031-77850-6_7
