
2013 | OriginalPaper | Chapter

2. Towards Open Data for Linguistics: Linguistic Linked Data

Authors: Christian Chiarcos, John McCrae, Philipp Cimiano, Christiane Fellbaum

Published in: New Trends of Research in Ontologies and Lexical Resources

Publisher: Springer Berlin Heidelberg


Abstract

‘Open Data’ has become very important in a wide range of fields. However, for linguistics, much data is still published in proprietary, closed formats and is not made available on the web. We propose the use of linked data principles to enable language resources to be published and interlinked openly on the web, and we describe the application of this paradigm to the modeling of two resources, WordNet and the MASC corpus. Here, WordNet and the MASC corpus serve as representative examples of two major classes of linguistic resources: lexical-semantic resources and annotated corpora, respectively. Furthermore, we argue that modeling and publishing language resources as linked data offers crucial advantages over existing formalisms. In particular, we explain how this can enhance the interoperability and integration of linguistic resources. Further benefits of this approach include unambiguous identifiability of elements of linguistic description, the creation of dynamic but unambiguous links between different resources, the possibility to query across distributed resources, and the availability of a mature technological infrastructure. Finally, recent community activities are described.
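To make the abstract's central idea concrete, the following is a minimal Python sketch of how a lexical-semantic resource and an annotated corpus might be represented as subject-predicate-object triples and then queried jointly by following links between them. It uses a toy in-memory triple store rather than a real RDF library, and every URI, namespace and predicate name in it (`example.org`, `senseAnnotation`, `label`, the synset and token identifiers) is hypothetical; it is not the vocabulary the chapter uses for WordNet or MASC.

```python
from typing import Iterator, List, Optional, Set, Tuple

# A triple is (subject, predicate, object); URIs are plain strings here.
Triple = Tuple[str, str, str]

# Hypothetical namespaces for illustration only (not the chapter's actual URIs).
WN = "http://example.org/wordnet/"
MASC = "http://example.org/masc/"
VOC = "http://example.org/vocab#"


class TripleStore:
    """Toy in-memory graph of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield matching triples in sorted order; None acts as a wildcard."""
        for t in sorted(self.triples):
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


def annotated_senses(store: TripleStore,
                     surface: str) -> List[Tuple[str, str, Optional[str]]]:
    """For each corpus token with this surface string, return its sense link and label."""
    results = []
    for tok, _, _ in store.match(p=VOC + "string", o=surface):
        for _, _, sense in store.match(s=tok, p=VOC + "senseAnnotation"):
            # Follow the link from the corpus into the lexical resource.
            labels = [o for _, _, o in store.match(s=sense, p=VOC + "label")]
            results.append((tok, sense, labels[0] if labels else None))
    return results


store = TripleStore()
# WordNet-like lexicon (hypothetical identifiers): one lemma with two senses.
store.add(WN + "lemma-bank", VOC + "sense", WN + "synset-bank-finance")
store.add(WN + "lemma-bank", VOC + "sense", WN + "synset-bank-river")
store.add(WN + "synset-bank-finance", VOC + "label", "bank (financial institution)")
store.add(WN + "synset-bank-river", VOC + "label", "bank (river side)")
# MASC-like corpus (hypothetical identifiers): tokens linked to lexicon senses.
store.add(MASC + "doc1#tok7", VOC + "string", "bank")
store.add(MASC + "doc1#tok7", VOC + "senseAnnotation", WN + "synset-bank-finance")
store.add(MASC + "doc2#tok3", VOC + "string", "bank")
store.add(MASC + "doc2#tok3", VOC + "senseAnnotation", WN + "synset-bank-river")

for row in annotated_senses(store, "bank"):
    print(row)
```

Because each token and each sense has its own URI, the join in `annotated_senses` needs no format conversion between the two resources; in a real deployment the same pattern would be written as a SPARQL query over RDF data.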


Footnotes
1
The term ‘resource’ is ambiguous here. As understood in this chapter, resources are structured collections of data which can be represented, for example, in RDF. Hence, we prefer the terms ‘node’ or ‘concept’ whenever RDF resources are meant.
 
2
We provide a SPARQL endpoint at http://monnetproject.deri.ie/lemonsource_query, which provides access to the examples discussed in this chapter.
 
7
Other domains where linked data principles have been applied include, e.g., geography [20], biomedicine [1], cultural history (http://www.europeana.eu) and government data (e.g., http://data.gov and http://data.gov.uk).
 
8
For example, the W3C Semantic Web Activity reported on developments for Media Resources, Data Provenance and Microdata in the first two weeks of February 2012.
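Footnote 2 mentions a SPARQL endpoint for the chapter's examples. The Python sketch below shows how a client could form a request to such an endpoint using the GET binding of the W3C SPARQL Protocol: the query text is URL-encoded into a `query` parameter, and the SPARQL JSON results format is requested through the `Accept` header. The query itself is a generic placeholder, not one of the chapter's examples, and the request is only built, not sent; nothing here assumes the 2012 service is still online or that it supports this exact content type.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint from footnote 2; it may no longer be reachable.
ENDPOINT = "http://monnetproject.deri.ie/lemonsource_query"

# Placeholder query (not from the chapter): fetch ten arbitrary triples.
QUERY = """SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Request the standard SPARQL JSON results format via content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"},
                   method="GET")


req = build_sparql_request(ENDPOINT, QUERY)
print(req.full_url)
```

The request could then be sent with `urllib.request.urlopen(req)`, assuming the service is reachable and accepts this form of request.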
 
Literature
1. Ashburner, M., Ball, C.A., et al.: Gene ontology: tool for the unification of biology. Nat. Genet. 25(1), 25–29 (2000)
2. Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet project. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL-1998), Montréal, pp. 86–90 (1998)
3. Bird, S., Liberman, M.: A formal framework for linguistic annotation. Speech Commun. 33(1), 23–60 (2001)
4. Bizer, C., Heath, T., Berners-Lee, T.: Linked data – the story so far. Int. J. Semant. Web Inf. Syst. (IJSWIS) 5(3), 1–22 (2009)
5. Brandes, U., Eiglsperger, M., et al.: Graph markup language (GraphML). In: Tamassia, R. (ed.) Handbook of Graph Drawing and Visualization. Chapman & Hall/CRC, London (2010)
6. Buil-Aranda, C., Arenas, M., Corcho, O.: Semantics and optimization of the SPARQL 1.1 federation extension. In: The Semantic Web: Research and Applications, pp. 1–15. Springer, Heraklion (2011)
7. Carletta, J., Evert, S., et al.: The NITE XML Toolkit: data model and query. Lang. Resour. Eval. J. (LREJ) 39(4), 313–334 (2005)
8. Cassidy, S.: An RDF realisation of LAF in the DADA annotation server. In: Proceedings of the 5th Joint ISO-ACL/SIGSEM Workshop on Interoperable Semantic Annotation (ISO-5), Hong Kong (2010)
9. Chiarcos, C.: An ontology of linguistic annotations. LDV Forum 23(1), 1–16 (2008)
10. Chiarcos, C.: Interoperability of corpora and annotations. In: Chiarcos, C., Nordhoff, S., Hellmann, S. (eds.) Linked Data in Linguistics, pp. 161–179. Springer, Heidelberg (2012)
11. Chiarcos, C., Dipper, S., et al.: A flexible framework for integrating annotations from different tools and tagsets. TAL (Traitement automatique des langues) 49(2), 217–246 (2008)
12. Chiarcos, C., Hellmann, S., et al.: The open linguistics working group. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC-2012), Istanbul (2012a)
13. Chiarcos, C., Nordhoff, S., Hellmann, S. (eds.): Linked Data in Linguistics. Representing Language Data and Metadata. Springer, Heidelberg (2012b)
14. Chiarcos, C., Ritz, J., Stede, M.: By all these lovely tokens … Merging conflicting tokenizations. J. Lang. Resour. Eval. (LREJ) 4(45), 53–74 (2012c)
15. Dipper, S.: XML-based stand-off representation and exploitation of multi-level linguistic annotation. In: Eckstein, R., Tolksdorf, R. (eds.) Proceedings of Berliner XML Tage 2005 (BXML-2005), Berlin, pp. 39–50 (2005)
16. Farrar, S., Langendoen, D.T.: An OWL-DL implementation of GOLD: an ontology for the Semantic Web. In: Witt, A., Metzing, D. (eds.) Linguistic Modeling of Information and Markup Languages. Springer, Dordrecht (2010)
18. Fielding, R., Gettys, J., et al.: Hypertext transfer protocol – HTTP/1.1. Internet RFC 2068 (1997)
19. Francopoulo, G., George, M., et al.: Lexical markup framework (LMF). In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006), Genoa (2006)
20. Goodwin, J., Dolbear, C., Hart, G.: Geographical linked data: the administrative geography of Great Britain on the Semantic Web. Trans. GIS 12, 19–30 (2008)
21. Guéret, C., Kotoulas, S., Groth, P.: TripleCloud: an infrastructure for exploratory querying over web-scale RDF data. In: Proceedings of the 2011 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2011), Lyon, pp. 245–248 (2011)
22. Gurevych, I., Eckle-Kohler, J., et al.: Uby – a large-scale unified lexical semantic resource based on LMF. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2012), Avignon, pp. 580–590 (2012)
23. Hartig, O., Bizer, C., Freytag, J.C.: Executing SPARQL queries over the web of linked data. In: The Semantic Web – ISWC 2009, Heraklion, pp. 293–309 (2009)
24. Holtman, K., Mutz, A.: Transparent content negotiation in HTTP. Internet RFC 2295 (1998)
25. Ide, N., Pustejovsky, J.: What does interoperability mean, anyway? Toward an operational definition of interoperability. In: Proceedings of the 2nd International Conference on Global Interoperability for Language Resources (ICGL 2010), Hong Kong (2010)
26. Ide, N., Suderman, K.: GrAF: A graph-based format for linguistic annotations. In: Proceedings of the First Linguistic Annotation Workshop (LAW 2007), Prague, pp. 1–8 (2007)
27. Ide, N., Le Maitre, J., Véronis, J.: Outline of a model for lexical databases. In: Zampolli, A., Calzolari, N., Palmer, M.S. (eds.) Current Issues in Computational Linguistics: In Honour of Don Walker, Giardini, pp. 283–320 (1995)
28. Ide, N., Fellbaum, C., et al.: The manually annotated sub-corpus: a community resource for and by the people. In: Proceedings of the ACL 2010 Conference Short Papers, Uppsala, pp. 68–73 (2010)
29. Klyne, G., Carroll, J.J., McBride, B.: Resource description framework (RDF): concepts and abstract syntax. Technical report, W3C Recommendation (2004)
30. Marcus, M.P., Santorini, B., Marcinkiewicz, M.A.: Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19(2), 313–330 (1994)
31. McCrae, J., Spohr, D., Cimiano, P.: Linking lexical resources and ontologies on the Semantic Web with Lemon. In: The Semantic Web: Research and Applications, Heraklion, pp. 245–259 (2011)
32. McCrae, J., Montiel-Ponsoda, E., Cimiano, P.: Collaborative semantic editing of linked data lexica. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC-2012), Istanbul (2012a)
33. McCrae, J., Montiel-Ponsoda, E., Cimiano, P.: Integrating WordNet and Wiktionary with lemon. In: Chiarcos, C., Nordhoff, S., Hellmann, S. (eds.) Linked Data in Linguistics, pp. 25–34. Springer, Heidelberg (2012b)
34. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
35. Prud’Hommeaux, E., Seaborne, A.: SPARQL query language for RDF. W3C working draft (2008)
36. Quilitz, B., Leser, U.: Querying distributed RDF data sources with SPARQL. In: The Semantic Web: Research and Applications, pp. 524–538. Springer, Berlin/Heidelberg (2008)
37. Schenk, S., Petrák, J.: Sesame RDF repository extensions for remote querying. In: Proceedings of the 7th Znalosti Conference (Znalosti-2008), Bratislava (2008)
38. Shadbolt, N., Hall, W., Berners-Lee, T.: The semantic web revisited. IEEE Intell. Syst. 21(3), 96–101 (2006)
39. Van Assem, M., Gangemi, A., Schreiber, G.: Conversion of WordNet to a standard RDF/OWL representation. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006), Genoa, pp. 237–242 (2006)
40. Véronis, J., Ide, N.: A feature-based model for lexical databases. In: Proceedings of the 14th International Conference on Computational Linguistics (COLING-1992), Nantes, pp. 588–594 (1992)
41. Windhouwer, M., Wright, S.E.: Linking to linguistic data categories in ISOcat. In: Chiarcos, C., Nordhoff, S., Hellmann, S. (eds.) Linked Data in Linguistics, pp. 99–107. Springer, Heidelberg (2012)
Metadata
Title
Towards Open Data for Linguistics: Linguistic Linked Data
Authors
Christian Chiarcos
John McCrae
Philipp Cimiano
Christiane Fellbaum
Copyright Year
2013
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-31782-8_2
