2013 | OriginalPaper | Book chapter

2. Towards Open Data for Linguistics: Linguistic Linked Data

Written by: Christian Chiarcos, John McCrae, Philipp Cimiano, Christiane Fellbaum

Published in: New Trends of Research in Ontologies and Lexical Resources

Publisher: Springer Berlin Heidelberg

Abstract

‘Open Data’ has become very important in a wide range of fields. In linguistics, however, much data is still published in proprietary, closed formats and is not made available on the web. We propose the use of linked data principles to enable language resources to be published and interlinked openly on the web, and we describe the application of this paradigm to the modeling of two resources, WordNet and the MASC corpus. Here, WordNet and the MASC corpus serve as representative examples of two major classes of linguistic resources: lexical-semantic resources and annotated corpora, respectively.

Furthermore, we argue that modeling and publishing language resources as linked data offers crucial advantages over existing formalisms. In particular, we explain how this can enhance the interoperability and integration of linguistic resources. Further benefits of this approach include unambiguous identifiability of elements of linguistic description, the creation of dynamic but unambiguous links between different resources, the possibility to query across distributed resources, and the availability of a mature technological infrastructure. Finally, recent community activities are described.

Footnotes
1
The term ‘resource’ is ambiguous here. As understood in this chapter, resources are structured collections of data which can be represented, for example, in RDF. Hence, we prefer the terms ‘node’ or ‘concept’ whenever RDF resources are meant.
 
2
We provide a SPARQL endpoint at http://monnetproject.deri.ie/lemonsource_query, which provides access to the examples discussed in this chapter.
 
7
Other domains where the linked data principles have been applied include, e.g., geography [20], biomedicine [1], cultural history (http://www.europeana.eu), and government data (e.g., http://data.gov and http://data.gov.uk).
 
8
For example, the W3C Semantic Web Activity reported on developments for Media Resources, Data Provenance, and Microdata in the first two weeks of February 2012.
 
References
1.
Ashburner, M., Ball, C.A., et al.: Gene ontology: tool for the unification of biology. Nat. Genet. 25(1), 25–29 (2000)
2.
Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet project. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL-1998), Montréal, pp. 86–90 (1998)
3.
Bird, S., Liberman, M.: A formal framework for linguistic annotation. Speech Commun. 33(1), 23–60 (2001)
4.
Bizer, C., Heath, T., Berners-Lee, T.: Linked data – the story so far. Int. J. Semant. Web Inf. Syst. (IJSWIS) 5(3), 1–22 (2009)
5.
Brandes, U., Eiglsperger, M., et al.: Graph markup language (GraphML). In: Tamassia, R. (ed.) Handbook of Graph Drawing and Visualization. Chapman & Hall/CRC, London (2010)
6.
Buil-Aranda, C., Arenas, M., Corcho, O.: Semantics and optimization of the SPARQL 1.1 federation extension. In: The Semantic Web: Research and Applications, pp. 1–15. Springer, Heraklion (2011)
7.
Carletta, J., Evert, S., et al.: The NITE XML Toolkit: data model and query. Lang. Resour. Eval. J. (LREJ) 39(4), 313–334 (2005)
8.
Cassidy, S.: An RDF realisation of LAF in the DADA annotation server. In: Proceedings of the 5th Joint ISO-ACL/SIGSEM Workshop on Interoperable Semantic Annotation (ISO-5), Hong Kong (2010)
9.
Chiarcos, C.: An ontology of linguistic annotations. LDV Forum 23(1), 1–16 (2008)
10.
Chiarcos, C.: Interoperability of corpora and annotations. In: Chiarcos, C., Nordhoff, S., Hellmann, S. (eds.) Linked Data in Linguistics, pp. 161–179. Springer, Heidelberg (2012)
11.
Chiarcos, C., Dipper, S., et al.: A flexible framework for integrating annotations from different tools and tagsets. TAL (Traitement automatique des langues) 49(2), 217–246 (2008)
12.
Chiarcos, C., Hellmann, S., et al.: The open linguistics working group. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC-2012), Istanbul (2012a)
13.
Chiarcos, C., Nordhoff, S., Hellmann, S. (eds.): Linked Data in Linguistics. Representing Language Data and Metadata. Springer, Heidelberg (2012b)
14.
Chiarcos, C., Ritz, J., Stede, M.: By all these lovely tokens … Merging conflicting tokenizations. J. Lang. Resour. Eval. (LREJ) 4(45), 53–74 (2012c)
15.
Dipper, S.: XML-based stand-off representation and exploitation of multi-level linguistic annotation. In: Eckstein, R., Tolksdorf, R. (eds.) Proceedings of Berliner XML Tage 2005 (BXML-2005), Berlin, pp. 39–50 (2005)
16.
Farrar, S., Langendoen, D.T.: An OWL-DL implementation of GOLD: an ontology for the Semantic Web. In: Witt, A., Metzing, D. (eds.) Linguistic Modeling of Information and Markup Languages. Springer, Dordrecht (2010)
17.
18.
Fielding, R., Gettys, J., et al.: Hypertext transfer protocol – HTTP/1.1. Internet RFC 2068 (1997)
19.
Francopoulo, G., George, M., et al.: Lexical markup framework (LMF). In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006), Genoa (2006)
20.
Goodwin, J., Dolbear, C., Hart, G.: Geographical linked data: the administrative geography of Great Britain on the Semantic Web. Trans. GIS 12, 19–30 (2008)
21.
Guéret, C., Kotoulas, S., Groth, P.: TripleCloud: an infrastructure for exploratory querying over web-scale RDF data. In: Proceedings of the 2011 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2011), Lyon, pp. 245–248 (2011)
22.
Gurevych, I., Eckle-Kohler, J., et al.: Uby – a large-scale unified lexical semantic resource based on LMF. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2012), Avignon, pp. 580–590 (2012)
23.
Hartig, O., Bizer, C., Freytag, J.C.: Executing SPARQL queries over the web of linked data. In: The Semantic Web – ISWC 2009, Heraklion, pp. 293–309 (2009)
24.
Holtman, K., Mutz, A.: Transparent content negotiation in HTTP. Internet RFC 2295 (1998)
25.
Ide, N., Pustejovsky, J.: What does interoperability mean, anyway? Toward an operational definition of interoperability. In: Proceedings of the 2nd International Conference on Global Interoperability for Language Resources (ICGL 2010), Hong Kong (2010)
26.
Ide, N., Suderman, K.: GrAF: A graph-based format for linguistic annotations. In: Proceedings of the First Linguistic Annotation Workshop (LAW 2007), Prague, pp. 1–8 (2007)
27.
Ide, N., Le Maitre, J., Véronis, J.: Outline of a model for lexical databases. In: Zampolli, A., Calzolari, N., Palmer, M.S. (eds.) Current Issues in Computational Linguistics: In Honour of Don Walker, Giardini, pp. 283–320 (1995)
28.
Ide, N., Fellbaum, C., et al.: The manually annotated sub-corpus: a community resource for and by the people. In: Proceedings of the ACL 2010 Conference Short Papers, Uppsala, pp. 68–73 (2010)
29.
Klyne, G., Carroll, J.J., McBride, B.: Resource description framework (RDF): concepts and abstract syntax. Technical report, W3C Recommendation (2004)
30.
Marcus, M.P., Santorini, B., Marcinkiewicz, M.A.: Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19(2), 313–330 (1994)
31.
McCrae, J., Spohr, D., Cimiano, P.: Linking lexical resources and ontologies on the Semantic Web with Lemon. In: The Semantic Web: Research and Applications, Heraklion, pp. 245–259 (2011)
32.
McCrae, J., Montiel-Ponsoda, E., Cimiano, P.: Collaborative semantic editing of linked data lexica. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC-2012), Istanbul (2012a)
33.
McCrae, J., Montiel-Ponsoda, E., Cimiano, P.: Integrating WordNet and Wiktionary with lemon. In: Chiarcos, C., Nordhoff, S., Hellmann, S. (eds.) Linked Data in Linguistics, pp. 25–34. Springer, Heidelberg (2012b)
34.
Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
35.
Prud’Hommeaux, E., Seaborne, A.: SPARQL query language for RDF. W3C working draft (2008)
36.
Quilitz, B., Leser, U.: Querying distributed RDF data sources with SPARQL. In: The Semantic Web: Research and Applications, pp. 524–538. Springer, Berlin/Heidelberg (2008)
37.
Schenk, S., Petrák, J.: Sesame RDF repository extensions for remote querying. In: Proceedings of the 7th Znalosti Conference (Znalosti-2008), Bratislava (2008)
38.
Shadbolt, N., Hall, W., Berners-Lee, T.: The semantic web revisited. IEEE Intell. Syst. 21(3), 96–101 (2006)
39.
Van Assem, M., Gangemi, A., Schreiber, G.: Conversion of WordNet to a standard RDF/OWL representation. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006), Genoa, pp. 237–242 (2006)
40.
Véronis, J., Ide, N.: A feature-based model for lexical databases. In: Proceedings of the 14th International Conference on Computational Linguistics (COLING-1992), Nantes, pp. 588–594 (1992)
41.
Windhouwer, M., Wright, S.E.: Linking to linguistic data categories in ISOcat. In: Chiarcos, C., Nordhoff, S., Hellmann, S. (eds.) Linked Data in Linguistics, pp. 99–107. Springer, Heidelberg (2012)
Metadata
Title
Towards Open Data for Linguistics: Linguistic Linked Data
Written by
Christian Chiarcos
John McCrae
Philipp Cimiano
Christiane Fellbaum
Copyright year
2013
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-31782-8_2