Published in: Journal on Data Semantics 1/2014

1 March 2014 | Original Article

Assessing and Improving the Quality of SKOS Vocabularies

Authors: Osma Suominen, Christian Mader

Abstract

Controlled vocabularies are increasingly made available on the Web of Data using the Simple Knowledge Organization System (SKOS) ontology. Assessment of vocabulary quality is important for determining the suitability of vocabularies for reuse in applications and for improving vocabulary development processes. We define 26 quality issues, i.e., computable functions that expose potential quality problems. In an analysis of a representative set of 24 SKOS vocabularies, we found all of them to contain structural errors and/or other quality problems. We propose a set of correction heuristics which we have used to automatically correct a significant proportion of the identified problems. Our reference implementations of these methods, the quality assessment tool qSKOS and the quality improvement tool Skosify, are available for reuse as open-source software.
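
To make the notion of a quality issue as a computable function concrete, the following is a minimal Python sketch of a check for integrity condition S14, quoted in footnote 9 ("A resource has no more than one value of skos:prefLabel per language tag"). It is not the qSKOS or Skosify implementation; the function name `pref_label_conflicts`, the tuple-based triple representation, and the `ex:` identifiers are assumptions made purely for illustration.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOS_PREF_LABEL = SKOS + "prefLabel"


def pref_label_conflicts(triples):
    """Return {(resource, lang): [labels]} for each violation of S14.

    `triples` is an iterable of (subject, predicate, object) tuples.
    For label triples the object is assumed to be a (text, lang) pair;
    this representation is hypothetical, not an RDF library API.
    """
    labels = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate != SKOS_PREF_LABEL:
            continue  # only skos:prefLabel statements matter for S14
        text, lang = obj
        # Language tags are lower-cased so that "EN" and "en" collide.
        labels[(subject, lang.lower())].add(text)
    # A violation is any (resource, language) pair with 2+ distinct labels.
    return {key: sorted(vals) for key, vals in labels.items() if len(vals) > 1}


example = [
    ("ex:cat", SKOS_PREF_LABEL, ("cat", "en")),
    ("ex:cat", SKOS_PREF_LABEL, ("feline", "en")),  # second English prefLabel
    ("ex:cat", SKOS_PREF_LABEL, ("Katze", "de")),
    ("ex:cat", SKOS + "broader", "ex:animal"),      # ignored by the check
    ("ex:dog", SKOS_PREF_LABEL, ("dog", "en")),
]
conflicts = pref_label_conflicts(example)
```

In this example only `ex:cat` is reported, because it carries two English preferred labels. Lower-casing the tag is a design choice of the sketch, reflecting that BCP47 language tags are compared case-insensitively.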

Footnotes
9
In particular, neither OWL nor OWL 2 includes any means to express the integrity condition S14: “A resource has no more than one value of skos:prefLabel per language tag.”
 
11
e.g., public-esw-thes@w3.org and public-lod@w3.org.
 
17
The script, sparqldump.py, is included in the Skosify distribution.
 
19
Missing namespace declarations were added manually for UMBEL. In NYTL, the invalid language tag fr_1793 was manually changed into fr-1793 in order to comply with BCP47 and the Turtle specification. In Reegle, an unparseable line in the original RDF dump was manually removed. For GEMET, the source file containing Arabic labels was excluded as it contained labels with improper Unicode encoding that caused the Jena toolkit to fail in parsing it.
 
20
The Turtle files were condensed by removing extra whitespace, including all indentation, and using short 0–2 character namespace prefixes.
 
21
Typographical note: words set in typewriter style that do not include a namespace prefix, such as Concept and prefLabel, refer to terms defined by SKOS [28].
 
25
http://sindice.com/ indexes the Web of Data, which is composed of pages with semantic markup in RDF, RDFa, Microformats or Microdata. Currently, it covers approximately 230 million documents with over 11 billion triples.
 
26
http://datahub.io/ is a “community-run catalogue” of currently 5,045 datasets, many of them following the Linked Data guidelines.
 
30
SKOS-XL is an extension schema to SKOS that enhances the labeling capabilities by treating labels as resources and not as literals.
 
31
TheSoz Thesaurus for the Social Sciences, http://datahub.io/dataset/gesis-thesoz
 
32
In the most common case, there is only one concept scheme (often the one created in the previous step), and that will be selected as the default concept scheme; otherwise, the default concept scheme will be chosen arbitrarily and a warning message shown by Skosify.
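
The selection rule described in this footnote can be sketched in Python as follows. This is an illustrative approximation, not Skosify's actual code: the function name `choose_default_scheme` is hypothetical, and picking the first identifier in sorted order is an assumption standing in for the "arbitrary" choice, made only so that the result is reproducible.

```python
import warnings


def choose_default_scheme(schemes):
    """Pick a default concept scheme from an iterable of scheme identifiers.

    Returns None when no scheme is given. With exactly one scheme, that
    scheme is returned. With several, one is picked (here: the first in
    sorted order, an assumption of this sketch) and a warning is issued.
    """
    candidates = sorted(set(schemes))
    if not candidates:
        return None
    if len(candidates) > 1:
        warnings.warn(
            "Multiple concept schemes found; using %s as default" % candidates[0]
        )
    return candidates[0]


single = choose_default_scheme(["ex:scheme"])
several = choose_default_scheme(["ex:b", "ex:a"])  # also emits a warning
```

A deterministic tie-break such as sorting is only one option; any rule would satisfy the footnote as long as the user is warned that the choice was not unambiguous.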
 
References
1.
ISO 25964-1 (2011) Information and documentation - Thesauri and interoperability with other vocabularies - Part 1: Thesauri for information retrieval. Norm, International Organization for Standardization
2.
Abdul Manaf NA, Bechhofer S, Stevens R (2012) Common modelling slips in SKOS vocabularies. In: Klinov P, Horridge M (eds) Proceedings of OWL: experiences and directions workshop (OWLED 2012), CEUR Workshop Proceedings, vol 849. CEUR-WS.org. http://ceur-ws.org/Vol-849/paper_2.pdf
3.
Abdul Manaf NA, Bechhofer S, Stevens R (2012) The current state of SKOS vocabularies on the Web. In: Simperl E, Cimiano P, Polleres A, Corcho O, Presutti V (eds) Proceedings of the 9th extended semantic web conference (ESWC 2012), Lecture notes in computer science, vol 7295. Springer, Berlin, pp 270–284
4.
Aitchison J, Gilchrist A, Bawden D (2000) Thesaurus construction and use: a practical manual. Aslib IMI, London
5.
Allemang D, Hendler J (2011) Semantic web for the working ontologist: effective modeling in RDFS and OWL. Morgan Kaufmann, Los Altos
6.
van Assem M, Malaisé V, Miles A, Schreiber G (2006) A method to convert thesauri to SKOS. In: Sure Y, Domingue J (eds) Proceedings of the third European semantic web conference (ESWC’06). Lecture notes in computer science, vol 4011. Springer, Berlin, pp 95–109
7.
Batini C, Cappiello C, Francalanci C, Maurino A (2009) Methodologies for data quality assessment and improvement. ACM Comput Surv 41(3):16
8.
Berrueta D, Fernández S, Frade I (2008) Cooking HTTP content negotiation with Vapour. In: Bizer C, Auer S, Aastrand Grimnes G, Heath T (eds) Proceedings of the 4th workshop on scripting for the semantic web (SFSW 2008). CEUR Workshop Proceedings, vol 368. CEUR-WS.org. http://CEUR-WS.org/Vol-368/paper3.pdf
10.
Borst T, Fingerle B, Neubert J, Seiler A (2010) How do libraries find their way onto the semantic web? Liber Q 19(3/4)
12.
de Coronado S, Wright LW, Fragoso G, Haber MW, Hahn-Dantona EA, Hartel FW, Quan SL, Safran T, Thomas N, Whiteman L (2009) The NCI thesaurus quality assurance life cycle. J Biomed Inform 42(3):530–539
13.
Ding L, Finin T (2006) Characterizing the semantic web on the web. Electr Eng 4273(August):5–9
14.
Fürber C, Hepp M (2010) Using semantic web resources for data quality management. In: Proceedings of the 17th international conference on knowledge engineering and management by the masses (EKAW 2010). Lecture notes in computer science, vol 6317. Springer, Berlin, pp 211–225
15.
Harpring P (2010) Introduction to controlled vocabularies: terminology for art, architecture, and other cultural works. Getty Publications, Los Angeles
17.
Hedden H (2010) The accidental taxonomist. Inf Today
18.
Hogan A, Harth A, Passant A, Decker S, Polleres A (2010) Weaving the pedantic web. In: Bizer C, Heath T, Berners-Lee T, Hausenblas M (eds) Proceedings of WWW2010 workshop on linked data on the web (LDOW 2010). CEUR Workshop Proceedings, vol 628. CEUR-WS.org. http://ceur-ws.org/Vol-628/ldow2010_paper04.pdf
19.
Hogan A, Umbrich J, Harth A, Cyganiak R, Polleres A, Decker S (2012) An empirical survey of linked data conformance. Web Semant Sci Serv Agents World Wide Web 14:14–44
20.
Hopcroft JE, Tarjan RE (1973) Algorithm 447: efficient algorithms for graph manipulation. Commun ACM 16(6):372–378
21.
Horridge M, Parsia B, Sattler U (2009) Explaining inconsistencies in OWL ontologies. In: Godo L, Pugliese A (eds) Proceedings of the 3rd international conference on scalable uncertainty management (SUM ’09). Lecture notes in computer science, vol 5785. Springer, Berlin, pp 124–137. doi:10.1007/978-3-642-04388-8_11
23.
Kalyanpur A (2006) Debugging and repair of OWL ontologies. Ph.D. thesis, University of Maryland, College Park, MD, USA
24.
Kless D, Milton S (2010) Towards quality measures for evaluating thesauri. In: Sánchez-Alonso S, Athanasiadis I (eds) Proceedings of the 4th metadata and semantics research conference (MTSR 2010) Communications in computer and information science, vol 108. Springer, Berlin, pp 312–319. doi:10.1007/978-3-642-16552-8_28
26.
Mader C, Haslhofer B, Isaac A (2012) Finding quality issues in SKOS vocabularies. In: Zaphiris P, Buchanan G, Rasmussen E, Loizides F (eds) Proceedings of the second international conference on theory and practice of digital libraries (TPDL 2012). Lecture notes in computer science, vol 7489. Springer, Berlin, pp 222–233. doi:10.1007/978-3-642-33290-6_25
27.
Malmsten M (2008) Making a library catalogue part of the semantic web. In: Greenberg J, Klas W (eds) Metadata for semantic and social applications. Proceedings of the international conference on Dublin core and metadata applications (DC-2008). Universitätsverlag Göttingen, Göttingen, Germany, pp 146–152
30.
Mougin F, Bodenreider O (2005) Approaches to eliminating cycles in the UMLS Metathesaurus: naïve vs. formal. In: Proceedings of the AMIA annual symposium, vol 2005. American Medical Informatics Association, pp 550–554
31.
Nagy H, Pellegrini T, Mader C (2011) Exploring structural differences in thesauri for SKOS-based applications. In: Chidini C, Ngonga Ngomo Ac, Lindstaedt S, Pellegrini T (eds) Proceedings of the 7th international conference on semantic systems (I-Semantics ’11), New York, pp 187–190. doi:10.1145/2063518.2063546
33.
NISO (2005) ANSI/NISO Z39.19 - guidelines for the construction, format, and management of monolingual controlled vocabularies. Standard, National Information Standards Organization
34.
Ovchinnikova E, Wandmacher T, Kühnberger K (2007) Solving terminological inconsistency problems in ontology design. Int J Interoperabil Bus Inf Syst 2(1):65–80
35.
Pipino L, Lee Y, Wang R (2002) Data quality assessment. Commun ACM 45(4):211–218
36.
Popitsch NP, Haslhofer B (2010) DSNotify: handling broken links in the web of data. In: Proceedings of the 19th international conference on World Wide Web (WWW 2010). ACM, New York, pp 761–770. doi:10.1145/1772690.1772768
37.
Poveda-Villalón M, Suárez-Figueroa M, Gómez-Pérez A (2012) Validating ontologies with OOPS! In: Teije A, Völker J, Handschuh S, Stuckenschmidt H, d’Aquin M, Nikolov A, Aussenac-Gilles N, Hernandez N (eds) Proceedings of the 18th international conference on knowledge engineering and knowledge management (EKAW 2012). Lecture notes in computer science, vol 7603. Springer, Berlin, pp 267–281. doi:10.1007/978-3-642-33876-2_24
38.
Schandl T, Blumauer A (2010) PoolParty: SKOS thesaurus management utilizing linked data. In: Aroyo L, Antoniou G, Hyvönen E, ten Teije A, Stuckenschmidt H, Cabral L, Tudorache T (eds) Proceedings of the 7th extended semantic web conference (ESWC2010). Lecture notes in computer science, vol 6088. Springer, Berlin, pp 421–425
39.
Soergel D (2002) Thesauri and ontologies in digital libraries: tutorial. In: Proceedings of the 2nd ACM/IEEE-CS joint conference on digital libraries (JCDL 2002). ACM, New York, p 415
40.
Summers E, Isaac A, Redding C, Krech D (2008) LCSH, SKOS and Linked Data. In: Greenberg J, Klas W (eds) Metadata for semantic and social applications. Proceedings of the International Conference on Dublin Core and Metadata Applications (DC-2008). Universitätsverlag Göttingen, Göttingen, pp 25–33
41.
Suominen O, Hyvönen E (2012) Improving the quality of SKOS vocabularies with Skosify. In: Aroyo L, Antoniou G, Hyvönen E, ten Teije A, Stuckenschmidt H, Cabral L, Tudorache T (eds) Proceedings of the 18th international conference on knowledge engineering and knowledge management, (EKAW 2012). Lecture notes in computer science, vol 7603. Springer, Berlin, pp 383–397. doi:10.1007/978-3-642-33876-2_34
42.
Svenonius E (1997) Definitional approaches in the design of classification and thesauri and their implications for retrieval and for automatic classification. In: Knowledge Organization for Information Retrieval: Proceedings of the 6th international study conference on classification research. International Federation for information and documentation, pp 12–16
43.
Tuominen J, Frosterus M, Viljanen K, Hyvönen E (2009) ONKI SKOS server for publishing and utilizing SKOS vocabularies and ontologies as services. In: Aroyo L, Traverso P, Ciravegna F, Cimiano P, Heath T, Hyvönen E, Mizoguchi R, Oren E, Sabou M, Simperl E (eds) Proceedings of the 6th European semantic web conference (ESWC 2009). Lecture notes in computer science, vol 5554. Springer, Berlin, pp 768–780
44.
Vrandecic D (2010) Ontology evaluation. Ph.D. thesis, KIT, Fakultät für Wirtschaftswissenschaften, Karlsruhe
Metadata
Title
Assessing and Improving the Quality of SKOS Vocabularies
Authors
Osma Suominen
Christian Mader
Publication date
1 March 2014
Publisher
Springer Berlin Heidelberg
Published in
Journal on Data Semantics, Issue 1/2014
Print ISSN: 1861-2032
Electronic ISSN: 1861-2040
DOI
https://doi.org/10.1007/s13740-013-0026-0