Published in: International Journal on Digital Libraries 2/2019

30.01.2018

Benchmarking and evaluating the interpretation of bibliographic records

Written by: Trond Aalberg, Fabien Duchateau, Naimdjon Takhirov, Joffrey Decourselle, Nicolas Lumineau

Abstract

In a global context that promotes the use of explicit semantics for sharing information and developing new services, the MAchine Readable Cataloguing (MARC) format commonly used by libraries worldwide has demonstrated its many limitations. The conceptual reference model for bibliographic information presented in the Functional Requirements for Bibliographic Records (FRBR) is expected to be the foundation for a new generation of catalogs that will replace MARC and the digital card catalog. The need to transform legacy MARC records into a FRBR representation (FRBRization) has led to the proposal of various tools and approaches. However, these projects and the results they achieve are difficult to compare because common datasets and well-defined, appropriate metrics are lacking. We fill this gap by proposing BIB-R, the first public benchmark for the FRBRization process. It is composed of two datasets that enable the identification of the strengths and weaknesses of a FRBRization tool. It also defines a set of metrics that evaluate the different steps of the FRBRization process. These resources, as well as the results of a large experiment in which three FRBRization tools were tested against our benchmark, are available to the community under an open licence.

Footnotes
6
We use “Person” in our examples for the sake of readability. The initial FRBR model also includes a Corporate Body entity type. In the revised Library Reference Model the proper supertype is “Agent”.
 
7
“Concept” is used as a categorical supertype for anything that can be the subject.
 
10
Note that the extraction can have a higher complexity in specific cases, such as when records contain references to other records that need to be looked up during the extraction.
 
11
Presentation at code4lib 2011 about improving the performance of eXtensible Catalog’s deduplication module, http://www.extensiblecatalog.org/learnmore/publications.
 
12
Our expert collections include specific annotations for each element of the patterns; otherwise it would not be possible to compute the metrics MEND, MRND and ESE.
 
14
BIB-RCAT is a recursive acronym that stands for “BIB-RCAT Is Basically a Real-world CATalogue”.
 
16
https://github.com/naimdjon/marc2frbr FRBR-ML tool, previously named marc2frbr.
 
18
https://github.com/naimdjon/vfrbr-frbrize-marc Variations VFRBR tool (adjusted version, only to facilitate compilation).
 
19
The tests were chosen in sequential order (recall that test 5.4 does not exist). The analysis of the results is, however, not limited to this subset.
 
20
Note that XC does not create Agent and Concept entities; rather, it adds properties within the main Work or Expression. Our evaluation takes this specificity into account, and XC is not penalized when a property and its associated value correctly represent the Agent or the Concept.
 
21
Note that the category patterns 4.x (aggregations) and 5.x (complementary works) do not have secondary elements and all tools achieve a 0% ESE score for these tests.
 
22
Note that the expert was already familiar with the proposed metrics; the reported time may be longer for people who first need to understand the concepts behind these metrics.
 
Metadata
Title
Benchmarking and evaluating the interpretation of bibliographic records
Written by
Trond Aalberg
Fabien Duchateau
Naimdjon Takhirov
Joffrey Decourselle
Nicolas Lumineau
Publication date
30.01.2018
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Digital Libraries / Issue 2/2019
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI
https://doi.org/10.1007/s00799-018-0233-2
