2004 | OriginalPaper | Buchkapitel
Automated Resolution of Noisy Bibliographic References
verfasst von : Markus Demleitner, Michael Kurtz, Alberto Accomazzi, Günther Eichhorn, Carolyn S. Grant, Steven S. Murray
Erschienen in: Classification, Clustering, and Data Mining Applications
Verlag: Springer Berlin Heidelberg
Enthalten in: Professional Book Archive
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
We describe a system used by the NASA Astrophysics Data System to identify bibliographic references obtained from scanned article pages by OCR methods with records in a bibliographic database. We analyze the process generating the noisy references and conclude that the three-step procedure of correcting the OCR results, parsing the corrected string and matching it against the database provides unsatisfactory results. Instead, we propose a method that allows a controlled merging of correction, parsing and matching, inspired by dependency grammars. We also report on the effectiveness of various heuristics that we have employed to improve recall.