1 Introduction
2 System Architecture
The system comprises four modules: Handwritten Text Recognition, NE Recognition, KW-NE Indexing, and KW-NE-Based IR Model. First, the Handwritten Text Recognition module converts the historical handwritten document images into transcriptions. Then, the NE Recognition module annotates the transcriptions with NEs; this module connects to the Knowledge Graph to extract the classes and identifiers of the NEs. Next, the KW-NE Indexing module represents and indexes the KWs and NEs of the annotated transcriptions, together with the respective original images, and stores them in the KW-NE Annotated Text and Image Repository. The raw text query is likewise annotated with NEs by the NE Recognition module, yielding a KW-NE annotated query. Finally, the KW-NE-Based IR Model module compares the annotated query with the annotated documents and returns the ranked transcriptions and images.
3 Image Representation and Knowledge Graph
… (CIDOC-CRM:E21_Person) named “William Sutton”, who was a member of a few relevant offices in Ireland.
4 Information Retrieval Model and Demo
- If our NER can determine the NE's identifier, the NE is represented in d by that identifier. For example, occu_sheriff, coun_meath and occu_clerk are the identifiers of the entities named sheriff, Meath and clerk, and are added to d.
- If our NER can determine only the NE's most specific class, the NE is represented by a combination of its name and class. For example, the entity named William Sutton does not exist in our historical KG, so its identifier cannot be extracted. However, the NER determines that its most specific class is Person, so william_sutton/person is added to d.
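The two cases above can be illustrated with a minimal Python sketch. The `NamedEntity` type, the `ne_term` function and the `normalise` helper are hypothetical names introduced only for illustration; they are not part of the described system. The normalisation rule (lower-casing and replacing spaces with underscores) is an assumption inferred from the william_sutton/person example, and the error raised when neither an identifier nor a class is available is likewise an assumption, since that case is not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NamedEntity:
    """Hypothetical container for one NE produced by the NER module."""
    name: str
    ne_class: Optional[str] = None    # most specific class, if determined
    identifier: Optional[str] = None  # KG identifier, if determined


def normalise(text: str) -> str:
    # Assumed normalisation, inferred from "william_sutton/person":
    # lower-case the text and join its words with underscores.
    return "_".join(text.lower().split())


def ne_term(ne: NamedEntity) -> str:
    """Return the term that represents the NE in the document d."""
    if ne.identifier is not None:
        # Case 1: the identifier is known, so it represents the NE.
        return ne.identifier
    if ne.ne_class is not None:
        # Case 2: only the most specific class is known, so the NE is
        # represented by its name combined with that class.
        return f"{normalise(ne.name)}/{normalise(ne.ne_class)}"
    # Not covered by the two cases above; raising here is an assumption.
    raise ValueError(f"no identifier or class for entity {ne.name!r}")


# Usage with the entities from the examples above.
entities = [
    NamedEntity(name="sheriff", identifier="occu_sheriff"),
    NamedEntity(name="Meath", identifier="coun_meath"),
    NamedEntity(name="clerk", identifier="occu_clerk"),
    NamedEntity(name="William Sutton", ne_class="Person"),
]
d = [ne_term(ne) for ne in entities]
print(d)
```

Checking the identifier first means an entity that is known to the KG is always represented by its identifier, and the name/class combination is used only as a fallback.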