1 Introduction
2 History of Arabic writing style and manuscripts
3 Dataset description
Name of sources | Number of manuscripts |
---|---|
BRILL through QNL | 57 |
University of Tubingen, Berlin | 1 |
Dar al Makhtotat Sana’a Yemen | 1 |
Institute of Oriental Culture, University of Tokyo | 8 |
Princeton University Library | 4 |
Wellcome Library | 2 |
Yale University | 4 |
Cambridge University Library | 3 |
Islamic Awareness website | 13 |
The Royal Library, National Library of Denmark | 2 |
4 Comparison of KERTAS with existing datasets
4.1 Existing datasets
Name | Language | Size |
---|---|---|
Syriac character [10] | Syriac | 60 K characters |
IBN SINA [19] | Arabic | 51 folios, 20722 CCs |
Barcelona Historical Marriages dataset (BH2M) [20] | Spanish | 244 books, 174 images |
Medieval Paleographic Scale (MPS) [7] | Medieval Dutch | 2858 charters |
KERTAS dataset | Arabic | 2505 images, 135 books |
4.2 KERTAS dataset
5 Features extraction
5.1 Sparse representation-based approach
5.2 Handwriting style-based features
6 Results, analysis and discussion
Image size (pixels) | Accuracy with predefined folds (%) | Accuracy with random train/test split (%) |
---|---|---|
12 × 12 | 80.62 | 28.46 |
25 × 25 | 92.00 | 41.11 |
50 × 50 | 94.77 | 42.31 |
100 × 100 | 92.62 | 42.31 |
200 × 200 | 92.32 | 41.56 |
250 × 250 | 92.32 | 41.41 |