Published July 4, 2017 | Version v1
Dataset Open

The Annotated Corpus of Classical Tibetan (ACTib), Part II - POS-tagged version, based on the BDRC digitised text collection, tagged with the Memory-Based Tagger from TiMBL

  • 1. Cambridge
  • 2. SOAS, University of London

Description

This corpus is a part-of-speech tagged version of

Wallman, Jeff, Rowinski, Zach, Ngawang Trinley, Tomlinson, Chris, & Keutzer, Kurt. (2017). Collection of Tibetan etexts compiled by the Buddhist Digital Resource Center [Data set]. Zenodo. http://doi.org/10.5281/zenodo.821218

using the training data of

Hill, Nathan W., & Garrett, Edward. (2017). A part-of-speech (POS) tagged corpus of Classical Tibetan [Data set]. Zenodo. http://doi.org/10.5281/zenodo.574878

using the memory-based tagger of

https://languagemachines.github.io/mbt/

Please note that the files have not been post-processed or manually corrected. A small number of files in the KarmaDelek directory were annotated even though their original XML input was already corrupted.

Files

DharmaDownloadtagged.zip

Files (783.0 MB)

MD5 checksum                            Size
md5:58a258e4a26bf117516d5108daeb4960    52.0 MB
md5:b2b9ae591a4079023e94d527feae2d49    24.1 MB
md5:b5b3a88d16cf913a0339123e0b268e71    44.5 MB
md5:698849f34dc129b09bfb1457f71710a4    127.7 MB
md5:c871e6999c032981800cffdc8f78ec7d    49.9 MB
md5:a6176dfd396d2f252d7e7f4cb3ed34fc    27.2 MB
md5:f2ed8ad329776390634effecc74b9ed7    10.4 MB
md5:0b008246e23176484544dddf2f5c5816    38.7 MB
md5:12d9e8aa916c7ec599241465f2e5ebdf    8.6 MB
md5:6354fb1e24527b877869c503d8608a68    382.2 MB
md5:41d8c6e4be99e02fc7b70da8b891e4f4    7.4 MB
md5:dbc04c7cb1282ed982291a585a300d4a    10.2 MB
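
The MD5 values listed above can be used to check that a downloaded archive arrived intact. The following is a minimal sketch, not part of the dataset itself: the function names md5_of_file and matches_listed_checksum are hypothetical, and the listing does not say which archive name belongs to which checksum, so the pairing has to be taken from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks.

    Chunked reading keeps memory use flat even for the largest archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'.

    (Hypothetical helper: the 'md5:' prefix follows the listing's notation.)
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, matches_listed_checksum("some_archive.zip", "md5:58a258e4a26bf117516d5108daeb4960") returns True only if the local file (a placeholder name here) hashes to the first value in the list. A mismatch usually points to an incomplete download, or to a file paired with the wrong checksum.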

Additional details

Related works