
2022 | OriginalPaper | Chapter

On the Entropy of Written Afan Oromo

Authors: Dereje Hailemariam Woldegebreal, Tsegamlak Terefe Debella, Kalkidan Dejenie Molla

Published in: e-Infrastructure and e-Services for Developing Countries

Publisher: Springer International Publishing


Abstract

Afan Oromo is the language of the Oromo people, the largest ethnolinguistic group in Ethiopia. Written Afan Oromo uses the Latin alphabet. In electronic communication systems, letters of the alphabet are represented with the standard 8-bit ASCII code (8 bits/letter) or with a 16-bit fixed-length Unicode encoding such as UTF-16 (16 bits/letter). Moreover, the language marks gemination (consonant doubling) and long vowels with doubled letters, e.g., “dammee” for sweet potato. From an information-theoretic perspective, this letter doubling and these fixed-length encoding schemes add redundancy to written Afan Oromo. The redundancy, in turn, contributes to inefficient use of communication resources, such as bandwidth and energy, when Afan Oromo text is transmitted or stored. This paper applies information theory to estimate the entropy of written Afan Oromo. We use higher-order Markov chains, also called N-gram models, to compute the entropy of sample text corpora (the written source) by capturing the dependencies among sequences of letters drawn from the corpora. Entropy measures the average information in bits per letter or per block of letters, depending on the N-gram order considered; it also indicates the achievable lower bound for lossless compression schemes such as Huffman coding. When the source is modeled with N = 1 (i.e., as a memoryless source whose letters occur independently of one another), the entropy of the language is 4.31 bits/letter; compared with 8-bit ASCII, the achievable compression level is about 46%. For N = 19 the estimated entropy falls to 0.85 bits/letter, corresponding to a compression level of about 89%. Huffman and arithmetic source coding algorithms were implemented to check the achievable compression levels. For the collected corpora, the average compression achieved by the Huffman algorithm ranges from 42.2% to 64.9% for N = 1 to 5, close to the theoretical entropy bounds. With the growing use of the language in telecom services and storage systems, these entropy results show the need to further investigate language-specific applications, such as compression algorithms.
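To make the estimation procedure concrete, below is a minimal Python sketch (not from the paper; the toy corpus string and the preprocessing are illustrative assumptions) of the N-gram entropy computation and the compression-level arithmetic described in the abstract. The conditional entropy for order N is estimated as the entropy of N-letter blocks minus the entropy of (N-1)-letter blocks, and the compression level relative to an 8-bit fixed-length code is 1 - H/8 (e.g., 1 - 4.31/8 ≈ 46% for N = 1).

import math
from collections import Counter

def ngram_entropy(text, n):
    # Estimate H(X_n | X_1 ... X_{n-1}) in bits/letter via the chain
    # rule: H_n = H(n-letter blocks) - H((n-1)-letter blocks).
    # For n = 1 this reduces to the memoryless (unigram) entropy.
    def block_entropy(k):
        if k == 0:
            return 0.0
        blocks = Counter(text[i:i + k] for i in range(len(text) - k + 1))
        total = sum(blocks.values())
        return -sum((c / total) * math.log2(c / total)
                    for c in blocks.values())
    return block_entropy(n) - block_entropy(n - 1)

# Toy stand-in corpus; the paper uses much larger Afan Oromo corpora.
corpus = "dammee dachaa dammee dachaa dammee"
for n in (1, 2, 3):
    h = ngram_entropy(corpus, n)
    # Compression level vs. an 8-bit fixed-length code, 1 - H/8;
    # with H = 4.31 bits/letter this gives the paper's ~46% figure.
    print(f"N={n}: H = {h:.2f} bits/letter, "
          f"compression vs. 8 bits/letter = {1 - h / 8:.0%}")

This block-difference estimator follows the standard Shannon-style procedure; the paper's exact estimator and preprocessing may differ in detail, and reliable estimates at large N require far more text than this toy string.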


Footnotes
1
Source coding is the term used in the communication engineering literature, while compression is the term used in the computer science and information systems literatures. In this paper we use the two terms interchangeably.
 
Metadata
Title
On the Entropy of Written Afan Oromo
Authors
Dereje Hailemariam Woldegebreal
Tsegamlak Terefe Debella
Kalkidan Dejenie Molla
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-031-06374-9_3
