2016 | OriginalPaper | Chapter

An Automatic Construction of Malay Stop Words Based on Aggregation Method

Authors: Khalifa Chekima, Rayner Alfred

Published in: Soft Computing in Data Science

Publisher: Springer Singapore

Abstract

In information retrieval, removing stop words is key to effective indexing. Although many theories and algorithms exist for constructing stop word lists in many languages, most Malay stop word lists in use are either borrowed from English stop words or constructed manually by individual researchers, a process that is costly, time-consuming and error-prone. In other words, no standard stop word list has yet been constructed for the Malay language. In this study, we propose an aggregation technique that combines three approaches for the automatic construction of a general Malay stop word list. The first approach is statistical: inspired by Zipf's law, it considers word frequencies (highest and lowest) against their ranks. The second approach considers the distribution of words across documents using a variance measure. The third approach computes how informative a word is using an entropy measure. As a result, a total of 339 Malay stop words were produced. The findings and their implications are discussed further.
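
The three measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact formulas, thresholds, or aggregation rule, so everything below is an assumption made for illustration. In particular, the function names `score_words` and `aggregate_stop_words`, the use of the coefficient of variation as the variance-based score, the top-`k` cut-off, and the majority vote (`min_votes`) are hypothetical choices. The sketch also considers only the high-frequency end of the rank curve, whereas the abstract mentions both highest and lowest frequencies.

```python
import math
from collections import Counter


def score_words(docs):
    """Return {word: (frequency, coeff_of_variation, entropy)} for tokenised docs.

    All three scores are illustrative stand-ins for the measures named in the
    abstract; the paper's exact definitions are not available here.
    """
    counts = [Counter(doc) for doc in docs]
    lengths = [max(len(doc), 1) for doc in docs]
    vocab = set().union(*counts)
    scores = {}
    for word in vocab:
        per_doc = [c[word] for c in counts]
        total = sum(per_doc)  # corpus-wide frequency (frequency/rank view)
        # Relative frequency of the word in each document.
        rel = [n / length for n, length in zip(per_doc, lengths)]
        mean = sum(rel) / len(rel)
        var = sum((r - mean) ** 2 for r in rel) / len(rel)
        # Assumption: low variation relative to the mean = evenly spread word.
        cv = math.sqrt(var) / mean
        # Entropy of the word's distribution over documents; a high value
        # means the word says little about which document it came from.
        entropy = -sum((n / total) * math.log(n / total) for n in per_doc if n)
        scores[word] = (total, cv, entropy)
    return scores


def aggregate_stop_words(docs, k, min_votes=2):
    """Assumed aggregation: keep words in at least `min_votes` of three top-k lists."""
    scores = score_words(docs)
    words = sorted(scores)  # alphabetical order makes tie-breaking deterministic
    by_freq = sorted(words, key=lambda w: -scores[w][0])[:k]
    by_cv = sorted(words, key=lambda w: scores[w][1])[:k]
    by_entropy = sorted(words, key=lambda w: -scores[w][2])[:k]
    votes = Counter(by_freq + by_cv + by_entropy)
    return {w for w, v in votes.items() if v >= min_votes}


docs = [
    "saya dan kamu pergi ke pasar".split(),
    "dia dan mereka pergi ke sekolah".split(),
    "kucing dan anjing lari ke taman".split(),
]
print(aggregate_stop_words(docs, k=2))
```

On this toy corpus the words "dan" and "ke" occur once in every document, so they rank first under all three scores. A real corpus would need a far larger `k` and a principled cut-off, which the abstract does not specify.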

Metadata
Title
An Automatic Construction of Malay Stop Words Based on Aggregation Method
Authors
Khalifa Chekima
Rayner Alfred
Copyright Year
2016
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-2777-2_16
