Abstract
In natural language processing, conflation is the process of merging or lumping together nonidentical words which refer to the same principal concept. This can relate both to words which are entirely different in form (e.g., "group" and "collection"), and to words which share some common root (e.g., "group", "grouping", "subgroups"). In the former case the words can only be mapped by referring to a dictionary or thesaurus, but in the latter case use can be made of the orthographic similarities between the forms. One popular approach is to remove affixes from the input words, thus reducing them to a stem; if this could be done correctly, all the variant forms of a word would be converted to the same standard form. Since the process is aimed at mapping for retrieval purposes, the stem need not be a linguistically correct lemma or root (see also Frakes 1982).
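The affix-removal idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not the algorithm of this paper or of any of the stemmers cited here: the affix lists `SUFFIXES` and `PREFIXES`, the `MIN_STEM` length guard, and the function names `strip_affixes` and `conflate` are all assumptions made up for the example.

```python
# Hypothetical affix lists chosen for illustration only; they are not
# taken from any published stemmer.
SUFFIXES = ("ings", "ing", "s")   # ordered longest first
PREFIXES = ("sub",)
MIN_STEM = 3                      # assumed guard: never leave fewer than 3 letters


def strip_affixes(word: str) -> str:
    """Reduce a word to a stem by removing at most one prefix and one suffix."""
    stem = word.lower()
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM:
            stem = stem[len(prefix):]
            break
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
            stem = stem[:-len(suffix)]
            break
    return stem


def conflate(words):
    """Group words into conflation classes keyed by their stem."""
    classes = {}
    for word in words:
        classes.setdefault(strip_affixes(word), []).append(word)
    return classes


if __name__ == "__main__":
    print(conflate(["group", "grouping", "subgroups", "collection"]))
```

With these assumed lists, "group", "grouping" and "subgroups" fall into one class, while "collection" stays separate, since relating it to "group" would need a dictionary or thesaurus. The sketch also shows why correct stripping is hard: it reduces "string" to "str", an error that a practical stemmer has to guard against with further rules.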
References
- Dawson, J. L. 1974: "Suffix removal and word conflation", ALLC Bulletin, 2(3), 33--46.
- Frakes, W. B. 1982: Term Conflation for Information Retrieval, Ph.D. dissertation, Syracuse University, August 1982.
- Lennon, M., Pierce, D. S., Tarry, B. D. and Willett, P. 1981: "An evaluation of some conflation algorithms for information retrieval", Journal of Information Science, 3, 177--183.
- Lovins, J. B. 1968: "Development of a stemming algorithm", Mechanical Translation and Computational Linguistics, 11, 22--31.
- Paice, C. D. 1977: Information Retrieval and the Computer, London: MacDonald & Jane's; chapter 4.
- Porter, M. F. 1980: "An algorithm for suffix stripping", Program, 14, 130--137.
- Ulmschneider, J. and Doszkocs, T. 1983: "A practical stemming algorithm for online search assistance", Online Review, 7(4).