ABSTRACT
We propose a new method of text classification using stochastic decision lists. A stochastic decision list is an ordered sequence of IF-THEN-ELSE rules, so our method can be viewed as a rule-based method for text classification, with the advantage that the acquired knowledge is readable and refinable. Our method is unique in that decision lists are constructed automatically on the basis of the principle of minimizing Extended Stochastic Complexity (ESC), which allows us to construct decision lists that make fewer classification errors. The classification accuracy achieved with our method appears better than or comparable to that of existing rule-based methods.
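To make the data structure concrete, the following is a minimal sketch of how a stochastic decision list could be applied to a text, assuming each rule tests for the presence of a single term and carries a probability for its label. The names `Rule`, `StochasticDecisionList`, and `classify`, the example terms, and the probabilities are all hypothetical illustrations; the sketch does not implement the ESC-based construction of the list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rule:
    term: str    # assumed condition: the document contains this term
    label: str   # class assigned when the rule fires
    prob: float  # assumed probability attached to that assignment


@dataclass
class StochasticDecisionList:
    rules: List[Rule]    # ordered: earlier rules take precedence
    default_label: str   # final ELSE branch
    default_prob: float

    def classify(self, text: str) -> Tuple[str, float]:
        """Return the label and probability of the first rule that fires."""
        tokens = set(text.lower().split())
        for rule in self.rules:
            if rule.term in tokens:  # IF the term occurs THEN use this rule
                return rule.label, rule.prob
        # ELSE: no rule fired, fall through to the default
        return self.default_label, self.default_prob


# Hypothetical hand-written list; a learned list would come from training data.
sdl = StochasticDecisionList(
    rules=[
        Rule(term="wheat", label="grain", prob=0.95),
        Rule(term="tonnes", label="grain", prob=0.70),
    ],
    default_label="not-grain",
    default_prob=0.90,
)

print(sdl.classify("Wheat exports rose"))
print(sdl.classify("Shares fell sharply"))
```

Because the rules are ordered, a text containing both "wheat" and "tonnes" is decided by the "wheat" rule alone, which is what makes such a list easy to read and to refine by hand.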
Index Terms
- Text classification using ESC-based stochastic decision lists