DOI: 10.1145/319950.319966

Text classification using ESC-based stochastic decision lists

Published: 01 November 1999

ABSTRACT

We propose a new method of text classification using stochastic decision lists. A stochastic decision list is an ordered sequence of IF-THEN rules, and our method can be viewed as a rule-based method for text classification having advantages of readability and refinability of acquired knowledge. Our method is unique in that decision lists are automatically constructed on the basis of the principle of minimizing Extended Stochastic Complexity (ESC), and with it we are able to construct decision lists that have fewer errors in classification. The accuracy of classification achieved with our method appears better than or comparable to those of existing rule-based methods.
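To make the notion of an ordered sequence of IF-THEN rules concrete, the sketch below applies a small hand-written stochastic decision list to a piece of text. It is only an illustration under stated assumptions: the `Rule` class, the `classify` function, the example terms, and the probabilities are all hypothetical and do not come from the paper. It also does not show how the list is constructed, and in particular it does not implement ESC minimization.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    term: Optional[str]  # word that must occur in the text; None marks the default rule
    label: str           # class predicted when this rule fires
    prob: float          # assumed probability attached to that prediction


def classify(rules: List[Rule], text: str) -> Tuple[str, float]:
    """Return (label, prob) from the first rule whose condition holds."""
    words = set(text.lower().split())
    for rule in rules:
        # Rules are checked in order; the first match decides the outcome.
        if rule.term is None or rule.term in words:
            return rule.label, rule.prob
    raise ValueError("decision list needs a default rule")


# Hypothetical list: the terms, labels, and probabilities are made up.
RULES = [
    Rule("wheat", "grain", 0.95),
    Rule("harvest", "grain", 0.80),
    Rule(None, "other", 0.90),
]

print(classify(RULES, "Wheat exports rose"))
print(classify(RULES, "stocks fell"))
```

Because every rule is a readable condition paired with a prediction, a person can inspect the list, reorder it, or edit a single rule, which is the kind of readability and refinability the abstract refers to.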


Published in

CIKM '99: Proceedings of the eighth international conference on Information and knowledge management
November 1999
564 pages
ISBN: 1581131461
DOI: 10.1145/319950

Copyright © 1999 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
