DOI: 10.1145/1097047.1097063
Article

A search result clustering method using informatively named entities

Published: 04 November 2005

ABSTRACT

Clustering search results helps the user get an overview of the information returned. In this paper, we regard the clustering task as indexing the search results. Here, an index means a structured label list that makes it easier for the user to comprehend the labels and the search results. To realize this goal, we make three proposals. The first is to use Named Entity (NE) extraction for term extraction. The second is a new label selection criterion based on a term's importance in the search results and on the relation between terms and search queries. The third is label categorization using the category information of labels, which is generated by NE extraction. We implement a prototype system based on these proposals and find that it offers much higher performance than existing methods; we focus on news articles in this paper.
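
The abstract does not state the scoring formula or the NE tagger used, so the following is only a minimal sketch of how the three proposals might fit together, not the authors' implementation. It assumes each search result already carries named entities as (surface, category) pairs from some NE extractor. The function name `build_index`, the category names, and the score (share of results mentioning the label, multiplied by the fraction of those results that also contain a query term) are all hypothetical choices made for illustration.

```python
from collections import defaultdict

def build_index(results, query_terms, top_k=3):
    """Group NE labels by category and rank them with a hypothetical score.

    results: list of dicts with 'text' (str) and 'entities'
             (list of (surface, category) pairs from an assumed NE extractor).
    Returns {category: [(surface, score, result_indices), ...]}.
    """
    n = len(results)
    queries = [q.lower() for q in query_terms]
    doc_freq = defaultdict(int)    # label -> number of results mentioning it
    query_cooc = defaultdict(int)  # label -> results that also contain a query term
    members = defaultdict(list)    # label -> indices of results under this label

    for i, result in enumerate(results):
        text = result["text"].lower()
        has_query = any(q in text for q in queries)
        for label in set(result["entities"]):  # count each label once per result
            doc_freq[label] += 1
            members[label].append(i)
            if has_query:
                query_cooc[label] += 1

    index = defaultdict(list)
    for (surface, category), df in doc_freq.items():
        importance = df / n                                   # weight in the result set
        relatedness = query_cooc[(surface, category)] / df    # tie to the query
        score = importance * relatedness                      # assumed combination
        index[category].append((surface, score, members[(surface, category)]))

    # Keep the best labels per category: highest score first, then alphabetical.
    return {
        category: sorted(labels, key=lambda t: (-t[1], t[0]))[:top_k]
        for category, labels in index.items()
    }

# Toy news-style results with pre-tagged entities (made-up data).
results = [
    {"text": "Toyota unveils hybrid car in Tokyo",
     "entities": [("Toyota", "ORGANIZATION"), ("Tokyo", "LOCATION")]},
    {"text": "Honda hybrid car sales rise in Tokyo",
     "entities": [("Honda", "ORGANIZATION"), ("Tokyo", "LOCATION")]},
    {"text": "Toyota reports quarterly earnings",
     "entities": [("Toyota", "ORGANIZATION")]},
    {"text": "Toyota hybrid car wins award",
     "entities": [("Toyota", "ORGANIZATION")]},
]
index = build_index(results, ["hybrid"])
for category, labels in index.items():
    print(category, [surface for surface, _, _ in labels])
```

Because a result may mention several entities, the same result can appear under more than one label, so the output behaves like an index over the results rather than a strict partition.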


Published in

WIDM '05: Proceedings of the 7th annual ACM international workshop on Web information and data management
November 2005
96 pages
ISBN: 1595931945
DOI: 10.1145/1097047

Copyright © 2005 ACM


Publisher

Association for Computing Machinery, New York, NY, United States
