ABSTRACT
Clustering search results helps the user get an overview of the information returned. In this paper, we treat the clustering task as indexing the search results. Here, an index is a structured list of labels that makes it easier for the user to comprehend both the labels and the search results. To realize this goal, we make three proposals. The first is to use Named Entity (NE) extraction for term extraction. The second is a new label selection criterion based on a term's importance in the search results and on the relation between terms and search queries. The third is label categorization using the category information of labels, which is generated by NE extraction. We implement a prototype system based on these proposals and find that it offers much higher performance than existing methods; we focus on news articles in this paper.
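The three proposals can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the selection criterion, so the score below (the share of results containing a term, damped so that terms present in nearly every result rank low) is a hypothetical stand-in, as are the function name `build_index`, the parameter `top_k`, and the assumption that an NE extractor has already produced `(term, category)` pairs for each result.

```python
from collections import defaultdict


def build_index(results, query_terms, top_k=3):
    """Build a category -> ranked label list from NE-tagged search results.

    results: list of documents, each a list of (term, category) pairs that a
             named entity extractor is assumed to have produced already.
    query_terms: set of terms taken from the search query.
    """
    doc_freq = defaultdict(int)  # number of results that contain the term
    category_of = {}             # NE category recorded for each term
    for entities in results:
        # dict() keeps one entry per term, so a term counts once per result.
        for term, category in dict(entities).items():
            doc_freq[term] += 1
            category_of[term] = category

    n = len(results)
    by_category = defaultdict(list)
    for term, df in doc_freq.items():
        if term in query_terms:
            continue  # the query itself does not help separate the results
        share = df / n
        score = share * (1.0 - share)  # hypothetical criterion, not the paper's
        by_category[category_of[term]].append((score, term))

    # Rank labels inside each category: highest score first, ties by name.
    return {
        category: [t for _, t in sorted(scored, key=lambda p: (-p[0], p[1]))[:top_k]]
        for category, scored in by_category.items()
    }
```

Calling `build_index(results, {"Sony"})` on a handful of tagged news results returns a dictionary keyed by NE category (for example `LOCATION` or `ORGANIZATION`), each holding a short ranked label list. That is the structured label list the abstract calls an index.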
Index Terms
- A search result clustering method using informatively named entities