A search result clustering method using informatively named entities

Authors:
Hiroyuki Toda, NTT Corporation, Kanagawa, Japan
Ryoji Kataoka, NTT Corporation, Kanagawa, Japan

WIDM '05: Proceedings of the 7th annual ACM international workshop on Web information and data management, November 2005, Pages 81–86
https://doi.org/10.1145/1097047.1097063

Published: 04 November 2005

ABSTRACT

Clustering the results of a search helps the user get an overview of the information returned. In this paper, we regard the clustering task as indexing the search results. Here, an index means a structured label list that makes it easier for the user to comprehend the labels and the search results. To realize this goal, we make three proposals. The first is to use Named Entity (NE) extraction for term extraction. The second is a new label selection criterion based on a term's importance in the search results and on the relation between terms and search queries. The third is label categorization using the category information of labels, which is generated by NE extraction. We implement a prototype system based on these proposals and find that it offers much higher performance than existing methods. This paper focuses on news articles.
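
The three proposals can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the extraction model or the scoring formula, so everything concrete here is an assumption made for illustration. The hypothetical `GAZETTEER` dictionary stands in for a real NE extractor, document frequency within the result set stands in for the label selection criterion, and skipping entities that occur in the query is one simple guess at how a term's relation to the query might be used.

```python
from collections import defaultdict

# Hypothetical stand-in for a named-entity extractor: a tiny hand-made
# gazetteer mapping surface strings to NE categories (an assumption).
GAZETTEER = {
    "Tokyo": "LOCATION",
    "Osaka": "LOCATION",
    "NTT": "ORGANIZATION",
    "Toyota": "ORGANIZATION",
}


def extract_entities(text):
    """Return the set of (entity, category) pairs whose string occurs in text."""
    return {(name, cat) for name, cat in GAZETTEER.items() if name in text}


def build_index(query, results, top_k=2):
    """Group search results under entity labels, organized by NE category.

    Labels are scored by document frequency in the result set (an assumed
    placeholder criterion). Entities that occur in the query are skipped.
    """
    docs_by_label = defaultdict(set)
    category_of = {}
    for doc_id, text in enumerate(results):
        for name, cat in extract_entities(text):
            if name.lower() in query.lower():
                continue  # the query term itself would not split the results
            docs_by_label[name].add(doc_id)
            category_of[name] = cat

    # Rank labels by descending document frequency, ties broken by name.
    ranked = sorted(docs_by_label, key=lambda n: (-len(docs_by_label[n]), n))

    # Keep at most top_k labels per NE category.
    index = defaultdict(list)
    for name in ranked:
        bucket = index[category_of[name]]
        if len(bucket) < top_k:
            bucket.append((name, sorted(docs_by_label[name])))
    return dict(index)


results = [
    "Toyota opens a new plant near Osaka",
    "NTT and Toyota announce a partnership in Tokyo",
    "Tokyo stocks rise as Toyota reports earnings",
]
index = build_index("Toyota", results)
for category, labels in index.items():
    print(category, labels)
```

The output is a two-level structure, NE category first and entity label second, which matches the abstract's notion of an index as a structured label list. A real system would replace the gazetteer with a trained NE tagger and the frequency score with the paper's own criterion.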

Index Terms

A search result clustering method using informatively named entities
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in
WIDM '05: Proceedings of the 7th annual ACM international workshop on Web information and data management
November 2005
96 pages
ISBN: 1595931945
DOI: 10.1145/1097047
Program Chairs:
Angela Bonifati, Icar CNR, Italy
Dongwon Lee, Penn State University, USA
Copyright © 2005 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
named entity
search result clustering
Qualifiers
- Article