Article

Measuring semantic similarity between words using web search engines

WWW '07: Proceedings of the 16th international conference on World Wide Web, May 2007, Pages 757–766
https://doi.org/10.1145/1242572.1242675

Published: 08 May 2007

References

{1} A. Bagga and B. Baldwin. Entity-based cross document coreferencing using the vector space model. In Proc. of 36th COLING-ACL, pages 79-85, 1998.
{2} Z. Bar-Yossef and M. Gurevich. Random sampling from a search engine's index. In Proceedings of 15th International World Wide Web Conference, 2006.
{3} R. Bekkerman and A. McCallum. Disambiguating web appearances of people in a social network. In Proceedings of the World Wide Web Conference (WWW), pages 463-470, 2005.
{4} D. Bollegala, Y. Matsuo, and M. Ishizuka. Disambiguating personal names on the web using automatically extracted key phrases. In Proc. of the 17th European Conference on Artificial Intelligence, pages 553-557, 2006.
{5} C. Buckley, G. Salton, J. Allan, and A. Singhal. Automatic query expansion using SMART: TREC 3. In Proc. of 3rd Text REtrieval Conference, pages 69-80, 1994.
{6} H. Chen, M. Lin, and Y. Wei. Novel association measures using web search with double checking. In Proc. of the COLING/ACL 2006, pages 1009-1016, 2006.
{7} P. Cimiano, S. Handschuh, and S. Staab. Towards the self-annotating web. In Proc. of 13th WWW, 2004.
{8} J. Curran. Ensemble methods for automatic thesaurus extraction. In Proc. of EMNLP, 2002.
{9} D. R. Cutting, J. O. Pedersen, D. Karger, and J. W. Tukey. Scatter/gather: A cluster-based approach to browsing large document collections. In Proceedings SIGIR '92, pages 318-329, 1992.
{10} M. Fleischman and E. Hovy. Multi-document person name resolution. In Proceedings of 42nd Annual Meeting of the Association for Computational Linguistics (ACL), Reference Resolution Workshop, 2004.
{11} H. Han, H. Zha, and C. L. Giles. Name disambiguation in author citations using a k-way spectral clustering method. In Proceedings of the International Conference on Digital Libraries, 2005.
{12} M. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proc. of 14th COLING, pages 539-545, 1992.
{13} J. Jiang and D. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. In Proc. of the International Conference on Research in Computational Linguistics ROCLING X, 1998.
{14} F. Keller and M. Lapata. Using the web to obtain frequencies for unseen bigrams. Computational Linguistics, 29(3):459-484, 2003.
{15} M. Lapata and F. Keller. Web-based models of natural language processing. ACM Transactions on Speech and Language Processing, 2(1):1-31, 2005.
{16} D. Lin. Automatic retrieval and clustering of similar words. In Proc. of the 17th COLING, pages 768-774, 1998.
{17} D. Lin. An information-theoretic definition of similarity. In Proc. of the 15th ICML, pages 296-304, 1998.
{18} C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts, 2002.
{19} Y. Matsuo, J. Mori, M. Hamasaki, K. Ishida, T. Nishimura, H. Takeda, K. Hasida, and M. Ishizuka. Polyphonet: An advanced social network extraction system. In Proc. of 15th International World Wide Web Conference, 2006.
{20} Y. Matsuo, T. Sakaki, K. Uchiyama, and M. Ishizuka. Graph-based word clustering using web search engine. In Proc. of EMNLP 2006, 2006.
{21} D. McCarthy, R. Koeling, J. Weeds, and J. Carroll. Finding predominant word senses in untagged text. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04), pages 279-286, 2004.
{22} D. Medin, R. Goldstone, and D. Gentner. Respects for similarity. Psychological Review, 100(2):254-278, 1993.
{23} P. Mika. Ontologies are us: A unified model of social networks and semantics. In Proc. of ISWC2005, 2005.
{24} G. Miller and W. Charles. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1-28, 1991.
{25} M. Mitra, A. Singhal, and C. Buckley. Improving automatic query expansion. In Proc. of 21st Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, pages 206-214, 1998.
{26} J. Mori, Y. Matsuo, and M. Ishizuka. Extracting keyphrases to represent relations in social networks from web. In Proc. of 20th IJCAI, 2007.
{27} M. Pasca, D. Lin, J. Bigham, A. Lifchits, and A. Jain. Organizing and searching the world wide web of facts - step one: the one-million fact extraction challenge. In Proc. of AAAI-2006, 2006.
{28} X.-H. Phan, L.-M. Nguyen, and S. Horiguchi. Personal name resolution crossover documents by a semantics-based approach. IEICE Transactions on Information and Systems, E89-D:825-836, 2005.
{29} J. Platt. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. Advances in Large Margin Classifiers, pages 61-74, 2000.
{30} R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, 19(1):17-30, 1989.
{31} P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proc. of 14th International Joint Conference on Artificial Intelligence, 1995.
{32} P. Resnik. Semantic similarity in a taxonomy: An information based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research, 11:95-130, 1999.
{33} P. Resnik and N. A. Smith. The web as a parallel corpus. Computational Linguistics, 29(3):349-380, 2003.
{34} R. Rosenfeld. A maximum entropy approach to adaptive statistical modelling. Computer Speech and Language, 10:187-228, 1996.
{35} H. Rubenstein and J. Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8:627-633, 1965.
{36} M. Sahami and T. Heilman. A web-based kernel function for measuring the similarity of short text snippets. In Proc. of 15th International World Wide Web Conference, 2006.
{37} H. Schütze. Automatic word sense discrimination. Computational Linguistics, 24(1):97-123, 1998.
{38} P. D. Turney. Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In Proc. of ECML-2001, pages 491-502, 2001.
{39} A. Tversky. Features of similarity. Psychological Review, 84(4):327-352, 1977.
{40} B. Vélez, R. Weiss, M. Sheldon, and D. Gifford. Fast and effective query refinement. In Proc. of 20th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, pages 6-15, 1997.
{41} Y. Li, Z. A. Bandar, and D. McLean. An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering, 15(4):871-882, 2003.

Index Terms

Measuring semantic similarity between words using web search engines
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection
      2. Dictionaries


Published in
WWW '07: Proceedings of the 16th international conference on World Wide Web
May 2007
1382 pages
ISBN: 9781595936547
DOI: 10.1145/1242572
General Chairs: Carey Williamson (University of Calgary, Canada), Mary Ellen Zurko (IBM, USA)
Program Chairs: Peter Patel-Schneider (Bell Labs Research, USA), Prashant Shenoy (University of Massachusetts at Amherst, USA)
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

Article Metrics
- Total Citations: 124
- Total Downloads: 4,085
- Downloads (Last 12 months): 67
- Downloads (Last 6 weeks): 9