Article

The earth mover's distance as a semantic measure for document similarity

Authors:
Xiaojun Wan

Peking University, Beijing, China

Peking University, Beijing, China
View Profile

,
Yuxin Peng

Peking University, Beijing, China

Peking University, Beijing, China
View Profile

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge managementOctober 2005Pages 301–302https://doi.org/10.1145/1099554.1099637

Published:31 October 2005Publication History

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management

Pages 301–302

ABSTRACT

Different words are usually assumed to be semantically independent in most existing similarity measures, which is not often true in practice. The semantic relatedness between words cannot be conveniently employed in the existing measures. We propose a novel similarity measure based on the earth mover's distance (EMD). In the proposed measure, the semantic distances between words are computed based on the electronic lexical database-WordNet and then the EMD is employed to calculate the document similarity with a many-to-many matching between words. Experiments and results demonstrate the effectiveness of the proposed similarity measure.

References

Allan, J., Carbonell, J., Doddington, G., Yamron, J. P., and Yang, Y. Topic detection and tracking pilot study: final report. In Proceedings of DARPA Broadcast News Transcription and Understanding Workshop, 194--218,1998.Google Scholar
Aslam, J. A., and Frost, M. An information-theoretic measure for document similarity. In Proc. of SIGIR'03, Toronto, Canada, 2003. Google ScholarDigital Library
Patwardhan, S. Incorporating dictionary and corpus information into a context vector measure of semantic relatedness. Master's thesis, Univ. of Minnesota, Duluth, 2003.Google Scholar
Rubner, Y., Tomasi, C. and Guibas, L. The Earth Mover's Distance as a Metric for Image Retrieval. Int. Journal of Computer Vision, Vol. 40, No. 2, pp. 99--121, 2000. Google ScholarDigital Library

Index Terms

The earth mover's distance as a semantic measure for document similarity
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Recommendations

A novel document similarity measure based on earth mover's distance

In this paper we propose a novel measure based on the earth mover's distance (EMD) to evaluate document similarity by allowing many-to-many matching between subtopics. First, each document is decomposed into a set of subtopics, and then the EMD is ...
Read More
A semantic similarity measure in document databases: an earth mover's distance-based approach
RACS '13: Proceedings of the 2013 Research in Adaptive and Convergent Systems

Measuring document similarity is important in order to find documents which are similar to a given query document from a user. Text-based document similarity is measured by comparing the words in two documents. The representative text-based document ...
Read More
A novel similarity/dissimilarity measure for intuitionistic fuzzy sets and its application in pattern recognition

Among the most interesting measures in intuitionistic fuzzy sets (IFSs) theory, the similarity measure is an essential tool to compare and determine degree of similarity between IFSs. Although there exist many similarity measures for IFSs, most of them ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management
October 2005
854 pages
ISBN:1595931406
DOI:10.1145/1099554
General Chair:
Otthein Herzog
University of Bremen, Germany
,
Program Chairs:
Hans-Jörg Schek
University for Health Sciences, Medical Informatics and Technology, Austria
,
Norbert Fuhr
University of Duisburg-Essen, Germany
,
Abdur Chowdhury
America Online, USA
,
Wilfried Teiken
IBM T.J. Watson Research Center, USA
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 31 October 2005
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
earth mover's distance
similarity measure
Qualifiers
- Article
Conference

Acceptance Rates
CIKM '05 Paper Acceptance Rate77of425submissions,18%Overall Acceptance Rate1,861of8,427submissions,22%
More
Upcoming Conference
CIKM '24

Sponsor:

sigir

sigir

The 33rd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2024

Boise , ID , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 16
  Total Citations
  View Citations
- 909
  Total Downloads
- Downloads (Last 12 months)8
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

The earth mover's distance as a semantic measure for document similarity

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management

ABSTRACT

References

Cited By

Index Terms

Recommendations

A novel document similarity measure based on earth mover's distance

A semantic similarity measure in document databases: an earth mover's distance-based approach

A novel similarity/dissimilarity measure for intuitionistic fuzzy sets and its application in pattern recognition

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

The earth mover's distance as a semantic measure for document similarity

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management

ABSTRACT

References

Cited By

Index Terms

Recommendations

A novel document similarity measure based on earth mover's distance

A semantic similarity measure in document databases: an earth mover's distance-based approach

A novel similarity/dissimilarity measure for intuitionistic fuzzy sets and its application in pattern recognition

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media