Published in: Information Systems Frontiers 3/2013

01.07.2013

Semantic similarity measurement using historical google search patterns

Authors: Jorge Martinez-Gil, José F. Aldana-Montes



Abstract

Computing the semantic similarity between terms (or short text expressions) that have the same meaning but are not lexicographically similar is an important challenge in the information integration field. The problem is that techniques for measuring textual semantic similarity often fail to deal with words not covered by synonym dictionaries. In this paper, we try to solve this problem by determining the semantic similarity of terms using the knowledge inherent in the search history logs of the Google search engine. To do this, we have designed and evaluated four algorithmic methods for measuring the semantic similarity between terms using their associated historical search patterns. These methods are: a) frequent co-occurrence of terms in search patterns, b) computation of the relationship between search patterns, c) outlier coincidence in search patterns, and d) forecasting comparisons. We have shown experimentally that some of these methods correlate well with human judgment when evaluated on general-purpose benchmark datasets, and significantly outperform existing methods when evaluated on datasets containing terms that do not usually appear in dictionaries.
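
To make methods b) and c) more concrete, the following Python sketch compares two search-volume time series. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the relationship between patterns or the outliers are computed, so Pearson correlation, a z-score threshold of 2.0, and a Jaccard-style overlap are stand-ins chosen here. The function names and the sample series are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no trend information
    return cov / sqrt(var_x * var_y)


def pattern_similarity(xs, ys):
    """Assumption: negative correlation is treated as zero similarity."""
    return max(0.0, pearson(xs, ys))


def outlier_indices(xs, threshold=2.0):
    """Indices whose value lies more than `threshold` population standard
    deviations from the mean (an assumed, simple outlier rule)."""
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        return set()
    return {i for i, x in enumerate(xs) if abs(x - mu) / sigma > threshold}


def outlier_coincidence(xs, ys, threshold=2.0):
    """Share of outlier positions common to both series (Jaccard overlap)."""
    a = outlier_indices(xs, threshold)
    b = outlier_indices(ys, threshold)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical normalized search volumes for two terms over ten periods.
term_a = [10, 12, 11, 13, 90, 12, 11, 10, 12, 11]
term_b = [20, 22, 21, 19, 95, 21, 20, 22, 21, 20]

if __name__ == "__main__":
    print("pattern similarity:", pattern_similarity(term_a, term_b))
    print("outlier coincidence:", outlier_coincidence(term_a, term_b))
```

Both sample series spike in the same period, so the two scores come out high. Real search-history data would first need the two series aligned to the same time window and normalized to a common scale.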


Metadata
Title
Semantic similarity measurement using historical google search patterns
Authors
Jorge Martinez-Gil
José F. Aldana-Montes
Publication date
01.07.2013
Publisher
Springer US
Published in
Information Systems Frontiers / Issue 3/2013
Print ISSN: 1387-3326
Electronic ISSN: 1572-9419
DOI
https://doi.org/10.1007/s10796-012-9404-7
