
2015 | OriginalPaper | Book chapter

Web Search Results Clustering Using Frequent Termset Mining

Written by: Marek Kozlowski

Published in: Pattern Recognition and Machine Intelligence

Publisher: Springer International Publishing


Abstract

We present a novel method for clustering web search results based on frequent termset mining. First, we acquire the senses of a query by means of a word sense induction method that identifies meanings as trees of closed frequent termsets. Then we cluster the search results based on their lexical and semantic intersection with the induced senses. We show that our approach is better than or comparable to state-of-the-art classical search result clustering methods in terms of both clustering quality and degree of diversification.
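The two steps described above can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names (`frequent_termsets`, `closed_termsets`, `induce_senses`, `assign`) and the toy snippets are hypothetical, termsets are enumerated by brute force rather than with a dedicated closed-itemset miner, senses are flat term sets rather than trees, and the assignment uses lexical overlap only.

```python
from itertools import combinations


def frequent_termsets(docs, min_support=2, max_size=3):
    """Count every termset of up to max_size terms and keep the frequent ones.

    Brute-force enumeration, used here only for illustration.
    """
    counts = {}
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for size in range(1, max_size + 1):
            for combo in combinations(terms, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {ts: s for ts, s in counts.items() if s >= min_support}


def closed_termsets(frequent):
    """Keep termsets that have no mined proper superset with the same support."""
    return {
        ts: s
        for ts, s in frequent.items()
        if not any(ts < other and s == s2 for other, s2 in frequent.items())
    }


def induce_senses(docs, query, min_support=2):
    """Assumption: each closed termset, minus the query term, is one flat sense."""
    closed = closed_termsets(frequent_termsets(docs, min_support))
    senses = {ts - {query} for ts in closed}
    return sorted((s for s in senses if s), key=sorted)


def assign(snippet, senses):
    """Return the index of the sense sharing the most terms, or -1 if none do."""
    terms = set(snippet.lower().split())
    overlaps = [len(terms & sense) for sense in senses]
    best = max(range(len(senses)), key=lambda i: overlaps[i], default=-1)
    return best if best >= 0 and overlaps[best] > 0 else -1


# Hypothetical toy snippets for the ambiguous query "jaguar".
docs = [
    "jaguar car engine",
    "jaguar car dealer",
    "jaguar cat jungle",
    "jaguar cat habitat",
]
senses = induce_senses(docs, "jaguar")
```

In this toy run, `{jaguar, car}` and `{jaguar, cat}` are the closed frequent termsets that carry context beyond the query term, so they yield two candidate senses, and each search result is placed in the cluster of the sense it overlaps with most.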


Metadata
Title
Web Search Results Clustering Using Frequent Termset Mining
Author
Marek Kozlowski
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-19941-2_50
