2015 | OriginalPaper | Book chapter

Utilizing Word Embeddings for Result Diversification in Tweet Search

Authors: Kezban Dilek Onal, Ismail Sengor Altingovde, Pinar Karagoz

Published in: Information Retrieval Technology

Publisher: Springer International Publishing

Abstract

The performance of result diversification for tweet search suffers from the well-known vocabulary mismatch problem, as tweets are short and usually informal. As a remedy, we propose to adopt a query and tweet expansion strategy that utilizes automatically generated word embeddings. Our experiments using state-of-the-art diversification methods on the Tweets2013 corpus reveal encouraging results: expanding queries and/or tweets based on the word embeddings improves the diversification performance in tweet search. We further show that the expansions based on the word embeddings may be as useful as those based on a manually constructed knowledge base, namely, ConceptNet.
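The abstract does not spell out how the expansion is computed, so the following is only a minimal sketch of one common way to expand a query or a tweet with word embeddings: each known term is extended with its nearest neighbours by cosine similarity. The toy vectors, the function name `expand_terms`, and the parameters `k` and `min_sim` are assumptions made for illustration and are not taken from the chapter.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# A real system would load vectors trained on a large corpus.
TOY_EMBEDDINGS = {
    "quake": [0.9, 0.1, 0.0],
    "earthquake": [0.88, 0.15, 0.05],
    "tremor": [0.8, 0.2, 0.1],
    "football": [0.0, 0.9, 0.3],
    "goal": [0.1, 0.85, 0.4],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_terms(terms, embeddings, k=2, min_sim=0.9):
    """Append up to k nearest neighbours (similarity >= min_sim) of each term.

    Hypothetical helper: the same routine can be applied to the tokens of a
    query or of a tweet. Terms without an embedding are kept unchanged.
    """
    expanded = list(terms)
    seen = set(terms)
    for term in terms:
        vec = embeddings.get(term)
        if vec is None:
            continue  # out-of-vocabulary term: nothing to expand with
        neighbours = sorted(
            (
                (cosine(vec, other_vec), other)
                for other, other_vec in embeddings.items()
                if other != term
            ),
            reverse=True,
        )
        for sim, other in neighbours[:k]:
            if sim >= min_sim and other not in seen:
                expanded.append(other)
                seen.add(other)
    return expanded


if __name__ == "__main__":
    print(expand_terms(["quake"], TOY_EMBEDDINGS))
```

With these toy vectors, a query containing only "quake" also picks up "earthquake" and "tremor", which is the kind of overlap a short, informal tweet would otherwise miss. The similarity threshold keeps unrelated neighbours such as "football" out of the expansion.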

Metadata
Title
Utilizing Word Embeddings for Result Diversification in Tweet Search
Authors
Kezban Dilek Onal
Ismail Sengor Altingovde
Pinar Karagoz
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-28940-3_29
