Published in: Discover Computing 3/2007

1 June 2007

Linear feature-based models for information retrieval

Authors: Donald Metzler, W. Bruce Croft

Abstract

The information retrieval community has recently proposed a number of linear feature-based models. Although each model is presented differently, they all share a common underlying framework. In this paper, we explore and discuss the theoretical issues of this framework, including a novel look at the parameter space. We then detail supervised training algorithms that directly maximize the evaluation metric under consideration, such as mean average precision. We present results showing that training models in this way can lead to significantly better test set performance than training methods that do not directly maximize the metric. Finally, we show that, given the right choice of features, linear feature-based models can consistently and significantly outperform current state-of-the-art retrieval models.
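
To make the idea of directly maximizing the evaluation metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes a document is scored by a weighted sum of its feature values, that documents are ranked by descending score with ties broken by document id, and that the weights are tuned by a plain coordinate-wise grid search on mean average precision. The function names, the toy data, and the grid of candidate weights are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

# One training query: (document id -> feature vector, set of relevant ids).
Query = Tuple[Dict[str, Sequence[float]], Set[str]]


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear scoring function: weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))


def rank(weights: Sequence[float], docs: Dict[str, Sequence[float]]) -> List[str]:
    """Rank documents by descending score; ties are broken by document id."""
    return sorted(docs, key=lambda d: (-score(weights, docs[d]), d))


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision values at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)


def mean_average_precision(weights: Sequence[float], queries: Sequence[Query]) -> float:
    """Mean of the per-query average precision under the given weights."""
    aps = [average_precision(rank(weights, docs), rel) for docs, rel in queries]
    return sum(aps) / len(aps)


def coordinate_search(queries: Sequence[Query], n_features: int,
                      grid: Sequence[float], sweeps: int = 3):
    """Tune one weight at a time, keeping a change only if MAP improves.

    This is an illustrative search strategy (an assumption of this sketch);
    it optimizes the metric itself rather than a surrogate loss.
    """
    weights = [1.0] * n_features
    best = mean_average_precision(weights, queries)
    for _ in range(sweeps):
        for j in range(n_features):
            for value in grid:
                candidate = weights[:j] + [value] + weights[j + 1:]
                metric = mean_average_precision(candidate, queries)
                if metric > best:
                    best, weights = metric, candidate
    return weights, best


# Hypothetical toy data: two queries, two features per document.
TOY_QUERIES: List[Query] = [
    ({"d1": (0.9, 0.1), "d2": (0.2, 0.8), "d3": (0.5, 0.5)}, {"d2"}),
    ({"d4": (0.7, 0.2), "d5": (0.1, 0.9), "d6": (0.4, 0.3)}, {"d5"}),
]

if __name__ == "__main__":
    learned, metric = coordinate_search(TOY_QUERIES, n_features=2,
                                        grid=[0.0, 0.5, 1.0])
    print(learned, metric)
```

Because average precision depends only on the order of the documents, it is piecewise constant in the weights, which is one reason a search over the metric itself is used here in place of a gradient-based method.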

Footnotes
1. We assume ties are broken by document id.
 
Metadata
Title
Linear feature-based models for information retrieval
Authors
Donald Metzler
W. Bruce Croft
Publication date
1 June 2007
Publisher
Kluwer Academic Publishers
Published in
Discover Computing / Issue 3/2007
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-006-9019-z