Published in: Discover Computing 3/2011

01-06-2011 | Web Mining for Search

A pattern mining approach for information filtering systems

Authors: Yuefeng Li, Abdulmohsen Algarni, Yue Xu



Abstract

Clearly identifying the boundary between positive and negative streams is a major challenge for information filtering systems. Several attempts have used negative feedback to address this challenge; however, using negative relevance feedback to improve the effectiveness of information filtering raises two issues. The first is how to select constructive negative samples in order to reduce the space of negative documents. The second is how to decide which noisy extracted features should be updated based on the selected negative samples. This paper proposes a pattern mining based approach that selects some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance and that the performance is also consistent for adaptive filtering.
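
The two ideas named in the abstract, selecting offenders and sorting terms into three categories, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives no formulas, so the scoring rule (sum of term weights), the coverage-based specificity score, the thresholds `low` and `high`, and the function names `select_offenders` and `classify_terms` are all assumptions made here for illustration.

```python
from typing import Dict, List, Set


def select_offenders(
    negative_docs: Dict[str, Set[str]],
    term_weights: Dict[str, float],
    k: int,
) -> List[str]:
    """Pick up to k negative documents that the current features score highest.

    Assumption: a document's score is the sum of the weights of its terms,
    and only negatives with a positive score count as offenders.
    """
    scored = [
        (sum(term_weights.get(t, 0.0) for t in terms), doc_id)
        for doc_id, terms in negative_docs.items()
    ]
    offenders = [(score, doc_id) for score, doc_id in scored if score > 0]
    # Highest score first; ties broken by document id for determinism.
    offenders.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in offenders[:k]]


def classify_terms(
    positive_docs: Dict[str, Set[str]],
    offender_docs: Dict[str, Set[str]],
    low: float,
    high: float,
) -> Dict[str, str]:
    """Label each term as positive specific, general, or negative specific.

    Assumption: specificity is (positive coverage - offender coverage)
    divided by the number of positive documents; `low` and `high` are
    hypothetical thresholds chosen by the caller.
    """
    n = len(positive_docs)
    if n == 0:
        raise ValueError("at least one positive document is required")
    terms = set().union(*positive_docs.values(), *offender_docs.values())
    labels: Dict[str, str] = {}
    for term in terms:
        pos_cover = sum(term in doc for doc in positive_docs.values())
        neg_cover = sum(term in doc for doc in offender_docs.values())
        specificity = (pos_cover - neg_cover) / n
        if specificity > high:
            labels[term] = "positive specific"
        elif specificity < low:
            labels[term] = "negative specific"
        else:
            labels[term] = "general"
    return labels
```

Under these assumptions, a negative document that shares highly weighted terms with the positive documents is selected as an offender, and a term that appears equally often in positive documents and offenders is labeled general. Each category could then receive its own revising strategy, in the spirit of the abstract's "multiple revising strategies".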


Metadata
Title
A pattern mining approach for information filtering systems
Authors
Yuefeng Li
Abdulmohsen Algarni
Yue Xu
Publication date
01-06-2011
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 3/2011
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-010-9154-4
