ABSTRACT
A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this is a substantial source of revenue supporting the web today. Despite the importance of this area, little formal, published research exists. We describe a system that learns how to extract keywords from web pages for advertisement targeting. The system uses a number of features, such as the term frequency of each potential keyword, its inverse document frequency, its presence in meta-data, and how often it occurs in search query logs. The system is trained on a set of example pages that have been hand-labeled with "relevant" keywords. Based on this training, it can then extract keywords from previously unseen pages. Its accuracy is substantially better than that of several baseline systems.
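The features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `keyword_features`, its parameters, the smoothed IDF formula, the log scaling of query counts, and the restriction to single-word candidates are all assumptions made for illustration. In a trained system, feature vectors like these would be fed to a classifier fitted on the hand-labeled pages.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_features(page_text, meta_keywords, doc_freq, num_docs, query_log_counts):
    """Build one feature dict per candidate term on a page.

    All inputs are hypothetical stand-ins:
      meta_keywords     -- strings taken from the page's meta-data
      doc_freq          -- term -> number of corpus documents containing it
      num_docs          -- size of that corpus
      query_log_counts  -- term -> number of occurrences in a query log
    """
    tokens = tokenize(page_text)
    term_counts = Counter(tokens)
    meta_terms = {w for kw in meta_keywords for w in tokenize(kw)}

    features = {}
    for term, count in term_counts.items():
        features[term] = {
            # Relative term frequency within the page.
            "tf": count / len(tokens),
            # Smoothed inverse document frequency (one common variant).
            "idf": math.log((1 + num_docs) / (1 + doc_freq.get(term, 0))),
            # 1.0 if the term also appears in the meta-data, else 0.0.
            "in_meta": 1.0 if term in meta_terms else 0.0,
            # Log-scaled query-log frequency; 0.0 for unseen terms.
            "log_query_freq": math.log(1 + query_log_counts.get(term, 0)),
        }
    return features


page = "Cheap flights to Paris. Book flights and hotels in Paris today."
feats = keyword_features(
    page,
    meta_keywords=["paris", "flights"],
    doc_freq={"to": 900, "and": 950, "paris": 20, "flights": 40},
    num_docs=1000,
    query_log_counts={"paris": 5000, "flights": 12000},
)
```

In this toy example, "paris" and "flights" score higher than "to" on every feature except raw frequency, which is the kind of signal a learned model could combine to rank candidate keywords.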
Index Terms
- Finding advertising keywords on web pages