Article

Finding advertising keywords on web pages

Authors:
Wen-tau Yih, Microsoft Research, Redmond, WA
Joshua Goodman, Microsoft Research, Redmond, WA
Vitor R. Carvalho, Carnegie Mellon University, Pittsburgh, PA

WWW '06: Proceedings of the 15th international conference on World Wide Web, May 2006, Pages 213–222. https://doi.org/10.1145/1135777.1135813

Published: 23 May 2006

ABSTRACT

A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this is a substantial source of revenue supporting the web today. Despite the importance of this area, little formal, published research exists. We describe a system that learns how to extract keywords from web pages for advertisement targeting. The system uses a number of features, such as term frequency of each potential keyword, inverse document frequency, presence in meta-data, and how often the term occurs in search query logs. The system is trained with a set of example pages that have been hand-labeled with "relevant" keywords. Based on this training, it can then extract new keywords from previously unseen pages. Accuracy is substantially better than several baseline systems.
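
The four features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the smoothed IDF formula, the log scaling of query counts, and every name below (`keyword_features`, `doc_freq`, `query_counts`, and so on) are assumptions chosen for illustration, and the sketch handles single-word candidates only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_features(page_text, meta_keywords, doc_freq, num_docs, query_counts):
    """Compute one feature dict per candidate term on a page.

    page_text     -- visible text of the page
    meta_keywords -- strings taken from the page's meta-data (hypothetical input)
    doc_freq      -- term -> number of background documents containing it
    num_docs      -- size of the background collection
    query_counts  -- term -> count in a search query log (hypothetical input)
    """
    term_freq = Counter(tokenize(page_text))
    meta_terms = set(tokenize(" ".join(meta_keywords)))

    features = {}
    for term, count in term_freq.items():
        features[term] = {
            # Term frequency of the candidate on this page.
            "tf": count,
            # Smoothed inverse document frequency (one common variant, assumed here).
            "idf": math.log((1 + num_docs) / (1 + doc_freq.get(term, 0))),
            # 1 if the candidate also appears in the page meta-data, else 0.
            "in_meta": 1 if term in meta_terms else 0,
            # Log-scaled frequency of the candidate in the query log.
            "log_query_freq": math.log(1 + query_counts.get(term, 0)),
        }
    return features


if __name__ == "__main__":
    feats = keyword_features(
        page_text="Cheap flights to Paris. Book flights now.",
        meta_keywords=["flights", "travel"],
        doc_freq={"flights": 9, "to": 99},
        num_docs=99,
        query_counts={"flights": 4},
    )
    for term, values in sorted(feats.items()):
        print(term, values)
```

In a trained system of the kind the abstract describes, feature vectors like these would be paired with the hand-labeled "relevant" keywords to fit a classifier, which then scores candidates on unseen pages. The abstract does not name the learning algorithm, so none is assumed here.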

Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation
  2. Information systems applications
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Abstraction

Published in
WWW '06: Proceedings of the 15th international conference on World Wide Web
May 2006
1102 pages
ISBN: 1595933239
DOI: 10.1145/1135777
General Chairs: Leslie Carr (University of Southampton), David De Roure (University of Southampton), Arun Iyengar (IBM Research)
Program Chairs: Carole Goble (University of Manchester, UK), Mike Dahlin (University of Texas at Austin)
Copyright © 2006 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
advertising
information extraction
keyword extraction
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%