Article

Towards the self-annotating web

Authors:
Philipp Cimiano, University of Karlsruhe, Karlsruhe, Germany
Siegfried Handschuh, University of Karlsruhe, Karlsruhe, Germany
Steffen Staab, University of Karlsruhe, Karlsruhe, Germany and Ontoprise GmbH, Karlsruhe, Germany
WWW '04: Proceedings of the 13th international conference on World Wide Web, May 2004, Pages 462–471, https://doi.org/10.1145/988672.988735

Published: 17 May 2004

ABSTRACT

The success of the Semantic Web depends on the availability of ontologies as well as on the proliferation of web pages annotated with metadata conforming to these ontologies. Thus, a crucial question is where to acquire these metadata from. In this paper we propose PANKOW (Pattern-based Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based approach to categorize instances with regard to an ontology. The approach is evaluated against the manual annotations of two human subjects. The approach is implemented in OntoMat, an annotation tool for the Semantic Web, and shows very promising results.
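
As a rough illustration of what unsupervised, pattern-based categorization of an instance against a set of ontology concepts could look like, the following Python sketch fills a few lexical patterns with each (instance, concept) pair, sums an occurrence count per concept, and picks the concept with the most evidence. Everything specific here is an assumption made for illustration and is not taken from the abstract: the three patterns, the naive "add an s" pluralization, the `hit_count` callback (standing in for whatever source of counts is used), and the toy counts.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical lexical patterns; the abstract does not list the patterns used.
PATTERNS = (
    "{concept}s such as {instance}",
    "{instance} is a {concept}",
    "{instance} and other {concept}s",
)


def categorize(
    instance: str,
    concepts: Iterable[str],
    hit_count: Callable[[str], int],
) -> Tuple[Optional[str], Dict[str, int]]:
    """Return the concept with the highest summed pattern count, plus all scores."""
    scores = {
        concept: sum(
            hit_count(pattern.format(concept=concept, instance=instance))
            for pattern in PATTERNS
        )
        for concept in concepts
    }
    best = max(scores, key=scores.get, default=None)
    # With no evidence at all, decline to categorize rather than guess.
    if best is None or scores[best] == 0:
        return None, scores
    return best, scores


# Toy counts standing in for a real corpus or search backend (invented numbers).
TOY_COUNTS = {
    "rivers such as Niger": 40,
    "Niger is a river": 25,
    "Niger and other rivers": 5,
    "Niger is a hotel": 1,
}


def toy_hit_count(phrase: str) -> int:
    """Look up a phrase in the toy table; unseen phrases count as zero."""
    return TOY_COUNTS.get(phrase, 0)


best, scores = categorize("Niger", ["river", "hotel", "island"], toy_hit_count)
print(best, scores)
```

Passing the count source in as a callback keeps the sketch independent of any particular search service, so the same function can be exercised with a lookup table, a local corpus index, or a remote API.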

Published in
WWW '04: Proceedings of the 13th international conference on World Wide Web
May 2004
754 pages
ISBN: 158113844X
DOI: 10.1145/988672
Conference Chairs: Stuart Feldman (IBM Research), Mike Uretsky (New York University)
Program Chairs: Marc Najork (Microsoft Research), Craig Wills (Worcester Polytechnic Institute)
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
information extraction
metadata
semantic annotation
semantic web
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
Article Metrics
- Total Citations: 218
- Total Downloads: 2,033
- Downloads (Last 12 months): 6
- Downloads (Last 6 weeks): 2