ABSTRACT
The success of the Semantic Web depends on the availability of ontologies as well as on the proliferation of web pages annotated with metadata conforming to these ontologies. A crucial question is therefore where to acquire this metadata. In this paper we propose PANKOW (Pattern-based Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based approach to categorize instances with regard to an ontology. The approach is evaluated against the manual annotations of two human subjects. It is implemented in OntoMat, an annotation tool for the Semantic Web, and shows very promising results.
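As an illustration only, the idea of unsupervised, pattern-based categorization can be sketched as follows. The abstract does not specify the patterns, the source of the counts, or the scoring rule, so everything in this sketch is an assumption: the pattern strings, the `categorize` function, the `hit_count` callback (a stand-in for whatever returns the number of Web occurrences of a phrase), and the toy counts are all hypothetical, and summing counts per concept is just one plausible way to combine the evidence.

```python
from typing import Callable, Dict, Iterable

# Hypothetical lexical patterns; the abstract does not list the actual ones.
# Pluralization is naive (appends "s") to keep the sketch short.
PATTERNS = [
    "{concept}s such as {instance}",
    "{instance} is a {concept}",
    "{instance} and other {concept}s",
]


def categorize(
    instance: str,
    concepts: Iterable[str],
    hit_count: Callable[[str], int],
) -> str:
    """Return the concept whose instantiated patterns occur most often.

    hit_count is an assumed callback that maps a phrase to the number of
    times it was observed (for example, on the Web). Raises ValueError if
    concepts is empty.
    """
    scores: Dict[str, int] = {}
    for concept in concepts:
        # Instantiate every pattern with this (instance, concept) pair
        # and add up the observed counts.
        scores[concept] = sum(
            hit_count(p.format(concept=concept, instance=instance))
            for p in PATTERNS
        )
    return max(scores, key=lambda c: scores[c])


# Toy counts standing in for real occurrence statistics (made-up numbers).
TOY_COUNTS = {
    "rivers such as Nile": 12,
    "Nile is a river": 30,
    "Nile is a hotel": 2,
}


def toy_hit_count(phrase: str) -> int:
    """Look up a phrase in the toy table; unseen phrases count as zero."""
    return TOY_COUNTS.get(phrase, 0)


best = categorize("Nile", ["hotel", "river"], toy_hit_count)
```

With the toy table above, `best` is `"river"`, because the river patterns accumulate 42 occurrences against 2 for hotel. No labelled training data is involved, which is the sense in which such a scheme is unsupervised.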
Index Terms
- Towards the self-annotating web