DOI: 10.1145/988672.988735
Article

Towards the self-annotating web

Published: 17 May 2004

ABSTRACT

The success of the Semantic Web depends on the availability of ontologies as well as on the proliferation of web pages annotated with metadata conforming to these ontologies. Thus, a crucial question is where to acquire these metadata from. In this paper we propose PANKOW (Pattern-based Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based approach to categorize instances with regard to an ontology. The approach is evaluated against the manual annotations of two human subjects. The approach is implemented in OntoMat, an annotation tool for the Semantic Web, and shows very promising results.
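The abstract describes the categorization step only at a high level. The following is a minimal sketch, assuming that an instance is categorized by instantiating Hearst-style lexico-syntactic patterns (such as "countries such as Niger") for each candidate concept, summing the hit counts of the resulting phrases, and choosing the concept with the highest total. The pattern list, the singular/plural concept mapping, the toy corpus, and the helper names (`make_corpus_counter`, `score_concepts`, `categorize`) are all hypothetical; the small in-memory corpus merely stands in for the web search-engine hit counts such an approach would normally query.

```python
from typing import Callable, Dict, List

# Hypothetical Hearst-style patterns; the real PANKOW pattern set may differ.
# {instance}: candidate instance, {concept}: singular label, {plural}: plural label.
PATTERNS: List[str] = [
    "{plural} such as {instance}",
    "such {plural} as {instance}",
    "{instance} and other {plural}",
    "{instance} or other {plural}",
    "{instance} is a {concept}",
    "the {instance} {concept}",
]


def make_corpus_counter(corpus: List[str]) -> Callable[[str], int]:
    """Stand-in for a web hit count: case-insensitive, non-overlapping
    phrase occurrences summed over an in-memory corpus."""
    lowered = [doc.lower() for doc in corpus]

    def hit_count(phrase: str) -> int:
        needle = phrase.lower()
        return sum(doc.count(needle) for doc in lowered)

    return hit_count


def score_concepts(
    instance: str,
    concepts: Dict[str, str],
    hit_count: Callable[[str], int],
) -> Dict[str, int]:
    """Sum the hits of every instantiated pattern per concept.
    `concepts` maps each singular concept label to its plural form."""
    scores: Dict[str, int] = {}
    for concept, plural in concepts.items():
        phrases = [
            p.format(instance=instance, concept=concept, plural=plural)
            for p in PATTERNS
        ]
        scores[concept] = sum(hit_count(ph) for ph in phrases)
    return scores


def categorize(
    instance: str,
    concepts: Dict[str, str],
    hit_count: Callable[[str], int],
) -> str:
    """Return the concept with the highest summed count (first one wins ties)."""
    scores = score_concepts(instance, concepts, hit_count)
    return max(scores, key=scores.get)


# Toy data (hypothetical) standing in for the Web.
CORPUS = [
    "Rivers such as Niger flow through West Africa.",
    "The Niger river is long.",
    "Countries such as Niger border Nigeria.",
    "Niger and other countries in the Sahel.",
    "Niger is a country in West Africa.",
]
CONCEPTS = {"country": "countries", "river": "rivers", "city": "cities"}

if __name__ == "__main__":
    counter = make_corpus_counter(CORPUS)
    print(score_concepts("Niger", CONCEPTS, counter))  # {'country': 3, 'river': 2, 'city': 0}
    print(categorize("Niger", CONCEPTS, counter))      # country
```

Picking the maximum summed count is the simplest possible aggregation. With real web counts, an ambiguous instance such as Niger would likely score above zero for several concepts, so a threshold or normalization step could be layered on top of this sketch.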


Published in
WWW '04: Proceedings of the 13th international conference on World Wide Web
May 2004
754 pages
ISBN: 158113844X
DOI: 10.1145/988672

            Copyright © 2004 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States



Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions (23%)
