Learning by googling

Published: 01 December 2004

Abstract

The goal of giving a well-defined meaning to information is currently shared by endeavors such as the Semantic Web as well as by current trends within Knowledge Management. They all depend on the large-scale formalization of knowledge and on the availability of formal metadata about information resources. However, the question of how to provide the necessary formal metadata effectively and efficiently has not yet been solved satisfactorily. Certainly, the most effective way to provide such metadata as well as formalized knowledge is to let humans encode them directly into the system, but this is neither efficient nor feasible. Furthermore, as current social studies show, individual knowledge is often less powerful than the collective knowledge of a certain community.

As a potential way out of the knowledge acquisition bottleneck, we present a novel methodology that acquires collective knowledge from the World Wide Web using the Google™ API. In particular, we present PANKOW, a concrete instantiation of this methodology, which is evaluated in two experiments: one with the aim of classifying novel instances with regard to an existing ontology and one with the aim of learning sub-/superconcept relations.


Published in

ACM SIGKDD Explorations Newsletter, Volume 6, Issue 2
December 2004
161 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/1046456

Copyright © 2004 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
