Abstract
The goal of giving a well-defined meaning to information is currently shared by endeavors such as the Semantic Web as well as by current trends within Knowledge Management. They all depend on the large-scale formalization of knowledge and on the availability of formal metadata about information resources. However, the question of how to provide the necessary formal metadata effectively and efficiently has not yet been solved satisfactorily. Certainly, the most effective way to provide such metadata as well as formalized knowledge is to let humans encode them directly into the system, but this is neither efficient nor feasible. Furthermore, as current social studies show, individual knowledge is often less powerful than the collective knowledge of a certain community.

As a potential way out of the knowledge acquisition bottleneck, we present a novel methodology that acquires collective knowledge from the World Wide Web using the Google™ API. In particular, we present PANKOW, a concrete instantiation of this methodology, which is evaluated in two experiments: one with the aim of classifying novel instances with regard to an existing ontology and one with the aim of learning sub-/superconcept relations.
Index Terms
- Learning by googling