Learning by googling

Authors:
Philipp Cimiano, University of Karlsruhe
Steffen Staab, University of Koblenz-Landau

ACM SIGKDD Explorations Newsletter, Volume 6, Issue 2, December 2004, pp. 24–33. https://doi.org/10.1145/1046456.1046460

Published: 01 December 2004

Abstract

The goal of giving a well-defined meaning to information is currently shared by endeavors such as the Semantic Web as well as by current trends within Knowledge Management. They all depend on the large-scale formalization of knowledge and on the availability of formal metadata about information resources. However, the question of how to provide the necessary formal metadata effectively and efficiently has not yet been solved satisfactorily. Certainly, the most effective way to provide such metadata and formalized knowledge is to let humans encode them directly into the system, but this is neither efficient nor feasible. Furthermore, as current social studies show, individual knowledge is often less powerful than the collective knowledge of a community.

As a potential way out of the knowledge acquisition bottleneck, we present a novel methodology that acquires collective knowledge from the World Wide Web using the Google™ API. In particular, we present PANKOW, a concrete instantiation of this methodology, which is evaluated in two experiments: one with the aim of classifying novel instances with regard to an existing ontology and one with the aim of learning sub-/superconcept relations.
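The first experiment named in the abstract, classifying a novel instance against the concepts of an existing ontology, can be illustrated with a minimal sketch. It assumes a pattern-based approach in which each candidate concept is scored by summing web hit counts for a few lexical patterns; the abstract itself does not spell out the patterns or the scoring. The pattern strings, the function names `classify_instance` and `fake_hit_count`, and the toy counts in `FAKE_HITS` are all hypothetical, and no real search API is called.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical lexical patterns linking an instance to a candidate concept.
# They are illustrative only; the abstract does not list the patterns used.
# Plurals are formed naively by appending "s".
PATTERNS: List[str] = [
    "{concept}s such as {instance}",
    "{instance} is a {concept}",
    "{instance} and other {concept}s",
]


def classify_instance(
    instance: str,
    concepts: List[str],
    hit_count: Callable[[str], int],
) -> Tuple[str, Dict[str, int]]:
    """Score each concept by summed hit counts and return the best one."""
    scores: Dict[str, int] = {}
    for concept in concepts:
        scores[concept] = sum(
            hit_count(p.format(concept=concept, instance=instance))
            for p in PATTERNS
        )
    # The concept with the highest aggregated count wins
    # (on ties, the first concept in dictionary order is kept).
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-in for a search-engine hit counter (assumed data, not real counts).
FAKE_HITS: Dict[str, int] = {
    "rivers such as Nile": 120,
    "Nile is a river": 300,
    "Nile and other rivers": 40,
    "Nile is a city": 5,
}


def fake_hit_count(query: str) -> int:
    """Return the canned count for a query, or 0 if the query is unknown."""
    return FAKE_HITS.get(query, 0)


if __name__ == "__main__":
    best, scores = classify_instance(
        "Nile", ["river", "city", "country"], fake_hit_count
    )
    print(best, scores)
```

Passing the hit counter in as a function keeps the scoring logic independent of any particular search service, so a real web query could be substituted for `fake_hit_count` without touching `classify_instance`.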


Published in

ACM SIGKDD Explorations Newsletter, Volume 6, Issue 2
December 2004
161 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/1046456

Copyright © 2004 Authors
Publisher: Association for Computing Machinery, New York, NY, United States