IEPAD: information extraction based on pattern discovery

Authors:
Chia-Hui Chang, Dept. of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan 320
Shao-Chen Lui, Dept. of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan 320

WWW '01: Proceedings of the 10th international conference on World Wide Web, May 2001, pages 681–688. https://doi.org/10.1145/371920.372182

Published: 01 April 2001

Published in
WWW '01: Proceedings of the 10th international conference on World Wide Web
May 2001
770 pages
ISBN: 1581133480
DOI: 10.1145/371920
Chairmen: Vincent Y. Shen (Hong Kong Univ. of Science and Technology); Nobuo Saito (Keio Univ., Japan); Michael R. Lyu (Chinese Univ. of Hong Kong, HK); Mary Ellen Zurko (Iris Associates, USA)
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: PAT tree, extraction rule, information extraction, multiple string alignment
Qualifiers: Article
Conference acceptance rate: overall, 1,899 of 8,196 submissions (23%)
Article Metrics
- Total citations: 306
- Total downloads: 2,428
- Downloads (last 12 months): 30
- Downloads (last 6 weeks): 5