DOI: 10.1145/1963192.1963211
poster

HyLiEn: a hybrid approach to general list extraction on the web

Published: 28 March 2011

ABSTRACT

We consider the problem of automatically extracting general lists from the Web. Existing approaches depend mostly on either the underlying HTML markup or the visual structure of the Web page. We present HyLiEn, an unsupervised, Hybrid approach for automatic List discovery and Extraction on the Web. It employs general assumptions about the visual rendering of lists and the structural representation of the items they contain. We show that our method significantly outperforms existing methods.

References

  1. M. J. Cafarella, A. Halevy, D. Z. Wang, E. Wu, and Y. Zhang. Webtables: exploring the power of tables on the web. Proc. VLDB Endow., 1(1):538--549, 2008.
  2. W. Gatterbauer, P. Bohunsky, M. Herzog, B. Krupl, and B. Pollak. Towards domain-independent information extraction from web tables. In WWW, pages 71--80, New York, NY, USA, 2007. ACM.
  3. K. Lerman, L. Getoor, S. Minton, and C. Knoblock. Using the structure of web sites for automatic segmentation of tables. In SIGMOD, pages 119--130, New York, NY, USA, 2004. ACM.
  4. W. Liu, X. Meng, and W. Meng. Vide: A vision-based approach for deep web data extraction. IEEE Trans. on Knowl. and Data Eng., 22(3):447--460, 2010.
  5. K. Simon and G. Lausen. Viper: augmenting automatic information extraction with visual perceptions. In CIKM, pages 381--388, New York, NY, USA, 2005. ACM.
  6. S. Tong and J. Dean. System and methods for automatically creating lists. In US Patent: 7350187, Mar 2008.
  7. R. C. Wang and W. W. Cohen. Language-independent set expansion of named entities using the web. In ICDM '07: Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, pages 342--350, Washington, DC, USA, 2007. IEEE.
  8. T. Weninger, F. Fumarola, R. Barber, J. Han, and D. Malerba. Unexpected results in automatic list extraction on the web. SIGKDD Explorations, 12(2), 2010.

Published in

WWW '11: Proceedings of the 20th international conference companion on World wide web
March 2011
552 pages
ISBN: 9781450306379
DOI: 10.1145/1963192

Copyright © 2011 Authors

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 1,899 of 8,196 submissions (23%)
