DOI: 10.1145/1013367.1013455
Article

Automatically collecting, monitoring, and mining Japanese weblogs

Published: 19 May 2004

ABSTRACT

We present a system that attempts to automatically collect and monitor Japanese blog collections, including not only blogs made with blog software but also those written as ordinary web pages. Our approach is based on the extraction of date expressions and the analysis of HTML documents. The system also extracts and mines useful information from the collected blog pages.
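
The abstract does not describe how date expressions are extracted, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes two hypothetical date formats (`2004年5月19日` and `2004/05/19`) and an arbitrary threshold `min_dates`: a page whose visible text contains several distinct dates is treated as a blog candidate. The names `DATE_PATTERNS`, `extract_dates`, and `looks_like_blog` are illustrative and do not come from the paper.

```python
import re
from datetime import date
from html.parser import HTMLParser

# Assumed date formats; the paper's actual date grammar is not given in the abstract.
DATE_PATTERNS = [
    re.compile(r"(?<!\d)(\d{4})年\s*(\d{1,2})月\s*(\d{1,2})日"),
    re.compile(r"(?<!\d)(\d{4})[/.-](\d{1,2})[/.-](\d{1,2})(?!\d)"),
]


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def extract_dates(html):
    """Return the sorted distinct calendar dates found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    found = set()
    for pattern in DATE_PATTERNS:
        for match in pattern.finditer(text):
            year, month, day = map(int, match.groups())
            try:
                found.add(date(year, month, day))
            except ValueError:
                # Matched the pattern but is not a real calendar date.
                continue
    return sorted(found)


def looks_like_blog(html, min_dates=3):
    """Heuristic (an assumption): several distinct dates suggest dated entries."""
    return len(extract_dates(html)) >= min_dates
```

A page with three dated headings would pass this check, while a page with a single date or an impossible date such as `2004年13月40日` would not. A real system would presumably also use the HTML structure around each date, which this sketch ignores.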

References

  1. M. Ceglowski. WWW::Blog::Identify - identify blogging tools based on URL and content. http://search.cpan.org/~mceglows/WWW-Blog-Identify-0.06/Identify.pm, 2003.
  2. IPA (Information-technology Promotion Agency, Japan). Generic engine for transposable association: GETA. http://geta.ex.nii.ac.jp/, 2002.
  3. J. Kleinberg. Bursty and hierarchical structure in streams. In Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1--25, 2002.
  4. R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. On the bursty evolution of blogspace. In Proc. of the 12th International World Wide Web Conference, pages 568--576, 2003.
  5. J. Wiebe, E. Breck, C. Buckley, and C. Cardie. Recognizing and organizing opinions expressed in the world press. In Proc. of the 2003 AAAI Spring Symposium New Directions in Question Answering, pages 12--19, 2003. Technical Report SS-03-07.
  6. D. Winer. Weblogs.com XML-RPC interface. http://www.xmlrpc.com/weblogsCom, 2001.

          • Published in

            WWW Alt. '04: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
            May 2004
            532 pages
            ISBN: 1581139128
            DOI: 10.1145/1013367

            Copyright © 2004 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
