skip to main content
10.1145/1557019.1557075acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

On burstiness-aware search for document sequences

Published:28 June 2009Publication History

ABSTRACT

As the number and size of large timestamped collections (e.g. sequences of digitized newspapers, periodicals, blogs) increase, the problem of efficiently indexing and searching such data becomes more important. Term burstiness has been extensively researched as a mechanism to address event detection in the context of such collections. In this paper, we explore how burstiness information can be further utilized to enhance the search process. We present a novel approach to model the burstiness of a term, using discrepancy theory concepts. This allows us to build a parameter-free, linear-time approach to identify the time intervals of maximum burstiness for a given term. Finally, we describe the first burstiness-driven search framework and thoroughly evaluate our approach in the context of different scenarios.

Skip Supplemental Material Section

Supplemental Material

p477-lappas.mp4

mp4

87.9 MB

References

  1. D. Agarwal, J. M. Phillips, and S. Venkatasubramanian. The hunting of the bump: on maximizing statistical discrepancy. In SODA '06, pages 1137--1146, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. N. Bansal and N. Koudas. Blogscope: a system for online analysis of high volume text streams. In VLDB '07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. N. Bansal and N. Koudas. BlogScope: spatio-temporal analysis of the blogosphere. In WWW '07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. B. Chazelle. The discrepancy method: randomness and complexity. Cambridge University Press, NY, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C.-H. Cheng, K.-Y. Chen, W.-C. Tien, and K.-M. Chao. Improved algorithmms for the k maximum-sums problems. Theor. Comput. Sci., 362(1):162--170, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms, Second Edition. The MIT Press, September 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. D. P. Dobkin, D. Gunopulos, and W. Maass. Computing the maximum bichromatic discrepancy, with applications to computer graphics and machine learning. J. Comput. Syst. Sci., 52(3):453--470, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. D. Dobkin and D. Eppstein. Computing the discrepancy. In SCG '93, pages 47--52, New York, NY, USA, 1993. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS '01. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. P. C. Fung, J. X. Yu, P. S. Yu, and H. Lu. Parameter free bursty events detection in text streams. In VLDB '05. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Q. He, K. Chang, and E.-P. Lim. Analyzing feature trajectories for event detection. In SIGIR '07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Q. He, K. Chang, E.-P. Lim, and J. Zhang. Bursty feature representation for clustering text streams. In SIAM '07.Google ScholarGoogle Scholar
  13. J. Kleinberg. Bursty and hierarchical structure in streams. In KDD '02, pages 91--101, New York, USA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. On the bursty evolution of blogspace. In WWW '03. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. National Digital Newspaper Program (NDNP), http://www.loc.gov/ndnp.Google ScholarGoogle Scholar
  16. Qi He and Kuiyu Chang and Ee-Peng Lim. Using burstiness to improve clustering of topics in news streams. In ICDM '07, Washington, DC, USA, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. W. L. Ruzzo and M. Tompa. A linear time algorithm for finding all maximal scoring subsequences. In ISMB 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. Vlachos, C. Meek, Z. Vagena, and D. Gunopulos. Identifying similarities, periodicities and bursts for online search queries. In SIGMOD '04, pages 131--142, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Y. Zhu and D. Shasha. Efficient elastic burst detection in data streams. In KDD '03, pages 336--345, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. On burstiness-aware search for document sequences

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
            June 2009
            1426 pages
            ISBN:9781605584959
            DOI:10.1145/1557019

            Copyright © 2009 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 28 June 2009

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate1,133of8,635submissions,13%

            Upcoming Conference

            KDD '24

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader