DOI: 10.1145/2970398.2970411
research-article

EventMiner: Mining Events from Annotated Documents

Published: 12 September 2016

ABSTRACT

Events are central in human history and thus also in Web queries, particularly those relating to history or news. However, such queries are often ambiguous: the same query may refer to several events that differ in time, geography, or participating entities. Users would therefore benefit greatly if search results were organized by event. In this paper, we present EventMiner, an algorithm that mines events from the top-k pseudo-relevant documents for a given query. It is a probabilistic framework that leverages semantic annotations in the form of temporal expressions, geographic locations, and named entities to analyze natural language text and determine important events. Using a large news corpus, we show that, with these semantic annotations, EventMiner detects important events and presents the documents covering them in order of importance.

Published in

ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
September 2016
318 pages
ISBN: 9781450344975
DOI: 10.1145/2970398

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

ICTIR '16 Paper Acceptance Rate: 41 of 79 submissions, 52%
Overall Acceptance Rate: 209 of 482 submissions, 43%
