ABSTRACT
Events are central to human history and therefore to Web queries, particularly those related to history or news. However, such queries are often ambiguous: a single query may refer to several events that differ in time, geography, or participating entities. Users would therefore benefit from search results organized by event. In this paper, we present EventMiner, an algorithm that mines events from the top-k pseudo-relevant documents for a given query. EventMiner is a probabilistic framework that leverages semantic annotations in the form of temporal expressions, geographic locations, and named entities to analyze natural language text and determine important events. Using a large news corpus, we show that, with the help of semantic annotations, EventMiner detects important events and presents the documents covering the identified events in order of their importance.
Index Terms
- EventMiner: Mining Events from Annotated Documents