EventMiner: Mining Events from Annotated Documents

Authors:
Dhruv Gupta, Max Planck Institute for Informatics, Saarbruecken, Germany
Jannik Strötgen, Max Planck Institute for Informatics, Saarbruecken, Germany
Klaus Berberich, Max Planck Institute for Informatics, Saarbruecken, Germany
ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, September 2016, Pages 261–270
https://doi.org/10.1145/2970398.2970411

Published: 12 September 2016

ABSTRACT

Events are central to human history and therefore also to Web queries, particularly those related to history or news. However, ambiguity arises because a query may refer to several events that differ in time, geography, or participating entities. Users would therefore benefit if search results were organized by the events they cover. In this paper, we present EventMiner, an algorithm that mines events from the top-k pseudo-relevant documents for a given query. It is a probabilistic framework that leverages semantic annotations, in the form of temporal expressions, geographic locations, and named entities, to analyze natural language text and determine important events. Using a large news corpus, we show that, with the help of semantic annotations, EventMiner detects important events and presents documents covering the identified events in order of their importance.
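
As a rough illustration of the idea in the abstract, the sketch below groups annotated pseudo-relevant documents by a (time, location, entities) key and ranks the groups by how many documents support them. This is a plain counting heuristic swapped in for illustration, not EventMiner's probabilistic framework, which the abstract does not specify in detail. The `AnnotatedDoc` type, the `mine_events` function, and the sample documents are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical event key: (time expression, location, set of entities).
EventKey = Tuple[str, str, FrozenSet[str]]


@dataclass(frozen=True)
class AnnotatedDoc:
    """A document with semantic annotations (hypothetical structure)."""
    doc_id: str
    time: str                 # normalized temporal expression, e.g. "1992"
    location: str             # resolved geographic location
    entities: FrozenSet[str]  # disambiguated named entities


def mine_events(docs: List[AnnotatedDoc]) -> List[Tuple[EventKey, List[str]]]:
    """Group documents by identical annotations and rank groups by size.

    Assumption: an "event" is approximated by an exact match on
    (time, location, entities); importance is the number of supporting docs.
    """
    groups: Dict[EventKey, List[str]] = defaultdict(list)
    for doc in docs:
        groups[(doc.time, doc.location, doc.entities)].append(doc.doc_id)
    # Largest group first; ties broken by the time string for determinism.
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0][0]))


# Made-up top-k results for an ambiguous query such as "summer olympics".
top_k = [
    AnnotatedDoc("d1", "1992", "Barcelona", frozenset({"IOC"})),
    AnnotatedDoc("d2", "2012", "London", frozenset({"IOC"})),
    AnnotatedDoc("d3", "1992", "Barcelona", frozenset({"IOC"})),
]
ranked_events = mine_events(top_k)
```

Exact matching on annotations is the crudest possible grouping; a probabilistic model would instead allow partial overlap in time, place, and entities when deciding whether two documents describe the same event.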

Index Terms

EventMiner: Mining Events from Annotated Documents
1. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Data mining
    2. Spatial-temporal systems

Published in
ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
September 2016
318 pages
ISBN: 9781450344975
DOI: 10.1145/2970398
General Chairs: Ben Carterette (University of Delaware, USA), Hui Fang (University of Delaware, USA)
Program Chairs: Mounia Lalmas (Yahoo! Labs, UK), Jian-Yun Nie (University of Montreal, Canada)
Copyright © 2016 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
information retrieval
semantic annotations
text mining
Qualifiers
- research-article
Acceptance Rates
ICTIR '16 Paper Acceptance Rate: 41 of 79 submissions, 52%
Overall Acceptance Rate: 209 of 482 submissions, 43%