ABSTRACT
We present a method called TIMEMACHINE to generate a timeline of events and relations for entities in a knowledge base. For example for an actor, such a timeline should show the most important professional and personal milestones and relationships such as works, awards, collaborations, and family relationships. We develop three orthogonal timeline quality criteria that an ideal timeline should satisfy: (1) it shows events that are relevant to the entity; (2) it shows events that are temporally diverse, so they distribute along the time axis, avoiding visual crowding and allowing for easy user interaction, such as zooming in and out; and (3) it shows events that are content diverse, so they contain many different types of events (e.g., for an actor, it should show movies and marriages and awards, not just movies). We present an algorithm to generate such timelines for a given time period and screen size, based on submodular optimization and web-co-occurrence statistics with provable performance guarantees. A series of user studies using Mechanical Turk shows that all three quality criteria are crucial to produce quality timelines and that our algorithm significantly outperforms various baseline and state-of-the-art methods.
Supplemental Material
- A. Ahmed, C. H. Teo, S. Vishwanathan, and A. Smola. Fair and balanced: Learning to present news stories. In WSDM, 2012. Google ScholarDigital Library
- J. Allan, R. Gupta, and V. Khandelwal. Temporal summaries of new topics. In SIGIR, 2001. Google ScholarDigital Library
- T. Althoff, X. L. Dong, K. Murphy, S. Alai, V. Dang, and W. Zhang. TimeMachine: Timeline Generation for Knowledge-Base Entities. arXiv:1502.04662, 2015. Google ScholarDigital Library
- R. A. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, New York, 1999. Google ScholarDigital Library
- D. Bamman and N. Smith. Unsupervised discovery of biographical structure from text. TACL, 2(10):363--376, 2014.Google ScholarCross Ref
- K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008. Google ScholarDigital Library
- G. Calinescu, C. Chekuri, M. Pál, and J. Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740--1766, 2011. Google ScholarDigital Library
- J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, 1998. Google ScholarDigital Library
- B. Carterette, P. N. Bennett, D. M. Chickering, and S. T. Dumais. Here or there: Preference Judgments for Relevance. In Advances in Information Retrieval. Springer, 2008. Google ScholarDigital Library
- A. Dasgupta, R. Kumar, and S. Ravi. Summarization through submodularity and dispersion. In ACL, 2013.Google Scholar
- Q. X. Do, W. Lu, and D. Roth. Joint inference for event timeline construction. In EMNLP-CoNLL, 2012. Google ScholarDigital Library
- X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun, and W. Zhang. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In SIGKDD, 2014. Google ScholarDigital Library
- M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, and A. Tomkins. Visualizing tags over time. TWEB, 1(2):7, 2007. Google ScholarDigital Library
- U. Feige. A threshold of ln n for approximating set cover. Journal of the ACM (JACM), 45(4):634--652, 1998. Google ScholarDigital Library
- D. Graus, M.-H. Peetz, D. Odijk, O. de Rooij, and M. de Rijke. yourHistory--Semantic linking for a personalized timeline of historic events. Workshop: LinkedUp Challenge at OKCon, 2013.Google Scholar
- T. Huet, J. Biega, and F. M. Suchanek. Mining history with le monde. In AKBC, 2013. Google ScholarDigital Library
- H. Ji, T. Cassidy, Q. Li, and S. Tamang. Tackling representation, annotation and classification challenges for temporal knowledge base population. KAIS, 2013. Google ScholarDigital Library
- A. Kannan, S. Baker, K. Ramnath, J. Fiss, D. Lin, L. Vanderwende, R. Ansary, A. Kapoor, Q. Ke, M. Uyttendaele, et al. Mining text snippets for images on the web. In SIGKDD, 2014. Google ScholarDigital Library
- S. M. Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Trans Sig. Process., 1987.Google ScholarCross Ref
- A. Krause and D. Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems (to appear). Cambridge University Press, 2014.Google Scholar
- J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In SIGKDD, 2007. Google ScholarDigital Library
- J. Li and C. Cardie. Timeline generation: tracking individuals on twitter. In WWW, 2014. Google ScholarDigital Library
- C.-Y. Lin. Rouge: A package for automatic evaluation of summaries. In Proc. ACL Text Summarization Workshop, 2004.Google Scholar
- H. Lin and J. A. Bilmes. Learning mixtures of submodular shells with application to document summarization. In UAI, 2012.Google ScholarDigital Library
- X. Ling and D. S. Weld. Temporal information extraction. In AAAI Conference on Artificial Intelligence, 2010.Google Scholar
- A. Mazeika, T. Tylenda, and G. Weikum. Entity timelines: Visual analytics and named entity evolution. In CIKM, 2011. Google ScholarDigital Library
- M. Minoux. Accelerated greedy algorithms for maximizing submodular set functions. In Optimization Techniques, pages 234--243. Springer, 1978.Google ScholarCross Ref
- G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions -- I. Mathematical Programming, 14(1):265--294, 1978.Google ScholarDigital Library
- R. Qian. Timeline: Understanding Important Events in People's Lives.small http://blogs.bing.com/search/2014/02/21/timeline-understanding-important-events-in-peoples-lives/, February 2014. Last retrieved on Feb 18, 2015.Google Scholar
- D. Shahaf, C. Guestrin, and E. Horvitz. Metro maps of science. In SIGKDD, 2012. Google ScholarDigital Library
- D. Shahaf, J. Yang, C. Suen, J. Jacobs, H. Wang, and J. Leskovec. Information cartography: creating zoomable, large-scale maps of information. In SIGKDD, 2013. Google ScholarDigital Library
- W. Shen, J. Wang, and J. Han. Entity linking with a knowledge base: Issues, techniques, and solutions. TKDE, 2015.Google ScholarCross Ref
- R. Sipos, A. Swaminathan, P. Shivaswamy, and T. Joachims. Temporal corpus summarization using submodular word coverage. In CIKM, 2012. Google ScholarDigital Library
- K. Spärck Jones. Automatic summarising: The state of the art. Information Processing & Management, 43(6):1449--1481, 2007. Google ScholarDigital Library
- F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge. In WWW, 2007. Google ScholarDigital Library
- F. M. Suchanek and N. Preda. Semantic Culturomics (Vision paper). In Very Large Databases (VLDB), 2014. Google ScholarDigital Library
- R. Swan and J. Allan. Automatic generation of overview timelines. In SIGIR, 2000. Google ScholarDigital Library
- T. Tran, A. Ceroni, M. Georgescu, K. D. Naini, and M. Fisichella. Wikipevent: Leveraging wikipedia edit history for event detection. In WISE. Springer, 2014.Google Scholar
- T. A. Tuan, S. Elbassuoni, N. Preda, and G. Weikum. CATE: Context-Aware Timeline for Entity Illustration. In WWW, 2011. Google ScholarDigital Library
- Y. Wang, M. Zhu, L. Qu, M. Spaniol, and G. Weikum. Timely Yago: harvesting, querying, and visualizing temporal knowledge from Wikipedia. In EDBT, 2010. Google ScholarDigital Library
- G. Weikum, N. Ntarmos, M. Spaniol, P. Triantafillou, A. A. Benczúr, S. Kirkpatrick, P. Rigaux, and M. Williamson. Longitudinal Analytics on Web Archive Data: It's About Time! In CIDR, 2011.Google Scholar
- X. W. Zhao, Y. Guo, R. Yan, Y. He, and X. Li. Timeline generation with social attention. In SIGIR, 2013. Google ScholarDigital Library
Index Terms
- TimeMachine: Timeline Generation for Knowledge-Base Entities
Recommendations
Timeline generation: tracking individuals on twitter
WWW '14: Proceedings of the 23rd international conference on World wide webIn this paper, we preliminarily learn the problem of reconstructing users' life history based on the their Twitter stream and proposed an unsupervised framework that create a chronological list for personal important events (PIE) of individuals. By ...
Timeline generation with social attention
SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrievalTimeline generation is an important research task which can help users to have a quick understanding of the overall evolution of any given topic. It thus attracts much attention from research communities in recent years. Nevertheless, existing work on ...
Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data MiningDespite having various attractive qualities such as high prediction accuracy and the ability to quantify uncertainty and avoid over-fitting, Bayesian Matrix Factorization has not been widely adopted because of the prohibitive cost of inference. In this ...
Comments