2019 | Original Paper | Chapter

Typicality-Based Across-Time Mapping of Entity Sets in Document Archives

Authors: Yijun Duan, Adam Jatowt, Sourav S. Bhowmick, Masatoshi Yoshikawa

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing


Abstract

News archives constitute a rich source of knowledge about past societies. Effectively utilizing such large and diverse accounts of the past requires novel approaches. One such approach is comparing past and present entities, which can lay the groundwork for better understanding both the past and the present, and can also support forecasting techniques. In this paper, we propose a novel research task: automatically generating across-time comparable entity pairs given two sets of entities. We also introduce an effective method to solve this task. The proposed model first applies typicality analysis to measure the representativeness of each entity. It then learns an orthogonal transformation between temporally distant entity collections. Finally, it generates a set of typical comparables using a concise integer linear programming framework. We experimentally demonstrate the effectiveness of our method on New York Times corpora through both qualitative and quantitative tests.
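The first two steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes that an entity's typicality is a Gaussian kernel density estimate of its vector within its own set (fixed bandwidth `h`, chosen here for illustration), and that the orthogonal transformation is the least-squares orthogonal Procrustes map learned from a few anchor pairs. To avoid external libraries, the mapping is restricted to 2-D, where the optimal orthogonal matrix has a closed form (the better of the best rotation and the best reflection); for real, high-dimensional embeddings the usual solution is W = UV^T, taken from the SVD of M = sum_i y_i x_i^T. All entity names, vectors, anchors, and helper names (`typicality`, `orthogonal_map_2d`, `apply`) are hypothetical, and the final ILP selection step is not sketched.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def typicality(x: Vector, entities: List[Vector], h: float = 1.0) -> float:
    """Gaussian kernel density estimate of x within an entity set.

    Assumption: fixed bandwidth h; x itself is counted if it is in the set.
    """
    d = len(x)
    norm = (1.0 / (math.sqrt(2.0 * math.pi) * h)) ** d
    total = 0.0
    for e in entities:
        sq_dist = sum((xi - ei) ** 2 for xi, ei in zip(x, e))
        total += norm * math.exp(-sq_dist / (2.0 * h * h))
    return total / len(entities)


def orthogonal_map_2d(src: List[Vector], dst: List[Vector]) -> Matrix2:
    """2-D orthogonal Procrustes: the orthogonal W minimizing sum ||W x_i - y_i||^2."""
    # Cross-covariance M = sum_i y_i x_i^T; the optimal W maximizes trace(W^T M).
    m = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(src, dst):
        for a in range(2):
            for b in range(2):
                m[a][b] += y[a] * x[b]
    # Rotations [[c, -s], [s, c]] score c*(m00 + m11) + s*(m10 - m01).
    rot_score = math.hypot(m[0][0] + m[1][1], m[1][0] - m[0][1])
    # Reflections [[c, s], [s, -c]] score c*(m00 - m11) + s*(m01 + m10).
    ref_score = math.hypot(m[0][0] - m[1][1], m[0][1] + m[1][0])
    if rot_score >= ref_score:
        t = math.atan2(m[1][0] - m[0][1], m[0][0] + m[1][1])
        c, s = math.cos(t), math.sin(t)
        return ((c, -s), (s, c))
    t = math.atan2(m[0][1] + m[1][0], m[0][0] - m[1][1])
    c, s = math.cos(t), math.sin(t)
    return ((c, s), (s, -c))


def apply(w: Matrix2, x: Vector) -> Tuple[float, float]:
    """Multiply the 2x2 matrix w by the vector x."""
    return (w[0][0] * x[0] + w[0][1] * x[1], w[1][0] * x[0] + w[1][1] * x[1])


# Hypothetical toy vectors: the "present" space is the "past" space rotated by 90 degrees.
past: Dict[str, Tuple[float, float]] = {"A": (1.0, 0.0), "B": (0.9, 0.2), "C": (0.0, 1.0)}
present: Dict[str, Tuple[float, float]] = {"a": (0.0, 1.0), "b": (-0.2, 0.9), "c": (-1.0, 0.0)}

# Step 1: rank past entities by typicality (most representative first).
past_points = list(past.values())
ranking = sorted(past, key=lambda k: typicality(past[k], past_points, h=0.5), reverse=True)

# Step 2: learn W from hypothetical anchor pairs (A, a) and (C, c), then map B.
W = orthogonal_map_2d([past["A"], past["C"]], [present["a"], present["c"]])
mapped_b = apply(W, past["B"])
counterpart = min(present, key=lambda k: math.dist(mapped_b, present[k]))
```

In this toy run, `ranking` is `["B", "A", "C"]` (the isolated entity `C` is least typical), and the mapped vector of `B` lands on `b`, so `counterpart` is `"b"`. In a full pipeline, typicality scores and cross-time similarities of this kind would be inputs to the ILP selection step rather than the final output.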


Notes

In Section “Experiments”, we experimentally set \(\lambda\) to 0.4.

https://github.com/explosion/spaCy.

https://spacy.io/api/annotation#named-entities.


Print ISBN: 978-3-030-18575-6

Electronic ISBN: 978-3-030-18576-3

Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-18576-3_21
