Abstract
In this paper we propose a methodology for mining concepts from documents and using these concepts to generate an objective summary of all relevant documents. We represent the concepts in the documents and the relationships between them using the conceptual graph (CG) formalism proposed by Sowa. In the present work we modify and extend Sowa's definition of a concept; the modified and extended definition is discussed in detail in section 2 of this paper. The CG of a set of relevant documents can be viewed as a semantic network, which is generated by automatically extracting a CG for each document and merging these CGs into one. We discuss (i) the generation of the semantic network using CGs and (ii) the generation of a multi-document summary. CGs are extracted automatically using restricted Boltzmann machines, a deep learning technique. We tested our methodology on the MultiLing 2015 corpus and obtained encouraging results that are comparable to those of state-of-the-art systems.
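The abstract describes building a semantic network by extracting a CG for each document and merging the CGs into one. The following is a minimal sketch of only that merging step, under loudly stated assumptions: each CG is reduced to a set of (concept, relation, concept) triples, concepts from different documents are merged by exact match after lowercasing, and concepts are ranked by degree as a stand-in for importance. The triple representation, the `normalize`, `merge_cgs` and `rank_concepts` helpers, and the degree ranking are hypothetical illustrations, not the authors' method; the CG extraction itself (done with restricted Boltzmann machines in this work) is not shown.

```python
from collections import defaultdict

# Hypothetical simplification: a conceptual graph (CG) is reduced to a set of
# (concept, relation, concept) triples. Sowa's CGs are richer bipartite graphs
# of concept and relation nodes; this is only an illustration.
Triple = tuple[str, str, str]
Network = dict[str, set[tuple[str, str]]]


def normalize(concept: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace so that
    the same concept found in different documents maps to one node."""
    return " ".join(concept.lower().split())


def merge_cgs(cgs: list[set[Triple]]) -> Network:
    """Merge per-document CGs into a single semantic network.

    Returns an adjacency map concept -> {(relation, target concept)}.
    Duplicate edges contributed by several documents collapse into one.
    """
    network: Network = defaultdict(set)
    for cg in cgs:
        for source, relation, target in cg:
            s, t = normalize(source), normalize(target)
            network[s].add((relation, t))
            network.setdefault(t, set())  # keep concepts with no outgoing edges as nodes
    return dict(network)


def rank_concepts(network: Network) -> list[str]:
    """Rank concepts by degree (in + out edges), ties broken alphabetically.
    Degree is an assumed stand-in for concept importance."""
    degree: dict[str, int] = defaultdict(int)
    for source, edges in network.items():
        for _relation, target in edges:
            degree[source] += 1
            degree[target] += 1
    return sorted(network, key=lambda c: (-degree[c], c))
```

For example, merging `{("Earthquake", "location", "Nepal")}` from one document with `{("earthquake", "cause", "damage")}` from another yields a single `earthquake` node carrying both edges, which then ranks first by degree. Exact-match merging is the simplest possible choice; a real system would need to align concepts that are worded differently across documents.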
References
Mani I 2001 Summarization evaluation: an overview. In: Proceedings of NTCIR
Luhn H P 1958 The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2): 159–165
Lin C Y and Hovy E H 2002 From single to multi-document summarization: a prototype system and its evaluation. In: Proceedings of ACL-2002, pp. 457–464
Radev D, Jing H, Stys M and Tam D 2004 Centroid-based summarization of multiple documents. Inf. Process. Manage. 40: 919–938
Kleinberg J M 1999 Authoritative sources in a hyperlinked environment. J. ACM 46(5): 604–632
Brin S and Page L 1998 The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 30(1–7): 107–117
Erkan G and Radev D 2004 LexPageRank: prestige in multi-document text summarization. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain, July
Mihalcea R 2004 Graph-based ranking algorithms for sentence extraction, applied to text summarization. In: Proceedings of ACL 2004 on Interactive Poster and Demonstration Sessions (ACLdemo 2004), Barcelona, Spain
Mihalcea R and Tarau P 2004 TextRank: bringing order into texts. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain
Mihalcea R, Tarau P and Figa E 2004 PageRank on semantic networks, with application to word sense disambiguation. In: Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland
McKeown K and Radev D 1995 Generating summaries of multiple news articles. In: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA, pp. 74–82
Virendra G and Tanveer J S 2012 Multi-document summarization using sentence clustering. In: IEEE Proceedings of the 4th International Conference on Intelligent Human–Computer Interaction, Kharagpur, India, pp. 314–318
Sowa J F 1984 Conceptual structures: information processing in mind and machine. Addison-Wesley, Boston, MA, USA
Smith E E and Medin D L 1981 Categories and concepts. Harvard University Press, Cambridge, MA
Sowa J F 1976 Conceptual graphs for a data base interface. IBM J. Res. Dev. 20(4): 336–357
Sag I A, Baldwin T, Bond F, Copestake A and Flickinger D 2002 Multiword expressions: a pain in the neck for NLP. In: Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2002), Mexico City, Mexico, pp. 1–15
Brill E 1994 Some advances in transformation-based part of speech tagging. In: Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), Seattle, WA, pp. 722–727
Ngai G and Florian R 2001 Transformation-based learning in the fast lane. In: Proceedings of NAACL 2001, Pittsburgh, PA, pp. 40–47
Lafferty J, McCallum A and Pereira F 2001 Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning (ICML-2001), pp. 282–289
Hinton G and Salakhutdinov R 2006 Reducing the dimensionality of data with neural networks. Science 313(5786): 504–507
Srivastava N, Salakhutdinov R R and Hinton G E 2013 Modeling documents with a deep Boltzmann machine. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI)
Rao P R K and Lalitha Devi S 2015 Automatic identification of conceptual structures using deep Boltzmann machines. In: Proceedings of the Forum for Information Retrieval Evaluation (FIRE), ACM DL, Gandhinagar, India, pp. 16–80
Mikolov T, Chen K, Corrado G and Dean J 2013 Efficient estimation of word representations in vector space. In: Proceedings of the Workshop at ICLR
Blum N 2001 A simplified realization of the Hopcroft–Karp approach to maximum matching in general graphs. Tech. Rep. 895549-CS, Computer Science Department, University of Bonn
Hopcroft J E and Karp R M 1973 An n^{5/2} algorithm for maximum matchings in bipartite graphs. SIAM J. Comput. 2(4): 225–231, https://doi.org/10.1137/0202019
Giannakopoulos G, Kubina J, Conroy J M, Steinberger J, Favre B, Kabadjov M, Kruschwitz U and Poesio M 2015 MultiLing 2015: multilingual summarization of single and multi-documents, on-line fora, and call-center conversations. In: Proceedings of SIGDIAL, Prague, pp. 270–274
Yang S Y and Soo V W 2012 Extract conceptual graphs from plain texts in patent claims. Eng. Appl. Artif. Intell. 25(4): 874–887
Rao P R K, Lalitha Devi S and Rosso P 2013 Automatic identification of concepts and conceptual relations from patents using machine learning methods. In: Proceedings of the 10th International Conference on Natural Language Processing (ICON 2013), Noida, India
Lin C Y 2004 ROUGE: a package for automatic evaluation of summaries. In: Proceedings of the Workshop on Text Summarization Branches Out, Barcelona, Spain
Lin C Y and Hovy E 2003 Automatic evaluation of summaries using n-gram co-occurrence. In: Proceedings of the 2003 Language Technology Conference (HLT-NAACL 2003), Edmonton, Canada, pp. 71–78
Cite this article
Rao, P.R.K., Lalitha Devi, S. Enhancing multi-document summarization using concepts. Sādhanā 43, 27 (2018). https://doi.org/10.1007/s12046-018-0789-y
DOI: https://doi.org/10.1007/s12046-018-0789-y