Enhancing multi-document summarization using concepts

Sādhanā

Abstract

In this paper we propose a methodology that mines concepts from documents and uses them to generate an objective summary of all relevant documents. We represent the concepts and their relationships using the conceptual graph (CG) formalism proposed by Sowa, with a modified and extended definition of a concept that is discussed in detail in section 2. The CG of a set of relevant documents can be regarded as a semantic network, which we generate by automatically extracting a CG for each document and merging these CGs into one. We discuss (i) the generation of the semantic network using CGs and (ii) the generation of the multi-document summary. CGs are extracted automatically using restricted Boltzmann machines, a deep learning technique. We tested the methodology on the MultiLing 2015 corpus and obtained encouraging results, comparable to those of state-of-the-art systems.
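
The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the triple representation of a CG, the label normalization, and the function names `merge_conceptual_graphs` and `concept_support` are all assumptions made for illustration, and the hand-written triples stand in for CGs that the paper extracts automatically.

```python
from collections import defaultdict

# Assumption: a conceptual graph is modeled as a set of
# (concept, relation, concept) triples. The paper's actual CG
# representation is not shown in this preview.

def normalize(label):
    """Lowercase a concept label and collapse whitespace so that
    surface variants of the same concept map to one node."""
    return " ".join(label.lower().split())

def merge_conceptual_graphs(doc_graphs):
    """Merge per-document graphs into a single network.

    Returns a dict mapping each normalized triple to the set of
    document ids in which it occurs."""
    network = defaultdict(set)
    for doc_id, triples in doc_graphs.items():
        for source, relation, target in triples:
            key = (normalize(source), relation, normalize(target))
            network[key].add(doc_id)
    return dict(network)

def concept_support(network):
    """Count, for every concept, the number of documents mentioning it."""
    support = defaultdict(set)
    for (source, _relation, target), docs in network.items():
        support[source] |= docs
        support[target] |= docs
    return {concept: len(docs) for concept, docs in support.items()}

# Hypothetical hand-written graphs for two documents.
doc_graphs = {
    "doc1": {("Earthquake", "location", "Nepal"),
             ("earthquake", "caused", "landslide")},
    "doc2": {("earthquake", "location", "nepal"),
             ("Rescue team", "destination", "Nepal")},
}

network = merge_conceptual_graphs(doc_graphs)
support = concept_support(network)
```

In this sketch, concepts supported by several documents (here `earthquake` and `nepal`) are natural candidates for inclusion in a multi-document summary, although the ranking actually used in the paper may differ.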

References

  1. Mani I 2001 Summarization evaluation: an overview. In: Proceedings of NTCIR

  2. Luhn H P 1958 The automatic creation of literature abstracts. IBM J. Res. Dev.  2(2): 159–165

  3. Lin C Y and Hovy E H 2002 From single to multi-document summarization: a prototype system and its evaluation. In: Proceedings of ACL-2002, pp. 457–464

  4. Radev D, Jing H, Stys M and Tam D 2004 Centroid-based summarization of multiple documents. Inf. Process. Manage.  40: 919–938

  5. Kleinberg J M 1999 Authoritative sources in a hyperlinked environment. J. ACM  46(5): 604–632

  6. Brin S and Page L 1998 The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst.  30(1–7): 107–117

  7. Erkan G and Radev D 2004 LexPageRank: prestige in multi-document text summarization. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, July

  8. Mihalcea R 2004 Graph-based ranking algorithms for sentence extraction, applied to text summarization. In: Proceedings of ACL 2004 on Interactive Poster and Demonstration Sessions (ACLdemo 2004), Barcelona, Spain

  9. Mihalcea R and Tarau P 2004 TextRank – bringing order into texts. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain

  10. Mihalcea R, Tarau P and Figa E 2004 PageRank on semantic networks, with application to word sense disambiguation. In: Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland

  11. McKeown K and Radev D 1995 Generating summaries of multiple news articles. In: Proceedings of the 18th Annual International ACM, Seattle, WA, pp. 74–82

  12. Virendra G and Tanveer J S 2012 Multi-document summarization using sentence clustering. In: IEEE Proceedings of the 4th International Conference on Intelligent Human–Computer Interaction, Kharagpur, India, pp. 314–318

  13. Sowa J F 1984 Conceptual structures, information processing in mind and machine. Addison Wesley, Boston, MA, USA

  14. Smith E E and Medin D L 1981 Categories and concepts. Harvard University Press, Cambridge, MA, USA

  15. Sowa J F 1976 Conceptual graphs for a data base interface. IBM J. Res. Dev.  20(4): 336–357

  16. Sag I A, Baldwin T, Bond F, Copestake A and Flickinger D 2002 Multiword expressions: a pain in the neck for NLP. In: Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2002), Mexico City, Mexico, pp. 1–15

  17. Brill E 1994 Some advances in transformation based part of speech tagging. In: Proceedings of the Twelfth International Conference on Artificial Intelligence (AAAI-94), Seattle, WA, pp. 722–727

  18. Ngai G and Florian R 2001 Transformation-based learning in the fast lane. In: Proceedings of NAACL’2001, Pittsburgh, PA, pp. 40–47

  19. Lafferty J, McCallum A and Pereira F 2001 Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning (ICML-2001), pp. 282–289

  20. Hinton G and Salakhutdinov R 2006 Reducing the dimensionality of data with neural networks. Science  313(5786): 504–507

  21. Srivastava N, Salakhutdinov R R and Hinton G E 2013 Modeling documents with a deep Boltzmann machine. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI)

  22. Rao P R K and Lalitha Devi S 2015 Automatic identification of conceptual structures using deep Boltzmann machines. In: Proceedings of the Forum for Information Retrieval and Evaluation, ACM DL, Gandhinagar, India, pp. 16–80

  23. Mikolov T, Chen K, Corrado G and Dean J 2013 Efficient estimation of word representations in vector space. In: Proceedings of the Workshop at ICLR

  24. Blum N 2001 A simplified realization of the Hopcroft–Karp approach to maximum matching in general graphs. Tech. Rep. 895549-CS, Computer Science Department, University of Bonn

  25. Hopcroft J E and Karp R M 1973 An $n^{5/2}$ algorithm for maximum matchings in bipartite graphs. SIAM J. Comput.  2(4): 225–231, https://doi.org/10.1137/0202019

  26. Giannakopoulos G, Kubina J, Conroy J M, Steinberger J, Favre B, Kabadjov M, Kruschwitz U and Poesio M 2015 MultiLing 2015: multilingual summarization of single and multi-documents, on-line fora, and call-center conversations. In: Proceedings of SIGDIAL, Prague, pp. 270–274

  27. Yang S Y and Soo V W 2012 Extract conceptual graphs from plain texts in patent claims. J. Eng. Appl. Artif. Intell.  25(4): 874–887

  28. Rao P R K, Lalitha Devi S and Rosso P 2013 Automatic identification of concepts and conceptual relations from patents using machine learning methods. In: Proceedings of the 10th International Conference on Natural Language Processing (ICON 2013), Noida, India

  29. Lin C Y 2004 ROUGE: a package for automatic evaluation of summaries. In: Proceedings of the Workshop on Text Summarization Branches Out, Barcelona, Spain

  30. Lin C Y and Hovy E 2003 Automatic evaluation of summaries using n-gram co-occurrence. In: Proceedings of the 2003 Language Technology Conference (HLT-NAACL 2003), Edmonton, Canada, pp. 71–78

Author information

Correspondence to Pattabhi R K Rao or S Lalitha Devi.

About this article

Cite this article

Rao, P.R.K., Lalitha Devi, S. Enhancing multi-document summarization using concepts. Sādhanā 43, 27 (2018). https://doi.org/10.1007/s12046-018-0789-y

  • DOI: https://doi.org/10.1007/s12046-018-0789-y
