ABSTRACT
This paper studies how identifying and categorizing scientific concepts can lead to a deeper understanding of a scientific community's research literature. To this end, we propose an unsupervised bootstrapping algorithm for identifying and categorizing mentions of concepts. We then propose a new clustering algorithm that uses citation contexts to group the extracted mentions into coherent concepts. Evaluated against gold standards, the algorithms show significant improvement over state-of-the-art results. More importantly, we apply the proposed algorithms to the computational linguistics literature and show four different ways to summarize and understand the research community that are difficult to obtain with existing techniques.