Evolutionary features of academic articles co-keyword network and keywords co-occurrence network: Based on two-mode affiliation network

https://doi.org/10.1016/j.physa.2016.01.017

Highlights

  • A novel method to grasp articles’ key points and relations from a holistic view.

  • An empirical study based on the two-mode affiliation network theory.

  • Integrates statistics, text mining, complex networks and visualization.

  • Constructed article co-keyword networks and keyword co-occurrence networks.

  • Defined an innovation coefficient of the articles at the annual level.

Abstract

Keeping abreast of trends in academic articles and rapidly grasping the key points and relationships of a body of articles from a holistic perspective is a new challenge in both literature research and text mining. As an important component of an academic article, keywords can present its core idea. Articles on a single theme or area usually share one or more keywords, so analyzing the topological features and evolution of article co-keyword networks and keyword co-occurrence networks enables an in-depth analysis of the articles. This paper integrates statistics, text mining, complex networks and visualization to analyze all of the academic articles on one given theme, complex network(s). All 5944 “complex networks” articles published between 1990 and 2013 and available on the Web of Science were extracted. Based on two-mode affiliation network theory, a new frontier of complex networks, we constructed two different networks: an article co-keyword network, which takes the articles as nodes, co-keyword relationships as edges and the number of shared keywords as edge weights; and a keyword co-occurrence network, which takes the articles’ keywords as nodes, co-occurrence relationships as edges and the number of co-occurrences as edge weights. We propose an integrated method for analyzing the topological features and evolution of the article co-keyword networks and keyword co-occurrence networks, and we define a new function to measure the innovation coefficient of the articles at the annual level. This paper provides a useful tool and process for in-depth analysis and rapid understanding of the trends and relationships of articles from a holistic perspective.

Introduction

With the recent development and popularization of information and data analysis technology, large-scale, data-intensive analysis of scientific data has become a new trend in data mining [1], and text mining, one of the main branches of data mining, has become a new method of knowledge discovery. Text mining uses structured algorithms to extract the basic information provided by one or more texts; it can also be used to determine the relationships among textual elements and among the texts themselves. The results can be used for knowledge discovery and other applications. Text mining has been applied in many fields, such as medicine [2], biochemistry [3] and business [4]. Its objects of analysis include scientific literature [5], news [6], web content [7] and long documents [8]. Various technologies [9], [10], [11], [12], [13], [14], [15] and tools [16], [17], [18], [19], [20], [21] are used in this field, and they have been extended from single-text analysis to the analysis of large-scale, complex data.

The existing literature indicates that one of the most frequent uses of text mining is conducting literature reviews, which allow researchers to identify the developing trends in a field. Literature reviews are also fundamental to academic research. There are currently two main ways to conduct a literature review. The first identifies important academic articles by their citation frequency and the impact factors of the journals in which they were published [22]. This method is used to identify recent developments in a field over a short period; however, because the sample is limited, it is difficult to achieve a holistic perspective with it. The other frequently used method is content analysis [23], a research technique for the systematic, objective and quantitative description of a text’s content, which has recently received increased attention. By delimiting a research object and establishing a quantification standard, researchers can use content analysis to obtain statistics and clusters across multiple texts. However, because both the classification and coding rules depend on the researchers’ knowledge and experience, it is difficult to keep the quantitative criteria consistent, which undermines the objectivity of the analysis. Content analysis is also inadequate for mining the complex relationships among texts.

There is an urgent need for a tool that tracks trends in academic articles and rapidly conveys the key points and internal relationships of a collection of texts from a holistic perspective. Keywords, an important textual element, provide a concise overview of the main content and key points of a body of articles, and keyword analysis can expedite text mining [24], [25]. Many scholars use tag clouds to analyze unstructured keywords because they highlight the most significant concepts, which facilitates navigation and visualization [26]. However, tag clouds show only the frequencies of individual words; they show neither the relationships among keywords nor the relationships among articles that share keywords. In contrast, complex network analysis is a young but active approach to discovering the relationships among entities in real or virtual systems. It has been widely used in areas such as economic and biological networks, and it can effectively model a network’s topological features [27], [28], [29], mine its relationships [30], and analyze its evolution [28]. Multi-mode networks, a new frontier of complex network research, have been shown to represent reality better because they capture heterogeneous node attributes. They have been applied successfully in other areas, such as multi-mode societal-ecological affiliation networks [31], [32], [33], [34], [35], fiber transmission [36] and the shareholding networks of listed companies [37].
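The difference between a tag-cloud view and a network view can be illustrated with a minimal Python sketch. The keyword lists below are hypothetical toy data, not drawn from the paper’s sample: a tag cloud needs only per-keyword frequencies, whereas a network view also counts how often each pair of keywords appears in the same article.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each article is represented only by its keyword list.
articles = {
    "A1": ["complex networks", "synchronization", "small-world"],
    "A2": ["complex networks", "scale-free", "epidemics"],
    "A3": ["scale-free", "epidemics", "percolation"],
}

# Tag-cloud view: how many articles list each keyword (no relational information).
frequency = Counter(kw for kws in articles.values() for kw in set(kws))

# Network view: how many articles list each (alphabetically ordered) keyword pair.
pair_counts = Counter(
    pair
    for kws in articles.values()
    for pair in combinations(sorted(set(kws)), 2)
)

print(frequency["scale-free"])                           # 2
print(pair_counts[("epidemics", "scale-free")])          # 2
print(pair_counts[("complex networks", "percolation")])  # 0
```

In this toy data "small-world" and "percolation" have the same frequency (1), so a tag cloud would render them identically, yet they co-occur with entirely different keywords; that relational information is what the network view retains.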

In this paper, we study the patterns of relationships among academic articles on a given theme, complex networks, from a holistic perspective by constructing and analyzing annual article co-keyword equivalent networks (AENs) and annual keyword co-occurrence equivalent networks (KENs). Both networks are constructed in the same way as the equivalent networks derived from a two-mode affiliation network in [38]. We analyze the annual topological features of the two networks, as well as their evolution and stability. Finally, we define and analyze an innovation coefficient of the networks on this theme.

Section snippets

Constructing the AENs and KENs

In this paper, affiliation relationships exist between keywords and the articles in which they appear. Networks constructed from affiliation relationships are a typical type of two-mode network, called a member-network [39] or hypernetwork [40]. The two-mode affiliation network is composed of a set of actors (keywords) and a set of events (articles) [41]. According to Wasserman [42] and Li et al. [38], when there are two nodes, α and β, that have the same relationship with γ …
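The edge weights stated in the abstract can be written compactly in terms of this two-mode network. As a sketch, assume a binary article-by-keyword incidence matrix B (our notation, not necessarily the paper’s), with b_ik = 1 if article i lists keyword k and b_ik = 0 otherwise. Projecting onto either mode then gives the two weighted one-mode networks:

```latex
w^{\mathrm{AEN}}_{ij} = \sum_{k} b_{ik}\, b_{jk} \quad (i \neq j),
\qquad
w^{\mathrm{KEN}}_{kl} = \sum_{i} b_{ik}\, b_{il} \quad (k \neq l),
\qquad \text{i.e.} \quad
W^{\mathrm{AEN}} = B B^{\top}, \quad W^{\mathrm{KEN}} = B^{\top} B \ \text{(diagonals discarded)}.
```

Here w^AEN_ij is the number of keywords shared by articles i and j, and w^KEN_kl is the number of articles in which keywords k and l co-occur; an edge exists wherever the weight is positive.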

The visualization of the two different networks

For each period (year), an equivalent network was constructed for each keyword, based on co-keyword relationships, and for each article, based on keyword co-occurrence. These equivalent networks were then superimposed to form the AENs and KENs. Figs. 2, 3 and 4 present the visualization results and the numbers of nodes and edges of the AENs and KENs.
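The superposition step can be sketched in Python as follows. This is a minimal illustration under our own reading of the text, not the authors’ code: we assume that a keyword’s equivalent network links every pair of articles listing that keyword in a given year, that an article’s equivalent network links every pair of its keywords, and that superimposing adds 1 to an edge’s weight for each equivalent network containing it. The input format `(article_id, year, keywords)` and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def superimpose(groups):
    """Overlay the complete graph ('equivalent network') induced by each group.

    Every pair of members in a group gets +1 edge weight, so the final weight
    of (u, v) is the number of groups in which u and v appear together.
    """
    weights = defaultdict(int)
    for members in groups:
        for u, v in combinations(sorted(set(members)), 2):
            weights[(u, v)] += 1
    return dict(weights)


def annual_networks(records):
    """Build {year: (aen_edges, ken_edges)} from (article_id, year, keywords)."""
    by_year = defaultdict(list)
    for article_id, year, keywords in records:
        by_year[year].append((article_id, set(keywords)))

    networks = {}
    for year, items in by_year.items():
        # Equivalent network of each keyword: all articles listing it this year.
        articles_per_keyword = defaultdict(set)
        for article_id, keywords in items:
            for kw in keywords:
                articles_per_keyword[kw].add(article_id)
        aen = superimpose(articles_per_keyword.values())
        # Equivalent network of each article: all of its keywords.
        ken = superimpose(keywords for _, keywords in items)
        networks[year] = (aen, ken)
    return networks


def node_edge_counts(edges):
    """Count nodes that have at least one edge, and edges, in an {(u, v): w} dict."""
    nodes = {n for pair in edges for n in pair}
    return len(nodes), len(edges)


if __name__ == "__main__":
    # Hypothetical toy records, not the paper's data.
    records = [
        ("A1", 2012, ["complex networks", "synchronization"]),
        ("A2", 2012, ["complex networks", "scale-free", "epidemics"]),
        ("A3", 2013, ["scale-free", "epidemics"]),
        ("A4", 2013, ["epidemics", "percolation"]),
    ]
    for year, (aen, ken) in sorted(annual_networks(records).items()):
        print(year, "AEN", node_edge_counts(aen), "KEN", node_edge_counts(ken))
    # 2012 AEN (2, 1) KEN (4, 4)
    # 2013 AEN (2, 1) KEN (3, 2)
```

Under these assumptions the superimposed AEN weight equals the number of keywords two articles share and the KEN weight equals the number of articles in which two keywords co-occur, which matches the weights given in the abstract. `node_edge_counts` counts only nodes with at least one edge; whether the paper’s node counts include isolated articles or keywords is not stated in this excerpt.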

As Figs. 2 and 4 show, the relationships between the words become increasingly complex over time (nodes with the same color are more strongly …

Discussion and conclusion

To capture the evolutionary features of a body of articles and their relationships, we used as our sample the 5944 “complex networks” articles published between 1990 and 2013. Based on two-mode affiliation network theory, we constructed the AENs by taking the articles as nodes, the co-keyword relationships as edges and the number of shared keywords as weights, and the KENs by taking the articles’ keywords as nodes, the co-occurrence relationships as edges and the …

Acknowledgments

This research is supported by grants from the National Natural Science Foundation of China (Grant No. 71173199), the China Scholarship Council (File No. 201406400004), the Humanities and Social Sciences Planning Funds Project under the Ministry of Education of the PRC (Grant No. 10YJA630001), and the Fundamental Research Funds for the Central Universities (Grant No. 2-9-2014-104). The authors would like to express their gratitude to the reviewers and Xuan Huang, Xiaoqing Hao, Xiaoliang Jia, …

References (50)

  • X.H. Xia et al.

    Energy security, efficiency and carbon emission of Chinese industry

    Energy Policy

    (2011)
  • H. Li et al.

    The shareholding similarity of the shareholders of the worldwide listed energy companies based on a two-mode primitive network and a one-mode derivative holding-based network

    Physica A

    (2014)
  • J.M. McPherson

    Hypernetwork sampling: Duality and differentiation among voluntary organizations

    Soc. Networks

    (1982)
  • C.A. Hidalgo et al.

    The dynamics of a mobile phone network

    Physica A

    (2008)
  • P. Warrer et al.

    Using textmining techniques in electronic patient records to identify ADRs from medicine use

    Br. J. Clin. Pharmacol.

    (2012)
  • M. Miwa et al.

    A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text

    Bioinformatics

    (2013)
  • X. Gao, S. Murugesan, B. Lo

    Extraction of keyterms by simple text mining for business information retrieval

    in:...
  • A. Korhonen et al.

    The first step in the development of text mining technology for cancer risk assessment: Identifying and organizing scientific evidence in risk assessment literature

    BMC Bioinformatics

    (2009)
  • P. Kroha, R. Baeza-Yates, B. Krellner

    Text mining of business news for forecasting

    in: Database and Expert Systems...
  • J. Švec et al.

    Web text data mining for building large scale language modelling corpus

  • H. Takeuchi et al.

    Context-based text mining for insights in long documents

  • A.G. Skarmeta et al.

    Data mining for text categorization with semisupervised agglomerative hierarchical clustering

    Int. J. Intell. Syst.

    (2000)
  • W. Claster, S. Shanmuganathan, N. Ghotbi

    Text mining in radiological data records: An unsupervised neural network...
  • C.H. Lee et al.

    A self-adaptive clustering scheme with a time-decay function for microblogging text mining
