Evolutionary features of academic articles co-keyword network and keywords co-occurrence network: Based on two-mode affiliation network

doi:10.1016/j.physa.2016.01.017

Physica A: Statistical Mechanics and its Applications

Volume 450, 15 May 2016, Pages 657-669

https://doi.org/10.1016/j.physa.2016.01.017

Highlights

• A novel method to grasp articles’ key points and relations from a holistic view.
• An empirical study based on the two-mode affiliation network theory.
• Integrates statistics, text mining, complex networks and visualization.
• Constructs the articles co-keyword networks and keywords co-occurrence networks.
• Defines an innovation coefficient of the articles at the annual level.

Abstract

Keeping abreast of trends in the literature and rapidly grasping the key points and relationships of a body of articles from a holistic perspective is a new challenge in both literature research and text mining. As an important component, keywords present the core idea of an academic article. Articles on a single theme or area usually share one or more keywords, so we can analyze the topological features and evolution of the articles co-keyword networks and keywords co-occurrence networks to achieve an in-depth analysis of the articles. This paper integrates statistics, text mining, complex networks and visualization to analyze all of the academic articles on one given theme, complex network(s). All 5944 “complex networks” articles that were published between 1990 and 2013 and are available on the Web of Science are extracted. Based on the two-mode affiliation network theory, a new frontier of complex networks, we constructed two different networks. The first, the articles co-keyword network, takes the articles as nodes, the co-keyword relationships as edges and the quantity of co-keywords as the weight. The second, the keywords co-occurrence network, takes the articles’ keywords as nodes, the co-occurrence relationships as edges and the quantity of simultaneous co-occurrences as the weight. An integrated method for analyzing the topological features and evolution of the articles co-keyword networks and keywords co-occurrence networks is proposed, and we also define a new function to measure the innovation coefficient of the articles at the annual level. This paper provides a useful tool and process for in-depth analysis and rapid understanding of the trends and relationships of articles from a holistic perspective.

Introduction

With the recent development and popularization of information and data analysis technology, large-scale data-intensive analysis of scientific data has become a new trend in data mining [1], and as one of the main aspects of data mining, text mining has become a new method of knowledge discovery. Text mining is a useful tool for understanding the basic information provided by one or more texts through structured algorithms. It can also be used to determine the relationships among the textual elements and the texts themselves. The results can be used for knowledge discovery and other applications. As an important tool and method for knowledge discovery, text mining has been used in many fields, such as medicine [2], biochemistry [3] and business [4]. The objects of analysis in text mining include literature [5], news [6], network information [7] and long texts [8]. Various technologies [9], [10], [11], [12], [13], [14], [15] and tools [16], [17], [18], [19], [20], [21] are used in this field. Such technologies and tools have been enhanced not only to conduct single-text analysis but also to analyze big data and complexity.

The existing literature indicates that one of the most frequent uses of text mining methods is to conduct a literature review, which allows researchers to determine the developing trends in a field. Literature reviews are also fundamental in academic research. There are currently two ways to conduct a literature review. The first is to identify important academic articles by their citation frequency and the impact factors of the journals in which they were published [22]. This method is used to identify recent developments in a field during a short period. However, due to the limited sample, it is difficult to achieve a holistic perspective using this method. The other frequently used method is content analysis [23], a research technique that involves the systematic, objective and quantitative description of a text’s content. This method has recently received increased attention. Researchers can use content analysis to find multi-text statistics and clusters by delimiting a research object and establishing a quantification standard. However, it is difficult to maintain the consistency of the quantitative criteria because both the classification and coding rules are based on the knowledge and experience of the researchers, which undermines the objectivity of the analysis. Content analysis is also an inadequate method for mining the complex relationships among the texts.

There is an urgent need to develop a tool for tracking the trends in academic articles and rapidly understanding the key points and internal relationships of a collection of texts from a holistic perspective. Keywords, an important textual element, can provide a concise overview of the important content and key points of a body of articles. Keyword analysis can also expedite text mining [24], [25]. Many scholars use tag clouds to analyze unstructured keywords because this method allows the user to highlight the most significant concepts, which facilitates navigation and visualization [26]. However, tag clouds only show the frequency of single words; they show neither the relationships among the keywords nor the relationships between the articles based on the keywords. Unlike tag clouds, complex network analysis is a young but active method for discovering the internal relationships between different entities in real or virtual systems. It is widely used in different areas, such as economic networks and biological networks. It can effectively model a network’s topological features [27], [28], [29], mine its relationships [30], and analyze its evolution [28]. As a new frontier of complex networks, the multi-mode network has been shown to better represent reality because of its heterogeneous attributes, and it has been successfully used in other areas, such as multi-mode societal ecological affiliation networks [31], [32], [33], [34], [35], fiber transmission [36] and the shareholding networks of listed companies [37].

In this paper, we study the patterns of relationships among academic articles on a given theme, complex networks, from a holistic perspective by constructing and analyzing annual articles co-keyword equivalent networks (AENs for short) and annual keywords co-occurrence equivalent networks (KENs for short). The process of constructing the two networks is the same as the one employed to construct equivalent networks [38] using the two-mode affiliation network. We analyze the topological features of the two networks at the annual level, as well as their evolution and stability. Then, the innovation coefficient of the networks for the given theme, complex networks, is defined and analyzed.
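As a rough illustration of the two projections described in the abstract, the following Python sketch derives both one-mode networks from a two-mode article-keyword mapping. It is a minimal sketch, not the authors' implementation: the function name `project_two_mode`, the toy data in `sample`, and the reading of "quantity of simultaneous co-occurrences" as the number of articles in which two keywords appear together are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def project_two_mode(article_keywords):
    """Project an article -> keywords mapping onto two weighted one-mode networks.

    Returns (aen, ken), each a dict mapping a sorted node pair to an edge weight.
    Hypothetical helper for illustration only.
    """
    # Keyword network: two keywords are linked when they appear in the same
    # article; the weight counts the articles in which they co-occur (assumed).
    ken = Counter()
    for keywords in article_keywords.values():
        for pair in combinations(sorted(set(keywords)), 2):
            ken[pair] += 1

    # Invert the mapping so each keyword points to the articles that use it.
    keyword_articles = {}
    for article, keywords in article_keywords.items():
        for keyword in set(keywords):
            keyword_articles.setdefault(keyword, set()).add(article)

    # Article network: two articles are linked when they share a keyword;
    # the weight counts the keywords they share.
    aen = Counter()
    for articles in keyword_articles.values():
        for pair in combinations(sorted(articles), 2):
            aen[pair] += 1

    return dict(aen), dict(ken)


# Toy data for one year (hypothetical article IDs and keywords).
sample = {
    "A1": {"complex network", "small world"},
    "A2": {"complex network", "small world", "synchronization"},
    "A3": {"synchronization"},
}
aen, ken = project_two_mode(sample)
print(aen)
print(ken)
```

Running the projection once per publication year on that year's articles would give annual networks of the kind the paper analyzes; the equivalence-based construction of [38] may differ in detail from this direct pair counting.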

Section snippets

Constructing the AENs and KENs

In this paper, affiliation relationships can be found between keywords and the articles in which they appear. Networks constructed according to affiliation relationships are a typical type of two-mode network called a member-network [39] or hyper-network [40]. The two-mode affiliation network is composed of a set of actors (keywords) and a set of events (articles) [41]. According to Wasserman [42] and Li et al. [38], when there are two nodes, $α$ and $β$, that have the same relationship with $γ$

The visualization of the two different networks

The equivalent networks of each keyword and each paper were constructed based on co-keywords and keywords co-occurrence in the same period (year). The equivalent networks were then superimposed to form the AENs and KENs. Fig. 2, Fig. 3 and Fig. 4 present the visualization results and the quantity of nodes and edges of the AENs and KENs.

As Fig. 2 and Fig. 4 show, the relationships between the words become increasingly complex over time (the nodes with the same color mean they are more strongly

Discussion and conclusion

To capture the evolutionary features of a body of articles and their relations, we used 5944 “complex networks” articles that were published between 1990 and 2013 as the sample. Based on the two-mode affiliation network theory, we constructed the AENs by taking the articles as nodes, the co-keyword relationships as edges and the quantity of co-keywords as weights and the KENs by taking the articles’ keywords as nodes, the co-occurrence relationships as edges and the

Acknowledgments

This research is supported by grants from the National Natural Science Foundation of China (Grant No. 71173199), the China Scholarship Council (File No. 201406400004), the Humanities and Social Sciences Planning Funds Project under the Ministry of Education of the PRC (Grant No. 10YJA630001), and the Fundamental Research Funds for the Central Universities (Grant No. 2-9-2014-104). The authors would like to express their gratitude to the reviewers and Xuan Huang, Xiaoqing Hao, Xiaoliang Jia,

References (50)

L. Wang et al., G-Hadoop: MapReduce across distributed data centers for data-intensive computing, Future Gener. Comput. Syst. (2013)

Z. Xu et al., Knowle: a semantic link network based system for organizing large scale online news events, Future Gener. Comput. Syst. (2015)

Z. Xu et al., Mining temporal explicit and implicit semantic relations between entities using web search engines, Future Gener. Comput. Syst. (2014)

Y.X. Wang et al., Diagnosis and multi-modality treatment of adult pulmonary plastoma: Analysis of 18 cases and review of literature, Asian Pac. J. Trop. Med. (2014)

H. An et al., The role of fluctuating modes of autocorrelation in crude oil prices, Physica A (2014)

W. Zhong et al., The evolution of communities in the international oil trade network, Physica A (2014)

H. Li et al., On the topological properties of the cross-shareholding networks of listed companies in China: Taking shareholders’ cross-shareholding relationships into account, Physica A (2014)

X.H. Xia et al., Energy regulation in China: Objective selection, potential assessment and responsibility sharing by partial frontier analysis, Energy Policy (2014)

Z.M. Chen et al., Demand-driven energy requirement of world economy 2007: A multi-region input–output network simulation, Commun. Nonlinear Sci. Numer. Simul. (2013)

G.Q. Chen et al., Three-scale input–output modeling for urban economy: Carbon emission by Beijing 2007, Commun. Nonlinear Sci. Numer. Simul. (2013)

X.H. Xia et al., Energy security, efficiency and carbon emission of Chinese industry, Energy Policy (2011)

H. Li et al., The shareholding similarity of the shareholders of the worldwide listed energy companies based on a two-mode primitive network and a one-mode derivative holding-based network, Physica A (2014)

J.M. McPherson, Hypernetwork sampling: Duality and differentiation among voluntary organizations, Soc. Networks (1982)

C.A. Hidalgo et al., The dynamics of a mobile phone network, Physica A (2008)

P. Warrer et al., Using textmining techniques in electronic patient records to identify ADRs from medicine use, Br. J. Clin. Pharmacol. (2012)

M. Miwa et al., A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text, Bioinformatics (2013)

X. Gao, S. Murugesan, B. Lo, Extraction of keyterms by simple text mining for business information retrieval, in:...

A. Korhonen et al., The first step in the development of text mining technology for cancer risk assessment: Identifying and organizing scientific evidence in risk assessment literature, BMC Bioinformatics (2009)

P. Kroha, R. Baeza-Yates, B. Krellner, Text mining of business news for forecasting, in: Database and Expert Systems...

J. Švec et al., Web text data mining for building large scale language modelling corpus

H. Takeuchi et al., Context-based text mining for insights in long documents

A.G. Skarmeta et al., Data mining for text categorization with semisupervised agglomerative hierarchical clustering, Int. J. Intell. Syst. (2000)

W. Claster, S. Shanmuganathan, N. Ghotbi, Text mining in radiological data records: An unsupervised neural network...

C.H. Lee et al., A self-adaptive clustering scheme with a time-decay function for microblogging text mining
