1 Introduction

Rail networks have been one of the prime movers of world economies since the industrial revolution [1]. People and goods have been transported by rail systems for centuries. In built environments, rail networks in the form of underground or elevated systems have become a symbol of modernization and commercialization [2] and play an important role in local public transport. Hundreds of millions of passengers travel by rail every day within and between cities [3, 4]. In fact, most rail networks were built because there were more passengers than buses, coaches, and private cars could carry on the roads. That forced governments to build mass transit rail networks linking one location to another to ease traffic jams and enhance the flow of people and commerce [5, 6]. When a rail network is designed, town planners and traffic consultants identify the station locations based on the existing and projected populations and estimate traffic flow, i.e., how many passengers travel from location A to location B during peak hours. From that, transport engineers choose a type of rolling stock that meets the projected capacity [7]. However, urbanization continually brings more people into the city, and new residential and commercial areas are developed. The rail development process is then repeated, and new rail lines need to be designed and linked to the existing network [8].

In recent years, China has spent several trillion yuan to construct inter-city high-speed railways and to expand metro systems in major cities such as Shenzhen, Guangzhou, Shanghai, and Beijing [3, 9]. According to the official Xinhua News Agency [10], China invested 745.5 billion yuan (USD 115 billion) in building railways in 2011 alone. By the end of 2010, China already led the world with more than 7000 km of high-speed railways in service [9]. Because an inter-city high-speed railway is a complex system that involves engineering, social, and economic factors and is affected by climatic and human factors, Ning et al. [9] developed a parallel control and management system that incorporates artificial systems, computational experiments, and parallel execution (ACP) for rail planning and management. The ACP method and other complex network theories have also been applied to study the system behavior of urban rail transportation systems [11–13]. Nevertheless, recent failures in existing urban rail systems and their station facilities in Hong Kong, Shenzhen, Guangzhou, Shanghai, and Beijing have raised serious concerns about the reliability of urban rail systems, particularly given the heavy loading on station facilities. For example, Hong Kong’s urban rail system (the MTR system) carries over 70,000 passengers per hour per direction during peak hours.

To address such concerns, this paper describes an alternative approach based on five centrality measures, following the suggestions of Ning et al. [9, 11, 12], who advocate studying the interaction between humans and engineering systems. Centrality measures have been developed in the social network analysis community since the 1950s. These measures can be applied to man-made networks, such as rail and road networks, and provide significant insights into the importance of rail stations and road intersections from different perspectives. In studying a city’s network structure, researchers [14–16] transformed the city into a spatial network by treating street intersections as nodes and streets as edges. It was suggested that different centrality measures capture different aspects of city life [14–16]. Ramli et al. [17] demonstrated that one of the centrality measures, betweenness centrality, was statistically significantly correlated with passenger ridership data of Singapore’s rapid transit system, while Tu [18] reported that another centrality measure, closeness centrality, was closely related to the operational condition of a rail line in an urban rail network. Nevertheless, the above articles were published either in physics journals [14, 15] or in conference proceedings [16–18]. In transportation-related journals, centrality has been mentioned only sporadically: centrality measures were used to locate the most accessible route in a network [19, 20], to find the shortest path in a network [21], and to design the best shape for a crossdock [22].

The remainder of the paper is structured as follows. First, five centrality measures and their relationships with physical man-made network structures, including rail systems, are introduced in Sect. 2. Next, in Sect. 3, Hong Kong’s MTR system, one of the most advanced urban rail systems [23, 24], is used as a case study to illustrate the applicability of centrality measures to strategic facility management and risk management, and the results are compared with some ridership data. Section 4 concludes the paper with implications and future research directions.

2 Network Analysis, Centrality, and Its Applications

A social network refers to a social structure that is made up of individuals, organizations, communities, or towns which are connected by one or more specific types of interdependency, such as friendship, financial exchange, social exchange, product exchange, and/or resource exchange.

Freeman [25] characterizes social network analysis as (i) a theory, that is, a way of looking at the world or a social web, and (ii) a methodology, that is, a set of techniques for making sense of a complex world, web, or structural network. Social network analysis treats relationships (called edges, links, or connections) between individuals or communities (called vertices or nodes) as directional or bidirectional. The resulting graph-based structure is very simple when only a few individuals or communities are connected. However, the structure can be very complex when different kinds of ties interconnect a large number of individuals or communities [26].

To understand the role that each actor (individual, community, or location) plays in a structural network, a number of centrality measures have been proposed [27–33]. Specifically, Freeman [34] reviewed the concepts of point and graph centrality and explained three measures of centrality, namely degree centrality, closeness centrality, and betweenness centrality, in great detail. He suggested that degree centrality can be viewed as the importance index of a node for its potential to participate in the communication activity. Betweenness centrality can be used as an index of the potential of a node for control of communication. On the other hand, closeness centrality can be viewed as a node’s independence of such potential control by others [35]. Freeman’s betweenness and closeness normally assume that whatever flows through the network moves only along the shortest possible paths, i.e., geodesic paths.

Bonacich [38] proposed eigenvector centrality, which measures influence propagation in a network structure. Borgatti [36] explained that eigenvector centrality is similar to degree centrality, “the difference being that eigenvector centrality measures a long-term direct and indirect risk (or influence) while degree centrality measures immediate risk (influence) only” (p. 62). Freeman et al. [31] relaxed the shortest possible path criterion and proposed flow betweenness to measure the centrality effect due to all proper paths in which no node is visited more than once. Brin and Page [29] produced a variant of eigenvector centrality, PageRank, and used it to identify the relative importance of a webpage in the World Wide Web; it became the basis of what is now known as Google’s search.

2.1 Centrality Measures

Following Freeman’s [34] terminology, we begin by presenting the mathematical formulation of the simplest measure of centrality, degree centrality. According to Freeman [34], degree centrality is the count of edges connected to a given node, \( p_{k} \):

$$ C_{\text{D}} \left( {p_{k} } \right) = \sum\limits_{i = 1}^{n} {a\left( {p_{i} ,p_{k} } \right)}, $$
(1)

where \( a\left( {p_{i} ,p_{k} } \right) = 1 \) if and only if \( p_{i} \) and \( p_{k} \) are connected by a line, and 0 otherwise; n is the total number of nodes. Degree centrality can also be calculated as the row sum (or column sum) of the adjacency matrix A as follows:

$$ C_{\text{D}} \left( {p_{k} } \right) = \sum\limits_{i = 1}^{n} {a_{ik} }, $$
(2)

where \( a_{ik} \) is the (i, k) element of matrix A. \( C_{\text{D}} \left( {p_{k} } \right) \) is large if node \( p_{k} \) is adjacent to (i.e., in direct contact with) a large number of nodes, and small if \( p_{k} \) tends to be cut off from such direct contact. The magnitude of \( C_{\text{D}} \left( {p_{k} } \right) \) normally depends on the size of the network. Freeman [34] showed that, in a given network, a node can be adjacent to at most \( n - 1 \) other nodes, as is the central node of a star or wheel configuration. Therefore, the relative degree centrality can be written as

$$ C_{\text{D}}^{{\prime }} \left( {p_{k} } \right) = \frac{{\sum\nolimits_{i = 1}^{n} {a\left( {p_{i} ,p_{k} } \right)} }}{n - 1}. $$
(3)
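As an illustration of Eqs. (2) and (3), the following minimal sketch computes absolute and relative degree centrality from an adjacency matrix. The five-node star network and the function name `degree_centrality` are hypothetical and chosen only for illustration; they are not taken from the case study.

```python
def degree_centrality(adj):
    """Absolute (Eq. 2) and relative (Eq. 3) degree centrality for each node."""
    n = len(adj)
    # Column sum of the adjacency matrix: number of nodes adjacent to p_k
    absolute = [sum(adj[i][k] for i in range(n)) for k in range(n)]
    # Divide by n - 1, the largest degree any node can have
    relative = [c / (n - 1) for c in absolute]
    return absolute, relative


# Hypothetical star network: node 0 is the hub, nodes 1-4 are the spokes
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
absolute, relative = degree_centrality(star)
print(absolute)  # hub has degree 4, each spoke has degree 1
print(relative)  # hub reaches the maximum relative value of 1.0
```

In this assumed example the hub attains the upper bound of \( n - 1 = 4 \) neighbors, so its relative degree centrality equals one, which is the situation Freeman's normalization is built around.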

The second centrality measure is closeness centrality. Sabidussi [33] suggested that the centrality of a node is measured by summing the geodesic distances from that node to all other nodes in the network. In fact, this is a measure of node decentrality or inverse centrality since it grows as points move farther apart. If one lets \( d\left( {p_{i} ,p_{k} } \right) \) be the number of edges in the geodesic linking \( p_{i} \) and \( p_{k} \), then closeness centrality \( C_{\text{C}} \left( {p_{k} } \right) \), the inverse of Sabidussi’s decentrality measure, is written as

$$ C_{\text{C}} \left( {p_{k} } \right) = \frac{1}{{\sum\nolimits_{i = 1}^{n} {d\left( {p_{i} ,p_{k} } \right)} }}. $$
(4)

As is the case for degree centrality, closeness centrality depends on the size of the network. Hence, one cannot compare values of closeness centrality from networks of different sizes. Freeman [34] showed that the relative closeness centrality of a node \( p_{k} \) can be defined as

$$ C_{\text{C}}^{{\prime }} \left( {p_{k} } \right) = \frac{n - 1}{{\sum\nolimits_{i = 1}^{n} {d\left( {p_{i} ,p_{k} } \right)} }}. $$
(5)

\( C_{\text{C}}^{{\prime }} \left( {p_{k} } \right) \) can be viewed as the inverse of the average distance between \( p_{k} \) and the other nodes; equivalently, it is \( C_{\text{C}} \left( {p_{k} } \right) \) normalized by the minimum possible sum of distances, \( n - 1 \). Thus, it is a direct measure of distance-based node centrality. It takes a value of unity when \( p_{k} \) is maximally close to all other points, as is the central node of a star or wheel configuration. It shrinks as the average distance between \( p_{k} \) and the other nodes grows.
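The relative closeness centrality of Eq. (5) can be sketched as follows, with geodesic distances obtained by breadth-first search. The sketch assumes an unweighted, undirected, connected network; the five-station line A to E and the function names are hypothetical.

```python
from collections import deque


def geodesic_distances(neighbors, source):
    """Number of edges on the shortest path from source to every node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def relative_closeness(neighbors):
    """Relative closeness centrality, Eq. (5); assumes a connected network."""
    n = len(neighbors)
    result = {}
    for k in neighbors:
        total = sum(geodesic_distances(neighbors, k).values())
        result[k] = (n - 1) / total
    return result


# Hypothetical single line of five stations: A - B - C - D - E
line = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
closeness = relative_closeness(line)
for station, value in closeness.items():
    print(station, round(value, 3))
```

In this assumed line, the middle station C has the smallest total distance to the others (1 + 1 + 2 + 2 = 6) and therefore the highest relative closeness, 4/6, while the terminal stations have the lowest, 4/10.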

The third centrality measure is Freeman’s betweenness centrality. If two nodes \( p_{i} \) and \( p_{j} \) are assumed to be indifferent as to which of several alternative geodesics connects them, the probability of using any one of the geodesics is \( \frac{1}{{g_{ij} }} \), where \( g_{ij} \) is the number of geodesics linking \( p_{i} \) and \( p_{j} \). If \( g_{ij} \left( k \right) \) is the number of geodesics linking \( p_{i} \) and \( p_{j} \) that contain \( p_{k} \), the probability \( b_{ij} \left( k \right) \) that \( p_{k} \) falls on a randomly selected geodesic linking \( p_{i} \) and \( p_{j} \) is given as

$$ b_{ij} \left( k \right) = \frac{{g_{ij} \left( k \right)}}{{g_{ij} }}. $$
(6)

This value is the partial betweenness of \( p_{k} \) for the pair \( p_{i} \) and \( p_{j} \). To determine the betweenness centrality of the node \( p_{k} \), one can sum its partial betweenness values over all unordered pairs of nodes, where i, j, and k are all distinct, as follows:

$$ C_{\text{B}} \left( {p_{k} } \right) = \sum\limits_{i = 1}^{n} {\sum\limits_{j = i + 1}^{n} {b_{ij} \left( k \right)} } = \sum\limits_{i = 1}^{n} {\sum\limits_{j = i + 1}^{n} {\frac{{g_{ij} \left( k \right)}}{{g_{ij} }}} }. $$
(7)

Like \( C_{\text{D}} \left( {p_{k} } \right) \) and \( C_{\text{C}} \left( {p_{k} } \right) \), betweenness centrality \( C_{\text{B}} \left( {p_{k} } \right) \) is dependent on the size of the network. Freeman [34] showed that the maximum value of \( C_{\text{B}} \left( {p_{k} } \right) \) is achieved only by the central node in a star. The maximum value is \( \frac{{n^{2} - 3n + 2}}{2} \). Therefore, the relative betweenness centrality is written as

$$ C_{\text{B}}^{'} \left( {p_{k} } \right) = \frac{{2C_{\text{B}} \left( {p_{k} } \right)}}{{n^{2} - 3n + 2}}. $$
(8)
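Equations (7) and (8) can be evaluated without enumerating every geodesic explicitly. The sketch below uses Brandes' accumulation scheme, a standard algorithm that is swapped in here for efficiency and is not described in the text above; it assumes an unweighted, undirected network with at least three nodes. The star and line networks and all function names are hypothetical.

```python
from collections import deque


def betweenness(neighbors):
    """Freeman betweenness C_B (Eq. 7) via Brandes' accumulation scheme."""
    cb = {v: 0.0 for v in neighbors}
    for s in neighbors:
        # BFS from s, counting geodesics (sigma) and recording predecessors
        sigma = {v: 0 for v in neighbors}
        dist = {v: -1 for v in neighbors}
        preds = {v: [] for v in neighbors}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in neighbors[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate dependencies from the farthest nodes back toward s
        delta = {v: 0.0 for v in neighbors}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Every unordered pair was visited from both of its endpoints
    return {v: c / 2.0 for v, c in cb.items()}


def relative_betweenness(neighbors):
    """Relative betweenness (Eq. 8): divide by the star-hub maximum."""
    n = len(neighbors)
    max_value = (n * n - 3 * n + 2) / 2
    return {v: c / max_value for v, c in betweenness(neighbors).items()}


# Hypothetical networks: a five-node star and a five-station line
star = {"hub": ["a", "b", "c", "d"],
        "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
        "D": ["C", "E"], "E": ["D"]}

cb_line = betweenness(line)
rel_star = relative_betweenness(star)
print(cb_line)
print(rel_star)
```

For the assumed line, station C lies on the geodesics of the four pairs (A, D), (A, E), (B, D), and (B, E), giving a betweenness of 4. For the assumed star, the hub lies on all six geodesics between spokes, which equals the maximum \( \frac{n^{2} - 3n + 2}{2} \) for n = 5, so its relative betweenness is one.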

Borgatti and Everett [37] noted that “when the network being studied consists of ties that are very costly to build, betweenness will indeed index an ability to extort benefits from flows through the network” (p. 474). Pitts [28] studied the medieval river trade network of Russia and concluded that the cities with high betweenness centrality had opportunities for amassing wealth and exerting control over other cities.

The fourth measure of centrality is eigenvector centrality [38]. Eigenvector centrality is obtained from the principal eigenvector (the one associated with the largest eigenvalue) of the adjacency matrix A of the network. The eigen-equation is written as

$$ v = \lambda^{ - 1} {\bf Av}, $$
(9)

where v is the eigenvector and λ is the corresponding eigenvalue; together they form an eigenpair. A number of fast algorithms such as Rayleigh Quotient Iteration [39] can be used to determine the largest eigenpair.

The eigenvector centrality is determined by

$$ C_{\text{E}} \left( {p_{k} } \right) = \sum\limits_{i = 1}^{n} {a_{ki} v_{i} }. $$
(10)
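A minimal sketch of Eqs. (9) and (10) is given below. For simplicity it uses plain power iteration rather than Rayleigh Quotient Iteration, and it iterates with A + I instead of A; this shift is an assumption made here so that the iteration also converges on bipartite networks such as stars and lines, and it leaves the eigenvectors unchanged. The star network and function names are hypothetical.

```python
import math


def principal_eigenpair(adj, iterations=1000, tol=1e-12):
    """Largest eigenvalue of A and its unit-length eigenvector (power iteration)."""
    n = len(adj)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # Multiply by (A + I): same eigenvectors as A, no oscillation
        w = [v[k] + sum(adj[k][i] * v[i] for i in range(n)) for k in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        change = max(abs(a - b) for a, b in zip(w, v))
        v = w
        if change < tol:
            break
    # Rayleigh quotient v^T A v (v has unit length) recovers the eigenvalue of A
    av = [sum(adj[k][i] * v[i] for i in range(n)) for k in range(n)]
    lam = sum(a * b for a, b in zip(av, v))
    return lam, v


def eigenvector_centrality(adj):
    """Eq. (10): C_E(p_k) = sum_i a_ki * v_i, using the principal eigenvector."""
    _, v = principal_eigenpair(adj)
    n = len(adj)
    return [sum(adj[k][i] * v[i] for i in range(n)) for k in range(n)]


# Hypothetical star network: node 0 is the hub, nodes 1-4 are the spokes
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
lam, v = principal_eigenpair(star)
centrality = eigenvector_centrality(star)
print(round(lam, 6))
print([round(c, 6) for c in centrality])
```

Because v satisfies Eq. (9), the score from Eq. (10) equals λ times the corresponding entry of v, so the ranking of nodes is the same whether one reports v itself or \( C_{\text{E}} \).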

From Eq. (10), one can see that a node with a high eigenvector centrality score is one that is adjacent to nodes that themselves have high scores. Indeed, eigenvector centrality is closely related to the influence measures proposed by researchers [32, 41, 42].

The fifth centrality measure is PageRank [29]. PageRank is a probability distribution that represents the likelihood that a person following links arrives at a particular node (i.e., webpage). Mathematically speaking, if a vector r contains the PageRank values of n webpages, it can be determined by solving the following equation iteratively:

$$ r_{i + 1} = r_{0} + d\,{\bf A}^{\prime}\,r_{i} \quad {\text{for}}\quad i = 0, 1, \ldots ,{\text{ until convergence}}, $$
(11)

where \( r_{0} \) is the initial guess of r in which each element is \( (1 - d)/n \), d is the damping factor, which is normally set to 0.85 [29], and the adjacency function \( a'\left( {p_{i} ,p_{j} } \right) \) is 0 if \( p_{i} \) does not link to \( p_{j} \) and is otherwise normalized such that, for each j,

$$ \sum\limits_{i = 1}^{n} {a'\left( {p_{i} ,p_{j} } \right)} = 1. $$
(12)

Hence, the PageRank values are the entries of the dominant eigenvector of the modified adjacency matrix A’ in which the elements of each column sum up to 1. PageRank is a variant of eigenvector centrality.
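The iteration in Eq. (11) with the column normalization of Eq. (12) can be sketched as follows. The sketch assumes an undirected network in which every node has at least one link, so that each column of A can be normalized; the star network, the stopping tolerance, and the function name `pagerank` are hypothetical choices made for illustration.

```python
def pagerank(adj, d=0.85, max_iter=1000, tol=1e-14):
    """Iterate r_{i+1} = r_0 + d * A' * r_i (Eq. 11) until the values settle."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Column-normalized matrix A' (Eq. 12); assumes no column sums to zero
    a_norm = [[adj[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    r0 = [(1.0 - d) / n] * n  # initial guess, each element (1 - d) / n
    r = list(r0)
    for _ in range(max_iter):
        r_next = [
            r0[i] + d * sum(a_norm[i][j] * r[j] for j in range(n))
            for i in range(n)
        ]
        change = max(abs(a - b) for a, b in zip(r_next, r))
        r = r_next
        if change < tol:
            break
    return r


# Hypothetical star network: node 0 is the hub, nodes 1-4 are the spokes
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
ranks = pagerank(star)
print([round(x, 4) for x in ranks])
print(round(sum(ranks), 6))
```

Since each column of the normalized matrix sums to one, the converged values sum to one and can be read as a probability distribution over nodes. In this assumed star, the hub receives the largest share.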

2.2 Flow Processes and Centrality Measures

Section 2.1 presents some of the most commonly used centrality measures. The development of each measure has a historical background with certain underlying assumptions about the flow process. Freeman [34] and Borgatti [36] provided clarification on their development and uses. Specifically, Borgatti [36] classified the mechanisms of dyadic diffusion into two major forms: replication (a copy mechanism) and transfer (a move mechanism). He also noted that some mechanisms assume that things move along the shortest routes (geodesics), while others allow paths, trails, and walks. From that, Borgatti [36] summarized which measures should be used for different flow processes. The summary is shown in Table 1.

Table 1 Flow processes and major centrality measures

In a rail network, people move from one station to another. This phenomenon is best characterized as a transfer process. Therefore, as suggested by Table 1, either betweenness centrality or closeness centrality is an appropriate measure, depending on the objective of the study. In fact, betweenness centrality can be considered as an index that represents the frequency of traffic one can observe flowing through a node across multiple instances. On the other hand, closeness centrality is an index that represents the length of time it takes traffic to reach a node (assuming that trains take more or less the same time to travel from one station to the next). For a rail operator, betweenness centrality is a much more important indicator for facility management and risk management.

3 An Example: Rail System(s) in Hong Kong

In Hong Kong, the MTR Corporation operates a territory-wide nine-line commuter rail system with a total length of 175 km and the 35 km Airport Express, as shown in Fig. 1. The MTR Corporation started operating an urban line with 8 stations (now part of Kwun Tong Line) in 1979. In the 1990s, MTR’s rail system expanded to three urban lines, namely Kwun Tong Line, Island Line, and Tsuen Wan Line, with 38 stations. In 2007, the MTR Corporation merged with another Hong Kong rail operator (KCRC), which operated East Rail Line, Ma On Shan Line, and West Rail Line.

Fig. 1 Hong Kong’s urban rail network in 2014

Figure 1 shows the commuter rail network operated by the MTR Corporation in 2014. By analyzing the rail network using the social network analysis software NodeXL, various centrality measures were obtained. Table 2 shows the rank order of rail stations based on betweenness, closeness, degree, eigenvector, and PageRank measures of centrality. It indicates that Kowloon Tong Station, an interchange between Kwun Tong Line and East Rail Line, is the most important station based on betweenness, closeness, and degree centrality. Tai Wai Station, an interchange between East Rail Line and Ma On Shan Line, is the second most important station based on betweenness centrality, followed by Admiralty, Nam Cheong, Quarry Bay, Lai King, and Yau Tong stations, all of which are interchanges between two rail lines. On the other hand, Kowloon Tong Station is ranked 31st and 3rd according to its eigenvector centrality and PageRank centrality, respectively. Comparing the rankings by betweenness centrality and closeness centrality shows that Quarry Bay Station and Yau Tong Station (both interchanges between two rail lines) have much lower rankings under closeness centrality. This is because both stations are critical links serving the eastern parts on both sides of Victoria Harbor but have relatively few stations farther east. Nevertheless, they are very important in completing the rail loop in the most densely populated areas of Hong Kong.

Table 2 Importance rank order of rail stations based on centrality measures

Figure 2 shows the importance of rail stations weighted by betweenness centrality. The size of each circle represents the relative value of betweenness centrality. Figure 2 indicates that, as expected, interchange stations have higher rankings based on betweenness centrality, followed by their immediately adjacent stations, and so on. Figure 3 shows the importance of rail stations based on the MTR’s network in 1990. It illustrates that the most important station based on betweenness centrality in 1990 was Prince Edward Station, followed by Sham Shui Po, Quarry Bay, and Admiralty stations.

Fig. 2 The importance of rail stations weighted by betweenness centrality. Note: The size of each circle represents the relative value of betweenness centrality

Fig. 3 The importance of rail stations weighted by betweenness centrality (MTR system in 1990). Note: The size of each circle represents the relative value of betweenness centrality

A comparison of Figs. 2 and 3 illustrates that the values of betweenness centrality change when an urban rail network expands. For example, Prince Edward Station was the most important station in 1990, but its ranking in terms of betweenness centrality dropped to 25th in 2014. Hence, the degree of importance of a station is dynamic and depends on the development of the rail system.
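The effect of network expansion on betweenness rankings can be reproduced on a toy example. The sketch below applies Eqs. (6) and (7) directly, counting geodesics by breadth-first search, to a hypothetical single line (stations S1 to S5) and then to the same line after a hypothetical second line (T1 to T4) is added through S2. The station names and topology are invented for illustration and do not represent the MTR network; the sketch assumes an unweighted, undirected, connected network.

```python
from collections import deque
from itertools import combinations


def bfs_counts(neighbors, s):
    """Geodesic distance and number of geodesics from s to every node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in neighbors[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(neighbors):
    """Sum of partial betweenness g_ij(k) / g_ij over unordered pairs (Eq. 7)."""
    info = {s: bfs_counts(neighbors, s) for s in neighbors}
    cb = {k: 0.0 for k in neighbors}
    for i, j in combinations(neighbors, 2):
        d_i, s_i = info[i]
        d_j, s_j = info[j]
        for k in neighbors:
            # p_k lies on an i-j geodesic when the two legs add up to d(i, j)
            if k not in (i, j) and d_i[k] + d_j[k] == d_i[j]:
                cb[k] += s_i[k] * s_j[k] / s_i[j]
    return cb


def build(edges):
    """Undirected adjacency lists from a list of station pairs."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    return neighbors


# Hypothetical network before expansion: one line S1 - S2 - S3 - S4 - S5
first_line = [("S1", "S2"), ("S2", "S3"), ("S3", "S4"), ("S4", "S5")]
# Hypothetical second line T1 - T2 - S2 - T3 - T4 crossing at S2
second_line = [("T1", "T2"), ("T2", "S2"), ("S2", "T3"), ("T3", "T4")]

before = betweenness(build(first_line))
after = betweenness(build(first_line + second_line))
print(sorted(before, key=before.get, reverse=True)[:3])
print(sorted(after, key=after.get, reverse=True)[:3])
```

In this assumed example, the mid-line station S3 has the highest betweenness before the expansion, whereas the new interchange S2 overtakes it afterwards, mirroring in miniature the kind of shift in importance observed between Figs. 2 and 3.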

Ridership data for 2005, before the MTR Corporation merged with KCRC, were obtained from MTR [42]. Using the number of passengers boarding and alighting along the four major lines at that time (Island Line, Tsuen Wan Line, Kwun Tong Line, and Tung Chung Line), it was found that five stations had more than 10,000 passengers boarding and alighting during the peak 15-min period between 7:30 and 9:30 on weekday mornings. They were Admiralty (22,100 passengers), Prince Edward (20,300 passengers), Mong Kok (18,900 passengers), Central/Hong Kong (15,800 passengers), and Quarry Bay (12,500 passengers). All were interchanges between MTR lines at that time.

4 Conclusion

Rail is one of the most important transportation modes because it is a very efficient and environmentally friendly way to carry a large number of passengers from one location to another [11, 43]. In order to provide a quick and efficient evaluation of the relative importance of stations in an urban rail system for facility management and risk management, this paper introduces five centrality measures and provides a detailed historical review of their development, formulations, and applications. As argued by Freeman [30], betweenness centrality is one of the most important indicators for a vertex or node because it refers to the extent to which the vertex is structurally central by standing in between others, so that the vertex can facilitate or impede the transmission of information or goods. Freeman [30] also cited the definition of betweenness centrality expressed by Shimbel [44] in 1953:

“Suppose that in order for site i to contact site j, site k must be used as an intermediate station. Site k in such a network has a certain ‘responsibility’ to sites i and j. If we count all of the minimum paths which pass through site k, then we have a measure of the ‘stress’ which site k must undergo during the activity of the network. A vector giving this number for each number of the network would give us a good idea of stress conditions throughout the system.”

This paper shows that, when betweenness centrality is applied to the rail network in Hong Kong, a number of rail stations, especially those located at the interchanges between two rail lines, stand out as the more important rail stations, as expected. Nevertheless, the values of betweenness centrality show that not all of them have the same importance. For example, in Hong Kong’s MTR system, Kowloon Tong Station is the most important rail station, followed by Tai Wai Station and Admiralty Station, and then by Nam Cheong, Quarry Bay, Lai King, and Yau Tong stations, which have almost the same value of betweenness centrality. Moreover, the relative importance of a station is dynamic and depends on the expansion of the rail system.

In practice, the most ‘central’ station is under the greatest stress because it carries the largest number of passengers, either as an entrance and exit to the network or as a location for passengers changing commuter lines. Its ticketing machines, gates, escalators, lifts, information systems, screen doors, etc., serve the largest number of passengers. In sum, betweenness centrality truly reflects the importance of a rail station in terms of its usability and criticality.

Most advanced cities depend on reliable and safe rail networks to carry a large number of commuters from their homes to their offices and back [3, 9, 11, 12]. Moreover, tourists today also rely on rail networks to travel from one scenic spot to another in many cities. Hence, it is critically important for a rail operator to maintain a very high level of service reliability for its customers. Betweenness centrality can therefore serve as a very useful tool for rail operators in planning their maintenance schedules, because the more ‘central’ stations experience much more stress, resulting in high loadings on their facilities. In addition, rail operators and government officials can also use this tool to assess the risk that arises when a particular station is disrupted accidentally or on purpose. It should be noted that centrality measures can be applied to inter-city rail systems such as high-speed rail networks. However, when a high-speed rail network is linked to an urban rail network, great caution should be exercised because high-speed rail and urban rail are very different in terms of capacity and frequency. Future research should explore the interconnectedness of different rail networks.