Introduction
Data description and aggregate analysis
Data description
Attribute | No. of categories | Description |
---|---|---|
Day | 8 | (1) Monday, (2) Tuesday, (3) Wednesday, (4) Thursday, (5) Friday, (6) Saturday, (7) Sunday, (8) Public holiday |
Time | 20 | (1) 5:00–5:59, (2) 6:00–6:59, (3) 7:00–7:59, …, (20) 24:00–24:59 |
Passenger type | 11 | (1) Adult, (2) Student, (3) Child, (4) Elderly person, (5) Person with disability, (6) Adult commuter for work, (7) Adult commuter for school, (8) Child commuter not for school, (9) Child commuter for school, (10) Disabled commuter for work, (11) Disabled commuter for school |
Origin station | 52 | 52 stations |
Destination station | 52 | 52 stations |
Aggregate analysis
Method
Preliminaries
Extraction of travel patterns via data polishing
This study defines the co-occurrence graph, \({G}_{c}\), as the graph constructed from the co-occurrence relationships between usage vertices and OD vertices. To construct \({G}_{c}\), we first build the graph depicted in Fig. 6, denoted by \(G\). The graph \(G\) is constructed from a matrix (OD information × usage information) generated from the smart card data. The diagonal components, i.e., the entries for trips whose origin and destination are the same station, are 0. The sum of each row corresponds to an OD vertex, the sum of each column corresponds to a usage vertex, and each element of the matrix corresponds to an edge. The vertices and edges of \(G\) are defined as follows.
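To make the construction of \(G\) concrete, the following Python sketch accumulates edge weights from individual trip records and recovers the OD and usage vertex totals as the row and column sums of the OD × usage matrix. The record format `(l, m, n, o, d)` and the function name `build_graph` are hypothetical, and each record is assumed to count as one user; a sparse dictionary stands in for the dense matrix.

```python
from collections import Counter

def build_graph(trips):
    """Accumulate the OD x usage matrix of G as sparse edge weights.

    trips: iterable of hypothetical (l, m, n, o, d) records, one per user:
      l = day, m = time slot, n = passenger type, o = origin, d = destination.
    Returns (edges, usage_total, od_total).
    """
    edges = Counter()  # (usage vertex, OD vertex) -> number of users (matrix element)
    for l, m, n, o, d in trips:
        if o == d:
            continue  # same-station trips fall on the zero diagonal and are dropped
        edges[((l, m, n), (o, d))] += 1
    usage_total = Counter()  # column sums: |x_lmn|
    od_total = Counter()     # row sums: |y_od|
    for (x, y), w in edges.items():
        usage_total[x] += w
        od_total[y] += w
    return edges, usage_total, od_total

trips = [(1, 1, 1, 1, 2), (1, 1, 1, 1, 2), (1, 2, 1, 1, 2), (1, 1, 1, 3, 3)]
edges, usage_total, od_total = build_graph(trips)
print(od_total[(1, 2)])  # 3
```

In this toy input, the OD vertex O1→D2 collects two Monday 5:00–5:59 adult trips and one Monday 6:00–6:59 adult trip, while the O3→D3 record is discarded as a same-station trip.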
A vertex representing a particular combination of day \(l=1,\dots ,8\), time \(m=1,\dots ,20\), and passenger type \(n=1,\dots ,11\) is denoted by “usage vertex \({x}_{lmn}\),” and the set of all usage vertices is denoted by “usage vertex set \(\varvec{X}\).” The usage vertex set \(\varvec{X}\) contains 1,760 (= 8 × 20 × 11) elements, one usage vertex for each day × time × passenger type combination. Each usage vertex, \({x}_{lmn}\), encodes the number of passengers of type \(n\) who used the train network at time \(m\) on day \(l\). For example, in Fig. 6, the usage vertex \({x}_{l=1,m=1,n=1}\), representing the combination of Monday, 5:00–5:59 hrs, and the Adult passenger type, holds the total number of adult users on Mondays during 5:00–5:59 hrs.
Similarly, a vertex representing a particular combination of origin station \(o=1,\dots ,52\) and destination station \(d=1,\dots ,52\) is denoted by “OD vertex \({y}_{od}\),” and the set of all OD vertices is denoted by “OD vertex set \(\varvec{Y}\).” The OD vertex set \(\varvec{Y}\) contains 52 × 52 − 52 = 2,652 elements, one vertex for each origin × destination combination except those in which the origin and destination are the same station. Each OD vertex, \({y}_{od}\), encodes the total number of users travelling from origin station \(o\) to destination station \(d\). For example, in Fig. 6, the OD vertex \({y}_{o=1,d=2}\), representing O1→D2, holds the total number of users travelling from origin station O1 to destination station D2.
Finally, each edge connecting a usage vertex \({x}_{lmn}\) with an OD vertex \({y}_{od}\) encodes the number of users of passenger type \(n\) who travelled from origin station \(o\) to destination station \(d\) at time \(m\) on day \(l\). For example, in Fig. 6, the edge connecting the usage vertex \({x}_{l=1,m=1,n=1}\) (Monday × 5:00–5:59 × Adult) and the OD vertex \({y}_{o=1,d=2}\) (O1→D2) encodes the number of adult users travelling from origin station O1 to destination station D2 on Mondays during 5:00–5:59 hrs. The maximum number of edges in \(G\) is therefore 4,667,520 (= 1,760 × 2,652).
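The vertex and edge counts follow directly from the category sizes in the data description table. The sketch below checks them and maps each vertex to a dense 0-based index, which would be useful if \(G\) were stored as a 2,652 × 1,760 array; the indexing scheme and the names `usage_index` and `od_index` are illustrative assumptions, not part of the original method.

```python
# Category sizes from the data description table
N_DAY, N_TIME, N_TYPE, N_STATION = 8, 20, 11, 52

def usage_index(l, m, n):
    """Map 1-based (day l, time m, passenger type n) to a 0-based index in [0, 1760)."""
    return ((l - 1) * N_TIME + (m - 1)) * N_TYPE + (n - 1)

def od_index(o, d):
    """Map 1-based (origin o, destination d), o != d, to a 0-based index in [0, 2652)."""
    if o == d:
        raise ValueError("same-station pairs are not OD vertices")
    # Each origin has N_STATION - 1 admissible destinations; skip the slot d == o.
    return (o - 1) * (N_STATION - 1) + (d - 1 if d < o else d - 2)

n_usage = N_DAY * N_TIME * N_TYPE         # |X|
n_od = N_STATION * N_STATION - N_STATION  # |Y|
max_edges = n_usage * n_od                # upper bound on the number of edges of G
print(n_usage, n_od, max_edges)           # 1760 2652 4667520
```

Skipping the `d == o` slot inside each origin's block keeps the OD indices contiguous, so every admissible OD pair gets exactly one row.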
We construct the co-occurrence graph, \({G}_{c}\), by examining every combination of a usage vertex and an OD vertex in \(G\) and extracting those combinations that exhibit a co-occurrence relationship. Here, co-occurrence is expressed in terms of the users common to each pair of usage and OD vertices. A statistical test is then performed to assess the significance of each co-occurrence and thereby rule out the possibility that it arises by chance. Following the natural language processing field, where the t-value is a common criterion for co-occurrence, the statistical significance of co-occurrence is judged by a t-test. The t-value used as the test statistic is calculated using (1), where \(W\) denotes the total number of users (= 9,008,709).
$$t\text{-value}=\frac{\left|{x}_{lmn}\cap {y}_{od}\right|-\frac{\left|{x}_{lmn}\right|\times \left|{y}_{od}\right|}{W}}{\sqrt{\left|{x}_{lmn}\cap {y}_{od}\right|}}$$ (1)
Here, \(\left|{x}_{lmn}\cap {y}_{od}\right|\) is the number of users common to both vertices (the weight of the edge between them in \(G\)), and \(\left|{x}_{lmn}\right|\) and \(\left|{y}_{od}\right|\) are the total numbers of users of the usage vertex and the OD vertex, respectively.
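Eq. (1) has the form of the t-score used for collocation extraction: the observed number of common users minus the number expected if usage and OD were independent, divided by the square root of the observed count. The sketch below computes it and keeps only the edges of \(G\) that reach a significance threshold. The threshold `t_crit` (for instance 1.645, a one-sided 5% level under a normal approximation), the dictionary-based inputs, and the function names are assumptions for illustration and are not taken from the study.

```python
import math

def t_value(co, x_total, y_total, W):
    """Eq. (1). co = |x_lmn ∩ y_od| (edge weight in G),
    x_total = |x_lmn|, y_total = |y_od|, W = total number of users."""
    expected = x_total * y_total / W  # co-occurrence expected under independence
    return (co - expected) / math.sqrt(co)

def cooccurrence_graph(edges, usage_total, od_total, W, t_crit):
    """Return the edges of G whose t-value reaches t_crit, mapped to their t-values.

    edges: dict (usage key, OD key) -> |x ∩ y|; usage_total / od_total: vertex totals.
    t_crit is a hypothetical significance threshold.
    """
    gc = {}
    for (x, y), co in edges.items():
        if co <= 0:
            continue  # no common users, hence no co-occurrence
        t = t_value(co, usage_total[x], od_total[y], W)
        if t >= t_crit:
            gc[(x, y)] = t
    return gc

W = 9_008_709
print(round(t_value(100, 1000, 500, W), 4))  # 9.9944
```

With 100 common users and an expected count of only about 0.056, the t-value is close to \(\sqrt{100}=10\), so such an edge would pass any conventional threshold.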
Next, data polishing is applied to the similarity graph obtained in the previous step to group usage vertices \({s}_{k}\in \varvec{S}\) such that only pairs of usage vertices that are strongly connected in the similarity graph remain connected by edges. A set similarity measure is used to judge whether a pair of usage vertices is strongly connected. In this study, the Jaccard coefficient is used as the similarity measure, as in the construction of the similarity graph. The similarity between any two usage vertices, \({s}_{k}\) and \({s}_{k}^{\prime}\), is defined by (3), where \(N\left[{s}_{k}\right]\) denotes the closed neighbourhood of \({s}_{k}\), i.e., \({s}_{k}\) together with its adjacent vertices in the similarity graph.
$$sim\left({s}_{k},{s}_{k}^{\prime}\right)=\frac{\left|N\left[{s}_{k}\right]\cap N\left[{s}_{k}^{\prime}\right]\right|}{\left|N\left[{s}_{k}\right]\cup N\left[{s}_{k}^{\prime}\right]\right|}\quad \text{s.t. } {s}_{k},{s}_{k}^{\prime}\in \varvec{S}$$ (3)
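The polishing step can be sketched as the usual data polishing loop: rebuild the graph so that two vertices are adjacent exactly when the Jaccard similarity of their closed neighbourhoods (Eq. (3)) reaches a threshold, repeat until the graph stops changing, and then enumerate maximal cliques. Bron–Kerbosch with pivoting is used here as a standard clique enumerator; that choice, the threshold `theta`, the iteration cap `max_iter`, and the naive all-pairs comparison are illustrative assumptions. A practical implementation would only compare vertex pairs within distance two, because only those have intersecting closed neighbourhoods and hence a nonzero similarity.

```python
from itertools import combinations

def closed_neighbourhood(adj, v):
    """N[v]: v together with its neighbours."""
    return adj[v] | {v}

def jaccard(a, b):
    """Jaccard coefficient |a ∩ b| / |a ∪ b| (Eq. (3) when a, b are closed neighbourhoods)."""
    return len(a & b) / len(a | b)

def polish(adj, theta, max_iter=100):
    """Data polishing sketch on an undirected graph given as {vertex: set(neighbours)}.

    Each round links u and v iff sim(N[u], N[v]) >= theta; stops when the graph is unchanged.
    """
    for _ in range(max_iter):
        new_adj = {v: set() for v in adj}
        for u, v in combinations(adj, 2):  # naive all-pairs comparison
            if jaccard(closed_neighbourhood(adj, u), closed_neighbourhood(adj, v)) >= theta:
                new_adj[u].add(v)
                new_adj[v].add(u)
        if new_adj == adj:
            break  # converged
        adj = new_adj
    return adj

def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

# Two triangles joined by the weak bridge c-d; polishing removes the bridge.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
     "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
print(sorted(sorted(c) for c in maximal_cliques(polish(g, theta=0.5))))
# [['a', 'b', 'c'], ['d', 'e', 'f']]
```

In the toy graph, the bridge c–d has similarity 2/6 ≈ 0.33 and is dropped at `theta=0.5`, while each triangle becomes fully connected with similarity 1. Singleton cliques returned by `maximal_cliques` correspond to isolated vertices and can be filtered out if only groups of two or more usage vertices are of interest.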
Next, we estimate the most frequent origin and destination stations of the users in each extracted clique. As an example, consider an extracted clique consisting of two usage vertices, \({x}_{l=1,m=1,n=1}\) (Monday × 5:00–5:59 hrs × Adult) and \({x}_{l=1,m=2,n=1}\) (Monday × 6:00–6:59 hrs × Adult). First, we extract the OD vertices that co-occur with both usage vertices in the co-occurrence graph, \({G}_{c}\). Then, among these co-occurring OD vertices, we identify the most frequent ones on the basis of the user counts in graph \(G\). Through this process, we identify which types of passengers travel between which origin and destination stations, at which times of the day, and on which days of the week.
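The two-step OD estimation can be sketched as follows. The inputs (`gc_edges` as a set of (usage, OD) pairs present in \({G}_{c}\), and `g_edges` as a mapping from (usage, OD) pairs to user counts in \(G\)) are hypothetical, and ranking the common OD vertices by the user count summed over the clique's usage vertices is one plausible reading of "most frequent", assumed here rather than taken from the paper.

```python
from collections import Counter

def frequent_od(clique, gc_edges, g_edges, top=1):
    """Most frequent OD vertices for a clique of usage vertices.

    clique:   iterable of usage-vertex keys, e.g. (l, m, n)
    gc_edges: set of (usage, OD) pairs present in the co-occurrence graph G_c
    g_edges:  dict (usage, OD) -> number of users on that edge of G
    """
    clique = list(clique)
    # Step 1: OD vertices that co-occur in G_c with every usage vertex of the clique
    common = set.intersection(*({y for (x, y) in gc_edges if x == u} for u in clique))
    # Step 2 (assumed ranking): users summed over the clique's usage vertices in G
    totals = Counter({y: sum(g_edges.get((u, y), 0) for u in clique) for y in common})
    return totals.most_common(top)

mon5, mon6 = (1, 1, 1), (1, 2, 1)  # Monday 5:00-5:59 Adult, Monday 6:00-6:59 Adult
gc_edges = {(mon5, (1, 2)), (mon6, (1, 2)), (mon5, (3, 4)), (mon6, (5, 6))}
g_edges = {(mon5, (1, 2)): 10, (mon6, (1, 2)): 5, (mon5, (3, 4)): 50, (mon6, (5, 6)): 40}
print(frequent_od([mon5, mon6], gc_edges, g_edges))  # [((1, 2), 15)]
```

Although O3→D4 carries more users for the 5:00–5:59 vertex alone, it does not co-occur with the 6:00–6:59 vertex, so only O1→D2 is reported for this clique.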
Results
The threshold
Similarity of usage vertices
Combination | No. of cliques |
---|---|
Time and passenger type are the same | 37 |
Day and passenger type are the same | 2 |
Only passenger type is the same | 10 |
Day, time, and passenger type are all distinct | 3 |
Combination | No. of cliques |
---|---|
Time and passenger type are the same | 28 |
Day and passenger type are the same | 0 |
Only passenger type is the same | 4 |
Day, time, and passenger type are all distinct | 0 |
Combination | No. of cliques |
---|---|
Time and passenger type are the same | 4 |
Day and passenger type are the same | 0 |
Only passenger type is the same | 1 |
Day, time, and passenger type are all distinct | 0 |
Combination | No. of cliques |
---|---|
Time and passenger type are the same | 28 |
Day and passenger type are the same | 0 |
Only passenger type is the same | 2 |
Day, time, and passenger type are all distinct | 0 |
Combination | No. of cliques |
---|---|
Time and passenger type are the same | 0 |
Day and passenger type are the same | 0 |
Only passenger type is the same | 3 |
Day, time, and passenger type are all distinct | 0 |
Combination | No. of cliques |
---|---|
Time and passenger type are the same | 0 |
Day and passenger type are the same | 0 |
Only passenger type is the same | 1 |
Day, time, and passenger type are all distinct | 0 |
Combination | No. of cliques |
---|---|
Time and passenger type are the same | 0 |
Day and passenger type are the same | 0 |
Only passenger type is the same | 1 |
Day, time, and passenger type are all distinct | 0 |
Combination | No. of cliques |
---|---|
Time and passenger type are the same | 0 |
Day and passenger type are the same | 0 |
Only passenger type is the same | 1 |
Day, time, and passenger type are all distinct | 0 |
Combination | No. of cliques |
---|---|
Time and passenger type are the same | 0 |
Day and passenger type are the same | 0 |
Only passenger type is the same | 1 |
Day, time, and passenger type are all distinct | 0 |
Combination | No. of cliques |
---|---|
Time and passenger type are the same | 0 |
Day and passenger type are the same | 0 |
Only passenger type is the same | 1 |
Day, time, and passenger type are all distinct | 0 |
Travel patterns of IruCa users
Clique (usage vertices) | Origin station | Destination station |
---|---|---|
(C1) “Sunday × 6:00–6:59 × Child commuter for school”; “Thursday × 17:00–17:59 × Child commuter not for school” | Ota | Shioya |
(C2) “Public holiday × 13:00–13:59 × Child commuter for school”; “Wednesday × 18:00–18:59 × Child commuter not for school” | Sanjo | Fusazaki |
(C3) “Saturday × 23:00–23:59 × Child”; “Public holiday × 24:00–24:59 × Adult” | Kawaramachi | Kotoden-Kotohira |