1 Introduction
2 Related Work
2.1 Index-Based Approach
2.2 Compression-Based Approach
3 Preliminaries
3.1 Redundant Edges
3.2 Equivalence Class
3.3 Modular Decomposition
4 Overview of Our Approach
4.1 Multilevel Compression and Modular Decomposition Tree
4.2 Answering Reachability Queries Using Modular Decomposition Tree
5 Algorithms
5.1 Building Modular Decomposition Tree
5.1.1 Complexity
5.2 Finding Reachability Using the Decomposition Tree
5.2.1 Size of the Decomposition Tree
6 Experiments
6.1 Experimental Setup
6.1.1 Implementation and Running Environment
6.1.2 Datasets and Queries
6.2 Experiments on Real Datasets
Dataset |
G
|
\({\tilde{G}}\)
|
\(G^\epsilon \ \)
|
\(G_c\)
| |||
---|---|---|---|---|---|---|---|
|V| | |E| | \(r_e\)% | \(r_n\)% | \(r_e\)% | \(r_n\)% | \(r_e\)% | |
Kegg | 3617 | 3908 | 93.8 | 37.6 | 35.7 | 9.7 | 9.3 |
arXiv | 6000 | 66,707 | 20 | 97.9 | 19.7 | 93.3 | 19.3 |
XMark | 6080 | 7025 | 99 | 55.8 | 57 | 25.7 | 31 |
PubMed | 9000 | 40,028 | 67.5 | 76.7 | 62 | 76.2 | 61.9 |
soc-Epinions | 42,176 | 43,797 | 96.6 | 19.9 | 19.3 | 13 | 12.7 |
Web | 371,764 | 517,805 | 79.8 | 30.5 | 24.9 | 16.6 | 14.6 |
LJ | 971,232 | 1,024,140 | 95.1 | 11.1 | 10.8 | 7.9 | 7.6 |
Patent | 3,774,768 | 16,518,947 | 71.6 | 91.2 | 68.9 | 90.5 | 68.7 |
05Patent | 1,671,488 | 3,303,789 | 90.1 | 80.3 | 78.9 | 78.8 | 78.2 |
05Citeseerx | 1,457,057 | 3,002,252 | 81 | 37.9 | 50 | 37.4 | 49.7 |
Citeseerx | 6,540,401 | 15,011,260 | 74.4 | 39.7 | 46.4 | 38.9 | 46.1 |
DBpedia | 3,365,623 | 7,989,191 | 59.2 | 50.5 | 31.7 | 43.9 | 28.9 |
6.2.1 Datasets
Dataset |
G
|
\(G^\epsilon \ \)
|
\(G_c\)
| ||
---|---|---|---|---|---|
\(|V| + |E|\)
|
\(|V_{G^\epsilon }| + |E_{G^\epsilon }|\)
| \(r_{G^\epsilon }\)% |
\(|V_{G_c}| + |E_{G_c}|\)
| \(r_{G_c}\)% | |
Kegg | 7825 | 2756 | 35.2 | 720 | 9.2 |
arXiv | 72,707 | 19,046 | 26.2 | 18,481 | 25.4 |
XMark | 13,105 | 7394 | 56.4 | 3732 | 28.5 |
PubMed | 49,028 | 31,730 | 64.7 | 31,632 | 64.5 |
soc-Epinions | 85,973 | 16,846 | 19.6 | 11,016 | 12.8 |
Web | 889,569 | 242,305 | 27.2 | 137,110 | 15.4 |
LJ | 1,995,372 | 218,608 | 10.9 | 153,860 | 7.7 |
Patent | 20,293,715 | 14,827,554 | 73.1 | 14,779,167 | 72.8 |
05Patent | 4,975,277 | 3,949,609 | 79.4 | 3,901,493 | 78.4 |
05Citeseerx | 4,459,309 | 2,052,235 | 46 | 2,037,926 | 45.7 |
Citeseerx | 21,551,661 | 9,562,970 | 44.4 | 9,458,669 | 43.9 |
DBpedia | 11,354,814 | 4,233,784 | 37.3 | 3,787,744 | 33.4 |
6.2.2 Compression Ratio
Dataset | Tree size \((|V|+|m|+1)\) | Space (MB) for decomposition tree | Space (MB) for equivalence class |
---|---|---|---|
Kegg | 4238 | 0.009 | 0.006 |
arXiv | 6352 | 0.01 | 0.01 |
XMark | 8631 | 0.02 | 0.01 |
PubMed | 9385 | 0.02 | 0.02 |
soc-Epinions | 44,590 | 0.09 | 0.08 |
Web | 427,907 | 0.9 | 0.7 |
LJ | 997,681 | 1.93 | 1.85 |
Patent | 3,981,729 | 7.3 | 7.2 |
05Patent | 1,856,796 | 3.31 | 3.19 |
05Citeseerx | 1,592,636 | 2.86 | 2.78 |
Citeseerx | 7,139,965 | 12.99 | 12.47 |
DBpedia | 3,804,103 | 7.49 | 6.41 |
Dataset | Time (s) |
---|---|
Kegg | 0.057 |
arXiv | 0.16 |
XMark | 0.16 |
PubMed | 0.49 |
soc-Epinions | 4.49 |
Web | 286.67 |
LJ | 3073 |
Patent | 389.23 |
05Patent | 71.65 |
05Citeseerx | 175.69 |
Citeseerx | 8772.81 |
DBpedia | 89,764.32 |
6.2.3 Index Construction Time
Dataset | Grail |
\(\hbox {IP}^+\)
| Feline |
\(\hbox {BFL}^+\)
| ||||
---|---|---|---|---|---|---|---|---|
\(G^\epsilon \)
|
\(G_c\)
|
\(G^\epsilon \)
|
\(G_c\)
|
\(G^\epsilon \)
|
\(G_c\)
|
\(G^\epsilon \)
|
\(G_c\)
| |
Kegg | 1.01 |
0.64
| 0 | 0 | 0.47 |
0.25
| 0.07 |
0.02
|
arXiv | 4.03 |
5.34
| 0.015 |
0
| 2.03 |
1.96
| 0.66 |
0.55
|
XMark | 2.03 |
1.04
| 0.02 |
0
| 1.5 |
0.7
| 0.14 |
0.07
|
PubMed | 6.12 |
5.05
| 0.02 |
0.01
| 2.49 |
2.25
| 0.84 |
0.75
|
soc-Epinions | 4.75 |
3.13
| 0.04 |
0.01
| 2.39 |
1.02
| 0.49 |
0.24
|
Web | 71.39 |
43.24
| 0.04 |
0.03
| 46.21 |
27.77
| 9.52 |
5.81
|
LJ | 64.59 |
48.58
| 0.06 |
0.03
| 39.08 |
27.58
| 7.52 |
4.52
|
Patent | 5367.76 |
5283.97
| 5.6 |
5.06
| 3496.5 |
3477.12
| 1169.05 |
1161.4
|
05Patent | 1407.6 |
1408.6
| 1.42 |
1.25
|
950.03
| 959.18 | 306.06 |
301.48
|
05Citeseerx |
516.78
| 519.09 | 0.53 |
0.51
| 315.52 |
314.87
| 104.22 |
102.61
|
Citeseerx |
2697.92
| 2711.95 | 3.26 |
2.53
| 1634.8 |
1629.01
| 757.73 |
738.46
|
DBpedia | 1478.23 |
1221.09
| 1.23 |
1.07
| 937.71 |
834.48
| 256.32 |
237.09
|
6.2.4 Index Size
Dataset | Grail |
\(\hbox {IP}^+\)
| Feline |
\(\hbox {BFL}^+\)
| ||||
---|---|---|---|---|---|---|---|---|
\(G^\epsilon \)
|
\(G_c\)
|
\(G^\epsilon \)
|
\(G_c\)
|
\(G^\epsilon \)
|
\(G_c\)
|
\(G^\epsilon \)
|
\(G_c\)
| |
Kegg | 0.005 |
0.001
| 0.04 |
0.008
| 0.03 |
0.008
| 0.05 |
0.01
|
arXiv | 0.02 | 0.02 | 0.21 |
0.2
| 0.13 |
0.12
| 0.24 |
0.23
|
XMark | 0.01 |
0.006
| 0.12 |
0.05
| 0.08 |
0.03
| 0.14 |
0.06
|
PubMed | 0.03 |
0.02
| 0.19 | 0.19 |
0.03
| 0.15 | 0.22 |
0.2
|
soc-Epinions | 0.03 |
0.02
| 0.21 |
0.12
| 0.19 |
0.12
| 0.27 |
0.16
|
Web | 0.43 |
0.23
| 3.25 |
1.62
| 2.59 |
1.41
| 4.04 |
2.01
|
LJ | 0.41 |
0.29
| 2.71 |
1.72
| 2.47 |
1.75
| 3.39 |
2.16
|
Patent | 13.12 |
13.03
| 67.91 |
66.84
| 78.76 |
78.22
| 123.65 |
122.6
|
05Patent | 5.12 |
5.02
| 18.35 |
17.79
| 30.71 |
30.16
| 42.43 |
41.39
|
05Citeseerx | 2.1 |
2.07
| 12.64 |
12.47
| 12.63 |
12.47
| 17.08 |
16.77
|
Citeseerx | 9.91 |
9.71
| 67.91 |
66.84
| 59.46 |
58.27
| 76.06 |
73.7
|
DBpedia | 6.48 |
5.64
| 45.35 |
38.54
| 38.87 |
33.84
| 56.45 |
47.62
|
6.2.5 Query Performance
Dataset | Grail |
\(\hbox {IP}^+\)
| Feline |
\(\hbox {BFL}^+\)
| ||||
---|---|---|---|---|---|---|---|---|
\(G^\epsilon \)
|
\(G_c\)
|
\(G^\epsilon \)
|
\(G_c\)
|
\(G^\epsilon \)
|
\(G_c\)
|
\(G^\epsilon \)
|
\(G_c\)
| |
Kegg | 22.97 |
19.89
|
26.96
| 32.7 | 22.43 |
18.11
| 32.38 |
25.49
|
arXiv | 123.68 |
94.42
| 82.25 |
76.36
| 96.07 |
94.14
| 61.95 |
57.57
|
XMark | 25.86 |
22.9
| 32.39 |
30.95
| 29.29 |
23.93
| 33.78 |
31
|
PubMed | 53.22 |
50.15
| 39.56 |
37.92
| 43.39 |
40.73
| 47.8 |
42.74
|
soc-Epinions | 40.29 |
35.5
| 50.85 |
45.26
| 35.04 |
28.13
| 47.94 |
45.54
|
Web | 96.93 |
82.37
| 145.6 |
132.89
| 75.14 |
66.05
| 117.86 |
99.88
|
LJ | 120.41 |
93.93
| 144.27 |
120.55
|
92.79
| 95.45 | 161.85 |
127.46
|
Patent | 831.7 |
688.4
| 375.74 |
361.91
| 592.53 |
484.85
|
193.18
| 199.2 |
05Patent | 197.17 |
159.57
| 145.28 |
139.81
| 148.81 |
146.47
| 146.25 |
143.71
|
05Citeseerx | 169.84 |
166.59
| 152.91 |
140.41
| 156.76 |
152.27
| 148.68 |
145.53
|
Citeseerx | 208.44 |
206.52
| 223.37 |
191.43
| 200.95 |
196.69
| 232.57 |
206.14
|
DBpedia | 216.21 |
215
| 221.47 |
206.32
| 206.32 |
200.32
| 215.62 |
186.22
|
6.3 Experiments on Synthetic Datasets
Dataset | |V| | |E| | \(r_n\) (%) | \(r_e\) (%) | Compression time (s) | ||
---|---|---|---|---|---|---|---|
\(G^\epsilon \)
|
\(G_c\)
|
\(G^\epsilon \)
|
\(G_c\)
|
\(G_c\)
| |||
Data1 | 1000 | 2000 | 99.4 | 90.3 | 82.95 | 78.4 | 0.02 |
Data2 | 5000 | 8000 | 98.46 | 82.26 | 75.05 | 64.93 | 0.09 |
Data3 | 10,000 | 25,000 | 99.78 | 95.31 | 90.63 | 88.84 | 0.21 |
Data4 | 50,000 | 75,000 | 98.26 | 80.32 | 74.53 | 62.57 | 1.10 |
Data5 | 100,000 | 150,000 | 98.18 | 80.2 | 74.8 | 62.82 | 2.68 |
Data6 | 1,000,000 | 1,500,000 | 98.18 | 80.24 | 74.44 | 62.49 | 40.01 |
Data7 | 5,000,000 | 7,500,000 | 98.2 | 80.32 | 74.48 | 62.56 | 433.98 |
Data8 | 10,000,000 | 15,000,000 | 98.2 | 80.32 | 74.49 | 62.57 | 12,626.34 |
6.3.1 Compression Ratio
6.3.2 Index Construction Time
6.3.3 Index Size
6.3.4 Query Time
Dataset |
\(\hbox {IP}^+\)
| Feline |
\(\hbox {BFL}^+\)
| |||
---|---|---|---|---|---|---|
\(G^\epsilon \)
|
\(G_c\)
|
\(G^\epsilon \)
|
\(G_c\)
|
\(G^\epsilon \)
|
\(G_c\)
| |
Data1 | 23.35 |
21.77
| 286.04 |
124.523
| 25.03 |
22.99
|
Data2 | 31.71 |
29.06
| 1593.85 |
914
| 35.47 |
34.22
|
Data3 | 37.66 |
34.02
| 1988.83 |
375.01
| 41.61 |
40.53
|
Data4 | 66.17 |
52.21
|
2069.84
| 2484.5 | 58.26 |
51.31
|
Data5 | 76.42 |
66.08
| 58,884.8 |
37,781.2
| 82.11 |
76.85
|
Data6 | 79.52 |
69.29
| – |
41,254.4
| – |
91.52
|
Data7 | 82.3 |
71.98
| – | – | – | – |
Data8 | 89.02 |
73.7
| – | – | – | – |