01.12.2015  Regular article  Ausgabe 1/2015 Open Access
Routerlevel community structure of the Internet Autonomous Systems
 Zeitschrift:
 EPJ Data Science > Ausgabe 1/2015
Competing interests
Authors’ contributions
1 Introduction
2 Dataset
2.1 Graph preprocessing
degree
2

AS
 

True  False  
True  0.12  0.16 
False  0.70  0.02 
Chain

Number of cases

Typical configuration


193,398  Internal AS connection  
90,307  Internal AS tunnel  
86,775  InterAS tunnel  
57,988  InterAS tunnel  
32,473  Internal AS tunnel  
28,774  InterAS tunnel 
3 Analysis methods and results

Infomap, by Rosvall and Bergstrom, based in the description length [21].

LPM, the Label Propagation Method by Raghavan et al. [22], which performs a diffusion process on the graph.

Deltacom, an efficient greedy algorithm for modularity optimization introduced here.

Louvain, a fast modularity optimization algorithm [23].

CommUGP, a local community detection method [24].
3.1 A hierarchical partitioning method based on multiresolution modularity
3.1.1 Newman’s modularity
3.1.2 Introducing a resolution parameter
3.2 Finding ASes through similarity maximization

The Infomap algorithm has a multiresolution variant [36], which generates a community tree with several levels \(k\in\{1,\ldots,k_{\max}\}\). We matched each of the ASes into this structure in the same way as we did with Multiresolution Deltacom: i.e., we find the best level for each AS, and then we compute the recall score (now \(\Theta=\{1,\ldots,k_{\max}\}\) and \(\theta=k\)). Here we found a clear improvement too, which is shown in the orange curve of Figure 2, pointing out that it is important to find the correct resolution for each AS.

From the general schema described by Reichardt and Bornholdt in [31] several multiresolution approaches have been proposed. We used the Constant Potts Model by Traag et al. [37], which removes the null model in the Hamiltonian to cancel global dependences. For this algorithm we tested different values of \(\gamma\in[10^{8}, 1]\). We observed that this interval covers all the interesting situations, as for \(\gamma=10^{8}\) we obtain a partition with 10 communities, and for \(\gamma=1\) all the nodes are in different communities. We sampled the interval by dividing it into 4, 8, 16 and 32 pieces, equidistant in the logarithmic scale. When dividing it into \(2^{k}\) pieces we obtain \(2^{k} + 1\) values of γ which we use as the parameter space \(\Theta=\{\gamma_{0}, \gamma _{1}, \ldots, \gamma_{2^{k}}\}\) in order to match each AS against its most similar community (according to the recall score \(R_{2}(AS)\)) among the \(2^{k} + 1\) different partitions. We chose to stop the exploration at \(k=5\) (33 values of γ) because the observed convergence suggested that refining the parameter space would not produce much better results. The curve for \(k=5\) representing CPM is shown in purple in Figure 2. Finally, the central picture in Figure 3 shows the best γ value (from among the 33 considered values) for each AS, as a function of its size. Compared to Deltacom, in CPM the correlation between resolution and size is not so strong.
3.3 Detecting ASes from samples
4 Discussion
Method

avg
R
(
AS
)
 

ASes ≥
100 nodes

All ASes
 
LPM  0.05  0.18 
Deltacom (t = 1)  0.08  0.00 
Louvain  0.13  0.01 
CommUGP  0.16  0.37 
Infomap  0.41  0.35 
Deltacom Multiresolution + Regression line + 5% sample  0.57  0.75 
Deltacom Multiresolution + Regression line  0.59  0.77 
Infomap Multiresolution (Best Resolution)  0.61  0.49 
Deltacom Multiresolution (Best Resolution)  0.67  0.87 
CPM (Best Resolution from 33 values of γ)  0.70  0.83 