main-content

## Über dieses Buch

This book constitutes the proceedings of the 15th International Workshop on Algorithms and Models for the Web Graph, WAW 2018, held in Moscow, Russia in May 2018.

The 11 full papers presented in this volume were carefully reviewed and selected from various submissions. The papers focus on topics like the information retrieval and data mining on the Web; Web as a text repository and as a graph, induced in various ways by link among pages, hosts and users; the understanding of graphs that arise from the Web and various user activities on the Web; stimulation of the development of high-performance algorithms and applications that exploit these graphs.

## Inhaltsverzeichnis

### Finding Induced Subgraphs in Scale-Free Inhomogeneous Random Graphs

Abstract
We study the induced subgraph isomorphism problem on inhomogeneous random graphs with infinite variance power-law degrees. We provide a fast algorithm that determines for any connected graph H on k vertices if it exists as induced subgraph in a random graph with n vertices. By exploiting the scale-free graph structure, the algorithm runs in O(nk) time for small values of k. We test our algorithm on several real-world data sets.
Ellen Cardinaels, Johan S. H. van Leeuwaarden, Clara Stegehuis

### The Asymptotic Normality of the Global Clustering Coefficient in Sparse Random Intersection Graphs

Abstract
We establish the asymptotic normality of the global clustering coefficient in sparse uniform random intersection graphs.
Mindaugas Bloznelis, Jerzy Jaworski

### Clustering Properties of Spatial Preferential Attachment Model

Abstract
In this paper, we study the clustering properties of the Spatial Preferential Attachment (SPA) model introduced by Aiello et al. in 2009. This model naturally combines geometry and preferential attachment using the notion of spheres of influence. It was previously shown in several research papers that graphs generated by the SPA model are similar to real-world networks in many aspects. For example, the vertex degree distribution was shown to follow a power law. In the current paper, we study the behaviour of C(d), which is the average local clustering coefficient for the vertices of degree d. This characteristic was not previously analyzed in the SPA model. However, it was empirically shown that in real-world networks C(d) usually decreases as $$d^{-a}$$ for some $$a>0$$ and it was often observed that $$a=1$$. We prove that in the SPA model C(d) decreases as $$1{\slash }d$$. Furthermore, we are also able to prove that not only the average but the individual local clustering coefficient of a vertex v of degree d behaves as $$1{\slash }d$$ if d is large enough. The obtained results are illustrated by numerous experiments with simulated graphs.
Lenar Iskhakov, Bogumił Kamiński, Maksim Mironov, Paweł Prałat, Liudmila Prokhorenkova

### Parameter Estimators of Sparse Random Intersection Graphs with Thinned Communities

Abstract
This paper studies a statistical network model generated by a large number of randomly sized overlapping communities, where any pair of nodes sharing a community is linked with probability q via the community. In the special case with $$q=1$$ the model reduces to a random intersection graph which is known to generate high levels of transitivity also in the sparse context. The parameter q adds a degree of freedom and leads to a parsimonious and analytically tractable network model with tunable density, transitivity, and degree fluctuations. We prove that the parameters of this model can be consistently estimated in the large and sparse limiting regime using moment estimators based on partially observed densities of links, 2-stars, and triangles.
Joona Karjalainen, Johan S. H. van Leeuwaarden, Lasse Leskelä

### Joint Alignment from Pairwise Differences with a Noisy Oracle

Abstract
In this work we consider the problem of recovering n discrete random variables $$x_i\in \{0,\ldots ,k-1\}, 1 \le i \le n$$ with the smallest possible number of queries to a noisy oracle that returns for a given query pair $$(x_i,x_j)$$ a noisy measurement of their modulo k pairwise difference, i.e., $$y_{ij} = x_i-x_j \ (\mathrm {mod}\ k)$$. This is a joint discrete alignment problem with important applications in computer vision [12, 23], graph mining [20], and spectroscopy imaging [22]. Our main result is a recovery algorithm (up to some offset) that solves with high probability the non-convex maximum likelihood estimation problem using $$O(n^{1+o(1)})$$ queries.
Michael Mitzenmacher, Charalampos E. Tsourakakis

### Analysis of Relaxation Time in Random Walk with Jumps

Abstract
We study the relaxation time in the random walk with jumps. The random walk with jumps combines random walk based sampling with uniform node sampling and improves the performance of network analysis and learning tasks. We derive various conditions under which the relaxation time decreases with the introduction of jumps.
Konstantin Avrachenkov, Ilya Bogdanov

### QAP Analysis of Company Co-mention Network

Abstract
In our research we form a network called company co-mention network. News analytics data have been employed to collect the companies co-mentioning. Each company acquires a certain value based on the amount of news in which the company was mentioned. A matrix containing the number of co-mentioning news between pairs of companies has been created for network analysis. Each company is presented as a node, and news mentioning two companies establishes a link between them. The network is constructed quite similarly to social networks or co-citation networks. The networked map of the companies is used to visualize the dependence structure of the economy by identifying groups of companies that are more central than others. The analysis carried out in the context of sectors of economy and territorial affiliation made it possible to identify key companies and to explore the similarity of the power law of vertices within sectors. QAP analysis between the co-mention network and the sector affiliation network was carried out to examine the ability of the sector affiliation network to predict the structure of the co-mention network.
S. P. Sidorov, A. R. Faizliev, V. A. Balash, A. A. Gudkov, A. Z. Chekmareva, M. Levshunov, S. V. Mironov

### Towards a Systematic Evaluation of Generative Network Models

Abstract
Generative graph models play an important role in network science. Unlike real-world networks, they are accessible for mathematical analysis and the number of available networks is not limited. The explanatory power of results on generative models, however, heavily depends on how realistic they are. We present a framework that allows for a systematic evaluation of generative network models. It is based on the question whether real-world networks can be distinguished from generated graphs with respect to certain graph parameters.
As a proof of concept, we apply our framework to four popular random graph models (Erdős-Rényi, Barabási-Albert, Chung-Lu, and hyperbolic random graphs). Our experiments for example show that all four models are bad representations for Facebook’s social networks, while Chung-Lu and hyperbolic random graphs are good representations for other networks, with different strengths and weaknesses.
Thomas Bläsius, Tobias Friedrich, Maximilian Katzmann, Anton Krohmer, Jonathan Striebel

### Dynamic Competition Networks: Detecting Alliances and Leaders

Abstract
We consider social networks of competing agents that evolve dynamically over time. Such dynamic competition networks are directed, where a directed edge from nodes u to v corresponds a negative social interaction. We present a novel hypothesis that serves as a predictive tool to uncover alliances and leaders within dynamic competition networks. Our focus is in the present study is to validate it on competitive networks arising from social game shows such as Survivor and Big Brother.
Anthony Bonato, Nicole Eikmeier, David F. Gleich, Rehan Malik

### An Experimental Study of the k-MXT Algorithm with Applications to Clustering Geo-Tagged Data

Abstract
We consider a graph fragmentation process which can be described as follows. Each vertex v selects the k adjacent vertices which have the largest number of common of neighbours. For each selected neighbour u, we retain the edge (vu) to form a the subgraph graph S of the input graph. The object of interest are the components of S, the k-Max-Triangle-Neighbour (k-MXT) subgraph, and the vertex clusters they produce in the original graph.
We study the application of this process to clustering in the planted partition model, and on the geometric disk graph formed from geo-tagged photographic data downloaded from Flickr.
In the planted partition model, there are $$\ell$$ numbers of partitions, or subgraphs, which are connected densely within each partition but sparser between partitions. The objective is to recover these hidden partitions. We study the case of the planted partition model based on the random graph $$G_{n,p}$$ with additional edge probability q within the partitions. Theoretical and experimental results show that the 2-MXT algorithm can recover the partitions for any $$q/p>0$$ constant provided the density of triangles is high enough.
We apply the k-MXT algorithm experimentally to the problem of clustering geographical data, using London as an example. Given a dataset consisting of geographical coordinates extracted from photographs, we construct a disk graph by connecting every point to other points if and only if theirs distance is at most d. Our experimental results show that the k-MXT algorithm is able to produce clusters which are of comparable to popular clustering algorithms such as DBSCAN (see e.g. Fig. 5).
Colin Cooper, Ngoc Vu

### A Statistical Performance Analysis of Graph Clustering Algorithms

Abstract
Measuring graph clustering quality remains an open problem. Here, we introduce three statistical measures to address the problem. We empirically explore their behavior under a number of stress test scenarios and compare it to the commonly used modularity and conductance. Our measures are robust, immune to resolution limit, easy to intuitively interpret and also have a formal statistical interpretation. Our empirical stress test results confirm that our measures compare favorably to the established ones. In particular, they are shown to be more responsive to graph structure, less sensitive to sample size and breakdowns during numerical implementation and less sensitive to uncertainty in connectivity. These features are especially important in the context of larger data sets or when the data may contain errors in the connectivity patterns.
Pierre Miasnikof, Alexander Y. Shestopaloff, Anthony J. Bonner, Yuri Lawryshyn

### Backmatter

Weitere Informationen