nach oben

2020 | Buch

Kapitel lesen Erstes Kapitel lesen

Algorithms and Models for the Web Graph

17th International Workshop, WAW 2020, Warsaw, Poland, September 21–22, 2020, Proceedings

herausgegeben von: Dr. Bogumił Kamiński, Paweł Prałat, Dr. Przemysław Szufel

Verlag: Springer International Publishing

Buchreihe : Lecture Notes in Computer Science

Enthalten in: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

Einloggen, um Zugang zu erhalten

Über dieses Buch

This book constitutes the proceedings of the 17th International Workshop on Algorithms and Models for the Web Graph, WAW 2020, held in Warsaw, Poland, in September 2020. The 12 full papers presented in this volume were carefully reviewed and selected from 19 submissions. The aim of the workshop was to further the understanding of graphs that arise from the Web and various user activities on the Web, and stimulate the development of high-performance algorithms and applications that exploit these graphs.

Due to the corona pandemic the conference was postponed from June 2020 to September 2020.

Inhaltsverzeichnis

Frontmatter

Hypergraph Analytics of Domain Name System Relationships

Abstract

We report on the use of novel mathematical methods in hypergraph analytics over a large quantity of DNS data. Hypergraphs generalize graphs, as used in network science, to better model complex multiway relations in cyber data. Specifically, casting DNS data from Georgia Tech’s ActiveDNS repository as hypergraphs allows us to fully represent the interactions between collections of domains and IP addresses. To facilitate large-scale analytics, we fielded an analytical pipeline of two capabilities: HyperNetX (HNX) is a Python package for the exploration and visualization of hypergraphs; while on the backend, the Chapel HyperGraph Library (CHGL) is a library for high performance hypergraph analytics written in the exascale programming language Chapel. CHGL was used to process gigascale DNS data, performing compute-intensive calculations for data reduction and segmentation. Identified portions are then sent to HNX for both exploratory analysis and knowledge discovery targeting known tactics, techniques, and procedures.

Cliff A. Joslyn, Sinan Aksoy, Dustin Arendt, Jesun Firoz, Louis Jenkins, Brenda Praggastis, Emilie Purvine, Marcin Zalewski

Global Graph Curvature

Abstract

Recently, non-Euclidean spaces became popular for embedding structured data. However, determining suitable geometry and, in particular, curvature for a given dataset is still an open problem. In this paper, we define a notion of global graph curvature, specifically catered to the problem of embedding graphs. We theoretically analyze this value and show that the optimal curvature essentially depends on the dimensionality of the embedding space and loss function one aims to minimize via embedding. We also review existing notions of local curvature (e.g., Ollivier-Ricci curvature) and conduct a theoretical analysis of their properties. In particular, we demonstrate that the global curvature differs significantly from the aggregations of local ones. Thus, the proposed measure is non-trivial and it requires new empirical estimators as well as separate theoretical analysis.

Liudmila Prokhorenkova, Egor Samosvat, Pim van der Hoorn

Information Diffusion in Complex Networks: A Model Based on Hypergraphs and Its Analysis

Abstract

This work introduces the problem of social influence diffusion in complex networks, where vertices are linked not only through simple pairwise relationships to other nodes but with groups of nodes of arbitrary size. A challenging problem that arises in this domain is to determine a small subset of nodes S (a target-set) able to spread their influence in the whole network. This problem has been formalized and studied in different ways, and many viable solutions have been found for graphs. These have been applied to study several phenomena in research fields such as social, economic, biological, and physical sciences.

In this contribution, we investigated the social influence problem on hypergraphs. As hypergraphs are mathematical structures generalization of graphs, they can naturally model the many-to-many relationships characterizing a complex network. Given a network represented by a hypergraph \(H=(V,E)\), we consider a dynamic influence diffusion process on H, evolving as follows. At the beginning of the process, the nodes in a given set \(S\subseteq V\) are influenced. Then, at each iteration, the influenced hyperedges set is augmented by all hyperedges having a sufficiently large number of influenced nodes. Consequently, the set of influenced nodes is extended by all the nodes contained in a sufficiently large number of already influenced hyperedges. The process terminates when no new nodes can be influenced.

The so defined problem is an inherent chicken-and-egg question as nodes are influenced by groups of other nodes (or hyperedges), while hyperedges (or group of nodes) are influenced by the nodes they contain. In this paper, we provide a formal definition of the influence diffusion problem on hypergraphs. We propose a set of greedy-based heuristic strategies for finding the minimum influence target set, and we present an in-depth analysis of their performance on several classes of random hypergraphs. Furthermore, we describe an experiment on a real use-case, based on the character co-occurrences network of the Game-of-Thrones TV Series.

Alessia Antelmi, Gennaro Cordasco, Carmine Spagnuolo, Przemysław Szufel

A Scalable Unsupervised Framework for Comparing Graph Embeddings

Abstract

Graph embedding is a transformation of vertices of a graph into a set of vectors. A good embedding should capture the graph topology, vertex-to-vertex relationship, and other relevant information about the graph, its subgraphs, and vertices. If these objectives are achieved, an embedding is a meaningful, understandable, and often compressed representations of a network. Unfortunately, selecting the best embedding is a challenging task and very often requires domain experts.

In the recent paper [1], we propose a “divergence score” that can be assigned to embeddings to help distinguish good ones from bad ones. This general framework provides a tool for an unsupervised graph embedding comparison. The complexity of the original algorithm was quadratic in the number of vertices. It was enough to show that the proposed method is feasible and has practical potential (proof-of-concept). In this paper, we improve the complexity of the original framework and design a scalable approximation algorithm. Moreover, we perform some detailed quality and speed benchmarks.

Bogumił Kamiński, Paweł Prałat, François Théberge

Assortativity and Bidegree Distributions on Bernoulli Random Graph Superpositions

Abstract

A probabilistic generative network model with n nodes and m overlapping layers is obtained as a superposition of m mutually independent Bernoulli random graphs of varying size and strength. When n and m are large and of the same order of magnitude, the model admits a sparse limiting regime with a tunable power-law degree distribution and nonvanishing clustering coefficient. This article presents an asymptotic formula for the joint degree distribution of adjacent nodes. This yields a simple analytical formula for the model assortativity, and opens up ways to analyze rank correlation coefficients suitable for random graphs with heavy-tailed degree distributions.

Mindaugas Bloznelis, Joona Karjalainen, Lasse Leskelä

Clustering Coefficient of a Preferred Attachment Affiliation Network

Abstract

It is well known that the global clustering coefficient of a standard preferential attachment random graph vanishes as the number of vertices tends to \(\infty \). We evaluate the global clustering coefficient of the preferred attachment affiliation network [4] and show that it is bounded away from zero.

Daumilas Ardickas, Mindaugas Bloznelis

Transience Versus Recurrence for Scale-Free Spatial Networks

Abstract

Weight-dependent random connection graphs are a class of local network models that combine scale-free degree distribution, small-world properties and clustering. In this paper we discuss recurrence or transience of these graphs, features that are relevant for the performance of search and information diffusion algorithms on the network.

Peter Gracar, Markus Heydenreich, Christian Mönch, Peter Mörters

The Iterated Local Directed Transitivity Model for Social Networks

Abstract

We introduce a new, deterministic directed graph model for social networks, based on the transitivity of triads. In the Iterated Local Directed Transitivity (ILDT) model, new nodes are born over discrete time-steps and inherit the link structure of their parent nodes. The ILDT model may be viewed as a directed graph analog of the ILT model for undirected graphs introduced in [4]. We investigate network science and graph-theoretical properties of ILDT digraphs. We prove that the ILDT model exhibits a densification power law, so that the digraphs generated by the models densify over time. The number of directed triads are investigated, and counts are given of the number of directed 3-cycles and transitive 3-cycles. A higher number of transitive 3-cycles are generated by the ILDT model, as found in real-world, on-line social networks that have orientations on their edges. We discuss the eigenvalues of the adjacency matrices of ILDT digraphs. We finish by showing that in many instances of the chosen initial digraph, the model eventually generates digraphs with Hamiltonian directed cycles.

Anthony Bonato, Daniel W. Cranston, Melissa A. Huggan, Trent Marbach, Raja Mutharasan

A Note on the Conductance of the Binomial Random Intersection Graph

Abstract

We establish the rapid mixing property of a binomial random intersection graph \(\mathcal {G}_{}\left( n,m,p\right) \) introduced in [11, 17]. For this purpose we show that the conductance is bounded away from zero by a positive constant. We consider the range of parameters n, m, p where the edge density is just above the connectivity threshold (our graph is connected with high probability as \(n,m\rightarrow +\infty \)). We assume in addition that \(np=\varTheta (1)\).

Katarzyna Rybarczyk, Mindaugas Bloznelis, Jerzy Jaworski

Iterated Global Models for Complex Networks

Abstract

We introduce the Iterated Global model as a deterministic graph process that simulates several properties of complex networks. In this model, for every set S of nodes of a prescribed cardinality, we add a new node that is adjacent to every node in S. We focus on the case where the size of S is approximately half the number of nodes at each time-step, and we refer to this as the half-model. The half-model provably generate graphs that densify over time, have bad spectral expansion, and low diameter. We derive the clique, chromatic, and domination numbers of graphs generated by the model.

Anthony Bonato, Erin Meger

A Robust Method for Statistical Testing of Empirical Power-Law Distributions

Abstract

The World-Wide-Web is a complex system naturally represented by a directed network of documents (nodes) connected through hyperlinks (edges). In this work, we focus on one of the most relevant topological properties that characterize the network, i.e. being scale-free. A directed network is scale-free if its in-degree and out-degree distributions have an approximate and asymptotic power-law behavior. If we consider the Web as a whole, it presents empirical evidence of such property. On the other hand, when we restrict the study of the degree distributions to specific sub-categories of websites, there is no longer strong evidence for it. For this reason, many works questioned the almost universal ubiquity of the scale-free property. Moreover, existing statistical methods to test whether an empirical degree distribution follows a power law suffer from large sample sizes and/or noisy data.

In this paper, we propose an extension of a state-of-the-art method that overcomes such problems by applying a Monte Carlo sub-sampling procedure on the graphs. We show on synthetic experiments that even small variations of true power-law distributed data causes the state-of-the-art method to reject the hypothesis, while the proposed method is more sound and stable under such variations.

Lastly, we perform a study on 3 websites showing that indeed, depending on their category, some accept and some refuse the hypothesis of being power-law. We argue that our method could be used to better characterize topological properties deriving from different generative principles: central or peripheral.

Davide Garbarino, Veronica Tozzo, Andrea Vian, Annalisa Barla

Community Structures in Information Networks for a Discrete Agent Population

Abstract

Using a game-theoretic framework, we characterize the community structure that emerges in a social (information) network. Our analysis generalizes the results in [1, 2] that were obtained for the case of a continuous population model for the agents in the social network, to the case of a discrete agent population model. We note that a discrete agent set reflects more accurately real-life information networks, and are needed in order to get additional insights into the community structure, such as for example the connectivity (graph structure) within in a community, as well as information dissemination within a community.

Peter Marbach

Backmatter

Titel: Algorithms and Models for the Web Graph
herausgegeben von: Dr. Bogumił Kamiński
Paweł Prałat
Dr. Przemysław Szufel
Verlag: Springer International Publishing
Electronic ISBN: 978-3-030-48478-1
Print ISBN: 978-3-030-48477-4
DOI: https://doi.org/10.1007/978-3-030-48478-1