Skip to main content

2012 | Buch

Algorithms and Models for the Web Graph

9th International Workshop, WAW 2012, Halifax, NS, Canada, June 22-23, 2012. Proceedings

herausgegeben von: Anthony Bonato, Jeannette Janssen

Verlag: Springer Berlin Heidelberg

Buchreihe : Lecture Notes in Computer Science

insite
SUCHEN

Über dieses Buch

This book constitutes the refereed proceedings of the 9th International Workshop on Algorithms and Models for the Web-Graph, WAW 2012, held in Halifax, Nova Scotia, Canada, in June 2012. The 13 papers presented were carefully reviewed and selected for inclusion in this volume. They address a number of topics related to the complex networks such hypergraph coloring games and voter models; algorithms for detecting nodes with large degrees; random Appolonian networks; and a sublinear algorithm for Pagerank computations.

Inhaltsverzeichnis

Frontmatter
Hypergraph Coloring Games and Voter Models
Abstract
We analyze a network coloring game on hypergraphs which can also describe a voter model. Each node represents a voter and is colored according to its preferred candidate (or undecided). Each hyperedge is a subset of voters that can interact and influence one another. In each round of the game, one hyperedge is chosen randomly, and the voters in the hyperedge can change their colors according to some prescribed probability distribution. We analyze this interaction model based on random walks on the associated weighted, directed state graph. We consider three scenarios — a memoryless game, a partially memoryless game and the general game using the memoryless game for comparison and analysis. Under certain ‘memoryless’ restrictions, we can use semigroup spectral methods to explicitly determine the spectrum of the state graph, and the random walk on the state graph converges to its stationary distribution in O(m logn) steps, where n is the number of voters and m is the number of hyperedges. This can then be used to determine an appropriate cut-off time for voting: we can estimate probabilities that events occur within an error bound of ε by simulating the voting game for O(log(1/ε) m logn) rounds. Next, we consider a partially memoryless game whose associated random walk can be written as a linear combination of a memoryless random walk and another given random walk. In such a setting, we provide bounds on the convergence time to the stationary distribution, highlighting a tradeoff between the proportion of memorylessness and the time required. To analyze the general interaction model, we will first construct a companion memoryless process and then choose an appropriate damping constant β to build a partially memoryless process. The partially memoryless process can serve as an approximation of the actual interaction dynamics for determining the cut-off time if the damping constant is appropriately chosen either by using simulation or depending on the rules of interaction.
Fan Chung, Alexander Tsiatas
On a DAG Partitioning Problem
Abstract
We study the following DAG Partitioning problem: given a directed acyclic graph with arc weights, delete a set of arcs of minimum total weight so that each of the resulting connected components has exactly one sink. We prove that the problem is hard to approximate in a strong sense: If \(\mathcal P\neq \mathcal{NP}\) then for every fixed ε > 0, there is no (n 1 − ε )-approximation algorithm, even if the input graph is restricted to have unit weight arcs, maximum out-degree three, and two sinks. We also present a polynomial time algorithm for solving the DAG Partitioning problem in graphs with bounded pathwidth.
Soroush Alamdari, Abbas Mehrabian
Some Typical Properties of the Spatial Preferred Attachment Model
Abstract
We investigate a stochastic model for complex networks, based on a spatial embedding of the nodes, called the Spatial Preferred Attachment (SPA) model. In the SPA model, nodes have spheres of influence of varying size, and new nodes may only link to a node if they fall within its influence region. The spatial embedding of the nodes models the background knowledge or identity of the node, which influences its link environment. In this paper, we focus on the (directed) diameter, small separators, and the (weak) giant component of the model.
Colin Cooper, Alan Frieze, Paweł Prałat
A Sublinear Time Algorithm for PageRank Computations
Abstract
In a network, identifying all vertices whose PageRank is more than a given threshold value Δ is a basic problem that has arisen in Web and social network analyses. In this paper, we develop a nearly optimal, sublinear time, randomized algorithm for a close variant of this problem. When given a directed network G = (V,E), a threshold value Δ, and a positive constant c > 3, with probability 1 − o(1), our algorithm will return a subset S ⊆ V with the property that S contains all vertices of PageRank at least Δ and no vertex with PageRank less than Δ/c. The running time of our algorithm is always \(\tilde{O}(\frac{n}{\Delta})\). In addition, our algorithm can be efficiently implemented in various network access models including the Jump and Crawl query model recently studied by [6], making it suitable for dealing with large social and information networks.
As part of our analysis, we show that any algorithm for solving this problem must have expected time complexity of \({\Omega}(\frac{n}{\Delta})\). Thus, our algorithm is optimal up to logarithmic factors. Our algorithm (for identifying vertices with significant PageRank) applies a multi-scale sampling scheme that uses a fast personalized PageRank estimator as its main subroutine. For that, we develop a new local randomized algorithm for approximating personalized PageRank which is more robust than the earlier ones developed by Jeh and Widom [9] and by Andersen, Chung, and Lang [2].
Christian Borgs, Michael Brautbar, Jennifer Chayes, Shang-Hua Teng
Quick Detection of Nodes with Large Degrees
Abstract
Our goal is to quickly find top k lists of nodes with the largest degrees in large complex networks. If the adjacency list of the network is known (not often the case in complex networks), a deterministic algorithm to find the top k list of nodes with the largest degrees requires an average complexity of \(\mbox{O}(n)\), where n is the number of nodes in the network. Even this modest complexity can be very high for large complex networks. We propose to use the random walk based method. We show theoretically and by numerical experiments that for large networks the random walk method finds good quality top lists of nodes with high probability and with computational savings of orders of magnitude. We also propose stopping criteria for the random walk method which requires very little knowledge about the structure of the network.
Õ(n 2)
Konstantin Avrachenkov, Nelly Litvak, Marina Sokol, Don Towsley
Ranking and Sparsifying a Connection Graph
Abstract
Many problems arising in dealing with high-dimensional data sets involve connection graphs in which each edge is associated with both an edge weight and a d-dimensional linear transformation. We consider vectorized versions of the PageRank and effective resistance which can be used as basic tools for organizing and analyzing complex data sets. For example, the generalized PageRank and effective resistance can be utilized to derive and modify diffusion distances for vector diffusion maps in data and image processing. Furthermore, the edge ranking of the connection graphs determined by the vectorized PageRank and effective resistance are an essential part of sparsification algorithms which simplify and preserve the global structure of connection graphs.
Fan Chung, Wenbo Zhao
A Game-Theoretic Model of Attention in Social Networks
Abstract
We model the economics of producing content in online social networks such as Facebook and Twitter. We propose a game-theoretic model within which we quantify inefficiencies from contributions by strategic users in online environments. Attention and information are assumed to be the main motivation for user contributions. We treat attention as a mechanism for sharing the profit from consuming information and introduce a general framework for analyzing dynamics of contributions in online environments. We analyze the proposed model and identify conditions for existence and efficient computation of pure-strategy Nash equilibrium.
We prove a bicriteria bound on the price of anarchy; in particular we show that the social welfare from central control over level of contribution by users is no larger than the social welfare from strategic agents with twice as large consumption utilities. We then construct and analyze a family of production games that have an arbitrarily large price of anarchy. We also prove non-robustness of the price of anarchy for a particular instance of the introduced family, establishing a distinction between the games studied here and network congestion games.
Ashish Goel, Farnaz Ronaghi
On Certain Properties of Random Apollonian Networks
Abstract
In this work we analyze fundamental properties of Random Apollonian Networks [34,35], a popular random graph model which generates planar graphs with power law properties. Specifically, we analyze (a) the degree distribution, (b) the k largest degrees, (c) the k largest eigenvalues and (d) the diameter, where k is a constant.
Alan Frieze, Charalampos E. Tsourakakis
Mutual or Unrequited Love: Identifying Stable Clusters in Social Networks with Uni- and Bi-directional Links
Abstract
Many social networks, e.g., Slashdot and Twitter, can be represented as directed graphs (digraphs) with two types of links between entities: mutual (bi-directional) and one-way (uni-directional) connections. Social science theories reveal that mutual connections are more stable than one-way connections, and one-way connections exhibit various tendencies to become mutual connections. It is therefore important to take such tendencies into account when performing clustering of social networks with both mutual and one-way connections.
In this paper, we utilize the dyadic methods to analyze social networks, and develop a generalized mutuality tendency theory to capture the tendencies of those node pairs which tend to establish mutual connections more frequently than those occur by chance. Using these results, we develop a mutuality-tendency-aware spectral clustering algorithm to identify more stable clusters by maximizing the within-cluster mutuality tendency and minimizing the cross-cluster mutuality tendency. Extensive simulation results on synthetic datasets as well as real online social network datasets such as Slashdot, demonstrate that our proposed mutuality-tendency-aware spectral clustering algorithm extracts more stable social community structures than traditional spectral clustering methods.
Yanhua Li, Zhi-Li Zhang, Jie Bao
Dynamic PageRank Using Evolving Teleportation
Abstract
The importance of nodes in a network constantly fluctuates based on changes in the network structure as well as changes in external interest. We propose an evolving teleportation adaptation of the PageRank method to capture how changes in external interest influence the importance of a node. This framework seamlessly generalizes PageRank because the importance of a node will converge to the PageRank values if the external influence stops changing. We demonstrate the effectiveness of the evolving teleportation on the Wikipedia graph and the Twitter social network. The external interest is given by the number of hourly visitors to each page and the number of monthly tweets for each user.
Ryan A. Rossi, David F. Gleich
Multi-commodity Allocation for Dynamic Demands Using PageRank Vectors
Abstract
We consider a variant of the contact process concerning multi-commodity allocation on networks. In this process, the demands for several types of commodities are initially given at some specified vertices and then the demands spread interactively on a contact graph. To allocate supplies in such a dynamic setting, we use a modified version of PageRank vectors, called Kronecker PageRank, to identify vertices for shipping supplies. We analyze both the situation that the demand distribution evolves mostly in clusters around the initial vertices and the case that the demands spread to the whole network. We establish sharp upper bounds for the probability that the demands are satisfied as a function of PageRank vectors.
Fan Chung, Paul Horn, Jacob Hughes
Are We There Yet? When to Stop a Markov Chain while Generating Random Graphs
Abstract
Markov chains are convenient means of generating realizations of networks with a given (joint or otherwise) degree distribution, since they simply require a procedure for rewiring edges. The major challenge is to find the right number of steps to run such a chain, so that we generate truly independent samples. Theoretical bounds for mixing times of these Markov chains are too large to be practically useful. Practitioners have no useful guide for choosing the length, and tend to pick numbers fairly arbitrarily. We give a principled mathematical argument showing that it suffices for the length to be proportional to the number of desired number of edges. We also prescribe a method for choosing this proportionality constant. We run a series of experiments showing that the distributions of common graph properties converge in this time, providing empirical evidence for our claims.
Jaideep Ray, Ali Pinar, C. Seshadhri
A Fast Algorithm to Find All High Degree Vertices in Graphs with a Power Law Degree Sequence
Abstract
We develop a fast method for finding all high degree vertices of a connected graph with a power law degree sequence. The method uses a biassed random walk, where the bias is a function of the power law c of the degree sequence.
Let G(t) be a t-vertex graph, with degree sequence power law c ≥ 3 generated by a generalized preferential attachment process which adds m edges at each step. Let S a be the set of all vertices of degree at least t a in G(t). We analyze a biassed random walk which makes transitions along undirected edges {x,y} proportional to (d(x)d(y)) b , where d(x) is the degree of vertex x and b > 0 is a constant parameter. Choosing the parameter b = (c − 1)(c − 2)/(2c − 3), the random walk discovers the set S a completely in \(\widetilde{O}(t^{1-2ab(1-\epsilon)})\) steps with high probability. The error parameter ε depends on c,a and m. We use the notation \(\tilde O(x)\) to mean O(x log k x) for some constant k > 0.
The cover time of the entire graph G(t) by the biassed walk is \(\widetilde{O}(t)\). Thus the expected time to discover all vertices by the biassed walk is not much higher than in the case of a simple random walk Θ(t logt).
The standard preferential attachment process generates graphs with power law c = 3. Choosing search parameter b = 2/3 is appropriate for such graphs. We conduct experimental tests on a preferential attachment graph, and on a sample of the underlying graph of the www with power law c ~3 which support the claimed property.
Colin Cooper, Tomasz Radzik, Yiannis Siantos
Backmatter
Metadaten
Titel
Algorithms and Models for the Web Graph
herausgegeben von
Anthony Bonato
Jeannette Janssen
Copyright-Jahr
2012
Verlag
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-30541-2
Print ISBN
978-3-642-30540-5
DOI
https://doi.org/10.1007/978-3-642-30541-2

Premium Partner