Algorithmic Challenge in Online Advertising

Abstract

Computational advertising is an emerging new scientific sub-discipline, at the intersection of large scale search and text analysis, information retrieval, statistical modeling, machine learning, classification, optimization, and microeconomics. The central challenge of computational advertising is to find the “best match” between a given user in a given context and a suitable advertisement. The context could be a user entering a query in a search engine (“sponsored search”), a user reading a web page (“content match” and “display ads”), a user watching a movie on a portable device, and so on. The information about the user can vary from scarily detailed to practically nil. The number of potential advertisements might be in the billions. Thus, depending on the definition of “best match” this challenge leads to a variety of massive optimization and search problems, with complicated constraints.

This talk will give an introduction to this area focusing mostly on the algorithmic challenges encountered in practice.

Andrei Z. Broder

Parallel Algorithms for Collaborative Filtering

Abstract

Collaborative filtering has been widely used to predict the interests of a user. Given a users past activities, collaborative filtering predicts the users future preferences. This talk presents techniques and discoveries of our recent parallelization effort on collaborative filtering algorithms. In particular, parallel association mining and parallel latent Dirichlet allocation will be presented and their pros and cons analyzed. Some counter-intuitive results will also be presented to stimulate future parallel optimization research.

Edward Y. Chang

On the Approximability of Some Haplotyping Problems

Abstract

In this paper, we study several versions of optimization problems related to haplotype reconstruction/identification. The input to the first problem is a set C ₁ of haplotypes, a set C ₂ of haplotypes, and a set G of genotypes. The objective is to select the minimum number of haplotypes from C ₂ so that together with haplotypes in C ₁ they resolve all (or the maximum number of) genotypes in G. We show that this problem has a factor-O(logn) polynomial time approximation. We also show that this problem does not admit any approximation with a factor better than O(logn) unless P=NP. For the corresponding reconstruction problem, i.e., when C ₂ is not given, the same approximability results hold.

The other versions of the haplotype identification problem are based on single individual haplotyping, including the well-known Minimum Fragment Removal (MFR) and Minimum SNP Removal (MSR), which have both shown to be APX-hard previously. We show in this paper that MFR has a polynomial time O(logn)-factor approximation. We also consider Maximum Fragment Identification (MFI), which is the complementary version of MFR; and Maximum SNP Identification (MSI), which is the complementary version of MSR. We show that, for any positive constant ε< 1, neither MFI nor MSI has a factor-n ^1 − ε polynomial time approximation algorithm unless P=NP.

John Abraham, Zhixiang Chen, Richard Fowler, Bin Fu, Binhai Zhu

On Acyclicity of Games with Cycles

Abstract

We study restricted improvement cycles (ri-cycles) in finite positional n-person games with perfect information modeled by directed graphs (digraphs) that may contain cycles. We obtain criteria of restricted improvement acyclicity (ri-acyclicity) in two cases: for n = 2 and for acyclic digraphs. We provide several examples that outline the limits of these criteria and show that, essentially, there are no other ri-acyclic cases. We also discuss connections between ri-acyclicity and some open problems related to Nash-solvability.

Daniel Andersson, Vladimir Gurvich, Thomas Dueholm Hansen

Discrete online TSP

Abstract

In this paper we introduce a discrete version of the online traveling salesman problem (DOLTSP). We represent the metric space using a weighted graph, where the server is allowed to modify its route only at the vertices. This limitation directly affects the capacity of the server to react and increases the risk related to each decision. We prove lower bounds on the performance of deterministic online algorithms in different scenarios of DOLTSP, and we present distinct algorithms for the problem, some of them achieving the best possible performance. We measure the performance of the algorithms using competitive analysis, the most widely accepted method for evaluating online algorithms. Besides, we perform an empirical simulation on paths, generating a significant set of instances and measuring the quality of the solutions given by each algorithm. Our experiments show that algorithms with the best competitive ratio do not have the best performance in practice.

Mauro Aprea, Esteban Feuerstein, Gustavo Sadovoy, Alejandro Strejilevich de Loma

On Approximating an Implicit Cover Problem in Biology

Abstract

In an implicit combinatorial optimization problem, the constraints are not enumerated explicitly but rather stated implicitly through equations, other constraints or auxiliary algorithms. An important subclass of such problems is the implicit set cover (or, equivalently, hitting set) problem in which the sets are not given explicitly but rather defined implicitly. For example, the well-known minimum feedback arc set problem is such a problem. In this paper, we consider such a cover problem that arises in the study of wild populations in biology in which the sets are defined implicitly via the Mendelian constraints and prove approximability results for this problem.

Mary V. Ashley, Tanya Y. Berger-Wolf, Wanpracha Chaovalitwongse, Bhaskar DasGupta, Ashfaq Khokhar, Saad Sheikh

Power Indices in Spanning Connectivity Games

Abstract

The Banzhaf index, Shapley-Shubik index and other voting power indices measure the importance of a player in a coalitional game. We consider a simple coalitional game called the spanning connectivity game (SCG) based on an undirected, unweighted multigraph, where edges are players. We examine the computational complexity of computing the voting power indices of edges in the SCG. It is shown that computing Banzhaf values is #P-complete and computing Shapley-Shubik indices or values is NP-hard for SCGs. Interestingly, Holler indices and Deegan-Packel indices can be computed in polynomial time. Among other results, it is proved that Banzhaf indices can be computed in polynomial time for graphs with bounded treewidth. It is also shown that for any reasonable representation of a simple game, a polynomial time algorithm to compute the Shapley-Shubik indices implies a polynomial time algorithm to compute the Banzhaf indices. This answers (positively) an open question of whether computing Shapley-Shubik indices for a simple game represented by the set of minimal winning coalitions is NP-hard.

Haris Aziz, Oded Lachish, Mike Paterson, Rahul Savani

Efficiently Generating k-Best Solutions to Procurement Auctions

Abstract

Procurement executives often find it difficult to articulate their preferences and constraints regarding auctions, making it difficult to cast procurement decisions as straightforward optimization problems. This paper presents an efficient algorithm to aid decision support in such situations. Instead of trying to compute a single optimal solution for the auction winner determination problem, we generate many candidate solutions in ascending order of buyer expenditure. Standard techniques such as clustering and dominance pruning can then trim this list to a compact yet diverse menu of alternatives; other analyses can illuminate the cost of constraints and the competitive landscape. Our efficient solution-generation algorithm addresses sealed-bid procurement auctions with multiple suppliers and multiple types of goods available in multiple units. It supports multi-sourcing and volume discounts/surcharges in bids. Our algorithm may optionally incorporate certain classes of hard constraints, generating only solutions that satisfy them.

Andrew Byde, Terence Kelly, Yunhong Zhou, Robert Tarjan

Integer Polyhedra for Program Analysis

Abstract

Polyhedra are widely used in model checking and abstract interpretation. Polyhedral analysis is effective when the relationships between variables are linear, but suffers from imprecision when it is necessary to take into account the integrality of the represented space. Imprecision also arises when non-linear constraints occur. Moreover, in terms of tractability, even a space defined by linear constraints can become unmanageable owing to the excessive number of inequalities. Thus it is useful to identify those inequalities whose omission has least impact on the represented space. This paper shows how these issues can be addressed in a novel way by growing the integer hull of the space and approximating the number of integral points within a bounded polyhedron.

Philip J. Charles, Jacob M. Howe, Andy King

Line Segment Facility Location in Weighted Subdivisions

Abstract

In this paper we present approximation algorithms for solving the line segment facility location problem in weighted regions. The weighted region setup is a more realistic model for many facility location problems that arise in practical applications. Our algorithms exploit an interesting property of the problem, that could possibly be used for solving other problems in weighted regions.

Yam Ki Cheung, Ovidiu Daescu

Algorithms for Placing Monitors in a Flow Network

(Preliminary Version)

Abstract

In the Flow Edge-Monitor Problem, we are given an undirected graph G = (V,E), an integer k > 0 and some unknown circulation ψ on G. We want to find a set of k edges in G, so that if we place k monitors on those edges to measure the flow along them, the total number of edges for which the flow can be uniquely determined is maximized. In this paper, we first show that the Flow Edge-Monitor Problem is NP-hard, and then we give two approximation algorithms: a 3-approximation algorithm with running time O((m + n)²) and a 2-approximation algorithm with running time O((m + n)³), where n = |V| and m = |E|.

Francis Chin, Marek Chrobak, Li Yan

Three Results on Frequency Assignment in Linear Cellular Networks

(Extended Abstract)

Abstract

In the frequency assignment problem we are given a graph representing a wireless network and a sequence of requests, where each request is associated with a vertex. Each request has two more attributes: its arrival and departure times, and it is considered active from the time of arrival to the time of departure. We want to assign frequencies to all requests so that at any time no two active requests associated with the same or adjacent nodes use the same frequency. The objective is to minimize the number of frequencies used.

We focus exclusively on the special case of the problem when the underlying graph is a linear network (path). For this case, we consider both the offline and online versions of the problem, and we present three results. First, in the incremental online case, where the requests arrive over time, but never depart, we give an algorithm with an optimal (asymptotic) competitive ratio \(\frac{4}{3}\). Second, in the general online case, where the requests arrive and depart over time, we improve the current lower bound on the (asymptotic) competitive ratio to \(\frac{11}{7}\). Third, we prove that the offline version of this problem is \({\mathbb {NP}}\)-complete.

Marek Chrobak, Jiří Sgall

Link Distance and Shortest Path Problems in the Plane

Abstract

We develop algorithms to compute Voronoi diagrams, shortest path maps, and the Fréchet distance in the plane with polygonal obstacles. Distances between points are measured either by link distance or by Euclidean shortest path distance.

Atlas F. Cook IV, Carola Wenk

Orca Reduction and ContrAction Graph Clustering

Abstract

During the last years, a wide range of huge networks has been made available to researchers. The discovery of natural groups, a task called graph clustering, in such datasets is a challenge arising in many applications such as the analysis of neural, social, and communication networks.

We here present Orca, a new graph clustering algorithm, which operates locally and hierarchically contracts the input. In contrast to most existing graph clustering algorithms, which operate globally, Orca is able to cluster inputs with hundreds of millions of edges in less than 2.5 hours, identifying clusterings with measurably high quality. Our approach explicitly avoids maximizing any single index value such as modularity, but instead relies on simple and sound structural operations. We present and discuss the Orca algorithm and evaluate its performance with respect to both clustering quality and running time, compared to other graph clustering algorithms.

Daniel Delling, Robert Görke, Christian Schulz, Dorothea Wagner

Equiseparability on Terminal Wiener Index

Abstract

Wiener index as one of the oldest chemical index has been well studied. It has been extensive used in Computational Biology, Preliminary screening of drugs and Complex Network. Based on variable Wiener index, I.Gutman et al [6] introduced the concept of equiseparable pairs of trees and chemical trees, meanwhile they gave a rule on how to construct such equiseparable pairs. D.Vukic̆ević and I.Gutman [8] proved almost all trees and chemical trees have equiseparable mates, which is a disadvantageous property of many molecular-structure graph-based descriptors. Recently, I.Gutman et al [9] proposed the concept of Terminal Wiener Index, which equals to the summation of distance between all pairs of pendent vertices of trees. Following this line, we explore the properties of terminal Wiener index, and show the fact that there still exist pairs of trees and chemical trees which can not be distinguished by it, therefore we give some general methods to construct equiseparable pairs and compare the methods in the case of Wiener index. More specifically, we show that terminal Wiener index is degenerative to some extent.

Xiaotie Deng, Jie Zhang

Effective Tour Searching for TSP by Contraction of Pseudo Backbone Edges

Abstract

We introduce a reduction technique for the well-known TSP. The basic idea of the approach consists of transforming a TSP instance to another one with smaller size by contracting pseudo backbone edges computed in a preprocessing step, where pseudo backbone edges are edges which are likely to be in an optimal tour. A tour of the small instance can be re-transformed to a tour of the original instance. We experimentally investigated TSP benchmark instances by our reduction technique combined with the currently leading TSP heuristic of Helsgaun. The results strongly demonstrate the effectivity of this reduction technique: for the six VLSI instances xvb13584, pjh17845, fnc19402, ido21215, boa28924, and fht47608 we could set world records, i.e., find better tours than the best tours known so far. The success of this approach is mainly due to the effective reduction of the problem size so that we can search the more important tour subspace more intensively.

Changxing Dong, Gerold Jäger, Dirk Richter, Paul Molitor

Optimal Auctions Capturing Constraints in Sponsored Search

Abstract

Most sponsored search auctions use the Generalized Second Price (GSP) rule. Given the GSP rule, they try to give an optimal allocation, an easy task when the only need is to allocate ads to slots. However, when other practical conditions must be fulfilled –such as budget constraints, exploration of the performance of new ads, etc.– optimal allocations are hard to obtain. We provide a method to optimally allocate ads to slots under the practical conditions mentioned above. Our auctions are stochastic, and can be applied in tandem with different pricing rules, among which we highlight two: an intuitive generalization of GSP and VCG payments.

Esteban Feuerstein, Pablo Ariel Heiber, Matías Lopez-Rosenfeld, Marcelo Mydlarz

A Note on Estimating Hybrid Frequency Moment of Data Streams

Abstract

We consider the problem of estimating the hybrid frequency moment of matrix data that is updated point-wise in arbitrary order by a data stream. In this model, data is viewed to be organized in the form of a matrix (A _i,j)_{1 ≤ i,j, ≤ n}. The entries A _i,j are updated coordinate-wise (both increments and decrements are allowed), in arbitrary order and possibly multiple times. The hybrid frequency moment F _p,q(A) is defined as \(\sum_{j=1}^n\left( \sum_{i=1}^n \lvert{A_{i,j}}\rvert^p\right)^q\) and is a generalization of the frequency moment of one-dimensional data streams.

Prior work [10] presented a nearly space-optimal algorithm for estimating F _p,q for p ∈ [0,2] and q ∈ [0,1]. Here, we complement that work by presenting a nearly space-optimal algorithm for estimating F _p,q for p ∈ [0,1] and q ∈ [0,2].

Sumit Ganguly

Two-Level Push-Relabel Algorithm for the Maximum Flow Problem

Abstract

We describe a two-level push-relabel algorithm for the maximum flow problem and compare it to the competing codes. The algorithm generalizes a practical algorithm for bipartite flows. Experiments show that the algorithm performs well on several problem families.

Andrew V. Goldberg

A More Relaxed Model for Graph-Based Data Clustering: s-Plex Editing

Abstract

We introduce the s -Plex Editing problem generalizing the well-studied Cluster Editing problem, both being NP-hard and both being motivated by graph-based data clustering. Instead of transforming a given graph by a minimum number of edge modifications into a disjoint union of cliques (Cluster Editing), the task in the case of s -Plex Editing is now to transform a graph into a disjoint union of so-called s-plexes. Herein, an s-plex denotes a vertex set inducing a (sub)graph where every vertex has edges to all but at most s vertices in the s-plex. Cliques are 1-plexes. The advantage of s-plexes for s ≥ 2 is that they allow to model a more relaxed cluster notion (s-plexes instead of cliques), which better reflects inaccuracies of the input data. We develop a provably efficient and effective preprocessing based on data reduction (yielding a so-called problem kernel), a forbidden subgraph characterization of s-plex cluster graphs, and a depth-bounded search tree which is used to find optimal edge modification sets. Altogether, this yields efficient algorithms in case of moderate numbers of edge modifications.

Jiong Guo, Christian Komusiewicz, Rolf Niedermeier, Johannes Uhlmann

Dynamic Position Auctions with Consumer Search

Abstract

Building upon the static model of Athey and Ellison [1], we demonstrate the efficient convergence of dynamic position auctions in the presence of consumer search. The entry of low-quality advertisers does not slow this convergence. Our methods are extensions of those introduced by Cary et al. [2]. The applicability of these methods in the presence of consumer search indicates the robustness of the approach and suggests that convergence of dynamic position auction models is demonstrable whenever the associated static equilibrium strategies are sufficiently well-behaved.

Scott Duke Kominers

Nonlinear Optimization over a Weighted Independence System

Abstract

We consider the problem of optimizing a nonlinear objective function over a weighted independence system presented by a linear optimization oracle. We provide a polynomial-time algorithm that determines an r-best solution for nonlinear functions of the total weight of an independent set, where r is a constant depending on certain Frobenius numbers of the weights and is independent of the size of the ground set. In contrast, we show that finding an optimal (0-best) solution requires exponential time even in a very special case of the problem.

Jon Lee, Shmuel Onn, Robert Weismantel

Improved Online Algorithms for Multiplexing Weighted Packets in Bounded Buffers

Abstract

Motivated by providing differentiated services in the Internet, we consider online buffer management algorithms for quality-of-service network switches. We study a multi-buffer model. Packets have values and deadlines; they arrive at a switch over time. The switch consists of multiple buffers whose sizes are bounded. In each time step, only one pending packet can be sent. Our objective is to maximize the total value of the packets sent by their deadlines. We employ competitive analysis to measure an online algorithm’s performance. In this paper, we first show that the lower bound of competitive ratio of a broad family of online algorithms is 2. Then we propose a (\(3 + \sqrt{3} \approx 4.723\))-competitive deterministic algorithm, which is improved from the previously best-known result 9.82 (Azar and Levy. SWAT 2006).

Fei Li

Latency Constrained Aggregation in Chain Networks Admits a PTAS

Abstract

This paper studies the aggregation of messages in networks that consist of a chain of nodes, and each message is time-constrained such that it needs to be aggregated during a given time interval, called its due interval. The objective is to minimize the maximum cost incurred at any node, which is for example a concern in wireless sensor networks, where it is crucial to distribute the energy consumption as equally as possible. First, we settle the complexity of this problem by proving its NP-hardness, even for the case of unit length due intervals. Second, we give a QPTAS, which we extend to a PTAS for the special case that the lengths of the due intervals are constants. This is in particular interesting, since we prove that this problem becomes APX-hard if we consider tree networks instead of chain networks, even for the case of unit length due intervals. Specifically, we show that it cannot be approximated within 4/3 − ε for any ε> 0 , unless P=NP.

Tim Nonner, Alexander Souza

Cutting a Cake for Five People

Abstract

Given a heterogeneous cake and 5 players, we give the first bounded algorithm for computing an envy-free division of the cake, such that each person thinks he gets the largest piece. The case with 4 players was solved in a famous paper by Brams et al. in 1997. Our algorithm can be discretized to obtain an ε envy-free division in \(O({\rm polylog} \left( 1 / \epsilon \right))\) time. The algorithm is based on augmenting the irrevocable advantage graph in a new way.

We also look at the open problem of finding discrete procedures for computing envy-free division among 4 players. We present a simple algorithm that finds an envy-free division of a portion of the cake, such that each player gets at least 1/4 of the whole cake (in his valuation).

Amin Saberi, Ying Wang

PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications

Abstract

This paper presents PLDA, our parallel implementation of Latent Dirichlet Allocation on MPI and MapReduce. PLDA smooths out storage and computation bottlenecks and provides fault recovery for lengthy distributed computations. We show that PLDA can be applied to large, real-world applications and achieves good scalability. We have released MPI-PLDA to open source at http://code.google.com/p/plda under the Apache License.

Yi Wang, Hongjie Bai, Matt Stanton, Wen-Yen Chen, Edward Y. Chang

On Job Scheduling with Preemption Penalties

Abstract

This paper studies the problem of online job scheduling in a model with preemption penalty introduced by Zheng et al. [11]. In such a model with preemption penalty parameter ρ, the scheduler has to pay a penalty of ρ times the weight of each aborted job. We consider two cases according to the scheduler’s knowledge of Δ (ratio of length between longest and shortest jobs). In the first case where the exact value of Δ is known at the beginning, we re-investigate the WAL algorithm of Zheng et al. and prove that it is ((1 + ρ)Δ + o(Δ))-competitive for sufficiently large Δ. In particular, when ρ= 1, the previous competitive ratio of 3Δ + o(Δ) proved in [11] is improved to 2Δ + o(Δ). In the second case where the online strategy only knows beforehand that Δ ≥ k ³(ρ + 1)³ for some parameter k > 1, a \((\frac{k(1+\rho)}{k-1}\Delta+o(\Delta))\)-competitive deterministic strategy is presented. For large Δ, the competitive ratio approaches that of WAL as k increases.

Feifeng Zheng, Yinfeng Xu, Chung Keung Poon

Springer Professional

Algorithmic Aspects in Information and Management

5th International Conference, AAIM 2009, San Francisco, CA, USA, June 15-17, 2009. Proceedings

Inhaltsverzeichnis

Frontmatter

Algorithmic Challenge in Online Advertising

Parallel Algorithms for Collaborative Filtering

On the Approximability of Some Haplotyping Problems

On Acyclicity of Games with Cycles

Discrete online TSP

On Approximating an Implicit Cover Problem in Biology

Power Indices in Spanning Connectivity Games

Efficiently Generating k-Best Solutions to Procurement Auctions

Integer Polyhedra for Program Analysis

Line Segment Facility Location in Weighted Subdivisions

Algorithms for Placing Monitors in a Flow Network

Three Results on Frequency Assignment in Linear Cellular Networks

Link Distance and Shortest Path Problems in the Plane

Orca Reduction and ContrAction Graph Clustering

Equiseparability on Terminal Wiener Index

Effective Tour Searching for TSP by Contraction of Pseudo Backbone Edges

Optimal Auctions Capturing Constraints in Sponsored Search

A Note on Estimating Hybrid Frequency Moment of Data Streams

Two-Level Push-Relabel Algorithm for the Maximum Flow Problem

A More Relaxed Model for Graph-Based Data Clustering: s-Plex Editing

Dynamic Position Auctions with Consumer Search

Nonlinear Optimization over a Weighted Independence System

Improved Online Algorithms for Multiplexing Weighted Packets in Bounded Buffers

Latency Constrained Aggregation in Chain Networks Admits a PTAS

Cutting a Cake for Five People

PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications

On Job Scheduling with Preemption Penalties

Backmatter