


Parallelism in Current and Future Processors – Challenges and Support for Designing Optimal Algorithms

(Invited Talk)
Both explicitly usable and implicitly transparent parallelism are nothing new in processor technology, but they have long been restricted to high-end computer systems accessible to only a few developers. In recent years, however, parallelism on all levels has found its way into even the cheapest desktop and notebook systems, and thus every algorithm being developed today should reflect this change to optimally exploit these additional resources.
This session will outline the parallelism offered by current Intel processors and some new parallel enhancements of future architectures, including the many-core approach implemented by Larrabee and the coming improvements for data-parallel (SIMD, SSE) execution. Some of these enhancements will introduce new challenges for the algorithm designer and developer, including the massive number of available hardware threads, the increased size of vector operations, and non-uniform memory access. Intel actively looks into new parallel programming models to tackle these challenges, including Ct, Software Transactional Memory, and Concurrent Collections for C++. While these models might make it into future program development environments, multiple developer tools for parallel program development are available today as mature products – including compilers, libraries, thread checkers/debuggers, and performance analysis tools using hardware performance counters. The talk will outline some of these tools as offered by Intel and how they can facilitate the complete development cycle of parallel program development.
Heinz Bast

From Streaming B-Trees to Tokutek: How a Theoretician Learned to be VP of Engineering

(Invited Talk)
I present the cache-oblivious streaming B-tree, a high performance alternative to the traditional B-tree. Modern databases and file systems are based on B-trees. As a result, they can exhibit performance cliffs and unpredictable run times. Replacing B-trees with cache-oblivious streaming B-trees can remove some of these performance deficiencies.
I explain some of the technical issues that we needed to address to turn the streaming B-tree prototype into an industrial-strength product. Then I explain how legacy and established practice influenced our engineering decisions.
Michael A. Bender

Experimental Comparisons of Derivative Free Optimization Algorithms

(Invited Talk)
In this paper, the performances of the quasi-Newton BFGS algorithm, the NEWUOA derivative-free optimizer, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), the Differential Evolution (DE) algorithm and Particle Swarm Optimizers (PSO) are compared experimentally on benchmark functions reflecting important challenges encountered in real-world optimization problems. In particular, the dependence of performance on the conditioning of the problem and on the rotational invariance of the algorithms is investigated.
A. Auger, N. Hansen, J. M. Perez Zerpa, R. Ros, M. Schoenauer

On Computational Models for Flash Memory Devices

Flash memory-based solid-state disks are fast becoming the dominant form of end-user storage devices, partly even replacing the traditional hard-disks. Existing two-level memory hierarchy models fail to realize the full potential of flash-based storage devices. We propose two new computation models, the general flash model and the unit-cost model, for memory hierarchies involving these devices. Our models are simple enough for meaningful algorithm design and analysis. In particular, we show that a broad range of existing external-memory algorithms and data structures based on the merging paradigm can be adapted efficiently into the unit-cost model. Our experiments show that the theoretical analysis of algorithms on our models corresponds to the empirical behavior of algorithms when using solid-state disks as external memory.
Deepak Ajwani, Andreas Beckmann, Riko Jacob, Ulrich Meyer, Gabriel Moruz

Competitive Buffer Management with Stochastic Packet Arrivals

We study a variant of Naor’s [23] online packet buffering model: we are given a (non-preemptive) FIFO buffer (e.g., in a network switch or a router), and packets that request transmission arrive over time. Each packet has an intrinsic value R, and we have to decide whether to accept or reject it. In each time-step, the first packet in the buffer (if any) is transmitted, and our benefit from it equals its intrinsic value minus the time it spent in the buffer. The objective is to maximize the total benefit. From a worst-case perspective, Fiat et al. [14] gave a threshold algorithm with a competitive ratio equal to the golden ratio φ ≈ 1.618. Due to the algorithm’s insensitivity to the input, it was conjectured that this competitive ratio is too pessimistic for packet sequences occurring in practice.
In this paper, we treat this conjecture from an analytical and experimental point of view. In the analytical part, we assume Poisson arrivals and compute a threshold for this algorithm depending on the arrival rate λ and the value R of the packets. This also yields bounds on the (expected) competitive ratio of the algorithm. We discover the phenomenon that the ratio converges to one if R grows or λ moves away from one. Thus (for fixed R) we have that the largest competitive ratios occur for λ= 1. In that case, the bound is essentially \(R / (R - \sqrt{R})\) and gives values smaller than φ for R ≥ 8.
In a second, experimental part of our study, we compared the competitive ratios achieved by the two threshold algorithms on actual network traffic with our theoretical prediction (which assumes Poisson arrivals). It turns out that the prediction and the measured ratios for our threshold are consistent, where the prediction even tends to be pessimistic. Furthermore, the measured ratios with our threshold were substantially smaller than φ and almost everywhere below the ratios achieved with the threshold of [14].
Kamal Al-Bawani, Alexander Souza
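The admission policy analyzed above is simple enough to state in a few lines. The following sketch is a hypothetical slot-by-slot simulator (with the modeling assumption that arrivals in a step are processed before that step's transmission); it illustrates how a threshold bounds each packet's queueing delay and hence its loss of benefit:

```python
def simulate_threshold(arrivals, R, threshold):
    """Slot-by-slot simulation of the non-preemptive FIFO buffer with a
    threshold admission policy: a packet is accepted only while fewer than
    `threshold` packets are queued, so it waits at most threshold - 1 steps
    and its benefit R - wait stays above R - threshold.
    `arrivals[t]` is the number of packets arriving in time-step t."""
    buffer = []   # acceptance time of each queued packet (FIFO order)
    benefit = 0
    for t, k in enumerate(arrivals):
        for _ in range(k):
            if len(buffer) < threshold:   # admission test; reject otherwise
                buffer.append(t)
        if buffer:                        # transmit the head-of-line packet
            benefit += R - (t - buffer.pop(0))
    # drain the remaining queue after arrivals have stopped
    for t in range(len(arrivals), len(arrivals) + len(buffer)):
        benefit += R - (t - buffer.pop(0))
    return benefit
```

Choosing the threshold as a function of λ and R is exactly the analytical contribution of the paper; the simulator above only fixes the mechanics of the model.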

Fast and Accurate Bounds on Linear Programs

We present an algorithm that certifies the feasibility of a linear program while using rational arithmetic as little as possible. Our approach relies on computing a feasible solution of the linear program that is as far as possible from satisfying an inequality at equality. To realize such an approach, we have to detect the set of inequalities that can only be satisfied at equality.
Compared to previous approaches for this problem, our algorithm has a much higher rate of success.
Ernst Althaus, Daniel Dumitriu

Batch Dynamic Single-Source Shortest-Path Algorithms: An Experimental Study

A dynamic shortest-path algorithm is called a batch algorithm if it is able to handle graph changes that consist of multiple edge updates at a time. In this paper we focus on fully-dynamic batch algorithms for single-source shortest paths in directed graphs with positive edge weights. We give an extensive experimental study of the existing algorithms for the single-edge and the batch case, including a broad set of test instances. We further present tuned variants of the existing SWSF-FP algorithm that are up to 15 times faster than SWSF-FP. A surprising outcome of the paper is the astonishing level of data dependency of the algorithms. More detailed descriptions and further experimental results of this work can be found in [1].
Reinhard Bauer, Dorothea Wagner

Rotated-Box Trees: A Lightweight c-Oriented Bounding-Volume Hierarchy

We introduce a new type of bounding-volume hierarchy, the c-oriented rotated-box tree, or c-rb-tree for short. A c-rb-tree uses boxes as bounding volumes, whose orientations come from a fixed set of predefined box orientations. We theoretically and experimentally compare our new c-rb-tree to two existing types of bounding-volume hierarchies, namely c-dop-trees and box-trees.
Mark de Berg, Peter Hachenberger

psort, Yet Another Fast Stable Sorting Software

psort was the fastest sorting software in 2008 according to the PennySort benchmark, sorting 181 GB of data for $0.01 of computer time. This paper details its internals and the careful fitting of its architecture to the structure of modern PC-class platforms, which allows it to outperform state-of-the-art sorting software such as GNU sort or STXXL.
Paolo Bertasi, Marco Bressan, Enoch Peserico

A Heuristic for Fair Correlation-Aware Resource Placement

The configuration of network resources greatly impacts the communication overhead of data-intensive tasks and constitutes a critical problem in the design and maintenance of networks. To address the issue of resource placement, we analyze and implement a semidefinite programming-based heuristic for solving a known NP-complete graph optimization problem called Maximum Size Bounded Capacity Cut. Experimental results for our heuristic demonstrate promising performance on both synthetic and real-world data. Next, our heuristic is used as a subroutine to solve another known NP-complete problem called Min-Max Multiway Cut, whose traits we adapt to yield a resource placement scheme that exploits correlations between network resources. Our experimental results show that the resulting placement scheme achieves significant savings in communication overhead.
Raouf Boutaba, Martin Karsten, Maxwell Young

Measuring the Similarity of Geometric Graphs

What does it mean for two geometric graphs to be similar? We propose a distance for geometric graphs that we show to be a metric, and that can be computed by solving an integer linear program. We also present experiments using a heuristic distance function.
Otfried Cheong, Joachim Gudmundsson, Hyo-Sil Kim, Daria Schymura, Fabian Stehn

A Heuristic Strong Connectivity Algorithm for Large Graphs

We present a contraction-based algorithm for computing the strongly connected components of large graphs. While the worst-case complexity of the algorithm can be terrible (essentially the cost of running a DFS-based internal-memory algorithm on the entire graph), our experiments confirm that the algorithm performs remarkably well in practice. The strongest competitor is the algorithm by Sibeyn et al. [17], which is based on a semi-external DFS algorithm developed in the same paper. Our algorithm substantially outperforms the algorithm of [17] on most of the graphs used in our experiments and never performs worse. It thus demonstrates that graph contraction, which is the most important technique for solving connectivity problems on undirected graphs I/O-efficiently, can be used to solve such problems also on directed graphs, at least as a heuristic.
Adan Cosgaya-Lozano, Norbert Zeh

Pareto Paths with SHARC

Up to now, research on speed-up techniques for Dijkstra’s algorithm has focused on single-criterion scenarios. The goal was to find the quickest route within a transportation network. However, the quickest route is often not the best one. A user might be willing to accept slightly longer travel times if the cost of the journey is less. A common approach to cope with such a situation is to find Pareto-optimal (concerning other metrics than travel times) routes. Such routes have the property that, compared with any other route, each of them is better with respect to at least one metric under consideration, e.g., travel costs or number of train changes. In this work, we study multi-criteria search in road networks. On the one hand, we focus on the problem of limiting the number of Pareto paths. On the other hand, we present a multi-criteria variant of our recent SHARC algorithm.
Daniel Delling, Dorothea Wagner
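As a concrete illustration of the Pareto-optimality notion used above (not of the SHARC algorithm itself), the following sketch filters a set of candidate routes, each described by a tuple of criteria to be minimized, down to the non-dominated ones:

```python
def pareto_filter(routes):
    """Keep only the Pareto-optimal routes.  Each route is a tuple of
    criteria to be minimized (e.g. (travel_time, cost)); a route is
    dropped if another route is at least as good in every criterion
    and strictly better in at least one."""
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [r for r in routes
            if not any(dominates(o, r) for o in routes if o != r)]
```

The quadratic pairwise comparison is fine for small candidate sets; the challenge addressed in the paper is keeping the number of such non-dominated paths manageable during the search itself.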

An Application of Self-organizing Data Structures to Compression

List update algorithms have been widely used as subroutines in compression schemes, most notably as part of Burrows-Wheeler compression. The Burrows-Wheeler transform (BWT), which is the basis of many state-of-the-art general-purpose compressors, applies a compression algorithm to a permuted version of the original text. List update algorithms are a common choice for this second stage of BWT-based compression. In this paper we perform an experimental comparison of various list update algorithms, both as stand-alone compression mechanisms and as a second stage of BWT-based compression. Our experiments show that MTF outperforms other list update algorithms in practice after BWT. This is consistent with the intuition that BWT increases locality of reference, and with the result predicted by the locality-of-reference model of Angelopoulos et al. [1]. Lastly, we observe that, due to an often neglected difference in the cost models, good list update algorithms may be far from optimal for BWT compression, and we construct an explicit example of this phenomenon. This fact had yet to be supported theoretically in the literature.
Reza Dorrigiv, Alejandro López-Ortiz, J. Ian Munro
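For readers unfamiliar with the second compression stage discussed above, here is a minimal sketch of the Move-To-Front (MTF) list update rule; after a BWT, the increased locality of reference turns runs of equal symbols into runs of zeros, which a final entropy coder compresses well:

```python
def mtf_encode(text, alphabet):
    """Move-To-Front encoding: output the current list position of each
    symbol, then move that symbol to the front of the list.  Runs of
    equal symbols -- frequent after a BWT -- become runs of zeros."""
    symbols = list(alphabet)
    out = []
    for ch in text:
        i = symbols.index(ch)
        out.append(i)
        symbols.insert(0, symbols.pop(i))  # self-organizing step
    return out
```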

Scheduling Additional Trains on Dense Corridors

Every train schedule entails a certain risk of delay. When adding a new train to an existing timetable, planners have to take the expected risk of delay of the trains into account. Typically, this can be a very laborious task involving detailed simulations. We propose to predict the risk of a planned train using a series of linear regression models on the basis of extensive real world delay data of trains. We show how to integrate these models into a combinatorial shortest path model to compute a set of Pareto optimal train schedules with respect to risk and travel time. We discuss the consequences of different model choices and notions of risk with respect to the algorithmic complexity of the resulting combinatorial problems. Finally, we demonstrate the quality of our models on real world data of Swiss Federal Railways.
Holger Flier, Thomas Graffagnino, Marc Nunkesser

Broadword Computing and Fibonacci Code Speed Up Compressed Suffix Arrays

The suffix array of a string s of length n over the alphabet Σ is the permutation that gives us the lexicographic order of all suffixes of s. This popular index can be used to solve many problems in sequence analysis. In practice, one limitation of this data structure is its size of \(n \log n\) bits, while the size of the text is \(n \log|\Sigma|\) bits. For this reason compressed suffix arrays (CSAs) were introduced. The size of these CSAs is asymptotically less than or equal to the text size if the text is compressible, while maintaining \(O(\log^{\varepsilon} n)\) access time to the elements (0 < ε ≤ 1). The goal of a good CSA implementation is to provide fast access time to the elements while using minimal space for the CSA. Both access time and space depend on the choice of a self-delimiting code for compression. We show that the Fibonacci code is superior to the Elias δ code for strings that are poorly compressible. Our second contribution consists of two new broadword methods that support the decoding of Fibonacci-encoded numbers on 64-bit architectures. Furthermore, our experiments show that the use of known broadword methods speeds up the decoding of the Elias δ code for strings that are highly compressible, like XML. Finally, we provide a new efficient C++ library for succinct data structures which includes a generic CSA for further experiments.
Simon Gog
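As background for the comparison above, the following sketch produces the standard self-delimiting Fibonacci codewords (Zeckendorf representation followed by a terminating '1'); the paper's broadword contribution concerns the fast decoding of such codewords, which is not shown here:

```python
def fib_encode(n):
    """Fibonacci code of a positive integer n: write n as a sum of
    non-consecutive Fibonacci numbers (Zeckendorf's theorem), output one
    indicator bit per Fibonacci number starting from F(2) = 1, and append
    a final '1'.  Since no Zeckendorf representation contains two adjacent
    ones, the terminating '11' makes the code self-delimiting."""
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    remainder = n
    for f in reversed(fibs[:-1]):          # greedy Zeckendorf digits
        if f <= remainder:
            bits.append('1')
            remainder -= f
        else:
            bits.append('0')
    return ''.join(reversed(bits)) + '1'   # low-order digit first
```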

Speed-Up Techniques for the Selfish Step Algorithm in Network Congestion Games

Recently, many speed-up techniques have been developed for the computation of shortest paths in networks with rather static edge latencies. Very little is known about dealing with problems which rely on the computation of shortest paths in highly dynamic networks. However, with an increasing amount of traffic, static models of networks only sparsely reflect realistic scenarios. In the framework of network congestion games, the edge latencies depend on the number of users traveling on the edges. We develop speed-up techniques for the selfish step algorithm to efficiently compute (pure) Nash equilibria in network congestion games. Our approaches
- periodically compute estimations of shortest-path lengths during the run of the selfish step algorithm in order to use A* for many path computations, and
- avoid many path computations entirely or substitute them by more efficient tests.
In comparison to an implementation of the selfish step algorithm using Dijkstra’s algorithm, we improve the total running time by a factor of 4 up to 9 on highway networks and grids.
Matthias Kirschner, Philipp Schengbier, Tobias Tscheuschner

Experimental Study of Non-oblivious Greedy and Randomized Rounding Algorithms for Hypergraph b-Matching

(Extended Abstract)
We consider the b-matching problem in a hypergraph on n vertices with edge cardinality bounded by ℓ. Oblivious greedy algorithms achieve approximations of \((\sqrt{n}+1)^{-1}\) and \((\ell + 1)^{-1}\) independently of b (Krysta 2005). Randomized rounding achieves constant-factor approximations of 1 − ε for large b, namely \(b = \Omega(\varepsilon^{-2} \ln n)\) (Srivastav and Stangier 1997). Hardness of approximation results exist for b = 1 (Gonen and Lehmann 2000; Hazan, Safra, and Schwartz 2006). In the range of 1 < b ≪ ln n, no close-to-one, or even constant-factor, polynomial-time approximations are known. The aim of this paper is to overcome this algorithmic stagnation by proposing new algorithms along with the first experimental study of the b-matching problem in hypergraphs, and to provide a first theoretical analysis of these algorithms to some extent. We propose a non-oblivious greedy algorithm and a hybrid algorithm combining randomized rounding and non-oblivious greedy. Experiments on random and real-world instances suggest that the hybrid can, in terms of approximation, outperform the known techniques. The non-oblivious greedy also shows a better approximation in many cases than the oblivious one and is accessible to theoretic analysis.
Lasse Kliemann, Anand Srivastav

Empirical Evaluation of Graph Partitioning Using Spectral Embeddings and Flow

We present initial results from the first empirical evaluation of a graph partitioning algorithm inspired by the Arora-Rao-Vazirani algorithm of [5], which combines spectral and flow methods in a novel way. We have studied the parameter space of this new algorithm, e.g., examining the extent to which different parameter settings interpolate between a more spectral and a more flow-based approach, and we have compared results of this algorithm to results from previously known and optimized algorithms such as Metis.
Kevin J. Lang, Michael W. Mahoney, Lorenzo Orecchia

Univariate Algebraic Kernel and Application to Arrangements

We present a CGAL-based univariate algebraic kernel, which provides certified real-root isolation of univariate polynomials with integer coefficients and standard functionalities such as basic arithmetic operations, greatest common divisor (gcd) and square-free factorization, as well as comparison and sign evaluations of real algebraic numbers.
We compare our kernel with other comparable kernels, demonstrating the efficiency of our approach. Our experiments are performed on large data sets including polynomials of high degree (up to 2,000) and with very large coefficients (up to 25,000 bits per coefficient).
We also address the problem of computing arrangements of x-monotone polynomial curves. We apply our kernel to this problem and demonstrate its efficiency compared to previous solutions available in CGAL.
Sylvain Lazard, Luis Peñaranda, Elias Tsigaridas

Fast Algorithm for Graph Isomorphism Testing

In this paper we present a novel approach to the graph isomorphism problem. We combine a direct approach, which tries to find a mapping between the two input graphs using backtracking, with a (possibly partial) precomputation of automorphisms that allows us to prune the search tree. We propose an algorithm, conauto, that has a space complexity of \(O(n^2 \log n)\) bits. It runs in time \(O(n^5)\) with high probability if either one of the input graphs is a G(n,p) random graph, for \(p \in [\omega(\ln^4 n / (n \ln\ln n)),\, 1 - \omega(\ln^4 n / (n \ln\ln n))]\). We compare the practical performance of conauto with that of other popular algorithms on an extensive collection of problem instances. Our algorithm behaves consistently for directed, undirected, positive, and negative cases. Additionally, when it is slower than any of the other algorithms, it is only by a small factor.
José Luis López-Presa, Antonio Fernández Anta
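To make the problem setting concrete (this is a brute-force baseline, not the conauto algorithm), the following sketch tests two small graphs, given as adjacency-set dictionaries over vertices 0..n−1, for isomorphism by trying every vertex bijection; conauto prunes exactly this kind of backtracking search using precomputed automorphisms:

```python
from itertools import permutations

def isomorphic_bruteforce(adj1, adj2):
    """Decide isomorphism of two graphs given as adjacency-set dicts over
    vertices 0..n-1 by trying every vertex bijection -- factorial time,
    usable only for tiny graphs.  Each ordered pair (u, v) is checked
    separately, so directed graphs are handled as well."""
    if len(adj1) != len(adj2):
        return False
    n = len(adj1)
    for perm in permutations(range(n)):
        # perm maps a vertex u of the first graph to perm[u] in the second
        if all((v in adj1[u]) == (perm[v] in adj2[perm[u]])
               for u in range(n) for v in range(n) if u != v):
            return True
    return False
```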

Algorithms and Experiments for Clique Relaxations—Finding Maximum s-Plexes

We propose new practical algorithms to find degree-relaxed variants of cliques called s-plexes. An s-plex denotes a vertex subset in a graph inducing a subgraph where every vertex has edges to all but at most s vertices in the s-plex. Cliques are 1-plexes. In analogy to the special case of finding maximum-cardinality cliques, finding maximum-cardinality s-plexes is NP-hard. Complementing previous work, we develop combinatorial, exact algorithms, which are strongly based on methods from parameterized algorithmics. The experiments with our freely available implementation indicate the competitiveness of our approach, outperforming the previously used methods on many real-world graphs.
Hannes Moser, Rolf Niedermeier, Manuel Sorge

A Design-for-Yield Algorithm to Assess and Improve the Structural and Energetic Robustness of Proteins and Drugs

Robustness is a property that pervades all aspects of nature. The ability of a system to adapt to perturbations due to internal and external agents, aging, wear, or environmental changes is one of the driving forces of evolution. At the molecular level, understanding the robustness of a protein has a great impact on the in-silico design of polypeptide chains and drugs. The chance of computationally checking the ability of a protein to preserve its structure in the native state may lead to the design of new compounds that work in a living cell more effectively. Inspired by the well-known robustness analysis framework used in Electronic Design Automation, we introduce a formal definition of robustness for proteins and a dimensionless quantity, called yield, to quantify the robustness of a protein. We then introduce a new robustness-centered protein design algorithm called Design-For-Yield (DFY). The aim of the algorithm is to discover new conformations with a specific functionality and high yield values. We present extensive characterizations of the robustness properties of many peptides, proteins, and drugs. Finally, we apply the DFY algorithm to the Crambin protein (1CRN) and the Oxytocin drug (DB00107). The obtained results confirm that the algorithm is able to discover a Crambin-like protein that is 23.61% more robust than the wild type. For Oxytocin, a new protein sequence and the corresponding protein structure were discovered with a robustness improved by 3% at the global level.
Giuseppe Nicosia, Giovanni Stracquadanio

Multi-level Algorithms for Modularity Clustering

Modularity is a widely used quality measure for graph clusterings. Its exact maximization is prohibitively expensive for large graphs. Popular heuristics progressively merge clusters starting from singletons (coarsening), and optionally improve the resulting clustering by moving vertices between clusters (refinement). This paper experimentally compares existing and new heuristics of this type with respect to their effectiveness (achieved modularity) and runtime. For coarsening, it turns out that the most widely used criterion for merging clusters (modularity increase) is outperformed by other simple criteria, and that a recent multi-step algorithm is no improvement over simple single-step coarsening for these criteria. For refinement, a new multi-level algorithm produces significantly better clusterings than conventional single-level algorithms. A comparison with published benchmark results and algorithm implementations shows that combinations of coarsening and multi-level refinement are competitive with the best algorithms in the literature.
Andreas Noack, Randolf Rotta
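For reference, the quality measure optimized by the heuristics above can be computed directly from an edge list. A minimal sketch, assuming an undirected simple graph and a dict mapping each vertex to its cluster id:

```python
from collections import Counter

def modularity(edges, cluster_of):
    """Modularity Q of a clustering of an undirected simple graph:
    Q = sum over clusters c of  e_c/m - (d_c/(2m))^2,  where m is the
    number of edges, e_c the number of intra-cluster edges of c, and
    d_c the total degree of c's vertices."""
    m = len(edges)
    intra = Counter()    # e_c per cluster
    degree = Counter()   # d_c per cluster
    for u, v in edges:
        degree[cluster_of[u]] += 1
        degree[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            intra[cluster_of[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

Coarsening heuristics repeatedly merge the pair of clusters that most improves this sum (or a cheaper surrogate criterion), which is why the choice of merge criterion matters so much in the experiments.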

Bulk-Insertion Sort: Towards Composite Measures of Presortedness

Well-known measures of presortedness include, among others, the number of inversions needed to sort the input sequence and the minimal number of blocks of consecutive elements that remain as such in the sorted sequence. In this paper we study the possible composition of measures. For example, after determining the blocks in an input sequence, it is meaningful to measure how many inversions of the blocks are needed to finally sort the sequence. With composite measures in mind, we introduce the idea of applying bulk insertions to improve adaptive binary-tree (AVL) sorting; this is done by combining local insertion sort with bulk-insertion methods. We show that bulk-insertion sort is optimally adaptive with respect to the number of bulks and with respect to the number of inversions in the original input. As to composite measures, we define a new measure that tells how many inversions are needed when the extracted bulks form the input. Bulk-insertion sort is shown to be adaptive with respect to this measure. Our experiments show that applying bulk insertion in AVL-tree sorting considerably reduces the number of comparisons and the time needed to sort nearly sorted sequences.
Riku Saikkonen, Eljas Soisalon-Soininen
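The two classical measures mentioned above can each be stated in a few lines. A quadratic-time sketch for illustration (the blocks measure below assumes the input is a permutation of 1..n):

```python
def inversions(seq):
    """Inv(X): number of element pairs appearing out of order.
    Quadratic time -- fine for illustration, not for large inputs."""
    return sum(1 for i in range(len(seq))
                 for j in range(i + 1, len(seq))
                 if seq[i] > seq[j])

def blocks(perm):
    """Block(X) for a permutation of 1..n: minimal number of blocks of
    consecutive elements that remain intact in the sorted sequence,
    i.e. one more than the number of adjacent positions where the
    successor is not the next-larger value."""
    return 1 + sum(1 for a, b in zip(perm, perm[1:]) if b != a + 1)
```

A composite measure in the paper's sense would, for instance, apply `inversions` to the sequence of extracted blocks rather than to the raw elements.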

Computing Elevation Maxima by Searching the Gauss Sphere

The elevation function on a smoothly embedded 2-manifold in ℝ³ reflects the multiscale topography of cavities and protrusions as local maxima. The function has been useful in identifying coarse docking configurations for protein pairs. Transporting the concept from the smooth to the piecewise linear category, this paper describes an algorithm for finding all local maxima. While its worst-case running time is the same as that of the algorithm used in prior work, its performance in practice is orders of magnitude superior. We cast light on this improvement by relating the running time to the total absolute Gaussian curvature of the 2-manifold.
Bei Wang, Herbert Edelsbrunner, Dmitriy Morozov

