Top

KI - Künstliche Intelligenz

Published in:

10-11-2017 | Doctoral and Postdoctoral Dissertations

Randomized Primitives for Big Data Processing

Author: Morten Stöckel

Published in: KI - Künstliche Intelligenz | Issue 1/2018

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

A basic question on two pieces of data is: “What is the similarity of the data?” In this extended abstract we give an overview of new developments in randomized algorithms and data structures that relate to this question. In particular we provide new state of the art methods in three particular settings, that all relate to the computation of intersection sizes:

We give a new space-efficient summary data structure for answering set intersection size queries. The new summaries are based on one-permutation min-wise hashing, and we provide a lower bound that nearly matches our new upper bound.

For sparse matrix multiplication, we give new tight bounds in the I/O model, settling the I/O complexity a natural parameterization of the problem—namely where the complexity depends on the input sparsity N, the output sparsity Z and the parameters of the I/O model. In the RAM model we give a new algorithm that exploits output sparsity and which beats previous known results for most of the parameter space.

We give a new I/O efficient algorithm to compute the similarity join between two sets: two elements are members of this join if they are close according to a specified metric. Our new algorithm is based on locality-sensitive hashing and strictly improves on previous work.

previous article Robots in Committees

next article News

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

KI - Künstliche Intelligenz

The Scientific journal "KI – Künstliche Intelligenz" is the official journal of the division for artificial intelligence within the "Gesellschaft für Informatik e.V." (GI) – the German Informatics Society - with constributions from troughout the field of artificial intelligence.

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Amossen RR, Pagh R (2009) Faster join-projects and sparse matrix multiplications. In: Proceedings of the 12th international conference on database theory, ICDT ’09, St. Petersburg, Russia, 23–25 March, pp 121–126

Broder AZ (1997) On the resemblance and containment of documents. In: Proceedings of the compression and complexity of sequences. SEQUENCES ’97. IEEE Computer Society, Washington, DC, p 21

Gall FL (2014) Powers of tensors and fast matrix multiplication. Int Symp Symb Algebraic Comput ISSAC 2014:296–303MathSciNetMATH

Jacob, R, Stöckel M (2015) Fast output-sensitive matrix multiplication. In: Algorithms, ESA 2015: 23rd annual European symposium, Proceedings. Springer, Heidelberg, pp 766–778

Jia-Wei H, Kung HT (1981) I/o complexity: the red–blue pebble game. In: Proceedings of the thirteenth annual ACM symposium on theory of computing, STOC ’81, ACM, pp 326–333

Knudsen MBT, Stöckel M (2015) Quicksort, largest bucket, and min-wise hashing with limited independence. In: Algorithms, ESA 2015, 23rd annual European symposium, Proceedings, pp 828–839

Li P, König AC (2011) Theory and applications of b-bit minwise hashing. Commun ACM 54(8):101–109CrossRef

Pagh R, Pham N, Silvestri F, Stöckel M (2015) I/O-efficient similarity join. In: Algorithms, ESA 2015: 23rd annual European symposium, Proceedings. Springer, Heidelberg, pp 941–952

Pagh R, Stöckel M (2014) The input/output complexity of sparse matrix multiplication. In: Algorithms, ESA 2014: 22th annual European symposium, Proceedings. Springer, Heidelberg, pp 750–761

10.

Pagh R, Stöckel M (2015) Association rule mining using maximum entropy. CoRR. arxiv:1501.02143

11.

Pagh R, Stöckel M, Woodruff DP (2014) Is min-wise hashing optimal for summarizing set intersection? In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS’14. ACM, pp 109–120

12.

Stöckel M (2015) Randomized primitives for big data processing. https://en.itu.dk/~/media/en/research/phd-programme/phd-defences/2015/morten-st%C3%B6ckel-phd-thesis-final-pdf.pdf?la=en

13.

Strassen V (1969) Gaussian elimination is not optimal. Numer Math 13(4):354–356MathSciNetCrossRefMATH

Title: Randomized Primitives for Big Data Processing
Author: Morten Stöckel
Publication date: 10-11-2017
Publisher: Springer Berlin Heidelberg
Published in: KI - Künstliche Intelligenz / Issue 1/2018
Print ISSN: 0933-1875
Electronic ISSN: 1610-1987
DOI: https://doi.org/10.1007/s13218-017-0515-7

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

KI - Künstliche Intelligenz

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Other articles of this Issue 1/2018

Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

News

A Quality Evaluation of Combined Search on a Knowledge Base and Text

Societal Implications of Big Data

Big data algorithms beyond machine learning

Fuzzy-Logic Controlled Genetic Algorithm for the Rail-Freight Crew-Scheduling Problem

Premium Partner