Skip to main content
Erschienen in: Knowledge and Information Systems 2/2017

22.10.2016 | Survey Paper

The (black) art of runtime evaluation: Are we comparing algorithms or implementations?

verfasst von: Hans-Peter Kriegel, Erich Schubert, Arthur Zimek

Erschienen in: Knowledge and Information Systems | Ausgabe 2/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Any paper proposing a new algorithm should come with an evaluation of efficiency and scalability (particularly when we are designing methods for “big data”). However, there are several (more or less serious) pitfalls in such evaluations. We would like to point the attention of the community to these pitfalls. We substantiate our points with extensive experiments, using clustering and outlier detection methods with and without index acceleration. We discuss what we can learn from evaluations, whether experiments are properly designed, and what kind of conclusions we should avoid. We close with some general recommendations but maintain that the design of fair and conclusive experiments will always remain a challenge for researchers and an integral part of the scientific endeavor.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
By the Twitter terms of use, we are not allowed to share this data set, unfortunately.
 
2
Hartigan’s more general model also laid the ground for density-based clustering, see the discussion by Campello et al. [26].
 
3
Müllner [56] further refines this idea, but we do not have an implementation of this.
 
4
Nevertheless, the presented observations agree with theoretical complexity results.
 
5
This issue is resolved in the next version of Shogun, after our bug report.
 
9
Also some of the involved authors have a high reputation. Our selection is not intended to harm their reputation but to put the attention of the community to these aspects of runtime evaluation. The fact that these papers by world class authors have been accepted at world class venues only shows that neither the authors nor the reviewers were sufficiently aware of the pitfalls of runtime analysis and of the limits on the conclusions such experiments support.
 
10
We experimentally measured a 500 times slower execution when using Euclidean distance implemented in the Python interpreter as opposed to the provided Euclidean distance which is compiled from Cython. There currently is no way around this overhead without integrating a just-in-time compiler: https://​github.​com/​scikit-learn/​scikit-learn/​issues/​6256.
 
Literatur
1.
Zurück zum Zitat Achtert E, Bernecker T, Kriegel H-P, Schubert E, Zimek A (2009) ELKI in time: ELKI 0.2 for the performance evaluation of distance measures for time series. In: Proceedings of the 11th international symposium on spatial and temporal databases (SSTD), Aalborg, Denmark, pp 436–440 Achtert E, Bernecker T, Kriegel H-P, Schubert E, Zimek A (2009) ELKI in time: ELKI 0.2 for the performance evaluation of distance measures for time series. In: Proceedings of the 11th international symposium on spatial and temporal databases (SSTD), Aalborg, Denmark, pp 436–440
2.
Zurück zum Zitat Achtert E, Böhm C, Kriegel H-P, Kröger P, Zimek A (2007) Robust, complete, and efficient correlation clustering. In: Proceedings of the 7th SIAM international conference on data mining (SDM), Minneapolis, MN, pp 413–418 Achtert E, Böhm C, Kriegel H-P, Kröger P, Zimek A (2007) Robust, complete, and efficient correlation clustering. In: Proceedings of the 7th SIAM international conference on data mining (SDM), Minneapolis, MN, pp 413–418
3.
Zurück zum Zitat Achtert E, Goldhofer S, Kriegel H-P, Schubert E, Zimek A (2012) Evaluation of clusterings—metrics and visual support. In: Proceedings of the 28th international conference on data engineering (ICDE), Washington, DC, pp 1285–1288 Achtert E, Goldhofer S, Kriegel H-P, Schubert E, Zimek A (2012) Evaluation of clusterings—metrics and visual support. In: Proceedings of the 28th international conference on data engineering (ICDE), Washington, DC, pp 1285–1288
4.
Zurück zum Zitat Achtert E, Hettab A, Kriegel H-P, Schubert E, Zimek A (2011) Spatial outlier detection: data, algorithms, visualizations. In: Proceedings of the 12th international symposium on spatial and temporal databases (SSTD), Minneapolis, MN, pp 512–516 Achtert E, Hettab A, Kriegel H-P, Schubert E, Zimek A (2011) Spatial outlier detection: data, algorithms, visualizations. In: Proceedings of the 12th international symposium on spatial and temporal databases (SSTD), Minneapolis, MN, pp 512–516
5.
Zurück zum Zitat Achtert E, Kriegel H-P, Reichert L, Schubert E, Wojdanowski R, Zimek A (2010) Visual evaluation of outlier detection models. In: Proceedings of the 15th international conference on database systems for advanced applications (DASFAA), Tsukuba, Japan, pp 396–399 Achtert E, Kriegel H-P, Reichert L, Schubert E, Wojdanowski R, Zimek A (2010) Visual evaluation of outlier detection models. In: Proceedings of the 15th international conference on database systems for advanced applications (DASFAA), Tsukuba, Japan, pp 396–399
6.
Zurück zum Zitat Achtert E, Kriegel H-P, Schubert E, Zimek A (2013) Interactive data mining with 3D-parallel-coordinate-trees. In: Proceedings of the ACM international conference on management of data (SIGMOD), New York City, NY, pp 1009–1012 Achtert E, Kriegel H-P, Schubert E, Zimek A (2013) Interactive data mining with 3D-parallel-coordinate-trees. In: Proceedings of the ACM international conference on management of data (SIGMOD), New York City, NY, pp 1009–1012
7.
Zurück zum Zitat Achtert E, Kriegel H-P, Zimek A (2008) ELKI: a software system for evaluation of subspace clustering algorithms. In: Proceedings of the 20th international conference on scientific and statistical database management (SSDBM), Hong Kong, China, pp 580–585 Achtert E, Kriegel H-P, Zimek A (2008) ELKI: a software system for evaluation of subspace clustering algorithms. In: Proceedings of the 20th international conference on scientific and statistical database management (SSDBM), Hong Kong, China, pp 580–585
8.
Zurück zum Zitat Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases (VLDB), Santiago de Chile, Chile, pp 487–499 Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases (VLDB), Santiago de Chile, Chile, pp 487–499
9.
Zurück zum Zitat Alsabti K, Ranka S, Singh V (1998) An efficient k-means clustering algorithm. In: Proceedings of IPPS/SPDP workshop on high performance data mining Alsabti K, Ranka S, Singh V (1998) An efficient k-means clustering algorithm. In: Proceedings of IPPS/SPDP workshop on high performance data mining
10.
Zurück zum Zitat Anderberg MR (1973) Cluster analysis for applications. Probability and mathematical statistics. Academic Press, CambridgeMATH Anderberg MR (1973) Cluster analysis for applications. Probability and mathematical statistics. Academic Press, CambridgeMATH
14.
Zurück zum Zitat Arthur D, Vassilvitskii S (2007) k-means\(++\): the advantages of careful seeding. In: Proceedings of the 18th Annual ACM-SIAM symposium on discrete algorithms (SODA), New Orleans, LA, pp 1027–1035 Arthur D, Vassilvitskii S (2007) k-means\(++\): the advantages of careful seeding. In: Proceedings of the 18th Annual ACM-SIAM symposium on discrete algorithms (SODA), New Orleans, LA, pp 1027–1035
15.
Zurück zum Zitat Arya S, Mount DM (1993) Approximate nearest neighbor queries in fixed dimensions. In: Proceedings of the 4th annual ACM/SIGACT-SIAM symposium on discrete algorithms (SODA), Austin, TX, pp 271–280 Arya S, Mount DM (1993) Approximate nearest neighbor queries in fixed dimensions. In: Proceedings of the 4th annual ACM/SIGACT-SIAM symposium on discrete algorithms (SODA), Austin, TX, pp 271–280
16.
Zurück zum Zitat Bayardo Jr RJ, Goethals B, Zaki MJ (eds) (2005) FIMI ’04, Proceedings of the IEEE ICDM workshop on frequent itemset mining implementations, Brighton, UK, November 1, 2004, volume 126 of CEUR Workshop Proceedings. CEUR-WS.org Bayardo Jr RJ, Goethals B, Zaki MJ (eds) (2005) FIMI ’04, Proceedings of the IEEE ICDM workshop on frequent itemset mining implementations, Brighton, UK, November 1, 2004, volume 126 of CEUR Workshop Proceedings. CEUR-WS.org
17.
Zurück zum Zitat Beckmann N, Kriegel H-P, Schneider R, Seeger B (1990) The R*-tree: an efficient and robust access method for points and rectangles. In: Proceedings of the ACM international conference on management of data (SIGMOD), Atlantic City, NJ, pp 322–331 Beckmann N, Kriegel H-P, Schneider R, Seeger B (1990) The R*-tree: an efficient and robust access method for points and rectangles. In: Proceedings of the ACM international conference on management of data (SIGMOD), Atlantic City, NJ, pp 322–331
18.
Zurück zum Zitat Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18(9):509–517CrossRefMATH Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18(9):509–517CrossRefMATH
19.
Zurück zum Zitat Beygelzimer A, Kakade S, Langford J (2006) Cover trees for nearest neighbors. In: Proceedings of the 23rd international conference on machine learning (ICML), Pittsburgh, PA, pp 97–104 Beygelzimer A, Kakade S, Langford J (2006) Cover trees for nearest neighbors. In: Proceedings of the 23rd international conference on machine learning (ICML), Pittsburgh, PA, pp 97–104
20.
21.
Zurück zum Zitat Bock H (2007) Clustering methods: a history of k-means algorithms. In: Brito P, Cucumel G, Bertrand P, Carvalho F (eds) Selected contributions in data analysis and classification. Springer, Berlin, pp 161–172CrossRef Bock H (2007) Clustering methods: a history of k-means algorithms. In: Brito P, Cucumel G, Bertrand P, Carvalho F (eds) Selected contributions in data analysis and classification. Springer, Berlin, pp 161–172CrossRef
22.
Zurück zum Zitat Bodon F (2003) A fast APRIORI implementation. In: Proceedings of the ICDM workshop on frequent itemset mining implementations (FIMI ’03), Melbourne, Florida, USA Bodon F (2003) A fast APRIORI implementation. In: Proceedings of the ICDM workshop on frequent itemset mining implementations (FIMI ’03), Melbourne, Florida, USA
23.
Zurück zum Zitat Borgelt C (2003) Efficient implementations of Apriori and Eclat. In: Proceedings of the ICDM workshop on frequent itemset mining implementations (FIMI ’03), Melbourne, Florida, USA Borgelt C (2003) Efficient implementations of Apriori and Eclat. In: Proceedings of the ICDM workshop on frequent itemset mining implementations (FIMI ’03), Melbourne, Florida, USA
24.
Zurück zum Zitat Breunig MM, Kriegel H-P, Ng R, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the ACM international conference on management of data (SIGMOD), Dallas, TX, pp 93–104 Breunig MM, Kriegel H-P, Ng R, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the ACM international conference on management of data (SIGMOD), Dallas, TX, pp 93–104
25.
Zurück zum Zitat Budak C, Georgiou T, Agrawal D, El Abbadi A (2013) GeoScope: online detection of geo-correlated information trends in social networks. Proc VLDB Endow 7(4):229–240CrossRef Budak C, Georgiou T, Agrawal D, El Abbadi A (2013) GeoScope: online detection of geo-correlated information trends in social networks. Proc VLDB Endow 7(4):229–240CrossRef
26.
Zurück zum Zitat Campello RJGB, Moulavi D, Zimek A, Sander J (2015) Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans Knowl Discov Data (TKDD) 10(1):5:1–51 Campello RJGB, Moulavi D, Zimek A, Sander J (2015) Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans Knowl Discov Data (TKDD) 10(1):5:1–51
27.
Zurück zum Zitat Campos GO, Zimek A, Sander J, Campello RJGB, Micenková B, Schubert E, Assent I, Houle ME (2016) On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Discov 30:891–927MathSciNetCrossRef Campos GO, Zimek A, Sander J, Campello RJGB, Micenková B, Schubert E, Assent I, Houle ME (2016) On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Discov 30:891–927MathSciNetCrossRef
28.
Zurück zum Zitat Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In: Proceedings of the 23rd international conference on very large data bases (VLDB), Athens, Greece, pp 426–435 Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In: Proceedings of the 23rd international conference on very large data bases (VLDB), Athens, Greece, pp 426–435
29.
Zurück zum Zitat Cordeiro RLF, Traina AJM, Faloutsos C, Traina C Jr (2013) Halite: fast and scalable multiresolution local-correlation clustering. IEEE Trans Knowl Data Eng 25(2):387–401CrossRef Cordeiro RLF, Traina AJM, Faloutsos C, Traina C Jr (2013) Halite: fast and scalable multiresolution local-correlation clustering. IEEE Trans Knowl Data Eng 25(2):387–401CrossRef
30.
Zurück zum Zitat Eaton JW, Bateman D, Hauberg S, Wehbring R (2014) GNU Octave version 3.8.1 manual: a high-level interactive language for numerical computations. CreateSpace Independent Publishing Platform Eaton JW, Bateman D, Hauberg S, Wehbring R (2014) GNU Octave version 3.8.1 manual: a high-level interactive language for numerical computations. CreateSpace Independent Publishing Platform
31.
Zurück zum Zitat Elkan C (2003) Using the triangle inequality to accelerate k-means. In: Proceedings of the 20th international conference on machine learning (ICML), Washington, DC, pp 147–153 Elkan C (2003) Using the triangle inequality to accelerate k-means. In: Proceedings of the 20th international conference on machine learning (ICML), Washington, DC, pp 147–153
32.
Zurück zum Zitat Eppstein D (1998) Fast hierarchical clustering and other applications of dynamic closest pairs. In: Proceedings of the 9th annual ACM-SIAM symposium on discrete algorithms (SODA), San Francisco, CA, pp 619–628 Eppstein D (1998) Fast hierarchical clustering and other applications of dynamic closest pairs. In: Proceedings of the 9th annual ACM-SIAM symposium on discrete algorithms (SODA), San Francisco, CA, pp 619–628
33.
Zurück zum Zitat Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd ACM international conference on knowledge discovery and data mining (KDD), Portland, OR, pp 226–231 Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd ACM international conference on knowledge discovery and data mining (KDD), Portland, OR, pp 226–231
34.
Zurück zum Zitat Forgy EW (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21:768–769 Forgy EW (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21:768–769
35.
Zurück zum Zitat Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu C, Tseng VS (2014) SPMF: a Java open-source pattern mining library. J Mach Learn Res 15(1):3389–3393MATH Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu C, Tseng VS (2014) SPMF: a Java open-source pattern mining library. J Mach Learn Res 15(1):3389–3393MATH
36.
Zurück zum Zitat Färber I, Günnemann S, Kriegel H-P, Kröger P, Müller E, Schubert E, Seidl T, Zimek A (2010) On using class-labels in evaluation of clusterings. In: MultiClust: 1st International workshop on discovering, summarizing and using multiple clusterings held in conjunction with KDD 2010, Washington, DC Färber I, Günnemann S, Kriegel H-P, Kröger P, Müller E, Schubert E, Seidl T, Zimek A (2010) On using class-labels in evaluation of clusterings. In: MultiClust: 1st International workshop on discovering, summarizing and using multiple clusterings held in conjunction with KDD 2010, Washington, DC
37.
Zurück zum Zitat Gan J, Tao Y (2015) DBSCAN revisited: mis-claim, un-fixability, and approximation. In: Proceedings of the ACM international conference on management of data (SIGMOD), Melbourne, Australia, pp 519–530 Gan J, Tao Y (2015) DBSCAN revisited: mis-claim, un-fixability, and approximation. In: Proceedings of the ACM international conference on management of data (SIGMOD), Melbourne, Australia, pp 519–530
38.
Zurück zum Zitat Geusebroek JM, Burghouts GJ, Smeulders AWM (2005) The Amsterdam library of object images. Int J Comput Vis 61(1):103–112CrossRef Geusebroek JM, Burghouts GJ, Smeulders AWM (2005) The Amsterdam library of object images. Int J Comput Vis 61(1):103–112CrossRef
39.
Zurück zum Zitat Goethals B, Zaki MJ, (eds) (2003) FIMI ’03, Frequent Itemset Mining Implementations, Proceedings of the ICDM 2003 workshop on frequent itemset mining implementations, 19 December 2003, Melbourne, Florida, USA, volume 90 of CEUR workshop proceedings. CEUR-WS.org Goethals B, Zaki MJ, (eds) (2003) FIMI ’03, Frequent Itemset Mining Implementations, Proceedings of the ICDM 2003 workshop on frequent itemset mining implementations, 19 December 2003, Melbourne, Florida, USA, volume 90 of CEUR workshop proceedings. CEUR-WS.org
41.
Zurück zum Zitat Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor 11(1):10–18CrossRef Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor 11(1):10–18CrossRef
42.
Zurück zum Zitat Hamerly G (2010) Making k-means even faster. In: Proceedings of the 10th SIAM international conference on data mining (SDM), Columbus, OH, pp 130–140 Hamerly G (2010) Making k-means even faster. In: Proceedings of the 10th SIAM international conference on data mining (SDM), Columbus, OH, pp 130–140
43.
Zurück zum Zitat Hamerly G, Drake J (2015) Accelerating Lloyd’s algorithm for k-means clustering. In: Celebi ME (ed) Partitional clustering algorithms, chapter 2. Springer, Switzerland, pp 41–78 Hamerly G, Drake J (2015) Accelerating Lloyd’s algorithm for k-means clustering. In: Celebi ME (ed) Partitional clustering algorithms, chapter 2. Springer, Switzerland, pp 41–78
44.
Zurück zum Zitat Hamerly G, Elkan C (2003) Learning the k in k-means. In: Proceedings of the Annual conference on neural information processing systems (NIPS), Vancouver, BC, pp 281–288 Hamerly G, Elkan C (2003) Learning the k in k-means. In: Proceedings of the Annual conference on neural information processing systems (NIPS), Vancouver, BC, pp 281–288
45.
Zurück zum Zitat Hartigan JA (1975) Clustering algorithms. Wiley, New YorkMATH Hartigan JA (1975) Clustering algorithms. Wiley, New YorkMATH
46.
Zurück zum Zitat Hartigan JA, Wong MA (1979) Algorithm AS 136: A k-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28(1):100–108 Hartigan JA, Wong MA (1979) Algorithm AS 136: A k-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28(1):100–108
47.
Zurück zum Zitat Jones E, Oliphant T, Peterson P et al (2001) SciPy: open source scientific tools for Python Jones E, Oliphant T, Peterson P et al (2001) SciPy: open source scientific tools for Python
48.
Zurück zum Zitat Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2002) An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892CrossRefMATH Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2002) An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892CrossRefMATH
49.
Zurück zum Zitat Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New YorkCrossRefMATH Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New YorkCrossRefMATH
50.
Zurück zum Zitat Kriegel H-P, Kröger P, Sander J, Zimek A (2011) Density-based clustering. Wiley Interdiscip Rev Data Min Knowl Discov 1(3):231–240CrossRef Kriegel H-P, Kröger P, Sander J, Zimek A (2011) Density-based clustering. Wiley Interdiscip Rev Data Min Knowl Discov 1(3):231–240CrossRef
51.
Zurück zum Zitat Leutenegger ST, Edgington JM, Lopez MA (1997) STR: a simple and efficient algorithm for R-tree packing. In: Proceedings of the 13th international conference on data engineering (ICDE), Birmingham, UK, pp 497–506 Leutenegger ST, Edgington JM, Lopez MA (1997) STR: a simple and efficient algorithm for R-tree packing. In: Proceedings of the 13th international conference on data engineering (ICDE), Birmingham, UK, pp 497–506
53.
Zurück zum Zitat MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: 5th Berkeley symposium on mathematics, statistics, and probabilistics, vol 1, pp 281–297 MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: 5th Berkeley symposium on mathematics, statistics, and probabilistics, vol 1, pp 281–297
54.
Zurück zum Zitat Mahran S, Mahar K (2008) Using grid for accelerating density-based clustering. In: Proceedings of 8th IEEE international conference on computer and information technology, CIT 2008, Sydney, Australia, pp 35–40 Mahran S, Mahar K (2008) Using grid for accelerating density-based clustering. In: Proceedings of 8th IEEE international conference on computer and information technology, CIT 2008, Sydney, Australia, pp 35–40
55.
Zurück zum Zitat Murtagh F (1985) A survey of algorithms for contiguity-constrained clustering and related problems. Comput J 28(1):82–88CrossRef Murtagh F (1985) A survey of algorithms for contiguity-constrained clustering and related problems. Comput J 28(1):82–88CrossRef
57.
Zurück zum Zitat Nijssen S, Kok JN (2006) Frequent subgraph miners: runtimes don’t say everything. In: Proceedings of the 4th workshop on mining and learning with graphs (MLG), Berlin, Germany, pp 173–180 Nijssen S, Kok JN (2006) Frequent subgraph miners: runtimes don’t say everything. In: Proceedings of the 4th workshop on mining and learning with graphs (MLG), Berlin, Germany, pp 173–180
59.
Zurück zum Zitat Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830MathSciNetMATH Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830MathSciNetMATH
60.
Zurück zum Zitat Pelleg D, Moore A (1999) Accelerating exact k-means algorithms with geometric reasoning. In: Proceedings of the 5th ACM international conference on knowledge discovery and data mining (SIGKDD), San Diego, CA, pp 277–281 Pelleg D, Moore A (1999) Accelerating exact k-means algorithms with geometric reasoning. In: Proceedings of the 5th ACM international conference on knowledge discovery and data mining (SIGKDD), San Diego, CA, pp 277–281
61.
Zurück zum Zitat Pelleg D, Moore A (2000) X-means: extending k-means with efficient estimation of the number of clusters. In: Proceedings of the 17th international conference on machine learning (ICML), Stanford University, CA, vol 1, pp 727–734 Pelleg D, Moore A (2000) X-means: extending k-means with efficient estimation of the number of clusters. In: Proceedings of the 17th international conference on machine learning (ICML), Stanford University, CA, vol 1, pp 727–734
62.
Zurück zum Zitat Phillips SJ (2002) Acceleration of k-means and related clustering algorithms. In: The 4th international workshop on algorithm engineering and experiments (ALENEX) 2002, San Francisco, CA, pp 166–177 Phillips SJ (2002) Acceleration of k-means and related clustering algorithms. In: The 4th international workshop on algorithm engineering and experiments (ALENEX) 2002, San Francisco, CA, pp 166–177
63.
Zurück zum Zitat Prim RC (1957) Shortest connection networks and some generalizations. Bell Syst Tech J 36(6):1389–1401CrossRef Prim RC (1957) Shortest connection networks and some generalizations. Bell Syst Tech J 36(6):1389–1401CrossRef
66.
Zurück zum Zitat Rohlf FJ (1973) Algorithm 76: hierarchical clustering using the minimum spanning tree. Comput J 16(1):93–95 Rohlf FJ (1973) Algorithm 76: hierarchical clustering using the minimum spanning tree. Comput J 16(1):93–95
67.
Zurück zum Zitat Sander J, Ester M, Kriegel H-P, Xu X (1998) Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Min Knowl Discov 2(2):169–194CrossRef Sander J, Ester M, Kriegel H-P, Xu X (1998) Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Min Knowl Discov 2(2):169–194CrossRef
68.
Zurück zum Zitat Schubert E, Koos A, Emrich T, Züfle A, Schmid KA, Zimek A (2015) A framework for clustering uncertain data. Proc VLDB Endow 8(12):1976–1979CrossRef Schubert E, Koos A, Emrich T, Züfle A, Schmid KA, Zimek A (2015) A framework for clustering uncertain data. Proc VLDB Endow 8(12):1976–1979CrossRef
69.
Zurück zum Zitat Schubert E, Zimek A, Kriegel H-P (2013) Geodetic distance queries on R-trees for indexing geographic data. In: Proceedings of the 13th international symposium on spatial and temporal databases (SSTD), Munich, Germany, pp 146–164 Schubert E, Zimek A, Kriegel H-P (2013) Geodetic distance queries on R-trees for indexing geographic data. In: Proceedings of the 13th international symposium on spatial and temporal databases (SSTD), Munich, Germany, pp 146–164
70.
Zurück zum Zitat Schubert E, Zimek A, Kriegel H-P (2014) Generalized outlier detection with flexible kernel density estimates. In: Proceedings of the 14th SIAM international conference on data mining (SDM), Philadelphia, PA, pp 542–550 Schubert E, Zimek A, Kriegel H-P (2014) Generalized outlier detection with flexible kernel density estimates. In: Proceedings of the 14th SIAM international conference on data mining (SDM), Philadelphia, PA, pp 542–550
71.
Zurück zum Zitat Schubert E, Zimek A, Kriegel H-P (2014) Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Discov 28(1):190–237MathSciNetCrossRefMATH Schubert E, Zimek A, Kriegel H-P (2014) Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Discov 28(1):190–237MathSciNetCrossRefMATH
72.
Zurück zum Zitat Sculley D (2010) Web-scale k-means clustering. In: Proceedings of the 19th international conference on world wide web (WWW), Raleigh, NC, pp 1177–1178 Sculley D (2010) Web-scale k-means clustering. In: Proceedings of the 19th international conference on world wide web (WWW), Raleigh, NC, pp 1177–1178
73.
Zurück zum Zitat Sibson R (1973) SLINK: an optimally efficient algorithm for the single-link cluster method. Comput J 16(1):30–34MathSciNetCrossRef Sibson R (1973) SLINK: an optimally efficient algorithm for the single-link cluster method. Comput J 16(1):30–34MathSciNetCrossRef
74.
Zurück zum Zitat Šidlauskas D, Jensen CS (2014) Spatial joins in main memory: implementation matters!. Proc VLDB Endow 8(1):97–100CrossRef Šidlauskas D, Jensen CS (2014) Spatial joins in main memory: implementation matters!. Proc VLDB Endow 8(1):97–100CrossRef
75.
Zurück zum Zitat Slonim N, Aharoni E, Crammer K (2013) Hartigan’s k-means versus Lloyd’s k-means-is it time for a change? In: Proceedings of the 23rd international joint conference on artificial intelligence (IJCAI), Beijing, China Slonim N, Aharoni E, Crammer K (2013) Hartigan’s k-means versus Lloyd’s k-means-is it time for a change? In: Proceedings of the 23rd international joint conference on artificial intelligence (IJCAI), Beijing, China
76.
Zurück zum Zitat Sneath PHA (1957) The application of computers to taxonomy. J Gen Microbiol 17:201–226CrossRef Sneath PHA (1957) The application of computers to taxonomy. J Gen Microbiol 17:201–226CrossRef
77.
Zurück zum Zitat Sonnenburg S, Rätsch G, Henschel S, Widmer C, Behr J, Zien A, De Bona F, Binder A, Gehl C, Franc V (2010) The SHOGUN machine learning toolbox. J Mach Learn Res 11:1799–1802MATH Sonnenburg S, Rätsch G, Henschel S, Widmer C, Behr J, Zien A, De Bona F, Binder A, Gehl C, Franc V (2010) The SHOGUN machine learning toolbox. J Mach Learn Res 11:1799–1802MATH
78.
Zurück zum Zitat Sowell B, Salles MAV, Cao T, Demers AJ, Gehrke J (2013) An experimental analysis of iterated spatial joins in main memory. Proc VLDB Endow 6(14):1882–1893CrossRef Sowell B, Salles MAV, Cao T, Demers AJ, Gehrke J (2013) An experimental analysis of iterated spatial joins in main memory. Proc VLDB Endow 6(14):1882–1893CrossRef
79.
Zurück zum Zitat Steinbach M, Karypis G, Kumar V (2000) A comparison of document clustering techniques. In: KDD workshop on text mining, vol 400, pp 525–526 Steinbach M, Karypis G, Kumar V (2000) A comparison of document clustering techniques. In: KDD workshop on text mining, vol 400, pp 525–526
80.
Zurück zum Zitat Steinhaus H (1956) Sur la division des corp materiels en parties. Bull Acad Pol Sci 1:801–804MathSciNetMATH Steinhaus H (1956) Sur la division des corp materiels en parties. Bull Acad Pol Sci 1:801–804MathSciNetMATH
83.
Zurück zum Zitat Vreeken J, Tatti N (2014) Interesting patterns. In: Aggarwal CC, Han J (eds) Frequent pattern mining, chapter 5. Springer, pp 105–134 Vreeken J, Tatti N (2014) Interesting patterns. In: Aggarwal CC, Han J (eds) Frequent pattern mining, chapter 5. Springer, pp 105–134
84.
85.
Zurück zum Zitat Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, BurlingtonMATH Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, BurlingtonMATH
86.
Zurück zum Zitat Wolpert DH (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8(7):1341–1390CrossRef Wolpert DH (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8(7):1341–1390CrossRef
87.
Zurück zum Zitat Wörlein M, Meinl T, Fischer I, Philippsen M (2005) A quantitative comparison of the subgraph miners MoFa, gSpan, FFSM, and Gaston. In: Proceedings of the 9th European conference on principles and practice of knowledge discovery in databases PKDD), Porto, Portugal, pp 392–403 Wörlein M, Meinl T, Fischer I, Philippsen M (2005) A quantitative comparison of the subgraph miners MoFa, gSpan, FFSM, and Gaston. In: Proceedings of the 9th European conference on principles and practice of knowledge discovery in databases PKDD), Porto, Portugal, pp 392–403
88.
Zurück zum Zitat Yu C, Ooi BC, Tan K-L, Jagadish V (2001) Indexing the distance: an efficient method to KNN processing. In: Proceedings of the 27th international conference on very large data bases (VLDB), Roma, Italy, pp 421–430 Yu C, Ooi BC, Tan K-L, Jagadish V (2001) Indexing the distance: an efficient method to KNN processing. In: Proceedings of the 27th international conference on very large data bases (VLDB), Roma, Italy, pp 421–430
89.
Zurück zum Zitat Zheng Z, Kohavi R, Mason L (2001) Real world performance of association rule algorithms. In: Proceedings of the 7th ACM international conference on knowledge discovery and data mining (SIGKDD), San Francisco, CA, pp 401–406 Zheng Z, Kohavi R, Mason L (2001) Real world performance of association rule algorithms. In: Proceedings of the 7th ACM international conference on knowledge discovery and data mining (SIGKDD), San Francisco, CA, pp 401–406
Metadaten
Titel
The (black) art of runtime evaluation: Are we comparing algorithms or implementations?
verfasst von
Hans-Peter Kriegel
Erich Schubert
Arthur Zimek
Publikationsdatum
22.10.2016
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 2/2017
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-016-1004-2

Weitere Artikel der Ausgabe 2/2017

Knowledge and Information Systems 2/2017 Zur Ausgabe