2018 | OriginalPaper | Chapter

Medoid-Shift for Noise Removal to Improve Clustering

Authors: Pasi Fränti, Jiawei Yang

Published in: Artificial Intelligence and Soft Computing

Publisher: Springer International Publishing

Abstract

We propose to use medoid-shift to reduce the noise in data prior to clustering. The method processes every point by calculating its k-nearest neighbors (k-NN), and then replacing the point by the medoid of its neighborhood. The process can be iterated. After the data cleaning process, any clustering algorithm can be applied that is suitable for the data.
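The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `medoid_shift`, the Euclidean distance, the brute-force neighbor search, and the choice to include the point itself in its own neighborhood are all assumptions made for the example.

```python
import math


def medoid_shift(points, k, iterations=1):
    """Replace every point by the medoid of its k-NN neighborhood.

    Assumptions (not taken from the paper): Euclidean distance,
    brute-force neighbor search, and the point itself counts as a
    member of its own neighborhood.
    """
    current = [tuple(p) for p in points]
    for _ in range(iterations):
        shifted = []
        for p in current:
            # The point itself (distance 0) plus its k nearest neighbors.
            neighborhood = sorted(current, key=lambda q: math.dist(p, q))[:k + 1]
            # Medoid: the neighborhood member with the smallest total
            # distance to all other members of the neighborhood.
            medoid = min(
                neighborhood,
                key=lambda c: sum(math.dist(c, q) for q in neighborhood),
            )
            shifted.append(medoid)
        # All points are updated together before the next iteration.
        current = shifted
    return current


# Five points forming a small cluster and one distant noise point.
noisy = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
cleaned = medoid_shift(noisy, k=3)
print(cleaned)
```

Because a medoid is always one of the existing points, the cleaned data contains only values that already occur in the input; in this toy example the distant point is pulled onto a member of the cluster. Any clustering algorithm can then be run on `cleaned` in place of `noisy`.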

Metadata
Title
Medoid-Shift for Noise Removal to Improve Clustering
Authors
Pasi Fränti
Jiawei Yang
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-91253-0_56