Published in: Natural Computing 3/2016

01-09-2016

Detecting outliers in categorical data through rough clustering

Authors: N. N. R. Ranga Suri, M. Narasimha Murty, G. Athithan

Abstract

Outlier detection is an important data mining task with many contemporary applications. Clustering-based methods for outlier detection try to identify the data objects that deviate from the normal data. However, the uncertainty regarding the cluster membership of an outlier object has to be handled appropriately during the clustering process. Clustering data described by categorical attributes is also challenging, because suitable methods and measures for such data are difficult to define. To address these issues, this paper proposes a novel algorithm for clustering categorical data aimed at outlier detection, obtained by modifying the standard \(k\)-modes algorithm. The uncertainty in the clustering process is handled with a soft computing approach based on rough sets: the modified clustering algorithm incorporates the lower and upper approximation properties of rough sets. The efficacy of the proposed rough \(k\)-modes clustering algorithm for outlier detection is demonstrated on various benchmark categorical data sets.
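
The idea of a rough \(k\)-modes clustering can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes the simple matching dissimilarity of standard \(k\)-modes, a difference-threshold assignment rule adapted from the rough \(k\)-means of Lingras and West (parameter `epsilon`), and weighted-frequency mode updates in which lower-approximation members count `w_lower` and boundary members count `w_boundary`. All function names, parameters, default values, and the `outlier_ranking` criterion are hypothetical choices made for illustration.

```python
from collections import Counter
import random

# Hypothetical sketch of a rough k-modes procedure; not the paper's exact rules.


def matching_dissimilarity(x, mode):
    """Simple matching dissimilarity used by k-modes: count of mismatched attributes."""
    return sum(a != b for a, b in zip(x, mode))


def assign_rough(data, modes, epsilon):
    """Put each object in the upper approximation of every cluster whose distance is
    within `epsilon` of its nearest cluster; if only the nearest qualifies, the object
    also joins that cluster's lower approximation (assumed Lingras-West-style rule)."""
    k = len(modes)
    lower = [[] for _ in range(k)]
    upper = [[] for _ in range(k)]
    for idx, x in enumerate(data):
        dists = [matching_dissimilarity(x, m) for m in modes]
        nearest = min(range(k), key=dists.__getitem__)
        close = [j for j in range(k) if dists[j] - dists[nearest] <= epsilon]
        for j in close:
            upper[j].append(idx)
        if close == [nearest]:
            lower[nearest].append(idx)
    return lower, upper


def update_mode(data, lower_idx, boundary_idx, w_lower, w_boundary, old_mode):
    """Per attribute, pick the value with the highest weighted frequency; lower-
    approximation members count w_lower, boundary members count w_boundary."""
    new_mode = []
    for a in range(len(old_mode)):
        score = Counter()
        for i in lower_idx:
            score[data[i][a]] += w_lower
        for i in boundary_idx:
            score[data[i][a]] += w_boundary
        # Keep the previous value if the cluster received no objects.
        new_mode.append(max(score, key=score.get) if score else old_mode[a])
    return tuple(new_mode)


def rough_k_modes(data, k, epsilon=1, w_lower=0.7, w_boundary=0.3,
                  max_iter=20, init=None, seed=0):
    """Iterate rough assignment and mode update until the modes stop changing."""
    rng = random.Random(seed)
    seeds = init if init is not None else rng.sample(range(len(data)), k)
    modes = [tuple(data[i]) for i in seeds]
    for _ in range(max_iter):
        lower, upper = assign_rough(data, modes, epsilon)
        new_modes = []
        for j in range(k):
            in_lower = set(lower[j])
            boundary = [i for i in upper[j] if i not in in_lower]
            new_modes.append(update_mode(data, lower[j], boundary,
                                         w_lower, w_boundary, modes[j]))
        if new_modes == modes:
            break
        modes = new_modes
    lower, upper = assign_rough(data, modes, epsilon)  # assignments for the final modes
    return modes, lower, upper


def outlier_ranking(data, modes, lower):
    """Hypothetical ranking: most mismatches to the nearest mode first; ties go to
    objects that belong to no lower approximation (boundary objects)."""
    core = {i for members in lower for i in members}

    def score(i):
        d = min(matching_dissimilarity(data[i], m) for m in modes)
        return (d, i not in core)

    return sorted(range(len(data)), key=score, reverse=True)


data = [
    ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"), ("a", "x", "p"),
    ("b", "y", "r"), ("b", "y", "r"), ("b", "y", "s"), ("b", "y", "r"),
    ("c", "z", "t"),  # shares no attribute value with either group
]
modes, lower, upper = rough_k_modes(data, k=2, init=[0, 4])
print(modes)                                   # [('a', 'x', 'p'), ('b', 'y', 'r')]
print(outlier_ranking(data, modes, lower)[0])  # 8
```

On this toy data, the last object is equally far from both modes, so it lands in both upper approximations but in neither lower approximation, and the hypothetical ranking places it first. Raising `epsilon` widens the boundary regions, so more objects are treated as having uncertain cluster membership.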

Metadata

Title: Detecting outliers in categorical data through rough clustering
Authors: N. N. R. Ranga Suri, M. Narasimha Murty, G. Athithan
Publication date: 01-09-2016
Publisher: Springer Netherlands
Published in: Natural Computing / Issue 3/2016
Print ISSN: 1567-7818
Electronic ISSN: 1572-9796
DOI: https://doi.org/10.1007/s11047-015-9489-2