research-article

A unified framework of density-based clustering for semi-supervised classification

Authors:
Jadson Castro Gertrudes, University of São Paulo, São Carlos, SP, Brazil
Arthur Zimek, University of Southern Denmark, Odense, Denmark
Jörg Sander, University of Alberta, Edmonton, AB, Canada
Ricardo J. G. B. Campello, James Cook University, Townsville, QLD, Australia

SSDBM '18: Proceedings of the 30th International Conference on Scientific and Statistical Database Management
July 2018
Article No.: 11
Pages 1–12
https://doi.org/10.1145/3221269.3223037

Published: 09 July 2018

ABSTRACT

Semi-supervised classification is drawing increasing attention in the era of big data, as the gap widens dramatically between the abundance of cheap, automatically collected unlabeled data and the scarcity of labeled data, which are laborious and expensive to obtain. In this paper, we introduce a unified framework for semi-supervised classification based on building blocks from density-based clustering. This framework is not only efficient and effective but also statistically sound. Experimental results on a large collection of datasets show the advantages of the proposed framework.

Index Terms

A unified framework of density-based clustering for semi-supervised classification
1. Computing methodologies
  1. Machine learning
    1. Learning settings
      1. Semi-supervised learning settings

Published in
SSDBM '18: Proceedings of the 30th International Conference on Scientific and Statistical Database Management
July 2018
314 pages
ISBN: 9781450365055
DOI: 10.1145/3221269
Conference Chair: Dimitris Sacharidis, TU Vienna
General Chair: Johann Gamper, Free University of Bozen-Bolzano
Program Chair: Michael Böhlen, University of Zurich
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
density-based clustering
semi-supervised classification
Acceptance Rates
SSDBM '18 Paper Acceptance Rate: 30 of 75 submissions, 40%
Overall Acceptance Rate: 56 of 146 submissions, 38%
Article Metrics
- Total Citations: 6
- Total Downloads: 187
- Downloads (Last 12 months): 16
- Downloads (Last 6 weeks): 2