Stream clustering algorithms are traditionally designed to process streams efficiently and to adapt to the evolution of the underlying population. They do so without assuming any prior knowledge about the data. In many cases, however, a certain amount of domain or background knowledge is available, and instead of being used only for the external validation of the clustering results, this knowledge can guide the clustering process itself. For non-stream data, domain knowledge is exploited in the context of semi-supervised learning.
In this paper, we extend the static semi-supervised learning paradigm to streams. We present C-DenStream, a density-based clustering algorithm for data streams that includes domain information in the form of constraints. We also propose a novel method for the use of background knowledge in data streams. A performance study over a number of real and synthetic data sets demonstrates the effectiveness and efficiency of our method. To our knowledge, this is the first approach to include domain knowledge in clustering for data streams.