2003 | OriginalPaper | Chapter
Generalized k-Medians Clustering for Strings
Authors: Carlos D. Martínez-Hinarejos, Alfons Juan, Francisco Casacuberta
Published in: Pattern Recognition and Image Analysis
Publisher: Springer Berlin Heidelberg
Included in: Professional Book Archive
Clustering methods are used in pattern recognition to obtain natural groups from a data set in the framework of unsupervised learning, as well as to obtain clusters of data from a known class. For sets of strings, the concept of the set median string can be extended to the (set) k-medians problem. A solution of the k-medians problem can be viewed as a clustering, where each of the k strings in the solution generates one cluster. A concept related to the set median string is the (generalized) median string, whose computation is an NP-hard problem. However, different algorithms have been proposed to find approximations to the (generalized) median string. We propose extending the (generalized) median string problem to k strings, resulting in the generalized k-medians problem, which can also be viewed as a clustering technique. This new technique is applied to a corpus of chromosomes represented by strings and compared to the conventional k-medians technique.
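To make the conventional (set) k-medians idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Levenshtein edit distance as the string dissimilarity and a simple alternating assign-and-update loop; the abstract specifies neither. All function names (`edit_distance`, `set_median`, `set_k_medians`) and the toy strings are hypothetical. The sketch covers only the set variant, in which each median must be a member of its cluster. The generalized variant would replace `set_median` with an approximate search over arbitrary strings, which is not shown here.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed dissimilarity; unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def set_median(strings: Sequence[str]) -> str:
    """Member of `strings` minimizing the summed distance to all members."""
    return min(strings,
               key=lambda c: sum(edit_distance(c, s) for s in strings))


def set_k_medians(strings: Sequence[str],
                  initial_medians: Sequence[str],
                  max_iter: int = 20) -> Tuple[List[str], List[List[str]]]:
    """Alternate between assigning strings to the nearest median and
    recomputing each cluster's set median, until the medians stop changing."""
    medians = list(initial_medians)
    clusters: List[List[str]] = []
    for _ in range(max_iter):
        # Assignment step: each string joins the cluster of its nearest median.
        clusters = [[] for _ in medians]
        for s in strings:
            nearest = min(range(len(medians)),
                          key=lambda i: edit_distance(s, medians[i]))
            clusters[nearest].append(s)
        # Update step: an empty cluster keeps its previous median.
        new_medians = [set_median(c) if c else m
                       for c, m in zip(clusters, medians)]
        if new_medians == medians:
            break
        medians = new_medians
    return medians, clusters


# Toy data (hypothetical), with two visibly distinct groups.
data = ["AAAA", "AAAB", "AABA", "CCCC", "CCCD", "CDCC"]
medians, clusters = set_k_medians(data, initial_medians=["AAAB", "CCCD"])
print(medians)
print(clusters)
```

The result depends on the initial medians, as with any alternating scheme of this kind, so in practice several initializations would typically be tried.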