The availability of large microarray data has brought along many challenges for biological data mining. Following Cheng and Church , many different biclustering methods have been widely used to find appropriate subsets of experimental conditions. Still no paper directly optimizes or bounds the Mean Squared Residue (MSR) originally suggested by Cheng and Church. Their algorithm, for a given expression matrix
and an upper bound on MSR, finds
almost non overlapping biclusters whose sizes are not predefined thus making it difficult to compare with other methods.
In this paper, we propose two new Mean Squared Residue (MSR) based biclustering methods. The first method is a dual biclustering algorithm which finds (
)-bicluster with MSR using a greedy approach. The second method combines dual biclustering algorithm with quadratic programming. The dual biclustering algorithm reduces the size of the matrix, so that the quadratic program can find an optimal bicluster reasonably fast. We control bicluster overlapping by changing the penalty for reusing cells in biclusters. The average MSR in biclusterings for yeast is almost the same as for the proposed dual biclustering while the median MSR is 1.5 times larger thus implying that the quadratic program finds much better smaller biclusters.