Description: Clustering is one of the most important data mining techniques for extracting useful information from microarray data. Microarray data sets can be clustered either by samples or by genes; this research focuses on the gene clustering problem. The objective of gene clustering is to group together genes with similar expression patterns, based on the common belief that such genes often have similar functions, participate in a particular pathway, or respond to a common environmental stimulus. Although hundreds of clustering algorithms exist, the very simple Kmeans algorithm and its variants remain among the most widely used for gene clustering by biologists and practitioners. This surprising fact may be attributed to their particular ease of implementation and use. When microarray data are normalized to zero mean and unit norm, a variant of Kmeans that works directly with the normalized data is more suitable. Because the normalized data points lie on a unit hypersphere, this variant is called the Spherical Kmeans algorithm (SPKmeans).
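
The normalization and clustering steps described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this research. It assumes rows are genes and columns are samples, uses cosine similarity (the dot product of unit vectors) for assignment, and re-normalizes each centroid to unit length after every update. The function names `normalize_rows` and `spherical_kmeans`, the parameters `n_iter` and `seed`, and the farthest-first initialization are all illustrative assumptions.

```python
import numpy as np


def normalize_rows(X):
    """Center each row (gene profile) to zero mean, then scale it to unit L2 norm."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # constant profiles stay as zero vectors
    return X / norms


def spherical_kmeans(X, k, n_iter=100, seed=0):
    """Illustrative spherical k-means; returns (labels, unit-norm centroids)."""
    rng = np.random.default_rng(seed)
    X = normalize_rows(X)
    n = len(X)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of rows")

    # Assumed initialization (farthest-first): start from a random point, then
    # repeatedly add the point least similar to the centroids chosen so far.
    chosen = [int(rng.integers(n))]
    for _ in range(1, k):
        max_sim = (X @ X[chosen].T).max(axis=1)
        max_sim[chosen] = np.inf  # never pick the same point twice
        chosen.append(int(max_sim.argmin()))
    centroids = X[chosen].copy()

    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Assignment step: on the unit sphere, the dot product is the cosine similarity.
        new_labels = (X @ centroids.T).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels

        # Update step: sum the members of each cluster and project back onto the sphere.
        for j in range(k):
            total = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # keep the old centroid for empty or degenerate clusters
                centroids[j] = total / norm
    return labels, centroids


if __name__ == "__main__":
    # Toy data: three rising and three falling profiles at different scales.
    toy = np.array([
        [1, 2, 3, 4], [2, 4, 6, 8], [10, 11, 12, 13],
        [4, 3, 2, 1], [8, 6, 4, 2], [13, 12, 11, 10],
    ])
    toy_labels, _ = spherical_kmeans(toy, k=2)
    print(toy_labels)
```

Because each profile is centered and scaled first, genes that differ only in baseline level or amplitude map to the same point on the hypersphere, so the grouping reflects the shape of the expression pattern rather than its magnitude.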
