GEMS: Gene Expression Module Sampler - Overview
Authors: Chang-Jiun Wu and Simon Kasif
Keywords: biclustering, two-way clustering, microarray, gene expression, data mining, Gibbs sampling, module
Recent advances in high-throughput profiling of gene expression have catalyzed explosive growth in functional genomics aimed at identifying genes that are differentially expressed in different tissues or cell types across a range of experimental conditions. Traditional clustering methods such as hierarchical clustering or principal component analysis are difficult to deploy effectively for several of these tasks, since genes rarely exhibit similar expression patterns across a wide range of conditions. Biclustering (also referred to as co-clustering, two-way clustering, projective clustering, or block clustering) of gene expression data is a promising methodology for identifying groups of genes that show a coherent expression profile across a subset of conditions. Although biclustering was introduced in statistics in 1974, few robust and efficient solutions exist. Here we propose a simple but promising new approach to biclustering based on a Gibbs sampling paradigm. Our algorithm is implemented in the program GEMS (Gene Expression Module Sampler). GEMS has been tested on published leukemia data sets, as well as on synthetic data generated to evaluate the effect of noise on the performance of the algorithm. In our preliminary studies, GEMS proved to be a reliable, flexible, and computationally efficient approach for biclustering gene expression data. The biclusters it finds are candidate groups of genes that are functionally related or co-regulated by common transcription factors. The samples produced by the algorithm can potentially suggest sub-classes of diseases and can serve as a diagnostic tool.
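The point that genes in a module may be coherent only across a subset of conditions can be illustrated with a small synthetic example. The Python sketch below is not the GEMS algorithm and performs no Gibbs sampling; it only plants one bicluster in a random matrix and compares the average pairwise correlation of the module genes over all conditions with the same measure restricted to the module conditions. The matrix size, module size, noise level, and planted pattern are all assumptions chosen for illustration.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_corr(rows):
    """Average Pearson correlation over all distinct pairs of rows."""
    pairs = list(combinations(rows, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Hypothetical expression matrix: 60 genes x 40 conditions of unit Gaussian noise.
rng = random.Random(0)
n_genes, n_conditions = 60, 40
data = [[rng.gauss(0.0, 1.0) for _ in range(n_conditions)] for _ in range(n_genes)]

# Plant one bicluster: the first 15 genes share a pattern on the first 8 conditions.
module_genes = range(15)
module_conditions = range(8)
pattern = [2.0, -2.0] * 4  # illustrative shared profile, one value per module condition
for g in module_genes:
    for c, shift in zip(module_conditions, pattern):
        data[g][c] += shift

# Coherence of the module genes over every condition vs. over the module conditions only.
corr_all = mean_pairwise_corr([data[g] for g in module_genes])
corr_subset = mean_pairwise_corr(
    [[data[g][c] for c in module_conditions] for g in module_genes]
)

print(f"mean pairwise correlation, all conditions:    {corr_all:.2f}")
print(f"mean pairwise correlation, module conditions: {corr_subset:.2f}")
```

In this setup the module genes should look noticeably more coherent on the module conditions than on the full set of conditions, which is the signal that a method clustering over all conditions tends to dilute.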