The Zhou Lab

Supplemental material

1. Microarray Data Used in This Study

We integrated all yeast microarray data sets available up to January 2003 that contained at least 8 experiments each, drawn from the Stanford Microarray Database (SMD) (19 cDNA array data sets corresponding to 19 SMD subcategories) and from the NCBI Gene Expression Omnibus (GEO) (4 Affymetrix data sets). We also included two cDNA array data sets (11390663, 10657304) and the Rosetta Compendium data generated by Rosetta Inpharmatics Company. With the exception of the Rosetta Compendium, each data set contains expression profiles measured under related, coherent conditions. The Rosetta Compendium includes 300 knockout and chemical treatment experiments. From these, we kept only the experiments in which the deleted gene has a known function, and we further classified this subset into 14 data sets based on the Gene Ontology (GO) biological process categories of the deleted genes. This resulted in a total of 39 data sets comprising 618 expression profiles. The data sets are described in detail below.

# Publication(PubMed ID) Platform Experiments
1 9843569 cDNA(SMD) Alpha factor release
2 9843569 cDNA(SMD) cdc15 block release
3 11102521 cDNA(SMD) DTT Exposure
4 9843569 cDNA(SMD) Elutriation
5 9843569 cDNA(SMD) Forkhead regulation
6 11102521 cDNA(SMD) Gamma radiation
7 11102521 cDNA(SMD) Menadione exposure
8 11598186 cDNA(SMD) DNA damage (MMS) response
9 11102521 cDNA(SMD) Nitrogen depletion
10 11102521 cDNA(SMD) Nutrition limitation
11 11102521 cDNA(SMD) Osmotic shock
12 11455386 cDNA(SMD) SIR proteins (Chromatin Silencing)
13 11102521 cDNA(SMD) Sorbitol effects
14 11102521 cDNA(SMD) H2O2 response
15 11102521 cDNA(SMD) Heat shock
16 11102521 cDNA(SMD) Heat steady
17 11206552 cDNA(SMD) CellCycle Factor
18 11102521 cDNA(SMD) YPD Stationary phase
19 11102521 cDNA(SMD) Zinc homeostasis
20 12875747 Affymetrix(GEO) Aging
21 14555471 Affymetrix(GEO) Chitin synthesis
22 12702272 Affymetrix(GEO) Fermentation time course
23 12370439 Affymetrix(GEO) Ume6 regulon
24 10929718 Affymetrix(Rosetta Compendium) Cell cycle control
25 10929718 Affymetrix(Rosetta Compendium) Cell wall organization
26 10929718 Affymetrix(Rosetta Compendium) Chromatin assembly
27 10929718 Affymetrix(Rosetta Compendium) Ion homeostasis
28 10929718 Affymetrix(Rosetta Compendium) Nucleotide metabolism
29 10929718 Affymetrix(Rosetta Compendium) Organelle biogenesis
30 10929718 Affymetrix(Rosetta Compendium) Perception of external stimulus
31 10929718 Affymetrix(Rosetta Compendium) Protein biosynthesis
32 10929718 Affymetrix(Rosetta Compendium) Protein degradation
33 10929718 Affymetrix(Rosetta Compendium) Protein metabolism
34 10929718 Affymetrix(Rosetta Compendium) Protein phosphorylation
35 10929718 Affymetrix(Rosetta Compendium) Protein transport
36 10929718 Affymetrix(Rosetta Compendium) Pseudohyphal growth
37 10929718 Affymetrix(Rosetta Compendium) Steroid metabolism
38 11390663 cDNA(Rosetta Inpharmatics Company) Pseudohyphal growth
39 10657304 cDNA(Rosetta Inpharmatics Company) MAPK pathway
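
The grouping of Rosetta Compendium knockout experiments described above can be illustrated with a minimal sketch. It is not the code used in this study: the function name, the dictionary inputs, and the toy gene and experiment names are all hypothetical, and applying the 8-experiment minimum to each GO-based group is an assumption carried over from the general inclusion criterion.

```python
from collections import defaultdict


def group_knockouts_by_category(deleted_gene, go_category, min_experiments=8):
    """Group knockout experiments by the GO category of the deleted gene.

    deleted_gene: dict mapping experiment id -> name of the deleted gene
    go_category:  dict mapping gene name -> GO biological process category
                  (genes of unknown function are assumed to be absent)
    min_experiments: smallest group kept as a data set (assumed threshold)
    """
    groups = defaultdict(list)
    for experiment, gene in deleted_gene.items():
        category = go_category.get(gene)
        if category is None:
            continue  # skip knockouts of genes without a known function
        groups[category].append(experiment)
    # Keep only categories with enough experiments to form a data set.
    return {c: sorted(e) for c, e in groups.items() if len(e) >= min_experiments}


# Toy example with hypothetical names and a lowered threshold.
toy_deleted = {"exp1": "geneA", "exp2": "geneB", "exp3": "geneC", "exp4": "geneD"}
toy_go = {
    "geneA": "cell cycle control",
    "geneB": "cell cycle control",
    "geneC": "protein transport",
}
datasets = group_knockouts_by_category(toy_deleted, toy_go, min_experiments=2)
```

In the toy example, the knockout of geneD is dropped because the gene has no annotation, and the single protein transport experiment is dropped because its group falls below the threshold.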

2. Functional Annotation of Uncharacterized Genes

The large number of functionally homogeneous clusters identified by CODENSE provides a solid foundation for the functional annotation of uncharacterized genes. In each cluster containing unknown genes, we assigned those genes the dominant functional category of the cluster. To assess the prediction accuracy of this method, we used a "leave-one-out" approach: we masked a known gene as unknown and assigned its function based on the remaining known genes in the cluster.

In this leave-one-out test, we assigned functions to 448 known genes and achieved a prediction accuracy of 50%.
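
The leave-one-out evaluation can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure as implemented for the study: it assumes each known gene has a single functional category, takes "dominant" to mean the most frequent category among the remaining known genes, breaks ties arbitrarily, and counts a gene once per cluster it appears in. All function, gene, and category names are hypothetical.

```python
from collections import Counter


def dominant_category(categories):
    """Return the most frequent category, or None if the list is empty."""
    if not categories:
        return None
    return Counter(categories).most_common(1)[0][0]


def leave_one_out(clusters, annotation):
    """Mask each known gene in turn and predict it from its cluster mates.

    clusters:   list of clusters, each a list of gene names
    annotation: dict mapping known gene -> functional category
    Returns (number of correct predictions, number of predictions made).
    """
    correct = 0
    total = 0
    for cluster in clusters:
        known = [g for g in cluster if g in annotation]
        for masked in known:
            rest = [annotation[g] for g in known if g != masked]
            predicted = dominant_category(rest)
            if predicted is None:
                continue  # no other known gene to predict from
            total += 1
            if predicted == annotation[masked]:
                correct += 1
    return correct, total


# Toy data: "u1" is an uncharacterized gene and is ignored in the evaluation.
toy_clusters = [["g1", "g2", "g3", "g4", "u1"], ["g5", "g6"]]
toy_annotation = {
    "g1": "A", "g2": "A", "g3": "A", "g4": "B",
    "g5": "C", "g6": "C",
}
correct, total = leave_one_out(toy_clusters, toy_annotation)
accuracy = correct / total
```

The accuracy is the fraction of masked genes whose predicted category matches their known annotation. With real annotations, a gene may belong to several categories, so the matching rule would need to be defined accordingly.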