This directory contains example data for testing the gclust software.
The CZ36 dataset consists of 221,764 proteins from 14 cyanobacteria, five
photosynthetic bacteria, seven plastid-containing organisms (plants and algae,
plus the malaria parasite), four non-photosynthetic eukaryotes, four
non-photosynthetic bacteria, and two Archaea.
Follow the instructions given in procedure.txt.
The following software is required:
formatdb and blastall (NCBI)
siseq (from my web site, http://nsato4.c.u-tokyo.ac.jp/old/Siseq.html)
gclust (from gclust web site, http://gclust.c.u-tokyo.ac.jp/)
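
Before running anything, it may help to confirm that the required programs
can be found on your PATH. The following is only a minimal sketch, not part of
the distributed scripts. The function name missing_tools is hypothetical, and
the executable names "siseq" and "gclust" are assumptions that may differ in
your installation.

```shell
#!/bin/sh
# Print the name of every program given as an argument that cannot be
# found on PATH (one name per line). Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# formatdb and blastall are the NCBI programs listed above.
# "siseq" and "gclust" are assumed executable names; adjust as needed.
missing=$(missing_tools formatdb blastall siseq gclust)

if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing from PATH:" $missing
else
    echo "All required programs were found."
fi
```

If any name is reported as missing, install the program from the site listed
above or add its directory to PATH before running test.sh.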

A test script, test.sh, is provided in this directory.
Invoking test.sh with an option (see the script for the available options)
runs the clustering with gclust.

Example:

test.sh 114 &
tail -f log114

This runs test.sh in the background and displays the progress of processing
by following the log file.

See procedure.txt for a record of all the processing steps used to produce the
CZ36 data.

Naoki Sato
naokisat@bio.c.u-tokyo.ac.jp
