Additional information -- Gclust --

Nomenclature of sequence ID and tracking links among different datasets
   A sequence ID is a combination of a genome name and a sequence name. Genome names are defined in the Search Menu window of each dataset. Chloroplast and mitochondrial genomes are indicated by the suffixes ‘c’ and ‘mt’, respectively. The sequence name is usually identical to the locus tag defined in the RefSeq or GenBank database. In eukaryotic organisms, alternative splicing produces multiple splice variants, which are distinguished by their gi numbers. In Arabidopsis thaliana, for example, a sequence ID therefore takes a form such as ‘ATH_AT1G79040_15219268’. When searching by sequence ID in the Search Menu, the wildcard ‘*’ can be used, as in ‘ATH_AT1G79040*’. For sequences from JGI and other databases, the protein code used in the original database is used as the sequence ID. This nomenclature is simple and gives a direct link to external databases.
   Unfortunately, some sequence IDs have changed between updates of the Gclust database, either because the source was changed from JGI to GenBank or because the RefSeq database was updated. However, corresponding sequences in different datasets can still be traced with the BLAST search utility on the Gclust server.
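   The ID layout described above can be illustrated with a short Python sketch. It is not part of Gclust; the function names are hypothetical, and it assumes that the genome name contains no underscore and that a trailing all-digit field is a gi number. Locus tags that themselves end in an underscore followed by digits would be split incorrectly. The wildcard match uses Python's fnmatch module, which also gives special meaning to ‘?’ and ‘[ ]’, whereas only ‘*’ is documented for the Search Menu.

```python
from fnmatch import fnmatchcase


def split_sequence_id(seq_id):
    """Split a sequence ID into (genome, sequence_name, gi).

    Assumption (not from the Gclust documentation): the genome name has
    no underscore, and a trailing all-digit field is a gi number.
    """
    genome, _, rest = seq_id.partition("_")
    name, sep, tail = rest.rpartition("_")
    if sep and tail.isdigit():
        return genome, name, tail
    # No gi suffix: the whole remainder is the sequence name.
    return genome, rest, None


def matches_query(seq_id, pattern):
    """Case-sensitive wildcard match, e.g. pattern 'ATH_AT1G79040*'."""
    return fnmatchcase(seq_id, pattern)


if __name__ == "__main__":
    print(split_sequence_id("ATH_AT1G79040_15219268"))
    print(matches_query("ATH_AT1G79040_15219268", "ATH_AT1G79040*"))
```

   For the example ID, the first call yields the genome ‘ATH’, the locus tag ‘AT1G79040’ and the gi number ‘15219268’, and the wildcard query matches every splice variant of that locus.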

Update and versions of database and datasets
   The version number of the database is shown in the upper right corner of the window, for example ‘2006-06’. An update of the whole database is scheduled every 6 months. The version of a dataset is shown by a suffix. For example, in ‘CZ16Yi’, CZ16 is the set of genomes, Y is the version of the original data, and i is the version of the clustering. In ‘ALL145_0x’, ALL145 represents the genome set, while 0x indicates the version of the clustering. All previous versions of datasets are retained on the server at each update, even after the addition of a renewed dataset containing a similar genome set.
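   As an illustration of the suffix convention, the following Python sketch splits a dataset name into its parts. It is a guess generalized from the two examples given here (‘CZ16Yi’ and ‘ALL145_0x’), not an official grammar: the regular expression and the function name are assumptions, and names without a suffix, such as ‘Gclust2010’, are deliberately left unparsed.

```python
import re

# Assumed pattern: a genome set made of letters followed by digits, then
# either "_<clustering version>" or two letters, the first for the
# original data version and the second for the clustering version.
_DATASET_NAME = re.compile(
    r"^(?P<genome_set>[A-Za-z]+\d+)"
    r"(?:_(?P<clustering_u>\w+)|(?P<data>[A-Za-z])(?P<clustering>[A-Za-z]))$"
)


def parse_dataset_name(name):
    """Return a dict of name parts, or None if the name does not fit."""
    m = _DATASET_NAME.match(name)
    if m is None:
        return None
    return {
        "genome_set": m.group("genome_set"),
        "data_version": m.group("data"),  # None for the underscore form
        "clustering_version": m.group("clustering") or m.group("clustering_u"),
    }


if __name__ == "__main__":
    for name in ("CZ16Yi", "ALL145_0x", "Gclust2010"):
        print(name, parse_dataset_name(name))
```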

Latest version
   The latest version is 2010-04, which includes the dataset Gclust2010. The data are compatible with CyanoClust version 4, which specializes in the comparison of cyanobacteria and plastids.

Availability of data
   Much of the clustering data is available for download from the web site.

Last update: May 31, 2010