Additional information
Nomenclature of sequence ID and tracking links among different datasets
A sequence ID is a combination of a genome name and a sequence name. Genome names are defined in the Search Menu window of each dataset. Chloroplast and mitochondrial genomes are indicated by the suffixes "c" and "mt", respectively. The sequence name is usually identical to the locus tag defined in the RefSeq or GenBank database. In eukaryotic organisms, alternative splicing produces multiple splicing variants, which are distinguished by their gi numbers. In Arabidopsis thaliana, for example, a sequence ID such as "ATH_AT1G79040_15219268" is therefore used. When searching by sequence ID in the Search Menu, the wild card "*" can be used, as in "ATH_AT1G79040*". For sequences from JGI and other databases, the protein code used in the original database serves as the sequence ID. This nomenclature is simple and gives a direct link to external databases.
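The composition of a sequence ID and the wild-card search described above can be sketched in Python. This is only an illustration: the function name split_sequence_id and the has_gi flag are hypothetical, the underscore is assumed to be the separator because it appears in the Arabidopsis example, and Python's glob-style matching is assumed to approximate the behavior of "*" in the Search Menu.

```python
from fnmatch import fnmatchcase


def split_sequence_id(seq_id, has_gi=False):
    """Split a sequence ID into (genome name, sequence name, gi number).

    Assumption: the genome name ends at the first underscore. When has_gi
    is True, the text after the last underscore is taken as the gi number.
    """
    genome, _, rest = seq_id.partition("_")
    if has_gi:
        name, _, gi = rest.rpartition("_")
        return genome, name, gi
    return genome, rest, None


# The example ID given in the text for Arabidopsis thaliana.
example_id = "ATH_AT1G79040_15219268"
parts = split_sequence_id(example_id, has_gi=True)

# Wild-card search: the pattern selects all splicing variants of one locus.
matches_locus = fnmatchcase(example_id, "ATH_AT1G79040*")
matches_other = fnmatchcase(example_id, "ATH_AT1G79050*")
```

Here parts holds the genome name, the locus tag, and the gi number as three strings. The flag is needed because a locus tag may itself contain underscores, so the gi number cannot be detected reliably from the ID alone.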
Unfortunately, during updates of the Gclust database, some sequence IDs have changed, either because the source was changed from JGI to GenBank or because the RefSeq database was updated. However, links among different datasets can be traced by using the BLAST search utility on the Gclust server.
Update and versions of database and datasets
The version number of the database is indicated at the upper right corner of the window, such as "2006-06". An update of the whole database is scheduled every 6 months. However, old datasets are retained at each update. The version of a dataset is shown by a suffix. For example, in "CZ16Yi", CZ16 is the set of genomes, Y is the version of the original data, and i is the version of clustering. In "ALL145_0x", ALL145 represents the genome set, while 0x indicates the version of clustering. All previous versions of datasets will be retained on the server, even after the addition of a renewed dataset containing a similar genome set.
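The dataset naming scheme above can be sketched as a small parser. This is a guess based only on the two examples in the text: the function name split_dataset_name is hypothetical, the genome set is assumed to be letters followed by digits, and the suffix is assumed to be either two characters (data version, clustering version) or an underscore followed by the clustering version. Other dataset names on the server may not follow these patterns.

```python
import re

# Assumption: a genome set looks like letters followed by digits (CZ16, ALL145).
_NAME = re.compile(r"^(?P<genome_set>[A-Za-z]+\d+)(?P<suffix>.*)$")


def split_dataset_name(name):
    """Split a dataset name into genome set, data version, clustering version."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized dataset name: {name!r}")
    genome_set, suffix = m.group("genome_set"), m.group("suffix")
    if suffix.startswith("_") and len(suffix) > 1:
        # Form seen in "ALL145_0x": only a clustering version is given.
        return {"genome_set": genome_set,
                "data_version": None,
                "clustering_version": suffix[1:]}
    if len(suffix) == 2:
        # Form seen in "CZ16Yi": data version, then clustering version.
        return {"genome_set": genome_set,
                "data_version": suffix[0],
                "clustering_version": suffix[1]}
    raise ValueError(f"unrecognized version suffix in: {name!r}")


cz = split_dataset_name("CZ16Yi")
all145 = split_dataset_name("ALL145_0x")
```

Names that fit neither form raise ValueError rather than being guessed at, since the text documents only these two layouts.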
Latest version
The latest version is 2010-04, which includes the dataset Gclust2010. The data are compatible with CyanoClust version 4, which specializes in comparing cyanobacteria and plastids.
Availability of data
Many of the clustering data are available for
download from the web site.
Copyright © 2006-2010 Sato Lab. All Rights Reserved.
Last update: May 31, 2010