All caught up!

Amazingly, the Codon Usage Bias Database is all caught up. This hasn’t been the case in years. It is now up to date with NCBI’s Microbial Complete Genomes Database.

To understand why this is so unusual, let’s examine the source of genome data for my database. The download process begins with NCBI’s list of Microbial Genomes (Bacterial and Archaeal). I identify all complete genomes and pull their annotated files to my server.


There are now 5534 complete genomes in the CUB-DB. NCBI actually lists more, but some have frameshifts that make codon identification impossible, and others have too few ribosomal protein-coding genes, which makes many of the CUB computations impossible.
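To illustrate the kind of screening described above, here is a minimal sketch in Python. It is not the database's actual code. The function names, the `min_ribosomal` threshold of 20, the use of "length not a multiple of three" as a stand-in for a frameshift, and the matching of ribosomal proteins by product name are all assumptions made for the example.

```python
from collections import Counter


def count_codons(cds):
    """Count codons in one coding sequence.

    Returns None when the length is not a multiple of three, which this
    sketch treats as a sign the sequence cannot be read in frame.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        return None
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def screen_genome(genes, min_ribosomal=20):
    """Decide whether a genome is usable for codon usage bias work.

    genes: list of (product_name, cds_sequence) pairs.
    min_ribosomal: hypothetical cutoff, not the database's real value.
    Returns (usable, reason, codon_totals).
    """
    totals = Counter()
    ribosomal = 0
    for product, cds in genes:
        counts = count_codons(cds)
        if counts is None:
            # Assumption: any out-of-frame gene rejects the whole genome.
            return False, "frameshift in " + product, Counter()
        totals.update(counts)
        if "ribosomal protein" in product.lower():
            ribosomal += 1
    if ribosomal < min_ribosomal:
        return False, "too few ribosomal protein genes", totals
    return True, "ok", totals
```

A genome that fails either check would simply be left out of the database rather than given partial results.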

So why is it so difficult to keep up to date? New genomes are added all the time. I must run all of the algorithms against these new genomes, and some of them (in particular the GA and mSCCI algorithms: you know, mine) take hours per genome to compute. Add to this that NCBI routinely updates existing genomes, and you can see that it is a never-ending battle both to compute bias levels for the new genomes and to keep up with those that have been modified.
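The bookkeeping behind that battle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes each genome is identified by an accession string and carries some last-modified marker, and the function name and the placeholder accessions in the usage lines are hypothetical.

```python
def genomes_to_recompute(stored, listing):
    """Split the current listing into new and updated genomes.

    stored:  dict mapping accession -> last-modified marker already processed.
    listing: dict mapping accession -> last-modified marker in the current list.
    Returns (new, updated), each a sorted list of accessions.
    """
    new = sorted(acc for acc in listing if acc not in stored)
    updated = sorted(
        acc for acc in listing
        if acc in stored and listing[acc] != stored[acc]
    )
    return new, updated


# Placeholder accessions and dates, for illustration only.
stored = {"GENOME_A": "2014-01-01", "GENOME_B": "2014-02-01"}
listing = {"GENOME_A": "2014-01-01", "GENOME_B": "2014-03-15", "GENOME_C": "2014-03-20"}
new, updated = genomes_to_recompute(stored, listing)
```

Both lists feed the same expensive computations, which is why an update to an existing genome costs as much as a brand-new one.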

But as of this morning, I am all caught up. It’ll probably last about a nanosecond!

