Submitting a Viral Genome Entry to GenBank after GATU

GATU generates a “GenBank file”, but GenBank asks for a “FASTA file + Feature Table” when you submit with SEQUIN.
Fortunately, you don’t have to repeat the whole annotation procedure; you can use a file export option in the ARTEMIS program instead.
Important note: Choose the right sequence identifier (SeqID; see below) at the outset and use it consistently in file names; the later steps rely on it.

Part A. Prepare a FASTA file of the genome to be annotated.

At a minimum, the file must supply your own sequence identifier (SeqID): the text between the ‘>’ and the first space on the first line of the file. Optional text may follow the SeqID on that line, and the sequence follows on the next lines. The SeqID will be used by programs in subsequent steps.
e.g.  >mySeqID The rest of this is other information.
ACGATCGATCAGCATCGATCAGACTACG
CGCATCAGCATAGGACGACGACGATACG
ACACGATCGACTACGACTCAGACTCAGA
Name the file:   mySeqID.fasta
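
As a rough illustration of how the SeqID is read from the first line of the file, the following Python sketch splits the definition line at the first space. The function name seqid_from_defline is hypothetical; it is not part of GATU, ARTEMIS or SEQUIN.

```python
def seqid_from_defline(defline: str) -> str:
    """Return the text between '>' and the first whitespace of a FASTA definition line."""
    if not defline.startswith(">"):
        raise ValueError("a FASTA definition line must start with '>'")
    # Everything after '>' up to the first whitespace is the SeqID;
    # whatever follows on the line is optional free text.
    fields = defline[1:].split(maxsplit=1)
    if not fields or defline[1:2].isspace():
        raise ValueError("no SeqID found directly after '>'")
    return fields[0]


print(seqid_from_defline(">mySeqID The rest of this is other information."))
```

Running the same function on the first line of mySeqID.fasta is a quick way to confirm that the SeqID and the file name agree.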

Part B. Annotate the genome with GATU.

A closely related genome, in GenBank format, is used as the reference annotated sequence. The mySeqID.fasta file is used as the target genome.

Save the newly annotated target genome in GenBank format.

Name the file: mySeqID

Part C. Create a Feature Table.

Download and install ARTEMIS – it’s a Java program and very easy to run.
Run ARTEMIS.
Open your mySeqID file in ARTEMIS (it may be necessary to set the file format filter to “all files” so that you can select a GenBank file).
Now select “Save An Entry As” and choose “Sequin Table Format”.
Name the file: mySeqID.ft
Important note: Check that the SeqID has been correctly imported into the Feature Table. It should appear as:
>Feature mySeqID
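
The check described above can also be sketched in Python. This is only an illustration, assuming the first line of the exported table has the form shown; the function names feature_table_seqid and header_matches are hypothetical.

```python
def feature_table_seqid(header_line: str) -> str:
    """Return the SeqID from a feature table header such as '>Feature mySeqID'."""
    fields = header_line.split()
    if len(fields) < 2 or fields[0] != ">Feature":
        raise ValueError("expected a header of the form '>Feature SeqID'")
    return fields[1]


def header_matches(header_line: str, expected_seqid: str) -> bool:
    """True if the feature table header carries the expected SeqID."""
    return feature_table_seqid(header_line) == expected_seqid


print(header_matches(">Feature mySeqID", "mySeqID"))
```

If the check fails, the header line of mySeqID.ft can be corrected in a text editor before submission.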

Part D. Submitting files to GenBank.

Now you can use the mySeqID.ft and mySeqID.fasta files in SEQUIN.
If you have multiple genomes, it may be easier to use TBL2ASN to create a set of ASN.1 files from the pairs of mySeqID.ft and mySeqID.fasta files.
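
Before handing many genomes to a batch tool, it can help to confirm that every FASTA file has its matching Feature Table. The Python sketch below does only that pairing, assuming all files sit in one directory and follow the mySeqID.fasta / mySeqID.ft naming used above; the function name find_submission_pairs is hypothetical, and the sketch does not run TBL2ASN itself.

```python
from pathlib import Path


def find_submission_pairs(directory):
    """Pair each *.fasta file with the *.ft file that shares its SeqID.

    Returns (pairs, missing): pairs is a list of (seqid, fasta_path, table_path),
    missing lists the SeqIDs whose Feature Table was not found.
    """
    pairs, missing = [], []
    for fasta in sorted(Path(directory).glob("*.fasta")):
        table = fasta.with_suffix(".ft")  # same base name, .ft extension
        if table.is_file():
            pairs.append((fasta.stem, fasta, table))
        else:
            missing.append(fasta.stem)
    return pairs, missing


if __name__ == "__main__":
    pairs, missing = find_submission_pairs(".")
    for seqid, fasta, table in pairs:
        print(f"{seqid}: {fasta.name} + {table.name}")
    for seqid in missing:
        print(f"{seqid}: no Feature Table found")
```

Check the TBL2ASN documentation linked below for the file names and options it expects before running it on the pairs.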

Links:

GENBANK submission at NCBI

http://www.ncbi.nlm.nih.gov/genbank/submit.html

SEQUIN at NCBI

http://www.ncbi.nlm.nih.gov/projects/Sequin/

TBL2ASN at NCBI

http://www.ncbi.nlm.nih.gov/genbank/tbl2asn2.html

SeqID and FASTA Format for Nucleotide Sequences

In FASTA format the line before the nucleotide sequence, called the FASTA definition line, must begin with a greater-than sign (“>”), followed by a unique SeqID (sequence identifier). The SeqID must be unique for each nucleotide sequence and should not contain any spaces. Use of brackets (“[]”) in the SeqID is also prohibited. The identifier will be replaced with an Accession number by the database staff when your submission is processed.
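
The three rules above (unique, no spaces, no brackets) can be checked with a few lines of Python. This is a minimal sketch covering only those rules; the function name seqid_problems is hypothetical, and NCBI may apply further restrictions that are not checked here.

```python
def seqid_problems(seqids):
    """Return a list of messages for SeqIDs that break the rules quoted above."""
    problems = []
    seen = set()
    for seqid in seqids:
        if not seqid:
            problems.append("empty SeqID")
        if any(ch.isspace() for ch in seqid):
            problems.append(f"{seqid!r}: contains a space")
        if "[" in seqid or "]" in seqid:
            problems.append(f"{seqid!r}: contains a bracket")
        if seqid in seen:
            problems.append(f"{seqid!r}: used more than once")
        seen.add(seqid)
    return problems


for message in seqid_problems(["mySeqID", "my SeqID", "virus[1]", "mySeqID"]):
    print(message)
```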

ARTEMIS at Wellcome Trust Sanger Institute