Submitting a Viral Genome Entry to GenBank after GATU
GATU generates a “GenBank file”, but GenBank asks for a “FASTA file + Feature Table” if you submit with SEQUIN.
Fortunately, you don’t have to go through the whole annotation procedure again, but can take advantage of a file export option in the ARTEMIS program.
Important note: Starting off with the right sequence identifier (SeqID; see below) and using it correctly in file names helps immensely.
The minimum requirement for the FASTA file is a sequence identifier (SeqID) that you choose yourself: the text between the ‘>’ and the first space on the first line of the file. Optional text may follow the space, and the sequence starts on the next line. The SeqID will be used by programs in subsequent steps.
e.g. >mySeqID The rest of this is other information.
ACGATCGATCAGCATCGATCAGACTACG
CGCATCAGCATAGGACGACGACGATACG
ACACGATCGACTACGACTCAGACTCAGA
Name the file: mySeqID.fasta
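As a minimal sketch of the rule above, the following Python reads the SeqID from a FASTA definition line and derives the matching file name. The function names seqid_from_defline and fasta_filename are hypothetical helpers written for this illustration; they are not part of GATU, ARTEMIS, or any NCBI tool.

```python
def seqid_from_defline(defline: str) -> str:
    """Return the SeqID: the text between '>' and the first whitespace."""
    if not defline.startswith(">"):
        raise ValueError("a FASTA definition line must start with '>'")
    rest = defline[1:]
    # A space directly after '>' would leave the SeqID empty.
    if not rest or rest[0].isspace():
        raise ValueError("no SeqID found directly after '>'")
    return rest.split(None, 1)[0]


def fasta_filename(seqid: str) -> str:
    """Build the file name used in this guide (hypothetical helper)."""
    return f"{seqid}.fasta"


defline = ">mySeqID The rest of this is other information."
seqid = seqid_from_defline(defline)
print(seqid, fasta_filename(seqid))
```

Running a check like this before starting makes it less likely that the SeqID in the file and the SeqID in the file name drift apart.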
A closely related genome, in GenBank format, is used as the reference annotated sequence. The mySeqID.fasta is used as the target genome.
Name the GenBank file that GATU generates: mySeqID
Download and install ARTEMIS – it’s a Java program and very easy to run.
Run ARTEMIS.
Open your mySeqID in ARTEMIS (it may be necessary to set the file format filter to “all files” so you can select a GenBank file).
Now select “Save An Entry As” and choose “Sequin Table Format”.
Name the file: mySeqID.ft
Important note: Check that the SeqID has been correctly imported into the Feature Table. It should appear as:
>Feature mySeqID
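The check described above can also be done in a few lines of Python. This is only a sketch: it assumes the header is the first line of the exported file and that the SeqID is the second whitespace-separated field on that line. The function names are hypothetical.

```python
def seqid_from_feature_header(line: str) -> str:
    """Return the SeqID from a feature table header such as '>Feature mySeqID'."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != ">Feature":
        raise ValueError("expected a line of the form '>Feature <SeqID>'")
    return fields[1]


def header_matches(line: str, expected_seqid: str) -> bool:
    """True if the header line carries exactly the expected SeqID."""
    try:
        return seqid_from_feature_header(line) == expected_seqid
    except ValueError:
        return False


print(header_matches(">Feature mySeqID", "mySeqID"))
```

To apply it to a real export, read the first line of mySeqID.ft and pass it to header_matches together with the SeqID used in mySeqID.fasta.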
Now you can use the mySeqID.ft and mySeqID.fasta files in SEQUIN.
If you have multiple genomes, it may be easier to use TBL2ASN to create a set of ASN.1 files from the pairs of mySeqID.ft and mySeqID.fasta files.
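Before a batch run it can help to confirm that every FASTA file has its feature table. The sketch below only pairs files by name, following the .fasta and .ft naming used in this guide; it does not call TBL2ASN, and you should check the TBL2ASN documentation for the file suffixes and options it expects. The function name find_pairs and the folder name are hypothetical.

```python
from pathlib import Path


def find_pairs(directory):
    """Pair each <SeqID>.fasta with <SeqID>.ft found in the same folder.

    Returns (pairs, missing): pairs is a list of (fasta, ft) paths,
    missing lists the FASTA files that have no feature table.
    """
    directory = Path(directory)
    pairs, missing = [], []
    for fasta in sorted(directory.glob("*.fasta")):
        table = fasta.with_suffix(".ft")
        if table.exists():
            pairs.append((fasta, table))
        else:
            missing.append(fasta)
    return pairs, missing


if __name__ == "__main__":
    # "genomes" is a placeholder folder name, not something the guide defines.
    pairs, missing = find_pairs("genomes")
    for fasta, table in pairs:
        print("ready:", fasta.name, table.name)
    for fasta in missing:
        print("no feature table for:", fasta.name)
```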
Links:
http://www.ncbi.nlm.nih.gov/genbank/submit.html
http://www.ncbi.nlm.nih.gov/projects/Sequin/
http://www.ncbi.nlm.nih.gov/genbank/tbl2asn2.html
In FASTA format the line before the nucleotide sequence, called the FASTA definition line, must begin with a greater-than sign (“>”), followed by a unique SeqID (sequence identifier). The SeqID must be unique for each nucleotide sequence and should not contain any spaces. Use of brackets (“[]”) in the SeqID is also prohibited. The identifier will be replaced with an Accession number by the database staff when your submission is processed.
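The SeqID rules in the paragraph above (no spaces, no brackets, unique per sequence) are easy to test before submission. The following is a rough sketch covering only those three rules; GenBank may apply further checks that are not modelled here, and the function names are hypothetical.

```python
from collections import Counter


def seqid_problems(seqid):
    """Return a list of rule violations for one SeqID (empty list if none)."""
    problems = []
    if not seqid:
        problems.append("SeqID is empty")
    if any(ch.isspace() for ch in seqid):
        problems.append("SeqID contains whitespace")
    if "[" in seqid or "]" in seqid:
        problems.append("SeqID contains brackets")
    return problems


def duplicated_seqids(seqids):
    """Return, in sorted order, the SeqIDs that occur more than once."""
    counts = Counter(seqids)
    return sorted(s for s, n in counts.items() if n > 1)


for candidate in ["mySeqID", "my SeqID", "my[Seq]ID"]:
    print(candidate, seqid_problems(candidate))
print(duplicated_seqids(["virusA", "virusB", "virusA"]))
```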