Submitting a Viral Genome Entry to GenBank after GATU
GATU generates a “GenBank file”, but GenBank asks for a “FASTA file + Feature Table” if you submit with SEQUIN.
Fortunately, you don’t have to go through the whole annotation procedure again, but can take advantage of a file export option in the ARTEMIS program.
Important note: Starting off with the right sequence identifier (SeqID; see below) and using it correctly in file names helps immensely.
The minimum requirement for the FASTA file is that you provide your own text to be used as a sequence identifier (SeqID). The SeqID is the text between the ‘>’ and the first space in the first line of the file; optional text may follow it on that line, and the sequence comes after. The SeqID will be used by programs in subsequent steps.
e.g. >mySeqID The rest of this is other information.
Name the file: mySeqID.fasta
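The SeqID rule above can be sketched in a few lines of Python. This is only an illustration, not part of GATU, ARTEMIS or SEQUIN; the function names seqid_from_defline and fasta_name_for are made up for this example.

```python
def seqid_from_defline(defline: str) -> str:
    """Return the SeqID: the text between '>' and the first whitespace."""
    if not defline.startswith(">"):
        raise ValueError("the first line of a FASTA file must start with '>'")
    rest = defline[1:]
    if not rest or rest[0].isspace():
        raise ValueError("the SeqID must follow the '>' directly")
    # Everything after the first run of whitespace is optional free text.
    return rest.split(None, 1)[0]


def fasta_name_for(defline: str) -> str:
    """Build the file name suggested in this guide: <SeqID>.fasta."""
    return seqid_from_defline(defline) + ".fasta"


if __name__ == "__main__":
    line = ">mySeqID The rest of this is other information."
    print(seqid_from_defline(line))
    print(fasta_name_for(line))
```

Run against the example line above, the two calls give the SeqID and the matching file name used in the rest of this guide.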
In GATU, a closely related genome in GenBank format is used as the annotated reference sequence, and mySeqID.fasta is used as the target genome.
Name the GenBank file that GATU generates: mySeqID
Download and install ARTEMIS – it’s a Java program and very easy to run.
Open your mySeqID file in ARTEMIS (it may be necessary to set the file format filter to “all files” so you can select a GenBank file).
Now select “Save An Entry As” and choose “Sequin Table Format”.
Name the file: mySeqID.ft
Important note: Check that the SeqID has been correctly imported into the Feature Table. The first line of mySeqID.ft should appear as:
>Feature mySeqID
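If you would rather not check this by eye, a small Python sketch can do it. It assumes the feature table opens with a line of the form ">Feature SeqID", as in NCBI's five-column feature table format; confirm this against the file ARTEMIS actually writes. The function names are invented for this example.

```python
def feature_table_seqid(first_line: str) -> str:
    """Return the SeqID from a '>Feature SeqID' header line."""
    fields = first_line.split()
    if len(fields) < 2 or fields[0].lower() != ">feature":
        raise ValueError("expected a header line of the form '>Feature SeqID'")
    return fields[1]


def feature_table_matches(path: str, expected_seqid: str) -> bool:
    """True if the first line of the feature table names expected_seqid."""
    with open(path, encoding="utf-8") as handle:
        first_line = handle.readline()
    return feature_table_seqid(first_line) == expected_seqid


if __name__ == "__main__":
    # Example file name from this guide; adjust to your own SeqID.
    print(feature_table_matches("mySeqID.ft", "mySeqID"))
```

If the check fails, edit the header line of the .ft file so that the SeqID matches the one in mySeqID.fasta exactly.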
Now you can use the mySeqID.ft and mySeqID.fasta files in SEQUIN.
If you have multiple genomes, it may be easier to use TBL2ASN to create a set of ASN.1 files from the pairs of mySeqID.ft and mySeqID.fasta files.
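Before running a batch, it helps to confirm that every genome really has both files. The Python sketch below only does this pairing check in one folder; it does not call TBL2ASN, and the function name paired_seqids is made up for this example. Check the TBL2ASN documentation for the options and file suffixes it expects.

```python
from pathlib import Path


def paired_seqids(directory: str):
    """Split the SeqIDs in a folder into complete pairs and orphans.

    A SeqID is 'paired' when both <SeqID>.fasta and <SeqID>.ft exist.
    """
    folder = Path(directory)
    fasta_ids = {p.stem for p in folder.glob("*.fasta")}
    table_ids = {p.stem for p in folder.glob("*.ft")}
    paired = sorted(fasta_ids & table_ids)
    unpaired = sorted(fasta_ids ^ table_ids)  # present in only one set
    return paired, unpaired


if __name__ == "__main__":
    paired, unpaired = paired_seqids(".")
    print("ready:", paired)
    print("missing a partner file:", unpaired)
```

Any SeqID listed as missing a partner file needs its .ft or .fasta file created (or renamed) before the batch step.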
In FASTA format the line before the nucleotide sequence, called the FASTA definition line, must begin with a greater-than sign (“>”), followed by a SeqID (sequence identifier). The SeqID must be unique for each nucleotide sequence and should not contain any spaces. Use of brackets in the SeqID is also prohibited. The identifier will be replaced with an Accession number by the database staff when your submission is processed.
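These SeqID rules are easy to check for a whole set of genomes at once. The Python sketch below is an illustration only. The text does not show which bracket characters are meant, so as an assumption it conservatively rejects square, round and curly brackets; the name seqid_problems is made up for this example.

```python
# Assumption: reject all of these; the guide only says "brackets".
BRACKETS = set("[](){}")


def seqid_problems(seqids):
    """Return (SeqID, problem) pairs for IDs that break the rules above."""
    problems = []
    seen = set()
    for seqid in seqids:
        if any(ch.isspace() for ch in seqid):
            problems.append((seqid, "contains whitespace"))
        if any(ch in BRACKETS for ch in seqid):
            problems.append((seqid, "contains a bracket"))
        if seqid in seen:
            problems.append((seqid, "is not unique"))
        seen.add(seqid)
    return problems


if __name__ == "__main__":
    for seqid, problem in seqid_problems(["genomeA", "genome B", "genomeA"]):
        print(seqid, problem)
```

An empty result means none of the listed SeqIDs breaks the three rules checked here.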