GATU how-to documentation
User Interface
Menu Bar
The File Menu
- “Preferences” allows you to define parameters for BLAST and NEEDLE and other annotation features within GATU; you can also set display preferences here.
- By selecting “BLAST/NEEDLE preferences (manual process only)”, you can set the BLAST parameters (e.g. expect value, scoring matrix, word size) for BLASTp, BLASTn, tBLASTn, psiBLAST and BLASTx. Selecting the arrow at the top of this “Application Preferences” window opens a drop-down menu from which you can choose “NEEDLE”; there you can set the preferences for NEEDLE alignments (e.g. matrix used, alignment format, gap penalty value).
- To set the annotation preferences, select “Annotation Preferences (automatic process)”. You will be able to set preferences for the reference genes, unassigned ORFs, GATU genome map and GATU BLAST. These preferences include the gene location (top/bottom strand), the maximum % overlap for an ORF vs. an annotation, the minimum ORF length for possible annotations without a homolog, etc.
- In the “Display Preferences” window, you can set the window size and location of the menu bar.
- “Save as EMBL…” saves the annotations you selected for the genome to be annotated as an EMBL file
- “Save as XML (BSML)…” saves the annotations you selected for the genome to be annotated as an XML file
- “Save as GenBank…” saves the annotations you selected for the genome to be annotated as a GenBank file
- “Close” exits the Genome Annotator window
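The “maximum % overlap” and “minimum ORF length” annotation preferences mentioned above can be illustrated with a short Python sketch. This is only one plausible reading, not GATU's actual implementation: it assumes 1-based inclusive coordinates and measures the overlap as a percentage of the ORF's own length. The function names `percent_overlap` and `is_unassigned_candidate` are hypothetical.

```python
def percent_overlap(orf, annotation):
    """Percentage of the ORF's length that is covered by the annotation.

    Both arguments are (start, stop) pairs, assumed 1-based and inclusive.
    Measuring relative to the ORF length is an assumption of this sketch.
    """
    o_start, o_stop = sorted(orf)
    a_start, a_stop = sorted(annotation)
    shared = min(o_stop, a_stop) - max(o_start, a_start) + 1
    if shared <= 0:
        return 0.0
    return 100.0 * shared / (o_stop - o_start + 1)


def is_unassigned_candidate(orf, annotations, max_overlap_pct, min_len):
    """Keep an ORF only if it is long enough and does not overlap any
    existing annotation by more than max_overlap_pct percent."""
    start, stop = sorted(orf)
    if stop - start + 1 < min_len:
        return False
    return all(percent_overlap(orf, a) <= max_overlap_pct for a in annotations)
```

For example, an ORF at 101..200 and an annotation at 151..300 share 50 bases, which is 50% of the ORF, so the ORF would be rejected with a 25% limit and kept with a 60% limit.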
The Edit Menu
- “Unselect” deselects highlighted rows in any table and clears selections in the genome maps.
- “New Annotation” appends a new row to the “Annotations” table. This allows for the manual definition of an annotation not found by the methods used in this application.
- “Find…” prompts you for search text and then searches the BLAST alignment for it. You must select a row in the “Annotations” window before using this feature.
- “Find next” repeats the search described in “Find…” above using the last search text entered
The Help Menu
- “About” summarizes the purpose of the Genome Annotator
- “Overview” displays an overview of the basic algorithm implemented
- “User Interface” introduces the components of the main window
- “Annotation Table” describes the view of the annotations main window
- “Unassigned-Orfs Table” describes the view of the unassigned ORFs main window
- “Reference Genes Table” describes the view of the reference genes main window
- “Genome Map” provides an overview of the features of the genome map
- “Buttons” explains the functions of each of the buttons
- “Menu Options” describes the GATU main menu and will direct you to this page
- “Preferences” details the preference settings within GATU
- “Tutorial” provides a step-by-step tutorial for the use of GATU
- “References” lists references used
Genome Selection
Genome Selection in the Stand-alone Version
To load either a reference genome or a genome to annotate, click on the appropriate “Upload Genome File” button. If you select the button corresponding to the reference genome, you will be prompted to enter the name and location of the GenBank file you wish to use (GBxml or plain text format). Important: make sure the selected file name does not contain any spaces. Once the file is read, the genes defined in the GenBank file are shown in the Reference Genes Table.
If you select the button corresponding to the genome to be annotated, you will be prompted for the GenBank (GBxml or plain text format) or FASTA file for the genome to be annotated. Once both files are read, the annotator can run a series of BLAST alignments (one for each gene in the reference genome).
To start the annotation process click the “Annotate” button.
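Before uploading, it can be handy to check a file against the requirements above. The following Python sketch is not part of GATU; the function name `check_genome_file` is hypothetical. It rejects file names containing spaces and guesses the format from the first non-blank line, on the assumption that FASTA files begin with “>”, GenBank plain-text files with “LOCUS”, and GBxml files with an XML tag.

```python
from pathlib import Path


def check_genome_file(path):
    """Return 'fasta', 'genbank' or 'gbxml' for a usable file.

    Raises ValueError if the file name contains a space, the file has
    no content, or the format is not recognised.
    """
    p = Path(path)
    if " " in p.name:
        raise ValueError(f"file name contains spaces: {p.name!r}")

    # Find the first non-blank line; the for/else raises if there is none.
    with p.open("r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            head = line.strip()
            if head:
                break
        else:
            raise ValueError("file is empty")

    if head.startswith(">"):
        return "fasta"
    if head.startswith("<"):
        return "gbxml"
    if head.startswith("LOCUS"):
        return "genbank"
    raise ValueError("unrecognised file format")
```

Only the file name itself is checked for spaces here, not the directories leading to it; whether GATU is also sensitive to spaces in the directory path is not stated above.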
Genome Selection Using the VOCs DB Admin Version
To load either a reference genome or a genome to annotate, select a genome from the drop-down menu. Once both genomes have been selected, the annotator can run a series of BLAST alignments (one for each gene in the reference genome).
To start the annotation process click the “Annotate” button.
The New Annotations Tab
The “Annotations table” displays all possible gene annotations for the genome to be annotated (excluding unassigned ORFs). These genes are either homologs of genes in the reference genome or ORFs longer than the minimum ORF length defined in the preferences.
Annotations
Lists the gene annotations found for the genome to be annotated. Clicking on any row in the table displays the DNA and corresponding protein sequences of the selected annotation in the narrow window directly below the “Annotations” window and highlights the corresponding gene in the “Genome Map” window. Clicking on a column header re-sorts the list of annotations by that column. All columns are editable except for “Size”, “P.Size”, “Score” and “% Similarity”.
The columns in the table are as follows:
Product : the product property from the CDS tag in the GenBank file
Exon# : the exon number (i.e. 1st, 2nd, etc. exon)
Start : start position of the gene
Stop : stop position of the gene
+/- : strand location of gene (“+” (5′-3′) or “-” (3′-5′))
Size : size of the gene in base pairs
P.Size : size of the parent gene (the gene in the reference genome) in base pairs
GeneType : gene/fragment/mature peptide
Score : BLAST score from either the automatic BLAST run or the most recent manual BLAST run
% Similarity : BLAST similarity from either the automatic BLAST run or the most recent manual BLAST run
Accept : a checkbox for including an ORF in, or excluding it from, the annotation. To include an ORF in the annotations, click on the corresponding box; a checkmark should appear in the box.
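To make the column definitions above concrete, one row of the table could be modelled as in the following Python sketch. It is purely illustrative and does not reflect GATU's internal data model: the class and field names are hypothetical, and the size calculation assumes inclusive coordinates (so a gene from 101 to 400 is 300 base pairs long).

```python
from dataclasses import dataclass


@dataclass
class AnnotationRow:
    """One row of the annotations table (hypothetical model)."""
    product: str             # Product
    exon: int                # Exon#
    start: int               # Start
    stop: int                # Stop
    strand: str              # +/-
    gene_type: str = "gene"  # GeneType: gene / fragment / mature peptide
    accept: bool = False     # Accept checkbox

    def __post_init__(self):
        if self.strand not in ("+", "-"):
            raise ValueError(f"strand must be '+' or '-', got {self.strand!r}")

    @property
    def size(self):
        """Size in base pairs, assuming inclusive coordinates.

        Derived from Start and Stop rather than stored, which mirrors
        the fact that the Size column is not editable.
        """
        return abs(self.stop - self.start) + 1


def accepted(rows):
    """Return only the rows whose Accept box is ticked."""
    return [row for row in rows if row.accept]
```

With this model, `AnnotationRow("DNA polymerase", 1, 101, 400, "+").size` evaluates to 300.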
BLAST Alignments
NEEDLE Alignment
Unassigned ORFs table
Lists the unannotated ORFs found in the genome to be annotated. Clicking on any row in the table results in the display of the DNA and corresponding protein sequences of the selected ORF in the narrow window directly below the “Unassigned ORFs” window. Clicking on a column header will re-sort the list of unassigned ORFs based on the column header you select. The columns “ORF Name” and “Gene Type” are editable.
The columns in the table are as follows:
By default, GATU uses the application server running on leto.bioc.uvic.ca (142.104.33.4, TCP Port# 7777) to run the requested BLAST searches.
Reference Genes Table
Lists the genes of the reference genome. Clicking on any row in the table results in the display of the DNA and corresponding protein sequences of the reference gene in the narrow window directly below the “Reference Genes” window; the gene will also be highlighted in the “Genome Map” window. Clicking on a column header will re-sort the list of reference genes based on the column header you select. None of these columns are editable.
The columns in the table are:
By default, GATU uses the application server running on leto.bioc.uvic.ca (142.104.33.4, TCP Port# 7777) to run the requested BLAST searches.
The Genome Map
This is a graphical display of the reference genome and its genes/mature peptides (top genome) and the genome to be annotated with the potential gene annotations (bottom genome).
The uppermost genome map shows the reference genome and its annotations as displayed in the Reference Genes Table. The bottom genome map shows the genome to be annotated along with all accepted annotations from either the “Annotations table” or the “Unassigned-ORFs Table”.
Clicking on any gene shown in the map will display the name of the gene in the text box directly below the “Genome Map” window. The slider directly across from the text box provides a zoom function; you can increase the size of the image to a maximum of 20 times the original size. The reset button will reset the image to its original size and deselect all genes. The jump button will skip ahead in the genome map to the view of the genes selected in the “Annotations table”.
The GATU Buttons
These buttons provide you with access to the main functions of the application, such as “Annotate”, “VGO” (graphical view of the annotated genome), “Base-By-Base” (an alignment editor) and “Save”.
The Main Buttons of GATU
- “Annotate” will run the GATU BLAST to generate the list of genes to be annotated using the reference genome.
- “VGO” starts the Viral Genome Organizer program with the selected genes from the table.
- “BaseByBase” opens the Base-By-Base program and conducts an alignment using the alignment program you select in the subsequent window that pops up. The alignment is based on the first gene you select and the corresponding gene of the reference genome.
- “Save” exports the accepted annotations to a file. The file format depends on the selection you make: either GenBank, EMBL or XML (BSML).
Algorithm
Describes the basic algorithm used to annotate a genome
1. Read all genes/mature peptides of the reference genome and display them in the “Reference Genes” table.
2. Conduct a tBLASTn/BLASTn alignment for every gene/mature peptide of the reference genome against the genome sequence to be annotated (tBLASTn is used for single-exon genes and BLASTn for multiple-exon genes).
3. Take the highest-scoring hit as a possible new gene/mature peptide. If the reference gene starts with a start codon, extend the hit to a start codon in the sequence to be annotated. If the reference gene ends with a stop codon, extend the hit to a stop codon in the sequence to be annotated. If the reference gene has no internal start/stop codon (ATG / TAG, TGA, TAA), verify that the hit has none either; if the hit does contain an internal start/stop codon, use the longest ORF.
4. Run a NEEDLE alignment for each annotation found and mapped for the newly annotated genome against the reference gene/mature peptide.
5. Find all possible ORFs and display the ORFs not found in step 3 in the “Unassigned-ORFs” table.
6. Display all possible new genes/mature peptides found in step 3.
7. Review and/or modify the annotations manually.
8. Apply the annotations to the genome and/or save them to a file (GenBank, EMBL or XML (BSML)).
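The tBLASTn/BLASTn choice, the extension of a hit to a start or stop codon, and the ORF search described above can be sketched in Python as follows. This is a simplified illustration, not GATU's code: it handles the forward strand only, uses 0-based half-open coordinates, assumes the hit is already in the correct reading frame, and leaves a hit unchanged when no suitable codon is found. All function names are hypothetical.

```python
STOPS = {"TAG", "TGA", "TAA"}
START = "ATG"


def choose_blast(exon_count):
    """tBLASTn for single-exon genes, BLASTn for multiple-exon genes."""
    return "tBLASTn" if exon_count == 1 else "BLASTn"


def extend_to_start(seq, hit_start):
    """Walk upstream, codon by codon, to the nearest in-frame ATG.

    Returns hit_start unchanged if a stop codon or the start of the
    sequence is reached first (an assumption of this sketch).
    """
    pos = hit_start
    while pos >= 0:
        codon = seq[pos:pos + 3].upper()
        if codon == START:
            return pos
        if codon in STOPS:
            break
        pos -= 3
    return hit_start


def extend_to_stop(seq, hit_end):
    """Walk downstream to the first in-frame stop codon.

    hit_end is exclusive. The search begins at the last codon of the
    hit, so a hit that already ends with a stop codon is unchanged.
    Returns the exclusive end including the stop codon, or hit_end if
    no stop codon is found.
    """
    pos = hit_end - 3 if hit_end >= 3 else hit_end
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3].upper() in STOPS:
            return pos + 3
        pos += 3
    return hit_end


def find_orfs(seq, min_len=30):
    """Return (start, end) of forward-strand ORFs (ATG through stop
    codon, inclusive) that are at least min_len bases long."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == START:
                start = pos
            elif start is not None and codon in STOPS:
                if pos + 3 - start >= min_len:
                    orfs.append((start, pos + 3))
                start = None
    return orfs
```

For the toy sequence `CCATGAAACCCGGGTTTTAGCC`, a hit covering `CCCGGG` (positions 8 to 14) extends upstream to the ATG at position 2 and downstream through the TAG ending at position 20, giving `ATGAAACCCGGGTTTTAG`. A real implementation would also need to process the reverse strand and apply the internal start/stop codon check from step 3.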