VGO how-to documentation

VGO is a Java-based interface for viewing and searching genomes. It can be used to display information about a genome, including its genes, ORFs, and start/stop codons. It can also be used to perform regular expression, fuzzy motif, LCS, and mass list searches.

VGO allows the user to identify related genes in multiple sequences. It also shows how search results compare with the identified genes, ORFs, and start/stop codons.

Contents

Open New Genomes in VGO
    From VOCs Database
    From Fasta
    From GenBank
    From BBB
Viewing Sequence Information in VGO
    Sequence Map Legend
    View the details of the current sequence
    Viewing Top/Bottom Strand DNA
    View a 6 frame amino acid translation of the selected sequence
    View information about a gene
    Viewing Options for the Sequence Display Window
Searching in VGO
    Search Options
    Regular Expression Search
    Fuzzy Motif Search
    GFS (Genome-based fingerprint scanning)
Analysis in VGO
    Gene Analysis tools
    Import Analysis into VGO
    Display Graph

Open New Genomes in VGO

From the VOCs database into VGO
To load sequences from the VOCs database into VGO, go to the sequence listing window. VGO can only load sequences from one database at a time; if the correct database is not currently open, select Choose Database from the File menu.
Database Chooser
Select the database that you want, and click the Choose button.
Once you have the correct database selected, go to the File menu, choose Open, then choose Open from VOCs DB. The following dialog window will appear.
Sequence Chooser
Choose the sequences that you want to load (multiple sequences can be selected by holding down the Apple key (Mac) or CTRL (PC)), then click on the OK button.
From a Fasta File
To load sequences from a .fasta file, go to the sequence listing window. Go to the File menu, choose Open, then choose Open from Fasta. Select the file that you want to load, and each sequence will be loaded.
Note: Fasta files do not contain any annotations, only sequence data.
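For reference, a Fasta file is plain text: each record starts with a ">" header line, followed by one or more lines of sequence. The sketch below shows how such a file breaks down into named sequences. It is written in Python for brevity rather than in VGO's own Java, and `read_fasta` is an illustrative name, not a VGO function.

```python
def read_fasta(text):
    """Parse Fasta-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record example; note that no annotations are present.
example = """>seq1 hypothetical example
ACGTAC
GTTA
>seq2
ttgaca
"""

for name, seq in read_fasta(example):
    print(name, len(seq))
```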
From a GenBank File
To load sequences from a GenBank file, go to the sequence listing window. Go to the File menu, choose Open, then choose Open from GenBank. Select the file that you want to load, and each sequence will be loaded, along with its annotations.
From a Base-by-Base File
To load sequences from a Base-by-Base file, go to the sequence listing window. Go to the File menu, choose Open, then choose Open from BaseByBase. Select the file that you want to load, and each sequence will be loaded, along with any annotations that have previously been added to the file in Base-by-Base.

Viewing Sequence Information in VGO

Sequence Map Legend
To open the sequence map simply double click on the sequence you wish to view in the home window (sequence listing window). For multiple sequences, select them using the Apple key (Mac) or CTRL Key (PC) and then click the button.

This is a representation of the data mapping that is available within VGO. Here is an explanation of each of the items indicated above, and how to use them.

	Name	Description	Single Click	Double Click
1	Sequence Search Results	Maps the locations of sequence search hits. There may be as many searches done on the sequence as you please.	Selects the region covered by the hit and displays information about the hit in the status bar.	Opens a sequence viewer with the sequence represented by the clicked on hit displayed.
2	Imported Analysis	Shows the location of features returned by some external source and brought as input to VGO.	Selects the region covered by the feature selected.	Opens a sequence viewer with the sequence represented by the feature displayed.
3	Colorized Gene Analysis	Displays the genes of this genome colored based on particular properties. See the gene analysis page for more information.	Selects the region, displaying pertinent information on the status bar.	Opens a legend describing the meaning of the colors used.
4	Open Reading Frames	Displays the open reading frames in the 3 frames of the sequence.	Selects the region covered by the selected ORF.	Opens a sequence display showing the sequence for the selected ORF.
5	Start and Stop codons	Shows the location of all start (green) and stop (red) codons in the sequence.	No Action	No Action
6	Gene Features	Shows the location of all genes in this sequence. Genes covering more than one open reading frame will be indicated with a connector. Through view settings you may display the gene label as the gene name, the family number for the gene or nothing at all for compactness.	Selects the region indicated by the feature	Opens the Gene Data Window for this gene.

View the details of the current sequence

You can view the sequence details from either the sequence listing window (the original window) or the sequence display window.

Sequence Listing Window: From the View menu, choose the Virus Info menu item.
Sequence Display Window: From the View menu, choose the Virus Info menu item.

Viewing Top/Bottom Strand DNA

The Sequence Viewer window displays a numbered, scrollable listing of the entire DNA sequence for the currently selected organism. To get to this window, go to the View menu and choose either Top Strand DNA or Bottom Strand DNA.

This window responds to selections made in the sequence map window. For instance, if you have an organism open in the sequence map window, with the Top Strand DNA window also open, you can see the results of selections made in the sequence map in the DNA window.

This window should be used to view DNA sequences, and relate them to information displayed on the sequence map. To copy sequence data to the clipboard, it is recommended that you use the “Genome Subsequence” facility, from the “View” menu.

View a 6 frame amino acid translation of the selected sequence

The framed translation facility of VGO displays a 6 frame amino acid translation of a selected portion of sequence.
Framed Translation Comparison

To get a framed translation for one or more sequence segments, select portions of one or more organisms (via the Sequence Map). Then, in the main window, select those organisms you wish to display, then choose “Framed Translation” from the “View” menu.
Note: The framed translator can handle selections of no more than 10kb in length. If a larger selection is attempted, a warning dialog box will be displayed.
The framed translation window displays each amino acid in single letter abbreviated format with stop codons marked with a “*”. The ruler along the top marks each 10 bases in the DNA sequence with a tick, and each 20 bases are marked with their position in the entire sequence for the organism being displayed.
The sequence displayed on the reverse side is the complement of that on the forward strand. Genes occurring on the top are coded from left to right, and on the bottom from right to left.
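The idea behind a 6 frame translation can be sketched as follows. This is a minimal Python illustration, not VGO's implementation: it assumes the standard genetic code, marks stop codons with "*" as the window does, and uses hypothetical function names. It reads the reverse complement from left to right, which is equivalent to reading the displayed complement from right to left.

```python
from itertools import product

# Standard genetic code (an assumption; VGO's table is not stated in this guide).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the bottom strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    bottom = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(bottom[offset:])
    return frames


for label, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
    print(label, protein)
```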

View information about a gene

This window allows for closer analysis of a particular gene. To open this window, double click on a gene in the sequence map window.

In this window, you are able to view:

The name of this gene as stored in the database
The multiple alignment of this particular gene’s family through the use of Jalview
The protein sequence for the gene (This information is selectable and may be used to copy into other applications, such as dotlet or other alignment programs)
The DNA sequence for the gene by clicking this button
Details of this gene, such as amino acid frequency, name, etc. by clicking this button
The most recent Tblastn, Psiblast or Blastp reports generated for this gene by clicking one of these buttons
The Blast reports for this gene through MView
The genes contained within this particular gene’s family by clicking this button

Viewing Options for the Sequence Display Window
The following is a list of independent display options for the Sequence Display window. These options may be turned on or off for each individual sequence or for all sequences.

    Start/stop codons
    ORF with a minimum ORF length option
    Bottom strand
    Gene Labels
        – Gene number
        – Family number
        – Short Gene number
        – GenBank name
    Lane Descriptions
    GFS
    Repeat Regions
    BBB Comments
    BBB Primers
Searching in VGO

Search Options

On the top menu, under Analysis, you will find the following search options listed.

Search All Sequences

Searches all the sequences you currently have open, using either a Fuzzy Motif or a Regular Expression search.

Search Selected Sequence

Searches only the selected sequences, using either a Fuzzy Motif or a Regular Expression search.

LCS Search

Searches for the longest common subsequence (or a common substring of at least a minimum length), with a few options.
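To make the "minimum length common substring" idea concrete, here is a small Python sketch of a longest common substring search between two sequences using dynamic programming. It is not VGO's algorithm, whose details and options are not described in this guide; the function name and the `min_length` parameter are illustrative assumptions.

```python
def longest_common_substring(a, b, min_length=1):
    """Return the longest contiguous run shared by a and b.

    Returns an empty string if the best run is shorter than min_length.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    if best_len < min_length:
        return ""
    return a[best_end - best_len:best_end]


print(longest_common_substring("ACGTTGCA", "TTACGTTA"))
```

Note that a substring is contiguous, whereas a subsequence may skip characters; this sketch covers only the substring case.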

DLCS Search

Searches for the shortest common subsequence with a variable number of mismatches.

GFS (Genome-based fingerprint scanning)

This search identifies the genomic origins of sample proteins by mapping their peptide mass fingerprint data directly to raw genomic sequence. GFS was developed by the Giddings lab at the University of North Carolina.

Regular Expression Search

You can do a regular expression search from the sequence display window: select Reg. Expression Search from the Analysis menu. The following dialog will then appear.
Enter your regular expression in the textbox, and click on the OK button to perform the search, or the Cancel button to close the dialog window without performing the search.
Regular expressions allow one to search for precise patterns, which may include optional sections and/or repeated sequences. For detailed help on regular expressions, please see the Perl Regular Expression page.

Examples of Regular Expression Searching

Regular Expression	What it matches
ACT	ACT
[AC]T	A or C followed by T
AC[^T]ACT	AC followed by anything BUT a T followed by ACT
ACT*	AC followed by 0 or more T’s
(ACT)*	ACT repeated 0 or more times
(ACT)+	ACT repeated 1 or more times
(ACT)?	ACT repeated 0 or 1 times
(ACT){n}	ACT repeated n times
(ACT){n,}	ACT repeated at least n times
(ACT){n,m}	ACT repeated at least n times but not more than m times
((AC)[TA]){n}	AC followed by T or A – repeated n times
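The patterns in the table can be tried outside VGO with any regular expression engine. The sketch below uses Python's `re` module, which handles these particular constructs in the same way as Perl; whether VGO's own engine behaves identically in every case is an assumption. The helper `find_hits` and the test sequence are made up for illustration.

```python
import re


def find_hits(pattern, sequence):
    """Return (start, end, text) for each non-overlapping, non-empty match.

    Positions are 1-based and inclusive.
    """
    return [
        (m.start() + 1, m.end(), m.group())
        for m in re.finditer(pattern, sequence)
        if m.group()  # patterns such as (ACT)* can match the empty string
    ]


sequence = "GGACTACTTTCTACGACT"  # hypothetical test sequence

for pattern in ["[AC]T", "ACT*", "(ACT){2,}"]:
    print(pattern, find_hits(pattern, sequence))
```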

Fuzzy Motif Search

You can do a fuzzy motif search from the sequence display window: select Fuzzy Motif Search from the Analysis menu. The following dialog will then appear.

Enter your fuzzy motif in the top textbox and enter the number of mismatches to allow in the lower textbox. Click on the OK button to perform the search, or the Cancel button to close the dialog window without performing the search.
The Fuzzy Motif Search allows users to enter an expression pattern (see below for an explanation of the pattern grammar used) as well as the maximum number of mismatches tolerated in a search hit. VGO then searches marked sequences for this motif and displays the list of hits by location along the sequence. In addition to the ambiguities created by mismatches, users may enter IUB ambiguity codes, which are listed in the table below.

Examples of Fuzzy Motif Searching

Fuzzy Expression	What it matches
ACT	an A, C, T pattern
[AC]T	an A or a C followed by a T
{AC}T	Anything but an A or a C, followed by a T
ACT(1,3)	An A, then C followed by 1, 2 or 3 T’s

Note: When counting mismatches, [] and {} count as a single match or mismatch. As well, if matching T(2,4) and only 1 T is found, this counts as a single mismatch.
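The mismatch counting described in the note can be sketched in Python as follows. This is a simplified illustration, not VGO's search code: it handles single bases, ambiguity codes, [] groups and {} exclusions, each counting as one position, but it does not handle repeat counts. All names are hypothetical.

```python
# Ambiguity codes, taken from the standard IUPAC nucleotide alphabet.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT", "X": "ACGT",
}


def parse_motif(motif):
    """Turn a motif into a list of allowed-base sets, one per position."""
    positions, i = [], 0
    while i < len(motif):
        ch = motif[i]
        if ch in "[{":
            close = motif.index("]" if ch == "[" else "}", i)
            bases = set()
            for code in motif[i + 1:close]:
                bases |= set(IUPAC[code])
            # [..] allows the listed bases, {..} allows everything else.
            positions.append(bases if ch == "[" else set("ACGT") - bases)
            i = close + 1
        else:
            positions.append(set(IUPAC[ch]))
            i += 1
    return positions


def fuzzy_search(motif, sequence, max_mismatches):
    """Return (1-based start, mismatch count) for every acceptable window."""
    positions = parse_motif(motif.upper())
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - len(positions) + 1):
        mismatches = sum(
            1 for offset, allowed in enumerate(positions)
            if sequence[start + offset] not in allowed
        )
        if mismatches <= max_mismatches:
            hits.append((start + 1, mismatches))
    return hits


print(fuzzy_search("ACT", "ACTAGTACA", 1))
```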

Table of IUPAC Ambiguity Codes

IUPAC-IUB/GCG Code	Meaning	Complement
A	A	T
C	C	G
G	G	C
T/U	T	A
M	A or C	K
R	A or G	Y
W	A or T	W
S	C or G	S
Y	C or T	R
K	G or T	M
V	A or C or G	B
H	A or C or T	D
D	A or G or T	H
B	C or G or T	V
X/N	G or A or T or C	X
.	Not G or A or T or C	.
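The Complement column of the table is what you need when looking for a motif on the bottom strand. A minimal Python sketch of a reverse complement that respects the ambiguity codes is shown below. The mapping follows the table, with the assumptions that U is complemented like T and that N maps to N (the table lists X/N together with complement X). The function name is illustrative.

```python
# Code -> complement, following the table above.
IUPAC_COMPLEMENT = str.maketrans(
    "ACGTUMRWSYKVHDBXN.",
    "TGCAAKYWSRMBDHVXN.",
)


def reverse_complement(seq):
    """Complement every code, then reverse to read the other strand 5' to 3'."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


print(reverse_complement("AMRT"))
```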

GFS (Genome-based fingerprint scanning)

You can do a GFS search from the sequence display window: select GFS from the Analysis menu.
Enter your mass list in the textbox labeled Mass List, or click on the Load Mass List File button and select a file that contains the list of masses. When you are ready, click on the OK button to perform the search, or the Cancel button to close the dialog window without performing the search. Depending on your preferences, another dialog window may appear in which you can enter the parameters for the GFS search.

Analysis in VGO

Gene analysis tools

There are currently 3 properties that can be used to color-code genes on the sequence map in VGO. These properties are: Base Composition, Amino Acid Composition and Family Representation. When this menu item is selected, a dialog appears requesting parameters to display. Once these are set, VGO maps this data on the screen. To see a longer explanation of the parameters chosen and a range of values, double click on one of the resulting gene features.

Base Composition: This colors genes based on the percentage of one or more nucleotide bases that they contain.
Amino Acid Composition: This colors genes based on the percentage of one or more amino acids that they contain.
Family Representation: This maps the virus frequency of each gene in the displayed genome. That is, it colors each gene based on the number of viruses represented in that gene’s family.

Import Analysis into VGO

VGO allows for the import of analyses done externally.

Currently, this is done through the use of flat text files which contain descriptions of specific regions of interest. This information includes the start position, the end position, the strand and a description of the regions. Any number of different analyses may be contained within a file, and any number of regions may be contained within an analysis.

To open a file, select “Import Analysis” from the “Analysis” menu on the sequence map window.

The file format is as follows:

>Analysis 1 name
start|stop|strand|description|color

start|stop|strand|description|color
…
>Analysis 2 name
…
etc.

An Example:

>First VGO Import Analysis Example
100|400|POSITIVE|region 1|2F4F4F
500|600|NEGATIVE|region 2|2E8B57
>Second VGO Import Analysis Example
400|500|POSITIVE|region 3|A52A2A
600|700|NEGATIVE|region 4|A52A2A

This example would produce the following mapping in the VGO sequence map

Note: The analysis will fail to import, without any error message, if the analysis file is formatted incorrectly. The most common cause is a start value that is larger than the stop value. For genes on the negative strand, it is therefore necessary to swap the positions so that the smaller coordinate is listed first.
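Because VGO gives no error message on a badly formatted file, it can help to check a file before importing it. The Python sketch below parses the format described above and reports the first problem it finds. It is not part of VGO, and two of its checks are assumptions inferred from the example only: that the strand is either POSITIVE or NEGATIVE, and that the color is a six-digit hexadecimal value.

```python
import re


def parse_analysis_file(text):
    """Parse import-analysis text into {analysis name: [region tuples]}.

    Raises ValueError describing the first badly formatted line.
    """
    analyses = {}
    current = None
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = analyses.setdefault(line[1:].strip(), [])
            continue
        if current is None:
            raise ValueError(f"line {number}: region listed before any '>' header")
        fields = line.split("|")
        if len(fields) != 5:
            raise ValueError(f"line {number}: expected 5 fields, found {len(fields)}")
        start, stop, strand, description, color = fields
        start, stop = int(start), int(stop)  # int() raises ValueError on bad input
        if start > stop:
            raise ValueError(f"line {number}: start {start} is larger than stop {stop}")
        if strand not in ("POSITIVE", "NEGATIVE"):  # assumption from the example
            raise ValueError(f"line {number}: unexpected strand {strand!r}")
        if not re.fullmatch(r"[0-9A-Fa-f]{6}", color):  # assumption from the example
            raise ValueError(f"line {number}: color {color!r} is not 6 hex digits")
        current.append((start, stop, strand, description, color))
    return analyses


sample = """>Hypothetical analysis
100|400|POSITIVE|region 1|2F4F4F
500|600|NEGATIVE|region 2|2E8B57
"""
print(parse_analysis_file(sample))
```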

Display Graph of Nucleotide Base Content

Currently, VGO is able to plot nucleotide composition, sampled at customizable rates, along the entire genome. To get this information, select “Display Graph” from the “Analysis” menu in the sequence map. The currently active panel will then display a graph of the percent composition of the bases chosen.
This graph panel will appear at the bottom of the sequence map and display the following information:

A plot line in red. This plots the data along the sequence at the scale currently displayed in the map window
A description of the data being plotted
The sampling window size, which is customizable through the use of a slider
The minimum percent composition along the entire genome (Y-axis minimum)
The maximum percent composition along the entire genome (Y-axis maximum)
The mean percent composition along the currently viewable portion of sequence. This is displayed in blue.

Sequence Map Display with Graph
Graph calculations are done using a sliding window, sampled at regular fractions of the window size. By default, the fraction of the window size used to sample the data is 1/3. This means that with a window size of 60, data will be sampled every 20 bases. In addition, the information displayed at every 20th base will be the average from that position forward to the end of the window. For performance reasons, only the data currently displayed on the screen will be sampled. A side effect of this is that the shape of the graph may appear to change as you scroll left or right through the map display.
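The sampling scheme described above can be sketched in Python as follows. This is an illustration of the arithmetic only, not VGO's plotting code: the function name and parameters are hypothetical, and dropping incomplete windows at the end of the sequence is an assumption.

```python
def windowed_composition(sequence, bases="GC", window=60, fraction=3):
    """Return (1-based position, percent composition) sample points.

    Each sample covers the window that starts at its position and runs
    forward; samples are taken every window // fraction bases.
    """
    sequence = sequence.upper()
    step = max(1, window // fraction)
    points = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        count = sum(chunk.count(base) for base in bases)
        points.append((start + 1, 100.0 * count / window))
    return points


# Hypothetical 120-base sequence: GC-rich first half, AT-rich second half.
demo = "GC" * 30 + "AT" * 30
for position, percent in windowed_composition(demo):
    print(position, round(percent, 1))
```

With the default window of 60 and fraction of 3, the samples fall every 20 bases, matching the example in the text.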
