VOCs How-to Documentation
Using the three search menus
VOCs contains three search menus, or “Filters”, that can be used to search for Gene/Protein Sequences, Ortholog Groups, or Genomes of interest.
To use a filter:
- Click on a search specification tab at the top of the main VOCs window.
- Using the submenus and check boxes provided, set the search conditions you wish.
- Finally, click on the relevant button at the bottom of the VOCs window to either count or view the genes, ortholog groups, or genomes that meet your search conditions.
For more detailed descriptions of these filters, see the Main Window documentation.
Find a gene by its number/name
Find a gene by its upstream promoter
To search for genes with specific upstream promoters, first enable the “Upstream DNA Sequence” search (found under the “Proteins/DNA Selector” heading in the Sequence Query window) by clicking on the associated square box. Select the query conditions from the pull-down menu to the right, then type a regular expression specifying the promoter sequence of interest in the white box. (Help regarding regular expressions is available in the VOCs help documentation.) Finally, click the “Gene View” button at the bottom of the main window to perform the search.
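Conceptually, an upstream-promoter query applies a regular expression to each gene's upstream sequence and keeps the genes that match. The minimal sketch below illustrates that idea; the gene names, sequences, and the example pattern are hypothetical, and case-insensitive matching is an assumption rather than documented VOCs behaviour.

```python
import re

# Hypothetical upstream sequences keyed by gene name (illustrative, not VOCs data).
UPSTREAM = {
    "gene-001": "ttgaaaattgaaaatattaaatgtcg",
    "gene-002": "gcgcgcatatcgcgatcgatcgcgcg",
    "gene-003": "aaaaaatttttaaatggcctta",
}


def genes_with_promoter(upstream, pattern):
    """Return the sorted names of genes whose upstream sequence contains a match."""
    # Case-insensitive matching is an assumption, not documented VOCs behaviour.
    regex = re.compile(pattern, re.IGNORECASE)
    return sorted(name for name, seq in upstream.items() if regex.search(seq))


# Illustrative query: a TAAAT motif immediately followed by G.
print(genes_with_promoter(UPSTREAM, "TAAATG"))
```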
Find a gene by its description
To search for genes of a specific type, first enable the “Gene Description” search (found under the “Proteins/DNA Selector” heading in the Sequence Query window) by clicking on the associated square box. Select the query conditions from the pull-down menu to the right, then type in the gene description such as “Hypothetical Protein” or “EEV phospholipase”. Finally, click the “Gene View” button at the bottom of the main window to perform the search.
Find a gene by its DNA sequence
First, click the square box beside the “Gene DNA Sequence” heading under the “Proteins/DNA Selector” category in the Sequence Query window. Select the query conditions from the pull-down menu to the right. Then, in the blank field to the right, type a regular expression specifying the DNA sequence of interest; multiple base possibilities for a given position must be entered between square brackets (e.g. [cga]), and a period should be used to denote “any base”. Additional help regarding regular expressions is available in the VOCs help documentation. Finally, click the “Gene View” button at the bottom of the main window to perform the search.
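To make the bracket and period rules concrete, the sketch below checks short candidate strings position by position with Python's `re` module. It only illustrates standard regular-expression semantics; the pattern and sequences are hypothetical, and a real gene search would look for the pattern anywhere in the gene (as `re.search` does) rather than require a whole-string match.

```python
import re


def matches_at_every_position(pattern, seq):
    """True if the whole of `seq` fits `pattern`, position by position."""
    # fullmatch is used only to make the per-position logic visible;
    # a gene search would look for the pattern anywhere (re.search).
    return re.fullmatch(pattern, seq, re.IGNORECASE) is not None


# [cga] accepts one base that is c, g or a; "." accepts any single base.
PATTERN = "atg[cga].aa"
for candidate in ["atgcgaa", "atgttaa", "atgaaaa"]:
    print(candidate, matches_at_every_position(PATTERN, candidate))
```

Here `atgttaa` fails because its fourth base (t) is not one of c, g or a.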
Find a gene by nucleotide content
Genes can be located based on their specific nucleotide content. First, enable “DNA Constraints: Nucleotides/ Dinucleotides/Codon Composition” (bottom of Sequence Filter window) by clicking on the associated square box. To add a single constraint, click the “Add” button; this will enter a default constraint, containing three parameters:
(1) nucleotide(s) (e.g. A)
(2) operator (<, >, <=, or >=).
(3) content (e.g. 5.00% of total nucleotide content.)
The default constraint, A < 5.00%, would find all genes for which adenine represents less than 5.00% of their total nucleotide content.
Once the default constraint has been entered, the individual parts of it can be modified to specify a user-defined constraint. Additional constraints can be added to further narrow the search. To delete a query constraint, simply highlight the appropriate line by clicking on it and hit the “Delete” button below. Once all constraints have been entered, click on the “Gene View” button at the bottom of the main window to perform the search.
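The constraint logic above can be sketched as follows: compute each single nucleotide's percentage of the gene, compare it with the chosen operator, and keep only genes that satisfy every constraint (which is how extra constraints narrow the search). The gene sequences are hypothetical, and treating multiple constraints as a logical AND is an assumption drawn from the description above, not from VOCs source code. The example also shows the difference between `<` and `<=` for a gene that is exactly 5% adenine.

```python
import operator

# The four operators offered by the constraint editor.
OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def nucleotide_percent(seq, base):
    """Percentage of positions in `seq` occupied by the single nucleotide `base`."""
    seq = seq.upper()
    return 100.0 * seq.count(base.upper()) / len(seq)


def select_genes(genes, constraints):
    """Names of genes meeting every (base, operator, percent) constraint."""
    return sorted(
        name
        for name, seq in genes.items()
        if all(OPS[op](nucleotide_percent(seq, base), limit) for base, op, limit in constraints)
    )


# Hypothetical gene sequences.
GENES = {
    "gene-A": "GC" * 9 + "TA",  # 1 A in 20 bases = 5%
    "gene-B": "GC" * 10,        # no A
    "gene-C": "AAAATTTTGC",     # 4 A in 10 bases = 40%
}

print(select_genes(GENES, [("A", "<", 5.0)]))                    # gene-B only
print(select_genes(GENES, [("A", "<=", 5.0)]))                   # gene-A and gene-B
print(select_genes(GENES, [("A", "<=", 5.0), ("T", ">", 0.0)]))  # gene-A only
```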
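Find a gene by codon usage
Codon usage constraints are entered in the same “DNA Constraints: Nucleotides/ Dinucleotides/Codon Composition” panel described above. Click the “Add” button to enter a default constraint containing three parameters: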
(1) nucleotide(s) (e.g. A)
(2) operator (<, >, <=, or >=).
(3) content (e.g. 5.00% of total nucleotide content.)
Once the default constraint has been entered, the individual parts of it can be modified to specify a user-defined constraint. To search for a given codon, enter it in the left-hand column. Additional constraints can be added to further narrow the search. To delete a query constraint, simply highlight the appropriate line by clicking on it and hit the “Delete” button below. Once all constraints have been entered, click on the “Gene View” button at the bottom of the main window to perform the search.
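As an illustration of what a codon constraint measures, the sketch below splits a coding sequence into in-frame codons (reading frame 1) and reports each codon's share of all codons. Expressing usage as a percentage of all codons, counting the stop codon, and ignoring a trailing partial codon are assumptions; VOCs may compute codon content against a different denominator.

```python
from collections import Counter


def codon_percentages(cds):
    """Percentage of each in-frame codon (reading frame 1) in a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return {}
    counts = Counter(codons)
    return {codon: 100.0 * n / len(codons) for codon, n in counts.items()}


usage = codon_percentages("ATGGCTGCTGCATAA")  # hypothetical ORF: ATG GCT GCT GCA TAA
print(usage["GCT"], usage["ATG"], usage.get("AAA", 0.0))
```

A constraint such as GCT >= 30.00% would then compare `usage["GCT"]` against 30.0.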
Find a gene by notes or annotation source
Examples of searches:
Notes: “vascular endothelial growth factor”
Annotation Source: “GenBank”, “Manual”
Find protein motifs
First, enable the “Protein Sequence” query (found under the “Proteins/DNA Selector” heading in the Sequence Query window) by clicking on the associated square box. Select the query conditions from the pull-down menu to the right, then type a regular expression specifying the protein motif of interest in the white box. Protein sequences are denoted by their one-letter codes. Alternative amino acids for a single position can be entered between square brackets (e.g. [DE]), and an unspecified amino acid is denoted by a period (“.”). For example, the PCNA-binding PIP-box motif can be found by typing Q..[ILM]..[FHY][FHY]. Finally, click the “Gene View” button at the bottom of the main window to perform the search.
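The PIP-box example can be reproduced with any standard regular-expression engine. The sketch below scans hypothetical protein sequences and reports the 1-based start position of each match; the sequences are invented for illustration, and `finditer` reports non-overlapping matches only.

```python
import re

PIP_BOX = re.compile("Q..[ILM]..[FHY][FHY]")

# Hypothetical protein sequences (one-letter codes).
PROTEINS = {
    "prot-1": "MSEQKTLSKFFGRE",
    "prot-2": "MSEQKTASKFFGRE",
}


def motif_hits(proteins, regex):
    """Map each protein containing the motif to the 1-based starts of its matches."""
    hits = {}
    for name, seq in proteins.items():
        starts = [m.start() + 1 for m in regex.finditer(seq)]  # non-overlapping matches
        if starts:
            hits[name] = starts
    return hits


print(motif_hits(PROTEINS, PIP_BOX))
```

`prot-2` is not reported because it has A, not I, L or M, three residues after the Q.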
Find proteins by molecular weight
First, enable the “MW” query (found under the “Molecular Weight/Isoelectric Point/AA Count” heading of the Sequence Query) by clicking on the associated square box. In the first white box to the right, enter the lower limit for the molecular weight, then enter the upper limit for the molecular weight in the next white box. Finally, click the “Gene View” button at the bottom of the main window to perform the search.
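The MW query filters on a value computed for each protein. A common way to compute it is to sum average amino acid residue masses and add one water molecule for the termini; the sketch below does that and then applies the lower and upper limits. The residue masses are standard average values, but whether VOCs uses exactly these numbers, and whether its limits are inclusive, are assumptions.

```python
# Standard average amino acid residue masses in daltons (assumed; VOCs may differ).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524


def molecular_weight(protein):
    """Average molecular weight (Da): sum of residue masses plus one water."""
    protein = protein.upper().rstrip("*")   # drop a trailing stop symbol if present
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


def within_mw_range(proteins, lower, upper):
    """Names of proteins whose MW lies between lower and upper (inclusive)."""
    return sorted(n for n, s in proteins.items() if lower <= molecular_weight(s) <= upper)


print(round(molecular_weight("G"), 2))  # free glycine
```

Non-standard letters (e.g. X) raise a `KeyError` in this sketch; a real tool would need a policy for them.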
Find proteins by size
Proteins can be found based on their total number of amino acids. First, enable the “AA Count” query (found under the “Molecular Weight/Isoelectric Point/AA Count” heading of the Sequence Query) by clicking on the associated square box. In the first white box to the right, enter the lower limit for the total amino acid count, then enter the upper limit for the amino acid count in the next white box. Finally, click the “Gene View” button at the bottom of the main window to perform the search.
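Find proteins by amino acid content
Proteins can also be located based on their amino acid content. As with nucleotide content, click the “Add” button to enter a default constraint containing three parameters: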
(1) amino acid (e.g. A)
(2) operator (<, >, <=, or >=).
(3) content (e.g. 5.00% of total amino acid content.)
The default constraint, A < 5.00%, would find all proteins for which alanine represents less than 5.00% of their total amino acid content.
Once the default constraint has been entered, the individual parts of it can be modified to specify a user-defined constraint. Additional constraints can be added to further narrow the search. To delete a query constraint, simply highlight the appropriate line by clicking on it and hit the “Delete” button below. Once all constraints have been entered, click on the “Gene View” button at the bottom of the main window to perform the search.
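A minimal sketch of combined amino acid constraints: build the full composition of a protein once, then require every constraint to hold. Treating an amino acid absent from the protein as 0%, and combining constraints with AND, are assumptions based on the description above; the protein is hypothetical.

```python
from collections import Counter
import operator

OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def aa_composition(protein):
    """Percentage of the protein made up by each amino acid (one-letter codes)."""
    protein = protein.upper().rstrip("*")
    counts = Counter(protein)
    return {aa: 100.0 * n / len(protein) for aa, n in counts.items()}


def meets_all(protein, constraints):
    """True only if the protein satisfies every (amino acid, operator, percent) constraint."""
    composition = aa_composition(protein)
    return all(OPS[op](composition.get(aa, 0.0), limit) for aa, op, limit in constraints)


protein = "MKKKAAAAAA"  # hypothetical: 10% M, 30% K, 60% A
print(meets_all(protein, [("A", ">=", 50.0), ("C", "<", 5.0)]))  # True
print(meets_all(protein, [("A", "<", 5.0)]))                     # False
```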
Find proteins by isoelectric point (pI)
First, enable the “pI” query (found under the “Molecular Weight/Isoelectric Point/AA Count” heading of the Sequence Query) by clicking on the associated square box. In the first white box to the right, enter the lower limit for the pI (or select one from the pull-down menu), then enter the upper limit for the pI in the next white box (or select it from the pull-down menu). Finally, click the “Gene View” button at the bottom of the main window to perform the search.
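For reference, a protein's pI is usually estimated as the pH at which its net charge, summed over ionizable groups with the Henderson-Hasselbalch equation, is zero. The sketch below finds that pH by bisection. The pKa values are an illustrative set of the kind used by common pI calculators; published sets differ, and the values VOCs uses are not stated here, so results may not match VOCs exactly.

```python
# Illustrative pKa values (assumption; VOCs's own set may differ).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(protein, ph):
    """Approximate net charge of a protein at a given pH (Henderson-Hasselbalch)."""
    protein = protein.upper()
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))
    positive += sum(protein.count(aa) / (1 + 10 ** (ph - pka)) for aa, pka in PKA_POSITIVE.items())
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))
    negative += sum(protein.count(aa) / (1 + 10 ** (pka - ph)) for aa, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(protein, tolerance=1e-4):
    """Bisect on pH 0-14 for the point where the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(protein, mid) > 0:
            low = mid      # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("KKKKKKKK"), 2), round(isoelectric_point("DDDDEEEE"), 2))
```

Bisection works here because the net charge falls monotonically as pH rises.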
View all information about a gene
Click the “Gene View” button at the bottom of the main VOCs window after selecting a virus.
Select the gene of interest, then choose Gene Details from the View menu.
The information is organized into four tabs:
- Gene Data
- Sequences
- Protein Properties
- Gene Composition
The Virus Selector Window
Select the viruses of interest from either the Select or the Do NOT select tab (the two tabs cannot be used simultaneously). If the program window is too small, click and drag any corner of the main VOCs window to enlarge it. Note: make sure that the ‘Enable Selection’ box (bottom of the Virus Selector window) is checked, or you will be unable to select any viruses in either tab!
Finally, click on one of the six buttons at the bottom to display or count sequences, ortholog groups, or genomes that meet your combined criteria.
Update the Virus Selector window
Note: only the Genome Filter search conditions (not Ortholog Group or Sequence search conditions) will be applied to the list of viruses!
Run a BLAST search on a gene or protein
To run a BLAST search, begin by opening the gene of interest in VOCs. Go to the Blast menu (in the Gene Results Table window) and select the search you wish to perform. VOCs supports five types of BLAST searches: tblastn, blastx, blastp, blastn, and psiblast. When you hover the mouse over the search of interest, a second list of options will open.
– You may choose to run the BLAST search either against the currently open VOCs database, or against the entire NCBI GenBank database. (Note that the second option will generally take substantially longer.)
– You may also choose to view the results in text, HTML, or table (for VOCs only) format.
– For VOCs searches, you may also choose to run MView, a tool that displays the query and the resulting BLAST hit sequences as a multiple alignment. (Note: running times for MView may be comparatively slow!)
After the BLAST search completes, a results window will open showing your BLAST results.
Create a multiple sequence alignment
VOCs is linked with BBB (Base-by-Base), our multiple alignment viewer. To create a multiple alignment of several genes or proteins, begin by selecting the genes of interest (by holding down the CTRL or Shift key) from the Gene Results Table.
– To align several genes from the same organism (e.g. VACV) choose the species from the Genome List. Click Gene View; this will open a list of all genes in VACV. Choose the genes of interest from this list. (This can also be done with multiple species of interest.)
– To align several genes from the same ortholog group, you can use the ortholog group selector on the Main Console.
- Find the name of the ortholog group(s) you are interested in, then hit the align button for a protein alignment of all genes within the highlighted ortholog group(s). To refine your selection, click View Family; this will open a list of all the genes in the selected ortholog groups. Choose the genes of interest from this list.
- Once you have selected your genes, go to the Alignment menu and choose either DNA sequence alignment, protein sequence alignment, upstream sequence alignment, or Genomic DNA sequence alignment. A variety of alignment algorithms (ClustalΩ, T-Coffee, MUSCLE, Needle, etc.) are available. You can also choose to open the unaligned sequences.
- Click on the alignment algorithm you wish to use. A window containing alignment parameters will open; adjust if desired, then click Continue. When the algorithm is finished running, a Base-by-Base window will open containing the alignment. (NOTE: For large alignments, particularly of genomic DNA, running time for this step can be lengthy; please be patient!)
Create a sequence alignment with concatenated genes
You may want to create a sequence alignment in which multiple genes from the same virus are concatenated. To concatenate several genes, begin by selecting the viruses of interest on the right side of the main VOCs window. Click the OrtGrpView icon to open the Ortholog Groups window. From there, select the ortholog groups that you would like to include in the concatenation (by holding down the CTRL or Shift key). Once the ortholog groups have been selected, click on the Analysis menu at the top of the screen and select either Concatenate Selected DNA or Concatenate Selected Amino Acids to perform the concatenation. The new sequences will be opened in the Base-by-Base program with the ortholog group names appended to the virus names. Base-by-Base can then be used to perform various types of alignments.
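The concatenation step can be pictured as joining, for each virus, its sequences from the selected ortholog groups in a fixed order and naming the result after the virus plus the groups used. The sketch below is a hypothetical illustration: the underscore naming scheme and the choice to skip groups in which a virus has no member are assumptions, not VOCs's documented behaviour.

```python
def concatenate(groups, viruses, group_order):
    """Concatenate each virus's sequences across ortholog groups.

    groups maps ortholog-group name -> {virus name: sequence}.
    A virus with no member in a group contributes nothing for that group.
    """
    records = []
    for virus in viruses:
        present = [g for g in group_order if virus in groups[g]]
        header = "_".join([virus] + present)   # hypothetical naming scheme
        sequence = "".join(groups[g][virus] for g in present)
        records.append((header, sequence))
    return records


# Hypothetical ortholog groups and sequences.
GROUPS = {
    "OG1": {"VirA": "ATG", "VirB": "ATC"},
    "OG2": {"VirA": "GGG"},
}
for header, seq in concatenate(GROUPS, ["VirA", "VirB"], ["OG1", "OG2"]):
    print(">" + header)
    print(seq)
```

Because skipped groups make sequences unequal in length, an alignment step (as in Base-by-Base) is still needed afterwards.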
Generate a Hydrophobicity Plot
For example, suppose you are comparing four different members of the Membrane Glycoprotein protein family from Coronaviridae. As this protein contains three transmembrane domains, you want to generate a single graph comparing the hydrophobicity of all four family members. How can one do this in VOCs?
Open a table containing the genes of interest (in this case, this is most easily done by opening the “Membrane (matrix) Glycoprotein M” ortholog group in Coronaviridae). Select the genes you wish to graph by clicking on them while holding down the CTRL or Shift key. Then choose “Hydrophobicity Plot” from the Analysis menu. A graph will open containing a plot of all four proteins, represented by lines of contrasting colours.
For more information on navigating in Hydrophobicity Grapher and adjusting its parameters, see the Hydrophobicity Grapher help documentation.
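Hydrophobicity plots are typically built by averaging a per-residue hydropathy value over a sliding window. The sketch below shows that approach using the Kyte-Doolittle scale and a 9-residue window; neither the scale nor the window size is confirmed as what Hydrophobicity Grapher uses by default, so treat both as assumptions.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the grapher's default may differ).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """(centre position, mean hydropathy) for each full window; positions are 1-based."""
    protein = protein.upper().rstrip("*")
    profile = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        mean = sum(KD[aa] for aa in segment) / window
        profile.append((start + window // 2 + 1, mean))
    return profile


# Hypothetical sequence: a hydrophobic stretch followed by a charged one.
for position, value in hydropathy_profile("LLLLLLLLLKKKKKKKKK"):
    print(position, round(value, 2))
```

Plotting one such profile per selected protein on shared axes gives the multi-line comparison described above; transmembrane segments appear as broad positive peaks.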
Create a DNA Skew for several genomes or genes
You may at some point wish to compare several genomes or genes by visualizing their DNA base content graphically. For example, suppose you wish to create the following two graphs: 1) a purine skew [(C+T)-(A+G)] for SARS-Tor2, IBV-Beaudette, and TGEV-Purdue; 2) a DNA walk graph for the spike genes of all three genomes.
Creating a whole-genome DNA skew is simple. Open VOCs with the database of interest (in this example, Coronaviridae). From the Virus Selector list, choose the genomes of interest (SARS-Tor2, IBV-Beaudette, and TGEV-Purdue) by holding down the CTRL or Shift key and selecting with the mouse. Then choose DNA Grapher from the Tools menu. A new window will open with a purine skew (default graph) of the selected genomes.
Creating a DNA skew for single genes requires a bit more work. From the Ortholog Group filter on the main console, choose the family of interest (e.g. Spike glycoprotein) and click GeneView. A list of genes in the spike family will open. Select the three members of interest (SARS-TOR2-019, IBV-Bea-018, and TGEV-PUR-019), by holding down the CTRL or Shift key and selecting with the mouse. Then choose DNA Grapher from the Analysis menu. A new window will open with a graph of the three spike genes. Since the default graph type is a purine skew, you will have to manually change the graph type to a DNA walk. From the View menu in the graph window, select “Graph Type” and then “DNA Walker”.
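The two graph types can be sketched directly from their definitions. The skew below uses the document's formula, (C+T)-(A+G), counted in sliding windows; the walk is a cumulative one-dimensional DNA walk. The window size, step, and the walk's step rule (+1 for C or T, -1 for A or G, so the walk is the running total of the same quantity the skew measures) are assumptions, not DNA Grapher's documented settings.

```python
def skew(seq, window=100, step=100):
    """(window start, (C+T)-(A+G)) for each full window; starts are 1-based."""
    seq = seq.upper()
    points = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        value = (w.count("C") + w.count("T")) - (w.count("A") + w.count("G"))
        points.append((start + 1, value))
    return points


def dna_walk(seq):
    """Cumulative walk: +1 for C/T, -1 for A/G; other symbols (e.g. N) leave it unchanged."""
    position = 0
    walk = []
    for base in seq.upper():
        if base in "CT":
            position += 1
        elif base in "AG":
            position -= 1
        walk.append(position)
    return walk


print(skew("CCTTAAGG", window=4, step=4))
print(dna_walk("AGCT"))
```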
Create a Table of Genes and Selected Statistics
Suppose you wish to create a table of all genes in Vaccinia virus Copenhagen (VV-Cop) showing their GenBank name, ortholog group name, molecular weight, pI, number of amino acids, and A+T%, and to sort the table by molecular weight before exporting it. How can you go about doing this?
Begin by selecting the virus of interest (in this case, Vaccinia virus strain Copenhagen) from the Virus Selector window on the main console. Click GeneView to bring up the gene results table.
In the gene results window, choose “Selection” from the Column menu. A list of data available for display will open. Check the boxes for the columns you wish to display and uncheck those you are not interested in. Some columns (e.g. gene number, ORF start/stop) will be displayed by default when the table opens, unless you have previously changed your default display preferences. (You can make the currently selected columns the default by choosing “Save default settings” from the File menu.)
If there are many boxes already checked, it might be easier to unselect all before choosing the ones you wish to display. To do this, choose “Unselect all” from the Select menu in the list window (not the Result Table window!)
Once you have made your selections the Gene Results Table will now have the columns you’re interested in (You may close the ‘Selection’ window). Click on the heading above any column to sort the table. For example, click on the Molecular Weight column to sort from smallest to largest molecular weight. (Click again to sort from largest to smallest.)
Finally, choose “Write all to File” from the File menu. A tab-delimited plain-text file will be created in the location you choose. This file can be opened in a plain-text editor (e.g. Notepad, WordPad), in a word processor like Microsoft Word, or in a spreadsheet program like Microsoft Excel.
To open the file in Excel, select “Open” from the Excel File menu and change the “Files of type” setting to “Text Files.” (Note: you may first have to manually append the .txt extension to the exported file if your computer does not recognize it as a plain-text file.) Select the file name, click Open, and follow the instructions in the Text Import Wizard to import your file into Excel.
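Because the export is tab-delimited text with a header row, it can also be read by a script instead of Excel. The sketch below uses Python's `csv` module and sorts numerically on one column, mirroring a click on a column header. The column names and values in the sample are hypothetical; the real headers depend on the columns you selected.

```python
import csv
import io


def load_gene_table(handle):
    """Read a tab-delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def sort_rows(rows, column, descending=False):
    """Sort rows numerically on one column, like clicking a column header."""
    return sorted(rows, key=lambda row: float(row[column]), reverse=descending)


# Hypothetical export; real column names depend on the columns you selected.
SAMPLE = "Gene\tMolecular Weight\tpI\ngene-1\t35210.4\t6.1\ngene-2\t12005.9\t9.3\n"
rows = load_gene_table(io.StringIO(SAMPLE))
for row in sort_rows(rows, "Molecular Weight"):
    print(row["Gene"], row["Molecular Weight"])
```

For an exported file, open it with `open(path, newline="")` and pass the file object to `load_gene_table`.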
Create a FASTA file of sequences
Suppose you want to create a single FASTA file containing the upstream sequences for every predicted ORF in the baculovirus Maruca vitrata MNPV, and a second file containing the corresponding protein sequences. Since there are a large number of genes (over 125!), this would be an extremely tedious task to do manually. Fortunately, VOCs allows you to obtain these files in a matter of seconds.
Open the Baculovirus database in VOCs. From the Virus Selector list, choose Maruca vitrata MNPV and click GeneView. A table will open containing all the genes in this virus, sorted by gene number (corresponding to ORF start position). (NOTE: You can of course re-sort the table by clicking on any column header; however, the file you create will always be sorted by gene number.)
Click and drag to select all the rows in the table, or press CTRL+A to select all. Then, from the Sequence menu, choose Save…Upstream Sequences. You will be asked to select a file location and name. A FASTA file will be created containing the upstream sequences.
To create the second FASTA file with the corresponding protein sequences, repeat the above procedure, but this time choose Save…Protein Sequences. You can also create similar files containing genomic and gene sequences.
Note: although these files will be created in the FASTA format, your computer may not recognize them as such. If this happens, simply append the .fasta or .txt extension to them as required.
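The saved FASTA files can be processed further with a short parser. The minimal sketch below reads FASTA text into a dictionary keyed by the full header line, joining wrapped sequence lines. The sample records are hypothetical; lines before the first header are ignored, and a repeated header would overwrite the earlier record.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = io.StringIO(">gene-1 upstream\nACGTAC\nGTTA\n>gene-2 upstream\nTTTAAA\n")
print(read_fasta(example))
```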
Obtaining the GenBank file for a genome
To obtain the most recent copy of the GenBank file for a genome, simply choose the genome from the Virus Selector list. Then choose “Fetch and View Genbank file” from the View menu. VOCs will find the corresponding GenBank file at NCBI and load it into a new window. From here you can edit the file if desired and save it to your local computer. It is also possible to export a GenBank file or other formats via the Export menu.
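VOCs performs this lookup for you. If you need the same GenBank flat file in a script outside VOCs, NCBI's E-utilities EFetch service can return it. The sketch below builds the request URL and downloads the text; it is not a description of how VOCs itself contacts NCBI, and the accession shown is a placeholder, not a real record.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_url(accession):
    """Build an EFetch URL that returns a GenBank flat file as plain text."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_genbank(accession):
    """Download the GenBank record for `accession` (requires network access)."""
    with urlopen(genbank_url(accession)) as response:
        return response.read().decode("utf-8")


print(genbank_url("XX000000"))  # placeholder accession
```

Calling `fetch_genbank` with a real accession returns the record text, which can then be saved locally.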