Regular Expression, Fuzzy Motif Searches, and GFS Searches


Regular Expression Search

How to perform a regular expression search on a sequence.
You can do a regular expression search from the sequence display window; select Reg. Expression Search from the Analysis menu. The following dialog will then appear.
Enter your regular expression in the textbox, and click on the OK button to perform the search, or the Cancel button to close the dialog window without performing the search.
Regular expressions let you search for precise patterns that may include optional sections and/or repeated sequences. For detailed help on regular expressions, please see The Perl Regular Expression page.
Examples of Regular Expression Searching

Regular Expression    What it matches
ACT                   ACT
[AC]T                 A or C followed by T
AC[^T]ACT             AC followed by anything but a T, followed by ACT
ACT*                  AC followed by 0 or more T’s
(ACT)*                ACT repeated 0 or more times
(ACT)+                ACT repeated 1 or more times
(ACT)?                ACT repeated 0 or 1 times
(ACT){n}              ACT repeated n times
(ACT){n,}             ACT repeated at least n times
(ACT){n,m}            ACT repeated at least n times but not more than m times
((AC)[TA]){n}         AC followed by T or A, repeated n times
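
To try a few of the patterns above outside VGO, the following minimal sketch uses Python's standard re module, whose syntax for these constructs matches the Perl syntax used in the table. The example sequence and the helper name find_hits are assumptions made for illustration; this is not how VGO itself performs or reports the search.

```python
import re

# Hypothetical example sequence, not taken from VGO.
sequence = "ACTACTACGTACAACTT"


def find_hits(pattern, seq):
    """Return (start, end, matched_text) for each non-overlapping match.

    Positions are 0-based and the end position is exclusive.
    """
    return [
        (m.start(), m.end(), m.group())
        for m in re.finditer(pattern, seq)
        if m.group()  # skip empty matches, which patterns like (ACT)* can produce
    ]


# A or C followed by T
print(find_hits(r"[AC]T", sequence))

# AC followed by anything but a T, followed by ACT
print(find_hits(r"AC[^T]ACT", sequence))

# ACT repeated at least 2 times
print(find_hits(r"(ACT){2,}", sequence))
```

Note that re.finditer reports non-overlapping matches only, so overlapping occurrences of a pattern would be reported once.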

Fuzzy Motif Search

How to do a fuzzy motif search on a sequence.
You can do a fuzzy motif search from the sequence display window; select Fuzzy Motif Search from the Analysis menu. The following dialog will then appear.
Fuzzy Motif Dialog
Enter your fuzzy motif in the top textbox and enter the number of mismatches to allow in the lower textbox. Click on the OK button to perform the search, or the Cancel button to close the dialog window without performing the search.
The Fuzzy Motif Search takes an expression pattern (see below for an explanation of the pattern grammar used) and the maximum number of mismatches tolerated in a search hit. VGO then searches the marked sequences for this motif and displays the list of hits by location along the sequence. In addition to the ambiguity introduced by mismatches, a pattern may contain IUB ambiguity codes, which are listed in the table below.
Examples of Fuzzy Motif Searching

Fuzzy Expression    What it matches
ACT                 an A, C, T pattern
[AC]T               an A or a C followed by a T
{AC}T               anything but an A or a C, followed by a T

Note: When counting mismatches, each [] or {} group counts as a single match or mismatch. Likewise, if the pattern is T(2,4) and only 1 T is found, this counts as a single mismatch.
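
The mismatch counting described in the note can be sketched as follows. This is an illustrative sliding-window implementation under stated assumptions, not VGO's actual algorithm: the function names parse_motif and fuzzy_search and the example sequence are hypothetical, only plain bases and the [] and {} groups are handled, and repeat counts such as T(2,4) and ambiguity codes are left out to keep the sketch short.

```python
def parse_motif(motif):
    """Split a motif into positions, each a (bases, negated) pair.

    A plain base such as A gives ({"A"}, False), [AC] gives ({"A", "C"}, False),
    and {AC} gives ({"A", "C"}, True), meaning anything but A or C.
    """
    closers = {"[": "]", "{": "}"}
    positions = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch in closers:
            end = motif.index(closers[ch], i)
            bases = frozenset(motif[i + 1:end].replace(" ", ""))
            positions.append((bases, ch == "{"))
            i = end + 1
        else:
            positions.append((frozenset(ch), False))
            i += 1
    return positions


def fuzzy_search(motif, seq, max_mismatches):
    """Return (start, matched_text, mismatches) for every window of seq
    that differs from the motif in at most max_mismatches positions.

    Each [] or {} group counts as one position, so it contributes at most
    one mismatch. Start positions are 0-based.
    """
    positions = parse_motif(motif)
    width = len(positions)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        mismatches = 0
        for (bases, negated), base in zip(positions, window):
            matched = (base in bases) != negated
            if not matched:
                mismatches += 1
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Hypothetical example sequence.
seq = "GATTACAGT"
print(fuzzy_search("[AC]T", seq, 0))
print(fuzzy_search("{AC}T", seq, 0))
print(fuzzy_search("ACT", seq, 1))
```

Because every window is compared position by position, the running time grows with the sequence length times the motif length, which is acceptable for a demonstration but may differ from what VGO does internally.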
Table of IUPAC Ambiguity Codes

IUPAC-IUB/GCG Code    Meaning                   Complement
A                     A                         T
C                     C                         G
G                     G                         C
T/U                   T                         A
M                     A or C                    K
R                     A or G                    Y
W                     A or T                    W
S                     C or G                    S
Y                     C or T                    R
K                     G or T                    M
V                     A or C or G               B
H                     A or C or T               D
D                     A or G or T               H
B                     C or G or T               V
X/N                   G or A or T or C          X
.                     Not G or A or T or C      .
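
The table above can be expressed as two lookup tables, one for the bases each code stands for and one for its complement. The sketch below is an assumption-laden illustration rather than VGO code: the names IUPAC_BASES, COMPLEMENT, reverse_complement and to_regex are hypothetical, N is mapped to N (the table lists X for the combined X/N row, and both codes denote the same set of bases), and the "." code is given a complement but is not converted to a regular expression.

```python
import re

# Bases represented by each code, following the table above.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}

# Complement of each code, following the table above.
# Assumption: N complements to N; the table shows X for the X/N row.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "R": "Y", "W": "W", "S": "S", "Y": "R", "K": "M",
    "V": "B", "H": "D", "D": "H", "B": "V",
    "X": "X", "N": "N", ".": ".",
}


def reverse_complement(motif):
    """Return the reverse complement of a motif written in IUPAC codes."""
    return "".join(COMPLEMENT[code] for code in reversed(motif.upper()))


def to_regex(motif):
    """Translate IUPAC codes into a regular expression over A, C, G and T.

    The "." code is not supported here and raises KeyError.
    """
    parts = []
    for code in motif.upper():
        bases = IUPAC_BASES[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


print(reverse_complement("GAMN"))
print(to_regex("GAMN"))
print(re.findall(to_regex("GAMN"), "GAATGACC"))
```

Translating a motif into a character-class expression this way shows how an ambiguity code relates to the bracket notation used in the regular expression and fuzzy motif tables.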

GFS Search

How to do a GFS search on a sequence.
You can do a GFS search from the sequence display window; select GFS from the Analysis menu. The following dialog will then appear.

Enter your mass list in the textbox, or click on the Load Mass List File button and select a file that contains the list of masses. When you are ready, click on the OK button to perform the search, or the Cancel button to close the dialog window without performing the search. Depending on your preferences, you may then see another dialog window where you can enter the parameters for the GFS search.