Sequence Searcher


NOTE: This tool requires Java Version 8. Download it here: Java Runtime Environment.

Sequence Searcher is an easy-to-use Java tool for searching protein and DNA sequences for user-specified sequence motifs. It can search multiple sequences in a single pass. Target sequences may be imported from your computer in FASTA format, or you can paste the sequence(s) manually into a text window.
With this program, you can:
  • search both strands of the nucleotide sequence
  • search protein or nucleotide sequences
  • search one or more sequences at a time
  • search for:
  • save results (all or only a selection of them) to a file
  • choose how to organize your search results (by result, confidence, or start/stop).
The Sequence Searcher tool (SeqS) is also built into VOCS.
A quick How-To is available for the integrated search.

How the search works

Sequence Searcher looks for every occurrence of the pattern in the target sequences. In the case of nucleic acids, the top strand will be searched in the 5′ -> 3′ direction. Then, the bottom strand will be searched in its 5′ -> 3′ direction for the same pattern. The bases are numbered from 1 at the 5′ end, to the length of the sequence at the 3′ end. See the image below for an example. Searching on an amino acid sequence is straightforward: the sequence is searched for the specified pattern from its start to its stop.
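
The two-strand search described above can be illustrated with a minimal Python sketch. It is not the tool's actual Java implementation. The function names are hypothetical, the input is assumed to contain only the bases A, C, G and T, and each hit is assumed to be reported as a 1-based position counted from the 5′ end of the strand on which it was found.

```python
# Complement table for unambiguous DNA bases (assumption: input uses only A, C, G, T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the bottom strand written in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq, pattern):
    """Return the 1-based start of every exact match, including overlapping ones."""
    hits = []
    start = seq.find(pattern)
    while start != -1:
        hits.append(start + 1)  # convert the 0-based index to a 1-based position
        start = seq.find(pattern, start + 1)
    return hits


def search_both_strands(seq, pattern):
    """Search the top strand, then the bottom strand, each read 5'->3'."""
    seq, pattern = seq.upper(), pattern.upper()
    return {
        "top": find_all(seq, pattern),
        "bottom": find_all(reverse_complement(seq), pattern),
    }


hits = search_both_strands("AACGTTTGCA", "ACG")
print(hits)  # {'top': [2], 'bottom': [6]}
```

In this example the bottom strand, read 5′ -> 3′, is TGCAAACGTT, so the bottom-strand hit at position 6 is counted from the 5′ end of that strand.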

Performance and Limitations

Sequence Length

Sequence Searcher imports sequences at approximately 1 Mbp per second. The program was tested on an older G5 iMac (768 MB RAM; 1.6 GHz) and was found to support sequences totaling 170 Mbp. This limit is set by the memory available to the Java virtual machine and could be increased if required.

Search speed

The following tests were done using a newer Intel iMac (4GB RAM; 2.4 GHz Core 2 Duo).

Fuzzy search
Performance

Search Sequence     Number of mismatches   Time to execute (sec)   Number of hits
ACGTACGT            0                      2                       216
ACGTACGTA           0                      2                       44
ACGTACGTA           1                      3                       1808
ACGTACGTA           2                      3                       22124
ACGTACGTACGT        0                      1                       0
ACGTACGTACGT        1                      3                       24
ACGTACGTACGT        2                      3                       672
ACGTACGTACGTACGT    3                      4                       80

An exact search (pattern: ACGATCGATC; no mismatches) on five sequences (106.48 Mbp in total) takes approximately 18 seconds. Search time increases with the number of mismatches allowed: the same search took 83 seconds with 1 mismatch and 147 seconds with 3 mismatches, which returned 744,068 results. With 4 mismatches, the number of results was so high that the fuzzy search used all the Java Virtual Machine memory and could not be completed.
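
To show why allowing more mismatches raises both the running time and the number of hits, here is a minimal Python sketch of a mismatch-tolerant search. It assumes that a "mismatch" means a base substitution only (no insertions or deletions), which the page does not state explicitly. The function name `fuzzy_find` is hypothetical and the code is not taken from Sequence Searcher.

```python
def fuzzy_find(seq, pattern, max_mismatches):
    """Return (1-based start, mismatch count) for every window of the sequence
    that differs from the pattern at no more than max_mismatches positions."""
    seq, pattern = seq.upper(), pattern.upper()
    width = len(pattern)
    hits = []
    for i in range(len(seq) - width + 1):
        mismatches = 0
        for a, b in zip(seq[i:i + width], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences, abandon this window early
        else:
            # The inner loop finished without a break, so the window is a hit.
            hits.append((i + 1, mismatches))
    return hits


print(fuzzy_find("ACGTACGAACGT", "ACGT", 0))  # [(1, 0), (9, 0)]
print(fuzzy_find("ACGTACGAACGT", "ACGT", 1))  # [(1, 0), (5, 1), (9, 0)]
```

With a larger mismatch allowance, fewer windows can be rejected early and more windows qualify as hits, so both the work per window and the size of the result list grow.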

Regular expression search
Search Sequence                Time to execute (sec)   Number of hits
ACGTACGT                       1                       216
ACGTACGTA                      1                       44
ACG[TA]ACGTA                   1                       108
ACG[TA]ACGT[AC]                1                       244
ACG[TA]A[CGTA]GT[AC]           2                       948
ACG[TA]A.G{1,3}T[AC]           2                       1272
ACG[TA]A.{1,50}G{1,3}T[AC]     3                       23991
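
The patterns in the table use common regular expression syntax: `[TA]` matches either T or A, `.` matches any character, and `{1,3}` repeats the preceding element one to three times. As a rough illustration, the sketch below runs one of these patterns with Python's standard `re` module. Sequence Searcher itself is a Java tool, so its matching rules, in particular how it treats overlapping matches, may differ. The function name `regex_find` and the test sequence are made up for this example.

```python
import re


def regex_find(seq, pattern):
    """Return (1-based start, 1-based end, matched text) for each
    non-overlapping match of the pattern, scanning left to right."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(seq)]


hits = regex_find("TTACGTACGTAGGACGAACGTC", "ACG[TA]ACGT[AC]")
print(hits)  # [(3, 11, 'ACGTACGTA'), (14, 22, 'ACGAACGTC')]
```

Note that `re.finditer` reports non-overlapping matches only, which is a property of this sketch and not a documented behavior of the tool.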


Getting Started


If you’re new to Sequence Searcher, click on the launch button (on the right) and use the Quick Start Page to learn the basics (or, if you’re like us, just start clicking!).

The VBRC also provides additional help resources for Sequence Searcher:


References


If you use this resource please cite the relevant papers (publication list).


Troubleshooting


If your system does not launch Viral Orthologous Clusters, it is probably missing the Java Runtime Environment 1.8. Please see Java Web Start Setup and Java Web Start Tips for help.


Submit a feature request


Is there a feature that you think this tool needs? Submit a wish.