Genome-based Peptide Fingerprint Scanning (GFS) Documentation

Main Help Page


Genome-based fingerprint scanning (GFS) was developed by the Giddings lab at the University of North Carolina.
The program identifies the genomic origins of sample proteins by mapping their peptide mass fingerprint data directly to raw genomic sequence. The advantage of this technique is that it does not require predefined ORF or protein annotations. The program takes as input an experimentally obtained peptide mass fingerprint and a genome sequence of interest, and outputs the regions of the genome from which the mass fingerprint is most likely derived.
The Upton lab developed this Java interface to GFS and is grateful to the Giddings lab for access to the software and for their help. On our website, this version of GFS should only be used with viral genomes or DNA sequences of similar size; larger genomes can be analyzed at the Giddings lab site.
GFS is also integrated into our VOCs and VGO tools. Selecting a virus from the menu in either tool loads that viral genome from our database into the GFS interface. When GFS is run from VGO, the results are plotted on a graphical genome view.

GFS Input Windows

The GFS Java User Interface – data input

The software first generates a theoretical mass list by translating the genome of interest in six reading frames (three on each of the forward and reverse strands) and digesting the resulting proteins in silico according to the cleavage rules of the specified protease (trypsin in this case). The algorithm then finds matches, within a given mass tolerance, between these theoretical masses and the input experimental masses. These matches (or hits) are grouped into high-density regions on the genome, which are scored according to several criteria and ranked by statistical significance. To find these regions, GFS scans across the genome with a fixed-size window and scores each window according to the criteria detailed in the original GFS publication. Windows whose scores exceed a window score cutoff are then selected for an additional scan (subject to a separate extension score cutoff) that extends the window's start and stop positions into a larger region.
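The first stage described above (six-frame translation followed by an in-silico tryptic digest) can be sketched in Python as follows. This is an illustration, not GFS's own code: all function names are hypothetical, trypsin is modelled with the common rule "cleave after K or R unless the next residue is P", stop codons are assumed to end a translated segment, and the masses are neutral monoisotopic peptide masses (whether GFS compares neutral or protonated masses is not stated here).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses in Da; a peptide's mass adds one water.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    # Unknown or ambiguous codons (e.g. containing N) become "X".
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(genome):
    """Yield (strand, frame, protein) for frames 0-2 on each strand."""
    genome = genome.upper()
    for strand, s in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            yield strand, frame, translate(s[frame:])


def tryptic_fragments(protein):
    """Fully cleaved fragments: cut after K or R unless followed by P."""
    frags, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            frags.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        frags.append(protein[start:])
    return frags


def digest(protein, max_missed=2):
    """Join up to max_missed adjacent fragments; return (peptide, missed)."""
    frags = tryptic_fragments(protein)
    peptides = []
    for i in range(len(frags)):
        for mc in range(max_missed + 1):
            if i + mc < len(frags):
                peptides.append(("".join(frags[i:i + mc + 1]), mc))
    return peptides


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def theoretical_masses(genome, max_missed=2):
    """List of (mass, peptide, strand, frame, missed_cleavages)."""
    out = []
    for strand, frame, protein in six_frames(genome):
        for segment in protein.split("*"):  # assumption: stops end a segment
            for pep, mc in digest(segment, max_missed):
                if pep and "X" not in pep:
                    out.append((peptide_mass(pep), pep, strand, frame, mc))
    return out
```

For example, `theoretical_masses("ATGAAATAG")` includes the peptide "MK" from forward frame 0. Splitting at stop codons keeps the sketch annotation-free, in the spirit of GFS, since no ORF boundaries are assumed beyond the stops themselves.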
This is the window that appears when the GFS Java User Interface is launched:

  • Mass List
    Input the mass spectrometry data. You can either paste your data (newline- or tab-delimited) into the text field provided, or upload a delimited file (space, newline, comma, or tab) from your computer.
  • Input Sequence
    Input the genome sequence that GFS will scan.
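To illustrate which delimiters a mass list may use, here is a minimal parser sketch. The function name `parse_mass_list` is hypothetical and this is not the interface's actual parsing code; it simply accepts any mix of spaces, tabs, newlines, and commas between masses.

```python
import re


def parse_mass_list(text):
    """Split on commas or any whitespace run and parse each token as a float."""
    return [float(tok) for tok in re.split(r"[,\s]+", text.strip()) if tok]
```

For instance, `parse_mass_list("1000.5\t1200.6\n1300.7, 1400.8 1500.9")` returns the five masses in order.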

The input options are shown in the External Application Preferences window, which opens after clicking the “Continue” button or selecting the “File->External Application Preferences” menu.

  • Isotopic Mode
    The mass type of the experimental masses: monoisotopic (“m”) or average (“a”). Default is monoisotopic.
  • Max. Missed Cleavage
    The maximum number of missed trypsin cleavages allowed. The highest permitted value is 2.
  • Window Size
    Size of scanning window (in nucleotides). Default is 500.
  • Delta Percent Matching Tolerance
    The tolerance for matching in silico peptide masses to experimental masses, expressed as a percentage of the peptide mass. Set this to the approximate mass accuracy of the MS instrument used to generate the experimental data; in our experience, a value slightly greater than the instrument's stated mass accuracy often works better. Default value is 0.01 (i.e., 0.01% or 100 ppm).
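The Delta Percent Matching Tolerance can be made concrete with a short sketch. It assumes the percentage is taken of the theoretical peptide mass (the text above says only "the peptide mass"), and the names `match_masses` and `percent_to_ppm` are hypothetical. Sorting the experimental list and using `bisect` finds all masses inside each tolerance window without comparing every pair.

```python
from bisect import bisect_left, bisect_right


def percent_to_ppm(delta_percent):
    """1% is 10,000 ppm, so the default 0.01% corresponds to 100 ppm."""
    return delta_percent * 10_000


def match_masses(theoretical, experimental, delta_percent=0.01):
    """Return (t_mass, e_mass, observed_delta_percent) for every match."""
    exp_sorted = sorted(experimental)
    hits = []
    for t in theoretical:
        # Assumption: the tolerance is a percentage of the theoretical mass.
        tol = t * delta_percent / 100.0
        lo = bisect_left(exp_sorted, t - tol)
        hi = bisect_right(exp_sorted, t + tol)
        for e in exp_sorted[lo:hi]:
            hits.append((t, e, 100.0 * (e - t) / t))
    return hits
```

At the default of 0.01%, a 1000.0 Da peptide matches experimental masses between 999.9 and 1000.1 Da, so `match_masses([1000.0], [1000.05, 1000.2])` keeps only 1000.05.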

GFS Output Windows

The GFS Java User interface – viewing the GFS results

The top table shows the hit regions.

  • Name: The name of the sequence that GFS has scanned.
  • Start: The start position of the hit region.
  • Stop: The end position of the hit region.
  • # Matches: The number of hits in the region.
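How hit regions like those in the top table might arise from the window scan can be shown with a deliberately simplified sketch. The hypothetical `scan_windows` function uses the raw hit count as a stand-in for the GFS window score (the real scoring criteria are in the original GFS publication) and merges overlapping qualifying windows instead of running the separate extension scan. Hit positions are nucleotide starts of matched fragments; 0-based, half-open coordinates are an assumption.

```python
from bisect import bisect_left


def scan_windows(hit_positions, genome_length, window=500, step=None, min_hits=3):
    """Return (start, stop, n_matches) regions built from dense windows."""
    step = step or window // 2
    positions = sorted(hit_positions)

    def count(start, stop):
        # Number of hits with start <= position < stop.
        return bisect_left(positions, stop) - bisect_left(positions, start)

    # Slide a fixed-size window along the genome; keep windows that pass the cutoff.
    dense = []
    for start in range(0, genome_length, step):
        stop = min(start + window, genome_length)
        if count(start, stop) >= min_hits:
            dense.append((start, stop))

    # Merge overlapping dense windows into larger regions.
    regions = []
    for start, stop in dense:
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], stop)
        else:
            regions.append([start, stop])
    return [(s, e, count(s, e)) for s, e in regions]
```

With hits at 100, 200, 300, 400, 600, and 700 on a 1000 nt sequence, the windows 0 to 500 and 250 to 750 both qualify and merge into a single region, reported as `(0, 750, 6)`.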

The bottom table shows the matches in one hit region. Clicking a row in the top table updates the bottom table with the matches for that hit region.

  • T-Mass: theoretical mass.
  • E-Mass: Experimental mass.
  • DP: Delta Percent Matching Tolerance.
  • Start: The start position of the fragment.
  • Stop: The end position of the fragment.
  • MC: Maximum missed cleavages.
  • Frame: The reading frame (0, 1, or 2).
  • AA Seq: Amino Acid Sequence.
  • Frag Seq: DNA Sequence.

Double-clicking an “AA Seq” or “Frag Seq” cell displays the corresponding amino acid or DNA sequence below the table.