PECAAN Workflow Description

PECAAN Workflow Overview

Phage setup in PECAAN

Initially, DNA Master is used to call genes with its built-in Glimmer and GeneMark engines. Two DNA Master outputs serve as input to PECAAN: the Internal Start Analysis (Fig. 1), generated with the “Analyze All Gene Starts” option under the “Genome” menu, and a text copy of the “Documentation” output (Fig. 2).

Figure 1. Workflow

Figure 2

The genome sequence is supplied as a FASTA file. Starterator and GeneMark output files can also be attached to PECAAN during initial setup (Fig. 3). When a new genome is added, PECAAN runs BLAST searches against the Phagesdb and NCBI databases, and it also searches HHPRED and the Conserved Domain Database. All of these results are stored in the PECAAN database and displayed when the “Genes” menu is selected (Fig. 4).
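For readers unfamiliar with the input format: a FASTA file is a header line beginning with “>” followed by one or more sequence lines. The minimal Python sketch below reads a single-record genome FASTA into a header and one uppercase sequence string. It only illustrates the format; it is not PECAAN’s own loader, and the function name `read_fasta` is hypothetical.

```python
import io
from typing import List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Tuple[str, str]:
    """Read the first record of a FASTA file (hypothetical helper, not PECAAN code).

    Returns (header without the leading '>', sequence as one uppercase string).
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                break  # stop at a second record; a phage genome is one record
            header = line[1:].strip()
        elif header is not None:
            chunks.append(line.upper())  # sequence may be wrapped over many lines
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(chunks)


example = io.StringIO(">MyPhage complete genome\nATGgcc\nTTTAA\n")
name, seq = read_fasta(example)
print(name, len(seq))  # MyPhage complete genome 11
```

Joining the wrapped lines into one string keeps later coordinate arithmetic simple, because a 1-based genome position maps directly to a string index.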

Figure 3. Phage Input

Figure 4. Genes Window

Accessing a Phage in PECAAN

Stored phages are selected from the “Summary” menu in PECAAN (Fig. 5), and genes are selected for annotation in the “Genes” menu (Fig. 4). Once a gene has been selected, its DNA and protein sequences can be viewed and copied from the “Sequence” menu (Fig. 6).
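To show how a gene’s DNA and protein sequences relate to its coordinates, here is a minimal Python sketch. It assumes 1-based inclusive coordinates (left ≤ right in genome orientation), reverse-complements genes on the minus strand, and translates with the standard codon assignments, which bacterial translation table 11 shares. It treats ATG, GTG, and TTG as start codons and reads an initiating GTG or TTG as Met. The helper names are hypothetical; this is not PECAAN code.

```python
from typing import Tuple

# Standard codon assignments (shared by bacterial table 11), in TCAG order:
# the amino acid for codon (i, j, k) is at index 16*i + 4*j + k.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed start codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate whole codons; '*' marks a stop, 'X' an unrecognized codon."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    protein = [CODON_TABLE.get(codon, "X") for codon in codons]
    if codons and codons[0] in START_CODONS:
        protein[0] = "M"  # an initiating GTG/TTG is translated as Met
    return "".join(protein)


def gene_sequences(genome: str, left: int, right: int, strand: str) -> Tuple[str, str]:
    """Return (DNA, protein) for a gene; strand is '+' or '-' (hypothetical helper)."""
    dna = genome[left - 1:right].upper()
    if strand == "-":
        dna = reverse_complement(dna)
    return dna, translate(dna)


dna, protein = gene_sequences("CCATGAAATTTTAGGG", 3, 14, "+")
print(dna, protein)  # ATGAAATTTTAG MKF*
```

Moving the start coordinate changes both sequences, which is why the downstream evidence has to be refreshed whenever a different start is chosen.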

Figure 5. Summary

Figure 6. Sequence

The annotation process in PECAAN

During the annotation process, three questions need to be answered for each gene.

1. Is this a valid gene? The “Gene Inclusion” check box and the “Add Gene” button at the top of the “Genes” display (Fig. 4) allow called genes to be removed from or added to the genome.

2. What is the correct translation start site? Start-site evidence from Glimmer, GeneMark, and Starterator, along with scores for every candidate start site, is shown in the Gene Candidates table (Fig. 4). The start site is selected with the checkbox on the right side of the Gene Candidates table. The adjacent drop-down box is used to verify that the gene includes all of the coding capacity predicted by the external, host-specific GeneMark output. Whenever a new start site is selected, PECAAN automatically updates the evidence tables for Phagesdb, NCBI, HHPRED, and the Conserved Domain Database.

3. What is the function, and what evidence supports it? Function annotation begins by entering the function in the function box and then checking the evidence that supports it in each database: Phagesdb, NCBI, HHPRED, and the Conserved Domain Database. The evidence from each database can be filtered to bring hits with relevant functions to the top of the list. Once a function has been found in one database, this makes it quick to search the others for supporting evidence without tabbing through every page of results.

Once the start site and function have been set, clicking the “Mark as Checked” button at the bottom of the “Genes” screen (Fig. 4) records any changes in a log, along with the notes and the annotator’s name. Other annotators can then easily check the start and function calls, and each verification is likewise recorded in the change log when they click “Mark as Checked.”
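The candidate start sites listed for a gene are alternative in-frame start codons that share the same stop codon. The Python sketch below enumerates such candidates for a forward-strand gene by stepping upstream from the stop codon until an in-frame stop is reached. It assumes ATG, GTG, and TTG as start codons and does not reproduce PECAAN’s scoring or the Glimmer, GeneMark, and Starterator evidence; `candidate_starts` is a hypothetical name.

```python
from typing import List, Tuple

START_CODONS = {"ATG", "GTG", "TTG"}  # assumed start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(genome: str, stop_end: int) -> List[Tuple[int, str, int]]:
    """List in-frame start codons upstream of a forward-strand stop codon.

    Hypothetical helper. stop_end is the 1-based position of the last base
    of the stop codon. Returns (start_position, start_codon, gene_length_bp)
    tuples, longest gene first. No scoring is modeled here.
    """
    genome = genome.upper()
    stop_index = stop_end - 3  # 0-based index of the stop codon's first base
    if stop_index < 0 or genome[stop_index:stop_end] not in STOP_CODONS:
        raise ValueError("stop_end does not mark the end of a stop codon")
    candidates = []
    # Step upstream one codon at a time, staying in the stop codon's frame.
    for i in range(stop_index - 3, -1, -3):
        codon = genome[i:i + 3]
        if codon in STOP_CODONS:
            break  # an upstream in-frame stop closes the open reading frame
        if codon in START_CODONS:
            candidates.append((i + 1, codon, stop_end - i))
    # Candidates were collected nearest-first; reverse to list the longest first.
    return candidates[::-1]


genome = "TAAATGGTGAAATTGCCCTAG"
for start, codon, length in candidate_starts(genome, 21):
    print(start, codon, length)
# prints "4 ATG 18", then "7 GTG 15", then "13 TTG 9"
```

Each candidate yields a different protein N-terminus, so choosing among them is exactly the decision the start-site evidence is meant to inform.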

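Filtering evidence so that relevant functions rise to the top can be pictured as a stable re-ordering of hits. The minimal Python sketch below puts hits whose description contains the function term (case-insensitive substring match) ahead of the rest, then orders each group by E-value. The `Hit` record and `rank_hits_by_function` are hypothetical. Because the databases report different kinds of scores, a single E-value field is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    source: str       # e.g. "Phagesdb", "NCBI", "HHPRED", "CDD"
    description: str  # hit title or annotated function
    evalue: float     # smaller means a stronger hit


def rank_hits_by_function(hits: List[Hit], function_term: str) -> List[Hit]:
    """Put hits mentioning function_term first, then sort each group by E-value."""
    needle = function_term.casefold()
    # False sorts before True, so matching hits (needle present) come first.
    return sorted(hits, key=lambda h: (needle not in h.description.casefold(), h.evalue))


hits = [
    Hit("NCBI", "hypothetical protein", 1e-50),
    Hit("HHPRED", "Terminase large subunit", 1e-10),
    Hit("NCBI", "terminase large subunit", 1e-30),
]
for hit in rank_hits_by_function(hits, "terminase"):
    print(hit.source, hit.description, hit.evalue)
```

Re-ordering rather than discarding non-matching hits keeps all evidence visible while surfacing the hits that bear on the chosen function.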
PECAAN export files

Several reports are available under the “Export” menu (Fig. 7). The “Export CDS Function” and “Export CDS Full Annotation” files can be copied and pasted into the “Documentation” screen of DNA Master, replacing the existing text. Pressing the “Parse” button on DNA Master’s “Documentation” screen writes all of the annotation changes made in PECAAN back into the DNA Master database, including frameshift and tRNA calls. A sample “Export CDS Full Annotation” file is shown in Fig. 8. All of the “/note=…” documentation is imported into the notes section of DNA Master’s “Features” screen. This makes PECAAN’s output portable and available for future analysis. A planned future feature will add the option to export a complete GenBank-formatted file.
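Before pasting an export into DNA Master, it can be useful to inspect which “/note=” qualifiers it carries. The Python sketch below assumes a GenBank-style feature layout, with a feature line such as “CDS   complement(120..455)” followed by indented single-line qualifiers like /note="...". That layout is an assumption, and PECAAN’s exact export format should be checked against a real export file. `parse_features` and `Feature` are hypothetical names, and the sample text is invented for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed layout: a two-token feature line, then indented /qualifier="value"
# lines. Only single-line qualifier values are handled.
FEATURE_LINE = re.compile(r"^\s*(\S+)\s+(\S+)\s*$")
QUALIFIER_LINE = re.compile(r'^\s*/(\w+)(?:="?(.*?)"?)?\s*$')


@dataclass
class Feature:
    key: str
    location: str
    qualifiers: Dict[str, List[str]] = field(default_factory=dict)


def parse_features(text: str) -> List[Feature]:
    """Group qualifier lines under the feature line that precedes them."""
    features: List[Feature] = []
    current: Optional[Feature] = None
    for line in text.splitlines():
        if not line.strip():
            continue
        qualifier = QUALIFIER_LINE.match(line)
        if qualifier:
            if current is not None:
                name, value = qualifier.group(1), qualifier.group(2)
                # A qualifier may repeat (e.g. several /note lines), so keep a list.
                current.qualifiers.setdefault(name, []).append(value or "")
            continue  # qualifiers before the first feature are ignored
        feature = FEATURE_LINE.match(line)
        if feature:
            current = Feature(feature.group(1), feature.group(2))
            features.append(current)
    return features


export_text = """
     CDS             complement(120..455)
                     /gene="2"
                     /function="terminase"
                     /note="example note text"
     CDS             500..900
                     /gene="3"
"""
for feature in parse_features(export_text):
    print(feature.location, feature.qualifiers.get("note", []))
```

Checking qualifier lines before feature lines matters here, because a note containing exactly one space would otherwise look like a two-token feature line.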

Figure 7. CDS Function Export

Figure 8. CDS Full Annotation Export File