How do I assemble overlapping Sanger reads to create a single contiguous sequence?
The SnapGene "Assemble Contigs" tool uses the CAP3 assembler to assemble reads into one or more contiguous assemblies.
This tool is designed primarily for assembly of a small set of Sanger reads, all derived from the same clonal source, and all of which are expected to overlap to form a contiguous sequence.
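To illustrate what "overlapping to form a contiguous sequence" means, the following is a minimal Python sketch that joins two reads when the end of one exactly matches the start of the other. It is only a toy illustration: the function name `merge_overlapping`, the `min_overlap` parameter, and the example reads are hypothetical, and CAP3 itself uses a far more sophisticated alignment-based approach that tolerates sequencing errors.

```python
from typing import Optional


def merge_overlapping(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads if a suffix of `left` exactly matches a prefix of `right`.

    Returns the merged sequence, or None when no overlap of at least
    `min_overlap` bases is found. Exact matching only (toy example).
    """
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first so the best overlap wins.
    for size in range(longest_possible, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# Hypothetical reads sharing the six bases "GTTAGC".
read_1 = "ATGGCGTACGTTAGC"
read_2 = "GTTAGCCTGAAATGA"
print(merge_overlapping(read_1, read_2))  # ATGGCGTACGTTAGCCTGAAATGA
```

Reads that share no overlap return `None`, which corresponds to the case where an assembly yields more than one contig.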
Edit Sanger Reads (optional)
De novo assembly of overlapping Sanger traces may be more reliable if you edit and correct the reads before assembly. To do this, see Edit a Sanger (.ab1, .scf or .ztr) Trace File.
Assemble Contigs
Click Tools → Assemble Contigs... to open the "Assemble Contigs" window.
Alternatively, open the folder containing the reads as a project in the side panel.
Select the reads in the side panel and click Assemble Contigs....
Add Reads
To add reads for assembly, click the dropdown and choose Import Sequences to Assemble → Import Sequence Files.
Alternatively, drag and drop Sanger trace files (.ab1 format) into the window to import them.
As a general rule, you should only use this tool to assemble reads derived from the same clonal source.
Assemble the Sequences
Ensure the "Trim low-quality ends of sequences before running CAP3" option is checked.
Provide a name for your assembly.
Click the Browse button and set the location to save the assembly.
Click Assemble to run the CAP3 assembler.
SnapGene trims reads using the same algorithm that hides chromatogram ends at medium stringency, except that low-quality ends are removed rather than hidden before the reads are passed to the CAP3 assembler (see Set the Default Stringency for Hiding Chromatogram Ends).
Note that the CAP3 algorithm may separately trim reads based on quality.
A Settings... button is provided to allow users to alter the default CAP3 assembler settings. However, unless you are familiar with CAP3 we recommend you DO NOT alter these settings.
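For readers unfamiliar with quality-based end trimming, the sketch below shows one simple way it can work. This is not the algorithm used by SnapGene or CAP3, neither of which is described here. It assumes per-base Phred-style quality scores, and the function name, the `min_quality` threshold and the `window` size are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def trim_low_quality_ends(
    bases: str, quals: List[int], min_quality: int = 20, window: int = 5
) -> Tuple[str, List[int]]:
    """Remove low-quality ends from a read (illustrative only).

    Keeps the span from the first window to the last window in which
    every base has a quality of at least `min_quality`.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    # Start positions of all windows made up entirely of good bases.
    good_starts = [
        i
        for i in range(len(bases) - window + 1)
        if min(quals[i : i + window]) >= min_quality
    ]
    if not good_starts:
        return "", []  # nothing in the read passes the threshold
    start = good_starts[0]
    end = good_starts[-1] + window
    return bases[start:end], quals[start:end]


# Hypothetical read with poor quality at both ends.
bases = "NNACGTACGTNN"
quals = [5, 6, 30, 35, 40, 40, 38, 36, 34, 30, 8, 4]
trimmed, _ = trim_low_quality_ends(bases, quals, min_quality=20, window=3)
print(trimmed)  # ACGTACGT
```

The key point mirrored from the text above is that the low-quality ends are removed from the read, not merely hidden, before assembly.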
View the Assembly
The new unsaved sequence file opens as an "Alignment to reference" with the Alignment side panel displayed.
The reference sequence (labelled Original Sequence in Sequence view) is the alignment consensus created by the CAP3 assembler.
In Map view all aligned reads will be depicted above the consensus as arrows. To learn more see How do I Interpret the "Align to Reference" Map View?
If reads assemble into two or more separate contiguous sequences (contigs) then multiple files will be created and saved in a new folder. The folder will be opened in the side panel as a project.
Verify the Assembly
Switch to Sequence view to view and edit the assembly.
- The top panel shows the initial CAP3 consensus (Original Sequence).
- The bottom panel shows the CAP3 consensus (Original Sequence) and the aligned trace sequences within the field of view.
Click the right "Jump" triangle in the bottom panel to jump to the first mismatch or gap discrepancy in the alignment.
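Conceptually, jumping between discrepancies amounts to scanning the alignment for columns where a read disagrees with the consensus. The Python sketch below shows that idea under simple assumptions: each read is supplied as a row the same length as the consensus, "-" marks a gap, and a space marks positions the read does not cover. The function name and data layout are hypothetical and do not reflect how SnapGene stores alignments.

```python
from typing import Dict, List, Tuple


def find_discrepancies(
    consensus: str, aligned_reads: Dict[str, str]
) -> List[Tuple[int, str, str, str]]:
    """List mismatches and gaps as (position, read, consensus_base, read_base).

    Positions are 1-based. Comparison ignores case, so lower-case manual
    edits that agree with the consensus are not reported.
    """
    for name, row in aligned_reads.items():
        if len(row) != len(consensus):
            raise ValueError(f"{name} is not the same length as the consensus")
    issues = []
    for index, ref_base in enumerate(consensus):
        for name, row in aligned_reads.items():
            read_base = row[index]
            if read_base == " ":  # read does not cover this column
                continue
            if read_base.upper() != ref_base.upper():
                issues.append((index + 1, name, ref_base, read_base))
    return issues


# Hypothetical alignment: read_R is missing one A of an AA pair.
consensus = "ACGTAAGCT"
reads = {"read_F": "ACGTAAG  ", "read_R": "  GTA-GCT"}
print(find_discrepancies(consensus, reads))  # [(6, 'read_R', 'A', '-')]
```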
View the Trace Sequence Chromatogram
Click on the disclosure triangles to expand each trace view. Option-click (macOS) or ALT-Click (Windows/Linux) on any disclosure triangle to expand all traces simultaneously.
Determine the cause of the disagreement. In the example above, a compression in one read has resulted in two A peaks being called as a single A.
The expanded trace view provides controls and information:
- The colored arrow indicates the read orientation in the assembly.
- The "Show trace data" button shows peak and quality information (if present).
- The "Show sequence with annotations" button shows features associated with the reference sequence (if present).
- The slider allows adjustment of peak heights.
- The summary provides details of the alignment and any discrepancies.
Correct Miscalled/Disagreeing Errors in Traces
Select the sequence to be corrected and type to add sequence, or hit delete to remove sequence. In this example, the sequencer has miscalled a compressed AA double peak as a single A, so we have selected the gap and typed "a".
Click Insert to accept the insertion of an "a".
Click the right "Jump" triangle in the bottom panel to jump to the next mismatch or gap in the alignment, and continue editing until all reads are in agreement. If peaks are mixed, add IUPAC ambiguity codes (see View or Use IUPAC Ambiguity Codes).
Use lower case when manually editing so that the edits can be identified at a later date.
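As a quick reference for choosing an ambiguity code at a mixed peak, the sketch below maps a set of observed bases to the standard IUPAC nucleotide code. The code table is the standard IUPAC one; the function name `ambiguity_code` and its `lowercase` option (which follows the lower-case convention suggested above for manual edits) are hypothetical helpers, not part of SnapGene.

```python
# Standard IUPAC nucleotide codes and the bases each one represents.
_IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Invert the table so a set of bases can be looked up directly.
_CODE_FOR_BASES = {frozenset(bases): code for code, bases in _IUPAC_BASES.items()}


def ambiguity_code(bases: str, lowercase: bool = True) -> str:
    """Return the IUPAC code for the bases seen at a mixed peak."""
    key = frozenset(base.upper() for base in bases)
    if key not in _CODE_FOR_BASES:
        raise ValueError(f"not a valid combination of A, C, G, T: {bases!r}")
    code = _CODE_FOR_BASES[key]
    return code.lower() if lowercase else code


print(ambiguity_code("AG"))                   # r
print(ambiguity_code("CT", lowercase=False))  # Y
```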
Update and Save the Verified Sequence
After correcting miscalls or gaps in the reads, it is likely that the reads no longer agree with the "Original Sequence" (the initial CAP3 consensus).
To replace the "Original Sequence" with the corrected sequence defined by the edited reads, select all reads then click Aligned Sequences → Replace Original with Aligned → Update this File.
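To give a sense of what deriving a sequence from the edited reads involves, the following Python sketch builds a consensus by simple majority vote in each alignment column. This is an assumption-laden illustration, not SnapGene's actual method, which is not described here. It uses a hypothetical layout in which every read is a row of equal length, "-" marks a gap and a space marks uncovered positions, and it reports ties as "N".

```python
from collections import Counter
from typing import List


def consensus_from_reads(aligned_reads: List[str], gap: str = "-", uncovered: str = " ") -> str:
    """Build a consensus by majority vote per column (illustrative only)."""
    if len({len(row) for row in aligned_reads}) != 1:
        raise ValueError("all aligned reads must have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        # Ignore reads that do not cover this column; compare case-insensitively.
        calls = [base.upper() for base in column if base != uncovered]
        if not calls:
            consensus.append("N")  # no read covers this position
            continue
        ranked = Counter(calls).most_common()
        top_base, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        if tied:
            consensus.append("N")  # unresolved disagreement
        elif top_base != gap:
            consensus.append(top_base)  # a majority gap adds nothing
    return "".join(consensus)


# Hypothetical reads after the missing "a" was inserted into the second read.
print(consensus_from_reads(["ACGTAAG  ", "  GTAaGCT"]))  # ACGTAAGCT
```

Once all reads agree, as recommended above, every column has a single call and the result no longer depends on how ties are handled.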
Click File → Save to save the verified/revised sequence file to an appropriate location on your computer.