(taken from the software documentation)
A pivotal step in electrophoresis sequencing is the conversion of the raw, continuous
chromatogram data into the actual sequence of discrete nucleotides, a process referred
to as basecalling. The accuracy of the computational algorithm employed for basecalling
directly impacts the quality of the resulting sequence and determines its usability
for in-silico SNP detection.
Here we describe a novel algorithm for basecalling implemented in the program LifeTrace.
Like Phred, currently the most widely used basecalling software program, LifeTrace
takes processed trace data as input. It was designed to be tolerant of variable peak
spacing by means of an improved peak detection algorithm utilizing local chromatogram
information rather than global properties.
LifeTrace is shown to generate high-quality basecalls and reliable associated quality
scores. It proved particularly effective when applied to chromatograms from MegaBACE
capillary sequencing machines. In a benchmark test of 8,372 dye-primer MegaBACE
chromatograms, LifeTrace generated 17% fewer substitution errors, 16% fewer
insertion/deletion errors, and 2.4% more bases alignable to the finished sequence than Phred.
For two sets totaling 6,624 dye-terminator chromatograms, the performance improvement
was 15% fewer substitution errors, 10% fewer insertion/deletion errors, and 2.1% more
aligned bases. The processing time required by LifeTrace is comparable to Phred's.
The predicted quality scores were in line with the observed quality scores, permitting
their direct use for quality clipping and in-silico SNP detection.
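As an illustration of how such quality scores can be used for quality clipping, the following is a minimal sketch, not LifeTrace's own clipping code. It assumes the scores follow the Phred-scale convention Q = -10 * log10(P_error); the function names, the window size, and the threshold are hypothetical choices made for this example.

```python
def error_probability(q):
    """Convert a quality score to an error probability.

    Assumption: Phred-scale convention, Q = -10 * log10(P_error).
    """
    return 10 ** (-q / 10.0)


def clip_by_quality(qscores, min_q=15, window=5):
    """Return a half-open (start, end) range of bases to keep.

    A window is accepted when its mean quality is at least min_q.
    The kept range runs from the first accepted window to the last one.
    Returns (0, 0) when no window qualifies.
    """
    n = len(qscores)
    if n < window:
        return (0, 0)

    def ok(i):
        return sum(qscores[i:i + window]) / window >= min_q

    start = next((i for i in range(n - window + 1) if ok(i)), None)
    if start is None:
        return (0, 0)
    # Scan from the right for the last accepted window.
    end = next(i + window for i in range(n - window, -1, -1) if ok(i))
    return (start, end)


if __name__ == "__main__":
    q = [5, 6, 7, 30, 35, 40, 40, 38, 36, 30, 8, 6, 4]
    print(clip_by_quality(q, min_q=20, window=3))
```

A sliding-window mean is only one of several common clipping heuristics; it is used here because it is short and easy to verify by hand.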
Furthermore, we introduce a new type of quality score associated with every basecall:
the gap-quality. It estimates the probability of a deletion error between the
current and the next assigned basecall, i.e. that another true base between
the two was not called. This additional quality score allows better detection of
single-basepair deletions, as it helps to locate potential basecalling errors during
the alignment.
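The way a gap-quality score can point to candidate deletion sites may be sketched as follows. This is an illustrative sketch only: it assumes that the gap-quality at index i refers to the gap between basecall i and basecall i+1, and that a low score means a high probability of a missed base. The function name and the threshold are hypothetical.

```python
def likely_deletion_sites(sequence, gap_qscores, max_gap_q=10):
    """List gaps whose gap-quality is at or below max_gap_q.

    Assumption: gap_qscores[i] describes the gap between sequence[i]
    and sequence[i + 1]; the last base has no following gap to report.
    Returns a list of (index, left_base, right_base) tuples.
    """
    if len(sequence) != len(gap_qscores):
        raise ValueError("sequence and gap_qscores must have equal length")
    sites = []
    for i in range(len(sequence) - 1):
        if gap_qscores[i] <= max_gap_q:
            sites.append((i, sequence[i], sequence[i + 1]))
    return sites


if __name__ == "__main__":
    seq = "ACGTAC"
    gap_q = [40, 35, 6, 40, 9, 40]
    for index, left, right in likely_deletion_sites(seq, gap_q):
        print(f"possible missed base between {left}{index} and {right}{index + 1}")
```

An aligner could, for example, lower its gap-opening penalty at the reported positions; how the scores are actually weighted during alignment is left open here.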
We also describe a new protocol for benchmarking basecaller performance that we believe
discriminates performance differences more clearly and may find broad application in
future benchmark tests.
lifetrace
--------------------------------------
  -g -c <chromatogram file> -s <seq_search_string> -phd -lt -md <display>
  -d <abi_dump_file> -rc -a <seqID> -loc <location> -pos <seq_position>
  -out <file_name> -v -prob -bg <color> -f <gif_file_name> -ASC -ABI
  -wphd <phd_file_name> -wdphd -if <file_name>

arguments
--------------------------------------
-g                        graphical output.
-c <chromatogram file>    chromatogram file; may be gzipped (*.gz).
-s <seq_search_string>    search for sequence <seq_search_string> (sequence as in
                          chromatogram file) and display centered and magnified at
                          this position. Best match in the sequence is chosen.
-phd                      include phred base calls.
-lt                       include lifetrace base calls.
-md <display>             set display to your machine (example: -md todlr:0.0);
                          be sure to grant permission to the remote machine
                          (xhost+<remote host>).
-rc                       generate reverse complement.
-a <seqID>                look up seqID from a fasta-format index file,
                          set: setenv CHROM_INDEX <idx_file>.
-loc <location>           show centered at <location> with a window of 100 trace points.
-pos <seq_position>       show centered at <seq_position> (according to seq.
                          contained in chromatogram).
-LTpos                    as [-pos], but LifeTrace position instead.
-out <file_name>          write sequence to file_name.fasta and q-scores to
                          file_name.qscore.fasta.
-wphd <phd_file_name>     write phd-file style output to phd_file_name.
-wdphd                    write phd-file style output to default name
                          (<chrom_name>.phd.1).
-v                        verbose.
-prob                     save base probabilities to file file_name.base_prob.
-bg <color>               set background to color <color>.
-f <gif_file_name>        generate gif-file <gif_file_name>, requires Java progr. JTrace.
-nogapq                   don't draw gap-qscores.
-ASC                      output numbers in ASCII coding.
-ABI                      write out calls contained in the chromatogram.
-LOC                      write out lifetrace peak locations.
-3700                     quality score calibration for 3700, 377 and MegaBace
                          should be recognized automatically.
-j                        separate out gap-quality score and draw it.
-if <file_name>           read input chromatogram filenames from file <file_name>.
-pd <dirname>             write *.phd.1 file(s) to <dirname>.

mouse actions
--------------------------------------
left:           set a flag
middle: shift+left: chromatogram
right:          scale in x-direction
shift+right:    scale in y-direction

Close window by clicking into the upper-left corner of the display window.
Ticks indicate peak locations with lengths corresponding to quality scores.
Horizontal lines correspond to q-scores of 0 and 15, respectively.

Example:
  lifetrace -g -loc 2000 -c example_chrom.bin -phd
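For scripted use, the command line can be assembled programmatically. The following is a minimal sketch that uses only flags listed in the argument overview; the helper name `lifetrace_command` and its keyword parameters are hypothetical, and the sketch only builds the argument list without running the program.

```python
import shlex


def lifetrace_command(chromatogram, graphical=False, location=None,
                      include_phred=False, out_stem=None,
                      separate_gap_quality=False):
    """Build an argument list for a lifetrace call (hypothetical helper)."""
    cmd = ["lifetrace"]
    if graphical:
        cmd.append("-g")                    # graphical output
    if location is not None:
        cmd += ["-loc", str(location)]      # center display at a trace location
    cmd += ["-c", chromatogram]             # input chromatogram file
    if include_phred:
        cmd.append("-phd")                  # include phred base calls
    if out_stem:
        cmd += ["-out", out_stem]           # write sequence and q-score files
    if separate_gap_quality:
        cmd.append("-j")                    # keep gap-quality as a separate score
    return cmd


if __name__ == "__main__":
    cmd = lifetrace_command("example_chrom.bin", graphical=True,
                            location=2000, include_phred=True)
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `lifetrace` binary is installed and on the search path.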

NOTE: The gap-quality score is a concept newly introduced by
LifeTrace. Because existing post-processing code is not prepared to handle two quality
scores per base, we decided to allow folding the gap-quality score into the regular
base-quality score. This combined quality score is the default setting for LifeTrace
(but not for the Windows version).
To fully harness the benefits of the gap-quality score,
the "-j" flag has to be set on the command line. As a result, individual base-quality
scores can be higher, as they are not lowered by low gap-quality score assignments.
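The exact rule LifeTrace uses to fold the gap-quality into the base-quality is not stated here. The sketch below assumes, purely for illustration, that the combined score is the minimum of the two; this is consistent with the remark that separated base-quality scores can be higher, but it is an assumption and not the documented method. The function name is hypothetical.

```python
def fold_gap_quality(base_qscores, gap_qscores):
    """Combine base-quality and gap-quality into one score per base.

    Assumption (not confirmed by the documentation): the combined
    score is the lower of the two scores at each position.
    """
    if len(base_qscores) != len(gap_qscores):
        raise ValueError("score lists must have equal length")
    return [min(b, g) for b, g in zip(base_qscores, gap_qscores)]


if __name__ == "__main__":
    base_q = [40, 38, 35, 30]
    gap_q = [40, 12, 35, 8]
    combined = fold_gap_quality(base_q, gap_q)
    print(combined)
    # With separated scores, no base-quality is lower than its folded value.
    print(all(b >= c for b, c in zip(base_q, combined)))
```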

Option [-out file_name_stem]
Example: -out lt_test will create files:
  lt_test.fasta             = LifeTrace called sequence
  lt_test.qscore.fasta      = LifeTrace quality scores
  lt_test.gap_qscore.fasta  = LifeTrace gap-quality scores
if in addition -LOC option is used:
  lt_test.lt.loc.fasta      = all peak locations
With -phd option (also call with phred) the files
  lt_test.phd.fasta         = phred called sequence
  lt_test.phd.fasta.qual    = phred quality scores
will be created. All output files are in fasta format.

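The output files can be paired up for downstream processing roughly as follows. This is a sketch under assumptions: it presumes that the q-score file holds whitespace-separated decimal integers under the same header as the sequence (as in phred-style quality files, and possibly only when -ASC is set), which is not confirmed here. The function names are hypothetical, and the example works on in-memory text rather than real files.

```python
def read_fasta(text):
    """Parse fasta-formatted text into {record_name: [body lines]}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            records[name] = []
        else:
            if name is None:
                raise ValueError("data line found before any fasta header")
            records[name].append(line)
    return records


def load_calls(seq_text, qscore_text):
    """Pair each called sequence with its list of quality scores.

    Assumption: scores are whitespace-separated integers, one per base.
    """
    seqs = {n: "".join(parts) for n, parts in read_fasta(seq_text).items()}
    quals = {n: [int(x) for x in " ".join(parts).split()]
             for n, parts in read_fasta(qscore_text).items()}
    calls = {}
    for name, seq in seqs.items():
        q = quals.get(name)
        if q is None or len(q) != len(seq):
            raise ValueError(f"no matching quality scores for {name!r}")
        calls[name] = (seq, q)
    return calls


if __name__ == "__main__":
    seq_text = ">read1\nACGT\nAC\n"          # stands in for lt_test.fasta
    q_text = ">read1\n10 20 30\n40 35 12\n"  # stands in for lt_test.qscore.fasta
    print(load_calls(seq_text, q_text))
```

With real output, the two strings would be read from `lt_test.fasta` and `lt_test.qscore.fasta`; the same pairing could be applied to `lt_test.gap_qscore.fasta` if it uses the same layout.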
Folker Meyer, fm@Genetik.Uni-Bielefeld.DE
Version: 1.2, Feb 2002
Authors: D. Walther, G. Bartha, M. Morris
Incyte Genomics, Inc.
dwalther@incyte.com
Affiliation: Bielefeld University, CeBiTec
Basecalling with LifeTrace. (2001) Genome Res 11: 875-888