4.1.A The LifeTrace basecaller


(taken from the software documentation)

A pivotal step in electrophoresis sequencing is the conversion of the raw, continuous chromatogram data into the actual sequence of discrete nucleotides, a process referred to as basecalling. The accuracy of the computational algorithm employed for basecalling directly impacts the quality of the resulting sequence and determines its usability for in-silico SNP detection.

Here we describe a novel algorithm for basecalling implemented in the program LifeTrace. Like Phred, currently the most widely used basecalling software program, LifeTrace takes processed trace data as input. It was designed to be tolerant to variable peak spacing by means of an improved peak detection algorithm utilizing local chromatogram information rather than global properties.

LifeTrace is shown to generate high-quality basecalls and reliable associated quality scores. It proved particularly effective when applied to data from MegaBACE capillary sequencing machines. In a benchmark test of 8,372 dye-primer MegaBACE chromatograms, LifeTrace generated 17% fewer substitution errors and 16% fewer insertion/deletion errors than Phred, and yielded 2.4% more bases alignable to the finished sequence.

For two sets totaling 6,624 dye-terminator chromatograms, the performance improvement was 15% fewer substitution errors, 10% fewer insertion/deletion errors, and 2.1% more aligned bases. The processing time required by LifeTrace is comparable to Phred's.

The predicted quality scores were in line with the observed quality scores, permitting their direct use for quality clipping and in-silico SNP detection.

Furthermore, we introduce a new type of quality score associated with every basecall: the gap-quality. It estimates the probability of a deletion error between the current and the next assigned basecall, i.e. that a true base between the two was not called. This additional quality score improves the detection of single-basepair deletions because it helps locate potential basecalling errors during alignment.
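
To make the relationship between a quality score and an error probability concrete, here is a minimal Python sketch. It assumes that gap-quality scores follow the usual Phred convention (q = -10 * log10(p)); the text above does not state the scale explicitly, so treat this as an assumption. The function names and the threshold of 15 are hypothetical and chosen only for illustration.

```python
import math


def quality_to_error_prob(q):
    """Convert a Phred-style score to an error probability: p = 10^(-q/10).

    Assumption: gap-quality scores use the same scale as Phred base qualities.
    """
    return 10 ** (-q / 10.0)


def error_prob_to_quality(p):
    """Convert an error probability to a Phred-style score: q = -10*log10(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


def likely_deletion_sites(gap_q, threshold=15):
    """Return indices i where the gap-quality between basecall i and the
    next basecall falls below `threshold` (a hypothetical cut-off)."""
    return [i for i, q in enumerate(gap_q) if q < threshold]
```

Under this assumption, a gap-quality of 20 after a basecall would correspond to a 1% chance that a true base was missed before the next call, and positions flagged by `likely_deletion_sites` are candidates for closer inspection during alignment.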

We also describe a new protocol for benchmarking basecaller performance that we believe better discerns performance differences and may find broad application in future benchmark tests.

4.1.A.2 USAGE

	-g -c		<chromatogram file>
	-s		<seq_search_string>
	-phd -lt -md	<display>
	-d		<abi_dump_file>
	-rc -a		<seqID>
	-loc		<location>
	-pos		<seq_position>
	-out		<file_name>
	-bg		<color>
	-f		<gif_file_name>
	-wphd		<phd_file_name>
	-if		<file_name>

	-g	graphical output.

	-c	<chromatogram file> chromatogram file may be gzipped (*.gz).

	-s	search for sequence <seq_search_string> (sequence as in chromatogram-file)
		and display centered and magnified at this position. The best match in the
		sequence is chosen.

	-phd	include phred base calls.

	-lt	include lifetrace base calls.

	-md	set display to your machine (example: -md todlr:0.0). Be sure to grant
		permission to the remote machine (xhost +<remote host>).

	-rc	generate reverse complement.

	-a	look-up seqID from a fasta-format index file, set: setenv CHROM_INDEX 

	-loc	show centered at <location> with a window of 100 trace points.

	-pos	show centered at <seq_position> (according to seq. contained in chromatogram).

	-LTpos	as [-pos], but LifeTrace position instead.

	-out	<file_name> write sequence to file_name.fasta and q-scores to

	-wphd	<phd_file_name> write phd-file style output to phd_file_name.

	-wdphd	write phd-file style output to default name (<chrom_name>.phd.1).

	-v	verbose.

	-prob	save base probabilities to file file_name.base_prob.

	-bg	set background to color <color>.

	-f	generate gif-file <gif_file_name>, requires Java progr. JTrace.

	-nogapq	don't draw gap-qscores.

	-ASC	output numbers in ASCII coding.

	-ABI	write out calls contained in the chromatogram.
	-LOC	write out lifetrace peak locations.

	-3700	quality score calibration for 3700, 377 and MegaBace should be recognized 

	-j	separate out gap-quality score and draw it.

	-if	<file_name> read input chromatogram filenames from file <file_name>.

	-pd	<dirname> write *.phd.1 file(s) to <dirname>.

	mouse actions
	left: set a flag middle: 
	shift+left: chromatogram  
	right: scale in x-direction 
	shift+right: scale in y-direction

	Close the window by clicking into the upper-left corner of the display window.
	Ticks indicate peak locations with lengths corresponding to quality scores.
	Horizontal lines correspond to q-scores of 0 and 15, respectively.

	Example: lifetrace -g -loc 2000 -c example_chrom.bin -phd
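
For scripted runs, the command line can be assembled programmatically. The following Python sketch only builds the argument list from flags documented in this section (-g, -loc, -c, -phd, -out, -j); it does not run the program. The helper name build_lifetrace_cmd and its parameters are hypothetical, and it assumes the executable is called lifetrace as in the example.

```python
import shlex


def build_lifetrace_cmd(chromatogram, out_stem=None, location=None,
                        graphical=False, with_phred=False,
                        separate_gap_q=False):
    """Assemble a lifetrace argument list from documented flags only."""
    cmd = ["lifetrace"]
    if graphical:
        cmd.append("-g")                    # graphical output
    if location is not None:
        cmd += ["-loc", str(location)]      # center display at a trace location
    cmd += ["-c", chromatogram]             # chromatogram file (may be gzipped)
    if with_phred:
        cmd.append("-phd")                  # include phred base calls
    if out_stem:
        cmd += ["-out", out_stem]           # write sequence and q-scores
    if separate_gap_q:
        cmd.append("-j")                    # keep the gap-quality score separate
    return cmd


# Reproduces the documented example invocation as a shell-quoted string.
print(shlex.join(build_lifetrace_cmd("example_chrom.bin", location=2000,
                                     graphical=True, with_phred=True)))
```

The resulting list can be passed to subprocess.run; whether a given combination of flags is accepted by a particular LifeTrace version would need to be checked against that installation.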

NOTE: The gap-quality score is a concept newly introduced by LifeTrace. Because existing post-processing code is not prepared to handle two quality scores per base, we decided to allow folding the gap-quality score into the regular base-quality score. This combined quality score is the default setting for LifeTrace (except in the Windows version).

To fully harness the benefits of the gap-quality score, the "-j" flag has to be set on the command line. As a result, individual base quality scores can be higher, as they are not lowered by low gap-quality score assignments.
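
The effect of folding can be illustrated with a small Python sketch. This documentation does not specify how LifeTrace combines the two scores, so the rule used here (taking the per-base minimum) is purely an assumption chosen as the simplest stand-in that reproduces the behaviour described above: a low gap-quality lowers the combined score. The function name and the example values are hypothetical.

```python
def fold_gap_quality(base_q, gap_q):
    """Illustrative folding rule (an assumption, not LifeTrace's documented
    method): the combined score is the lower of the two scores per base."""
    if len(base_q) != len(gap_q):
        raise ValueError("score lists must have the same length")
    return [min(b, g) for b, g in zip(base_q, gap_q)]


# Made-up scores: the second base is well called, but a base may have been
# missed right after it (low gap-quality).
base_q = [40, 38, 35, 42]
gap_q = [45, 9, 40, 41]
combined = fold_gap_quality(base_q, gap_q)
```

In this toy example the second base keeps its score of 38 when the scores are kept separate, whereas folding reduces it to 9, which mirrors why separately reported base qualities can be higher.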

4.1.A.3 OUTPUT

	Option [-out file_name_stem]
	Example: -out lt_test will create files:

		lt_test.fasta = LifeTrace called sequence
		lt_test.qscore.fasta = LifeTrace quality scores
		lt_test.gap_qscore.fasta = LifeTrace gap-quality scores
	If the -LOC option is used in addition:
		lt_test.lt.loc.fasta = all peak locations

	With the -phd option (also basecall with Phred), the files

		lt_test.phd.fasta = phred called sequence
		lt_test.phd.fasta.qual = phred quality scores

	will be created.

	All output files are in fasta format.
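
The output files can be post-processed with a few lines of Python. The sketch below parses a fasta-style record from an iterable of lines and applies a simple quality clip. It assumes that the score files hold whitespace-separated integers under a ">" header line (as in Phred .qual files) and that the -ASC option is not in use; neither detail is confirmed by this section. The function names and the clipping threshold of 15 are hypothetical.

```python
def parse_fasta(lines):
    """Yield (header, body) pairs; body lines are joined with single spaces."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, " ".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, " ".join(chunks)


def first_sequence(lines):
    """Return (header, sequence) of the first record, e.g. from lt_test.fasta."""
    header, body = next(parse_fasta(lines))
    return header, body.replace(" ", "")


def first_scores(lines):
    """Return (header, scores) of the first record, e.g. from
    lt_test.qscore.fasta (assumed: whitespace-separated integers)."""
    header, body = next(parse_fasta(lines))
    return header, [int(tok) for tok in body.split()]


def clip_by_quality(seq, scores, min_q=15):
    """Keep the span from the first to the last base with score >= min_q."""
    good = [i for i, q in enumerate(scores) if q >= min_q]
    if not good:
        return ""
    return seq[good[0]:good[-1] + 1]
```

An open file handle can be passed directly as the lines argument, for example the handle returned by open("lt_test.fasta"). The clipping rule here is deliberately simple; production pipelines typically use windowed or cumulative-error criteria.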

4.1.A.4 AUTHOR

Local maintainer

Folker Meyer, fm@Genetik.Uni-Bielefeld.DE


Version: 1.2, Feb 2002


Authors: D. Walther, G. Bartha, M. Morris
Incyte Genomics, Inc.


Affiliation: Bielefeld University, CeBiTec

See also:

Basecalling with LifeTrace. (2001) Genome Res 11: 875-888