A project in SEQtools is simply a collection of one or more sequences of the
same type (nucleotide, protein or primer). It is not possible
to include different sequence types in the same project. If you wish to create a project
from more than one sequence file, all files to be loaded must be located in the same
folder.
You can add more sequence files to an existing project from other
folders. In most cases SEQtools will auto-detect the file type (nucleotide, protein
or primer), the sequence format (SEQtools, embl, fasta, genbank, etc.) and the file format (single,
trace, multi-sequence - or a mixture of the three) and create the project from the selected
files without your intervention.
Saving a project is most conveniently done by using the standard SEQtools multi-sequence format
which saves all sequences in the project in a single file (with or without compressing the
file).
The file menu contains the following menu items (described in more detail in
separate sections below):
SEQtools issues a warning before closing the current project offering to save the sequences. Closing a project without saving the data will cause irreversible loss of editorial changes to the sequences as well as all information added to the sequence headers.
Sequence files to be included in a project can be selected in different ways as
indicated in the screenshot of the Open Sequence Files menu shown below.
SEQtools attempts to determine sequence type and format and file format before loading the data into a new project. In most cases this does not require user intervention provided all sequences to be loaded are of the same type (nucleotide, primer or protein).
The project type (nucleotide, primer or protein) is determined by the first sequence loaded. If a sequence of a different type is encountered a warning is issued and loading is interrupted.
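The rule above can be illustrated with a minimal Python sketch. It is not SEQtools code: the class name Project, the exception ProjectTypeError and the plain-string type labels are hypothetical, and the sketch raises an error where SEQtools shows a warning and interrupts loading.

```python
class ProjectTypeError(ValueError):
    """Raised when a sequence of a different type is added to a project."""


class Project:
    """Hypothetical container that only accepts one sequence type."""

    def __init__(self):
        self.seq_type = None   # fixed by the first sequence loaded
        self.sequences = []

    def add(self, name, seq_type, residues):
        if self.seq_type is None:
            # The first sequence decides the project type.
            self.seq_type = seq_type
        elif seq_type != self.seq_type:
            # A different type interrupts loading.
            raise ProjectTypeError(
                f"{name} is {seq_type}, but the project is {self.seq_type}"
            )
        self.sequences.append((name, residues))
```

Adding a nucleotide sequence first and a protein sequence second leaves the project with one sequence and raises ProjectTypeError for the second.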
SEQtools recognises and loads four sequence formats, either as single sequence files or as collections of sequences in multi-sequence files: SEQtools, EMBL, Genbank and Fasta.
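How SEQtools itself recognises these formats is not described here. As a rough sketch, three of the four formats can usually be told apart by the first non-blank line: Fasta records start with ">", Genbank flat files with "LOCUS" and EMBL entries with an "ID" line. The function name guess_sequence_format is hypothetical, and the native SEQtools format is not covered because its layout is not given in this section.

```python
def guess_sequence_format(text):
    """Guess the sequence format from the first non-blank line.

    Returns "fasta", "genbank", "embl" or "unknown". The SEQtools
    native format is reported as "unknown" in this sketch.
    """
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"
        if line.startswith("LOCUS"):
            return "genbank"
        if line.startswith("ID "):
            return "embl"
        return "unknown"
    return "unknown"
```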
Before the file selection form is loaded the Project Preferences form
is opened to enable you to give the project a title and to set various parameters for the new
project.
The File Selection form is used to select the sequence files for the
project. A drive list box and a file list box allow you to navigate between
drives and directories to locate the sequence files you wish to include in the project.
The top file list contains all files in the selected directory. The bottom file list shows
the files currently selected for loading.
Files are selected from the directory file list by pointing or dragging the mouse pointer to highlight one or more file names. A discontinuous series of files is created by holding down the <CTRL> key while clicking the filenames to be included in the project. Clicking the Add Files command button activates the selection. File names can be removed from the list of selected file names by clicking the file name.
Files with the following extensions (cab, log, fof, exe, ini, sys, com, hlp, bat, oof, cof, msg, cut, cod, lst, zip, dat, qscore.fasta, gap_qscore.fasta) cannot be selected and loaded into a project unless the Options/File Exclusion Enabled/Disabled option is set to File Exclusion Disabled.
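The exclusion rule can be sketched in Python as follows. The extension list is copied from the paragraph above. The function name is_selectable is hypothetical, as is the assumption that a file is excluded when its name ends, case-insensitively, with a dot followed by one of the listed extensions.

```python
EXCLUDED_EXTENSIONS = (
    "cab", "log", "fof", "exe", "ini", "sys", "com", "hlp", "bat",
    "oof", "cof", "msg", "cut", "cod", "lst", "zip", "dat",
    "qscore.fasta", "gap_qscore.fasta",
)


def is_selectable(filename, exclusion_enabled=True):
    """Return True if the file may be selected for loading."""
    if not exclusion_enabled:
        # Corresponds to the File Exclusion Disabled setting.
        return True
    lower = filename.lower()
    # Assumed matching rule: name ends with "." + extension.
    return not any(lower.endswith("." + ext) for ext in EXCLUDED_EXTENSIONS)
```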
It is possible to add a case-insensitive filter to the selection by typing characters in the text field. Only files which include or do not include - depending on the selected option - these characters in their file names will be selected/deselected when the Add To List command button is clicked.
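A case-insensitive include/exclude filter of this kind can be sketched in a few lines of Python. The function name filter_names and the include flag are hypothetical and simply mirror the two options described above.

```python
def filter_names(filenames, pattern, include=True):
    """Keep names that contain (or, with include=False, lack) the pattern.

    Matching ignores upper/lower case.
    """
    needle = pattern.lower()
    return [name for name in filenames if (needle in name.lower()) == include]
```

For example, with the names Clone_A1.seq, clone_b2.seq and vector.seq, the pattern "CLONE" selects the first two files, and the same pattern with include=False selects only vector.seq.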
When the auto-backup option is active (Preferences/Project
Settings/Timed Backup) a complete backup of all sequences and sequence headers of
the project is saved - at the specified time interval - to a multi-sequence file (*.fms)
located in the main application folder (normally c:\SEQtools 8.3\BackupData\).
If you need to load a backup copy of a previous project select the Load project
backup file(s) option on the load form to set the path to this folder and load the *.fms
multi-sequence file into a new project.
If you are loading more than 300 sequences into a project, SEQtools offers to turn off the
timed backup function. This function is often not required for large projects and turning it
off saves resources for processing other functions.
When selection is completed, clicking the Load Files command button causes the selected files to be loaded into the specified project. It is not possible to select the same file twice nor is it possible to select files from different directories when a new project is created. Additional files can be added to the project later.
If you already know that the sequences to be loaded are contained in a multi-sequence file (SEQtools, Genbank or Fasta format) just select the Multi-Sequence Files... menu item. This opens a standard Windows file dialog box for selecting the multi-sequence file. The file selection form is not loaded in this case.
It is possible to select and load a mixture of normal single files and multi-sequence files.
When sequence loading is completed and a new project created SEQtools
displays a summary of the annotation (primarily a list of blast search results) available
for the loaded sequences. This is described in more detail under 4.8 Header menu
and its sub-items.
SEQtools auto-detects if the file to be loaded is a chromatogram produced by an
automated sequencer. Extraction of the plain DNA sequence from the trace file is, by default,
carried out by the convert_trace program from the Staden package while viewing the
traces is done by Chromas (see screenshot below).
The link between the extracted sequence and the chromatogram is the Long
Filename of the sequence and the path to the trace file folder set in
Preferences/Project Settings/Trace File Folder.
Provided this association is intact the chromatogram can be retrieved later
and viewed with the Chromas program.
To maintain this connection it is important that the long sequence name is not changed in SEQtools. If you alter the long file name for a sequence, the link is broken and can only be re-established if you enter the name of the trace file corresponding to the SEQtools sequence again.
If you want to check a certain position in your sequence against the
chromatogram, highlight the region in the main SEQtools editor and press CTRL+C
to copy the region to the clipboard. The highlighted region in the sequence is
coloured blue to facilitate locating it.
In Chromas, click Edit/Find... to display the search form. Press CTRL+V to
paste the selected region of your sequence into the search form of Chromas and
click Find. SEQtools removes spaces, CR, LF, and numbers from the selected region,
so it does not matter if your selection spans two lines.
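The clean-up applied to the copied region (removal of spaces, CR, LF and numbers) is equivalent to the following Python sketch. The function name clean_selection is hypothetical.

```python
import re


def clean_selection(text):
    """Remove spaces, CR, LF and digits from a copied sequence region."""
    return re.sub(r"[ \r\n0-9]", "", text)
```

A selection that spans two numbered lines, such as "  1 ACGTA CGTAC" followed by " 11 GGTTA", is reduced to the single search string ACGTACGTACGGTTA.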
The advantage of keeping SEQtools formatted sequences and the original trace files separate is that all SEQtools functions remain available and that automated annotation, for example generated by blast searching, can be maintained in the sequence headers.
4.1.4.1 Convert_Trace is the default program used by SEQtools to extract plain nucleotide information from chromatogram files. The extracted nucleotide sequence is the result of the basecalling performed by the application which created the chromatogram; convert_trace does not allow the user to modify or adjust the way the basecalling is carried out.
4.1.4.2 LifeTrace on the other hand is a stand-alone basecaller which uses information
included in the chromatogram to perform de-novo basecalling utilising its own algorithm for
calling bases.
LifeTrace runs on Linux/Unix systems and requires a more complex setup than convert_trace. In brief: Sequences must be copied to a Linux/Unix computer running LifeTrace to generate the data files used by SEQtools to post-process the basecalling. The advantage is that the user has full control over the basecalling operation as well as over the post-processing by SEQtools. Take a look at the preferences form above to get an impression of the options available when LifeTrace is used for basecalling/extraction of the nucleotide sequence from a chromatogram.
LifeTrace is particularly effective when applied to MegaBACE capillary sequencing machines. A detailed description of the LifeTrace/SEQtools setup and interaction and of the command line arguments is given on separate pages of this manual.
If a *.psp (project save paths) or a *.plp (project load paths) for
a project exists it is possible to re-open the project from the Open Existing Project
menu. The *.psp and *.plp files are lists of full paths to all sequence files included in
the project. The files may be located in different directories and can be single or
multi-sequence files - or a mixture of the two types.
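The internal layout of *.psp and *.plp files is not specified in this section. The following Python sketch assumes, purely for illustration, a plain text list with one full Windows path per line, and groups the listed files by folder. The function name group_by_folder and the example paths are hypothetical.

```python
import ntpath
from collections import defaultdict


def group_by_folder(path_list_text):
    """Group the files named in a path list by their folder.

    Assumes one full Windows path per line; blank lines are ignored.
    """
    groups = defaultdict(list)
    for line in path_list_text.splitlines():
        path = line.strip()
        if path:
            groups[ntpath.dirname(path)].append(ntpath.basename(path))
    return dict(groups)
```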
The *.plp and *.psp files can be saved by clicking Project/Project File Lists,
as shown in the screenshot below.
The *.plp file is auto-generated when the project is created while the *.psp
file is auto-built/re-built each time the project is saved. This option is enabled in
Preferences/General Settings/Project Files.
In case you wish to enter sequences manually either by typing the sequence
or by copy/paste from other applications or from additional instances of SEQtools you need
to tell SEQtools which type (nucleotide, primer or protein) of sequences you intend
to include in the project. When you choose this option, SEQtools sets the project type and
opens an empty file ready for receiving the new sequence.
Each additional sequence requires that you first create a new, empty, page (see below) to
hold the sequence before you start typing or copy/paste. Remember that a project can only
hold one type of sequence.
SEQtools stores the last 20 opened sequence files (single and multi-sequence)
in the Open Recent Project or Sequence list for easy loading of often accessed
files. It is only possible to select and load one file from the list at a time. Note
that this list may include sequence files belonging to different sequence types.
The different sequence file formats are indicated by different icons. To clear the list
of recently opened files, click the title line of the list.
Once a project is created more sequence files can be added to the project
using the load form described in sections 4.1.3 and 4.1.5. Note, however, that using
the 4.1.3 sub-menu will close the current project and create a new SEQtools project,
while Add Files To Project... adds the selected files to the existing
project.
Apart from this difference the load form works exactly in the way described in
section 4.1.3.
It is also possible to add more sequences to the project using the Add Recent
Project Or Sequence menu item.
While adding sequence files to the project SEQtools warns you if you load
sequences with filenames already present in the project. If you choose to override
the warning and accept multiple files with identical names, SEQtools will modify
the filenames of such files if the project is saved as single sequence files in order
to avoid overwriting the first saved file with subsequent sequence files with the
same name.
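The exact renaming scheme SEQtools applies is not documented here. The Python sketch below shows one possible approach under a hypothetical naming rule: a numeric suffix (_2, _3, ...) is added before the extension until the name is unused, comparing names case-insensitively as Windows does. The function name unique_filenames is hypothetical.

```python
import os


def unique_filenames(names):
    """Return the names with duplicates renamed so no file is overwritten."""
    used = set()
    result = []
    for name in names:
        stem, ext = os.path.splitext(name)
        candidate = name
        n = 1
        # Try name_2.ext, name_3.ext, ... until the name is free.
        while candidate.lower() in used:
            n += 1
            candidate = f"{stem}_{n}{ext}"
        used.add(candidate.lower())
        result.append(candidate)
    return result
```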
Notice that the file type (nucleotide, primer or protein) of files to be
added to an existing project must be of the same type as the files in the project.
Sequences loaded with this function are appended to the list of sequences already
in the project.
Before you can add sequences to an existing project by typing the sequence or by copy/pasting the sequence from a different source, you must first add an empty page to the project to hold the sequence. Click Add Empty File To Project to append an empty page to the end of an existing project.
Occasionally it is convenient to be able to perform a blast search on Genbank
databases with oligonucleotides designed for microarrays. This can most easily be done by loading
the oligonucleotides into a primer project in SEQtools and subsequently converting the project to
a nucleotide project. The Convert Project Type function enables you to convert primer
projects to nucleotide projects and vice versa.
Important note: Converting a nucleotide project to a primer project will irreversibly remove all information stored in sequence headers due to the different design of the header structure of the two project types in SEQtools.
To remove a single sequence from a project, simply highlight the sequence to be removed in the sequence list and click Remove Sequence From Project. The removed sequence is not deleted from the hard disk; it is just no longer a member of the project.
To remove a selection of sequences from a project proceed as follows:
Hold down <CTRL> while clicking the sequences to be removed. <Shift+Right-Click> on
the sequence list to open the pop-up menu. Select Close Selected Sequences to remove
the selected sequences from the project. Again, the sequences are not deleted from the hard disk but only
removed from the project.
The File/Export Formats function formats the sequence and its header
so that they can be loaded into other nucleotide and protein analysis programs. There is a special
function which allows you to customise the single line header - the Definition Line - used
in Fasta format.
The different save/export formats supported by SEQtools are shown in the
screenshot of the save/export form. Additional options are available for several of the
export formats. Among these is an option for compressing multi-sequence SEQtools files which facilitates loading the file into a SEQtools project and
saves disk space.
Printing projects is usually not a relevant option. In most cases the amount of
data included in a project makes printing meaningless. As a consequence the printing facilities
in SEQtools have not been revised for a long time and may not work as indicated on the print form. Users
in need of more sophisticated printing options are welcome to contact me for an update of the
print functions. Till then I intend to leave things as they are...
With this function you can send the currently displayed plain sequence by e-mail with an attached comment. In case you need to send the entire project the sequences must be saved in a multi-sequence file and e-mailed as an attachment using the standard e-mail Windows program.
Before SEQtools closes the user is advised - twice - to save the project. Keep in mind
that SEQtools keeps all project data in RAM until the project is saved. Closing SEQtools without
saving the project will lead to irreversible loss of all data of the project.
Note that large batch blast search jobs - which may last several days - include an option to
auto-save the project every time a specified number of searches has been performed. This reduces
the risk of data loss (in case of power failure for example) while the batch searching is
running. See section 4.4 of the manual for a more detailed description of this
option.
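The idea behind the periodic auto-save can be sketched in Python as follows. This is an illustration only, not the SEQtools implementation: the function run_batch and its search and save callbacks are hypothetical stand-ins for the blast search and the project save.

```python
def run_batch(queries, search, save, autosave_every):
    """Run search on every query and save after every N completed searches."""
    results = []
    for done, query in enumerate(queries, start=1):
        results.append(search(query))
        if done % autosave_every == 0:
            # Periodic save limits the loss if the job is interrupted.
            save(results)
    if len(results) % autosave_every != 0:
        # Final save for the searches completed since the last auto-save.
        save(results)
    return results
```

With five queries and autosave_every set to 2, the save callback is called after 2, 4 and 5 completed searches, so an interruption costs at most the searches performed since the last save.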