Skip to content

kaibader/CaSSiS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 

Repository files navigation

CaSSiS

CaSSiS is a fast and scalable software for computing comprehensive collections of sequence- and sequence-group-specific oligonucleotide signatures from large sets of hierarchically clustered nucleic acid sequence data. CaSSiS determines sequence-specific signatures and perfect group-covering signatures for every node within a cluster (i.e. target groups). For groups lacking a perfect common signature, it is able to find the signatures with maximal group coverage (sensitivity) within a user-defined range of non-target hits (specificity). An upper limit of tolerated mismatches within the target group, as well as the minimum number of mismatches with non-target sequences, can be predefined.

Please cite the corresponding Bioinformatics publication if you find this project useful: Comprehensive and relaxed search for oligonucleotide signatures in hierarchically clustered sequence datasets

The former CaSSiS project page page at the TU München was shut down in the mean time. I've therefore created this 'mirror' of the source CaSSiS source code.

Building CaSSiS from Source

CaSSiS uses the cmake build system. Please make sure to have it installed.

Please make sure to build the source code in an empty directory, i.e. create a new (sub)directory and then run cmake from there. Running make in the source directory is prevented to avoid build errors.

$ tar xvfj CaSSiS-0.5.2-src.tar.bz2

$ mkdir /path/to/build-dir

$ cd /path/to/build-dir

$ cmake /path/to/cassis-src-dir

$ make

I've also prepared a video tutorial: How To build CaSSiS from Source showing the installation process.

Usage

Please make sure that the environment variable LD_LIBRARY_PATH points to the correct directory, if necessary.

export LD_LIBRARY_PATH=/path/to/cassis/lib

CaSSiS Command Line Interface

CaSSiS usage

cassis {1pass|create|process|info} [options]

cassis 1pass (new CaSSiS-LCA approach)
Mandatory Options: -seq [... -seq] -tree
Optional: -all -dist -gc -idx -len -mis -og -rc -temp -wm

cassis create
Mandatory Options: -bgrt -seq [... -seq]
Optional: -all -dist -gc -idx -len -mis -rc -temp -wm

cassis process
Mandatory Options: -bgrt -tree
Optional: -og

cassis info
Mandatory Option: -bgrt

Options (alphabetical)

Parameter Function
all Evaluate all 4^len possible signatures. (Not recommended, may take forever... Default: off)
bgrt BGRT file path and name.
dist number Minimal mismatch distance between a signature candidate and non-targets. (Default: 0.0 mismatches)
gc min-max Only allow signatures within a defined G+C content range. (Default: 0 - 100 percent)
idx Defines the used search index: minipt='MiniPt Search Index' (Default)
len length|min-max Length of the evaluated oligonucleotides. Either a fixed length or a range. (Default: 18 bases)
mis number Number of allowed mismatches within the target group. (Default: 1.0 mismatches)
og limit Number of outgroup hits up to which group signatures are computed. (Default: 0)
rc Drop signatures, if their reverse complement matches sequences not matched by the signature itself. (Default: off)
seq MultiFasta file as sequence data source Multiple sequence sources can be defined.
tree Signature candidates will be computed for every defined (i.e. named) node within a binary tree. Accepts a Newick tree file as source.
temp min-max Only allow signatures with a melting temperature within the defined range. (Default: -273 -- 273 degree Celsius)
v Verbose output
wm Enable "weighted mismatch" values. (Default: off)

Caution: Combining the -gc and -temp filters can cause unwanted side effects as they influence each other.

CaSSiS Result Files

CaSSiS was designed to compute comprehensive signature sets based on large sequence databases. To be able to further process/parse the result files of CaSSiS, the simple but flexible CSV format was chosen. Smaller result files can be read/edited with spreadsheet software (Excel, LibreOffice, ...). Larger files should be parsed with an appropriate editor or scripts.

When directly run, CaSSiS creates two different types of result files in the CSV format in the current directory: result_array.csv and results_xxx.csv where xxx is defined by the outgroup hits' range.

result_array.csv

This file gives an overview over the results. Rows represent the range from 0 to max. outgroup hits. Columns represent the nodes of the hierarchical cluster. All information within this file is also included in the following ones.

results_xxx.csv

This file contains the actual results of CaSSiS, separated by the number of outgroup hits xxx. Within the files, the first column contains the node identifier of the hierarchical cluster. The second and third column show the number of organisms within the group/node (for leafs/organisms: 1) and the number of actually matched organisms. The appropriate signatures follow subsequently.

License

CaSSiS

CaSSiS is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. CaSSiS is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with CaSSiS. If not, see http://www.gnu.org/licenses/

MiniPT

MiniPT is derived from the ARB PT-Server search index: The ARB software and documentation are not in the public domain. External programs distributed together with ARB are copyrighted by and are the property of their respective authors unless otherwise stated. All other copyrights are owned by the Lehrstuhl für Mikrobiologie, TU München. See the respective license.txt files in the binary/source packages for more details.

About

Comprehensive and Sensitive Signature Search

Resources

Stars

Watchers

Forks

Packages

No packages published