Skip to content

realbigws/DeepAlign

Repository files navigation

=======================
DeepAlign_Open (v1.35)
date: 2018.08.15
=======================

Author: Sheng Wang 
Contact email: realbigws@gmail.com , wangsheng@uchicago.edu, wangsheng@ttic.edu

please refer to my homepage:
http://ttic.uchicago.edu/~wangsheng/
--------------------------------------------------------


1. Protein structure analysis 
[Reference]
--------------------------------------------------------
Jianzhu Ma, Sheng Wang.
ALGORITHMS, APPLICATIONS AND CHALLENGES OF PROTEIN STRUCTURE ALIGNMENT.
Advances in Protein Chemistry and Structural Biology, 2014 (invited).
http://www.ncbi.nlm.nih.gov/pubmed/24629187
--------------------------------------------------------


2. Pairwise protein structure alignment or structure homolog search
[Reference]
--------------------------------------------------------
Sheng Wang, Jianzhu Ma, Jian Peng and Jinbo Xu.
    PROTEIN STRUCTURE ALIGNMENT BEYOND SPATIAL PROXIMITY
                     Scientific Reports, 3, 1448, (2013)
http://www.ncbi.nlm.nih.gov/pubmed/23486213
--------------------------------------------------------


3. Multiple protein structure alignment
[Reference]
------------------------------------------------------------
Sheng Wang, Jian Peng and Jinbo Xu.
ALIGNMENT OF DISTANTLY-RELATED PROTEIN STRUCTURES: ALGORITHM,
                 BOUND AND IMPLICATIONS TO HOMOLOGY MODELING.
                  Bioinformatics. 2011 Sep 15;27(18):2537-45
http://www.ncbi.nlm.nih.gov/pubmed/21791532
------------------------------------------------------------


========
Upgrade:
========

For DeepAlign_Open (v1.2)

1. allow user to specify the function to be optimizated. ( -s score_func option)

2. allow user to specify the distance cutoff. ( -C distance_cut option)

3. multiple solution output is now functional. ( -p out_option option)

4. allow user to input 'mask' files for setting the region segments. (-A, -B option)


========
Install:
========

1. download the package

git clone https://github.com/realbigws/DeepAlign/
cd DeepAlign/

--------------

2. compile

cd source_code/
        make
cd ../

--------------

3. update the package

git pull



=====================
DeepAlign Web Server:
=====================

The Web Server is located at http://raptorx.uchicago.edu/DeepAlign/submit/



==========================
Linux system and CPU type:
==========================

DeepAlign_Open has been compiled and tested under the following Linux system,
	Linux version 3.2.0-72-generic (buildd@toyol) 
	gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) 

with the following CPU type,
	Intel(R) Xeon(R) CPU E5-4607 0 @ 2.20GHz



===================================
Setup for structure homolog search:
===================================

In this package please find an example PDB database in 'databases/reference_pdb'
'reference_list' contains all the proteins in this folder. 
Please do not delete or change 'reference_pdb' and 'reference_list'.

Users can download some protein structure databases from http://raptorx.uchicago.edu/download/

------

1. protein lists
copy bc*_list to databases/, where '*' stands for 40, 70, 90, 100 

------

2. PDB database
If databases/pdb_BC100/ does not exist, create one.
copy or link all the .pdb files to databases/pdb_BC100/

Note: If you just want to do pairwise or multiple structure alignment, 
you do not have to download these databases, which are supposed to be used for only structure homolog search.
You can also build your own database for structure homology search.

------

3. reference PDB database
Note: The reference PDB database locates at databases/reference_pdb



==============
Main programs:
==============

1. Evaluate quality scores of a given pairwise protein alignment.
(i.e., GDT, TMscore, MaxSub, and DeepScore)
	./DeepScore protein_1 protein_2 <-a alignment>

2. Pairwise structure alignment
To run pairwise alignment, type the following command,
	./DeepAlign protein_T protein_S

3. Multiple structure alignment
To run multiple alignment, type the following command,
	./3DCOMB -i input_list

4. Database search
To run database search, type the following command,
	./DeepAlign_Search.sh -c NP -q query_file -l pdb_list -d path_to_folder_containing_PDB_files

Usage information for all the programs are available by simply running these programs without any arguments.




=========
Examples:
=========

Note: the first two examples are located in examples/ directory

1. Run pairwise structure alignment for 1col.pdb and 1cpc.pdb
	cd examples/DeepAlign_example
		./DeepAlign 1col.pdb 1cpc.pdb

2. Run multiple structure alignment for proteins in input_list
	cd examples/3DCOMB_example
		./3DCOMB -i input_list

3. Run database search for 1pazA.pdb against bc40 template database
	./DeepAlign_Search.sh -c 8 -q examples/1pazA.pdb -l databases/reference_list -d databases/reference_pdb/ -O 1pazA.rank

4. Run batch pariwise comparison
	util/batch_strucsco_mat.sh -i examples/ref_top10_list -r databases/reference_pdb -I examples/ref_top10_list -R databases/reference_pdb -p DeepAlign -f '.pdb'



======================
Screenout description:
======================

name1 name2 len1 len2 -> BLOSUM CLESUM DeepScore ->  LALI RMSD TMscore -> MAXSUB GDT_TS GDT_HA -> SeqID nLen

name1:     Name of the first protein
name2:     Name of the second protein
len1:      Length of the first protein
len2:      Length of the second protein
BLOSUM:    BLOSUM62 score of the alignment (in 0.5 bit)
CLESUM:    CLESUM score of the alignment (in 0.5 bit)
DeepScore: DeepScore of the alignment, calculated by
             {max(0,BLOSUM(i,j))+CLESUM(i,j)}*d(i,j)*v(i,j)
LALI:      Length of alignment
RMSD:      RMSD (Root Mean Square Deviation) of the alignment
TMscore:   TMscore of the alignment
MAXSUB:    Maxsub score of the alignment
GDT_TS:    GDT_TS score, using d<1, d<2, d<4 and d<8
GDT_HA:    GDT_HA score, using d<0.5, d<1, d<2 and d<4
SeqID:     the number of identical amino acid matches in the alignment
nLen:      Normalization length for TMscore, MAXSUB, GDT_TS, and GDT_HA




=========================================
Specify residue range by -x or -y option:
=========================================

symbol explanation:
-------------------

':' for PDB numbering, that the range is selected according to the residue number in the PDB file.
'.' for SEQUENCE numbering, that the range is selected according to sequential order, starting from 1.
'-' to indicate a certain beginning till the end.
'+' to indicate a certain beginning followed by the length range.
'@' to indicate the beginning or the ending of a certain chain.
',' to separate between two PDB chains.
'!' to select all chains in the structure.
'_' to use the first chain in the structure.


example:
--------
A:21-179,B.45-@


description:
------------
Two segments from the input structure are selected.
For chain A we select from residue 21 to residue 179 according to PDB numbering,
whereas for chain B we select from position 45 to the end in sequential order.

__________________________________________________________