lashleigh/sequence-database-cpp: Translating the Python sequence database tool into C++

Files:
AminoAcidMasses.h
Makefile
README
SGDyeast_subset.fasta
constants.hpp
header.hpp
helpers.cpp
lists.cpp
main.cpp
peptide.cpp
peptide.hpp
printHelpers.cpp
protein.cpp
protein.hpp
test.fasta


This is the library creation module of a much larger project. 
The basic idea is to generate peptides from artificially digested 
proteins. 

Type 'make' to build the program, then 'make run' to run it on the supplied
FASTA file, test.fasta. 'make' generates an executable called 'main'. To
digest other FASTA files, pass the file name as an argument:  ./main test.fasta
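The program takes its proteins from FASTA files. For readers new to the format, here is a minimal sketch of a C++ FASTA reader. FastaRecord and readFasta are hypothetical names; this is not the repository's parser, and how the actual tool handles headers, blank lines or line endings is not described here.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, for parsing FASTA text held in memory
#include <string>
#include <vector>

// One FASTA record: the header line without its leading '>' and the
// sequence lines joined into a single string.
struct FastaRecord {
    std::string header;
    std::string sequence;
};

// Hypothetical minimal FASTA reader (not the repository's parser).
// A line starting with '>' opens a new record; other non-empty lines are
// appended to the current record. Sequence lines that appear before the
// first header are ignored. A trailing '\r' is stripped so files with
// Windows line endings parse the same way.
std::vector<FastaRecord> readFasta(std::istream& in) {
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        if (line[0] == '>') {
            records.push_back({line.substr(1), ""});
        } else if (!records.empty()) {
            records.back().sequence += line;
        }
    }
    return records;
}
```

To read a file, open it with std::ifstream (from <fstream>) and pass the stream to readFasta, since std::ifstream is a std::istream.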

The premise of the program is simple. Given a protein:

LLHSLKIHNNTASQKTALMEQYDRYLIVENLYYRGLVSQDINIMQNVFYKELLAHVDTIP

If it is cleaved after every K and R (as trypsin does), this protein generates fragments like these (fully tryptic peptides plus peptides with one missed cleavage, sorted by mass):
691.438 LLHSLK
993.499 IHNNTASQK
1107.5  TALMEQYDR
1326.7  YLIVENLYYR
1684.94 LLHSLKIHNNTASQK
1849.94 GLVSQDINIMQNVFYK
2101    IHNNTASQKTALMEQYDR
2434.2  TALMEQYDRYLIVENLYYR
3176.64 YLIVENLYYRGLVSQDINIMQNVFYK

The first column is the neutral mass of each fragment. Amino acid
masses are defined in AminoAcidMasses.h.
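The digestion and mass calculation above can be sketched in C++ as follows. The names trypticDigest, residueMassSum, neutralPeptideMass, kResidueMass and kWaterMass are hypothetical, not taken from the repository, and the residue table uses standard monoisotopic values as a stand-in for AminoAcidMasses.h. Summing these residue masses reproduces the first column above to the printed precision (LLHSLK gives 691.438), which suggests the listed values do not include the terminal water (about 18.0106 Da) that a full uncharged peptide mass carries; the sketch keeps both functions so the difference is explicit. It also cleaves without the usual "no cut before proline" exception, and it keeps the C-terminal fragment ELLAHVDTIP, which the list above does not show.

```cpp
#include <map>
#include <string>
#include <vector>

// Monoisotopic residue masses (Da) of the 20 standard amino acids.
// Assumption: standard values standing in for AminoAcidMasses.h.
const std::map<char, double> kResidueMass = {
    {'G', 57.02146},  {'A', 71.03711},  {'S', 87.03203},  {'P', 97.05276},
    {'V', 99.06841},  {'T', 101.04768}, {'C', 103.00919}, {'L', 113.08406},
    {'I', 113.08406}, {'N', 114.04293}, {'D', 115.02694}, {'Q', 128.05858},
    {'K', 128.09496}, {'E', 129.04259}, {'M', 131.04049}, {'H', 137.05891},
    {'F', 147.06841}, {'R', 156.10111}, {'Y', 163.06333}, {'W', 186.07931}};

// Monoisotopic mass of one water molecule (Da).
const double kWaterMass = 18.010565;

// Hypothetical helper: cut after every K or R. Whatever follows the last
// K/R is kept as the protein's C-terminal fragment.
std::vector<std::string> trypticDigest(const std::string& protein) {
    std::vector<std::string> peptides;
    std::string::size_type start = 0;
    for (std::string::size_type i = 0; i < protein.size(); ++i) {
        if (protein[i] == 'K' || protein[i] == 'R') {
            peptides.push_back(protein.substr(start, i - start + 1));
            start = i + 1;
        }
    }
    if (start < protein.size()) {
        peptides.push_back(protein.substr(start));
    }
    return peptides;
}

// Sum of residue masses; map::at throws std::out_of_range for letters
// not in the table (e.g. 'X').
double residueMassSum(const std::string& peptide) {
    double mass = 0.0;
    for (char aa : peptide) mass += kResidueMass.at(aa);
    return mass;
}

// Residue sum plus one water: the monoisotopic mass of the uncharged peptide.
double neutralPeptideMass(const std::string& peptide) {
    return residueMassSum(peptide) + kWaterMass;
}
```

For the example protein, trypticDigest returns LLHSLK, IHNNTASQK, TALMEQYDR, YLIVENLYYR, GLVSQDINIMQNVFYK and ELLAHVDTIP.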

Trypsin may also digest a protein imperfectly. There are two types of
imperfect digestion: missed cleavage and semi-tryptic cleavage.
A missed cleavage corresponds to a simple concatenation of two adjacent
fully tryptic peptides:

LLHSLK + IHNNTASQK = LLHSLKIHNNTASQK
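Missed-cleavage peptides can be built from the ordered list of fully tryptic peptides by joining neighbours. The sketch below is a hypothetical helper (missedCleavages and its maxMissed parameter are not from the repository). With maxMissed = 1 and the five fully tryptic peptides ending in K or R from the example protein, it returns exactly the four concatenated rows listed earlier: LLHSLKIHNNTASQK, IHNNTASQKTALMEQYDR, TALMEQYDRYLIVENLYYR and YLIVENLYYRGLVSQDINIMQNVFYK.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: given the fully tryptic peptides of one protein, in
// sequence order, return every concatenation of 2..(maxMissed + 1) adjacent
// peptides, i.e. peptides with 1..maxMissed missed cleavages.
std::vector<std::string> missedCleavages(const std::vector<std::string>& tryptic,
                                         std::size_t maxMissed) {
    std::vector<std::string> result;
    for (std::size_t i = 0; i < tryptic.size(); ++i) {
        std::string joined = tryptic[i];
        // Extend to the right one peptide at a time, stopping at the
        // protein's end or at the allowed number of missed cleavages.
        for (std::size_t m = 1; m <= maxMissed && i + m < tryptic.size(); ++m) {
            joined += tryptic[i + m];
            result.push_back(joined);
        }
    }
    return result;
}
```

Raising maxMissed to 2 adds the three-peptide joins as well (for example LLHSLKIHNNTASQKTALMEQYDR).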

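Semi-tryptic means that only one end of the peptide has a tryptic terminus.
Each line below splits GLVSQDINIMQNVFYK at one position; both pieces are
semi-tryptic peptides (the left piece keeps the tryptic N-terminus, the
right piece keeps the tryptic C-terminus):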
G LVSQDINIMQNVFYK
GL VSQDINIMQNVFYK
GLV SQDINIMQNVFYK
GLVS QDINIMQNVFYK
GLVSQ DINIMQNVFYK
GLVSQD INIMQNVFYK
GLVSQDI NIMQNVFYK
GLVSQDIN IMQNVFYK
GLVSQDINI MQNVFYK
GLVSQDINIM QNVFYK
GLVSQDINIMQ NVFYK
GLVSQDINIMQN VFYK
GLVSQDINIMQNV FYK
GLVSQDINIMQNVF YK
GLVSQDINIMQNVFY K
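The split points listed above can be enumerated with a short loop. semiTrypticSplits is a hypothetical helper, not the repository's code; for GLVSQDINIMQNVFYK it returns the same 15 (N-terminal piece, C-terminal piece) pairs, in the same order. Search tools often also impose a minimum peptide length, which would discard very short pieces such as G or K; whether this tool does so is not stated.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: for one fully tryptic peptide, return every split
// point as a pair. The first element keeps the tryptic N-terminus and the
// second keeps the tryptic C-terminus, so both are semi-tryptic peptides.
std::vector<std::pair<std::string, std::string>>
semiTrypticSplits(const std::string& peptide) {
    std::vector<std::pair<std::string, std::string>> splits;
    // Cut between positions cut-1 and cut; cut == 0 or cut == size()
    // would just reproduce the fully tryptic peptide, so skip them.
    for (std::string::size_type cut = 1; cut < peptide.size(); ++cut) {
        splits.emplace_back(peptide.substr(0, cut), peptide.substr(cut));
    }
    return splits;
}
```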
