edwardotis/hivm
HIVM

Usage:

You can type in:
    hivm --help
or
    hivm -h
to see the usage at the command prompt.

Sample scripts are available in the samples/windows and samples/linux directories.

Presently, hivm has these two functions:
    model-selection (cross-validate and grid search for cost and gamma parameters)
    prediction (run classification on the test data using c,g parameters chosen from cross-validation)

Allowed options:
  -h [ --help ]                         produce help message
  -p [ --purpose ] arg                  model-selection or prediction
  -d [ --drug ] arg                     HIV drug to be tested
  -t [ --thresholds ] arg               Thresholds for high and low drug fold resistance.
                                        Please use only 1 or 2 thresholds.
                                        Default: 2 and 10
  -w [ --wild-type ] arg                Wild Type Enzyme Sequence File
  -f [ --hivdb-file ] arg               HIVDB Susceptibility Data File
  -s [ --seed ] arg (=42)               Optional: Seed for randomly splitting
                                        susceptibility file into training and
                                        testing sets. Positive integers only.
                                        Default: 42
  -e [ --suscep-type ] arg (=all)       Optional: Type of susceptibility experiment
                                        used: clinical, lab, or all.
  -c [ --log-cost-c ] arg (=0)          Log Cost. Required: test operation.
                                        Ignored for CrossValidation and
                                        SelfValidation operations
  -g [ --log-gamma-g ] arg (=0)         Log Gamma. Required: test operation.
                                        Ignored for CrossValidation and
                                        SelfValidation operations
  -x [ --log-cost-low ] arg (=0)        Low Log Cost. Required: cross-validate.
                                        For Grid Search of Parameters
  -y [ --log-cost-high ] arg (=0)       High Log Cost. Required: cross-validate.
                                        For Grid Search of Parameters
  -z [ --log-cost-increment ] arg (=0)  Log Cost Increment. Required: cross-validate.
                                        For Grid Search of Parameters
  -l [ --log-gamma-low ] arg (=0)       Low Log Gamma. Required: cross-validate.
                                        For Grid Search of Parameters
  -m [ --log-gamma-high ] arg (=0)      High Log Gamma. Required: cross-validate.
                                        For Grid Search of Parameters
  -n [ --log-gamma-increment ] arg (=0) Log Gamma Increment. Required: cross-validate.
                                        For Grid Search of Parameters
  -o [ --output ] arg                   Optional: Prefix for output files.
                                        Default: current timestamp

Analysis:

A spreadsheet comparing cross-validation results to test results of a prechosen cost, gamma parameter pair is available in:
    samples/test-results.xls

This file contains results for 3 experiments with different threshold categories:
    2 fold only
    10 fold only
    2 fold and 10 fold simultaneously

The "optimal" cost, gamma pair chosen for the test was the one that created the greatest difference between True Positive Rate and False Positive Rate, i.e. it maximized True Positive Rate and minimized False Positive Rate.

Please use the sample scripts below to run any cost, gamma pairs that you would like to compare.

Samples:

(In Linux, executable permissions may need to be set for all scripts and executables.)

The main sample scripts have already been run, and the results are available in:
    samples/linux/results
or
    samples/windows/results

Precompiled x86, 32-bit binaries for Linux and Windows are available in the samples directory.

Short Samples:
Short samples are available to play with. They are much less computationally intensive than the full model selection scripts. Full scripts search a large number of possible cost, gamma parameter pairs, and each experiment may take several hours to complete on an Intel 3GHz Xeon.

Samples Cache:
The cache consists of precomputed Local Alignment scores in comma-separated values format. It is 30MB in size uncompressed, so it was compressed and placed into the CVS repository. Please extract it if you wish to use it:
    samples/linux/cache/*.csv
    samples/windows/cache/*.csv

Samples Output Files:
Each model selection run results in 5 output files. They are described below:

*results.csv - Statistics and results for every cost, gamma pair tried in an experiment.
*cmdline.txt - Contains all the program options used by hivm in a particular experiment. Additionally, it creates a one-line command line script so that the experiment can be easily run again.
*.log - A log file that may have some useful messages written during the program execution.
*gnuplot_script.gpl - A gnuplot script for creating a ROC curve image.
*roc_data_points.csv - Data points for the gnuplot script.
--
*.png - ROC curve image output by the gnuplot script.

LINUX:

hivm was developed and tested using Boost.org libraries distribution: 1.33.1
The x86 binary libraries are included in this distribution of hivm.
If these binaries do not work for you, run src/build_libs.sh to build the appropriate Boost binaries.

HIVM Build Instructions:
Run src/autogen.sh and src/make
Compilation is controlled using autoconf and automake.
Copy hivm.exe into samples/linux/

Run:
Open a drug script, like IDV.sh
Comment lines in or out to run different model selection or prediction routines.
View output in samples/linux/results.

ROC Curve Graphs:
Find *.gpl scripts in samples/linux/results
Run gnuplot script *.gpl to see a ROC curve of model selection results.
From model selection, pick a c,g pair and run it in prediction mode, and then compare it to model selection.

Linux Unit Tests:
To run unit tests on Linux, first compile a test version of hivm, then run it.

To compile:
    $ cd src
    $ make check
    $ cp test/testhivm test/Debug

To run:
    $ cd test/Debug
    $ ./testhivm

In order to control which tests are run, use:
    src/test/Definitions.hpp
By commenting out definitions, you can control which tests will be compiled into testhivm.

TEST_ALL: full regression test of every possible test.*
LONG_TESTS: In any given test class, there are some tests that take a long time to run. This definition can be used to turn these on and off. Especially useful in conjunction with
'Classname'Tests = Definitions used to control explicitly which classes you want to test.

*Isolation Tests: Certain classes can only be tested in isolation. Turn off (comment out) all other definitions before running these one at a time.
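
Picking a c,g pair (sketch):
The Analysis section describes the "optimal" cost, gamma pair as the one with the greatest difference between True Positive Rate and False Positive Rate. The Python sketch below illustrates that selection rule and then assembles a prediction command line from the options listed under Usage. It is not part of hivm. The GridResult fields, the helper names, and the example numbers and file names are hypothetical; the column layout of *results.csv is not documented here, so the rows are given in memory rather than read from a file.

```python
from typing import List, NamedTuple


class GridResult(NamedTuple):
    """One cost, gamma pair from model selection (field names are hypothetical)."""
    log_cost: float
    log_gamma: float
    tpr: float  # True Positive Rate
    fpr: float  # False Positive Rate


def pick_best(results: List[GridResult]) -> GridResult:
    """Return the pair with the greatest (TPR - FPR); ties keep the first one seen."""
    if not results:
        raise ValueError("no model-selection results to choose from")
    return max(results, key=lambda r: r.tpr - r.fpr)


def prediction_args(best: GridResult, drug: str,
                    wild_type_file: str, hivdb_file: str) -> List[str]:
    """Build an hivm argument list for prediction mode using documented options."""
    return [
        "hivm",
        "--purpose", "prediction",
        "--drug", drug,
        "--wild-type", wild_type_file,
        "--hivdb-file", hivdb_file,
        "--log-cost-c", f"{best.log_cost:g}",
        "--log-gamma-g", f"{best.log_gamma:g}",
    ]


if __name__ == "__main__":
    # Made-up numbers purely for illustration; not taken from any hivm run.
    rows = [
        GridResult(-5, -15, 0.60, 0.30),
        GridResult(3, -7, 0.85, 0.20),
        GridResult(15, 3, 0.90, 0.55),
    ]
    best = pick_best(rows)
    # Placeholder file names; substitute the files used in the drug script.
    print(" ".join(prediction_args(best, "IDV", "wild_type.txt", "hivdb_data.txt")))
```

The argument list can be passed to subprocess.run or pasted into a drug script. Check the generated *cmdline.txt from a real run for the exact option values hivm expects.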

WINDOWS:

hivm was developed and tested using Boost.org libraries distribution: 1.33.1
The appropriate win32 libraries are included in this distribution of hivm.
If you need to build the libraries yourself for any reason, run:
    src/build_libs.bat

HIVM Build Instructions:
Open hivm.sln in MS Visual Studio. (Created with v7.1, aka MS Visual Studio 2003)
Choose Release version of the hivm project and build.
Copy Release/hivm.exe into samples/windows/

Note for MS Visual Studio 2008 users:
If you let MS Visual Studio 2008 update the sln file, you need to manually update the referenced Boost unit test library. The libraries are located in:
    hivm\src\3rd_party\boost\lib
Instructions after opening the MS sln file:
    Right click the 'testhivm' project.
    Go to Properties.
    Select Configuration: All Configurations
    Open Linker > Input > Additional Dependencies.
    Change 'libboost_unit_test_framework-vc71-mt-sgd-1_33_1.lib'
    to 'libboost_unit_test_framework-vc80-mt-sgd-1_33_1.lib'

Run:
Open a drug script, like IDV.bat
Use start_IDV.bat to start IDV.bat using a Low Process Priority.
Comment lines in or out to run different model selection or prediction routines.
View output in samples/windows/results.

ROC Curve Graphs:
Find *.gpl scripts in samples/windows/results
Run gnuplot script *.gpl to see a ROC curve of model selection results.
From model selection, pick a c,g pair and run it in prediction mode, and then compare it to model selection.

Windows Unit Tests:

Building:
Open trunk/hivm.sln in MS Visual Studio. (Created with v7.1)
Choose Release version of the testhivm project and build.

Running:
    src/test/Debug/testhivm.exe

In order to control which tests are run, use:
    src/test/Definitions.hpp
By commenting out definitions, you can control which tests will be compiled into testhivm.exe

TEST_ALL: full regression test of every possible test.*
LONG_TESTS: In any given test class, there are some tests that take a long time to run. This definition can be used to turn these on and off.
Especially useful in conjunction with
'Classname'Tests = Definitions used to control explicitly which classes you want to test.

*Isolation Tests: Certain classes can only be tested in isolation. Turn off (comment out) all other definitions before running these one at a time.

Other:

# Sample Sizes for single thresholds with seed 42
Train   Test    Drug
300     137     APV
53      21      ATV
286     136     IDV
165     65      LPV
308     143     NFV
249     119     RTV
303     143     SQV

Single Class Warning:
"Warning! Training data is all in same class. Predictions were all for same class as well. Please view Readme.txt to see ways to avoid this problem. View log to see details of training data."

If you saw this warning, then all your training data is in the same class.
Cookbook to solve this problem:
a. Check the drug resistance threshold you used. If it is very low or very high, then that will force all your training data to be in the same class.
b. Change the seed used to randomly split training and test data. Perhaps you just got a bad draw with the previous seed.
c. If available, use more data for training.
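
Checking a threshold before a run (sketch):
Cookbook item (a) can be checked ahead of time by counting how many fold-resistance values fall on each side of a single threshold. The Python sketch below is an illustration only and is not part of hivm. The function names and sample values are hypothetical, and treating a value equal to the threshold as the higher class is an assumption; how hivm itself handles that boundary is not stated in this Readme.

```python
from typing import Sequence, Tuple


def split_by_threshold(fold_values: Sequence[float],
                       threshold: float) -> Tuple[int, int]:
    """Return (count below threshold, count at or above threshold).

    Assumption: a value equal to the threshold counts as the higher class.
    """
    at_or_above = sum(1 for v in fold_values if v >= threshold)
    return len(fold_values) - at_or_above, at_or_above


def is_single_class(counts: Tuple[int, int]) -> bool:
    """True when every sample landed on the same side of the threshold."""
    below, at_or_above = counts
    return below == 0 or at_or_above == 0


if __name__ == "__main__":
    # Made-up fold-resistance values for illustration only.
    training_folds = [1.2, 0.8, 3.5, 1.0]
    for threshold in (2, 10, 100):
        counts = split_by_threshold(training_folds, threshold)
        note = "single class" if is_single_class(counts) else "both classes present"
        print(threshold, counts, note)
```

If a threshold leaves one side empty, try a different threshold, a different seed, or more training data, as described in the cookbook above.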