This is a source code release of AnnSim.
It has been tested on GNU/Linux and OS X.

AnnSim
===========
AnnSim is a similarity metric for annotation graphs.

1) LICENSE
===========
GNU GENERAL PUBLIC LICENSE Version 2

2) CONTENT
===========
*    src: source code.
*    doc: information about the datasets.
*    test: set of tests.
**   test/ACM_BCB/dataset2: Dataset 2 used in the publication [1].
***  test/ACM_BCB/dataset2/ncit: NCIt graph and list of terms used in Dataset 2.
**   test/ACM_BCB/dataset3: Dataset 3 used in the publication [1].
***  test/ACM_BCB/dataset3/go: GO graph and list of terms used in Dataset 3.
**   test/ACM_BCB/dataset4: Dataset 4 used in the publication [1].
***  test/ACM_BCB/dataset4/go: GO graph and list of terms used in Dataset 4.
**   test/DATABASE_Journal: used in the publication [2]; see test/DATABASE_Journal/README.

[1] Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas. 
Measuring Relatedness Between Scientific Entities in Annotation Datasets. 
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics (ACM BCB). 
Washington DC, USA. 2013

[2] Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas. 
Determining Similarity of Scientific Entities in Annotation Datasets.
Database.
ACM-BCB 2014 Virtual Issue.

3) REQUIREMENTS
================
* GNU Compiler Collection (gcc).
* GNU make (make).

4) INSTALLATION
================
On GNU/Linux, clean and generate the necessary files:

   $>make veryclean
   $>make

The executable file annsim is generated in the annsim directory.

5) USAGE
=========
The annsim executable takes 4 mandatory command line arguments.
Here, mandatory means that the program won't work unless the argument is specified.

annsim command synopsis:

	 annsim [-v] <graph> <terms> <annotations 1> <annotations 2>

The following are the command line options:

-v  	      	      # verbose output
<graph>	      	      # Ontology graph file
<terms>		      # File with the terms of the ontology
<annotations 1>	      # File with annotations from the first concept
<annotations 2>	      # File with annotations from the second concept
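
For scripted use, the synopsis above can be wrapped in a small helper. The following Python sketch is not part of AnnSim: the function names build_annsim_command and run_annsim, and the default executable path ./annsim, are assumptions made for illustration. Since the output format of annsim is not described here, the sketch returns the standard output unparsed.

```python
import subprocess
from typing import List


def build_annsim_command(graph: str, terms: str,
                         annotations_1: str, annotations_2: str,
                         verbose: bool = False,
                         executable: str = "./annsim") -> List[str]:
    """Assemble the argument list in the order given by the synopsis.

    The executable path is an assumption; adjust it to where annsim was built.
    """
    command = [executable]
    if verbose:
        command.append("-v")  # optional flag, placed before the four files
    command.extend([graph, terms, annotations_1, annotations_2])
    return command


def run_annsim(command: List[str]) -> str:
    """Run annsim and return its standard output as text, unparsed.

    Raises subprocess.CalledProcessError if annsim exits with a non-zero status.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.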


6) RUNNING SOME SAMPLES
========================
1) Compute the value of similarity between two genes.

$>./annsim -v test/ACM_BCB_dataset3/go/goGraph.txt test/ACM_BCB_dataset3/go/go-term-desc.txt test/ACM_BCB_dataset3/5-f-atpase/AtVHA-A.annt test/ACM_BCB_dataset3/5-f-atpase/AtVHA-C.annt

2) Compute the value of similarity between two drugs.

$>./annsim test/ACM_BCB_dataset2/ncit/graphNCI.txt test/ACM_BCB_dataset2/ncit/nci-term-desc.txt test/ACM_BCB_dataset2/Catumaxomab.txt test/ACM_BCB_dataset2/Panitumumab.txt

7) AnnSim File Formats
==========================
The following explains the formats of the files used in the computation of the metric.

7.1) Graph File Format
=======================
The graph of the ontology must be a Directed Acyclic Graph (DAG).
The vertices of the graph are the identifiers of the terms of the ontology.
The first line has the following format:

<<number of nodes>>[TAB]<<number of arcs>>

Where <<number of nodes>> is the number of vertices of the graph,
[TAB] is the tab character, and <<number of arcs>> is the number of arcs
of the graph. Each of the following lines in the file specifies
a source node, a target node, and the cost of the arc:

<<source node1>>[TAB]<<target node1 >>[TAB]<<cost1>>
..
..
..
<<source nodeN>>[TAB]<<target nodeN>>[TAB]<<costN>>
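
As an illustration of the layout above, the following Python sketch writes and reads a graph file and checks that the arcs form a DAG. It is not part of AnnSim. The function names and the node identifiers T1, T2, T3 are hypothetical, and node identifiers and costs are kept as plain text because their exact types are not stated here.

```python
from collections import defaultdict, deque
from typing import List, Tuple

Arc = Tuple[str, str, str]  # (source node, target node, cost), all kept as text


def format_graph(num_nodes: int, arcs: List[Arc]) -> str:
    """Build the file content: a header line, then one tab-separated arc per line."""
    lines = [f"{num_nodes}\t{len(arcs)}"]
    lines += [f"{source}\t{target}\t{cost}" for source, target, cost in arcs]
    return "\n".join(lines) + "\n"


def parse_graph(text: str) -> Tuple[int, List[Arc]]:
    """Read the header and the arcs, checking the declared number of arcs."""
    lines = [line for line in text.splitlines() if line.strip()]
    nodes_field, arcs_field = lines[0].split("\t")
    num_nodes, num_arcs = int(nodes_field), int(arcs_field)
    arcs = []
    for line in lines[1:]:
        source, target, cost = line.split("\t")
        arcs.append((source.strip(), target.strip(), cost.strip()))
    if len(arcs) != num_arcs:
        raise ValueError(f"header declares {num_arcs} arcs, found {len(arcs)}")
    return num_nodes, arcs


def is_acyclic(arcs: List[Arc]) -> bool:
    """Kahn's algorithm: the graph is a DAG if every node can be removed in order."""
    successors = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for source, target, _cost in arcs:
        successors[source].append(target)
        in_degree[target] += 1
        nodes.update((source, target))
    queue = deque(node for node in nodes if in_degree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for successor in successors[node]:
            in_degree[successor] -= 1
            if in_degree[successor] == 0:
                queue.append(successor)
    return removed == len(nodes)


# Hypothetical three-node graph with two arcs of cost 1.
example = format_graph(3, [("T1", "T2", "1"), ("T1", "T3", "1")])
num_nodes, arcs = parse_graph(example)
```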

7.2) File format of identifiers
================================
This file contains the IDs of the terms of the ontology and their descriptions.
The format is as follows:
<<Number of terms>>
<<Term1>>[TAB]<<Description of the term1>>
..
..
..
<<TermN>>[TAB]<<Description of the termN>>

Note that the number of terms must be equal to the number of nodes of the graph.
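
The following Python sketch shows one way to read a terms file and to check the constraint above against the node count of the graph. It is not part of AnnSim; the function names and the sample terms are hypothetical.

```python
from typing import Dict


def parse_terms(text: str) -> Dict[str, str]:
    """Return a mapping from term ID to description.

    The first line holds the number of terms; each later line is
    <term ID>[TAB]<description>.
    """
    lines = text.splitlines()
    declared = int(lines[0].strip())
    terms = {}
    for line in lines[1:]:
        if not line.strip():
            continue  # tolerate trailing blank lines
        term_id, _, description = line.partition("\t")
        terms[term_id.strip()] = description.strip()
    if len(terms) != declared:
        raise ValueError(f"file declares {declared} terms, found {len(terms)}")
    return terms


def check_terms_match_graph(terms: Dict[str, str], num_nodes: int) -> None:
    """Raise ValueError if the number of terms differs from the number of graph nodes."""
    if len(terms) != num_nodes:
        raise ValueError(f"{len(terms)} terms but {num_nodes} graph nodes")


# Hypothetical two-term file.
sample = "2\nT1\tfirst term\nT2\tsecond term\n"
terms = parse_terms(sample)
check_terms_match_graph(terms, 2)
```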

7.3) Annotation File Format
============================
The format is as follows:

<<Number of annotations>>
<<annotation1>>
..
..
..
<<annotationN>>

Where <<annotationX>> is the identifier of the annotation in the ontology.
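
A minimal Python sketch of this format follows. It is not part of AnnSim; the function names and the identifiers T1, T2, T9 are hypothetical. The helper unknown_annotations assumes that each annotation should appear among the term IDs of the ontology.

```python
from typing import Iterable, List


def format_annotations(annotations: List[str]) -> str:
    """Build the file content: the count on the first line, then one ID per line."""
    return "\n".join([str(len(annotations))] + list(annotations)) + "\n"


def parse_annotations(text: str) -> List[str]:
    """Read the annotation IDs, checking the declared count."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    declared = int(lines[0])
    annotations = lines[1:]
    if len(annotations) != declared:
        raise ValueError(
            f"file declares {declared} annotations, found {len(annotations)}")
    return annotations


def unknown_annotations(annotations: Iterable[str],
                        term_ids: Iterable[str]) -> List[str]:
    """List the annotations that are not among the known term IDs."""
    known = set(term_ids)
    return [annotation for annotation in annotations if annotation not in known]


# Hypothetical annotation file with two identifiers.
content = format_annotations(["T1", "T2"])
parsed = parse_annotations(content)
```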

8) CONTACT
===========
I hope you find AnnSim a useful tool. Please let me know
of any comments, problems, or suggestions.

Guillermo Palma
http://www.ldc.usb.ve/~gpalma
gvpalma@usb.ve

Automatically exported from code.google.com/p/annsim
