GitHub - amaunz/embrel: Euclidean Embedding of Co-Proven Queries: a 2D embedding based on co-occurrence due to H. Schulz et al.


=============================================
|        Embedded Relational Learning       |
=============================================

This program produces SVG graphics describing the relationships between queries
and interpretations in a relational learning setting.



Prerequisites
---------------------------------------------

For the C++ program:
- a current C++ compiler (>= 4.1)
- Boost >= 1.35
- the LAPACK backend, which Boost.uBLAS needs in order to work
- boost-numeric-bindings
	This is a header-only library available from:
	http://mathema.tician.de/software/boost-bindings
	Just drop it into /usr/local/include/boost-numeric-bindings
- matio, for MATLAB output

For the XML-to-SVG conversion script:
- Perl >= 5.1
- the Perl modules SVG, Convert::Color and File::Util


Building
---------------------------------------------

$ tar xzvf erl.tar.gz
$ cd erl
$ mkdir build
$ cd build
$ cmake ../src
$ make -j2


Installing
---------------------------------------------

Installation is not implemented yet; just run the program from the build directory.



Running
---------------------------------------------

Try:
$ ./erl --help

$ ./erl {action} [options]

There are two main actions:

- count
	A Molfea implementation for SDF files.
	count currently does not compile on 64-bit systems due to known bugs in Boost.

- code
	A specific RPROP-based CODE implementation for visualization
	purposes.


Configuration File
---------------------------------------------

You can put all options in the configuration file as well.
The configuration file defaults to config.dat; use the -c parameter to
specify a different one.

Configuration files use INI syntax.
Options translate as follows:

the option:
	--count.out_base="something"
becomes
	[count]
	out_base = something
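
The translation rule above can be sketched in a few lines of Python. This is
only an illustration and not part of erl: the helper name options_to_ini is
made up, and the sketch handles only options of the form --section.key=value.
How options without a section prefix (such as --output-dir) appear in the
configuration file is not described here, so the sketch rejects them.

```python
import configparser
import io


def options_to_ini(options):
    """Turn ['--section.key=value', ...] into INI text (hypothetical helper)."""
    # interpolation=None keeps '%' characters in values untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the case of option names
    for opt in options:
        name, _, value = opt.lstrip("-").partition("=")
        section, _, key = name.partition(".")
        if not key:
            raise ValueError(f"expected --section.key=value, got {opt!r}")
        if not parser.has_section(section):
            parser.add_section(section)
        # drop surrounding quotes, as in the example above
        parser.set(section, key, value.strip('"'))
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


ini_text = options_to_ini(['--count.out_base="something"'])
print(ini_text)
```

The printed text contains the [count] section with the out_base entry and
could be saved as config.dat.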




Examples  -- count
---------------------------------------------
You need SDF files (the LONG format!).

The output of the count stage is cached, so you need to specify "-r" if
you want the program to re-count and re-read everything.

An example call could be:

$ ./erl count --count.out_base=base --SDFReader.files molecules_a.sdf:1,molecules_i.sdf:0 --output-dir=`pwd` -r -v

Explanation:
This count action generates two CSV files in output-dir, base.csv and
base-names.csv, which are needed for the code action (see below).
It uses two SDF files, molecules_a.sdf and molecules_i.sdf, with class 1 and 0,
respectively. The output files (and the cache) are written to the current
directory, and re-counting is forced. The program runs verbosely.
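
If you have many SDF files, the file:class list can be assembled by a script.
The following Python sketch rebuilds the call shown above; the function name
build_count_command is made up for illustration, and only flags that appear
in the example call are used.

```python
import os


def build_count_command(sdf_classes, out_base, output_dir,
                        recount=True, verbose=True):
    """Assemble the argument list for './erl count' (hypothetical helper)."""
    # --SDFReader.files takes comma-separated file:class pairs
    files_arg = ",".join(f"{path}:{label}" for path, label in sdf_classes)
    cmd = [
        "./erl", "count",
        f"--count.out_base={out_base}",
        "--SDFReader.files", files_arg,
        f"--output-dir={output_dir}",
    ]
    if recount:
        cmd.append("-r")  # ignore the cache and re-count
    if verbose:
        cmd.append("-v")
    return cmd


cmd = build_count_command(
    [("molecules_a.sdf", 1), ("molecules_i.sdf", 0)],
    out_base="base",
    output_dir=os.getcwd(),
)
print(" ".join(cmd))
# To actually run it from the build directory:
#   import subprocess; subprocess.run(cmd, check=True)
```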





Examples  -- code
---------------------------------------------
You need two files: the matrix file and the names file.

Matrix file
- contains the description of one interpretation per line
- a line consists of
	- a name for the interpretation (currently ignored)
	- for every query, a number of co-occurrences (usually 1/0)
	- a class label (should be an int)

Names file
- contains a list of names for the queries, one per line.

Use this naming scheme for the files:
Matrix file:   BASENAME.csv
Names  file:   BASENAME-names.csv

You can then run erl using the following command:

$ ./erl code --code.input_file BASENAME   --output-dir=`pwd`

Currently, the output-dir parameter is _mandatory_.
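
If your data does not come from the count action, the two input files can be
written by hand. Below is a minimal Python sketch of the layout described
above. Note the assumptions: the field separator is not stated in this README,
so a comma is assumed because of the .csv extension, and no header row is
written. The function name write_code_input and the demo data are made up.

```python
import csv
import os
import tempfile


def write_code_input(basename, query_names, interpretations, output_dir):
    """Write BASENAME.csv and BASENAME-names.csv (hypothetical helper).

    interpretations: iterable of (name, counts, class_label), where counts
    holds one co-occurrence number per query, in query_names order.
    """
    matrix_path = os.path.join(output_dir, basename + ".csv")
    names_path = os.path.join(output_dir, basename + "-names.csv")
    with open(matrix_path, "w", newline="") as f:
        # ASSUMPTION: comma-separated fields, no header row
        writer = csv.writer(f, lineterminator="\n")
        for name, counts, label in interpretations:
            if len(counts) != len(query_names):
                raise ValueError(f"{name}: expected {len(query_names)} counts")
            # name, one count per query, then the integer class label
            writer.writerow([name, *counts, int(label)])
    with open(names_path, "w") as f:
        for query in query_names:
            f.write(query + "\n")  # one query name per line
    return matrix_path, names_path


demo_dir = tempfile.mkdtemp()
matrix_path, names_path = write_code_input(
    "demo",
    ["query_a", "query_b"],
    [("mol1", [1, 0], 1), ("mol2", [0, 1], 0)],
    demo_dir,
)
print(matrix_path, names_path)
```

The number of counts per line must match the number of lines in the names
file, which is why the sketch checks the lengths before writing.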


good luck!