erichpeterson/pgfim

IMPORTANT:
Two libraries must be installed before PGFIM can be compiled:
1. The high-precision static library ARPREC, which can be downloaded and installed
from the following website: http://crd-legacy.lbl.gov/~dhbailey/mpdist/
2. The Boost C++ libraries, used here for mathematics routines, which can be
downloaded and installed from the following website: http://www.boost.org/

COMPILING:
To compile on a Linux-based machine, edit the Makefile to match your environment and
the install locations of the two libraries above. These locations are set in the
Makefile next to the CXX_CFLAGS and CXX_LDFLAGS variables.

Once your environment is set correctly, run 'make' at the command line in the
source code directory; this should create the PGFIM executable.

RUNNING:
./PGFIM -tau <TAU> -inputDB <INPUT_DB_FILENAME> -inputXML <INPUT_XML_FILENAME> [optional_args]

Sample Execution:
./PGFIM -tau 0.9 -minsup 3000 -inputDB input.data -inputXML taxonomy.xml


Command Line Options:
	-minsup Minimum support threshold: [0.0-Size of DB] default: 1
	-tau Tau frequent probability threshold: (0.0-1.0] (Required)
	-inputDB Input DB file name (Required)
	-inputXML Input XML taxonomy (Required)
	-calc Frequentness calculation method: {dynamic} default: dynamic
	-output Output file name
	-exec Execution stats file name
	-print Print found itemsets: {0 | 1} (0 = false, 1 = true) default: 0
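
The options above can also be assembled programmatically. The following is a minimal Python sketch, not part of PGFIM itself: the helper name build_pgfim_command and its parameters are hypothetical, and it only builds the argument list (checking that tau lies in (0.0, 1.0]) without running the executable.

```python
def build_pgfim_command(tau, input_db, input_xml, minsup=None,
                        output=None, print_itemsets=False,
                        executable="./PGFIM"):
    """Build a PGFIM argument list from the documented options.

    Hypothetical helper for illustration; only the flag names come
    from the option list in this README.
    """
    # -tau is required and must lie in the half-open interval (0.0, 1.0].
    if not (0.0 < tau <= 1.0):
        raise ValueError("tau must be in (0.0, 1.0]")
    cmd = [executable, "-tau", str(tau)]
    # -minsup is optional; PGFIM falls back to its default when omitted.
    if minsup is not None:
        if minsup < 0:
            raise ValueError("minsup must be non-negative")
        cmd += ["-minsup", str(minsup)]
    # Both input files are required.
    cmd += ["-inputDB", input_db, "-inputXML", input_xml]
    if output is not None:
        cmd += ["-output", output]
    if print_itemsets:
        cmd += ["-print", "1"]
    return cmd


if __name__ == "__main__":
    cmd = build_pgfim_command(0.9, "input.data", "taxonomy.xml", minsup=3000)
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run once the executable has been built.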
	
DATABASE FILE & TAXONOMY FORMAT:
Attributes (items) within a transaction are space-delimited, and transactions/instances
are newline-delimited. Currently, only natural numbers can be used as
attribute names / items. Each attribute name is followed by a colon and then the
probability of that attribute occurring.

IMPORTANT: The last item and probability of each transaction must be followed by a space.

Example database with 3 transactions:
1:0.9 2:0.23 3:0.6 
2:0.2 4:0.8 
1:0.4 2:0.54 
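
As an illustration of the format above, here is a minimal Python sketch for writing and checking transaction lines. The function names are hypothetical and not part of PGFIM. The sketch assumes probabilities lie in [0.0, 1.0] and treats any non-negative integer as a valid item name (whether 0 is accepted by PGFIM is not stated here).

```python
def format_transaction(items):
    """Format (item, probability) pairs as one database line.

    Every entry, including the last, is followed by a space,
    as the format requires.
    """
    parts = []
    for item, prob in items:
        # Assumption: item names are non-negative integers.
        if not isinstance(item, int) or item < 0:
            raise ValueError(f"item must be a natural number: {item!r}")
        # Assumption: probabilities lie in [0.0, 1.0].
        if not (0.0 <= prob <= 1.0):
            raise ValueError(f"probability out of range: {prob!r}")
        parts.append(f"{item}:{prob} ")
    return "".join(parts)


def parse_transaction(line):
    """Parse one database line back into (item, probability) pairs."""
    line = line.rstrip("\n")
    # The last item:probability pair must be followed by a space.
    if not line.endswith(" "):
        raise ValueError("transaction must end with a trailing space")
    pairs = []
    for token in line.split():
        name, prob = token.split(":")
        pairs.append((int(name), float(prob)))
    return pairs


if __name__ == "__main__":
    db = [
        [(1, 0.9), (2, 0.23), (3, 0.6)],
        [(2, 0.2), (4, 0.8)],
        [(1, 0.4), (2, 0.54)],
    ]
    # Write one transaction per line, reproducing the example database.
    with open("input.data", "w") as f:
        for transaction in db:
            f.write(format_transaction(transaction) + "\n")
```

Generating the file this way avoids the most likely formatting mistake, a missing trailing space at the end of a transaction.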

An XML file is used to represent the taxonomy. A sample taxonomy and example
databases can be found in the datasets directory.

CITATION:
Erich A. Peterson and Peiyi Tang. Mining Probabilistic Generalized Frequent Itemsets in Uncertain Databases. In Proceedings of the 51st ACM Southeast Conference (ACMSE), Savannah, GA, USA, April 2013.

Probabilistic Generalized Frequent Itemset Mining (PGFIM) in Uncertain Databases