Classification using the normalized compression distance

Run pip install -r requirements.txt to install dependencies.

In classify.py, specify the classes you want to train the classifier on. For example:

BASE_DIR = "data"
DIRECTORIES = [
  "fragmented_csv",
  "fragmented_jpg"
]

The files for each class should be in their own directory. The class label is the file extension.
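As a rough illustration of the labelling rule, the sketch below walks the class directories and uses each file's extension as its label. The helper name collect_labelled_files is hypothetical and is not taken from classify.py; it only assumes the BASE_DIR / DIRECTORIES layout described above.

```python
import os


def collect_labelled_files(base_dir, directories):
    """Return (path, label) pairs, labelling each file by its extension.

    Hypothetical helper for illustration; not part of classify.py.
    """
    labelled = []
    for directory in directories:
        class_dir = os.path.join(base_dir, directory)
        for name in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, name)
            if os.path.isfile(path):
                # os.path.splitext("0001.csv") -> ("0001", ".csv")
                label = os.path.splitext(name)[1]
                labelled.append((path, label))
    return labelled
```

Calling collect_labelled_files("data", ["fragmented_csv", "fragmented_jpg"]) would then yield paths paired with ".csv" or ".jpg", assuming the directories contain files with those extensions.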

Select the number of anchors per class and the number of items per class:

ITEMS_PER_CLASS = 1000
ANCHORS_PER_CLASS = 10
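For readers unfamiliar with the approach, here is a minimal sketch of how anchors can be used: each item is described by its normalized compression distance (NCD) to every anchor, and that vector of distances is what the classifier sees. This is an assumption about the general technique, not a copy of ncds.py. The function names are hypothetical, the choice of lzma as the compressor is illustrative, and the code assumes Python 3.

```python
import lzma


def compressed_size(data):
    """Length in bytes of the lzma-compressed form of `data`."""
    return len(lzma.compress(data))


def ncd(x, y):
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def feature_vector(item, anchors):
    """Describe `item` by its NCD to each anchor (one feature per anchor)."""
    return [ncd(item, anchor) for anchor in anchors]
```

With two classes and ANCHORS_PER_CLASS = 10, a scheme like this would give every item a 20-dimensional feature vector. Values close to 0 indicate that the item compresses well together with the anchor.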

Then run python classify.py. A report is written to reports/ and printed to stdout. With the parameters above, the report looks like this:

             precision    recall  f1-score   support

       .csv     0.9950    0.9950    0.9950      1000
       .jpg     0.9950    0.9950    0.9950      1000

avg / total     0.9950    0.9950    0.9950      2000

CV specifies the number of folds used in the outer cross-validation. GRID_SEARCH_CV specifies the number of folds used in the inner cross-validation that scores each candidate set of model parameters. For each training partition of the data set, the best parameters within the specified space are found and then used to predict the corresponding test partition.
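The scheme described here is usually called nested cross-validation. The sketch below shows one way to express it with the current scikit-learn API. It is not the code in classify.py: the synthetic data from make_classification stands in for the real feature vectors, SMALL_PARAM_GRID is a deliberately tiny, hypothetical grid so the example runs quickly, and the fold counts are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, cross_val_predict
from sklearn.svm import SVC

CV = 5              # outer folds: each one yields a train/test partition
GRID_SEARCH_CV = 3  # inner folds: used only to score parameter candidates

# Hypothetical small grid, chosen to keep the sketch fast.
SMALL_PARAM_GRID = [
    {"kernel": ["rbf"], "gamma": [2.0 ** -5, 2.0 ** -3], "C": [1.0, 4.0]}
]

# Synthetic stand-in for the real features and labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The grid search is itself an estimator: fitting it on a training
# partition picks the best parameters using the inner folds.
inner_search = GridSearchCV(SVC(), SMALL_PARAM_GRID, cv=GRID_SEARCH_CV)

# For each outer fold, fit the search on the training partition and
# predict the held-out test partition with the best model found.
predictions = cross_val_predict(inner_search, X, y, cv=CV)

print(classification_report(y, predictions, digits=4))
```

Because the parameters are chosen without ever seeing the outer test partition, the reported scores are not biased by the parameter search.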

You can specify the parameter space as follows:

PARAM_GRID = [
    {
        'kernel': ['rbf'],
        # Float base: recent NumPy versions reject integer bases raised
        # to negative integer exponents.
        'gamma': [2.0 ** n for n in numpy.arange(-9, 2, 1)],
        'C': [2.0 ** n for n in numpy.arange(-2, 9, 1)],
    }
]
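To get a feel for what this grid covers, the short sketch below lists the endpoints of each range and counts the parameter combinations. It uses the built-in range in place of numpy.arange (the exponents are the same) and a float base; the variable names are illustrative.

```python
# Exponents -9..1 for gamma and -2..8 for C, matching the grid above.
gamma_values = [2.0 ** n for n in range(-9, 2)]
c_values = [2.0 ** n for n in range(-2, 9)]

combinations = len(gamma_values) * len(c_values)

print(gamma_values[0], gamma_values[-1])  # 0.001953125 2.0
print(c_values[0], c_values[-1])          # 0.25 256.0
print(combinations)                       # 121
```

Each of the 121 combinations is evaluated on every inner fold for every outer training partition, so widening either range increases the running time quickly.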

References

scikit-learn

A Practical Guide to Support Vector Classification
