
polorumpus/dClass

 
 

dClass - Device Classification Engine


HOWTO


To compile the test client, run make in the src directory.

To build with Varnish and nginx, see the varnish and nginx subdirectories.

To integrate with the dClass API:

  -include the dclass header file:
    #include "dclass_client.h"

  -define a dclass_index:
    dclass_index dci;

  -populate the index using a dtree file or OpenDDR resource file:
    dclass_load_file(&dci,"/path/to/file.dtree");
    -OR-
    openddr_load_resources(&dci,"/path/to/openddr/resources");

  -classify a string against the index and get the resulting kv data:
    dclass_keyvalue *kv=dclass_classify(&dci,"this is a string");
    char *id=kv->id;
    char *field_xyz=dclass_get_kvalue(kv,"xyz");

  -free the index:
    dclass_free(&dci);


AUTHORS


Reza Naghibi (rnaghibi@weather.com)
Joe Pearson (jpearson@weather.com)
Anthony Watson (awatson@weather.com)
Eric Honer (ehoner@weather.com)
Luke Kolin (lkolin@weather.com)
Ivan Kozhuharov (ikozhuharov@weather.com)
Chris Hill (chill@weather.com)
Chris McClellen (cmcclellen@weather.com)

OpenDDR (http://openddr.org/)


NOTES


The goal of this project is to classify internet browser user-agent strings
quickly and accurately. It was built from the ground up to do so.

Please reference test.dtree in the dtrees subdirectory for detailed pattern
notes.

Performance depends heavily on the conciseness of the patterns, not on the
number of patterns. On a 2.2 GHz Intel i7, the average time spent in
dclass_classify() using the OpenDDR pattern set and a random set of user agents
is 800 ns. Out of the box, memory usage for the OpenDDR pattern set is about
3 MB. This value can go up or down depending on the key fields loaded from
OpenDDR (OPENDDR_KEYS) and various index settings.
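
An average per-call time like the one quoted above can be measured with a
simple timing loop. The following is a minimal sketch, not the harness used by
the authors. classify_fn, average_ns and stub_classify are hypothetical names
that are not part of dClass; in a real measurement the callback would wrap
dclass_classify() with a loaded index.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: takes one user-agent string and returns an
   opaque result. Not part of the dClass API. */
typedef const void *(*classify_fn)(const char *user_agent);

/* Average CPU time per call, in nanoseconds, over `rounds` passes across
   `count` strings. clock() has coarse resolution, so use enough rounds for
   the total run time to be well above one clock tick. */
static double average_ns(classify_fn fn, const char *const *agents,
                         size_t count, size_t rounds)
{
    const void *volatile sink = NULL; /* keeps the calls from being dropped */
    clock_t start, end;
    size_t r, i;

    if (fn == NULL || agents == NULL || count == 0 || rounds == 0)
        return 0.0;

    start = clock();
    for (r = 0; r < rounds; r++)
        for (i = 0; i < count; i++)
            sink = fn(agents[i]);
    end = clock();
    (void)sink;

    return ((double)(end - start) / CLOCKS_PER_SEC) * 1e9
           / ((double)count * (double)rounds);
}

/* Stand-in classifier so the sketch compiles on its own. */
static const void *stub_classify(const char *user_agent)
{
    return user_agent;
}
```

Calling average_ns(stub_classify, agents, n, rounds) with an array of n
user-agent strings returns the mean time per call. Results will vary with the
hardware, the pattern set and the input strings.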

All US-ASCII alphanumeric characters are pattern searchable. Non-alphanumeric
pattern searchable characters are defined in DTREE_HASH_SCHARS. Add or remove
characters as needed. Indexed US-ASCII printable characters (0x20 through 0x7E)
which aren't pattern searchable are replaced with DTREE_PATTERN_ANY and can
match on any character. All pattern matching is US-ASCII case insensitive.
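
The character rules above can be sketched as a per-character normalization.
This is an illustration only and not the dClass implementation. SKETCH_SCHARS
and SKETCH_ANY are hypothetical stand-ins for DTREE_HASH_SCHARS and
DTREE_PATTERN_ANY, and their values here are assumptions. How characters
outside the printable range are handled is not stated above, so this sketch
simply drops them.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins: the real values are defined by DTREE_HASH_SCHARS
   and DTREE_PATTERN_ANY and may differ from these. */
#define SKETCH_SCHARS "-_./"
#define SKETCH_ANY    '?'

/* Map one input character to the form it would take in the index:
   - ASCII letters are lowercased (matching is case insensitive)
   - ASCII digits and the extra searchable characters are kept
   - other printable ASCII (0x20 through 0x7E) becomes the wildcard
   - anything else returns 0, an assumption made by this sketch */
static char sketch_normalize(char c)
{
    unsigned char u = (unsigned char)c;

    if (u >= 'A' && u <= 'Z')
        return (char)(u - 'A' + 'a');
    if ((u >= 'a' && u <= 'z') || (u >= '0' && u <= '9'))
        return (char)u;
    if (u != '\0' && strchr(SKETCH_SCHARS, u) != NULL)
        return (char)u;
    if (u >= 0x20 && u <= 0x7E)
        return SKETCH_ANY;
    return 0;
}

/* Normalize a whole string into `out` (capacity `cap`, including the
   terminating NUL). Characters that map to 0 are skipped. Returns the
   number of characters written, not counting the NUL. */
static size_t sketch_normalize_str(const char *in, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return 0;
    for (; *in != '\0' && n + 1 < cap; in++) {
        char c = sketch_normalize(*in);
        if (c != 0)
            out[n++] = c;
    }
    out[n] = '\0';
    return n;
}
```

With these stand-in values, "Mozilla/5.0 (X11)" normalizes to
"mozilla/5.0??x11?": letters are lowercased, '/' and '.' are kept, and the
space and parentheses become the wildcard.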

Write operations on the index are not thread-safe. Read operations are
thread-safe (with at most one writer) and take the dclass index parameter as
'const'. The index is designed for multiple readers and writers; however, such
a feature is currently outside the foreseeable scope of this project.
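
One way for an application to respect the single-writer rule is to route every
write through a shared mutex while leaving reads unlocked. The following is a
minimal sketch under that reading of the note above. guarded_index,
index_writer, guarded_write and demo_add are hypothetical names, not part of
dClass, and the void pointer stands in for a dclass_index.

```c
#include <pthread.h>

/* Hypothetical wrapper: one mutex shared by every thread that modifies the
   index. Readers do not take it. */
typedef struct {
    pthread_mutex_t write_lock;
    void *index; /* would point at a dclass_index in real use */
} guarded_index;

/* Hypothetical write operation, e.g. a wrapper around a load call. */
typedef int (*index_writer)(void *index, void *arg);

/* Run one write operation while holding the writer lock, so that at most
   one writer touches the index at a time. Returns the writer's result, or
   -1 if the lock could not be taken. */
static int guarded_write(guarded_index *g, index_writer fn, void *arg)
{
    int rc;

    if (pthread_mutex_lock(&g->write_lock) != 0)
        return -1;
    rc = fn(g->index, arg);
    pthread_mutex_unlock(&g->write_lock);
    return rc;
}

/* Stand-in writer so the sketch compiles on its own: adds *arg to an int. */
static int demo_add(void *index, void *arg)
{
    *(int *)index += *(int *)arg;
    return 0;
}
```

Loading all patterns before starting any reader threads avoids the need for a
lock entirely, which may be the simpler choice when the index never changes at
run time.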

Memory limits are tightly bounded. The default configuration allows for 65k
search nodes and 8 MB of general memory. Adjusting DTREE_DT_PACKED* allows for
more search nodes, and increasing DTREE_M_MAX_SLABS allows for more general
use memory.
