Skip to content

exodist/GSD_Dictionary

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

= GSD Dictionary =

GSD Dictionary is a dictionary type written in C. GSD Dictionary is designed
for use by the GSD programming language.

== Version ==

Current version is 1.0.0.

The left-most number is the major version, this only bumps up for
reverse-incompatible changes.

The middle number is the minor version, this bumps up for new features that do
not break reverse-compatibility.

The final number is the revision, it is bumped for bigfixes and enhancements
that do not change or add features.

== Features ===

 * Designed to be highly concurrent:
   * Any number of threads can read and write.
   * Transactional model for some operations:
     - insert: Will fail if key already exists (or if another thread inserted
       it first)
     - update: Will fail if key does not already exist
     - delete: Will fail if key does not exist
     - dereference: Will fail if there is no reference to remove
     - cmp_update: Update value with an atomic compare and swap only if
       the current value is what we expect
     - cmp_delete: Delete value with an atomic compare and swap only if
       the current value is what we expect
   * 'update', 'delete', and 'get' operations will never lock, block, or spin.
   * Optimistic model for insert and removal operations:
     - An insert is usually simple, but sometimes they require some internal
       structures to be rebuilt. An insert may spin if an internal structure is
       being rebuilt in another thread.
     - A key removal (dereference) can result in a partial rebuild.
     - You can control when rebuilds are triggered via max_imbalance.
 * Allows for graceful handling of pathological data:
   * When an operation results in a pathological tree, the operation still
     succeeds, but a DICT_PATHO_ERROR is returned to notify you.
   * The dictionary can be completely rebuilt with a new configuration in a
     thread-safe way.
     - The dictionary stores a 'meta' void pointer for your use. You can use
       this to store a hash seed for example. On pathological data you can
       rebuild with a new hash seed.
     - You can use the rebuild to dynamically adjust the number of hash slots.
     - You can use the rebuild to adjust the max_imbalance
 * Internal structure is a hash table, each table slot becomes a binary tree if
   multiple keys resolve to the same slot (hash conflict).
 * You specify how deep a tree imbalance should be before it is balanced
 * An imbalance in one tree only rebalances that tree, other hash slots are not
   effected.
 * Can use anything as keys and values
   * You specify how to compare keys (for sorting, for the trees)
   * You specify how to get a slot number (hash key) for a key
   * Can provide a callback for reference counting on keys and values.
   * 'meta' void pointer is passed to your compare and hashing functions
 * You can link together keys within a dictionary, or even across multiple
   dictionaries. This means that updating a key in one dictionary can also
   update other keys in the same dictionary, or even other dictionaries.
 * Internal structures are garbage collected using a thread-safe combination of
   reference counting and a custom designed 'epoch' system.
 * The internal structure of the hash can be dumped to the graphviz DOT format
   for visualization (up to a certain size of dictionary).

== Overview ==

GSD Dictionary is a dictrionary library. It works as an associative array.
Under the hood a dictionary is a hash table that degrades to a binary tree when
there is a hash conflict. You can use anything you want as keys and values, and
you specify the hashing and compare functions.

== Epoch Garbage Collector ==

Terminology:
 - Dispose: When we dispose an structure, no operations in any thread will be
   able to reach that structure moving forward. It is important to note that
   operations in any thread started before the disposal may still have access
   to the structure.
 - Free: This refers to use of free()
 - Epoch: Epochs are sequential counters of active operations that also track
   disposals.

=== Internal Structures ===

All internal structures that can be referenced in more than 1 place are
reference counted. Currently there are only 2 such structures, and both are
related to storing values. These are only disposed of when their ref count
reaches 0. Once a ref count reaches 0 nothing is allowed to bring it back up
above 0. If an operation encounters a situation where it is asked to add a
reference to a structure with a 0 ref count it will return a transaction error.

Every internal structure has a specific level. At this time it looks like this:

    Dictionary->Set->Slot->[Tree Nodes]-+------------------>xtrn->{key}
                                        +->(USRef)->(SRef)->xtrn->{value}

    [] Tree nodes can be nested within eachother, but are never cyclical.
    () USRef and SRef are reference counted

As the operation delves into the dictionary, it always copies the pointer to
the item into which it is about to descend, from that point forward it always
uses its own copy of the pointer. This means if another operation 'disposes' of
that pointer and replaces it with a new one, the operation will continue to use
the original one it had.

=== Operations ===

Every operation in the API (see gsd_dict_api.h) will start by joining the
current epoch. When the api call completes it will finish by leaving the epoch
that it joined (which may no longer be current). Joining an epoch means
incrimenting the 'active' counter within that epoch, leaving means decrementing
it. Changing the 'active' counter is done with an atomic operation.

=== Disposal ===

When an operation decides a structure is no longer needed, it will use an
atomic swap to update the pointer that references it, once this is done no new
operation will be able to reach the structure. The structure is then disposed
of. Disposal of a structure adds a pointer for that structure to an epoch.

=== When is the garbage freed? ===

The 'active' counter of an epoch has 2 special values. '0' means the epoch is
completely free. '1' means the epoch has ended, no new operations can
join it. When an operations joins an epoch at '0' it will set it to '2'
(atomic). If an operations exit from an epoch results in an 'active' count of
'1', that operation knows that it is the last thread in the epoch, and that no
other threads can join, it is now safe to free any garbage within the epoch.

=== How do we ensure that an epoch eventually ends? ==

No operations on the dictionary can run forever. If an operation was defined
that did run forever, it would have to be written to release and re-aquire
epochs.

=== What triggers a disposal? ===

A disposal will occur any time a tree needs to be rebalanced. Rebalances can
occur for any of the following operations:

 * Dereferencing a key
 * Inserting a key
 * reconfiguring the dictionary
 * Referencing will insert up to 2 keys if they don't already exist, it also
   will usually result in an sref being disposed. This means as many as 3
   epochs may be needed for a reference operation, which means 4 epochs will be
   needed in total.

Disposal also sometimes occurs when you dereference or delete keys.

About

Threaded 'Dictionary' type in C for the GSD Programming Language

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published