ltm: Little Tree Matcher
A versatile tree-based regex engine.

By quietfanatic, under the terms of the Artistic License.

Under Construction. :)

This document is more of a self-reference than a guide for (non-existent)
users.  Several things will be (and already are) out of date.





Okay, here's what's going on.

To start matching, you need an MSpec tree.  Use the Create_<nodetype>()
functions to make one.  Use the destroy_mspec() function to recursively free
any MSpec tree.

Each MSpec object has a particular type, such as a character, or group, or
alternation.  When matched to a string, the MSpec node will produce a Match
node, usually of the same type as the MSpec node but not always.  Most
notably, a node that fails to match will usually produce a NoMatch node.


Nodes are generally named mixed case, but to produce the integer id of a node
type, use the name in all caps.  For example, MCharClass describes a character
class node, but MCHARCLASS is the id of an MCharClass.
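
Here's a rough sketch of the create/destroy pattern and the naming convention
described above.  It is not ltm's actual code: the Toy* names, the struct
layout, and the constructor signatures are all made up for illustration.  Only
the ideas come from the text: one constructor per node type, an all-caps
integer id per type, and a destroy function that recursively frees the tree.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical type ids: the all-caps spelling of each node name. */
enum { TOY_MCHAR, TOY_MGROUP };

typedef struct ToySpec {
    int type;                  /* TOY_MCHAR or TOY_MGROUP */
    char c;                    /* used by TOY_MCHAR */
    int nelements;             /* used by TOY_MGROUP */
    struct ToySpec **elements; /* owned children, used by TOY_MGROUP */
} ToySpec;

static int toy_live_nodes = 0; /* allocation counter, for demonstration only */

static ToySpec *toy_alloc(int type) {
    ToySpec *s = calloc(1, sizeof *s);
    assert(s != NULL);
    s->type = type;
    toy_live_nodes++;
    return s;
}

/* Stand-in for a Create_<nodetype>() function taking one character. */
ToySpec *toy_create_MChar(char c) {
    ToySpec *s = toy_alloc(TOY_MCHAR);
    s->c = c;
    return s;
}

/* Stand-in for a variadic constructor: a count, then that many children. */
ToySpec *toy_create_MGroup(int nelements, ...) {
    ToySpec *s = toy_alloc(TOY_MGROUP);
    va_list ap;
    s->nelements = nelements;
    s->elements = calloc(nelements > 0 ? (size_t)nelements : 1,
                         sizeof *s->elements);
    assert(s->elements != NULL);
    va_start(ap, nelements);
    for (int i = 0; i < nelements; i++)
        s->elements[i] = va_arg(ap, ToySpec *);
    va_end(ap);
    return s;
}

/* Recursively frees a node and everything it owns. */
void toy_destroy_mspec(ToySpec *s) {
    if (s == NULL) return;
    if (s->type == TOY_MGROUP) {
        for (int i = 0; i < s->nelements; i++)
            toy_destroy_mspec(s->elements[i]);
        free(s->elements);
    }
    free(s);
    toy_live_nodes--;
}
```

With these stand-ins, toy_create_MGroup(2, toy_create_MChar('a'),
toy_create_MChar('b')) builds a three-node tree, and a single
toy_destroy_mspec() call on the root frees all of it.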



Here are the node types, with any required arguments to their constructor:


NoMatch
	The only match type not beginning with M.  A Match of this type indicates
	failure to match, and an MSpec of this type will never succeed in matching.

MNull
	This is a zero-width match.  It always succeeds in matching.

MBegin
	Zero-width.  Matches at the beginning of the string.

MEnd
	Zero-width.  Matches at the end of the string.

MAny
	Matches any one character.  The only time it'll ever fail is at the end of
	the string.

MChar (MChar_t c)
	Matches one character if it's equal to c.

MCharClass (int nranges, MSpecCharRange ranges...)
	Matches one character in a class.  The class is specified by multiple
	MSpecCharRange structs, each of which has a MChar_t .from and MChar_t .to.
	A range with the same .from and .to will match only that character.
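
A minimal sketch of how a range list like this can be tested against one
character.  ToyChar and ToyCharRange are stand-ins for MChar_t and
MSpecCharRange, whose real definitions aren't shown here; the ranges are
treated as inclusive on both ends, which is what makes a range with equal
.from and .to match exactly one character.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int ToyChar;                      /* stand-in for MChar_t */
typedef struct { ToyChar from, to; } ToyCharRange; /* mirrors .from and .to */

/* Returns 1 if c falls inside any of the inclusive ranges, else 0. */
int toy_class_contains(const ToyCharRange *ranges, int nranges, ToyChar c) {
    assert(ranges != NULL || nranges == 0);
    for (int i = 0; i < nranges; i++)
        if (ranges[i].from <= c && c <= ranges[i].to)
            return 1;
    return 0;
}
```

For example, the three ranges {'a','z'}, {'0','9'}, {'_','_'} accept lowercase
letters, digits, and the underscore, and nothing else.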

MGroup (int nelements, MSpec elements...)
	Matches its elements in a sequence, one after the other.  If one fails,
	it'll backtrack through the ones before it to see if there are any other
	ways they can match.

MOpt (MSpec possible)
	An optional match.  Matches possible if possible, otherwise produces MNull
	rather than failing.

MAlt (int nalts, MSpec alts...)
	Goes through alts in order and matches the first one that matches.
	Backtracking will make it try the next one, and so forth.
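
The interplay between MGroup and MAlt can be sketched with a toy model: a group
made of an alternation followed by a literal tail.  This is not ltm's matcher,
just an illustration of ordered alternatives plus backtracking; the function
names and the use of plain C strings are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if s begins with prefix, else 0. */
static int toy_starts_with(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Toy model of a two-element group: an alternation followed by a literal tail.
 * Alternatives are tried in order.  If the tail fails after an alternative
 * matched, we backtrack and try the next alternative.
 * Returns the index of the alternative finally used, or -1 if none works.
 */
int toy_alt_then_tail(const char *s, const char *const *alts, int nalts,
                      const char *tail) {
    assert(s != NULL && alts != NULL && tail != NULL);
    for (int i = 0; i < nalts; i++) {
        if (!toy_starts_with(s, alts[i]))
            continue;               /* this alternative fails outright */
        if (toy_starts_with(s + strlen(alts[i]), tail))
            return i;               /* the whole group matched */
        /* tail failed: loop on, i.e. backtrack into the alternation */
    }
    return -1;
}
```

With the alternatives "ab" and "a" against the input "abc", a tail of "c"
is satisfied by the first alternative, while a tail of "bc" forces a backtrack
into the second one.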

MRepMax (MSpec child, int min, int max)
	Matches child at least min times, and at most max times.  Prefers to match
	more rather than less, but will match less if backtracked into.

MRepMin (MSpec child, int min, int max)
	Same as MRepMax except it matches as little as possible, matching more only
	if backtracked into.  This and MRepMax will be consolidated, with a flag to
	differentiate them.
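
The difference between MRepMax and MRepMin only shows up when something after
the repetition has to match too.  Here's a toy model of that, assuming a
repetition of a single character followed by a literal tail; the function and
its greedy flag are hypothetical and not part of ltm.

```c
#include <assert.h>
#include <string.h>

/*
 * Toy model of "repeat c between min and max times, then match tail".
 * greedy != 0 mimics MRepMax: start with as many repetitions as possible and
 * give one back each time the tail fails.
 * greedy == 0 mimics MRepMin: start with min repetitions and take one more
 * each time the tail fails.
 * Returns the repetition count that let the tail match, or -1 on failure.
 */
int toy_rep_then_tail(const char *s, char c, int min, int max,
                      const char *tail, int greedy) {
    assert(s != NULL && tail != NULL && 0 <= min && min <= max);

    /* Count how many repetitions are available, capped at max. */
    int avail = 0;
    while (avail < max && s[avail] != '\0' && s[avail] == c)
        avail++;
    if (avail < min)
        return -1;

    size_t tail_len = strlen(tail);
    if (greedy) {
        for (int n = avail; n >= min; n--)      /* most first */
            if (strncmp(s + n, tail, tail_len) == 0)
                return n;
    } else {
        for (int n = min; n <= avail; n++)      /* least first */
            if (strncmp(s + n, tail, tail_len) == 0)
                return n;
    }
    return -1;
}
```

On the input "aaaa" with min 0, max 4, and a tail of "a", the greedy variant
settles on three repetitions (it has to give one back for the tail), while the
minimal variant settles on zero.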

MScope (MSpec child)
	Creates a lexical scope for captures.  Any MCap and MNameCap nodes
	that are matched inside this will be directly accessible from here (by
	number or by name, respectively).  The MSpec object will know how many
	captures it contains, and captures that are not matched will be NULL.
	This node will likely be placed automatically by whatever regex language
	you're running.  If the MScope is not the outermost node or is not in a
	capture node, it may be difficult to access.

MCap (MSpec child)
	Captures the child match for quick access.  It'll be numbered by its
	position in the tree of the MScope that contains it.  Since a capture can
	be matched multiple times (due to being in a repetition), the MScope will
	see an array of captures.  A capture not in a scope will generate an error.

MNameCap (MSpec child, char* name)
	Captures the child match, but assigns it a name instead of a number.
	Multiple captures with the same name will be arrayed together.
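
To picture why a scope sees an array for a capture inside a repetition, here's
a toy model of "repeat (capture one digit)".  The ToySpan and ToyCaptureList
types are invented for this sketch, and an unmatched capture is represented by
a count of zero rather than by NULL as in ltm.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CAPS 16

typedef struct { size_t start, len; } ToySpan;  /* one captured substring */
typedef struct {
    ToySpan spans[TOY_MAX_CAPS];
    int count;                  /* 0 means the capture never matched */
} ToyCaptureList;

/*
 * Toy model of a scope around "repeat (capture one digit)".
 * Every repetition appends one span to the same capture slot, so the scope
 * ends up with an array rather than a single match.
 * Returns the number of characters consumed.
 */
size_t toy_capture_digits(const char *s, ToyCaptureList *caps) {
    assert(s != NULL && caps != NULL);
    caps->count = 0;
    size_t pos = 0;
    while (s[pos] >= '0' && s[pos] <= '9' && caps->count < TOY_MAX_CAPS) {
        caps->spans[caps->count].start = pos;
        caps->spans[caps->count].len = 1;
        caps->count++;
        pos++;
    }
    return pos;
}
```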

MMultiCap
	This is a special node for storing the results of captures matched multiple
	times.  You can't create it yourself.

MRef
	Pointer to another MSpec object.  This won't be descended into during
	destruction.  This allows for recursion and referencing other rules without
	problems.  For versatility this is another node type rather than a flag.
	It won't produce its own Match node; it'll just collapse instead.
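
The reason a reference must not be descended into during destruction is easiest
to see with a rule that points back at itself.  The sketch below uses invented
Toy* types, not ltm's structs.  As a pattern the rule has no base case, so only
the ownership structure matters here.

```c
#include <assert.h>
#include <stdlib.h>

enum { TOY_MCHAR, TOY_MGROUP, TOY_MREF };   /* hypothetical type ids */

typedef struct ToySpec {
    int type;
    char c;                      /* TOY_MCHAR */
    int nelements;               /* TOY_MGROUP */
    struct ToySpec **elements;   /* TOY_MGROUP: owned children */
    struct ToySpec *target;      /* TOY_MREF: borrowed, never freed from here */
} ToySpec;

static int toy_freed = 0;        /* counts freed nodes, for demonstration only */

static ToySpec *toy_new(int type) {
    ToySpec *s = calloc(1, sizeof *s);
    assert(s != NULL);
    s->type = type;
    return s;
}

/* Builds the self-referential rule  rule := 'a' rule  as a two-element group. */
ToySpec *toy_build_recursive_rule(void) {
    ToySpec *rule = toy_new(TOY_MGROUP);
    rule->nelements = 2;
    rule->elements = calloc(2, sizeof *rule->elements);
    assert(rule->elements != NULL);
    rule->elements[0] = toy_new(TOY_MCHAR);
    rule->elements[0]->c = 'a';
    rule->elements[1] = toy_new(TOY_MREF);
    rule->elements[1]->target = rule;    /* cycle back to the root */
    return rule;
}

/* Frees owned children recursively but never follows a reference. */
void toy_destroy(ToySpec *s) {
    if (s == NULL) return;
    if (s->type == TOY_MGROUP) {
        for (int i = 0; i < s->nelements; i++)
            toy_destroy(s->elements[i]);
        free(s->elements);
    }
    /* TOY_MREF: free the reference node itself, leave its target alone. */
    free(s);
    toy_freed++;
}
```

If toy_destroy() followed target pointers, destroying this rule would recurse
forever (or free the root twice).  Skipping them frees exactly three nodes:
the character, the reference, and the group.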

...UNIMPLEMENTED...

MObject (MObjectSpec* object)
	This is an extensible type for your own matching needs.  Uses a struct
	filled with function pointers implementing matching behavior.  Details TBD.
