Simple NuPIC C++ Spatial Pooler with Sparse Random Data Sets, Simple Classifier, and Random Noise

pluto-skaalhelarsen/ClassifiedNoise

This is another extension of the Numenta NuPIC C++ core library example code. It uses the Spatial Pooler (SP) only, and offers several ways of perturbing the SP and displaying the results. A (very) simple classifier estimates the probability that the final data sequence matches the learned sequence. A simple pseudo-random noise source can optionally add non-zero bits to the final data sequence as directed via a command line adjustable noise level.

The NuPIC SP parameters can be read from a text file, which is optionally named on the command line. By pre-constructing a variety of parameter files, the command line can be used to construct random input data sets of various sizes, probe the learning progress of the SP, and examine the robustness of the resulting network in the presence of added random noise at various levels.

The Hierarchical Temporal Memory (HTM) SP is the topic of a recent paper from a research group at Rochester Institute of Technology:

J. Mnatzaganian, E. Fokoue, and D. Kudithipudi, "A Mathematical Formalization of Hierarchical Temporal Memory Cortical Learning Algorithm's Spatial Pooler," arXiv preprint arXiv:1601.06116, 2016.

Many people interested in the HTM SP will read this paper and may want to run related experiments with the NuPIC parameters. The ClassifiedNoise code provides a convenient way to store many SP configurations and select them from the command line. A brief discussion of the mapping between the parameters discussed in the paper and the NuPIC SP code is in the PDF file RitComment in this directory. PDF was used because the LaTeX typesetting of mathematical symbols and Greek letters is more convenient in that format.

NuPIC is an open source project providing a computer model of algorithms originating from analysis of the neurobiology of the mammalian neocortex. For more information, see numenta.org. Although much of the research at Numenta is done in Python, the core algorithms are available in a C++ implementation. This code links to the NuPIC libraries _algorithms.so and _engine_internal.so available at numenta.org. If you are looking for a simple way to access this development environment, you might peruse the instructions titled "Cheater's Guide..." on this site. My environment is Linux with GNU C++ 4.8 or higher, but others also work. See the numenta.org documents.

The ClassifiedNoise program is not a user application. It is a programmer's tool. Although minimal programming skill is required to compile and run the program and experiment with the HTM concepts, the primary audience is programmers, who will also experiment with it "as is" before hacking at the interesting bits themselves. As such, the code is somewhat oversimplified and obvious, specifically to make it more malleable. If the admittedly primitive noise source is inadequate for your purposes, drop in one of the many rng devices out there, set it up to delete as well as add, and hack away. That's my attitude. I do it, and have many different versions. This is the generic, transparent version.

That said, it is pretty simple, really. There's a makefile named ClassifiedNoise.m (make -f ClassifiedNoise.m) that is set up for the environment described in the Cheater's Guide. That environment is also simple and is essentially a Python virtual env, only I just use it as a way of setting the library path, etc. If you've set that up, there is a tiny script named Setup.sh (source Setup.sh) that activates the virtual env in preparation for the make. The venv also helps the Linux loader find the Numenta shared libraries in a local directory.

Overview of the Program Usage

When ClassifiedNoise starts up, it creates an SP from the parameters in a param file, or from defaults if you didn't name one on the command line. It then loops for a count of Epochs, invoking the SP compute function with learning enabled on each pass. The data source is a buffer of 10000 points generated by a pseudo-random number generator. On each pass the input buffer is filled with a new set of random "bits". If it were free running and all the input data were random, it would create the classic learning chaos in which large changes in the net take place throughout the learning cycle. Maybe this is something you want to watch. It's interesting, but there is another mode that is more structured.

The random input is initialized from a seed number, and if the seed is used to reinitialize the rng while the Epochs loop is running, then the input sequence repeats. This provides a simple way to create input pattern sequences of various sizes. If you re-seed the rng every n passes through the loop, for all the Epochs of the learning cycle, then the SP is presented with the same n random patterns in the same order for that entire cycle. It learns them, and the SP output eventually stabilizes to some repeated sequence of outputs (or does it always...Ohhh, under what conditions?) which can then be used to recognize those patterns. Hence the name.

So, you can set both the Epochs and the RngSequenceCount from the command line. If RngSequenceCount = Epochs, you get one long chaotic learning cycle. If RngSequenceCount is much smaller than Epochs, you get many repeated sequences of the same random patterns. The number of patterns in the set is RngSequenceCount. If you don't really care what the patterns are (random is good), you can experiment with a lot of variations quickly with this technique.
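The re-seeding trick described above can be sketched in a few lines of standard C++. This is only an illustration, not the code in this repository: the function names, the use of std::mt19937, and the `density` parameter (the fraction of set bits per buffer) are all assumptions made for the example, and the real program would call the SP compute step where the comment indicates instead of storing every buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: fill one input buffer with random 0/1 "bits".
// `density` (probability that a bit is set) is an assumption for this sketch.
std::vector<std::uint8_t> nextPattern(std::mt19937& rng, std::size_t size,
                                      double density) {
    std::bernoulli_distribution bit(density);
    std::vector<std::uint8_t> buffer(size);
    for (auto& b : buffer) {
        b = bit(rng) ? 1 : 0;
    }
    return buffer;
}

// Produce the input buffer for every pass of the learning loop, re-seeding
// the generator every `sequenceCount` passes so the same patterns repeat.
std::vector<std::vector<std::uint8_t>> makeInputs(unsigned seed,
                                                  std::size_t epochs,
                                                  std::size_t sequenceCount,
                                                  std::size_t size,
                                                  double density) {
    std::mt19937 rng(seed);
    std::vector<std::vector<std::uint8_t>> inputs;
    inputs.reserve(epochs);
    for (std::size_t epoch = 0; epoch < epochs; ++epoch) {
        if (sequenceCount > 0 && epoch % sequenceCount == 0) {
            rng.seed(seed);  // restart the pattern sequence from the top
        }
        inputs.push_back(nextPattern(rng, size, density));
        // A real run would hand this buffer to the SP compute step here,
        // with learning enabled, rather than keeping it.
    }
    return inputs;
}
```

With sequenceCount equal to epochs the generator is seeded only once, which corresponds to the free-running case; with a smaller sequenceCount, pass k and pass k + sequenceCount receive identical buffers.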

At the end of the learning cycle there is one more pass through the SP compute cycle for each pattern in the set, this time with SP learning disabled. The SP output on each pass is collected as a learned representation of the random pattern set. One final pass over the pattern set is then made, again executing the SP compute step, this time running each SP output through a simple classifier that computes the degree of match between that output and the stored pattern set. That match-ness can be printed as a real number loosely indicating confidence in the best match it finds.
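How the classifier scores a match is not spelled out here, so the following is only one plausible sketch: it assumes the score is the fraction of a stored output's active bits that are also active in the candidate output, and it reports the stored pattern with the highest score. The names `Match`, `overlapScore`, and `classify` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of a classification: index of the best stored pattern and its score.
struct Match {
    std::size_t index;
    double score;
};

// Assumed scoring rule: fraction of the stored output's active bits that are
// also active in the candidate output (1.0 means every stored bit is present).
double overlapScore(const std::vector<std::uint8_t>& stored,
                    const std::vector<std::uint8_t>& candidate) {
    std::size_t active = 0;
    std::size_t shared = 0;
    for (std::size_t i = 0; i < stored.size(); ++i) {
        if (stored[i] == 0) continue;
        ++active;
        if (i < candidate.size() && candidate[i] != 0) ++shared;
    }
    return active == 0 ? 0.0
                       : static_cast<double>(shared) / static_cast<double>(active);
}

// Compare the candidate against every stored output and keep the best score.
// Returns {0, 0.0} when nothing matches or the stored set is empty.
Match classify(const std::vector<std::vector<std::uint8_t>>& storedSet,
               const std::vector<std::uint8_t>& candidate) {
    Match best{0, 0.0};
    for (std::size_t i = 0; i < storedSet.size(); ++i) {
        const double score = overlapScore(storedSet[i], candidate);
        if (score > best.score) {
            best = Match{i, score};
        }
    }
    return best;
}
```

A score normalized by the stored pattern's active count stays in the range 0 to 1, which fits the idea of a number loosely indicating confidence; other normalizations would work just as well for experimentation.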

It is a very interesting exercise to have this learning cycle available to probe around in the running code and adjust the cogs and wheels, but really, that final classifier just compares the output of the final pass to the output of the next to final pass. Not a challenging test, but an opportunity to look at the noise immunity of the SP. So, when the input buffer for the SP is set up on that final pass, random noise is introduced into the input buffer. This can be disabled by setting --NoiseLevel=0 on the command line. If the noise level is greater than zero, that many bits are randomly added to the SP input buffer before it calculates the output used by the classifier.
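An additive noise source of this kind can be sketched as follows. The details are assumptions: this version picks `noiseLevel` uniformly random positions and sets each to 1, so a pick that lands on an already-set bit adds nothing, and the number of newly set bits is at most `noiseLevel`. The names `addNoise` and `countSetBits` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Set `noiseLevel` randomly chosen positions of the input buffer to 1.
// Bits that were already set stay set; nothing is ever cleared.
void addNoise(std::vector<std::uint8_t>& input, std::size_t noiseLevel,
              std::mt19937& rng) {
    if (input.empty()) return;
    std::uniform_int_distribution<std::size_t> pick(0, input.size() - 1);
    for (std::size_t i = 0; i < noiseLevel; ++i) {
        input[pick(rng)] = 1;
    }
}

// Count the non-zero entries, handy for checking how much noise landed.
std::size_t countSetBits(const std::vector<std::uint8_t>& input) {
    std::size_t count = 0;
    for (const auto b : input) {
        if (b != 0) ++count;
    }
    return count;
}
```

To delete bits as well as add them, the same loop could write 0 at some of the chosen positions; that variation is left out to keep the sketch in line with the add-only behavior described above.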

The SP can tolerate quite a bit of noise and still successfully classify a fairly small set. How much noise depends on the size and complexity of the pattern set, so that relationship can be interesting, too. What happens is that the SP output doesn't change at all for low levels of noise, then a few bits here and there do change. You can easily see that happen by setting command line flags to display the final classification pass as a summary showing the locations of set bits, or as a dump of the actual SP output bit pattern itself.

For a more detailed look at the changing SP outputs for each pattern during the learning cycle, you can use the command line flag --display_while_learning. The options available on the command line are only a very basic set. The C++ class that processes them is pretty easy to add to, and the places in the main program where the options are used stand out, since the command line string is named when the value is retrieved. The param file code is very similar, if you want to add parameters that are not strictly part of the SP initialization.

Command Line Usage

There is a text file named Test.par that is a parameter file with the normal default parameters for the Spatial Pooler constructor. Look at the file SpParamFile.cpp in the routine InitTables() to see the names of the fields in the text file (they are the same as the NuPIC variable names) and their local default values. I typically copy and rename several param files and change parameters in each, then pull them into the program using --SpParams=Test.par on the command line.
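The idea of a parameter file layered over built-in defaults can be sketched as below. The layout of Test.par is not described in this text, so the format here (one `name = value` per line, `#` starting a comment) is purely an assumption, as is the function name `readParams`; check SpParamFile.cpp for the real format and field names.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Start from a table of defaults and overwrite any entry named in the stream.
// Assumed format: "name = value" per line, '#' begins a comment.
// Names that are not in the defaults table are ignored.
std::map<std::string, double> readParams(std::istream& in,
                                         std::map<std::string, double> params) {
    std::string line;
    while (std::getline(in, line)) {
        const auto hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // not an assignment line

        std::istringstream nameStream(line.substr(0, eq));
        std::istringstream valueStream(line.substr(eq + 1));
        std::string name;
        double value = 0.0;
        if (!(nameStream >> name) || !(valueStream >> value)) continue;

        const auto it = params.find(name);
        if (it != params.end()) it->second = value;
    }
    return params;
}
```

Passing a std::ifstream opened on a param file works the same way as any other input stream, and keeping the defaults in a table means a file only has to list the parameters it changes.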

The number of compute iterations in the learning cycle is called the learning epochs and is set from the command line as: --Epochs=500 or whatever. A large, complex pattern set might require several thousand iterations.

Different random number generator seeds generate different random pattern sets, so setting the seed on the command line: --RngSeed=6907 creates a repeatable pattern set.

As mentioned above, the input pattern set is just a subsequence of the rng output. The number of input buffers in the set is controlled from the command line with: --RngSequenceCount=6 or whatever. The larger this count, the more random patterns in your set.

A typical usage is:

./ClassifiedNoise --sp_summary --Epochs=500 --RngSeed=6907 --RngSequenceCount=6 --NoiseLevel=200 --display_while_learning --classify
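The flags in that invocation follow two shapes, a bare `--flag` and a `--Name=value` pair. A minimal sketch of handling both is shown here; it is not the option class shipped with this code, and the names `parseArgs` and `getInt` are hypothetical.

```cpp
#include <exception>
#include <map>
#include <string>
#include <vector>

// Split "--Name=value" into (Name, value) and treat a bare "--flag" as
// (flag, "true"). Arguments that do not start with "--" are skipped.
std::map<std::string, std::string> parseArgs(
    const std::vector<std::string>& args) {
    std::map<std::string, std::string> options;
    for (const auto& arg : args) {
        if (arg.rfind("--", 0) != 0) continue;
        const auto eq = arg.find('=');
        if (eq == std::string::npos) {
            options[arg.substr(2)] = "true";
        } else {
            options[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
        }
    }
    return options;
}

// Look up an integer option by name, falling back to a default when the
// option is absent or its value is not a number.
long getInt(const std::map<std::string, std::string>& options,
            const std::string& name, long fallback) {
    const auto it = options.find(name);
    if (it == options.end()) return fallback;
    try {
        return std::stol(it->second);
    } catch (const std::exception&) {
        return fallback;
    }
}
```

In a main function the argument list could be built with `std::vector<std::string>(argv + 1, argv + argc)`, after which a call such as `getInt(options, "Epochs", 500)` names the command line string at the point of use.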


Sci-Fi Movie Smoking Gun

We are looking at the possibility of an unsupervised learning machine that can, in the presence of noise, identify a pattern, but then it'll have to shoot you. That pattern is classified.

