This is a rough draft and a work in progress, if/when I get through the
first pass then I will go back and prune and tweak and re-write.

This is my implementation of the Ridiculously Simple Computer.  The
instruction set is defined here:

I don't know how many times I have said this but I think there is a lot
that can be taught and learned using this architecture.  Assembly language,
machine language, instruction set simulator, watching a program run inside
a processor, whatever else I can think of...

A prerequisite for understanding and using this material is a working
knowledge of the C programming language.  If you have no programming
experience at all, get some first, it is definitely required.  I am not
teaching you programming concepts, just another language.  Perhaps start
with Learn Python the Hard Way by Zed A. Shaw.  It isn't really hard or
the hard way, the key to it is that you actually learn to program by
typing the programs in first, then working out the typos and mistakes
you made doing it, which we all do.  Then when you have actually typed
it in right, and run it to match the output in the book, you are told
what the program does and why.  You are spoon fed through a path of
learning the basic concepts of the language into more complicated
things.  Other books will lead you along the same path, but they expect
you to invent the programs yourself.  Learn Python is his well known
book but it appears he has a Learn C book as well in some state of
completion.

Now that you know how to program in C we can move forward.

First, some terminology.  Some of these terms have different definitions
depending on who you are talking to, which can make it difficult at times.

Instruction Set Architecture.  Sometimes abbreviated ISA (there are other
things called ISA, so don't get them confused), and sometimes just called
the instruction set.  It is the set of instructions that a processor
knows how to execute.  These instructions are also called machine
language or machine code.  The processor reads bits/bytes from memory,
and each instruction has a specific bit pattern, also called an opcode,
that tells the processor what you want it to do.  When you place
specific instructions in a specific order you can get the processor to
do complicated things using sequences of simple instructions.

Machine language or machine code.  This is your program, sequences of
processor instructions, in binary form.  A bunch of bits that are hard
for humans to read but easy for the processor.

Assembly language.  Assembly language is the programming language that
you use to create the machine code.  Like other programming languages
it is text, written using a text editor.  Unlike other programming
languages though, each instruction set can and often does have its own
assembly language, different from that of any other instruction set.  ARM
assembly language, although similar in some respects, is different from
x86 assembly language or mips assembly language, etc.  All three have
an add instruction for example, and the word "add" is in the instruction,
but the syntax can and will vary from one processor family to another.
Since the machine code is the goal, and assembly language is just a way
for us to program at this low level in a human readable/writeable form
it does happen from time to time that an assembly language may change
or there may be more than one assembly language for an instruction set.
Usually the company or individual that creates the processor creates
the instruction set at the same time, which means they define the
machine code, each instruction, for that processor.  They tend to also
create the first assembly language definition for that processor.  The
document used to describe the instruction set will often contain both
the assembly language (words like and and xor and things like that, along
with register names, etc) and the bit definitions for the machine
code.  Some companies are better at this than others.  Also
in order to sell these chips they often create or have someone create
an assembler (see below) for this instruction set.  But, so long as
you create the right machine code and follow the rules for the processor
there is no reason why you can't make your own assembly language for
a particular processor.  x86 is a very well known example.  Intel created
the processor.  Microsoft's assembler, masm, was at first the one that
most people used, no doubt Intel had their own, but I bet it was
pricey.  Borland also had an assembler, turbo assembler.  Microsoft's
masm and Borland's turbo assembler used very similar but not identical
syntax.  Today the gnu tools dominate, and the gnu folks completely
messed up the intel x86 assembly language.  Their version is known as
at&t syntax, where the classic x86 assembly language is known as intel
syntax.  nasm is a popular assembler that honors the real intel assembly
language, where gnu as defaults to using at&t.

Assembler.  This word is a bit tricky, I like to try to use this word
when I am talking about the program that takes assembly language and
converts it to machine code.  Like a compiler, but the term compiler
is for programming languages that are higher level, not a one to one
relationship between the line of code or operation and the machine
code instruction.  Where the confusion comes in is that the word assembler
is also often used when referring to the assembly language itself.  So
when you hear folks say "you might have to look at the assembler for that
program" or "some of that was written in assembler", they mean the
programming language, assembly language, not the program that reads and
converts assembly language to machine code.  Now just like compilers,
for example C compilers, may have compiler specific directives that you
can put in the code, assembly language has language items that are
specific to the assembler to make the programming job easier or to
be more exact about what you want the assembler to do.

Macro.  You may from time to time come across the words macro assembler.
All that means is that the assembler, and the assembly language it
accepts, provides a mechanism to make macros.  These are very similar to
the macros you find in the C programming language (assembly language came
first naturally, but if you are reading this you learned C first).
Macros, as with C, have a function like feel but are inline.  Using
macros in your assembly language you can create instruction sequences
that you may wish to repeat in your code and 1) not have to type them
every time and 2) if you want to change the sequence you don't have to
change it in all the places you used it, you change it in one place.

Instruction set simulator.  This is generally a program, software,
that takes the bits and bytes of machine code for a particular processor
and, like that processor, extracts the machine code instructions and
executes them.  Using variables or arrays or whatever, it pretends
to have the same registers as the real processor and executes the
instructions like the real processor, but the instruction set simulator
software may not actually be written for or compiled to run on the same
processor that it is simulating.  This will make more sense when we get
to it.  There are two main reasons for instruction set simulators.  One
is to allow you to run programs that have been compiled for that
processor, or more specifically for a platform, for example a Gameboy
Advance simulator or a PlayStation simulator, etc.  Think about or go
look at the mame project, that project contains many different
instruction set simulators and the goal when writing those simulators
was to run fast.  Before our desktop and laptop computers were as fast
as they are today you wanted to play these games and have them run the
same speed as the arcade.  The other is virtual machines.  On Linux you
can run Windows on qemu for example, or you can run a Linux compiled for
the ARM instruction set on an x86 computer using a qemu for the ARM
instruction set.  Now on a tangent, you may find that many virtual
machines don't simulate every instruction but instead use features of
the instruction set to let a fair percentage of the instructions run on
the real processor for that machine.  Then when the virtual machine,
say, wants to send a network packet and talks to what it thinks is the
network card, the emulator comes in and pretends to be a network card
such that the operating system on the virtual machine doesn't know the
difference.  An instruction set simulator strives to resemble the
hardware to the point that the program doesn't know that it is not the
real hardware.  How does a program "know" something like this?  It
doesn't.  What I mean by that is the program will crash or otherwise
not run correctly if the instruction set simulator is not close enough
to the real thing.

Enough terminology for now.  In the processor world there are times more
in the past than in the present where the core of the processor is
designed (the part that actually parses and runs the machine code)
one time and that is the processor everyone uses for that instruction
set, no variations.  Take the 6502 for example, at least originally,
my understanding is you created those processors by creating all of the
silicon and metal layers in the chip by hand, all of the polygons on
a drafting table.  Very much like the way a coin is created by an
artist at a much larger scale than the actual coin, and from that model
of the coin a machine is used to create the master dies that are used
to stamp the coins at the proper size.  The old processors you made
by hand what was essentially the master print or negative used in the
photographic like process used to make a processor.  With that much work
involved if you then wanted to take the generic 6502 processor and
then create the processor in the Vic-20 or in the Commodore 64 you
would probably just re-use the same drawings and add more stuff
around it and not actually re-invent a clone of that original processor.

We do nothing of the sort today.  Hardware, logic, is designed using
programming languages very similar to and inspired by the software
programming languages we use today.  To the point that there are
the equivalent of compilers and linkers that compile that high level
language down into assembly like fundamental logic gates or blocks
and then different modules are glued together very much the way a
linker glues together objects created by a compiler.  Just like the
C programming language can be compiled down into any number of
different assembly languages, the same source into many different
implementations.  The hardware design languages can be compiled down
into different mixtures of logic blocks depending on the target.  First
off you have programmable logic like cplds and fpgas, these are chips
that are filled with various fundamental logic blocks and arranged with
a large network of interconnects, what makes them programmable is the
interconnects can be changed temporarily or permanently.  If I want
to xor two bits together and have the result feed into some other
logic block then the tools for that target know how to take the high
level hardware description language and describe the connections between
logic blocks.  Then it is a matter of having a utility program all of
those little connections into the programmable device.

The chips we most often see today are not programmable logic, they are
built from the ground up to serve a specific purpose, to implement a
single design.  If you were to build a car from scratch you are either
going to have to invent every little thing yourself, an engine, the
pistons in the engine the header, cam, lifters, etc.  Or you might just
buy an existing engine built by someone, and design your frame so that
engine fits in it.  Likewise the transmission the drive shaft the
gearboxes, brakes, etc.  So if you are a company like intel that builds
the machinery and factories housing that machinery from the ground up to
make chips, well you are basically designing your own engines and
transmissions to go in your car.  But most companies hire an intel
or some other chip maker to make the chips for them.  Each chip maker
has developed a cell library, a collection of different logic blocks
very much like machine code.  The hardware description language is then
converted into basically lists of connections between the inputs and
outputs of these logic blocks.  And just like we have assembly language
to represent machine code, there are human readable ways to represent
the cell library items for a particular foundry (chip factory).

Just like we don't have to create newspapers by taking, by hand, little
metal letters and arranging them in rows of words to stamp the page,
we use computers to create the light and dark spots on a master that
becomes the letters on a printed page, we use computers to arrange
the massive rats nest of wires and logic blocks on a chip, just like
a professionally made magazine or other printed material, sometimes
some hand tweaking is required to make the thing perfect.

So what was all of that about?  What that was all about is that many
of the processors you write programs for today are 1) created using
a programming language 2) that language is compiled down to create
a chip using the generation of technology available at the foundries
at the time this chip was made (and remain available so long as that
chip is still profitable enough to stay in production). 3) (here it comes)
if that processor is successful enough to warrant new processors then
either using the same hardware source code or the same code with
features added (new instructions, bugs fixed, etc) may be used to
create the new generation of that processor or 4) enough changes are
made or completely new source code is created such that it is similar
enough to the prior processor to execute most of the same instructions but
perhaps does it in a more efficient or different way for some reason
(speed, power, etc).  5) those new processors made a year or a few years
later may be implemented using newer generations of chip technology
making them perform faster for example.  Put all of that together and
just like the evolution of a popular program like Microsoft Word,
which may open documents saved by prior versions but also has new stuff
that is not backward compatible, you will find most popular instruction
sets evolve: new instructions, new features added to old instructions,
old undesirable instructions or features possibly removed.  And the
assembly language and assemblers have to choose to either evolve to
handle everything from the old to the new, or choose to draw a line
in the sand where one side is one tool and the other side is another.

When you learn an assembly language that you didn't know before, as often
as not, you are going to have to be aware that some of the instructions
may not work on the processor you are trying to write a program for.
There is no governing body to define the rules for assembly languages
much less the documentation.  Each processor inventor and/or chip company
has its own documentation, sometimes good, sometimes bad and often
somewhere in the middle.   You may wonder how a chip with bad docs
actually survives in the market, but they can and do.

When reading an instruction set reference manual, you want to be looking
for a few things.  You obviously want to be paying attention to the
syntax of the assembly language, that is your primary learning
exercise when picking up this manual.  But also, if there are bit
definitions for the machine code, you want to be paying attention to
that, you should not have to go to and ask the question
why is it that on x86 I can load a 32 bit register with a 32 bit
number like 0x12345678 in one instruction, but on mips I can only load
16 bits in a single instruction and on arm only 8 bits in a single
instruction?  Why can this jump instruction only jump 128 bytes but
this other jump can jump anywhere in memory?   The answers to all of
those questions and more are and always have been, right there in
the processor vendor's documentation.  If you are using some web page
and not the processor vendor's documentation, go get the processor
vendor's documentation.  If the processor vendor's documentation only
has the syntax for the assembly language and not the machine code
definition, you might want to find another processor to use, this is
going to be more painful than normal.

Now let us stretch this a bit more.  What about this RiSC16 instruction
set?  Do a little googling and you will find a number of places that
use it in their computer science or engineering colleges.  mips and
dlx are also seen far and wide.  In some of those classes your job
is to either create an instruction set simulator or create the hardware
design using a hardware design language.  Now what does it mean to you
if several thousand students every year take the same definition of
something and are then sent off to create a program that matches or
meets that definition?  Sure, many are going to fail to get it right,
that is a given, but even of the ones that did make a program that
perfectly matches the instruction set definition, most of those
implementations are going to be different, sometimes wildly different.
I have created my own RiSC16 simulator and hardware description language
based on the instruction set as defined on the link at the top of this
document.  If you read down into that page and follow the code and
links provided you will see that for such a simple instruction set
considerable work has been put into making programs and logic with
caches, branch prediction and all kinds of stuff that seems extreme
for something that on the surface is so simple.  Most instruction sets
that have that level of complexity have hundreds of instructions.  My
implementation is meant to be easy to use, easy to follow, something
that you might learn something from.  There is no interest in speed
as I don't expect you to actually write complicated programs that
would warrant such speed.  I expect you to learn some fundamentals
here then take that knowledge on to the next instruction set and the
next and the next.  I firmly believe you should learn a few
instruction sets if you are going to bother to learn any.  The second
and third and Nth are significantly easier than the first as the
concepts and even syntax can be very similar from one to the next.  Just
like learning new programming languages, the basics of programming don't
change with a new language, more often than not it is learning a new

If and when you see my implementation of this processor, understand first
that professionally I do not write the hardware design language for chips.
Professionally I do look at a lot of that code from other people, debug
that code, sometimes propose fixes to that code, but I write software
to test that code or write code to boot that processor or drivers to
initialize peripherals within or around that processor.  These github
projects allow me to explore, with you, my own designs based on what I
have learned from others.  And the more important issue is that when
you see this implementation, you see these waveforms representing all
of the interconnect signals, and the bits in registers, etc, every
implementation is going to look different inside, you might get used
to this one and then look at another running the same program and not
have a clue as to what you are looking at, until you take the time to
understand what you are looking at.  Same goes for my instruction set
simulator.  Many of the instruction set simulators I come across are
trying to be fast or clever or both, and as a result can be very
unreadable.  I hope that this one and others of mine that you may
find are, in fact, readable.

More definitions:

Register.  When used in the context of assembly language a register
is very much like a variable in C or other programming languages.  A
difference though is that there are often a fixed number of registers
some may have specific rules or limitations, and very often the names
for them have been chosen.  So you have to reuse these variable like
registers with names like r0, r1, r2...  In a more general sense the
term register is used a lot in programming.  For example a video
card might have a register that holds the value of the brightness
level being output.  By writing a new value to that register you can
change the brightness.  Another register might be the contrast.  Or perhaps
it may take a number of registers to describe the brightness and/or
contrast output by a video card.  These latter types of registers are
like the ones in a processor core, a chunk of bits somewhere that hold
something, but this latter type is usually accessed by reading or
writing a particular memory location.  The former type is the type we
are going to focus on when programming in assembly language.  These
have an intimate relationship with the processor instructions themselves,
in fact an instruction set relies heavily on the registers in that

My understanding and implementation of the RiSC16 instruction set.

This processor has 8 so called general purpose registers.  One is
actually special so 7 of them are general purpose.  All of the
instructions in this instruction set rely on and operate on these

My shorthand reference for this instruction set is as follows:

000aaabbb0000ccc add ra,rb,rc       ra = rb + rc
001aaabbbsssssss addi ra,rb,simm    ra = rb + simm
010aaabbb0000ccc nand ra,rb,rc      ra = ~(rb&rc)
011aaaiiiiiiiiii lui ra,imm         ra=imm<<6
100aaabbbsssssss sw ra,[rb+simm]
101aaabbbsssssss lw ra,[rb+simm]
110aaabbbsssssss beq ra,rb,simm
111aaabbb0000000 jalr ra,rb         ra = pc; j [rb]
1111111111111111 halt

If it doesn't make sense right off, don't worry, we are going to go
through each of these in detail.
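
To make the table a little more concrete, here is a minimal sketch in C
of pulling the fields out of a 16 bit instruction word.  The bit
positions are read straight off the table above.  The names
risc16_fields, risc16_split and risc16_is_halt are made up for this
sketch, they are not taken from the myrisc16 sources.

```c
#include <stdint.h>

/* Bit positions are read off the shorthand table above.  The struct
   and function names are hypothetical, chosen for this sketch only. */
struct risc16_fields {
    unsigned opcode; /* bits 15..13, selects the instruction    */
    unsigned ra;     /* bits 12..10                             */
    unsigned rb;     /* bits  9..7                              */
    unsigned rc;     /* bits  2..0, used by add and nand        */
    unsigned imm7;   /* bits  6..0, raw, not yet sign extended  */
    unsigned imm10;  /* bits  9..0, used by lui                 */
};

/* Pull every possible field out of one 16 bit instruction word.
   Which fields are meaningful depends on the opcode. */
static struct risc16_fields risc16_split(uint16_t inst)
{
    struct risc16_fields f;

    f.opcode = (inst >> 13) & 0x7;
    f.ra     = (inst >> 10) & 0x7;
    f.rb     = (inst >> 7) & 0x7;
    f.rc     = inst & 0x7;
    f.imm7   = inst & 0x7F;
    f.imm10  = inst & 0x3FF;
    return f;
}

/* halt is the all ones pattern in the table, so it has to be checked
   as a whole word rather than by opcode. */
static int risc16_is_halt(uint16_t inst)
{
    return inst == 0xFFFF;
}
```

For example the word 0x0503 is add r1,r2,r3, it splits into opcode 0,
ra 1, rb 2 and rc 3.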

Understand that this processor, with 8 instructions, is not the norm.
This processor is simple in the sense that you can easily wrap your head
around all of its instructions, all that it does.  To make useful programs
though, you as a programmer have to work harder than you would on other
processors.  A goal here is to worry less about what you can't easily
do with this processor and instead focus on what you can do, and on
understanding what the rules and limitations of each instruction are
and why they exist.

When the instruction set is laid out as I have shown it above you should
very quickly notice that with one exception you can tell what instruction
it is by looking at the top three bits.  This certainly makes it easier
for everyone to figure out what instruction they are looking at when
presented with a bunch of bits.  The one exception is something I added
to the RiSC16 defined by Professor Jacob.  Some processors will have
a halt instruction, but in general a processors job is to run forever
so long as the power is on.  The processors with halt instructions are
often microcontrollers and the halt is a temporary state, basically go
to sleep and consume very little power until I wake you up, then
wake up fast do a few things and go back to sleep.  For example your
television remote control, in order to prevent the batteries from
having to be replaced daily or weekly, the electronics in a device like
that use very very little power when in sleep mode, then they wake up
still using little power do a quick task then go back to sleep.  Battery
life being a primary design requirement across the board.  I have a
halt instruction because this processor is for educational purposes
using a simulator, write a small amount of code, end with a halt, look
at the output of the simulator.  The simulator certainly can be left
running a program forever or for a long time if you have a program that
you want to run that way, not a problem.  But many of the examples will
rely on the halt to end the example and allow the output to be examined

The second thing you might assume upon first inspection, and with
experience will see right away, is that this instruction set appears to
be "fixed word length".  Fixed word length means the length of the
instructions, as measured in number of bits, is the same for all the
instructions in the instruction set.  All 9 instructions are exactly 16
bits, no more, no less.  Because it allows for simpler logic and a more
deterministic nature, the relatively modern risc processors tend to use
fixed word length instruction sets.  Not all, and not all the time, but
compared to the older processors like the 6502 and 8086, which are
considered cisc, risc leans towards fixed.  Variable word length
instruction sets are found in processors like the 6502 and 8086 and many
others.  Variable word length means that some instructions use more bits
than others, some might be 8 bits and some might be as many as 72 bits
(9 bytes) or more in modern 64 bit cisc processors for a single
instruction.  A fixed word length processor knows what it needs to do
when it reads and decodes that single instruction, it does not have to
go back out and fetch or wait for the rest of the instruction to arrive,
it is all there.  A drawback would be that many simple instructions you
might want to have don't need all of those bits and you are wasting
space and bandwidth for the simpler instructions.  With variable word
length instructions you ideally want to make the commonly used
instructions shorter and the less commonly used instructions longer.
Many times the additional length is for other reasons as we will see
shortly with the addi instruction.

With experience, another thing you notice at first glance is that this
RiSC16 instruction set uses some registers with the conditional branch.
The thought is, this is MIPS like. (MIPS is another, well known,
instruction set).  Upon further study you see that indeed the comparison
and the conditional branch are done by the same instruction.  The more
popular way to do this is to have alu functions set flags, including a
compare instruction, which is a subtract that does not save the result
only updates the flags.  Then the conditional branch instruction(s) uses
the flags set by some prior instruction.  For the RiSC16 the beq
instruction compares the contents of the two registers.  If they match
then the branch happens, if they do not match then the branch does not
happen.  So now you ask yourself, if the branch is mips like, what else
is mips like?  Is there a delay slot after the branch and what about r0?
In this case, RiSC16, there is not a branch delay slot.  For pipelined
processors (pipelining allows for higher execution performance) when
a branch happens the pipeline needs to be flushed and re-filled, this
costs clock cycles, a branch delay slot or slots means that one or
some of the instructions after the branch will be executed, recovering
the cost of some of those lost clock cycles.  Typically you would
arrange instructions so that one of the last things you would have done
before performing the branch (that does not affect the branch) is
placed after the branch.  If you don't have an instruction you can move
there then a nop is used, and you basically lose the clock cycle anyway.
So the RiSC16 does not have a delay slot, but r0 is mips like.  What
that means is r0 is special, the contents of r0 is always the value
zero (0x0000).  You can read/use r0 wherever you want but if it is the
destination register in an instruction its contents are not changed,
it is always zero.  As a programmer you could have done this with any
register on your own, setting one to zero and leaving it, or setting a
register to zero whenever you needed it, there is no need for hardware
to make one zero.  But processors like this one are designed such that
you need a register to be zero to do useful things, so you might as well
force one.  In the spirit of the RiSC16 instruction set I have also
forced r0 to zero in my implementations.  Having a register contain zero
makes a few of the instructions more powerful.  For example add: if one
of the two operand registers is r0 then the add becomes a move
instruction, move the contents from one register to the other.  If both
operands are r0 then it becomes move zero to the destination register.
If all three are r0 then it becomes a nop.
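
One way a simulator might model the r0 rule is sketched below in C.
This is only an illustration of the rule as described above, not the
myrisc16 code.  The names regs, read_reg, write_reg and exec_add are
assumptions made up for the sketch.

```c
#include <stdint.h>

/* Hypothetical register file.  regs[0] is never written by
   write_reg(), so r0 always reads back as zero. */
static uint16_t regs[8];

/* Read register n (0 to 7).  r0 is forced to read as zero. */
static uint16_t read_reg(unsigned n)
{
    n &= 7;
    return n ? regs[n] : 0;
}

/* Write register n (0 to 7).  A write to r0 is silently dropped. */
static void write_reg(unsigned n, uint16_t value)
{
    n &= 7;
    if (n != 0)
        regs[n] = value;
}

/* add ra,rb,rc : ra = rb + rc, wrapping at 16 bits. */
static void exec_add(unsigned ra, unsigned rb, unsigned rc)
{
    write_reg(ra, (uint16_t)(read_reg(rb) + read_reg(rc)));
}
```

With this in place exec_add(1, 2, 0) behaves like a move from r2 to r1,
exec_add(3, 0, 0) clears r3, and exec_add(0, 0, 0) changes nothing.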

I am going to run through a reference for the instruction set, then
some observations, and then we will start talking about machine and
assembly language programming.  Between now and then the text will
have the feel that you understand assembly language, this will serve
as a reference once you do know assembly language.


000aaabbb0000ccc add ra,rb,rc       ra = rb + rc

ra, rb, and rc are any one of the 8 general purpose registers.  The
contents of rb and rc are added together and stored in register ra
(if ra is not r0).


add r1,r2,r3    r1 = r2 + r3
add r1,r2,r0    r1 = r2 + r0, since r0 = 0 : r1 = r2.
add r3,r0,r0    r3 = r0 + r0, since r0 = 0 : r3 = 0x0000


001aaabbbsssssss addi ra,rb,simm    ra = rb + simm

ra and rb are any one of the 8 general purpose registers, the contents
of rb and the immediate value are added together and the result is
stored in ra (if ra is not r0).

The immediate for addi, is a signed immediate, from the instruction
encoding we see there are 7 bits available to the immediate value,
being signed means sign extended, so whatever is in bit 6 is used
in bits 7 to 15.  for example:

001xxxxxx1xxxxxx instruction encoding
1111111111xxxxxx immediate value

001xxxxxx0xxxxxx instruction encoding
0000000000xxxxxx immediate value

001xxxxxx1000101 instruction encoding
1111111111000101 immediate value

001xxxxxx0100101 instruction encoding
0000000000100101 immediate value

So the valid immediate values are 0x0000 to 0x003F and 0xFFC0 to 0xFFFF.
Seeing this encoding and a description for the immediate you should never
need to ask "Why can't I use an immediate value of 0x1234 with addi?".
You now know the reason is that there is no way to encode that value
in the instruction.
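
The sign extension described above can be sketched in a few lines of C.
The function names sign_extend7 and fits_simm7 are hypothetical, picked
for this sketch.  The second function is the kind of range check an
assembler could apply before encoding an addi, sw, lw or beq immediate.

```c
#include <stdint.h>

/* Take the low 7 bits of an instruction and copy bit 6 into
   bits 7 to 15, giving the 16 bit value the processor adds. */
static uint16_t sign_extend7(uint16_t inst)
{
    uint16_t imm = inst & 0x007F;

    if (imm & 0x0040)                  /* bit 6 set: negative value */
        imm = (uint16_t)(imm | 0xFF80);
    return imm;
}

/* Return 1 if a 16 bit value can be encoded in the 7 bit signed
   field, that is 0x0000 to 0x003F or 0xFFC0 to 0xFFFF. */
static int fits_simm7(uint16_t value)
{
    return value <= 0x003F || value >= 0xFFC0;
}
```

An immediate field of 1000101 extends to 0xFFC5, a field of 0100101
stays 0x0025, and fits_simm7(0x1234) reports that 0x1234 cannot be
encoded.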


addi r1,r2,0x0010    r1 = r2 + 0x0010
addi r1,r0,0x0034    r1 = r0 + 0x0034, since r0 = 0 : r1 = 0x0034


010aaabbb0000ccc nand ra,rb,rc      ra = ~(rb&rc)

This is a NOT AND instruction.  ra, rb, and rc are any one of the
8 general purpose registers.  rb and rc are ANDed together (bit 0
ANDed with bit 0, bit 1 ANDed with bit 1, etc).  The result of that
AND operation is then inverted (bit 0 = NOT bit 0, bit 1 = NOT bit 1,
etc) and stored in ra (if ra is not r0).

nand r1,r2,r3  r1 = ~(r2&r3)
nand r5,r0,r7  r5 = ~(r0&r7) -> r5 = ~(0) = 0xFFFF
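
As a small sketch in C of what nand computes, and of why a lone nand is
less limiting than it looks: feeding the same value to both operands
gives a NOT, and inverting a nand result gives a plain AND.  The
function names are made up for this sketch.

```c
#include <stdint.h>

/* nand ra,rb,rc : ra = ~(rb & rc), kept to 16 bits. */
static uint16_t nand16(uint16_t b, uint16_t c)
{
    return (uint16_t)~(b & c);
}

/* NOT built from one nand: nand rx,ry,ry */
static uint16_t not16(uint16_t b)
{
    return nand16(b, b);
}

/* AND built from two nands: nand rt,rb,rc then nand ra,rt,rt */
static uint16_t and16(uint16_t b, uint16_t c)
{
    uint16_t t = nand16(b, c);

    return nand16(t, t);
}
```

nand16(0, x) is 0xFFFF for any x, which is the nand r5,r0,r7 case above.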


011aaaiiiiiiiiii lui ra,imm         ra=imm<<6

The lui instruction is a load upper immediate.  Since we need 3 bits
for the opcode and 3 bits for the destination register, the largest
immediate we can encode is 10 bits.  This instruction zeros the lower 6
bits and places the immediate value encoded in the instruction in the
upper 10 bits of the specified register.  The valid immediate values
that can be encoded in the instruction bits are 0x000 to 0x3FF.  This
also means that the values you can load this way are 0x0000, 0x0040,
0x0080, 0x00C0, 0x0100, etc.  Any multiple of 0x40 between 0x0000 and
0xFFC0.

This document and its examples are going to use an assembly language
where you specify the immediate value you want.  For example if you want
0x0040 you put 0x0040 in the assembly language.  The instruction will
be encoded with a 0x001.  You don't put 0x001 to get 0x0040, you put
0x0040 to get 0x0040.


lui r1,0x1200     r1 = 0x1200
lui r1,0xFFC0     r1 = 0xFFC0
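
Here is a minimal sketch in C of how an assembler following that
convention might encode lui, and what the processor then puts in the
register.  encode_lui and lui_result are hypothetical names, and
returning -1 for a value that cannot be encoded is just one possible
way to report the error.

```c
#include <stdint.h>

/* Encode lui ra,value where value is the final register contents
   the programmer asked for (for example 0x1200).
   Returns 0 on success, -1 if the value cannot be encoded. */
static int encode_lui(unsigned ra, uint16_t value, uint16_t *inst)
{
    if (ra > 7)
        return -1;
    if (value & 0x003F)     /* low 6 bits must be zero */
        return -1;
    /* opcode 011 in bits 15..13, ra in 12..10, value>>6 in 9..0 */
    *inst = (uint16_t)(0x6000 | (ra << 10) | (value >> 6));
    return 0;
}

/* What the processor writes to ra when it executes a lui. */
static uint16_t lui_result(uint16_t inst)
{
    return (uint16_t)((inst & 0x03FF) << 6);
}
```

lui r1,0x1200 encodes as 0x6448, the 10 bit field holds 0x048, and
executing it gives back 0x1200.  Asking for lui r1,0x1234 fails
because the low 6 bits are not zero.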


100aaabbbsssssss sw ra,[rb+simm]

This instruction writes or stores a word to a location in memory.  The
myrisc16 implementation of memory addressing is in whole 16 bit words,
you cannot address a byte within a word.  Just like the addi instruction
the immediate value is a sign extended immediate value and the same rule
applies, the hardware only allows immediates 0x0000 to 0x003F or 0xFFC0
to 0xFFFF.


sw r1,[r2+0x0034]  write r1 to address r2+0x0034


101aaabbbsssssss lw ra,[rb+simm]

Exactly like the store word instruction, but this is a load word, it
reads from memory instead of writing.  Everything else is the same,
immediates are limited to 0x0000 to 0x003F or 0xFFC0 to 0xFFFF.  Only 16
bit words are addressable, the 8 bit bytes within a word are not.


110aaabbbsssssss beq ra,rb,simm

This instruction compares the values in registers ra and rb, if equal
then the next instruction executed is at pc + 1 + simm, where pc is
the address of the beq instruction in question.  If the contents of
ra and rb are not equal then the next instruction after beq is
executed, no branch happens.

beq uses a sign extended immediate value like addi, sw, and lw.  So
the immediate values are limited to 0x0000 to 0x003F and 0xFFC0 to 0xFFFF.

Assemblers will generally allow you to use labels instead of having
to add up the number of instructions.


111aaabbb0000000 jalr ra,rb         ra = pc; j [rb]

Jump and link register.  First the address of the instruction after the
jalr instruction in question, pc+1, is stored in the ra register (if ra
is not r0).  Next the program control branches to the address in the rb
register.
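
As a rough sketch of that description in C (jalr_step is a made up
name, and the order of the two steps simply follows the text above, I
am not claiming this is how any particular simulator does it):

```c
#include <stdint.h>

/* regs[0..7] are the general purpose registers, pc is the address of
   the jalr instruction.  Returns the address of the next instruction
   to execute. */
uint16_t jalr_step(uint16_t regs[8], uint16_t pc,
                   unsigned int ra, unsigned int rb)
{
    ra &= 7;
    rb &= 7;
    if (ra != 0)                          /* r0 is never modified      */
        regs[ra] = (uint16_t)(pc + 1);    /* link: the return address  */
    return regs[rb];                      /* jump to the address in rb */
}
```

With r2 holding 0x0010, a jalr r1,r2 at address 5 leaves 6 in r1 and
continues at 0x0010.  A later jalr r0,r1 would return to address 6.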


1111111111111111 halt

This instruction is not part of RiSC16, it is specific to myrisc16.
Note the top three bits are the opcode for JALR.  Note that bits 0 to
6 in the jalr instruction are zero.  This implies that if any of the
lower 7 bits are non-zero it is not really a jalr but an undefined
instruction.  myrisc16 uses one of those undefined instructions to
implement a halt instruction.  The halt instruction causes the processor
to halt, to stop executing instructions.  This is atypical for processors
but it makes it much easier to demonstrate simple programs for teaching
purposes.

As mentioned before risc processors tend to use fixed length instructions.
And the fixed length instructions tend to be the same width as the registers
and/or memory busses, etc.  Which means that you dont have enough bits
to both encode a load immediate instruction and have a full register
width of bits to put in that register.  The lui instruction needs 3 bits
for the opcode and 3 bits for the destination register.  That leaves
us with a maximum of only 10 bits we can load.  Now what they could have
done is have a load high and a load low, for example two other opcode
bits could have been used:

011aaa00iiiiiiii lh ra,imm     ra = (imm<<8) | (ra&0x00FF);
011aaa01iiiiiiii lhz ra,imm    ra = (imm<<8) | 0x0000
011aaa10iiiiiiii ll ra,imm     ra = imm | (ra&0xFF00);
011aaa11iiiiiiii llz ra,imm    ra = imm | 0x0000

So to load the value 0x4567 into a register using this fantasy
instruction encoding

lhz ra,0x4500
ll ra,0x0067

The RiSC16 processor does not do that in that form, but it does it in
another form.  The lui instruction allows you to set all of the bits
in a register, with the lower 6 always zero and the upper 10 whatever
you want.  And if you think about it, if you have used lui to set the
upper 10 bits of a register, and the lower 6 are zeros, then you can use
addi with that same register as the register operand to fill in the
lower 6 bits.  So using RiSC16 to load the value 0x4567 into a
register:

lui ra,0x4540
addi ra,ra,0x0027

Variable word instruction sets often allow any bit pattern because they
often encode a full sized immediate as extra words in the instruction.
A 32 bit x86 move of some immediate value into a register might be one
to a few 8 bit bytes for the opcode indicating this is a load immediate
of size 32 bits into a specific register.  Then the instruction would
be followed by that 32 bit immediate, 4 bytes.

Why on earth would you have a nand instruction in a processor?  Dont
most have and instructions and not instructions separately?  Let's
look at some truth tables:

not c = not a
a    c
0    1
1    0

or  c = a or b
a b  c
0 0  0
0 1  1
1 0  1
1 1  1

and  c = a and b
a b  c
0 0  0
0 1  0
1 0  0
1 1  1

nand c = not ( a and b )
a b  c
0 0  1
0 1  1
1 0  1
1 1  0

nor c = not ( a or b )
a b  c
0 0  1
0 1  0
1 0  0
1 1  0

xor c = a xor b
a b  c
0 0  0
0 1  1
1 0  1
1 1  0

Using only nand truth tables we can easily implement not, first notice
in the and table that anything anded with itself is itself

and  c = a and b
a b  c
0 0  0
1 1  1

so to get c = not a we need

c = a nand a

a a  c
0 0  1
1 1  0

Using only nand truth tables you can implement xor logic (from wikipedia)

d = a nand b
e = a nand d
f = b nand d
c = e nand f

a b  d  e  f  c
0 0  1  1  1  0
0 1  1  1  0  1
1 0  1  0  1  1
1 1  0  1  1  0

c = a xor b

De Morgan says that  a or b = not ((not a) and (not b))

= not ((not a) and (not b))
= (not a) nand (not b)
= (a nand a) nand (b nand b)

d = a nand a
e = b nand b
c = d nand e

a b  d  e  c
0 0  1  1  0
0 1  1  0  1
1 0  0  1  1
1 1  0  0  1

c = a or b

So a nor operation then would be c = not ( a or b ).  Since we have
done c = a or b and c = not a we know

d = a nand a
e = b nand b
f = d nand e
c = f nand f

a b  d  e  f  c
0 0  1  1  0  1
0 1  1  0  1  0
1 0  0  1  1  0
1 1  0  0  1  0

c = a nor b

I think you get the idea, you can make any other logic operation using
nand operations.  (same is true for nor).  If you wander about wikipedia
you will see that for the various boolean operations they show you how
to implement that operation using nand gates.
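
The truth tables above work on one bit at a time, the nand instruction
does the same thing on all 16 bits at once.  Here is a small sketch in
C that builds the other operations out of nothing but a nand function.
The function names are mine, picked for the example.

```c
#include <stdint.h>

/* The only primitive: bitwise NOT AND on 16 bit values. */
uint16_t nand16(uint16_t a, uint16_t b) { return (uint16_t)~(a & b); }

/* c = a nand a */
uint16_t not16(uint16_t a) { return nand16(a, a); }

/* nand, then not the result */
uint16_t and16(uint16_t a, uint16_t b)
{
    uint16_t d = nand16(a, b);
    return nand16(d, d);
}

/* De Morgan: (a nand a) nand (b nand b) */
uint16_t or16(uint16_t a, uint16_t b)
{
    return nand16(nand16(a, a), nand16(b, b));
}

/* or, then not the result */
uint16_t nor16(uint16_t a, uint16_t b)
{
    uint16_t f = or16(a, b);
    return nand16(f, f);
}

/* the four nand sequence from the xor table above */
uint16_t xor16(uint16_t a, uint16_t b)
{
    uint16_t d = nand16(a, b);
    uint16_t e = nand16(a, d);
    uint16_t f = nand16(b, d);
    return nand16(e, f);
}
```

Comparing these against the C operators ~, &, | and ^ for a handful of
values is a quick way to convince yourself the tables are right.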

The beq instruction uses a sign extended immediate, this allows you
to branch forward and backward but only by a limited amount.  Just
like the addi, sw, and lw instructions the sign extended immediate is
limited to 0x0000 to 0x003F or 0xFFC0 to 0xFFFF, no other values
are allowed.  So what if we encoded a 0x7F in to the simm bits?  That
would give 0xFFFF, which if you are fast with your twos complement
math you know that is a minus one (-1).  pc = pc + 1 + 0xFFFF means
pc = pc + 1 - 1 = pc.  An infinite loop.

So what are our actual branch limits?  Our immediate is limited to two
ranges 0x0000 to 0x003F and 0xFFC0 to 0xFFFF.  So we should try those
four limits and see what happens.  It is assumed that you understand
twos complement.  The value pc here is the address of the beq instruction

pc = pc + 1 + 0xFFC0 = pc + 0xFFC1 = pc - 0x003F
pc = pc + 1 + 0xFFFF = pc + 0x0000
pc = pc + 1 + 0x0000 = pc + 0x0001
pc = pc + 1 + 0x003F = pc + 0x0040

So we can go anywhere from 63 (0x3F) instructions backward to 64 (0x40)
instructions forward relative to the beq instruction itself.
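
This is the math an assembler has to do for a branch.  A minimal sketch
in C, assuming addresses are plain instruction numbers and ignoring any
wrap around at the end of the 16 bit address space (beq_simm is a
hypothetical name):

```c
#include <stdint.h>

/* Compute the sign extended immediate for a beq at address from that
   should land on address to.  Returns 0 on success and -1 when the
   target is out of reach. */
int beq_simm(unsigned int from, unsigned int to, uint16_t *simm)
{
    long offset = (long)to - ((long)from + 1);  /* to = from + 1 + simm */
    if (offset < -64 || offset > 63) return -1;
    *simm = (uint16_t)offset;    /* -1 becomes 0xFFFF, -64 becomes 0xFFC0 */
    return 0;
}
```

A beq at address 100 can reach addresses 37 through 164, anything
outside that and this function reports a failure.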

Now typically when programming in assembly language you dont normally
have to count instructions and set the immediate value, some assemblers
might not even let you set the immediate.  The use of labels is typical.

for example

  ;some code
  ;more code
  beq r1,r0,one  ; if r1 == r0 then branch to the first instruction after
                 ; the label one
  beq r1,r2,two
  ;some code
  ;more code

The assembler is keeping track of instructions and labels, the assembler
will then figure out the address for the beq and the address for the
destination.  The assembler will do the math and encode the right value
in the instruction for you.  If the computed value does not conform to
the sign extended immediate limitations then the assembler will give
you some flavor of error message.  Hopefully not too cryptic.

At this point I normally would say that I have created an instruction
set simulator and an assembler so that you can learn assembly language
for this instruction set without tools that are painful to compile.  This
tutorial started off telling you that you need to know C.  In part because
C and C syntax will be used to explain what the asm is doing, and in part
because you should be able to compile the tool or tools, written in C,
using your C compiler of choice (within reason).  What I have not done, at
least not so far, is create an assembler.  At least not a traditional
assembler that takes ascii files and makes machine code.  I have borrowed
a cool way to use a C compiler as an assembler.

So what would normally look like this:

  addi r1,r0,0x0020
  nand r2,r0,r0
  add r1,r1,r2
  beq r1,r0,two
  beq r0,r0,one

Looks like this:


You can see that I dont need to write a parser, the C language compiler
does the parsing, we turn an asm instruction into a C function and
just pass it the same parameters.

The key to this assembler is the file tinyasm.c.  It takes advantage
of C macros that let you take whatever ascii is passed and shove that
into the next level C program that is created by the front end parser.
The registers are just variables that have been declared and initialized
for you.

Like the code I borrowed from, I start everything with a macro, but I
have the macro call a full function so that I can easily do more in the
function than I could by handling everything in macros.  I can check
for sign extended immediate values for example.

So the add instruction add ra,rb,rc starts like this

#define add(ra,rb,rc)       do_add(ra,rb,rc)

And the full function is this

void do_add ( unsigned int ra, unsigned int rb, unsigned int rc )
        printf("do_add limit fail pc = %u \n",__pc__);

You can see this allows for some limit checking, unlike a real assembler
with a full parser, I am not able to do the complete job of syntax
checking, you can make mistakes and get away with stuff you normally
wouldnt with a real assembler.  The tradeoff here is that you can very
easily see the connection to machine code, and later will, if you wish,
create your own pseudo instructions.

From our reference material the add instruction looks like this

000aaabbb0000ccc add ra,rb,rc       ra = rb + rc

16 bits, upper three bits are zero.  Then nine other bits, three sets
of three, define the general purpose registers used in the instruction.
The tinyasm magic will put the numbers 0-7 in the variables ra, rb, rc
passed to the do_add() function.

add r1,r2,r3

becomes this

int r1=1, r2=2, r3=3;

which is basically


Using the instruction definition

000aaabbb0000ccc add ra,rb,rc       ra = rb + rc

we need to make sure that the register numbers are limited to three
bits.  if ra was an 8 for example and we didnt do a check for it that
fourth bit would/could wander into our opcode field and change the
instruction from an add to an addi, that would be a problem.

A quick and dirty method that works fine with the macro would be to
simply and the incoming number with a 7, ensuring it is only 3 significant
bits.  But I wanted at least some warnings and errors so I limit check
the incoming values, if above 7 then error, which means if it gets
encoded the value is between 0 and 7.

The emit macro takes a word and emits it out into the instruction stream
in the binary.  In most places it is used, that word is a machine
instruction.  So do_add, when it calls emit(), is converting from asm, a
list of register numbers, to machine code: bits in the right place for
the opcode and bits that indicate what registers are used as operands
and the destination.
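
To tie the limit check and the emit step together, here is a stand
alone sketch of the idea.  It is not the code from tinyasm.c.  The
names sketch_add, emit_word, out_words, out_count and fail_count are
all invented for this example, and the real emit may well write to a
file rather than an array.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_WORDS 256
uint16_t out_words[MAX_WORDS];   /* hypothetical output buffer         */
unsigned int out_count;          /* address of the next word to emit   */
unsigned int fail_count;         /* number of limit check failures     */

/* Stand-in for emit: append one word to the instruction stream. */
void emit_word(uint16_t word)
{
    if (out_count < MAX_WORDS) out_words[out_count++] = word;
    else fail_count++;
}

/* add ra,rb,rc   000aaabbb0000ccc */
void sketch_add(unsigned int ra, unsigned int rb, unsigned int rc)
{
    if (ra > 7 || rb > 7 || rc > 7) {
        /* a 4th bit would spill into the neighboring field */
        printf("sketch_add limit fail pc = %u\n", out_count);
        fail_count++;
        return;
    }
    emit_word((uint16_t)((ra << 10) | (rb << 7) | rc));
}
```

Calling sketch_add(4,1,2) puts 0x1082 in the buffer, which is the
encoding for add r4,r1,r2.  Calling sketch_add(8,1,2) prints the limit
fail message and emits nothing.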


Now lets dive in and learn some assembly.  You will do the most for
yourself by typing these lessons in manually.  As you know from other
programming experience an important part of programming is correctly
typing in the language and debugging the code that you have written.
By cutting and pasting you lose that experience for code where you know
the expected result, and then experience it for the first time on code
you are not sure about.

Lesson 0:  Building the tools.

The first tool of interest is the instruction set simulator mr16sim.c.
I have tried to make the code portable, hopefully it compiles for you,
if not let me know.

If using the gnu C compiler

gcc -o mr16sim.exe mr16sim.c

./mr16sim.exe filename.csv

And now an example lesson.  Dont enter the lines starting with ----

---- lesson0.c
#include "tinyasm.c"


---- lesson0.c

gcc -o lesson0.exe lesson0.c

Pass 1 completed, starting output pass.
Assembly Succeeded.

./mr16sim.exe lesson0.csv
[0x0000] 0x6400 lui r1,0x000 (0x0000)
[0x0001] 0x2481 addi r1,r1,0x0001 (1)
[0x0002] 0x2481 addi r1,r1,0x0001 (1)
[0x0003] 0xFFFF halt
fetch_count 4
write_count 0
read_count 0

This lesson is not about what these instructions did but about
making sure the tools work, if the tools dont work then stop here.

Lesson 1: LUI

---- lesson1.c
#include "tinyasm.c"


---- lesson1.c

When simulated the output is:

[0x0000] 0x6400 lui r1,0x000 (0x0000)
[0x0001] 0x6800 lui r2,0x000 (0x0000)
[0x0002] 0x66AF lui r1,0x2AF (0xABC0)
[0x0003] 0x7448 lui r5,0x048 (0x1200)
[0x0004] 0xFFFF halt
fetch_count 5
write_count 0
read_count 0

What does it do?

Lui stands for load upper immediate.  An immediate is a constant that
is encoded in the machine code instruction itself.  If you look at the
reference material above, the instruction is 16 bits.  The number of
bits used to figure out what instruction this is, the opcode, is 3 bits
and the number of bits in the instruction to indicate what register is
going to get the immediate is 3 bits.  So 16-3-3 = 10 bits left over.
Our registers are 16 bits wide so the biggest immediate we can have
is 10 bits.  The upper means load those 10 bits into the upper 10 bits
of the register.  The lui instruction sets the lower 6 bits to zero so
all 16 bits in the register are modified by this instruction.  No,
unfortunately there is no lli, load lower immediate.  We will learn in
the next lesson how to load all 16 bits of a register with a desired
value.

Why do we need to load immediate values into registers?  Registers are
just like variables in a high level language, at some point before you
use those variables you have to put something in them be it using
constants or by reading from some input like reading from memory or
a file or user.  If the variable is passed into the function then somewhere
above that function there is ultimately a variable or variables that
are loaded from somewhere before being used.

A number of the instructions require that there are registers used as
operands so before we can use one of those instructions we must load
the operand registers with some value.

The simulator is showing us each time a register is being written with
a value, you can see that for the register writes both the register
number and the value line up with the instructions in the program.

Lesson 2: ADDI for immediates

---- lesson2a.c
#include "tinyasm.c"


---- lesson2a.c

When building the csv file you should see

do_addi limit fail pc = 3

So change that line to

---- lesson2b.c
#include "tinyasm.c"


---- lesson2b.c

When simulated the output is:
[0x0000] 0x6448 lui r1,0x048 (0x1200)
[0x0001] 0x24B4 addi r1,r1,0x0034 (52)
[0x0002] 0x6558 lui r1,0x158 (0x5600)
[0x0003] 0x24F8 addi r1,r1,0xFFF8 (-8)
[0x0004] 0x6559 lui r1,0x159 (0x5640)
[0x0005] 0x24B8 addi r1,r1,0x0038 (56)
[0x0006] 0xFFFF halt
fetch_count 7
write_count 0
read_count 0

What does it do?

Well for starters you can see with the first two instructions I am
placing the constant 0x1234 into r1.  We know that the lui instruction
sets the upper 10 bits to match the immediate and lower 6 bits to zero
(which also matches the immediate because the assembler complains
otherwise).  So if we think about the addi description in the reference
material above.  It adds the immediate value to the second register
and stores the result in the first register.  So after the first lui
where 0x1200 is in r1, now we are adding

r1 = r1 + 0x0034

which is

r1 = 0x1200 + 0x0034 = 0x1234

The constant we wanted in r1.

Now why didnt that work with 0x5678, why did the assembler complain for
starters?  A drawback to this tiny assembler is not being able to tell
you a line number (I imagine there is some C magic you can use to get
that to work) so it shows you the address in memory which is zero based
so the first lui is address 0 the first addi is address 1 and so on, the
offending addi is the addi(r1,r1,0x0078).  Why is that a problem?
Well the syntax enforced by this assembler is that you must show the
full sign extended constant.  Looking at the reference material for
the addi instruction it is a 7 bit immediate (16 - 3 bits for opcode -
3 bits for dest register - 3 bits for operand register = 7 bits left over).
The constant 0x0078 is 0b0000000001111000 if you count them up 0b1111000
is seven significant bits, so what is the problem?  This is a signed
integer, if the 7th bit is set we must have all the bits above set, if
the 7th bit is zero then all the upper bits must be zero.  For 0x0078
we have the 7th bit set but the upper bits zero, that is wrong so we
go ahead and sign extend right?  0b1111111111111000 = 0xFFF8 and try that.

Well that gives 0x5600+0xFFF8 = 0x55F8, that is not the 0x5678 we were
looking for.  But notice the third pair of instructions.  The constant
0x0038 = 0b0000000000111000, the lower 7 bits are 0b0111000 and the upper
bits match the 7th bit, so that is a valid constant.  If we look at
0x5640 0b0101011001000000 the lower 6 bits are zero, so that is a valid
lui constant.  And when we add 0x5640+0x0038 we get 0x5678, the desired
16 bit value we wanted in r1.

Although it would be nice to split constants on 8 bit boundaries

0x1234 -> 0x1200+0x0034
0x5678 -> 0x5600+0x0078
0xABCD -> 0xAB00+0x00CD

We really have to split them 10 on the left and 6 on the right

0x1234 -> 0b0001001000110100 -> 0b0001001000 110100 -> 0x1200+0x0034
0x5678 -> 0b0101011001111000 -> 0b0101011001 111000 -> 0x5640+0x0038
0xABCD -> 0b1010101111001101 -> 0b1010101111 001101 -> 0xABC0+0x000D

The left number, as it is called here, is the pattern used with lui and
the right number is the pattern used with the addi that follows.

lui r1,0x1200
addi r1,r1,0x0034
lui r2,0x5640
addi r2,r2,0x0038
lui r3,0xABC0
addi r3,r3,0x000D
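
The 10 and 6 split is just two masks.  A minimal sketch in C, with
split_constant being a name I made up for the example:

```c
#include <stdint.h>

/* Split a 16 bit constant into the value to give lui (upper 10 bits,
   lower 6 zero) and the value to give the addi that follows (lower 6
   bits, always in the range 0x0000 to 0x003F so always a legal
   sign extended immediate). */
void split_constant(uint16_t value, uint16_t *upper, uint16_t *lower)
{
    *upper = (uint16_t)(value & 0xFFC0);
    *lower = (uint16_t)(value & 0x003F);
}
```

Because the lower part never has its 7th bit set, the addi always adds
a positive number and the two halves always sum back to the original
constant.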

Lesson 3: ADDI for addition

---- lesson3.c
#include "tinyasm.c"


---- lesson3.c

When simulated the output is:

[0x0000] 0x6448 lui r1,0x048 (0x1200)
[0x0001] 0x68D0 lui r2,0x0D0 (0x3400)
[0x0002] 0x2C83 addi r3,r1,0x0003 (3)
[0x0003] 0x2D84 addi r3,r3,0x0004 (4)
[0x0004] 0x3105 addi r4,r2,0x0005 (5)
[0x0005] 0x3406 addi r5,r0,0x0006 (6)
[0x0006] 0x297F addi r2,r2,0xFFFF (-1)
[0x0007] 0x38FE addi r6,r1,0xFFFE (-2)
[0x0008] 0x24FD addi r1,r1,0xFFFD (-3)
[0x0009] 0xFFFF halt
fetch_count 10
write_count 0
read_count 0

What does it do?

001aaabbbsssssss addi ra,rb,simm    ra = rb + simm

addi ra,rb,simm means

ra = rb + simm where ra and rb are registers and simm is a signed
immediate.  The signed immediate field is 7 bits, which implies 0x00 to
0x7F, but the most significant bit is sign extended which means that the
possible range of values is 0x0000 to 0x003F and 0xFFC0 to 0xFFFF.
Viewed as unsigned numbers that is 0 to 63 or 65472 to 65535, viewed
as signed numbers 0 to 63 or -64 to -1.
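
Sign extension itself is simple to write in C.  This is a sketch of
what the hardware does with the 7 bit field, using function names made
up for this example:

```c
#include <stdint.h>

/* Expand the 7 bit immediate field to 16 bits by copying bit 6 into
   bits 15 to 7. */
uint16_t sign_extend7(uint16_t field)
{
    field &= 0x007F;
    if (field & 0x0040) field |= 0xFF80;
    return field;
}

/* ra = rb + simm, with the result wrapping at 16 bits. */
uint16_t addi16(uint16_t rb_value, uint16_t field)
{
    return (uint16_t)(rb_value + sign_extend7(field));
}
```

A field of 0x7F extends to 0xFFFF, and adding 0xFFFF to 0x3400 with the
carry out thrown away gives 0x33FF, the same as subtracting one.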

The example starts by using lui to put some values in a couple of
registers.  We should not assume that the registers already have some
value, we need to put values in them before using them.

the next line, the first addi is

r3 = r1 + 0x0003 = 0x1200 + 0x0003 = 0x1203

Next, there is no reason why you cant use the same register as both
an operand and the result.

r3 = r3 + 0x0004 = 0x1203 + 0x0004 = 0x1207


r4 = r2 + 0x0005 = 0x3400 + 0x0005 = 0x3405

Note on this one that r0 is always zero when used as an operand.  So
this is like moving the constant into r5 since adding anything to 0
leaves it unchanged.

r5 = r0 + 0x0006 = 0x0000 + 0x0006 = 0x0006

Now some negative numbers or large unsigned numbers

r2 = r2 + 0xFFFF = 0x3400 + 0xFFFF = 0x33FF

r6 = r1 + 0xFFFE = 0x1200 + 0xFFFE = 0x11FE

r1 = r1 + 0xFFFD = 0x1200 + 0xFFFD = 0x11FD

At this point it should be painfully obvious that this is like programming
in any other language except there are limits on what you can do.  The
registers are just like variables, this instruction limits you to a
register = register + constant and the constant has limits.  As far as
how you use it though, it is like programming in any other language.

Lesson 4: ADD

---- lesson4.c
#include "tinyasm.c"


---- lesson4.c

When simulated the output is:

[0x0000] 0x6444 lui r1,0x044 (0x1100)
[0x0001] 0x6888 lui r2,0x088 (0x2200)
[0x0002] 0x6CCC lui r3,0x0CC (0x3300)
[0x0003] 0x2DB3 addi r3,r3,0x0033 (51)
[0x0004] 0x1082 add r4,r1,r2
[0x0005] 0x1483 add r5,r1,r3
[0x0006] 0x1981 add r6,r3,r1
[0x0007] 0x1D05 add r7,r2,r5
[0x0008] 0x0400 add r1,r0,r0
[0x0009] 0xFFFF halt
fetch_count 10
write_count 0
read_count 0

What does it do?

Just like addi, think about programming in C or any other language
the registers are like variables and this instruction limits you to
a variable equals the sum of two variables.  So reading and understanding
this code should be easy to see:


r1 = 0x1100


r2 = 0x2200


r3 = 0x3300


r3 = r3 + 0x0033 = 0x3300 + 0x0033 = 0x3333


r4 = r1 + r2 = 0x1100 + 0x2200 = 0x3300


r5 = r1 + r3 = 0x1100 + 0x3333 = 0x4433


r6 = r3 + r1 = 0x3333 + 0x1100 = 0x4433


r7 = r2 + r5 = 0x2200 + 0x4433 = 0x6633


r1 = r0 + r0 = 0x0000 + 0x0000 = 0x0000


Lesson 5: NAND

---- lesson5.c
#include "tinyasm.c"


---- lesson5.c

When simulated the output is:

[0x0000] 0x6444 lui r1,0x044 (0x1100)
[0x0001] 0x2491 addi r1,r1,0x0011 (17)
[0x0002] 0x4481 nand r1,r1,r1
[0x0003] 0x4481 nand r1,r1,r1
[0x0004] 0x283F addi r2,r0,0x003F (63)
[0x0005] 0x4C82 nand r3,r1,r2
[0x0006] 0x4D83 nand r3,r3,r3
[0x0007] 0x5000 nand r4,r0,r0
[0x0008] 0x688C lui r2,0x08C (0x2300)
[0x0009] 0x4D04 nand r3,r2,r4
[0x000A] 0x5503 nand r5,r2,r3
[0x000B] 0x5A03 nand r6,r4,r3
[0x000C] 0x5E86 nand r7,r5,r6
[0x000D] 0x2904 addi r2,r2,0x0004 (4)
[0x000E] 0x4C81 nand r3,r1,r1
[0x000F] 0x5102 nand r4,r2,r2
[0x0010] 0x5584 nand r5,r3,r4
[0x0011] 0xFFFF halt
fetch_count 18
write_count 0
read_count 0

What does it do?

010aaabbb0000ccc nand ra,rb,rc      ra = ~(rb&rc)

Maybe you are asking yourself what is up with this weird instruction?
Make up your mind AND or NOT, but both?  Well what you may or may not
know is that if you have a NOR or a NAND you can build all of the other
logical operations AND, NOT, OR, XOR.  The reference material above
describes how this is done using truth tables.

Knowing from the description that the nand instruction performs the
operation ra = ~(rb&rc), at this point you should have no problem
reading this code.  After that we will look at what it actually does.


r1 = 0x1100


r1 = r1 + 0x0011 = 0x1100 + 0x0011 = 0x1111


r1 = ~(r1&r1) = ~(0x1111&0x1111) = ~(0x1111) = 0xEEEE


r1 = ~(r1&r1) = ~(0xEEEE&0xEEEE) = ~(0xEEEE) = 0x1111


r2 = r0 + 0x003F = 0x0000 + 0x003F = 0x003F


r3 = ~(r1&r2) = ~(0x1111&0x003F) = ~(0x0011) = 0xFFEE


r3 = ~(r3&r3) = ~(0xFFEE&0xFFEE) = ~(0xFFEE) = 0x0011


r4 = ~(r0&r0) = ~(0x0000&0x0000) = ~(0x0000) = 0xFFFF


r2 = 0x2300


r3 = ~(r2&r4) = ~(0x2300&0xFFFF) = ~(0x2300) = 0xDCFF


r5 = ~(r2&r3) = ~(0x2300&0xDCFF) = ~(0x0000) = 0xFFFF


r6 = ~(r4&r3) = ~(0xFFFF&0xDCFF) = ~(0xDCFF) = 0x2300


r7 = ~(r5&r6) = ~(0xFFFF&0x2300) = ~(0x2300) = 0xDCFF


r2 = r2 + 0x0004 = 0x2300 + 0x0004 = 0x2304


r3 = ~(r1&r1) = ~(0x1111&0x1111) = ~(0x1111) = 0xEEEE


r4 = ~(r2&r2) = ~(0x2304&0x2304) = ~(0x2304) = 0xDCFB


r5 = ~(r3&r4) = ~(0xEEEE&0xDCFB) = ~(0xCCEA) = 0x3315


So what kinds of things did we just see?  The most obvious is a simple
not operation, if the two operands are the same register then we know
that anding something with itself is itself.  So then you take the
bitwise not of that and you get the bitwise not of the input register.


r1 = ~(r1&r1) = ~(0x1111&0x1111) = ~(0x1111) = 0xEEEE

NOT(0x1111) = 0xEEEE

The next thing was a simple and


r3 = ~(r1&r2) = ~(0x1111&0x003F) = ~(0x0011) = 0xFFEE


r3 = ~(r3&r3) = ~(0xFFEE&0xFFEE) = ~(0xFFEE) = 0x0011

With the first nand you and the two operands but your result is inverted,
so then you nand the result of the first operation against itself and
that is a not.  The net result is an AND operation 0x1111 & 0x003F = 0x0011

From the reference material above we know that an xor operation, a xor b
can be computed using this sequence of nand operations, where d, e, and f
are temporary variables.

d = a nand b
e = a nand d
f = b nand d
c = e nand f


r3 = ~(r2&r4) = ~(0x2300&0xFFFF) = ~(0x2300) = 0xDCFF


r5 = ~(r2&r3) = ~(0x2300&0xDCFF) = ~(0x0000) = 0xFFFF


r6 = ~(r4&r3) = ~(0xFFFF&0xDCFF) = ~(0xDCFF) = 0x2300


r7 = ~(r5&r6) = ~(0xFFFF&0x2300) = ~(0x2300) = 0xDCFF

The net result is r2 xor r4, 0x2300 xor 0xFFFF = 0xDCFF

From the reference material above we also know that a or b can be
implemented using these instructions, where d and e are temporary
variables.

d = a nand a
e = b nand b
c = d nand e


r3 = ~(r1&r1) = ~(0x1111&0x1111) = ~(0x1111) = 0xEEEE


r4 = ~(r2&r2) = ~(0x2304&0x2304) = ~(0x2304) = 0xDCFB


r5 = ~(r3&r4) = ~(0xEEEE&0xDCFB) = ~(0xCCEA) = 0x3315

r1 or r2 = 0x1111 or 0x2304 = 0x3315

Lesson 6: pseudo instruction OR

---- lesson6.c
#include "tinyasm.c"

---- lesson6.c

When simulated the output is:

[0x0000] 0x6444 lui r1,0x044 (0x1100)
[0x0001] 0x2491 addi r1,r1,0x0011 (17)
[0x0002] 0x688C lui r2,0x08C (0x2300)
[0x0003] 0x2904 addi r2,r2,0x0004 (4)
[0x0004] 0x4C81 nand r3,r1,r1
[0x0005] 0x5102 nand r4,r2,r2
[0x0006] 0x5584 nand r5,r3,r4
[0x0007] 0xFFFF halt
fetch_count 8
write_count 0
read_count 0

What does it do?

From lesson 5 and the reference material we learned that the operation
a = b or c can be computed using nand and two spare registers, d and e:

    d = b nand b
    e = c nand c
    a = d nand e

This interesting tiny assembler we are using makes it very easy to create
pseudo instructions.  Instructions with a different name or function
or syntax, that can be implemented using one or more other instructions.

Examine tinyasm.c and you will see:

void do_or ( unsigned int ra, unsigned int rb, unsigned int rc, unsigned int rd, unsigned int re )
    //c = a or b:
    //d = b nand b
    //e = c nand c
    //a = d nand e
#define or(ra,rb,rc,rd,re)  do_or(ra,rb,rc,rd,re)

You should try making your own pseudo instructions.  AND, NOT, XOR.
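
If you want a starting point, here is a self contained sketch of how
pseudo instructions can be stacked on top of a single nand encoder.
None of these names (sk_nand, sk_or, sk_not, sk_and, words, nwords) are
from tinyasm.c, they are made up for this example, and the output goes
to a plain array instead of a csv file.

```c
#include <stdint.h>

uint16_t words[64];      /* hypothetical output buffer       */
unsigned int nwords;     /* number of words emitted so far   */

/* nand ra,rb,rc   010aaabbb0000ccc */
void sk_nand(unsigned int ra, unsigned int rb, unsigned int rc)
{
    if (nwords < 64)
        words[nwords++] = (uint16_t)(0x4000 | ((ra & 7) << 10)
                                     | ((rb & 7) << 7) | (rc & 7));
}

/* ra = rb or rc, with rd and re as scratch registers */
void sk_or(unsigned int ra, unsigned int rb, unsigned int rc,
           unsigned int rd, unsigned int re)
{
    sk_nand(rd, rb, rb);     /* d = b nand b */
    sk_nand(re, rc, rc);     /* e = c nand c */
    sk_nand(ra, rd, re);     /* a = d nand e */
}

/* ra = not rb */
void sk_not(unsigned int ra, unsigned int rb)
{
    sk_nand(ra, rb, rb);
}

/* ra = rb and rc, ra itself holds the intermediate result */
void sk_and(unsigned int ra, unsigned int rb, unsigned int rc)
{
    sk_nand(ra, rb, rc);
    sk_nand(ra, ra, ra);
}
```

Calling sk_or(5,1,2,3,4) emits the same three words as the nand
instructions in the simulator output above.  XOR is left for you, it
needs three scratch values and four nands.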

Lesson 7: BEQ

---- lesson7.c
#include "tinyasm.c"



---- lesson7.c

When simulated the output is:

[0x0000] 0x2405 addi r1,r0,0x0005 (5)
[0x0001] 0x2800 addi r2,r0,0x0000 (0)
[0x0002] 0x2901 addi r2,r2,0x0001 (1)
[0x0003] 0xC881 beq r2,r1,0x0005 (0x0001 0x0005)
[0x0004] 0xC07D beq r0,r0,0x0002 (0x0000 0x0000)
[0x0002] 0x2901 addi r2,r2,0x0001 (1)
[0x0003] 0xC881 beq r2,r1,0x0005 (0x0002 0x0005)
[0x0004] 0xC07D beq r0,r0,0x0002 (0x0000 0x0000)
[0x0002] 0x2901 addi r2,r2,0x0001 (1)
[0x0003] 0xC881 beq r2,r1,0x0005 (0x0003 0x0005)
[0x0004] 0xC07D beq r0,r0,0x0002 (0x0000 0x0000)
[0x0002] 0x2901 addi r2,r2,0x0001 (1)
[0x0003] 0xC881 beq r2,r1,0x0005 (0x0004 0x0005)
[0x0004] 0xC07D beq r0,r0,0x0002 (0x0000 0x0000)
[0x0002] 0x2901 addi r2,r2,0x0001 (1)
[0x0003] 0xC881 beq r2,r1,0x0005 (0x0005 0x0005)
[0x0005] 0xFFFF halt
fetch_count 17
write_count 0
read_count 0

What does it do?

You may be familiar with processors that use flags, where the z flag
means equal or the c flag means carry.  We dont have flags here, not for
conditional branches.  We have a branch if equal instruction which
includes the operands to be compared.  It does it all in one instruction,
good, bad or otherwise.

110aaabbbsssssss beq ra,rb,simm

If the contents of registers ra and rb are equal then the program counter
is modified as such pc = (pc+1) + simm, where pc is the address of the
beq instruction in question.

Normally you dont encode the offset yourself in a branch instruction
you let the assembler compute it for you and you use labels.  This
tiny assembler language is a little strange in the use of labels.
In a normal C program you would just put the label:

    unsigned int ra;
    unsigned int rb;

    ra = 5;
    rb = 0;
loop_top:
    rb = rb + 1;
    if(rb==ra) goto loop_done;
    goto loop_top;
loop_done:

And using a normal assembler you would pretty much do the same thing.  With
this tiny assembler C thing at some point before you use the label
you declare it.


Then at the place in the code where you want the label you put the label


This program does what the C code does above.  We start by loading a
5 into register r1, and then a 0 into register r2.  We add 1 to r2
then compare and branch if equal r1 and r2.  So when r2 counts
to 5 r1 and r2 will be equal and the branch will happen.  Otherwise
you dont take that branch and you look at the next branch instruction.
The second beq instruction, compares r0 and r0, it doesnt matter
that it is r0, any register compared with itself is going to have the
same value so the result here is always an equal so it always branches.
Another term for this is an unconditional branch.  So if r1 equals r2
then we skip over the unconditional branch otherwise we perform the
unconditional branch, which leads us back to adding 1 to r2, eventually
r2 will be a 5 and they are equal and we branch out of the loop.
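
You can check this walk through with a few lines of C.  The following
is a deliberately tiny sketch that only understands addi, beq and halt,
just enough to run this lesson.  It is not mr16sim.c, and the name
run_loop_sketch and its arguments are made up for the example.

```c
#include <stdint.h>

/* Sign extend the 7 bit immediate field used by addi and beq. */
static uint16_t simm7(uint16_t word)
{
    uint16_t v = word & 0x7F;
    return (v & 0x40) ? (uint16_t)(v | 0xFF80) : v;
}

/* Run a program made only of addi, beq and halt, starting at address
   0 with all registers zero.  Returns the number of instructions
   fetched, or 0 if an unsupported instruction was met, the pc left
   the program, or max_steps was reached. */
unsigned int run_loop_sketch(const uint16_t *prog, unsigned int len,
                             uint16_t reg[8], unsigned int max_steps)
{
    uint16_t pc = 0;
    unsigned int fetches = 0;
    unsigned int i;

    for (i = 0; i < 8; i++) reg[i] = 0;
    while (fetches < max_steps && pc < len) {
        uint16_t word = prog[pc];
        unsigned int ra = (word >> 10) & 7;
        unsigned int rb = (word >> 7) & 7;
        fetches++;
        if (word == 0xFFFF) return fetches;            /* halt */
        switch (word >> 13) {
        case 1:                                         /* addi */
            if (ra != 0) reg[ra] = (uint16_t)(reg[rb] + simm7(word));
            pc = (uint16_t)(pc + 1);
            break;
        case 6:                                         /* beq */
            if (reg[ra] == reg[rb])
                pc = (uint16_t)(pc + 1 + simm7(word));
            else
                pc = (uint16_t)(pc + 1);
            break;
        default:
            return 0;                       /* not handled in this sketch */
        }
    }
    return 0;
}
```

Feeding it the six machine words from the listing above (0x2405,
0x2800, 0x2901, 0xC881, 0xC07D, 0xFFFF) ends with r1 and r2 both equal
to 5 after 17 fetches, matching the fetch_count the simulator reported.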

Normally a processor will have a number of conditional branch instructions
branch if equal, branch if not equal, branch if greater than, etc. Because
this instruction set is so greatly reduced there is only one conditional
branch.  You can already see with the pair of beq instructions and
a label you can create what is essentially a branch if not equal.

    unsigned int ra;
    unsigned int rb;

    if(rb!=ra) goto loop_top;

Lesson 8: Load and store

---- lesson8.c
#include "tinyasm.c"


---- lesson8.c

When simulated the output is:
[0x0000] 0x6444 lui r1,0x044 (0x1100)
[0x0001] 0x2491 addi r1,r1,0x0011 (17)
[0x0002] 0x6888 lui r2,0x088 (0x2200)
[0x0003] 0x2922 addi r2,r2,0x0022 (34)
[0x0004] 0x6C04 lui r3,0x004 (0x0100)
[0x0005] 0x8580 sw r1,[r3+0] ([0x0100]<=0x1111)
[0x0006] 0x8982 sw r2,[r3+2] ([0x0102]<=0x2222)
[0x0007] 0xB180 lw r4,[r3+0] ([0x0100]=>0x1111)
[0x0008] 0xB182 lw r4,[r3+2] ([0x0102]=>0x2222)
[0x0009] 0x2D82 addi r3,r3,0x0002 (2)
[0x000A] 0xB180 lw r4,[r3+0] ([0x0102]=>0x2222)
[0x000B] 0xB1FE lw r4,[r3-2] ([0x0100]=>0x1111)
[0x000C] 0xFFFF halt
fetch_count 13
write_count 2
read_count 0

What does it do?

The sw and lw instructions are used for writing (store word) and reading
(load word) stuff to and from memory space.  I say memory space instead
of memory because often the memory space or memory bus is used to
access peripherals as well as actual memory.  Some addresses may be
the control registers for a uart, or usb controller or hard disk or video
or something like that.  And the rest of the addresses are general purpose
memory that we can write and read when we run out of registers to keep
important items in.  The example above assumes that address 0x100 and
0x102 are general purpose memory.

You should already understand how r1 gets the value 0x1111, r2 0x2222
and r3 0x0100 before going into the first sw and lw instructions.
From the instruction reference we see that this uses a signed immediate
just like the addi instruction.  So we know what our range of valid
signed immediates is for this instruction.

100aaabbbsssssss sw ra,[rb+simm]
101aaabbbsssssss lw ra,[rb+simm]

The signed immediate is added to the contents of the rb register creating
an address.  A memory bus operation happens on that address, either
a write or read depending on the instruction.  If it is a write the
value in register ra is written to that memory address.  If it is a
read then whatever is read back from that address on the memory bus
is stored in register ra.

The example demonstrates writing 0x1111 to address 0x0100 then writes
0x2222 to address 0x0102.  Then it reads from 0x0100, since we have
not done anything to modify since the prior write, address 0x0100
contains the value 0x1111 that was written to it.  Likewise reading
from 0x0102 gives 0x2222.  Offsets are used to get at address 0x0102.

Adding 2 to r3 leaves 0x0102 in r3.  We demonstrate this by using the
newly modified r3 to read from memory resulting in 0x2222.  A signed
integer is allowed as an offset to the specified register for
addressing memory.  Adding 0xFFFE is the same as subtracting 0x0002
so in theory 0x0102 - 2 = 0x0100, and reading from that address does
indeed return 0x1111.
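
In C terms the memory space is just an array of 16 bit words indexed by
the computed address.  A minimal sketch, assuming the address wraps at
16 bits and that every address is plain memory (no peripherals).  The
names mem_words, effective_address, store_word and load_word are
invented for this example.

```c
#include <stdint.h>

uint16_t mem_words[0x10000];   /* one entry per addressable 16 bit word */

/* address = contents of rb + sign extended immediate */
uint16_t effective_address(uint16_t rb_value, uint16_t simm)
{
    return (uint16_t)(rb_value + simm);
}

/* sw ra,[rb+simm] */
void store_word(uint16_t ra_value, uint16_t rb_value, uint16_t simm)
{
    mem_words[effective_address(rb_value, simm)] = ra_value;
}

/* lw ra,[rb+simm], the return value is what ends up in ra */
uint16_t load_word(uint16_t rb_value, uint16_t simm)
{
    return mem_words[effective_address(rb_value, simm)];
}
```

With r3 holding 0x0102, an immediate of 0xFFFE gives address 0x0100,
the same location that was written with r3 holding 0x0100 and an
immediate of 0.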

000aaabbb0000ccc add ra,rb,rc       ra = rb + rc
001aaabbbsssssss addi ra,rb,simm    ra = rb + simm
010aaabbb0000ccc nand ra,rb,rc      ra = ~(rb&rc)
011aaaiiiiiiiiii lui ra,imm         ra=imm<<6
100aaabbbsssssss sw ra,[rb+simm]
101aaabbbsssssss lw ra,[rb+simm]
110aaabbbsssssss beq ra,rb,simm
111aaabbb0000000 jalr ra,rb         ra = pc; j [rb]
1111111111111111 halt

building an instruction set simulator

look at and simulating logic



