GitHub - cbmeeks/gs4502b: Experimental pipelined 4502 CPU design

GS4502B - An attempt to create a high-performance 4502 and 6502 compatible CPU

This repository contains a work-in-progress design for a radically higher-performance 6502 compatible processor than the existing 48MHz 45GS10 processor used in the MEGA65 retro-computer.

Whereas the 45GS10 is essentially just a relatively normal 6502 core clocked at the high speed allowed by a modern FPGA, the GS4502B is a complete redesign, intended to yield both a higher maximum clock speed and substantially increased instructions-per-cycle (IPC) throughput.

The three key architectural changes are:

1. Use of a relatively deep pipeline to allow increased clock speed. The increase in clock speed should be sufficient to result in no-worse instruction latency in almost all cases, and much lower instruction latency in most cases. The intention is to allow a clock speed of 192MHz, a four-fold improvement on the 45GS10.
2. The introduction of an instruction cache (I-CACHE), to allow the processor to dispatch one instruction per cycle under normal operating conditions. Further, the I-CACHE pre-fetch logic will include the ability to fold independent consecutive instructions into a single cache entry, so that it is possible under certain conditions to obtain an IPC > 1. However, even without instruction folding, the combination of pipeline and I-CACHE should allow an IPC approaching 1, as compared to the typical IPC of around 0.3 for a 6502 and 0.27 for the 45GS10.
3. The inclusion of powerful register and flag renaming logic, which will allow many instruction sequences that would otherwise stall the pipeline to proceed without impediment. For example, the sequence LDA $1234 / STA $2345 would be able to proceed in successive cycles, because the second instruction would be tagged to use the result of the first instruction as its operand, allowing another instruction that modifies or uses the accumulator to follow directly after. This requires that the write-back stage and memory controller have a substantial degree of intelligence, compared with the 45GS10 or a normal 6502 core. Tight loops of the form LDA xxx / STA xxx / INX / BNE *-n will benefit particularly from this feature, because each iteration of the loop can be executed in just three cycles (LDA, STA + INX / BNE), allowing simple copy routines to proceed at 2/3 the speed of a DMA-based copy. This is a good example of the degree of speed improvement that this processor design can offer, assuming that I can complete it!
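
The renaming idea in the third point can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the GS4502B's actual logic: the function name `rename`, the integer tags, and the restriction to LDA and STA are all hypothetical choices made for illustration. Each LDA creates a new tagged version of the accumulator, and each STA records which version it needs, so a later LDA never has to wait for an earlier STA to finish.

```python
# Toy model of accumulator renaming; an illustration, not the real hardware design.
from itertools import count


def rename(program):
    """Tag each instruction with the accumulator version it reads and writes.

    program: list of (mnemonic, operand) tuples. Only LDA (writes A) and
    STA (reads A) are modelled; other mnemonics pass through untagged.
    """
    tags = count(1)   # a fresh tag for every new value of A
    current = 0       # tag 0 = whatever A held before the sequence started
    out = []
    for mnemonic, operand in program:
        reads = writes = None
        if mnemonic == "LDA":
            current = next(tags)   # new version of A, independent of older ones
            writes = current
        elif mnemonic == "STA":
            reads = current        # STA is bound to the version live at this point
        out.append((mnemonic, operand, reads, writes))
    return out


program = [("LDA", 0x1234), ("STA", 0x2345), ("LDA", 0x3456), ("STA", 0x4567)]
renamed = rename(program)
for mnemonic, operand, reads, writes in renamed:
    print(f"{mnemonic} ${operand:04X}  reads A tag={reads}  writes A tag={writes}")
```

In this model the first STA is bound to tag 1 and the second to tag 2, so the second LDA can be issued without disturbing the value the first STA still has to write.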

Together, these improvements will hopefully result in a processor that is at least 10x the speed of the 45GS10, when implemented in the same FPGA device. It is also probable that it will require fewer FPGA resources, due to the adoption of a more modular and scrutable design that avoids the excessive duplication of resources that appears to occur during synthesis of the 45GS10 due to my poor programming style in that processor. However, this is all speculation until it is actually implemented and working.
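
As a rough sanity check on the 10x figure, the clock and IPC numbers quoted above can simply be multiplied together. This is a back-of-envelope sketch, not a measurement: it assumes the 192MHz target is reached and takes an IPC of exactly 1.0 as a stand-in for "approaching 1", with no instruction folding.

```python
# Back-of-envelope estimate using only figures quoted in this README.
old_clock_mhz = 48    # 45GS10
new_clock_mhz = 192   # GS4502B target clock
old_ipc = 0.27        # typical 45GS10 IPC quoted above
new_ipc = 1.0         # assumption: IPC "approaching 1", no instruction folding

clock_gain = new_clock_mhz / old_clock_mhz
ipc_gain = new_ipc / old_ipc
speedup = clock_gain * ipc_gain

print(f"clock x{clock_gain:.1f}, IPC x{ipc_gain:.2f}, overall x{speedup:.1f}")
```

Under these assumptions the product comes out a little under 15x, so the "at least 10x" goal leaves some margin for an achieved IPC below 1.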

It would also have been possible to implement out-of-order execution to further increase IPC. However, the logic to do so is notoriously large in area, and it would probably provide only modest IPC improvements, given that we already have instruction merging and register renaming to help keep the pipeline as busy as possible. Further improvement would require the inclusion of additional execution units, i.e., a true superscalar design, but this would increase the size of the processor even more. In any case, because the write-back stage can perform certain arithmetic operations in order to handle RMW instructions and renamed registers and flags, the design already includes a low-cost form of superscalarity in having two ALUs.

Self-modifying code

Perhaps the single greatest challenge in implementing a high-performance 6502-class processor is the widespread use of self-modifying code. Even the BASIC interpreter on the C64 uses it! Worse, it is quite common to modify the very next instruction to be executed, which means that the pipeline has to be rather clever indeed to not accidentally execute the wrong version of an instruction.

Support for any and all forms of self-modifying code is quite simply a mandatory requirement for any 6502-compatible processor, and thus will be implemented in the GS4502B. The great challenge is how to do this without harming the performance of the processor when executing non-self-modifying instructions. Because modification and execution of instructions may be widely separated in both time and memory space, every write to memory must be checked to see if it requires updating or invalidating one or more cache lines. Because instructions can be up to 3 bytes in length, three cache lines must be checked for every memory write that occurs. This is, quite simply put, extremely annoying. The GS4502B is intended to use a four-way parallel instruction cache to reduce this cost, by allowing all three offending cache lines to be read in parallel, and then also patched in parallel if required. The fine details of how this would work are yet to be settled, and this is the portion of the design that is least settled at this point in time.
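
To make the "three cache lines per write" point concrete, here is a minimal sketch of which cache entries a single byte write could touch. Since the real scheme is not yet settled, everything structural here is an assumption: that cache entries are keyed by the instruction's start address, that the way is selected by the low two address bits, and that addresses are plain 16-bit values with no MEGA65 memory mapping. The helper `entries_to_check` is hypothetical.

```python
MAX_INSTRUCTION_LENGTH = 3   # bytes, as stated in the text
WAYS = 4                     # four-way parallel cache, as stated in the text
ADDRESS_MASK = 0xFFFF        # assumption: plain 16-bit address space


def entries_to_check(write_address):
    """Return (instruction_start, way) pairs that a one-byte write may affect.

    A byte at address W can belong to an instruction starting at W, W-1 or W-2,
    because an instruction is at most three bytes long.
    """
    candidates = []
    for offset in range(MAX_INSTRUCTION_LENGTH):
        start = (write_address - offset) & ADDRESS_MASK
        way = start % WAYS   # assumption: way chosen by the low two address bits
        candidates.append((start, way))
    return candidates


hits = entries_to_check(0x2001)
for start, way in hits:
    print(f"check entry for instruction at ${start:04X} in way {way}")
```

With this way selection, any three consecutive start addresses land in three different ways, which is what would let all three lookups happen in the same cycle.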

