Kyouma CPU is a simple 32-bit processor made just for fun (and maybe some learning). Here you can find a full specification for its architecture.
First of all, it's mostly 32-bit: all registers are 32-bit, all operations are 32-bit, and most opcodes are 32-bit. There are also short 16-bit opcodes, but they can only come in pairs (so if a short instruction has no partner, a NOP should be inserted).
Second, it is little-endian (because that's conventional, I guess?).
Third, it uses predication. I found it in zipcpu and then in ARM, and it seems to be a very good idea for fully accessible uniform register sets. So every instruction can be executed conditionally. In 16-bit opcodes there is no space for a condition specifier, so I decided to use ARM's technique and add an instruction (CND) that specifies condition codes for the 3 following instructions.
Because I really liked zipcpu's concept about two modes of operation, I'm going to implement it here as well.
So, KCPU has two modes: user (for user code) and supervisor (for kernel-level code). Each mode has its own set of registers, and the CPU can freely switch between them. This is similar to a context switch in normal CPUs, but because we are only changing a single flag inside the CPU, it can be much faster. Also, KCPU in supervisor mode can access user registers (for implementing system calls), while KCPU in user mode can access only its own registers.
There are 16 registers (only 15 of which can store data) in each set, and access to all of them is possible (because FREEDOM!). They are:
- `R0` - always reads 0, writes are ignored
- `R1`-`R10` - general purpose
- `R11` or `SR` - status register
- `R12` or `LR` - link register (stores return address of the last subroutine call)
- `R13` or `FP` - frame pointer
- `R14` or `SP` - stack pointer
- `R15` or `PC` - program counter

They can be referred to with their respective numbers from 0 to 15.
There are also HI and LO registers for multiplication and division results. They are shared between both modes.
HI and LO registers can only be read with special instructions.
The status register (`SR`) bits are:
- Bit 0 - Zero flag. Set if the last operation resulted in 0
- Bit 1 - Carry flag
- Bit 2 - Negative flag. Set if the last operation resulted in a negative number
- Bit 3 - Overflow flag
- Bit 4 - Mode flag. `0` if supervisor mode, `1` if user mode
- Bit 4 - Zero division error flag
- Bit 5 - Illegal opcode flag
- Bit 6 - Wait for interrupt flag
- Bit 7 - Step flag
- Bit 8 - Hardware interrupt flag. Set if the interrupt was hardware and not software.
- Bit 9 - NULL reference flag
- Bits 10-31 - Unused, always `0`
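To make the four arithmetic flags concrete, here is a minimal Python sketch of how they could be computed for a 32-bit addition and packed into bits 0-3 as listed above. The function name `add_flags` is hypothetical, and the carry and overflow rules are the usual two's-complement definitions, which is an assumption: the specification only names those two flags.

```python
MASK32 = 0xFFFFFFFF

# Bit positions taken from the flag list (bits 0-3 only).
Z_BIT, C_BIT, N_BIT, V_BIT = 0, 1, 2, 3


def add_flags(a, b):
    """Return (result, flags) for a 32-bit addition.

    Assumption: carry is the carry out of bit 31, and overflow means both
    operands share a sign that differs from the sign of the result.
    """
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32

    zero = int(result == 0)
    carry = int(full > MASK32)
    negative = (result >> 31) & 1
    overflow = (((a ^ result) & (b ^ result)) >> 31) & 1

    flags = (zero << Z_BIT) | (carry << C_BIT) | (negative << N_BIT) | (overflow << V_BIT)
    return result, flags


if __name__ == "__main__":
    for a, b in [(1, 2), (0x7FFFFFFF, 1), (0xFFFFFFFF, 1)]:
        result, flags = add_flags(a, b)
        print(f"{a:#010x} + {b:#010x} = {result:#010x}, flags = {flags:04b}")
```

The printed flag field reads, from left to right, overflow, negative, carry, zero.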
Each condition specifier is 4 bits long.
- Bit 3 is a freeze bit. If it's set, the CPU freezes the flag register, so the instruction result won't change it. In assembly it is set like this:

      ADD* R1, R2, R3

- Bits 0-2 are the condition itself. It may be as follows:

  | Condition | Assembly symbol | Meaning |
  |---|---|---|
  | `000` | - | No condition. Instruction is always executed |
  | `001` | `?V` | Execute if overflow flag is set |
  | `010` | `?Z` or `?EQ` | Execute if zero flag is set |
  | `011` | `?NZ` or `?NE` | Execute if zero flag is not set |
  | `100` | `?LT` | Execute if negative flag is set |
  | `101` | `?GE` | Execute if negative flag is not set |
  | `110` | `?C` | Execute if carry flag is set |
  | `111` | `?NC` | Execute if carry flag is not set |

  In assembly it combines with the freeze bit like this:

      ADD?NC* R1, R2, R3
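The specifier layout can be sketched in Python as follows. This is only an illustration: `encode_specifier` and `condition_holds` are hypothetical helper names, and the flag masks assume the zero, carry, negative and overflow flags sit in status register bits 0 to 3 as listed earlier.

```python
# Condition codes (bits 0-2 of the specifier), taken from the table above.
CONDITIONS = {
    "": 0b000,
    "V": 0b001,
    "Z": 0b010, "EQ": 0b010,
    "NZ": 0b011, "NE": 0b011,
    "LT": 0b100,
    "GE": 0b101,
    "C": 0b110,
    "NC": 0b111,
}
FREEZE = 0b1000  # bit 3 of the specifier

# Status register masks for bits 0-3.
Z_FLAG, C_FLAG, N_FLAG, V_FLAG = 0b0001, 0b0010, 0b0100, 0b1000


def encode_specifier(mnemonic):
    """Split e.g. 'ADD?NC*' into the bare mnemonic and the 4-bit specifier."""
    freeze = mnemonic.endswith("*")
    body = mnemonic[:-1] if freeze else mnemonic
    name, _, symbol = body.partition("?")
    return name, CONDITIONS[symbol.upper()] | (FREEZE if freeze else 0)


def condition_holds(specifier, sr):
    """Return True if the instruction should execute for status register sr."""
    cond = specifier & 0b111  # the freeze bit does not affect execution
    if cond == 0b000:
        return True
    flag = {0b001: V_FLAG, 0b010: Z_FLAG, 0b011: Z_FLAG, 0b100: N_FLAG,
            0b101: N_FLAG, 0b110: C_FLAG, 0b111: C_FLAG}[cond]
    wants_set = cond in (0b001, 0b010, 0b100, 0b110)
    return bool(sr & flag) == wants_set


if __name__ == "__main__":
    name, spec = encode_specifier("ADD?NC*")
    print(name, format(spec, "04b"), condition_holds(spec, sr=0))
```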
There are multiple types of instructions:
- Short (16-bit). It can use only values in registers

  | 15 | 14..12 | 11..8 | 7..4 | 3..0 |
  |---|---|---|---|---|
  | `0` | Opcode | Destination | Source 1 | Source 2 |

  | Opcode | Instruction mnemonic | Description | Action |
  |---|---|---|---|
  | `000` | `ADD` | Add two integers | `{D} <- {S1} + {S2}` |
  | `001` | `SUB` | Subtract two integers | `{D} <- {S1} - {S2}` |
  | `010` | `LSH` | Logical shift (left if positive, right if negative) | `{D} <- {S1} <</>> {S2}` |
  | `011` | `ASH` | Arithmetic shift (left if positive, right if negative) | `{D} <- {S1} <</>> {S2}` (signed) |
  | `100` | `AND` | Logical AND | `{D} <- {S1} & {S2}` |
  | `101` | `OR` | Logical OR | `{D} <- {S1} \| {S2}` |
  | `110` | `XOR` | Logical XOR | `{D} <- {S1} ^ {S2}` |
  | `111` | `CND` | Conditions (see below) | - |

  The `CND` instruction has the following format:

  | 15..12 | 11..8 | 7..4 | 3..0 |
  |---|---|---|---|
  | `0111` | 1st condition | 2nd condition | 3rd condition |

  It sets conditions for the following 3 instructions.

  Also, there is a `NOP` instruction. Its code is `0x0000`, so it is `ADD R0, R0, R0`. Normally that would change flags, but `NOP` is an exception, so it doesn't.

  There is one more thing: if the second instruction in a pair (short instructions come in pairs, right?) is a `NOP`, it is skipped altogether. As such, beware that if you do something like

      ADD R1, PC, R0
      NOP

  then the value in `R1` would be the address of the instruction after `NOP`, not of `NOP` itself.

- Immediate (32-bit)

  | 31..29 | 28..26 | 25..22 | 21..18 | 17..4 | 3..0 |
  |---|---|---|---|---|---|
  | `100` | Opcode | Destination | Source | Immediate value | Condition |

  | Opcode | Instruction mnemonic | Description | Action |
  |---|---|---|---|
  | `000` | `ADDI` | Add two integers | `{D} <- {S} + I` |
  | `001` | `SUBI` | Subtract two integers | `{D} <- {S} - I` |
  | `010` | `LSHI` | Logical shift (left if positive, right if negative) | `{D} <- {S} <</>> I` |
  | `011` | `ASHI` | Arithmetic shift (left if positive, right if negative) | `{D} <- {S} <</>> I` (signed) |
  | `100` | `ANDI` | Logical AND | `{D} <- {S} & I` |
  | `101` | `ORI` | Logical OR | `{D} <- {S} \| I` |
  | `110` | `XORI` | Logical XOR | `{D} <- {S} ^ I` |
  | `111` | `LDH` | Move immediate to high 14 bits and source to the rest | `{D} <- {I, {S}[17..0]}` |

  Immediate value is sign extended to 32 bits. If `LDH` is passed a label as an argument, the immediate value is the high 13 bits of the address.

- Load/store (32-bit)

  | 31..30 | 29..27 | 26..23 | 22..19 | 18..4 | 3..0 |
  |---|---|---|---|---|---|
  | `11` | Opcode | Source/Destination | Address | Immediate value | Condition |

  | Opcode | Instruction mnemonic | Description | Action |
  |---|---|---|---|
  | `000` | `LW` | Load 32 bits | `{S/D} <- ({A} + I)` (32-bits) |
  | `001` | `SW` | Store 32 bits | `({A} + I) <- {S/D}` (32-bits) |
  | `010` | `SH` | Store 16 bits | `({A} + I) <- {S/D}` (16-bits) |
  | `011` | `SB` | Store 8 bits | `({A} + I) <- {S/D}` (8-bits) |
  | `100` | `LHU` | Load 16 bits unsigned | `{S/D} <- ({A} + I)` (16-bits unsigned) |
  | `101` | `LHS` | Load 16 bits signed | `{S/D} <- ({A} + I)` (16-bits signed) |
  | `110` | `LBU` | Load 8 bits unsigned | `{S/D} <- ({A} + I)` (8-bits unsigned) |
  | `111` | `LBS` | Load 8 bits signed | `{S/D} <- ({A} + I)` (8-bits signed) |

- Misc (32-bit)

  - `LDI` - loads a 20-bit signed immediate value into a register

    | 31..28 | 27..24 | 23..4 | 3..0 |
    |---|---|---|---|
    | `1010` | Destination | Immediate value | Condition |

    If passed a label as an argument, the immediate value is the low 19 bits of the address.

  - `MLTU` - multiplies two unsigned 32-bit integers. Result is in the `HI..LO` pair

    | 31..25 | 24..21 | 20..17 | 16..4 | 3..0 |
    |---|---|---|---|---|
    | `1011000` | Source 1 | Source 2 | Unused?.. | Condition |

  - `MLTS` - multiplies two signed 32-bit integers. Result is in the `HI..LO` pair

    | 31..25 | 24..21 | 20..17 | 16..4 | 3..0 |
    |---|---|---|---|---|
    | `1011001` | Source 1 | Source 2 | Unused?.. | Condition |

  - `DIVU` - divides two unsigned 32-bit integers. Quotient is in the `LO` register, remainder is in the `HI` register

    | 31..25 | 24..21 | 20..17 | 16..4 | 3..0 |
    |---|---|---|---|---|
    | `1011010` | Source 1 | Source 2 | Unused?.. | Condition |

  - `DIVS` - divides two signed 32-bit integers. Quotient is in the `LO` register, remainder is in the `HI` register

    | 31..25 | 24..21 | 20..17 | 16..4 | 3..0 |
    |---|---|---|---|---|
    | `1011011` | Source 1 | Source 2 | Unused?.. | Condition |

    Division takes 11 cycles, so for the next 11 instructions you should not expect results in the `HI..LO` registers. You can still perform multiplication during that time, just don't do it right at the 11th instruction after `DIV`.

  - `MVSU` - moves a value from a user register to a supervisor register

    | 31..25 | 24..21 | 20..17 | 16..4 | 3..0 |
    |---|---|---|---|---|
    | `1011100` | Destination | Source | Unused?.. | Condition |

    The name comes from the assembly syntax: it is written like `MVSU sR1, uR2` and moves the value from user `R2` to supervisor `R1`.

  - `MVUS` - moves a value from a supervisor register to a user register

    | 31..25 | 24..21 | 20..17 | 16..4 | 3..0 |
    |---|---|---|---|---|
    | `1011101` | Destination | Source | Unused?.. | Condition |

    The name comes from the assembly syntax: it is written like `MVUS uR1, sR2` and moves the value from supervisor `R2` to user `R1`.

  - `MVHI` - moves the value from the `HI` register to some other register

    | 31..25 | 24..21 | 20..4 | 3..0 |
    |---|---|---|---|
    | `1011110` | Destination | Unused?.. | Condition |

  - `MVLO` - moves the value from the `LO` register to some other register

    | 31..25 | 24..21 | 20..4 | 3..0 |
    |---|---|---|---|
    | `1011111` | Destination | Unused?.. | Condition |
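The bit layouts above can be sketched as small packing functions. This is a hedged illustration in Python, not an official assembler: all function names are hypothetical, immediates are accepted as either signed or unsigned values that fit the field (an assumption), and because the specification does not say how two short instructions are placed inside one 32-bit word, `encode_short` only produces the 16-bit half. The last two helpers show one way to build a full 32-bit constant with `LDI` followed by `LDH`, following the Action column `{D} <- {I, {S}[17..0]}` (an 18/14-bit split, which is also an assumption).

```python
MASK32 = 0xFFFFFFFF


def _u(value, width, name):
    """Return value if it fits an unsigned field of `width` bits."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return value


def _imm(value, width):
    """Accept a signed or unsigned immediate and keep its low `width` bits."""
    if not -(1 << (width - 1)) <= value < (1 << width):
        raise ValueError(f"immediate {value} does not fit in {width} bits")
    return value & ((1 << width) - 1)


def encode_short(opcode, d, s1, s2):
    # 15: 0 | 14..12: opcode | 11..8: D | 7..4: S1 | 3..0: S2
    # For CND (opcode 0b111) the three 4-bit fields hold the three conditions.
    return (_u(opcode, 3, "opcode") << 12 | _u(d, 4, "d") << 8
            | _u(s1, 4, "s1") << 4 | _u(s2, 4, "s2"))


def encode_immediate(opcode, d, s, imm, cond=0):
    # 31..29: 100 | 28..26: opcode | 25..22: D | 21..18: S | 17..4: imm | 3..0: cond
    return (0b100 << 29 | _u(opcode, 3, "opcode") << 26 | _u(d, 4, "d") << 22
            | _u(s, 4, "s") << 18 | _imm(imm, 14) << 4 | _u(cond, 4, "cond"))


def encode_load_store(opcode, sd, a, imm, cond=0):
    # 31..30: 11 | 29..27: opcode | 26..23: S/D | 22..19: A | 18..4: imm | 3..0: cond
    return (0b11 << 30 | _u(opcode, 3, "opcode") << 27 | _u(sd, 4, "sd") << 23
            | _u(a, 4, "a") << 19 | _imm(imm, 15) << 4 | _u(cond, 4, "cond"))


def encode_ldi(d, imm, cond=0):
    # 31..28: 1010 | 27..24: D | 23..4: imm | 3..0: cond
    return (0b1010 << 28 | _u(d, 4, "d") << 24
            | _imm(imm, 20) << 4 | _u(cond, 4, "cond"))


def encode_misc(opcode7, a, b=0, cond=0):
    # 31..25: 7-bit opcode | 24..21: first register | 20..17: second register
    # (left 0 for MVHI/MVLO) | unused bits are 0 | 3..0: cond
    return (_u(opcode7, 7, "opcode") << 25 | _u(a, 4, "a") << 21
            | _u(b, 4, "b") << 17 | _u(cond, 4, "cond"))


def split_constant(value):
    """Split a 32-bit constant into (LDI immediate, LDH immediate)."""
    value &= MASK32
    return value & 0x3FFFF, value >> 18


def simulate_ldi_ldh(low18, high14):
    """Model LDI Rd, low18 followed by LDH Rd, Rd, high14."""
    reg = low18  # bits 18 and 19 are 0, so sign extension changes nothing
    return ((high14 << 18) | (reg & 0x3FFFF)) & MASK32


if __name__ == "__main__":
    print(hex(encode_short(0b000, 1, 15, 0)))      # ADD R1, PC, R0
    print(hex(encode_immediate(0b000, 1, 2, 5)))   # ADDI R1, R2, 5
    print(hex(encode_load_store(0b000, 1, 14, 8))) # LW R1, (SP + 8)
    low, high = split_constant(0xDEADBEEF)
    print(hex(simulate_ldi_ldh(low, high)))
```

Keeping the range check in `_u` and `_imm` means an out-of-range register number or immediate fails loudly instead of silently corrupting a neighbouring field.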
There is no definite memory map, but for convenience:
- `0x00000000` - Inaccessible (if accessed in user mode, sets the corresponding flag and switches to supervisor mode; if accessed in supervisor mode, halts the CPU)
- `0x00000001..0x0000FFFF` - ROM for loader and supervisor routines. Could hold user programs in ROM too
- `0x00010000` - Usual code position
- `0x10000000` - Heap start
- `0xDFFFFFFF` - User stack bottom. Stack grows downwards
- `0xEFFFFFFF` - Supervisor stack bottom. Stack grows downwards
- `0xFFFF0000` - Ports
Caution: a 1-byte port only works if you write to it specifically, not together with other ports
- `0xFFFFFFFF` (1 byte) - LCD control signals (`LCD_CTRL`)
  - Bit 0 is `LCD_RS`
  - Bit 1 is `LCD_RW`
  - Bit 2 is `LCD_E`
- `0xFFFFFFFE` (1 byte) - LCD data signals (`LCD_DATA`)
- `0xFFFFFFFD` (1 byte) - CPU speed (`CPU_SPEED`)
  - `0` means manual speed (like button-press slow) (`CPU_SPEED_MANUAL`)
  - `1` means slow speed (like 50 Hz) (`CPU_SPEED_SLOW`)
  - `2` means max speed (full 50 MHz) (`CPU_SPEED_MAX`)
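For reference, the port addresses and values above could be written down as constants like this. It is only a sketch in Python: the signal and speed names come from the list above, while the class names, the `*_ADDR` constants and the `lcd_ctrl_byte` helper are hypothetical.

```python
from enum import IntEnum, IntFlag

# Port addresses from the list above (each port is 1 byte wide).
LCD_CTRL_ADDR = 0xFFFFFFFF
LCD_DATA_ADDR = 0xFFFFFFFE
CPU_SPEED_ADDR = 0xFFFFFFFD


class LcdCtrl(IntFlag):
    """Bits of the LCD_CTRL port."""
    LCD_RS = 1 << 0
    LCD_RW = 1 << 1
    LCD_E = 1 << 2


class CpuSpeed(IntEnum):
    """Values accepted by the CPU_SPEED port."""
    CPU_SPEED_MANUAL = 0
    CPU_SPEED_SLOW = 1
    CPU_SPEED_MAX = 2


def lcd_ctrl_byte(rs=False, rw=False, e=False):
    """Build the byte to store at LCD_CTRL_ADDR from the three signals."""
    value = LcdCtrl(0)
    if rs:
        value |= LcdCtrl.LCD_RS
    if rw:
        value |= LcdCtrl.LCD_RW
    if e:
        value |= LcdCtrl.LCD_E
    return int(value)


if __name__ == "__main__":
    print(hex(LCD_CTRL_ADDR), format(lcd_ctrl_byte(rs=True, e=True), "03b"))
    print(hex(CPU_SPEED_ADDR), int(CpuSpeed.CPU_SPEED_MAX))
```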
KCPU should use the usual C calling convention with a slight modification: the return address is in LR, not on the stack.
If needed, LR is saved on the stack or somewhere else by the caller. Also, no registers are saved except SP and FP.
Caller:
- If needed, pushes `LR` on stack
- Pushes arguments on stack (in reverse order)
- Sets `LR` to return address
- Jumps to callee
- Cleans up arguments
- If needed, gets `LR` from stack

Callee:
- Pushes `FP` on stack
- Saves current stack position in `FP`
- Executes its code, using `(FP+8)`, `(FP+12)` etc. as arguments and allocating local variables at `(FP+0)`, `(FP-4)`, `(FP-8)` etc.
- Stores return value in `R1`
- Restores stack position
- Gets `FP` from stack
- Jumps to position specified by `LR`
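The frame offsets in these steps can be checked with a small simulation. The Python sketch below is illustrative only: the `Machine` class and both functions are hypothetical, and it assumes that a push stores the value at `SP` and then decrements `SP` by 4 (so `SP` always points at the next free word) and that the stack starts at a word-aligned address just below the user stack bottom. Under that assumption the saved `FP` ends up at `(FP+4)`, which is why the first argument is at `(FP+8)` and the first local at `(FP+0)`.

```python
class Machine:
    """Tiny model of SP, FP and word-addressed stack memory."""

    def __init__(self, sp=0xDFFFFFFC):
        self.mem = {}
        self.sp = sp
        self.fp = 0

    def push(self, value):
        # Assumption: store at SP, then move SP down one word.
        self.mem[self.sp] = value
        self.sp -= 4

    def pop(self):
        self.sp += 4
        return self.mem[self.sp]


def callee_add(m):
    """Callee side: returns the sum of its two stack arguments (as R1)."""
    m.push(m.fp)                  # pushes FP on stack
    m.fp = m.sp                   # saves current stack position in FP
    first = m.mem[m.fp + 8]       # first argument
    second = m.mem[m.fp + 12]     # second argument
    m.sp -= 4                     # allocates one local variable at (FP+0)
    m.mem[m.fp + 0] = first + second
    r1 = m.mem[m.fp + 0]          # stores return value in R1
    m.sp = m.fp                   # restores stack position
    m.fp = m.pop()                # gets FP from stack
    return r1                     # jumps to position specified by LR


def caller_add(m, a, b):
    """Caller side: pushes arguments in reverse order, calls, cleans up."""
    for arg in reversed((a, b)):
        m.push(arg)
    # Here LR would be set to the return address before jumping.
    r1 = callee_add(m)
    m.sp += 8                     # cleans up the two arguments
    return r1


if __name__ == "__main__":
    machine = Machine()
    print(caller_add(machine, 2, 3), hex(machine.sp))
```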