This is a forked version of Hao He's RISC-V Simulator. In this repository, using pages to manage memory is disabled.
- Part 1 correctness of supporting RISCV32I
- Part 1 correctness of fused instructions with data forwarding
- Part 1 correctness of all instructions without data forwarding
- Part 2 correctness of part2 p2.s
- Part 2 efficiency of part2 p2.s
- A short report about anything you have learn in this project
- Part 2 correctness of part2 p3.s
- Part 2 efficiency of part2 p3.s
- Improve the logic of Hao He’s simulator. (using Correlated Branch Prediction)
The idea is combine PC and a global branch history(GHB) to indexing BHT. (More Details in DataHazards slides)
the implementation is not difficult based on existing code. There are 3 tasks to do:
- Maintain a global branch history
ghbin classBranchPredictor - Update
ghbeverytime branch happensthis->ghb = (this->ghb << 1) + 1orthis->ghb = (this->ghb << 1) - when indexing do xor operation between
pcandghb(Gshare)id = (pc ^ this->ghb) % PRED_BUF_SIZE
I implement different length of ghb (32/16/8/4)
| Program | BPB | CBP32 | CBP16 | CBP8 | CBP4 |
|---|---|---|---|---|---|
| helloworld.riscv | 0.8615 | 0.8615 | 0.8615 | 0.8615 | 0.8615 |
| quicsort.riscv | 0.9589 | 0.9829 | 0.9829 | 0.9844 | 0.9763 |
| matrixmult.riscv | 0.6150 | 0.7894 | 0.7894 | 0.8450 | 0.8533 |
| ackermann.riscv | 0.9595 | 0.9809 | 0.9809 | 0.9814 | 0.9805 |
The results show that correlated branch prediction can further improve performance based on the 2-bit branch prediction buffer, especially in matrixmult.riscv. However, there is no performance improvement in helloworld.riscv, likely because the branch behavior is relatively simple.
From the results of ackermann.riscv and quicksort.riscv, we observe that as the length of the global branch history register (GHR) decreases, the improvement first increases and then decreases. The reason could be that if the GHR is too short, it fails to capture sufficient historical context, making it unable to reflect correlations between distant branches. Conversely, if the GHR is too long, it may capture irrelevant distant branch behaviors, leading to history pollution.
Notice that the branch prediction results differ from the results shown in He Hao's blog. I checked the initial execution results of the simulator and found that they were already different from the beginning. The reason could be that our project compiles C code using RISC-V32 instead of RISC-V64.
In whole project, switch RISCV64 to RISCV32 is relative easy based on He Hao's code, but when implement pipline without dataforwarding, i met some tricky bugs:
- dReg stall: when dReg is stalled, the data got from register file is also freezed, we need to update
dRegat correct timing to ensure correctness. - zero register: if the current instruction has x0 as destReg and next instruction also used x0 as rs, it is easy to be detected as data hazard(but actually not)
- jump eliminated dataHazard: if jump happens, pipline is drained, therefore eliminate some dataHazard.
-
Project 3 greatly enhanced my programming and debugging skills. Although I encountered some challenging bugs during development, the process of resolving them provided me with valuable debugging experience and reinforced the importance of writing robust code with comprehensive error handling.
-
While understanding He Hao's simulator implementation, I further solidified my knowledge of the 5-stage pipeline and how data is transferred between different stages through pipeline registers.
# For part2 and part2_p2
------------ CPU STATE ------------
PC: 0x10118
zero: 0x00000000(0) ra: 0x00000001(1) sp: 0x80000000(-2147483648) gp: 0x00000000(0)
tp: 0x00000000(0) t0: 0x00000004(4) t1: 0x00001000(4096) t2: 0x00001000(4096)
s0: 0x00000000(0) s1: 0x00000000(0) a0: 0x00000000(0) a1: 0x0000000d(13)
a2: 0x0000000e(14) a3: 0x000002d8(728) a4: 0x0000009c(156) a5: 0x0000000e(14)
a6: 0x0016c2d8(1491672) a7: 0x0000005d(93) s2: 0x00000000(0) s3: 0x00000000(0)
s4: 0x00000000(0) s5: 0x00000000(0) s6: 0x00000000(0) s7: 0x00000000(0)
s8: 0x00000000(0) s9: 0x00000000(0) s10: 0x00000000(0) s11: 0x00000000(0)
t3: 0x00000000(0) t4: 0x00000000(0) t5: 0x00000000(0) t6: 0x00000000(0)
-----------------------------------
# For part2_p3
------------ CPU STATE ------------
PC: 0x10110
zero: 0x00000000(0) ra: 0x00000001(1) sp: 0x80000000(-2147483648) gp: 0x00000000(0)
tp: 0x00000000(0) t0: 0x00000004(4) t1: 0x00001000(4096) t2: 0x00001000(4096)
s0: 0x00000000(0) s1: 0x00000000(0) a0: 0x00000000(0) a1: 0x0000000d(13)
a2: 0x0000000e(14) a3: 0x000002d8(728) a4: 0x00000010(16) a5: 0x0000000e(14)
a6: 0x0016c2d8(1491672) a7: 0x0000005d(93) s2: 0x00000000(0) s3: 0x00000000(0)
s4: 0x00000000(0) s5: 0x00000000(0) s6: 0x00000000(0) s7: 0x00000000(0)
s8: 0x00000000(0) s9: 0x00000000(0) s10: 0x00000000(0) s11: 0x00000000(0)
t3: 0x00000000(0) t4: 0x00000000(0) t5: 0x00000000(0) t6: 0x00000000(0)
-----------------------------------Notice here a4 register is different in part2/par2_p2 and part2_p3, the reason is because fmadd.i a3, a1, a2, a3 is used to switch mul a4, a1, a2 and add a3, a3, a4, so a4 is eliminated in loop, so it no longer update after li a4, 16.
student@8e33fa974021:~/CSC3050-2025-Spring-Project-3/build$ ./Simulator ../riscv-elf/part2.riscv -x
Program exit from an exit() system call
------------ STATISTICS -----------
Number of Instructions: 270
Number of Cycles: 600
Avg Cycles per Instrcution: 2.2222
Branch Perdiction Accuacy: 0.0833 (Strategy: Always Not Taken)
Number of Control Hazards: 23
Number of Data Hazards: 101
Number of Memory Hazards: 0
-----------------------------------
student@8e33fa974021:~/CSC3050-2025-Spring-Project-3/build$ ./Simulator ../riscv-elf/part2_p2.riscv -x
Program exit from an exit() system call
------------ STATISTICS -----------
Number of Instructions: 270
Number of Cycles: 514
Avg Cycles per Instrcution: 1.9037
Branch Perdiction Accuacy: 0.0833 (Strategy: Always Not Taken)
Number of Control Hazards: 23
Number of Data Hazards: 64
Number of Memory Hazards: 0
-----------------------------------
student@8e33fa974021:~/CSC3050-2025-Spring-Project-3/build$ ./Simulator ../riscv-elf/part2_p3.riscv -x
Program exit from an exit() system call
------------ STATISTICS -----------
Number of Instructions: 246
Number of Cycles: 502
Avg Cycles per Instrcution: 2.0407
Branch Perdiction Accuacy: 0.0833 (Strategy: Always Not Taken)
Number of Control Hazards: 23
Number of Data Hazards: 64
Number of Memory Hazards: 0
-----------------------------------Notice Cycles is reduced after code scheduling and further deduced after using fmadd.i to reduce instruction numbers.