EdgeMATX-TinyML-Accelerator

EdgeMATX-TinyML-Accelerator is a Verilog-based RISC-V accelerator project that integrates a 4x4 fixed-point matrix-multiplication engine with PicoRV32 through the PCPI custom-instruction interface.

The repository is organized around a simulation-first workflow:

standalone accelerator validation,
PicoRV32 + PCPI integration,
firmware-driven regression and cycle comparison,
preparation for later FPGA deployment on Pynq-Z2.

Project Highlights

4x4 Q5.10 systolic matrix accelerator RTL.
PicoRV32 integration through a custom PCPI instruction.
Scripted regression, handoff validation, and 3-way cycle comparison.
Live real-input flow for evaluator-provided matrices.
Beginner-focused documentation for wrapper RTL, accelerator RTL, and systolic-array concepts.

Current Status

Standalone accelerator RTL is validated in simulation.
PicoRV32 + PCPI integration is working end-to-end in simulation.
Firmware-driven smoke, regression, handoff, professor-demo, and cycle-compare flows are available.
FPGA timing closure and on-board performance measurement remain future work.

Repository Map

start_here/: quick-entry docs and evaluation flow.
integration/pcpi_demo/: main PicoRV32 + PCPI + accelerator flow.
accel_standalone/: standalone accelerator RTL evaluation flow.
picorv32/: vendored PicoRV32 core from YosysHQ.
RISC-V/: vendored RV32I core reference implementation.
midsem_sim/: compatibility shim for older standalone-flow paths.
integration/pcpi_demo/legacy/: fallback/reference assets separated from active flow.

Start Here

start_here/README.md
start_here/EVAL_FLOW.md
integration/pcpi_demo/README.md
docs/diagrams/pcpi_wrapper_realistic_block_diagram.drawio.xml

Dependencies (Install Before Running)

Required for all simulation flows:

git
PowerShell (Windows)
iverilog and vvp (Icarus Verilog)
python (Python 3)

Recommended for waveform inspection:

gtkwave

Required for firmware rebuild (PCPI regression/handoff):

Option A: Native toolchain on Windows:

riscv64-unknown-elf-gcc
riscv64-unknown-elf-objcopy
make

Option B: WSL fallback (Ubuntu), used by scripts when native toolchain is missing:

sudo apt-get update
sudo apt-get install -y gcc-riscv64-unknown-elf binutils-riscv64-unknown-elf make python3

Quick verify commands:

git --version
iverilog -V
vvp -V
python --version
gtkwave --version
wsl bash -lc "riscv64-unknown-elf-gcc --version | head -n 1"

Vendored RISC-V Core

This repository currently includes the upstream core as a vendor copy (not a submodule):

Upstream: https://github.com/srpoyrek/RISC-V
Imported on: 2026-02-28
Import metadata: RISC-V/VENDORING.md

The RISC-V/ folder is tracked directly by this repo, and RISC-V/.git is intentionally removed.

Upstream Core Summary

Based on the upstream README, the core includes:

5-stage pipelined RISC-V architecture (RV32I)
Verilog HDL implementation
Modules such as control unit, hazard detection, forwarding, ALU, and memory-related blocks
Testbench-based verification (ModelSim-oriented upstream setup)

Next Development Focus

Show simulation-first progress in mid-sem using accel_standalone.
Stabilize accelerator + custom instruction interface in simulation before FPGA deployment.
Transition from analytic speedup estimates to measured board timings (ARM and mcycle).

Core Integration Status

Accelerator is currently validated in standalone RTL simulation (accel_standalone).
PicoRV32 is vendored and available in-repo (picorv32/).
A first CPU integration milestone is implemented in simulation via PCPI (integration/pcpi_demo).
Custom instruction path is tested with machine-code loaded directly in testbench memory.
PCPI demo now uses matrix base pointers (rs1/rs2), reads A/B from memory, and writes C buffer back to memory.
Firmware scaffold is added (firmware.S, linker script, Makefile, hex generation path) with fallback hex support when toolchain is unavailable.
Full board deployment is still pending.

Planned Core Integration Path

Add a proven integration-ready RV32 core (recommended: PicoRV32).
Wrap the accelerator with a coprocessor/custom-op interface (start, busy, done).
Add decode/handshake logic so a custom instruction triggers matrix multiply.
Build CPU+accelerator simulation testbench and verify correctness plus stall behavior.
Replace analytic speedup estimates with measured cycle counts (mcycle / ARM timing).
Move to Vivado/Pynq-Z2 hardware integration after simulation sign-off.

Standalone Accelerator Quick Start

Run from repository root:

.\accel_standalone\scripts\run_midsem_sim.ps1

Generated artifacts:

accel_standalone/results/sim_output.log
accel_standalone/results/MIDSEM_RESULTS.md

Compatibility shim (old path, still supported):

.\midsem_sim\scripts\run_midsem_sim.ps1

PCPI Integration Demo Quick Start

Run from repository root:

.\integration\pcpi_demo\scripts\run_pcpi_demo.ps1

Optional C firmware variant (same custom instruction semantics):

.\integration\pcpi_demo\scripts\run_pcpi_demo.ps1 -FirmwareVariant c

Note: the C smoke variant and cycle-compare flow both use the shared source integration/pcpi_demo/firmware/firmware_matmul_unified.c with compile-time mode/address macros. Accelerator offload uses an explicitly emitted custom instruction word (0x5420818b), not automatic loop-to-accelerator compiler conversion.
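The instruction word quoted above can be split into its fields with a few shifts and masks. The sketch below assumes the word follows the standard RISC-V R-type layout; the helper name decode_rtype is illustrative and not part of the repository. How the PCPI wrapper interprets funct3/funct7 is not stated here, so the snippet only reports the raw field values.

```python
# Decode a 32-bit instruction word, assuming the standard RISC-V R-type layout.
# decode_rtype is a hypothetical helper written for this README, not repo code.
def decode_rtype(word: int) -> dict:
    return {
        "opcode": word & 0x7F,          # bits [6:0]
        "rd":     (word >> 7) & 0x1F,   # bits [11:7]
        "funct3": (word >> 12) & 0x07,  # bits [14:12]
        "rs1":    (word >> 15) & 0x1F,  # bits [19:15]
        "rs2":    (word >> 20) & 0x1F,  # bits [24:20]
        "funct7": (word >> 25) & 0x7F,  # bits [31:25]
    }

fields = decode_rtype(0x5420818B)
for name, value in fields.items():
    print(f"{name:6s} = {value:#04x}")
```

Under that assumption the word decodes to opcode 0x0b (the custom-0 opcode space), rd = x3, rs1 = x1, rs2 = x2, funct3 = 0, and funct7 = 0x2a, which fits the rs1/rs2 base-pointer convention described under Core Integration Status.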

Generated artifacts:

integration/pcpi_demo/results/pcpi_demo.log
integration/pcpi_demo/results/pcpi_demo_wave.vcd

PCPI 8-Case Regression Quick Start

Run from repository root:

.\integration\pcpi_demo\scripts\run_pcpi_regression.ps1

Generated artifacts:

integration/pcpi_demo/results/cases/*.log
integration/pcpi_demo/results/pcpi_regression_summary.md
integration/pcpi_demo/results/pcpi_regression_summary.json

PCPI Handoff Validation Quick Start

Run from repository root:

.\integration\pcpi_demo\scripts\run_pcpi_handoff.ps1

PCPI One-Command Local Checker

Run from repository root:

.\integration\pcpi_demo\scripts\run_pcpi_local_check.ps1

This script runs smoke (asm + c), full regression, and handoff, and exits non-zero on any failure.

PCPI Cycle Comparison

Run from repository root:

.\integration\pcpi_demo\scripts\run_cycle_compare.ps1

This reports cycle counts for:

software baseline without scalar MUL (rv32i, ENABLE_MUL=0)
software baseline with scalar MUL (rv32im, ENABLE_MUL=1)
custom-instruction accelerator path

and writes speedup ratios across all three.

Latest verified (2026-03-05):

accel_cycles=673
sw_nomul_cycles=26130 (rv32i, ENABLE_MUL=0)
sw_mul_cycles=7975 (rv32im, ENABLE_MUL=1)
sw_nomul/accel=38.8262x
sw_mul/accel=11.8499x
sw_nomul/sw_mul=3.2765x
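
The ratios above follow directly from the three cycle counts. As a quick sanity check, they can be recomputed by hand; the function name speedup_ratios below is illustrative and is not taken from run_cycle_compare.ps1.

```python
# Recompute the reported speedup ratios from the three measured cycle counts.
# speedup_ratios is a hypothetical helper, not part of the repository scripts.
def speedup_ratios(accel: int, sw_nomul: int, sw_mul: int) -> dict:
    return {
        "sw_nomul/accel": sw_nomul / accel,
        "sw_mul/accel": sw_mul / accel,
        "sw_nomul/sw_mul": sw_nomul / sw_mul,
    }

ratios = speedup_ratios(accel=673, sw_nomul=26130, sw_mul=7975)
for name, value in ratios.items():
    print(f"{name}={value:.4f}x")
# sw_nomul/accel=38.8262x
# sw_mul/accel=11.8499x
# sw_nomul/sw_mul=3.2765x
```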

PCPI Professor Demo Cases

Run from repository root:

.\integration\pcpi_demo\scripts\run_pcpi_professor_demo.ps1

This runs an explainable set of matrix cases (identity, negative identity, zero, half-scale, signed passthrough) and produces a concise demo summary.
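
These cases are easy to reason about because multiplying by a scaled identity matrix has a predictable result. A minimal software reference model is sketched below. It assumes Q5.10 means 16-bit signed values with 10 fractional bits, that products are accumulated at full width and then shifted right by 10, and that results wrap to 16 bits; the RTL's actual rounding and overflow behavior is not described in this README, so treat those choices as assumptions. All names here are illustrative.

```python
# Hypothetical Q5.10 reference model for the explainable demo cases.
# Assumptions: 10 fractional bits, accumulate-then-shift, 16-bit wraparound.
FRAC_BITS = 10
ONE = 1 << FRAC_BITS  # 1.0 in Q5.10

def wrap16(value: int) -> int:
    """Wrap an integer to 16-bit signed two's complement."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def matmul_q5_10(a, b):
    """Multiply two square matrices of raw Q5.10 integers."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = sum(a[i][k] * b[k][j] for k in range(n))  # Q10.20 sum
            result[i][j] = wrap16(acc >> FRAC_BITS)          # back to Q5.10
    return result

def scaled_identity(diag: int, n: int = 4):
    """Return an n x n matrix with diag on the diagonal and 0 elsewhere."""
    return [[diag if i == j else 0 for j in range(n)] for i in range(n)]

# Example signed input (raw Q5.10 values, chosen even so halving is exact).
sample = [
    [1024, -2048, 512, 0],
    [256, 768, -1024, 2048],
    [-512, 1536, 0, 4096],
    [2048, -256, 128, -3072],
]

identity_out = matmul_q5_10(scaled_identity(ONE), sample)       # passthrough
negated_out = matmul_q5_10(scaled_identity(-ONE), sample)       # sign flip
zero_out = matmul_q5_10(scaled_identity(0), sample)             # all zeros
half_out = matmul_q5_10(scaled_identity(ONE // 2), sample)      # halved
print(identity_out == sample)
```

Under these assumptions, identity returns the input unchanged, negative identity flips every sign, zero yields an all-zero matrix, and half-scale halves every element.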

Cycle Scaling Estimator

Run from repository root:

python .\integration\pcpi_demo\scripts\estimate_cycle_scaling.py --sizes 4,8,16,32,64

This generates estimated normal-core vs accelerator scaling tables (ideal and overhead-aware) in JSON form.

Custom Real-Input Case Flow

Real-valued matrices provided by a mentor or evaluator can be tested without modifying the baseline regression file cases.json.

Fastest live-evaluation mode (edit one JSON, run one script):

Edit integration/pcpi_demo/tests/live_real_input.json
Run:

.\integration\pcpi_demo\scripts\run_pcpi_custom_cycle_compare.ps1

This single command automatically converts real values to Q5.10, generates firmware case data, and runs accelerator + SW no-MUL + SW MUL comparisons. It also writes per-variant outputs in real format:

integration/pcpi_demo/results/custom_cases/live_eval_active_outputs_real.json

Current checked-in live profile (live_real_input.json) is tuned for near-50x no-MUL comparison and currently measures:

accel=673, sw_nomul=36246, sw_mul=7975
sw_nomul/accel=53.8574x
sw_mul/accel=11.8499x

Convert real values to Q5.10 and print preview only:

python .\integration\pcpi_demo\tests\real_to_q5_10_case.py --input-json .\integration\pcpi_demo\tests\sample_real_input.json
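
For intuition, the conversion itself can be sketched in a few lines. This is not the code in real_to_q5_10_case.py: it assumes Q5.10 is a 16-bit signed format with 10 fractional bits (scale factor 1024), rounds to nearest using Python's built-in round, and saturates out-of-range values. The script's actual rounding and overflow handling may differ, and the function names are illustrative.

```python
# Hypothetical real <-> Q5.10 conversion, assuming a 16-bit signed format
# with 10 fractional bits. Not the repository's implementation.
SCALE = 1 << 10                            # 1.0 maps to 1024
Q_MIN, Q_MAX = -(1 << 15), (1 << 15) - 1   # 16-bit signed range

def real_to_q5_10(value: float) -> int:
    """Scale, round to nearest, and saturate to the 16-bit range."""
    raw = round(value * SCALE)
    return max(Q_MIN, min(Q_MAX, raw))

def q5_10_to_real(raw: int) -> float:
    """Convert a raw Q5.10 integer back to a real value."""
    return raw / SCALE

for real in (1.0, -0.5, 0.25, 100.0):
    raw = real_to_q5_10(real)
    print(f"{real:>7} -> {raw:>6} -> {q5_10_to_real(raw)}")
```

With this layout the representable range is -32.0 to just under +32.0 in steps of 1/1024, so an input such as 100.0 saturates at the maximum raw value.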

Convert and append timestamped custom case into isolated custom file:

python .\integration\pcpi_demo\tests\real_to_q5_10_case.py --input-json .\integration\pcpi_demo\tests\sample_real_input.json --append-custom

Run one custom case from custom case file:

.\integration\pcpi_demo\scripts\run_pcpi_custom_case.ps1 -CaseName <custom_case_name>

Run one custom case across all 3 performance variants (accelerator, SW no-MUL, SW MUL):

.\integration\pcpi_demo\scripts\run_pcpi_custom_cycle_compare.ps1 -CaseName <custom_case_name>

This writes per-variant logs plus a per-case cycle summary:

integration/pcpi_demo/results/custom_cases/<case>_cycle_accel.log
integration/pcpi_demo/results/custom_cases/<case>_cycle_sw_nomul.log
integration/pcpi_demo/results/custom_cases/<case>_cycle_sw_mul.log
integration/pcpi_demo/results/custom_cases/<case>_cycle_compare_summary.md
integration/pcpi_demo/results/custom_cases/<case>_cycle_compare_summary.json
integration/pcpi_demo/results/custom_cases/<case>_outputs_real.json

Explicitly clear generated custom cases:

python .\integration\pcpi_demo\tests\real_to_q5_10_case.py --clear-generated

Repo Hygiene + Handoff Discipline

Generated outputs are intentionally ignored (do not commit):
- integration/pcpi_demo/results/pcpi_cycle_*
- integration/pcpi_demo/results/pcpi_prof_demo_*
- integration/pcpi_demo/results/prof_demo_cases/*
- pynq_z2_custom_core/build/*.out

The following scripts share a lock file (integration/pcpi_demo/firmware/.firmware_flow.lock) so that concurrent runs do not race when rewriting firmware:
- run_cycle_compare.ps1
- run_pcpi_professor_demo.ps1
- run_pcpi_custom_case.ps1
- run_pcpi_custom_cycle_compare.ps1
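
The general idea behind a lock file is exclusive creation: whichever process creates the file first proceeds, and any other process fails fast until the file is removed. The repository scripts are PowerShell and their exact locking logic is not shown in this README, so the Python sketch below only illustrates the pattern. The name firmware_lock and the temporary directory are hypothetical.

```python
# Illustrative lock-file pattern using exclusive file creation.
# This is not the logic used by the repository's PowerShell scripts.
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def firmware_lock(path: str):
    """Hold an exclusive lock file for the duration of the with-block."""
    # O_EXCL makes creation fail with FileExistsError if the file exists.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(path)  # release the lock even if the body raised

with tempfile.TemporaryDirectory() as workdir:
    lock_path = os.path.join(workdir, ".firmware_flow.lock")
    second_attempt_blocked = False
    with firmware_lock(lock_path):
        try:
            with firmware_lock(lock_path):
                pass  # would rewrite firmware here
        except FileExistsError:
            second_attempt_blocked = True
    lock_released = not os.path.exists(lock_path)

print(second_attempt_blocked, lock_released)
```

A stale lock left behind by a crashed run has to be removed by hand with this simple scheme, which is one reason to write the owning process ID into the file.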

After any code, script, RTL, or testbench change, update both:
- README.md
- handoff_project_context.md

Consolidated tracked handoff/testing table:
- integration/pcpi_demo/TEST_RESULTS_SUMMARY.md

Mentor-facing progress brief:
- mentor_progress_update.txt

Beginner-to-advanced full project walkthrough:
- integration/pcpi_demo/docs/MIDSEM_COMPLETE_PROJECT_GUIDE.md

Dedicated RTL learning docs (wrapper, accelerator, systolic concept, end-to-end interaction):
- integration/pcpi_demo/docs/RTL_WRAPPER_LINE_BY_LINE.md
- integration/pcpi_demo/docs/RTL_ACCELERATOR_LINE_BY_LINE.md
- integration/pcpi_demo/docs/SYSTOLIC_ARRAY_FROM_SCRATCH.md
- integration/pcpi_demo/docs/END_TO_END_BLOCK_INTERACTION.md

Design-space and deployment tradeoff note:
- integration/pcpi_demo/docs/DESIGN_TRADEOFFS_AND_USE_CASES.md

Interactive web visualizer for architecture and handshake animation:
- integration/pcpi_demo/visualizer/README.md
- integration/pcpi_demo/visualizer/index.html
- Production URL: https://tinyml-pcpi-visualizer.vercel.app

The visualizer includes per-arrow signal inspection, CPU stall/handoff guidance, project-level architecture info, a PE dataflow view, step-back control, and a draggable split-pane layout.

Generated artifacts from the PCPI handoff validation flow (run_pcpi_handoff.ps1):

integration/pcpi_demo/results/pcpi_handoff.log
integration/pcpi_demo/results/pcpi_handoff_wave.vcd
integration/pcpi_demo/results/pcpi_handoff_summary.md

License

This repository is licensed under the MIT License. See LICENSE.
