A vision-guided laser turret built on a custom MIPS-style soft-core processor running on an FPGA.
A host vision pipeline finds a target in a camera stream and streams its pixel coordinates over UART to a Digilent Nexys A7-100T (Artix-7). On the FPGA, a hand-built 5-stage pipelined processor — extended with a custom clamp instruction and a set of memory-mapped peripherals — aims a pan/tilt servo gimbal at the target, fires a laser, and plays audio cues, all behind a layered hardware safety interlock.
This was a semester project for an undergraduate computer-architecture course (Duke ECE 350). The processor core is a from-scratch MIPS-like CPU; everything around it — the ISA extension, the peripherals, the host vision system, and the firmware — was designed to turn that CPU into a working closed-loop targeting system.
- What it does
- System architecture
- Repository layout
- The processor: custom ISA
- Hardware control flow
- Software control flow (`full.s`)
- Host vision pipeline
- Toolchain
- Build & run
- Verification
- Acknowledgements
```
camera feed target pixel (x, y) servo + laser + audio
───────────────► HOST ──────────────────────────► FPGA ──────────────────────► 🎯
(ESP32-CAM MJPEG) detect + smooth 9-byte UART @115200 aim, fire, beep
```
- An ESP32-CAM streams MJPEG video over Wi-Fi.
- A host program (laptop / Raspberry Pi) detects the target with YOLO or template matching, smooths the centroid, and sends `(x, y, valid)` over a serial link at 30 Hz.
- The FPGA receives the packet, and the soft-core CPU runs a search → track → settle → dwell → fire control loop that drives the gimbal toward the target, fires the laser once it is locked and stable, and triggers audio feedback, never aiming or firing outside a hardware-enforced safe envelope.
```
┌──────────────────────────────┐ ┌─────────────────────────────────────────────────────────┐
│ HOST PC │ │ NEXYS A7-100T (Artix-7 FPGA) │
│ │ │ │
│ MjpegSource ── BGR frame ──► │ UART │ uart_rx ─► packet_parser ─► ingest_mux ─► ┌───────────┐ │
│ Detector (YOLO / template) │ 115200 │ (read) │ MIPS CPU │ │
│ Smoother (EMA + N-of-M gate) │ ──────► │ (writes split by dmem_mux on addr nibble) │ + clamp │ │
│ FpgaLink.send_target() │ 9 bytes │ └─────┬─────┘ │
│ 0xAA 0x55 seq flags │ 30 Hz │ mmio_regs ◄──────── sw 0xF00..0xF03 ────────────┘ │
│ x_hi x_lo y_hi y_lo chk │ │ │ │ │ │
└──────────────────────────────┘ │ ▼ ▼ ▼ │
│ pwm_gen pwm_gen laser_gate ◄── watchdog + HW interlock │
│ (pan) (tilt) │ │
│ │ │ ▼ │
│ servo servo laser MOSFET audio: event_detect │
│ → audio_player │
│ → sigma_delta → spkr │
└─────────────────────────────────────────────────────────┘
```
The FPGA design clocks every peripheral directly from the board's 100 MHz oscillator (the clock-wizard divider used in early bring-up was dropped from the final design).
| Path | Contents |
|---|---|
| `proc-toolchain/` | Single source of truth for the hardware: all Verilog RTL (`main/`), testbenches, the custom assembler, and the verification/automation scripts. |
| `proc-toolchain/main/` | The synthesizable modules: `proc/` (CPU), `alu/`, `regfile/`, `multdiv/`, `pwm_gen/`, `laser_gate/`, `watchdog/`, `mmio_regs/`, `dmem_mux/`, `ingest_mux/`, `packet_parser/`, `uart_rx/`, `audio/`, and the `sentinel_top/` integration top. |
| `mips/` | Assembly control programs (`.s`) and assembled memory images (`.mem`). `full.s` is the production firmware; the rest are bring-up/diagnostic programs. |
| `YOLO_vision/` | Host-side Python vision pipeline: capture, detectors, smoothing, and the serial link to the FPGA. |
| `arduino_vision/` | ESP32-CAM sketches for the legacy on-camera detection path. |
| `servodebug/` | Standalone Python utilities for finding servo travel limits and centering. |
| `docs/` | ISA tables, host↔FPGA protocol spec, design notes, board schematic, and the original project proposal. |
| `scripts/` | `wav_to_mem.py` (audio asset compiler), the verification runner, and helpers. |
The hardware is built and simulated from `proc-toolchain/main`. To open it as a Vivado project, create a fresh project targeting the Nexys A7-100T (`xc7a100tcsg324-1`), add the `main/**/*.v` sources plus `main/sentinel_top/sentinel_top.xdc` as constraints, and load `mips/full.mem` as the instruction ROM image.
The CPU is a 5-stage pipelined, 32-bit MIPS-style core (fetch / decode / execute / memory / writeback) with full data-forwarding and load-use hazard detection. It is implemented in proc-toolchain/main/proc/processor.v, with the datapath wired together at proc-toolchain/main/sentinel_top/sentinel_top.v.
The base instruction set follows the Duke ECE 350 MIPS-like ISA (full tables in `docs/ece350_isa_tables.md`). Instructions are 32 bits with the opcode in bits [31:27]; R-type instructions additionally carry an ALU function code (ALUop) in bits [6:2].
| Format | Instructions |
|---|---|
| R-type | add sub and or sll sra mul div |
| I-type | addi sw lw bne blt bex setx |
| J-type | j jal jr |
The one ISA extension beyond the base set is a signed saturating clamp — the feature behind several of this repo's commit messages ("Clamp instruction finally working").
```
clamp $rd, $rs, $rt, $ru   # $rd = max($rt, min($ru, $rs))  (signed)
                           # $rs = value, $rt = lower bound, $ru = upper bound
```

| Property | Value |
|---|---|
| Mnemonic | `clamp $rd, $rs, $rt, $ru` |
| Format | Custom R4 (four register operands) |
| Opcode [31:27] | `00000` (shares the R-type primary opcode) |
| Funct [6:2] | `01000` (first free function slot) |
| Semantics | `$rd = max($rt, min($ru, $rs))`, signed |
The encoding trick — a fourth register with no spare bits. A standard R-type has only three register fields plus a 5-bit shift-amount literal. clamp needs four registers (dest + value + lo + hi). Rather than grow the instruction word, it repurposes the shamt field [11:7] as a third source-register index ($ru). The field layout becomes opcode | rd | rs | rt | ru | funct — bit-identical to an R-type, but the shamt slot now names a register. See docs/ece350_isa_tables.md and the assembler at proc-toolchain/assembler-python-version/ (instructions.csv: clamp,R4,01000).
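To make the field packing concrete, here is a small Python sketch of how an assembler could encode and decode the R4 word. It assumes `rd`, `rs`, and `rt` occupy the 5-bit slots between the opcode and the repurposed shamt field (bits [26:22], [21:17], [16:12]) and that bits [1:0] are zero. `encode_clamp` and `decode_r4` are hypothetical helpers, not code from the project's assembler.

```python
"""Sketch: packing `clamp $rd, $rs, $rt, $ru` into a 32-bit R4 word."""

OPCODE_RTYPE = 0b00000  # clamp shares the R-type primary opcode
FUNCT_CLAMP = 0b01000   # first free ALU function slot


def encode_clamp(rd: int, rs: int, rt: int, ru: int) -> int:
    """Return the instruction word; ru occupies the old shamt slot [11:7]."""
    for name, reg in (("rd", rd), ("rs", rs), ("rt", rt), ("ru", ru)):
        if not 0 <= reg <= 31:
            raise ValueError(f"{name}={reg} is not a register index 0..31")
    return (
        (OPCODE_RTYPE << 27)
        | (rd << 22)
        | (rs << 17)
        | (rt << 12)
        | (ru << 7)          # shamt field reused as the third source register
        | (FUNCT_CLAMP << 2)
    )                        # bits [1:0] left as zero (assumption)


def decode_r4(word: int) -> dict:
    """Split a 32-bit word back into its R4 fields."""
    return {
        "opcode": (word >> 27) & 0x1F,
        "rd": (word >> 22) & 0x1F,
        "rs": (word >> 17) & 0x1F,
        "rt": (word >> 12) & 0x1F,
        "ru": (word >> 7) & 0x1F,
        "funct": (word >> 2) & 0x1F,
    }


if __name__ == "__main__":
    word = encode_clamp(rd=1, rs=2, rt=3, ru=4)
    print(f"{word:#010x}")  # 0x00443220
    print(decode_r4(word))
```

Because the word is bit-identical to an R-type, a decoder that already extracts the shamt field gets `ru` for free; only the funct value tells it to treat those bits as a register index.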
What it touches in the datapath. An instruction with three source operands, on a register file that originally had only two read ports, required threading a new operand all the way down the pipeline:

- ALU (`alu/alu.v`): a third input port `data_operandC` was added, and the op-select was widened from 3 to 4 bits to open a "high" function bank where funct `1000` selects clamp. The clamp itself is two combinational mux stages (clamp to the upper bound, then to the lower bound) using `$signed` comparisons, since the pan/tilt errors it bounds are signed. Its result is always one of its inputs, so it can never overflow and its overflow output is tied low.
- Register file: a third read port (port C) was added. It is clamp-only: for every other instruction it reads `$zero`, so it is inert.
- Pipeline: port C gets its own D/X latch, three-way forwarding (X→X and M→X bypass) mirroring ports A and B, and an extended load-use hazard check so a `clamp` immediately following a dependent `lw` stalls correctly.
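A behavioural Python model of the two mux stages can help when writing expected values for `clamp_test.s`. It assumes 32-bit registers and reinterprets each word as two's complement before comparing, mirroring the `$signed` comparisons; `to_signed32` and `alu_clamp` are illustrative names, not modules in the repo.

```python
"""Sketch: behavioural model of the ALU clamp path (two mux stages)."""

MASK32 = 0xFFFF_FFFF


def to_signed32(word: int) -> int:
    """Reinterpret a 32-bit register value as two's complement."""
    word &= MASK32
    return word - (1 << 32) if word & 0x8000_0000 else word


def alu_clamp(rs: int, rt: int, ru: int) -> int:
    """$rd = max($rt, min($ru, $rs)) with signed compares; returns a 32-bit word."""
    value, lo, hi = to_signed32(rs), to_signed32(rt), to_signed32(ru)
    stage1 = hi if value > hi else value    # first mux: clamp to the upper bound
    stage2 = lo if stage1 < lo else stage1  # second mux: clamp to the lower bound
    return stage2 & MASK32                  # always one of the inputs, so no overflow


if __name__ == "__main__":
    print(to_signed32(alu_clamp(0xFFFF_FFFD, 0, 10)))  # 0 (-3 raised to the lower bound)
    print(alu_clamp(20, 0, 10))                        # 10
```

With unsigned comparisons, `0xFFFFFFFD` (−3) would compare above an upper bound of 10 and come out as 10 instead of 0, which is why the signed compare matters for signed errors. The stage order also means that if the bounds are ever inverted (lower > upper), the lower bound wins.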
Why it exists. The integral pan controller in mips/full.s must keep the commanded servo duty inside [DUTY_MIN, DUTY_MAX]. Without clamp that bound is a multi-instruction blt/blt/j idiom that also flushes the pipeline on its branches. clamp collapses it into a single execute-stage cycle with no control-flow penalty. It is exercised end-to-end by mips/clamp_test.s (below/above/in-range/boundary/negative cases, including back-to-back clamps that stress the port-C bypass).
Software `clamp` is a guardrail, not the safety boundary: the hardware PWM clamp (below) is the real enforcement and cannot be bypassed by firmware.
The CPU has no special I/O instructions — every peripheral is reached with ordinary lw/sw. The 12-bit, word-addressed data space is split by address:
| Word addr | Name | Dir | Meaning |
|---|---|---|---|
| `0x000` | `blob_x` / target X | R | Target X from the packet parser |
| `0x001` | `blob_y` / target Y | R | Target Y from the packet parser |
| `0x002` | `valid` | R | Target-valid flag (bit 0) |
| `0x003` | `seq` | R | Packet sequence number (bits 7:0) |
| `0x004–0xEFF` | general RAM | R/W | Normal data memory |
| `0xF00` | `SERVO_PAN` | W | Pan PWM pulse width, in 100 MHz ticks |
| `0xF01` | `SERVO_TILT` | W | Tilt PWM pulse width, in 100 MHz ticks |
| `0xF02` | `LASER_EN` | W | Per-packet laser request (bit 0) |
| `0xF03` | `CTRL` | W | bit 0 = arm (set once at boot) |
Address constants live in mmio_regs.v and the map is documented at the top of sentinel_top.v. Two details worth knowing:
- Reads of words `0x000–0x003` are spoofed. `ingest_mux.v` intercepts those four addresses on the read path and returns live packet-parser fields instead of RAM, so the firmware reads fresh target data without a second RAM write port. Writes are routed by `dmem_mux.v`, which sends anything in page `0xF` to `mmio_regs` and everything else to RAM.
- Pan/tilt are crossed in firmware. On the bench rig the pan and tilt servo leads are physically swapped, so `full.s` deliberately writes the pan command to `0xF01` and tilt to `0xF00`. The hardware names are correct; the firmware compensates.
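The split between write routing and read interception fits in a few lines of Python. This sketch assumes the 12-bit word address is decoded on its top nibble, as in the memory map above. The `DataSpace` class is hypothetical, and having reads of the write-only MMIO page return 0 is an assumption the README does not specify.

```python
"""Sketch: dmem_mux write routing plus ingest_mux read interception."""

RAM_WORDS = 0xF00  # RAM covers word addresses 0x000..0xEFF


class DataSpace:
    def __init__(self):
        self.ram = [0] * RAM_WORDS
        self.mmio = {}  # 0xF00 SERVO_PAN, 0xF01 SERVO_TILT, 0xF02 LASER_EN, 0xF03 CTRL
        # Live fields driven by packet_parser.
        self.target_x = self.target_y = self.target_valid = self.seq = 0

    def write(self, addr: int, data: int) -> None:
        addr &= 0xFFF                 # 12-bit word address
        if addr >> 8 == 0xF:          # page 0xF goes to mmio_regs
            self.mmio[addr] = data
        else:                         # everything else goes to RAM
            self.ram[addr] = data

    def read(self, addr: int) -> int:
        addr &= 0xFFF
        spoofed = {
            0x000: self.target_x,
            0x001: self.target_y,
            0x002: self.target_valid & 0x1,
            0x003: self.seq & 0xFF,
        }
        if addr in spoofed:           # ingest_mux intercepts these reads
            return spoofed[addr]
        if addr >> 8 == 0xF:          # write-only MMIO page (assumed to read as 0)
            return 0
        return self.ram[addr]
```

One consequence the model makes visible: a `sw` to `0x001` still lands in RAM, but a later `lw` from `0x001` returns the parser's live Y value, so firmware variables belong at `0x004` and above.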
Power-on defaults: servos centered (150000 ticks), laser off, disarmed.
```
sentinel_top (main/sentinel_top/sentinel_top.v) — project top
├─ processor CPU                        (main/proc/processor.v) — 5-stage pipelined core
│  ├─ alu (+ clamp, port C)             (main/alu/alu.v)
│  ├─ regfile (3 read ports)            (main/regfile/regfile.v)
│  └─ multdiv                           (main/multdiv/multdiv.v)
├─ ROM InstMem (loads full.mem)         (main/proc/ROM.v)
├─ RAM ProcMem (0x000..0xEFF)           (main/proc/RAM.v)
├─ dmem_mux (write router)              (main/dmem_mux/dmem_mux.v)
├─ ingest_mux (read intercept)          (main/ingest_mux/ingest_mux.v)
├─ uart_rx (115200, 8-N-1)              (main/uart_rx/uart_rx.v)
├─ packet_parser (framing+checksum)     (main/packet_parser/packet_parser.v)
├─ uart_debug_tx (status telemetry)     (main/sentinel_top/uart_tx.v)
├─ mmio_regs (PAN/TILT/LASER/CTRL)      (main/mmio_regs/mmio_regs.v)
├─ pwm_gen ×2 (pan + tilt servo)        (main/pwm_gen/pwm_gen.v)
├─ watchdog (packet freshness)          (main/watchdog/watchdog.v)
├─ laser_gate (4-input safety AND)      (main/laser_gate/laser_gate.v)
└─ audio
   ├─ event_detect (triggers)           (main/audio/event_detect.v)
   ├─ sample_tick (8 kHz strobe)        (main/audio/sample_tick.v)
   ├─ audio_player (priority FSM)       (main/audio/audio_player.v)
   │  └─ audio_rom ×3 (startup/lockin/fire .mem)  (main/audio/audio_rom.v)
   └─ sigma_delta (1-bit DAC)           (main/audio/sigma_delta.v)
```
Wrapper.v in main/proc/ is the standalone ISA test harness (CPU + ROM + RAM + regfile only); sentinel_top.v is the real system top.
- `uart_rx` synchronizes the asynchronous RX line and recovers bytes with an `IDLE→START→DATA→STOP` FSM, sampling at mid-bit. At 100 MHz the divider is 868 ticks/bit (434 for the half-bit sample point), which gives ≈115200 baud.
- `packet_parser` runs an `IDLE→GOT_AA→PAYLOAD` FSM over the 9-byte frame, accumulates a running XOR, and on a valid checksum commits `target_x` / `target_y` / `target_valid` / `seq` and pulses `packet_ok` for one cycle. Corrupt or partial frames leave the previous values intact.
- `ingest_mux` exposes those fields to the CPU as read-only words `0x000–0x003`.
- The CPU runs `full.s`: it `lw`s the target, computes a servo command, and `sw`s the result to the MMIO page.
- `dmem_mux` routes the write to `mmio_regs`, which latches `pan_ticks` / `tilt_ticks` and fans them out to the two `pwm_gen` instances, and the laser/arm bits to `laser_gate`.
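The parser's framing logic can be sketched as a byte-at-a-time Python state machine, which is handy for generating testbench stimulus. The state names follow the description above; staying in `GOT_AA` on a repeated `0xAA`, and the `PacketParser` class itself, are assumptions rather than a transcription of `packet_parser.v`.

```python
"""Sketch: IDLE -> GOT_AA -> PAYLOAD framing with a running XOR checksum."""

IDLE, GOT_AA, PAYLOAD = range(3)
PAYLOAD_LEN = 7  # seq, flags, x_hi, x_lo, y_hi, y_lo, chk


class PacketParser:
    def __init__(self):
        self.state = IDLE
        self.buf = []
        self.xor = 0
        # Committed outputs; bad frames leave them untouched.
        self.target_x = self.target_y = self.target_valid = self.seq = 0

    def feed(self, byte: int) -> bool:
        """Consume one byte; return True only when a good frame commits (packet_ok)."""
        if self.state == IDLE:
            if byte == 0xAA:
                self.state = GOT_AA
        elif self.state == GOT_AA:
            if byte == 0x55:
                self.state, self.buf, self.xor = PAYLOAD, [], 0xAA ^ 0x55
            elif byte != 0xAA:        # assumed: a repeated 0xAA stays in GOT_AA
                self.state = IDLE
        else:  # PAYLOAD
            self.buf.append(byte)
            if len(self.buf) < PAYLOAD_LEN:
                self.xor ^= byte      # running XOR now covers bytes 0..len+1
                return False
            self.state = IDLE
            if byte != self.xor:      # checksum mismatch: drop the frame
                return False
            seq, flags, x_hi, x_lo, y_hi, y_lo, _ = self.buf
            self.seq = seq
            self.target_valid = flags & 0x1
            self.target_x = (x_hi << 8) | x_lo
            self.target_y = (y_hi << 8) | y_lo
            return True
        return False


if __name__ == "__main__":
    frame = [0xAA, 0x55, 0x05, 0x01, 0x00, 0xA0, 0x00, 0x78, 0x23]
    p = PacketParser()
    print([p.feed(b) for b in frame][-1], p.target_x, p.target_y)  # True 160 120
```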
A separate uart_debug_tx path streams parser status (sequence, valid, X/Y, error flags) back to the host for bring-up.
The laser MOSFET is driven by laser_gate.v, a registered AND of four independent conditions owned by four different layers — no single fault can fire the beam:
```
laser_pin = sw_enable      # LASER_EN[0] — per-packet request from firmware
          & ~stale         # watchdog says a fresh packet arrived recently
          & arm_bit        # CTRL[0] — boot-time arm latch (separate CPU action)
          & hw_interlock   # physical slide switch SW0, never routed through the CPU
```
The watchdog counts the 100 MHz clock and asserts stale after ~50 ms without a packet_ok. Crucially its counter starts saturated at reset, so the laser is locked out from power-on until the first valid packet — closing the "powered but no data yet" window. Only good packets reset it; a babbling host that sends garbage still trips stale.
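A cycle-level Python sketch of the watchdog and the four-input gate shows why the laser is locked out from power-on. It assumes the ~50 ms timeout is exactly 5,000,000 cycles of the 100 MHz clock and leaves out the output register in `laser_gate.v`; the `Watchdog` class and `laser_pin` function are illustrative only.

```python
"""Sketch: packet-freshness watchdog plus the 4-input laser AND."""

CLK_HZ = 100_000_000
TIMEOUT_TICKS = CLK_HZ // 1000 * 50  # ~50 ms -> 5,000,000 cycles (assumed exact value)


class Watchdog:
    def __init__(self, timeout: int = TIMEOUT_TICKS):
        self.timeout = timeout
        self.count = timeout  # starts saturated, so `stale` is true from reset

    @property
    def stale(self) -> bool:
        return self.count >= self.timeout

    def tick(self, packet_ok: bool) -> None:
        """Advance one clock cycle; only a good packet restarts the count."""
        if packet_ok:
            self.count = 0
        elif self.count < self.timeout:
            self.count += 1   # saturate instead of wrapping


def laser_pin(sw_enable: bool, stale: bool, arm_bit: bool, hw_interlock: bool) -> bool:
    """Combinational view of laser_gate (the real module registers this AND)."""
    return sw_enable and not stale and arm_bit and hw_interlock


if __name__ == "__main__":
    wd = Watchdog(timeout=3)                       # tiny timeout for the demo
    print(laser_pin(True, wd.stale, True, True))   # False: no packet since reset
    wd.tick(packet_ok=True)
    print(laser_pin(True, wd.stale, True, True))   # True: fresh, armed, SW0 on
```

Saturating rather than wrapping matters: a wrapping counter would briefly look "fresh" again after rolling over, even with no packets arriving.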
Independently, the pwm_gen modules hardware-clamp the commanded pulse width into a safe range before the comparator (pan 100000..200000, tilt 70000..190000 ticks — asymmetric because the gimbal arm hits the chassis). Even a firmware bug cannot aim the laser outside the allowed cone. This — not the software clamp instruction — is the true safety boundary.
Each pwm_gen produces a 50 Hz (20 ms = 2,000,000-tick) frame and holds the output high while a free-running counter is below the (clamped) commanded width:
| Pulse | Ticks @100 MHz | Servo angle |
|---|---|---|
| 1.0 ms | 100,000 | ≈ −45° |
| 1.5 ms | 150,000 | center |
| 2.0 ms | 200,000 | ≈ +45° |
The CPU writes the pulse width directly in ticks; output is registered and glitch-free, with new commands taking effect within one ≤20 ms frame.
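The tick arithmetic and the hardware clamp reduce to a few lines. This Python sketch uses the limits quoted above and assumes a strict compare (`counter < width`); the function names are hypothetical.

```python
"""Sketch: pwm_gen tick arithmetic and the safety clamp."""

CLK_HZ = 100_000_000
FRAME_TICKS = CLK_HZ // 50          # 20 ms frame -> 2,000,000 ticks
PAN_LIMITS = (100_000, 200_000)     # hardware-enforced pan range
TILT_LIMITS = (70_000, 190_000)     # asymmetric: the arm hits the chassis


def ms_to_ticks(ms: float) -> int:
    """Convert a pulse width in milliseconds to 100 MHz ticks."""
    return round(ms * CLK_HZ / 1000)


def clamp_width(cmd: int, limits: tuple) -> int:
    """Force the commanded width into the safe range before the comparator."""
    lo, hi = limits
    return max(lo, min(hi, cmd))


def pwm_out(counter: int, cmd: int, limits: tuple) -> bool:
    """Output level for one tick of the free-running 0..FRAME_TICKS-1 counter."""
    return (counter % FRAME_TICKS) < clamp_width(cmd, limits)


if __name__ == "__main__":
    print(FRAME_TICKS, ms_to_ticks(1.5))       # 2000000 150000
    print(clamp_width(250_000, PAN_LIMITS))    # 200000
    print(clamp_width(0, TILT_LIMITS))         # 70000
```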
event_detect.v turns live state into one-cycle triggers — a startup chirp on the arm edge, a fire cue on the laser-fire edge, and a debounced lock-in cue when a target is acquired. audio_player.v is a priority FSM (fire > lock-in > startup) that streams 8-bit PCM from three ROMs at an 8 kHz sample rate, through a sigma_delta modulator to the board's 1-bit mono audio output. The PCM ROM images (startup.mem, lockin.mem, fire.mem) are generated from source clips by scripts/wav_to_mem.py.
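A first-order sigma-delta modulator is essentially an accumulator whose carry-out is the 1-bit output, so the density of ones tracks the sample value. The Python sketch below assumes unsigned 8-bit PCM and a modulator clocked at the 100 MHz system clock (12,500 clocks per 8 kHz sample). The README does not say which modulator order or clock `sigma_delta.v` uses, so treat this as one plausible design.

```python
"""Sketch: first-order sigma-delta (accumulator carry-out) for 8-bit PCM."""

CLK_HZ = 100_000_000
SAMPLE_HZ = 8_000
CLOCKS_PER_SAMPLE = CLK_HZ // SAMPLE_HZ  # 12,500 modulator clocks per sample


def sigma_delta(samples, oversample=CLOCKS_PER_SAMPLE):
    """Yield one output bit per modulator clock for each 8-bit unsigned sample."""
    acc = 0  # 8-bit accumulator; the carry out of bit 7 is the output
    for sample in samples:
        if not 0 <= sample <= 255:
            raise ValueError("expected unsigned 8-bit PCM")
        for _ in range(oversample):
            acc += sample
            if acc >= 256:   # carry out: emit 1 and keep the low 8 bits
                acc -= 256
                yield 1
            else:
                yield 0


if __name__ == "__main__":
    bits = list(sigma_delta([64], oversample=256))
    print(sum(bits) / len(bits))  # 0.25, i.e. 64/256
```

The board's audio filter then low-pass filters this bitstream, recovering an analog level proportional to the fraction of ones.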
mips/full.s is a forever-polling control loop (no interrupts — the ISA has none, and the ≤30 Hz packet rate doesn't need them). It implements a search → track → settle → dwell → fire state machine rather than a naive proportional loop:
- Init: load gains and constants, center both servos, write `CTRL=1` to arm the laser gate, and seed the tracking state.
- Pace to the packet rate: spin until the packet `seq` changes, so the loop runs in lockstep with the 30 Hz UART feed. If the host goes silent the gimbal freezes and the watchdog handles the laser.
- No target (`valid==0`): sweep the pan servo in a triangle wave to search; if a target was just lost, hold position for up to `LOST_MAX` frames before falling back to the sweep.
- Target acquired: on the search→track edge, seed the tracked duty from the current sweep position so the barrel doesn't jump.
- Settle gate: after commanding a move, wait a few frames before measuring again, so corrections don't stack on an in-flight servo.
- Track: compute `err_x = CX − blob_x`, apply an integral pan controller (`duty += err_x * KI`), bound the result with the custom `clamp` instruction, and write it to the pan servo (address `0xF01` on the bench rig, because of the crossed servo leads).
- Lock & fire: once the error stays inside a tolerance band for `DWELL_N` consecutive frames, assert `LASER_EN`; otherwise keep it low.
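The state machine above is easier to reason about in a high-level language before hand-writing it in assembly. This Python sketch models one iteration per new packet `seq`. Every numeric constant is a hypothetical placeholder (the real values live in `mips/full.s`), `CX = 160` assumes the center of the 320-pixel-wide frame, and correcting the duty only outside the tolerance band is one reasonable reading of the dwell rule, not necessarily the firmware's exact logic.

```python
"""Sketch: search -> track -> settle -> dwell -> fire, one step per packet."""

# Hypothetical tuning constants (placeholders, not the values in full.s).
CX = 160                      # assumed frame-center X for a 320-px-wide image
KI = 20                       # integral gain, ticks per pixel of error
DUTY_MIN, DUTY_MAX = 100_000, 200_000
DUTY_CENTER = 150_000
SWEEP_STEP = 2_000            # ticks per frame while searching
TOL = 8                       # lock tolerance, pixels
DWELL_N = 5                   # consecutive in-band frames before firing
LOST_MAX = 10                 # frames to hold after losing the target
SETTLE_N = 2                  # frames to wait after each move


def clamp(value, lo, hi):
    """Same semantics as the custom instruction: max(lo, min(hi, value))."""
    return max(lo, min(hi, value))


class PanController:
    def __init__(self):
        self.duty = DUTY_CENTER
        self.sweep_dir = 1
        self.tracking = False
        self.lost = 0
        self.settle = 0
        self.dwell = 0

    def step(self, valid: bool, blob_x: int):
        """Return (pan_duty, laser_en) for one new packet."""
        if not valid:
            self.dwell = 0
            if self.tracking and self.lost < LOST_MAX:
                self.lost += 1                 # just lost: hold position
                return self.duty, 0
            self.tracking = False              # search: triangle-wave sweep
            self.duty += self.sweep_dir * SWEEP_STEP
            if not DUTY_MIN <= self.duty <= DUTY_MAX:
                self.sweep_dir = -self.sweep_dir
                self.duty = clamp(self.duty, DUTY_MIN, DUTY_MAX)
            return self.duty, 0
        if not self.tracking:                  # search -> track edge
            # duty already equals the sweep position, so nothing jumps
            self.tracking, self.settle, self.dwell = True, 0, 0
        self.lost = 0
        if self.settle > 0:                    # let the servo finish moving
            self.settle -= 1
            return self.duty, 0
        err_x = CX - blob_x
        if abs(err_x) <= TOL:
            self.dwell += 1
        else:
            self.dwell = 0
            self.duty = clamp(self.duty + err_x * KI, DUTY_MIN, DUTY_MAX)
            self.settle = SETTLE_N
        return self.duty, int(self.dwell >= DWELL_N)
```

The sign convention (positive `err_x` increases the duty) is taken from `duty += err_x * KI`; whether that pans left or right depends on how the servo is mounted.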
Tilt tracking is intentionally parked at center in this image so pan dynamics could be tuned in isolation. Other programs in mips/ are focused bring-up tests (laser_on.s, servo_*.s, clamp_test.s).
The host side lives in YOLO_vision/ and is orchestrated by main.py: capture a frame → detect → smooth → send. It is configured entirely through config.yaml.
The host and the FPGA packet_parser agree on a 9-byte, big-endian, XOR-checksummed frame (built in link/fpga_link.py; spec in docs/fpga-host-communication.md):
| Byte | Field | Notes |
|---|---|---|
| 0 | `0xAA` | sync 1 |
| 1 | `0x55` | sync 2 |
| 2 | `seq` | increments per packet, wraps 0–255 |
| 3 | `flags` | bit 0 = valid; others reserved |
| 4–5 | `x_hi`, `x_lo` | target X, big-endian |
| 6–7 | `y_hi`, `y_lo` | target Y, big-endian |
| 8 | `chk` | XOR of bytes 0–7 |
Link: 115200 baud, 8-N-1, 30 Hz. Coordinates are clamped to the 320×240 frame grid host-side. Miss policy: when no target is found, the host still sends a packet every cycle with valid=0 (rather than going silent) so the FPGA watchdog stays fed and only the valid bit toggles.
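Building a frame on the host side is short enough to show in full. This Python sketch is not the code in `link/fpga_link.py`: `build_frame` is a hypothetical name, and clamping to `0..319` / `0..239` is an assumed reading of "clamped to the 320×240 frame grid".

```python
"""Sketch: building one 9-byte host->FPGA frame."""


def build_frame(seq: int, valid: bool, x: int, y: int,
                width: int = 320, height: int = 240) -> bytes:
    """Return AA 55 seq flags x_hi x_lo y_hi y_lo chk (big-endian, XOR checksum)."""
    x = max(0, min(width - 1, int(x)))   # clamp to the frame grid (assumed 0..W-1)
    y = max(0, min(height - 1, int(y)))
    body = bytes([
        0xAA, 0x55,                      # sync bytes
        seq & 0xFF,                      # wraps 0..255
        0x01 if valid else 0x00,         # flags: bit 0 = valid
        (x >> 8) & 0xFF, x & 0xFF,
        (y >> 8) & 0xFF, y & 0xFF,
    ])
    chk = 0
    for b in body:                       # XOR of bytes 0..7
        chk ^= b
    return body + bytes([chk])


if __name__ == "__main__":
    print(build_frame(5, True, 160, 120).hex(" "))  # aa 55 05 01 00 a0 00 78 23
```

A serial library such as pyserial would then write these nine bytes to a port opened at 115200 baud. At 10 bits per byte on the wire (8-N-1), one frame takes 90 / 115200 ≈ 0.78 ms, so 30 frames per second use roughly 2.3% of the link.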
Detection is pluggable behind a small `Detector` protocol (`detectors/base.py`), selected by `detector.kind` in the config:

- Template matcher (`template_matcher.py`): multi-scale normalized cross-correlation against a captured template. No training, fast enough for 30+ fps on a Raspberry Pi; best for a fixed rigid target.
- YOLO (`yolo_detector.py`): Ultralytics YOLOv8-nano (`yolov8n.pt`), imported lazily so the template path needs no torch. Picks the highest-confidence box per frame; can be filtered to specific classes or pointed at fine-tuned weights.
Both feed a Smoother that applies an EMA to the centroid and an N-of-M temporal gate before a target is declared valid, and resets after a detection gap so re-acquisition doesn't drag from a stale position.
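A minimal version of that smoothing stage might look like the Python below. The parameters `alpha`, `n`, `m`, and `max_gap` are hypothetical stand-ins for whatever `config.yaml` exposes, and reporting `valid` only on frames that actually contain a detection is an assumption chosen to match the miss policy (a frame with no target is sent with `valid=0`).

```python
"""Sketch: EMA centroid smoothing with an N-of-M validity gate."""
from collections import deque
from typing import Optional, Tuple


class Smoother:
    def __init__(self, alpha: float = 0.5, n: int = 3, m: int = 5, max_gap: int = 5):
        self.alpha, self.n, self.max_gap = alpha, n, max_gap
        self.hits = deque(maxlen=m)  # detection present in each of the last M frames?
        self.gap = 0                 # consecutive frames without a detection
        self.est = None              # smoothed (x, y), or None after a reset

    def update(self, det: Optional[Tuple[float, float]]) -> Tuple[int, int, bool]:
        """Feed one frame's detection (or None); return (x, y, valid)."""
        self.hits.append(det is not None)
        if det is None:
            self.gap += 1
            if self.gap > self.max_gap:
                self.est = None      # forget the stale position
        else:
            self.gap = 0
            x, y = det
            if self.est is None:     # (re)acquisition: seed directly, no drag
                self.est = (float(x), float(y))
            else:                    # exponential moving average
                ex, ey = self.est
                self.est = (ex + self.alpha * (x - ex), ey + self.alpha * (y - ey))
        if self.est is None:
            return 0, 0, False
        valid = det is not None and sum(self.hits) >= self.n
        return round(self.est[0]), round(self.est[1]), valid
```

Seeding the estimate directly after a gap, instead of blending from the old value, is what keeps re-acquisition from dragging in from a stale position.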
Helper tools in YOLO_vision/tools/: calibrate_template.py (grab a template ROI from a live frame), live_preview.py (annotated preview, no UART), and host_replay.py (replay a canned coordinate sequence for FPGA bring-up without a camera).
arduino_vision/ holds two ESP32-CAM sketches that do detection on the microcontroller and emit packets directly — an earlier "smart sensor" approach that the host pipeline replaced:
- `esp_color_detection/`: ratiometric green-blob detection with centroid + temporal filtering.
- `radio_detection/`: an Edge Impulse on-device neural detector.

Note: both sketches emit an older 6-byte, single-`0xAA`, 9600-baud frame and are not wire-compatible with the current 9-byte parser. They predate the host-side rework and are kept for reference. In the current system the ESP32 runs the stock `CameraWebServer` MJPEG streamer and all detection happens on the host.
proc-toolchain/ is a self-contained build/test harness:
- Assembler (`assembler-python-version/`): assembles `.s` → `.mem`, including the custom `clamp` R4 instruction (defined in `instructions.csv`).
- Autotester (`autotester.py`): runs assembly programs through an Icarus Verilog simulation and diffs register/memory state against expected results in `test_files/`.
- Helper scripts (`helper_scripts/`): compilation, HTML report generation, a "banned Verilog" linter for the course's structural-only constraints, and more.
Hardware (FPGA):
- Assemble the firmware: `proc-toolchain/assembler-python-version/assemble.py mips/full.s` → `mips/full.mem`.
- In Vivado, target the Nexys A7-100T, add `proc-toolchain/main/**/*.v` and `main/sentinel_top/sentinel_top.xdc`, point the instruction ROM at `mips/full.mem`, and generate a bitstream.
- Program the board; flip the laser interlock switch (SW0) to enable firing.
Host vision:
```sh
cd YOLO_vision
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# edit config.yaml: set esp.stream_url to your ESP32-CAM, and link.port to your serial port
python main.py
```

Two layers, both driven by `scripts/run_sentinel_verification.sh`:
- ISA regression: a subset of the course's processor test suite, assembled and simulated, checked for the expected pass marker.
- System integration: `sentinel_top_tb.v` drives the whole datapath (UART → parser → MMIO → PWM/laser/watchdog) and checks end-to-end behavior. Module-level testbenches (`*_tb.v`) accompany most peripherals, and the audio chain has its own (`tb_audio_player.v`, `tb_event_detect.v`, `tb_sigma_delta.v`).
The base processor core descends from a Duke ECE 350 (Digital Systems) lab sequence (ALU, register file, mult/div, pipelined CPU); the assignment specs are preserved as the Checkpoint *.pdf files. Everything beyond the base CPU — the clamp ISA extension, the peripheral set, the safety architecture, the host vision pipeline, and the firmware — was designed for this project. Built by Nicolas Vasilescu and Luhan Wang.