Tonic4to/Sentinel-Vision-Platform


Sentinel Vision Platform

A vision-guided laser turret built on a custom MIPS-style soft-core processor running on an FPGA.

A host vision pipeline finds a target in a camera stream and streams its pixel coordinates over UART to a Digilent Nexys A7-100T (Artix-7). On the FPGA, a hand-built 5-stage pipelined processor — extended with a custom clamp instruction and a set of memory-mapped peripherals — aims a pan/tilt servo gimbal at the target, fires a laser, and plays audio cues, all behind a layered hardware safety interlock.

This was a semester project for an undergraduate computer-architecture course (Duke ECE 350). The processor core is a from-scratch MIPS-like CPU; everything around it — the ISA extension, the peripherals, the host vision system, and the firmware — was designed to turn that CPU into a working closed-loop targeting system.


What it does

   camera feed                target pixel (x, y)            servo + laser + audio
  ───────────────►  HOST  ──────────────────────────►  FPGA  ──────────────────────►  🎯
   (ESP32-CAM MJPEG)   detect + smooth     9-byte UART @115200    aim, fire, beep
  1. An ESP32-CAM streams MJPEG video over Wi-Fi.
  2. A host program (laptop / Raspberry Pi) detects the target with YOLO or template matching, smooths the centroid, and sends (x, y, valid) over a serial link at 30 Hz.
  3. The FPGA receives the packet, and the soft-core CPU runs a search → track → settle → dwell → fire control loop that drives the gimbal toward the target, fires the laser once it is locked and stable, and triggers audio feedback — never aiming or firing outside a hardware-enforced safe envelope.

System architecture

┌──────────────────────────────┐         ┌─────────────────────────────────────────────────────────┐
│            HOST PC            │         │                NEXYS A7-100T  (Artix-7 FPGA)              │
│                               │         │                                                           │
│  MjpegSource ── BGR frame ──► │  UART   │  uart_rx ─► packet_parser ─► ingest_mux ─► ┌───────────┐  │
│  Detector (YOLO / template)   │ 115200  │                                  (read)    │  MIPS CPU │  │
│  Smoother (EMA + N-of-M gate) │ ──────► │  (writes split by dmem_mux on addr nibble) │  + clamp  │  │
│  FpgaLink.send_target()       │ 9 bytes │                                            └─────┬─────┘  │
│       0xAA 0x55 seq flags     │ 30 Hz   │   mmio_regs ◄──────── sw 0xF00..0xF03 ────────────┘       │
│       x_hi x_lo y_hi y_lo chk │         │     │      │        │                                     │
└──────────────────────────────┘         │     ▼      ▼        ▼                                     │
                                          │  pwm_gen pwm_gen  laser_gate ◄── watchdog + HW interlock  │
                                          │  (pan)   (tilt)       │                                   │
                                          │     │      │          ▼                                   │
                                          │   servo  servo     laser MOSFET      audio: event_detect  │
                                          │                                       → audio_player       │
                                          │                                       → sigma_delta → spkr │
                                          └─────────────────────────────────────────────────────────┘

The FPGA design clocks every peripheral directly from the board's 100 MHz oscillator (the clock-wizard divider used in early bring-up was dropped from the final design).


Repository layout

| Path | Contents |
| --- | --- |
| proc-toolchain/ | Single source of truth for the hardware. All Verilog RTL (main/), testbenches, the custom assembler, and the verification/automation scripts. |
| proc-toolchain/main/ | The synthesizable modules: proc/ (CPU), alu/, regfile/, multdiv/, pwm_gen/, laser_gate/, watchdog/, mmio_regs/, dmem_mux/, ingest_mux/, packet_parser/, uart_rx/, audio/, and the sentinel_top/ integration top. |
| mips/ | Assembly control programs (.s) and assembled memory images (.mem). full.s is the production firmware; the rest are bring-up/diagnostic programs. |
| YOLO_vision/ | Host-side Python vision pipeline: capture, detectors, smoothing, and the serial link to the FPGA. |
| arduino_vision/ | ESP32-CAM sketches for the legacy on-camera detection path. |
| servodebug/ | Standalone Python utilities for finding servo travel limits and centering. |
| docs/ | ISA tables, host↔FPGA protocol spec, design notes, board schematic, and the original project proposal. |
| scripts/ | wav_to_mem.py (audio asset compiler), the verification runner, and helpers. |

The hardware is built and simulated from proc-toolchain/main. To open it as a Vivado project, create a fresh project targeting the Nexys A7-100T (xc7a100tcsg324-1), add the main/**/*.v sources plus main/sentinel_top/sentinel_top.xdc as constraints, and load mips/full.mem as the instruction ROM image.


The processor: custom ISA

The CPU is a 5-stage pipelined, 32-bit MIPS-style core (fetch / decode / execute / memory / writeback) with full data-forwarding and load-use hazard detection. It is implemented in proc-toolchain/main/proc/processor.v, with the datapath wired together at proc-toolchain/main/sentinel_top/sentinel_top.v.

Base ISA

The base instruction set follows the Duke ECE 350 MIPS-like ISA (full tables in docs/ece350_isa_tables.md). Instructions are 32 bits with the opcode in bits [31:27]; R-type instructions additionally carry an ALU function code (ALUop) in bits [6:2].

| Format | Instructions |
| --- | --- |
| R-type | add sub and or sll sra mul div |
| I-type | addi sw lw bne blt bex setx |
| J-type | j jal jr |

Custom instruction: clamp

The one ISA extension beyond the base set is a signed saturating clamp — the feature behind several of this repo's commit messages ("Clamp instruction finally working").

clamp $rd, $rs, $rt, $ru     # $rd = max($rt, min($ru, $rs))   (signed)
                             #   $rs = value,  $rt = lower bound,  $ru = upper bound

| Property | Value |
| --- | --- |
| Mnemonic | clamp $rd, $rs, $rt, $ru |
| Format | Custom R4 (four register operands) |
| Opcode [31:27] | 00000 (shares the R-type primary opcode) |
| Funct [6:2] | 01000 (first free function slot) |
| Semantics | $rd = max($rt, min($ru, $rs)), signed |

The encoding trick — a fourth register with no spare bits. A standard R-type has only three register fields plus a 5-bit shift-amount literal. clamp needs four registers (dest + value + lo + hi). Rather than grow the instruction word, it repurposes the shamt field [11:7] as a third source-register index ($ru). The field layout becomes opcode | rd | rs | rt | ru | funct — bit-identical to an R-type, but the shamt slot now names a register. See docs/ece350_isa_tables.md and the assembler at proc-toolchain/assembler-python-version/ (instructions.csv: clamp,R4,01000).
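
To make the field packing concrete, here is a minimal Python sketch of how a clamp word could be assembled. The positions of opcode, shamt/ru, and funct come from the text above; the rd/rs/rt positions ([26:22], [21:17], [16:12]) are inferred from the contiguous 5-bit layout, and the function names are hypothetical rather than taken from the repo's assembler.

```python
# Hypothetical encoder for the R4 clamp format (not the repo's assembler code).
# Bits [31:27] opcode, [11:7] ru (the old shamt slot), [6:2] funct are from the README;
# rd [26:22], rs [21:17], rt [16:12] are inferred from the field order.
CLAMP_OPCODE = 0b00000
CLAMP_FUNCT = 0b01000


def encode_clamp(rd: int, rs: int, rt: int, ru: int) -> int:
    """Pack four register indices into one 32-bit clamp instruction word."""
    for reg in (rd, rs, rt, ru):
        if not 0 <= reg < 32:
            raise ValueError(f"register index out of range: {reg}")
    return (
        (CLAMP_OPCODE << 27)
        | (rd << 22)
        | (rs << 17)
        | (rt << 12)
        | (ru << 7)       # a register index where a shift amount would normally sit
        | (CLAMP_FUNCT << 2)
    )


def decode_fields(word: int) -> dict:
    """Split a 32-bit word back into its 5-bit fields."""
    return {
        "opcode": (word >> 27) & 0x1F,
        "rd": (word >> 22) & 0x1F,
        "rs": (word >> 17) & 0x1F,
        "rt": (word >> 12) & 0x1F,
        "ru": (word >> 7) & 0x1F,
        "funct": (word >> 2) & 0x1F,
    }


if __name__ == "__main__":
    word = encode_clamp(rd=1, rs=2, rt=3, ru=4)
    print(f"{word:08x}", decode_fields(word))
```

Because the word is bit-identical to an R-type, a decoder that ignores funct would read the ru field as a shift amount, which is why the function code is what distinguishes clamp.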

What it touches in the datapath. clamp reads three source registers, but the base register file has only two read ports, so the third operand had to be threaded all the way down the pipeline:

  • ALU (alu/alu.v) — a third input port data_operandC was added, and the op-select was widened from 3 to 4 bits to open a "high" function bank where funct 1000 selects clamp. The clamp itself is two combinational mux stages: clamp to the upper bound, then to the lower bound, using $signed comparisons (the pan/tilt errors it bounds are signed). It can only shrink a value, so its overflow output is tied low.
  • Register file — a third read port (port C) was added. It is clamp-only: for every other instruction it reads $zero, so it is inert.
  • Pipeline — port C gets its own D/X latch, three-way forwarding (X→X and M→X bypass) mirroring ports A and B, and an extended load-use hazard check so a clamp immediately following a dependent lw stalls correctly.
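
The two mux stages described in the ALU bullet can be mirrored in a small software reference model, which is handy when writing expected values for a test like clamp_test.s. This is a sketch only: the helper names are hypothetical, and it assumes operands are 32-bit two's-complement words.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(word: int) -> int:
    """Interpret a 32-bit word as a two's-complement signed integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def clamp_ref(value: int, lo: int, hi: int) -> int:
    """Reference model of clamp: max(lo, min(hi, value)) with signed compares."""
    v, lo_s, hi_s = to_signed32(value), to_signed32(lo), to_signed32(hi)
    upper = hi_s if v > hi_s else v            # stage 1: clamp to the upper bound
    result = lo_s if upper < lo_s else upper   # stage 2: clamp to the lower bound
    return result & MASK32                     # back to a raw 32-bit word


if __name__ == "__main__":
    print(clamp_ref(250_000, 100_000, 200_000))                      # above range
    print(to_signed32(clamp_ref(-20 & MASK32, -10 & MASK32, 10)))    # negative case
```

The result is always one of the three inputs, which matches the observation that the operation cannot overflow.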

Why it exists. The integral pan controller in mips/full.s must keep the commanded servo duty inside [DUTY_MIN, DUTY_MAX]. Without clamp that bound is a multi-instruction blt/blt/j idiom that also flushes the pipeline on its branches. clamp collapses it into a single execute-stage cycle with no control-flow penalty. It is exercised end-to-end by mips/clamp_test.s (below/above/in-range/boundary/negative cases, including back-to-back clamps that stress the port-C bypass).

Software clamp is a guardrail, not the safety boundary — the hardware PWM clamp (below) is the real enforcement and cannot be bypassed by firmware.

Memory-mapped I/O

The CPU has no special I/O instructions — every peripheral is reached with ordinary lw/sw. The 12-bit, word-addressed data space is split by address:

| Word addr | Name | Dir | Meaning |
| --- | --- | --- | --- |
| 0x000 | blob_x / target X | R | Target X from the packet parser |
| 0x001 | blob_y / target Y | R | Target Y from the packet parser |
| 0x002 | valid | R | Target-valid flag (bit 0) |
| 0x003 | seq | R | Packet sequence number (bits 7:0) |
| 0x004..0xEFF | general RAM | R/W | Normal data memory |
| 0xF00 | SERVO_PAN | W | Pan PWM pulse width, in 100 MHz ticks |
| 0xF01 | SERVO_TILT | W | Tilt PWM pulse width, in ticks |
| 0xF02 | LASER_EN | W | Per-packet laser request (bit 0) |
| 0xF03 | CTRL | W | bit 0 = arm (set once at boot) |

Address constants live in mmio_regs.v and the map is documented at the top of sentinel_top.v. Two details worth knowing:

  • Reads of words 0x000..0x003 are spoofed. ingest_mux.v intercepts those four addresses on the read path and returns live packet-parser fields instead of RAM, so the firmware reads fresh target data without a second RAM write port. Writes are routed by dmem_mux.v, which sends anything in page 0xF to mmio_regs and everything else to RAM.
  • Pan/tilt are crossed in firmware. On the bench rig the pan and tilt servo leads are physically swapped, so full.s deliberately writes the pan command to 0xF01 and tilt to 0xF00. The hardware names are correct; the firmware compensates.
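
The read and write routing described above can be summarized as two small decode functions. This is an illustrative sketch, not a transcription of ingest_mux.v or dmem_mux.v; the function names are hypothetical, and what a read of page 0xF returns is not stated in this README, so the sketch marks it as undocumented.

```python
INGEST_LAST = 0x003   # words 0x000..0x003 are intercepted on reads
MMIO_PAGE = 0xF       # top address nibble that selects mmio_regs on writes


def route_write(addr: int) -> str:
    """Where a sw to a 12-bit word address lands."""
    addr &= 0xFFF
    return "mmio_regs" if (addr >> 8) == MMIO_PAGE else "ram"


def route_read(addr: int) -> str:
    """Where a lw from a 12-bit word address is served from."""
    addr &= 0xFFF
    if addr <= INGEST_LAST:
        return "packet_parser"      # live target fields, not RAM contents
    if (addr >> 8) == MMIO_PAGE:
        return "undocumented"       # registers are listed as write-only
    return "ram"


if __name__ == "__main__":
    for a in (0x000, 0x003, 0x004, 0xEFF, 0xF00, 0xF03):
        print(f"{a:#05x}: read <- {route_read(a)}, write -> {route_write(a)}")
```

Note the asymmetry: a write to 0x000..0x003 still goes to RAM, but a later read of the same word returns the parser field instead.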

Power-on defaults: servos centered (150000 ticks), laser off, disarmed.


Hardware control flow

Module hierarchy

sentinel_top                          (main/sentinel_top/sentinel_top.v)   — project top
├─ processor  CPU                     (main/proc/processor.v)              — 5-stage pipelined core
│   ├─ alu        (+ clamp, port C)   (main/alu/alu.v)
│   ├─ regfile    (3 read ports)      (main/regfile/regfile.v)
│   └─ multdiv                        (main/multdiv/multdiv.v)
├─ ROM   InstMem  (loads full.mem)    (main/proc/ROM.v)
├─ RAM   ProcMem  (0x000..0xEFF)      (main/proc/RAM.v)
├─ dmem_mux       (write router)      (main/dmem_mux/dmem_mux.v)
├─ ingest_mux     (read intercept)    (main/ingest_mux/ingest_mux.v)
├─ uart_rx        (115200, 8-N-1)     (main/uart_rx/uart_rx.v)
├─ packet_parser  (framing+checksum)  (main/packet_parser/packet_parser.v)
├─ uart_debug_tx  (status telemetry)  (main/sentinel_top/uart_tx.v)
├─ mmio_regs      (PAN/TILT/LASER/CTRL)(main/mmio_regs/mmio_regs.v)
├─ pwm_gen ×2     (pan + tilt servo)  (main/pwm_gen/pwm_gen.v)
├─ watchdog       (packet freshness)  (main/watchdog/watchdog.v)
├─ laser_gate     (4-input safety AND)(main/laser_gate/laser_gate.v)
└─ audio
    ├─ event_detect   (triggers)      (main/audio/event_detect.v)
    ├─ sample_tick    (8 kHz strobe)  (main/audio/sample_tick.v)
    ├─ audio_player   (priority FSM)  (main/audio/audio_player.v)
    │   └─ audio_rom ×3 (startup/lockin/fire .mem)  (main/audio/audio_rom.v)
    └─ sigma_delta    (1-bit DAC)     (main/audio/sigma_delta.v)

Wrapper.v in main/proc/ is the standalone ISA test harness (CPU + ROM + RAM + regfile only); sentinel_top.v is the real system top.

Target → servo data path

  1. uart_rx synchronizes the async RX line and recovers bytes with an IDLE→START→DATA→STOP FSM, sampling at mid-bit. At 100 MHz the divider is 868 ticks/bit (434 for the half-bit sample point) → 115200 baud.
  2. packet_parser runs an IDLE→GOT_AA→PAYLOAD FSM over the 9-byte frame, accumulates a running XOR, and on a valid checksum commits target_x / target_y / target_valid / seq and pulses packet_ok for one cycle. Corrupt or partial frames leave the previous values intact.
  3. ingest_mux exposes those fields to the CPU as read-only words 0x000..0x003.
  4. The CPU runs full.s: it lws the target, computes a servo command, and sws the result to the MMIO page.
  5. dmem_mux routes the write to mmio_regs, which latches pan_ticks / tilt_ticks and fans them out to the two pwm_gen instances and the laser/arm bits to laser_gate.

A separate uart_debug_tx path streams parser status (sequence, valid, X/Y, error flags) back to the host for bring-up.

Layered laser safety

The laser MOSFET is driven by laser_gate.v, a registered AND of four independent conditions owned by four different layers — no single fault can fire the beam:

laser_pin = sw_enable      # LASER_EN[0]  — per-packet request from firmware
          & ~stale         # watchdog says a fresh packet arrived recently
          & arm_bit        # CTRL[0]      — boot-time arm latch (separate CPU action)
          & hw_interlock   # physical slide switch SW0, never routed through the CPU

The watchdog counts the 100 MHz clock and asserts stale after ~50 ms without a packet_ok. Crucially its counter starts saturated at reset, so the laser is locked out from power-on until the first valid packet — closing the "powered but no data yet" window. Only good packets reset it; a babbling host that sends garbage still trips stale.
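
A behavioral model helps show why the saturated-at-reset counter matters. The sketch below is a Python approximation, not the RTL: the class and function names are hypothetical, and the exact threshold (taken here as 50 ms of 100 MHz ticks) is an assumption based on the "~50 ms" figure.

```python
CLK_HZ = 100_000_000
STALE_TICKS = CLK_HZ * 50 // 1000   # assumed threshold: 50 ms = 5,000,000 ticks


class Watchdog:
    """Packet-freshness timer whose counter starts saturated."""

    def __init__(self) -> None:
        self.count = STALE_TICKS            # saturated at reset: stale from power-on

    def advance(self, ticks: int, packet_ok: bool = False) -> None:
        """Advance time; only a checksum-valid packet clears the counter."""
        if packet_ok:
            self.count = 0
        else:
            self.count = min(STALE_TICKS, self.count + ticks)

    @property
    def stale(self) -> bool:
        return self.count >= STALE_TICKS


def laser_pin(sw_enable: bool, stale: bool, arm_bit: bool, hw_interlock: bool) -> bool:
    """Four-input safety AND from the laser_gate description."""
    return bool(sw_enable and not stale and arm_bit and hw_interlock)


if __name__ == "__main__":
    wd = Watchdog()
    print(laser_pin(True, wd.stale, True, True))   # locked out before the first packet
    wd.advance(1, packet_ok=True)
    print(laser_pin(True, wd.stale, True, True))   # fresh packet: all four agree
```

In this model a firmware request alone is never sufficient: clearing any one of the four inputs forces the output low.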

Independently, the pwm_gen modules hardware-clamp the commanded pulse width into a safe range before the comparator (pan 100000..200000, tilt 70000..190000 ticks — asymmetric because the gimbal arm hits the chassis). Even a firmware bug cannot aim the laser outside the allowed cone. This — not the software clamp instruction — is the true safety boundary.
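
The per-axis limits above amount to a lookup plus a saturate. A minimal Python model follows, using the tick ranges quoted in the paragraph; the function and table names are hypothetical.

```python
# Safe pulse-width ranges in 100 MHz ticks, as quoted above.
SAFE_RANGE = {
    "pan": (100_000, 200_000),   # 1.0 ms .. 2.0 ms
    "tilt": (70_000, 190_000),   # 0.7 ms .. 1.9 ms
}


def hw_clamp(axis: str, ticks: int) -> int:
    """Saturate a commanded pulse width into the axis's safe range."""
    lo, hi = SAFE_RANGE[axis]
    return max(lo, min(hi, ticks))


if __name__ == "__main__":
    print(hw_clamp("pan", 250_000))   # a runaway command is pulled back to the limit
    print(hw_clamp("tilt", 0))
```

Since this saturation sits after mmio_regs, it applies to whatever value the firmware writes, including values produced by a buggy control loop.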

Servo PWM

Each pwm_gen produces a 50 Hz (20 ms = 2,000,000-tick) frame and holds the output high while a free-running counter is below the (clamped) commanded width:

| Pulse | Ticks @ 100 MHz | Servo angle |
| --- | --- | --- |
| 1.0 ms | 100,000 | ≈ −45° |
| 1.5 ms | 150,000 | center |
| 2.0 ms | 200,000 | ≈ +45° |

The CPU writes the pulse width directly in ticks; output is registered and glitch-free, with new commands taking effect within one ≤20 ms frame.
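
The tick values in the table follow directly from the clock rate, and the comparator rule is a one-liner. The sketch below reproduces both; the helper names are hypothetical, and it models only the counter comparison, not the output register.

```python
CLK_HZ = 100_000_000
FRAME_TICKS = CLK_HZ // 50        # 50 Hz frame = 2,000,000 ticks


def ms_to_ticks(pulse_ms: float) -> int:
    """Convert a pulse width in milliseconds to 100 MHz clock ticks."""
    return round(pulse_ms * CLK_HZ / 1000)


def pwm_high(counter: int, width_ticks: int) -> bool:
    """Output is high while the free-running frame counter is below the width."""
    return (counter % FRAME_TICKS) < width_ticks


def duty_cycle(width_ticks: int) -> float:
    """Fraction of the 20 ms frame spent high."""
    return width_ticks / FRAME_TICKS


if __name__ == "__main__":
    for ms in (1.0, 1.5, 2.0):
        t = ms_to_ticks(ms)
        print(f"{ms} ms -> {t} ticks, duty {duty_cycle(t):.1%}")
```

A centered servo therefore corresponds to a 7.5 % duty cycle, and the full ±45° sweep spans only 5 % to 10 %.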

Audio subsystem

event_detect.v turns live state into one-cycle triggers — a startup chirp on the arm edge, a fire cue on the laser-fire edge, and a debounced lock-in cue when a target is acquired. audio_player.v is a priority FSM (fire > lock-in > startup) that streams 8-bit PCM from three ROMs at an 8 kHz sample rate, through a sigma_delta modulator to the board's 1-bit mono audio output. The PCM ROM images (startup.mem, lockin.mem, fire.mem) are generated from source clips by scripts/wav_to_mem.py.
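
For readers unfamiliar with 1-bit DACs, the sketch below shows one common way to turn an 8-bit sample into a bitstream whose average density tracks the sample value. It is a first-order, accumulator-carry modulator and assumes unsigned PCM; the README does not state the order or sample format used by sigma_delta.v, so treat this as an illustration of the technique rather than a model of that module.

```python
def sigma_delta_bits(sample: int, n_cycles: int) -> list:
    """First-order modulator: emit the carry of an 8-bit accumulator each cycle."""
    if not 0 <= sample <= 255:
        raise ValueError("sample must be an unsigned 8-bit value")
    acc = 0
    bits = []
    for _ in range(n_cycles):
        acc = (acc & 0xFF) + sample   # 9-bit sum; bit 8 is the carry-out
        bits.append(acc >> 8)         # carry-out is the 1-bit output
    return bits


if __name__ == "__main__":
    stream = sigma_delta_bits(64, 256)
    print(sum(stream), "ones in", len(stream), "cycles")
```

Over any 256 consecutive cycles the number of ones equals the sample value, so a simple analog low-pass on the output pin recovers the audio level.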


Software control flow (full.s)

mips/full.s is a forever-polling control loop (no interrupts — the ISA has none, and the ≤30 Hz packet rate doesn't need them). It implements a search → track → settle → dwell → fire state machine rather than a naive proportional loop:

  1. Init — load gains and constants, center both servos, write CTRL=1 to arm the laser gate, and seed the tracking state.
  2. Pace to the packet rate — spin until the packet seq changes, so the loop runs in lockstep with the 30 Hz UART feed. If the host goes silent the gimbal freezes and the watchdog handles the laser.
  3. No target (valid==0) — sweep the pan servo in a triangle wave to search; if a target was just lost, hold position for up to LOST_MAX frames before falling back to sweep.
  4. Target acquired — on the search→track edge, seed the tracked duty from the current sweep position so the barrel doesn't jump.
  5. Settle gate — after commanding a move, wait a few frames before measuring again, so corrections don't stack on an in-flight servo.
  6. Track — compute err_x = CX − blob_x, apply an integral pan controller (duty += err_x * KI), bound it with the custom clamp instruction, and write it to SERVO_PAN.
  7. Lock & fire — once the error is inside a tolerance band for DWELL_N consecutive frames, assert LASER_EN; otherwise keep it low.
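
Steps 6 and 7 can be sketched in Python as a single per-packet update. Every constant below (CX, KI, TOL, DWELL_N, and the duty limits) is a placeholder chosen for illustration; the real values live in mips/full.s and are not given in this README. The search, lost-target, and settle stages are omitted.

```python
from dataclasses import dataclass

# Placeholder constants; the production values are defined in mips/full.s.
CX = 160                        # assumed image-center X for a 320-pixel-wide frame
KI = 20                         # assumed integral gain, ticks per pixel of error
TOL = 8                         # assumed lock tolerance in pixels
DWELL_N = 5                     # assumed consecutive in-tolerance frames before firing
DUTY_MIN, DUTY_MAX = 100_000, 200_000


@dataclass
class TrackState:
    duty: int = 150_000         # start centered
    dwell: int = 0              # consecutive frames inside the tolerance band


def track_step(state: TrackState, blob_x: int):
    """One tracking update: integrate the error, clamp, and evaluate the dwell gate."""
    err_x = CX - blob_x
    # Same computation the clamp instruction performs in one cycle.
    state.duty = max(DUTY_MIN, min(DUTY_MAX, state.duty + err_x * KI))
    state.dwell = state.dwell + 1 if abs(err_x) <= TOL else 0
    laser_en = state.dwell >= DWELL_N
    return state.duty, laser_en


if __name__ == "__main__":
    s = TrackState()
    for x in (100, 130, 155, 160, 160, 160, 160, 160):
        print(x, track_step(s, x))
```

Resetting the dwell counter on any out-of-tolerance frame is what makes the laser request depend on a stable lock rather than a single lucky sample.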

Tilt tracking is intentionally parked at center in this image so pan dynamics could be tuned in isolation. Other programs in mips/ are focused bring-up tests (laser_on.s, servo_*.s, clamp_test.s).


Host vision pipeline

The host side lives in YOLO_vision/ and is orchestrated by main.py: capture a frame → detect → smooth → send. It is configured entirely through config.yaml.

Serial packet format

The host and the FPGA packet_parser agree on a 9-byte, big-endian, XOR-checksummed frame (built in link/fpga_link.py; spec in docs/fpga-host-communication.md):

| Byte | Field | Notes |
| --- | --- | --- |
| 0 | 0xAA | sync 1 |
| 1 | 0x55 | sync 2 |
| 2 | seq | increments per packet, wraps 0–255 |
| 3 | flags | bit 0 = valid; others reserved |
| 4–5 | x_hi, x_lo | target X, big-endian |
| 6–7 | y_hi, y_lo | target Y, big-endian |
| 8 | chk | XOR of bytes 0–7 |

Link: 115200 baud, 8-N-1, 30 Hz. Coordinates are clamped to the 320×240 frame grid host-side. Miss policy: when no target is found, the host still sends a packet every cycle with valid=0 (rather than going silent) so the FPGA watchdog stays fed and only the valid bit toggles.
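
The frame layout is small enough to show both ends in one sketch: a builder for the host side and a byte-at-a-time model of an IDLE → GOT_AA → PAYLOAD parser. Neither is copied from link/fpga_link.py or packet_parser.v; the names are hypothetical, and the choice to stay in GOT_AA on a repeated 0xAA is an assumption about resynchronization.

```python
SYNC1, SYNC2 = 0xAA, 0x55
FRAME_LEN = 9


def xor_all(data) -> int:
    chk = 0
    for b in data:
        chk ^= b
    return chk


def build_packet(seq: int, x: int, y: int, valid: bool) -> bytes:
    """Build the 9-byte frame: sync, seq, flags, big-endian X/Y, XOR checksum."""
    body = bytes([SYNC1, SYNC2, seq & 0xFF, 1 if valid else 0,
                  (x >> 8) & 0xFF, x & 0xFF, (y >> 8) & 0xFF, y & 0xFF])
    return body + bytes([xor_all(body)])


class PacketParser:
    """Streaming framing model; keeps the last good target on a bad frame."""

    def __init__(self) -> None:
        self.state = "IDLE"
        self.buf = []
        self.target = None          # last committed (seq, valid, x, y)

    def feed(self, byte: int) -> bool:
        """Consume one byte; return True on the byte that completes a valid frame."""
        if self.state == "IDLE":
            if byte == SYNC1:
                self.state = "GOT_AA"
            return False
        if self.state == "GOT_AA":
            if byte == SYNC2:
                self.state, self.buf = "PAYLOAD", [SYNC1, SYNC2]
            elif byte != SYNC1:     # assumed: a repeated 0xAA keeps waiting for 0x55
                self.state = "IDLE"
            return False
        self.buf.append(byte)
        if len(self.buf) < FRAME_LEN:
            return False
        self.state = "IDLE"
        if xor_all(self.buf[:8]) != self.buf[8]:
            return False            # corrupt frame: previous target stays intact
        seq, flags, xh, xl, yh, yl = self.buf[2:8]
        self.target = (seq, bool(flags & 1), (xh << 8) | xl, (yh << 8) | yl)
        return True


if __name__ == "__main__":
    parser = PacketParser()
    pkt = build_packet(seq=1, x=160, y=120, valid=True)
    print(pkt.hex(), [parser.feed(b) for b in pkt][-1], parser.target)
```

Feeding a frame with a flipped payload byte leaves parser.target unchanged, which mirrors the "corrupt or partial frames leave the previous values intact" behavior described for the FPGA side.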

Detectors

Detection is pluggable behind a small Detector protocol (detectors/base.py), selected by detector.kind in the config:

  • Template matcher (template_matcher.py) — multi-scale normalized cross-correlation against a captured template. No training, fast enough for 30+ fps on a Raspberry Pi; best for a fixed rigid target.
  • YOLO (yolo_detector.py) — Ultralytics YOLOv8-nano (yolov8n.pt), imported lazily so the template path needs no torch. Picks the highest-confidence box per frame; can be filtered to specific classes or pointed at a fine-tuned weight.
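
A pluggable detector interface of this kind is often expressed with typing.Protocol. The sketch below is a guess at the general shape, not the contents of detectors/base.py: the detect method name, its return type, and the box tuple layout are all hypothetical. It also shows the "highest-confidence box" selection in isolation, with no YOLO dependency.

```python
from typing import Optional, Protocol, Sequence, Tuple

# Hypothetical box layout: (x1, y1, x2, y2, confidence).
Box = Tuple[float, float, float, float, float]
Centroid = Tuple[int, int]


class Detector(Protocol):
    """Assumed interface: take a frame, return a target centroid or None."""

    def detect(self, frame) -> Optional[Centroid]:
        ...


def best_centroid(boxes: Sequence[Box]) -> Optional[Centroid]:
    """Pick the highest-confidence box and return its center, or None if empty."""
    if not boxes:
        return None
    x1, y1, x2, y2, _conf = max(boxes, key=lambda b: b[4])
    return int((x1 + x2) / 2), int((y1 + y2) / 2)


class FixedBoxDetector:
    """Toy detector that satisfies the protocol by replaying canned boxes."""

    def __init__(self, boxes: Sequence[Box]) -> None:
        self.boxes = boxes

    def detect(self, frame) -> Optional[Centroid]:
        return best_centroid(self.boxes)


if __name__ == "__main__":
    det: Detector = FixedBoxDetector([(0, 0, 10, 10, 0.3), (100, 50, 140, 90, 0.9)])
    print(det.detect(frame=None))
```

Structural typing means a new detector only needs a matching detect method; it does not have to inherit from a base class.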

Both feed a Smoother that applies an EMA to the centroid and an N-of-M temporal gate before a target is declared valid, and resets after a detection gap so re-acquisition doesn't drag from a stale position.
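
The smoothing stage combines three ideas: an exponential moving average, an N-of-M validity gate, and a reset after a long gap. A compact sketch follows; the class layout, parameter names, and default values are assumptions for illustration and are not taken from the repo's Smoother or config.yaml.

```python
from collections import deque


class Smoother:
    """EMA on the centroid plus an N-of-M temporal gate (illustrative defaults)."""

    def __init__(self, alpha: float = 0.4, n: int = 3, m: int = 5, max_gap: int = 10):
        self.alpha = alpha
        self.n = n
        self.max_gap = max_gap
        self.hits = deque(maxlen=m)   # detection history for the last M frames
        self.pos = None               # smoothed (x, y), or None when reset
        self.gap = 0                  # consecutive frames without a detection

    def update(self, detection):
        """detection is an (x, y) tuple or None; returns (x, y, valid)."""
        self.hits.append(detection is not None)
        if detection is None:
            self.gap += 1
            if self.gap > self.max_gap:
                self.pos = None       # forget the stale position
        else:
            self.gap = 0
            if self.pos is None:
                self.pos = detection  # re-seed directly instead of dragging
            else:
                a = self.alpha
                self.pos = (a * detection[0] + (1 - a) * self.pos[0],
                            a * detection[1] + (1 - a) * self.pos[1])
        valid = self.pos is not None and sum(self.hits) >= self.n
        x, y = self.pos if self.pos is not None else (0, 0)
        return round(x), round(y), valid


if __name__ == "__main__":
    s = Smoother(alpha=0.5, n=2, m=3, max_gap=2)
    for det in [(100, 100), (120, 100), None, None, None, (200, 50)]:
        print(det, "->", s.update(det))
```

In this version a single missed frame does not immediately clear valid as long as N of the last M frames had detections, which trades a little latency for fewer spurious drop-outs.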

Helper tools in YOLO_vision/tools/: calibrate_template.py (grab a template ROI from a live frame), live_preview.py (annotated preview, no UART), and host_replay.py (replay a canned coordinate sequence for FPGA bring-up without a camera).

On-camera (ESP32) detection — legacy path

arduino_vision/ holds two ESP32-CAM sketches that do detection on the microcontroller and emit packets directly — an earlier "smart sensor" approach that the host pipeline replaced:

  • esp_color_detection/ — ratiometric green-blob detection with centroid + temporal filtering.
  • radio_detection/ — an Edge Impulse on-device neural detector.

Note: both sketches emit an older 6-byte / single-0xAA / 9600-baud frame and are not wire-compatible with the current 9-byte parser. They predate the host-side rework and are kept for reference. In the current system the ESP32 runs the stock CameraWebServer MJPEG streamer and all detection happens on the host.


Toolchain

proc-toolchain/ is a self-contained build/test harness:

  • Assembler (assembler-python-version/): assembles .s → .mem, including the custom clamp R4 instruction (defined in instructions.csv).
  • Autotester (autotester.py): runs assembly programs through an Icarus Verilog simulation and diffs register/memory state against expected results in test_files/.
  • Helper scripts (helper_scripts/): compilation, HTML report generation, a "banned Verilog" linter for the course's structural-only constraints, and more.

Build & run

Hardware (FPGA):

  1. Assemble the firmware: proc-toolchain/assembler-python-version/assemble.py mips/full.s → mips/full.mem.
  2. In Vivado, target the Nexys A7-100T, add proc-toolchain/main/**/*.v and main/sentinel_top/sentinel_top.xdc, point the instruction ROM at mips/full.mem, and generate a bitstream.
  3. Program the board; flip the laser interlock switch (SW0) to enable firing.

Host vision:

cd YOLO_vision
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# edit config.yaml: set esp.stream_url to your ESP32-CAM, and link.port to your serial port
python main.py

Verification

Two layers, both driven by scripts/run_sentinel_verification.sh:

  1. ISA regression: a subset of the course's processor test suite, assembled and simulated, checked for the expected pass marker.
  2. System integration: sentinel_top_tb.v drives the whole datapath (UART → parser → MMIO → PWM/laser/watchdog) and checks end-to-end behavior. Module-level testbenches (*_tb.v) accompany most peripherals, and the audio chain has its own (tb_audio_player.v, tb_event_detect.v, tb_sigma_delta.v).

Acknowledgements

The base processor core descends from a Duke ECE 350 (Digital Systems) lab sequence (ALU, register file, mult/div, pipelined CPU); the assignment specs are preserved as the Checkpoint *.pdf files. Everything beyond the base CPU — the clamp ISA extension, the peripheral set, the safety architecture, the host vision pipeline, and the firmware — was designed for this project. Built by Nicolas Vasilescu and Luhan Wang.

