| 1 | +// SPDX-License-Identifier: PMPL-1.0-or-later |
| 2 | +// Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) |
| 3 | += VeriSimDB Ecosystem Ingest Pipeline |
| 4 | +:version: 0.1.0-draft |
| 5 | +:status: Draft |
| 6 | +:updated: 2026-04-05 |
| 7 | + |
| 8 | +== Purpose |
| 9 | + |
| 10 | +Derive ecosystem-link octads from the code itself — not from hand-written |
| 11 | +ECOSYSTEM.a2ml files. The pipeline scans every repo for dependency manifests |
| 12 | +and import statements, normalises them to repo identities, and emits |
| 13 | +`ecosystem-link` octads to VeriSimDB (port 8097). |
| 14 | + |
| 15 | +This replaces the Graph-modality section of ECOSYSTEM.a2ml with a derived, |
| 16 | +continuously-updated graph. |
| 17 | + |
| 18 | +== Evidence Sources |
| 19 | + |
| 20 | +[cols="1,2,2",options="header"] |
| 21 | +|=== |
| 22 | +| Source | File(s) | Link type |
| 23 | + |
| 24 | +| Cargo (Rust) | `Cargo.toml` [dependencies] | `cargo-dep` |
| 25 | +| Deno | `deno.json` imports | `deno-import` |
| 26 | +| Mix (Elixir/Gleam) | `mix.exs` deps | `mix-dep` |
| 27 | +| Pack (Idris2) | `pack.toml`, `*.ipkg` depends | `idris-dep` |
| 28 | +| Zig | `build.zig.zon` | `zig-dep` |
| 29 | +| Guix/Nix | `guix.scm`, `flake.nix` inputs | `nix-input` |
| 30 | +| Just recipes | `Justfile` cross-repo refs | `build-ref` |
| 31 | +| GitHub Actions | `uses:` in `.github/workflows/*.yml` | `action-ref` |
| 32 | +| Inline imports | `import`, `use`, `require` statements | `code-import` |
| 33 | +|=== |
| 34 | + |
| 35 | +== Octad Shape |
| 36 | + |
| 37 | +Each link becomes one `ecosystem-link` octad (schema already defined in |
| 38 | +`.verisimdb/config.toml`): |
| 39 | + |
| 40 | +- **Semantic**: `source_repo`, `target_repo`, `link_type`, `strength` |
| 41 | +- **Temporal**: `first_seen`, `last_confirmed`, `stale_after_days` |
| 42 | +- **Graph**: `dependency_chain[]`, `transitive_consumers[]` |
| 43 | +- **Provenance**: `detection_method`, `evidence_file`, `evidence_line` |
| 44 | + |
| 45 | +**Strength** is a float in [0, 1]: |
| 46 | + |
| 47 | +- 1.0 — declared direct dependency (manifest entry) |
| 48 | +- 0.7 — import statement found in code |
| 49 | +- 0.4 — mentioned in config or docs only |
| 50 | +- 0.1 — text match in comments |
| 51 | + |
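A minimal Rust sketch of the strength scale above. The `Evidence` enum and both function names are hypothetical, and taking the maximum when several kinds of evidence support the same edge is an assumption; the spec does not say how evidence is combined.

```rust
/// Kinds of evidence from the strength scale (names are illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Evidence {
    ManifestEntry,
    CodeImport,
    ConfigOrDocs,
    CommentMention,
}

/// Map one kind of evidence to its strength in [0, 1].
pub fn strength(evidence: Evidence) -> f64 {
    match evidence {
        Evidence::ManifestEntry => 1.0,
        Evidence::CodeImport => 0.7,
        Evidence::ConfigOrDocs => 0.4,
        Evidence::CommentMention => 0.1,
    }
}

/// Assumption: when an edge has several kinds of evidence, keep the strongest.
/// Returns `None` when there is no evidence at all.
pub fn strongest(evidence: &[Evidence]) -> Option<f64> {
    evidence
        .iter()
        .map(|e| strength(*e))
        .fold(None, |best, s| match best {
            Some(b) if b >= s => Some(b),
            _ => Some(s),
        })
}
```

Under this assumption, an edge that is both declared in a manifest and mentioned in a comment would be recorded with strength 1.0.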
| 52 | +== Pipeline Stages |
| 53 | + |
| 54 | +[source] |
| 55 | +---- |
| 56 | +scan   -> parse    -> normalise -> resolve    -> emit |
| 57 | +(walk)    (format)    (name)       (repo URL)    (VeriSimDB) |
| 58 | +---- |
| 59 | + |
| 60 | +=== Stage 1 — scan |
| 61 | + |
| 62 | +Walk every repo under `~/Documents/hyperpolymath-repos/`. For each file |
| 63 | +matching the patterns in the evidence-source table, record `(repo, path, |
| 64 | +format)`. |
| 65 | + |
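The scan stage might look roughly like the sketch below, which uses only the Rust standard library. `Format`, `classify`, and `scan_repo` are hypothetical names. The sketch matches manifest files by name only; inline imports (`code-import`) need content inspection and are left out. Skipping `.git` is an assumption, and symlink loops are not handled.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Manifest formats from the evidence-source table (illustrative enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format { Cargo, Deno, Mix, Pack, Ipkg, Zon, Guix, Flake, Just, Workflow }

/// Classify a path relative to the repo root by file name alone.
pub fn classify(rel_path: &Path) -> Option<Format> {
    let name = rel_path.file_name()?.to_str()?;
    let by_name = match name {
        "Cargo.toml" => Some(Format::Cargo),
        "deno.json" => Some(Format::Deno),
        "mix.exs" => Some(Format::Mix),
        "pack.toml" => Some(Format::Pack),
        "build.zig.zon" => Some(Format::Zon),
        "guix.scm" => Some(Format::Guix),
        "flake.nix" => Some(Format::Flake),
        "Justfile" => Some(Format::Just),
        _ => None,
    };
    if by_name.is_some() {
        return by_name;
    }
    let ext = rel_path.extension().and_then(|e| e.to_str());
    if ext == Some("ipkg") {
        return Some(Format::Ipkg);
    }
    // Workflows: only `*.yml` directly under `.github/workflows/`.
    let in_workflows = rel_path
        .parent()
        .map_or(false, |p| p.ends_with(".github/workflows"));
    if in_workflows && ext == Some("yml") {
        return Some(Format::Workflow);
    }
    None
}

/// Walk one repo and record `(path, format)` for every matching file.
pub fn scan_repo(root: &Path) -> io::Result<Vec<(PathBuf, Format)>> {
    let mut hits = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                // Assumption: VCS metadata never contains evidence.
                if path.file_name().map_or(false, |n| n == ".git") {
                    continue;
                }
                stack.push(path);
            } else if let Ok(rel) = path.strip_prefix(root) {
                if let Some(format) = classify(rel) {
                    hits.push((rel.to_path_buf(), format));
                }
            }
        }
    }
    Ok(hits)
}
```

Keeping `classify` free of filesystem access makes the pattern table easy to unit-test separately from the walk.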
| 66 | +=== Stage 2 — parse |
| 67 | + |
| 68 | +Per-format parsers extract dependency edges: |
| 69 | + |
| 70 | +- TOML (`Cargo.toml`, `pack.toml`): parse and read the dependency tables (`[dependencies]` in `Cargo.toml`); `build.zig.zon` is ZON (Zig object notation) rather than TOML, so it needs its own parser for `.dependencies` |
| 71 | +- deno.json: parse as JSON (reluctantly) and read `imports` |
| 72 | +- mix.exs: regex `{:dep, "~> x.y"}` entries |
| 73 | +- Justfile: regex `just -d ../other-repo` invocations |
| 74 | +- Workflows: regex `uses: owner/repo@sha` |
| 75 | + |
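As one illustration of a per-format parser, the sketch below extracts `uses: owner/repo@sha` references from workflow text. It deliberately uses plain string operations instead of the regex named above, so it needs no external crate. `ActionRef` and `parse_uses` are hypothetical names, and skipping local (`./`) and `docker://` references is an assumption.

```rust
/// One `uses:` reference found in a workflow file (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct ActionRef {
    pub owner: String,
    pub repo: String,
    /// 1-based line number, suitable for the `evidence_line` field.
    pub line: usize,
}

/// Line-based scan for `uses: owner/repo@ref` entries.
/// This is not a YAML parser; it only recognises the common one-line form.
pub fn parse_uses(workflow: &str) -> Vec<ActionRef> {
    let mut refs = Vec::new();
    for (idx, raw) in workflow.lines().enumerate() {
        // Accept both `uses: x` and `- uses: x`.
        let line = raw.trim().trim_start_matches("- ").trim_start();
        let Some(value) = line.strip_prefix("uses:") else { continue };
        // Drop a trailing comment and any surrounding quotes.
        let value = value
            .split(" #")
            .next()
            .unwrap_or("")
            .trim()
            .trim_matches(|c: char| c == '"' || c == '\'');
        // Assumption: local actions and container images name no repo.
        if value.starts_with("./") || value.starts_with("docker://") {
            continue;
        }
        // Keep only `owner/repo`; ignore the ref and any sub-path.
        let target = value.split('@').next().unwrap_or("");
        let mut parts = target.split('/');
        if let (Some(owner), Some(repo)) = (parts.next(), parts.next()) {
            if !owner.is_empty() && !repo.is_empty() {
                refs.push(ActionRef {
                    owner: owner.to_string(),
                    repo: repo.to_string(),
                    line: idx + 1,
                });
            }
        }
    }
    refs
}
```

The recorded line number is what would populate `evidence_line` in the Provenance modality.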
| 76 | +=== Stage 3 — normalise |
| 77 | + |
| 78 | +Dependency names (`serde`, `deno.land/std`, `:phoenix`) are mapped to |
| 79 | +canonical repo identities via a registry file |
| 80 | +(`.verisimdb/repo-aliases.a2ml`). Unknown names are emitted with |
| 81 | +`target_repo="UNRESOLVED:<raw>"` so they can be resolved later. |
| 82 | + |
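The alias lookup can be sketched as a plain map lookup with the `UNRESOLVED:` fallback described above. The function name `normalise` and the in-memory `HashMap` are assumptions; how `.verisimdb/repo-aliases.a2ml` is parsed into that map is not specified here and is left out.

```rust
use std::collections::HashMap;

/// Map a raw dependency name to a canonical repo identity.
/// Unknown names are kept, prefixed with `UNRESOLVED:`, for later resolution.
pub fn normalise(raw: &str, aliases: &HashMap<String, String>) -> String {
    match aliases.get(raw) {
        Some(canonical) => canonical.clone(),
        None => format!("UNRESOLVED:{raw}"),
    }
}
```

For example, with a hypothetical alias entry mapping `serde` to `serde-rs/serde`, `normalise("serde", &aliases)` returns `serde-rs/serde`, while a name missing from the registry such as `:phoenix` comes back as `UNRESOLVED::phoenix`.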
| 83 | +=== Stage 4 — resolve |
| 84 | + |
| 85 | +For each canonical repo name, resolve to a git URL from the hyperpolymath |
| 86 | +registry (if known) or leave as external. External deps are tagged |
| 87 | +`external=true`. |
| 88 | + |
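A possible shape for the resolve step, assuming the hyperpolymath registry has already been loaded into a map from canonical repo name to git URL. `ResolvedTarget` and `resolve` are hypothetical names, and the URL used in the usage note is a placeholder.

```rust
use std::collections::HashMap;

/// Result of resolving one canonical repo name (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct ResolvedTarget {
    pub repo: String,
    /// Git URL when the repo is in the registry, otherwise `None`.
    pub git_url: Option<String>,
    /// Mirrors the `external=true` tag for deps outside the registry.
    pub external: bool,
}

/// Look the repo up in the registry; anything not found is external.
pub fn resolve(repo: &str, registry: &HashMap<String, String>) -> ResolvedTarget {
    let git_url = registry.get(repo).cloned();
    ResolvedTarget {
        repo: repo.to_string(),
        external: git_url.is_none(),
        git_url,
    }
}
```

With a registry containing only a placeholder entry such as `verisimdb -> https://example.invalid/verisimdb.git`, resolving `verisimdb` yields an internal target with that URL, and resolving `serde` yields `external == true` with no URL.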
| 89 | +=== Stage 5 — emit |
| 90 | + |
| 91 | +For each edge, create or update the `ecosystem-link` octad: |
| 92 | + |
| 93 | +- If an octad with the same `(source_repo, target_repo, link_type)` exists, |
| 94 | + update `last_confirmed` and keep `first_seen` |
| 95 | +- Otherwise create a new octad with `first_seen = last_confirmed = now` |
| 96 | +- Emit the Groove signal `ecosystem.link.discovered` for new octads only; |
| 97 | + updates emit no signal |
| 98 | + |
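The create-or-update rule can be sketched with an in-memory map standing in for VeriSimDB. The `LinkKey` and `LinkTimes` types, the `upsert` function, and the use of integer timestamps are all assumptions for illustration; the real client API is not described in this document.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Identity of an octad: `(source_repo, target_repo, link_type)`.
pub type LinkKey = (String, String, String);

/// The two Temporal fields that the emit stage maintains.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LinkTimes {
    pub first_seen: u64,
    pub last_confirmed: u64,
}

/// Create or update one octad. Returns the Groove signal to emit, if any.
pub fn upsert(
    store: &mut HashMap<LinkKey, LinkTimes>,
    key: LinkKey,
    now: u64,
) -> Option<&'static str> {
    match store.entry(key) {
        // Existing octad: refresh `last_confirmed`, keep `first_seen`, no signal.
        Entry::Occupied(mut existing) => {
            existing.get_mut().last_confirmed = now;
            None
        }
        // New octad: both timestamps start at `now`, and a signal is emitted.
        Entry::Vacant(slot) => {
            slot.insert(LinkTimes { first_seen: now, last_confirmed: now });
            Some("ecosystem.link.discovered")
        }
    }
}
```

Returning the signal name instead of sending it keeps the rule testable without a running Groove endpoint.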
| 99 | +A temporal rule in VeriSimDB marks an octad as stale once the time since |
| 100 | +`last_confirmed` exceeds `stale_after_days` days, and then emits |
| 101 | +`ecosystem.link.stale`. |
| 102 | + |
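The staleness check itself is a single comparison. The sketch below assumes timestamps are Unix seconds, which this document does not specify; `is_stale` is a hypothetical name, and the rule is expected to live inside VeriSimDB rather than in the ingest CLI.

```rust
const SECONDS_PER_DAY: u64 = 86_400;

/// True when more than `stale_after_days` days have passed since
/// `last_confirmed`. Timestamps are assumed to be Unix seconds.
pub fn is_stale(now: u64, last_confirmed: u64, stale_after_days: u64) -> bool {
    // Saturating arithmetic: a `last_confirmed` in the future is never stale,
    // and a very large threshold cannot overflow.
    let age = now.saturating_sub(last_confirmed);
    age > stale_after_days.saturating_mul(SECONDS_PER_DAY)
}
```

The comparison is strict, so an octad confirmed exactly `stale_after_days` days ago is not yet stale.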
| 103 | +== Scheduling |
| 104 | + |
| 105 | +- **On-demand**: `just ecosystem-ingest` in the standards repo |
| 106 | +- **Per-commit**: triggered by the `commit.pushed` Groove signal; re-scans |
| 107 | + only the changed repo |
| 108 | +- **Weekly full sweep**: cron in CI — scans all 290+ repos |
| 109 | + |
| 110 | +== Implementation Plan (not this session) |
| 111 | + |
| 112 | +1. `ecosystem-ingest` CLI in Rust (~500 LOC) |
| 113 | +2. Per-format parsers as separate modules |
| 114 | +3. Repo alias registry (`.verisimdb/repo-aliases.a2ml`) |
| 115 | +4. VeriSimDB client using the Groove protocol |
| 116 | + |
| 117 | +== Consumer Wiring |
| 118 | + |
| 119 | +Once emitted, octads drive: |
| 120 | + |
| 121 | +- **PanLL Ecosystem panel** — render the dependency graph |
| 122 | +- **Hypatia `ecosystem-drift` rule** — flag when a repo's declared |
| 123 | + ECOSYSTEM.a2ml disagrees with the derived graph |
| 124 | +- **CRG evidence** — consumer count feeds into grade basis |
| 125 | + |
| 126 | +== Open Questions |
| 127 | + |
| 128 | +- Monorepo handling: should each sub-project emit its own edges? Proposed: |
| 129 | + yes, with `source_repo` set to `monorepo-name/sub-project-name`. |
| 130 | +- Circular deps: record as-is; let consumers flag cycles. |
| 131 | +- Version pinning: v0.1 ignores versions (just captures the edge); v0.2 may |
| 132 | + add a `version_range` field. |