feat(authz): replace bespoke FGA with embedded OpenFGA ReBAC engine #625

Open
lakhansamani wants to merge 39 commits into main from feat/fga-engine-spi

Summary

Replaces the not-yet-rolled-out Resource/Scope/Policy/Permission authorization engine (#607/#610/#611) with an OpenFGA-backed ReBAC engine. Since the old FGA was never released, it is removed entirely rather than deprecated.

What changed

  • Engine SPI (internal/authorization/engine) with an embedded OpenFGA implementation (memory/sqlite/postgres/mysql datastores) and external-mode scaffolding. Selected via --authorization-engine=fga.
  • GraphQL API
    • Admin (super-admin gated, audited): _fga_write_model, _fga_get_model, _fga_write_tuples, _fga_delete_tuples, _fga_read_tuples.
    • Runtime: fga_check, fga_batch_check, fga_list_objects — the subject is pinned to the authenticated token, never client-supplied; fail-closed.
  • Session/validate: required_permissions replaced by required_relations on session, validate_session, validate_jwt_token. Coarse roles/scope gating is unchanged.
  • Dashboard: new FGA admin UI — authorization-model editor, relationship tuples, access tester.
  • SQLite driver: standardized on modernc.org/sqlite via a local GORM dialect so the embedded OpenFGA SQL datastore links without a duplicate database/sql "sqlite" registration. Pure-Go, no CGO.

New flags

--authorization-engine (policy|fga; default policy), --fga-mode (embedded|external), --fga-store (memory|sqlite|postgres|mysql), --fga-store-url, --fga-external-url.

Deployment modes

  • single-node/dev: embedded + SQLite (migrations on boot).
  • HA / serverless: external Postgres/MySQL (or external OpenFGA service); run migrations as a separate init job; no embedded SQLite (ephemeral/non-shared disk).

Testing

  • go build ./..., go vet ./... clean.
  • SQLite integration + storage + graphql + authorization tests pass.
  • Embedded SQLite FGA verified in-process alongside GORM SQLite (no driver-registration panic).
  • Default binary starts without the previous "Register called twice for driver sqlite" panic.

Follow-ups (not in this PR)

  • SDK cleanup in authorizer-go / authorizer-js (remove required_permissions, add FGA client helpers).
  • Auth0 FGA → Authorizer import tool + MIGRATION.md.

Design docs included: FGA_OPENFGA_MIGRATION_PLAN.md, ENTERPRISE_AUTHZ_MODEL.md, AGENTIC_DELEGATION_DESIGN.md.

Remove the not-yet-rolled-out Resource/Scope/Policy/Permission engine
(#607/#610/#611) and replace it with an OpenFGA-backed ReBAC engine.

- AuthorizationEngine SPI (internal/authorization/engine) with an embedded
  OpenFGA implementation (memory/sqlite/postgres/mysql datastores) plus
  external-mode flag scaffolding.
- GraphQL: admin _fga_write_model/_fga_get_model/_fga_write_tuples/
  _fga_delete_tuples/_fga_read_tuples and runtime fga_check/fga_batch_check/
  fga_list_objects (runtime principal pinned to the token subject).
- required_relations on session/validate_session/validate_jwt_token;
  coarse roles/scope gating unchanged.
- Dashboard FGA admin UI: authorization model editor, relationship tuples,
  access tester.
- Standardize the SQLite driver on modernc.org/sqlite via a local GORM
  dialect so the embedded OpenFGA SQL datastore links without a duplicate
  database/sql "sqlite" registration.

Flags: --authorization-engine, --fga-mode, --fga-store, --fga-store-url,
--fga-external-url.
…model

Add design docs for the OpenFGA migration and the agentic-authorization
program, and update the v2 roadmap.

- FGA_OPENFGA_MIGRATION_PLAN.md: phased plan, locked decisions, deployment
  modes (single-node / HA / serverless), implementation status.
- ENTERPRISE_AUTHZ_MODEL.md: OpenFGA model patterns (role grants,
  user-specific overrides, exclusions, hierarchy) with a worked example.
- AGENTIC_DELEGATION_DESIGN.md: RFC 8693 token exchange, act claim,
  attenuation, audit delegation chain, revocation.
- FGA_IMPLEMENTATION_AGENTS.md: program execution plan.
- ROADMAP_V2.md: agentic authorization track; corrected FGA/audit status.
…on FGA checks

- Admin introspection ops `_fga_list_users` and `_fga_expand` (super-admin
  gated). These reveal the access graph (who-can-access / why), so they are
  admin-only rather than end-user facing.
- Optional, trust-gated `user` on `fga_check`/`fga_batch_check`/
  `fga_list_objects`: a super-admin may query an explicit subject; an ordinary
  end-user token stays pinned to its own subject and a client-supplied `user`
  is rejected (prevents enumerating another user's access). Centralized in
  resolveFgaSubject; M2M/client-credentials callers to be allowed in Phase 2.
- Engine SPI: ListUsers and Expand methods on AuthorizationEngine.

Add tests for previously-uncovered surface:
- _fga_delete_tuples (removes a tuple; non-admin rejected)
- _fga_get_model (returns active model; non-admin rejected)
- trust gate enforced per decision op: fga_list_objects and fga_batch_check
  reject an ordinary user supplying another subject (not only fga_check)
- session query honors required_relations (separate wiring of the same
  helper as validate_session)
… relations

- engine.ReadModel now returns (id, dsl): _fga_get_model previously returned an
  empty FgaModel.id while _fga_write_model returned one. Populate it from the
  active OpenFGA model id.
- Add a validate_jwt_token required_relations test (the third entry point of
  the shared enforceRequiredRelations helper); re-logs in for a fresh
  access token since session ops in earlier subtests rotate the original.
…re config

The two-engine selector (--authorization-engine=policy|fga) was a vestige of
the SPI design — the policy engine was removed entirely, leaving only OpenFGA.
FGA is now enabled by configuring a store: --fga-store (embedded) or
--fga-external-url (external). With neither set the engine is not constructed
and the fga_* resolvers fail closed, identical to the previous default.

- Remove the AuthorizationEngine config field and CLI flag.
- --fga-store defaults to "" (set it to enable embedded FGA).
- Update stale comments/schema descriptions referencing the removed flag.
…ly FGA

Authorizer embeds OpenFGA in-process — it IS the engine. Trim the FGA config
surface to what's actually used:

- Remove --fga-mode and --fga-external-url: external-OpenFGA-service mode was a
  non-functional stub (logged a warning, started no engine). HA/serverless use
  the embedded engine + an external SQL store (postgres/mysql), not a separate
  service. The AuthorizationEngine SPI still allows adding an external client
  later if a real need arises.
- Remove three dead flags left from the old policy engine, with zero consumers
  after its removal: --authorization-cache-ttl, --include-permissions-in-token,
  --authorization-log-all-checks.

FGA is now enabled solely by --fga-store (+ --fga-store-url). Build + full
SQLite suite green.

The "not enabled" empty state referenced the removed --authorization-engine=fga
flag. Rewrite it as a helpful empty state: correct enable command (--fga-store)
with copy-to-clipboard, store options (memory/sqlite/postgres/mysql), and a docs
link, styled to the dashboard's blue accent. Also replace the bare "No Tuples"
empty state with guidance on what a tuple is and how to grant the first one.
…ride

When the main database is OpenFGA-compatible (sqlite/postgres/mysql/mariadb),
FGA derives its store from --database-url automatically — no extra flags, with
OpenFGA's tables living in the main DB (as the old engine did). --fga-store /
--fga-store-url become overrides, required only when the main DB is unsupported
(mongodb, dynamodb, cassandra, couchbase, arangodb, sqlserver) or to use a
dedicated store.

- config.FGAStoreConfig() resolves the store (explicit override > main-DB
  derivation > disabled); unit-tested across the matrix.
- Migrations run on boot for SQL stores (idempotent, goose-locked → HA-safe).
- Dashboard "not enabled" copy updated to explain auto-reuse + the override.

Verified: a SQLite-configured instance auto-enables FGA (reused_main_db=true)
with no --fga-store and no driver-registration panic.
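
The precedence above (explicit override > main-DB derivation > disabled) can be sketched roughly as follows. This is a minimal illustration, not the actual `config.FGAStoreConfig()` code: `StoreConfig`, `resolveStore`, the `openFGAStoreFor` map, and the mariadb-to-mysql mapping are all assumptions made for the example, and it assumes `--fga-store` alone triggers the override.

```go
package main

import "fmt"

// StoreConfig is a hypothetical result type for this sketch; the real
// config.FGAStoreConfig() return value may differ.
type StoreConfig struct {
	Enabled      bool
	Store        string // memory | sqlite | postgres | mysql
	URL          string
	ReusedMainDB bool
}

// openFGAStoreFor maps main-database types OpenFGA can reuse to a store kind.
// Treating mariadb as a mysql store is an assumption of this sketch.
var openFGAStoreFor = map[string]string{
	"sqlite":   "sqlite",
	"postgres": "postgres",
	"mysql":    "mysql",
	"mariadb":  "mysql",
}

// resolveStore applies the precedence: explicit override, then main-DB
// derivation, then disabled.
func resolveStore(dbType, dbURL, fgaStore, fgaStoreURL string) StoreConfig {
	if fgaStore != "" {
		return StoreConfig{Enabled: true, Store: fgaStore, URL: fgaStoreURL}
	}
	if kind, ok := openFGAStoreFor[dbType]; ok {
		return StoreConfig{Enabled: true, Store: kind, URL: dbURL, ReusedMainDB: true}
	}
	return StoreConfig{} // unsupported main DB and no override: FGA stays off
}

func main() {
	fmt.Printf("%+v\n", resolveStore("sqlite", "data.db", "", ""))
	fmt.Printf("%+v\n", resolveStore("mongodb", "mongodb://localhost", "", ""))
	fmt.Printf("%+v\n", resolveStore("mongodb", "mongodb://localhost", "postgres", "postgres://fga"))
}
```

The zero-value `StoreConfig` doubles as the disabled state, so callers only need to look at `Enabled`.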
…thout store

- config: for every database OpenFGA can't use (mongodb, dynamodb, cassandra,
  scylla, couchbase, arangodb, sqlserver, libsql, cockroachdb, yugabyte,
  planetscale), FGAStoreConfig returns disabled when --fga-store/--fga-store-url
  are unset; an explicit --fga-store still enables it.
- integration: validate_session without required_relations succeeds when no FGA
  engine is configured — the instance works normally without FGA.

Replace the raw-DSL-only model editor with a visual builder that generates
OpenFGA DSL under the hood, plus a "DSL (advanced)" escape hatch:

- ModelBuilder: add/edit types, relations and permissions via forms — direct
  assignment (chips), unions, and inheritance ("X from Y"), no DSL knowledge
  needed.
- modelDsl.ts: generateDsl / parseDsl (best-effort) / validateModel /
  plain-English summarize + 3 starter templates (document sharing, folder
  inheritance, org/team/project). Verified round-trip; advanced constructs
  (and / but not / conditions) keep the user in DSL mode.
- Model page: Builder <-> DSL tabs, template chips, live "what this model
  means" summary, clearer intro copy. Loads an existing model into the builder
  when representable, else opens DSL.
… nav

Turn the three Authorization pages into a clear guided workflow:

- AuthSteps: a shared, clickable stepper (1 Define model → 2 Grant access →
  3 Test access) shown on each page, with done/current/upcoming states. Steps
  stay deep-linkable so admins can jump directly.
- Each page now leads with "Step N · <title>", a concrete worked Example
  callout (document-sharing running example), and a "Next →" link to continue.
- "RBAC — your roles" model template generated from the instance's configured
  roles (fetched via admin _env), with role-name sanitization. Round-trip
  verified.
- Sidebar: the Authorization group is now collapsible (chevron, aria-expanded),
  default-open when on an authorization route.
…t tree

The hand-rolled form builder was fragile (delete bug, cluttered layout). Replace
it with a robust master-detail tree editor:

- react-arborist tree shows types -> relations (expand/collapse, keyboard nav,
  per-node add/delete, selection); a detail pane edits the selected node's name,
  assignable types, and computed terms. Builder | DSL stays as two tabs.
- All model edits go through pure, unit-tested mutation helpers in modelDsl.ts
  (add/delete/rename type & relation, add/remove assignable & computed) — this
  eliminates the in-place-mutation delete bug at the source. Verified by a
  standalone mutation test.
- Removed the bespoke ModelBuilder.tsx.
…le catalog

Replace the confusing tree/builder + Builder/DSL sub-tabs with one simple,
example-driven editor:

- A catalog of 9 ready-to-use OpenFGA model examples (raw DSL, so they use the
  full language): document sharing, folder hierarchy, organizations & teams,
  RBAC roles, groups, block list (exclusion), multi-tenant SaaS, GitHub-style
  repos, and time-bound access (conditions) — plus a dynamic "Your roles"
  example. Each card shows a description; clicking loads it into the editor.
- One DSL editor + a live plain-English summary + Save. No tree, no builder,
  no model sub-tabs. CRUD is load/edit/save.
- All 9 examples validated against the OpenFGA DSL transformer (the same one
  the backend uses on save). Removed react-arborist and ModelTree.tsx.

The collapsible group header was styled as a faded uppercase section label
(text-gray-400, uppercase), which read as a disabled item. Style it like a
normal nav entry (text-sm, gray-700, blue-50 when active).
…pper

- DocsLinks: links to OpenFGA / ReBAC concepts, modeling guide, DSL reference,
  and relationship tuples — shown on the Model and Grant-access pages.
- Grant-access page: "Common grant patterns" cards (direct, assign a role,
  grant a whole role via role#assignee, public user:*, and grant-on-a-folder so
  all resources inherit) that prefill the form, plus a tip on avoiding a tuple
  per object id.
- Model page: switching to an example now confirms if there are unsaved changes
  and shows a toast; a note explains there is one active model and saving makes
  a new immutable version active.
- Stepper now marks a step done only when actually complete (model saved /
  tuples exist), so step 1 isn't checked when no model exists.

Add an "About model versions" info panel: one active model, saving creates a
new immutable version, earlier versions are retained, OpenFGA models are
append-only (a version can't be deleted individually), and separate models
need separate stores.

OpenFGA models are append-only: individual versions cannot be deleted.
Reset is the only way to remove a model and all its past versions and
start fresh.

- engine: add Reset() to the AuthorizationEngine SPI; OpenFGA impl deletes
  the store (model + all versions + tuples) and creates a new empty one
- graphql: add _fga_reset mutation, super-admin gated and audited
  (admin.fga_reset). Refused while any relationship tuples still exist so
  live grants are never dropped silently — callers must delete tuples first
- dashboard: "Danger zone" on the model page. Disabled with a link to the
  Grant access page while tuples exist; otherwise a typed-confirmation
  dialog (type RESET) before wiping
- test: TestOpenFGAEngine_Reset covers store rotation, model clearing,
  tuple removal, and engine reuse
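
The "refused while tuples exist" rule for `_fga_reset` could look roughly like the guard below. It is only a sketch under assumptions: the `resettable` interface, `CountTuples`, `guardedReset`, and the error text are hypothetical names for illustration and are not taken from the AuthorizationEngine SPI.

```go
package main

import (
	"errors"
	"fmt"
)

// errTuplesExist is a hypothetical error for this sketch.
var errTuplesExist = errors.New("reset refused: delete all relationship tuples first")

// resettable is an assumed, minimal slice of an engine interface.
type resettable interface {
	CountTuples() (int, error)
	Reset() error
}

// guardedReset wipes the store only when it can confirm no tuples remain,
// so live grants are never dropped silently.
func guardedReset(e resettable) error {
	n, err := e.CountTuples()
	if err != nil {
		return fmt.Errorf("reset refused: cannot verify store is empty: %w", err)
	}
	if n > 0 {
		return errTuplesExist
	}
	return e.Reset()
}

// fakeEngine is an in-memory stand-in used to exercise the guard.
type fakeEngine struct {
	tuples int
	resets int
}

func (f *fakeEngine) CountTuples() (int, error) { return f.tuples, nil }
func (f *fakeEngine) Reset() error              { f.resets++; return nil }

func main() {
	fmt.Println(guardedReset(&fakeEngine{tuples: 2}))
	fmt.Println(guardedReset(&fakeEngine{}))
}
```

Failing to count tuples is treated as a refusal here, which keeps the destructive path closed when the store state is unknown.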

- Add engine.ErrNoModel sentinel; ReadModel returns it on a fresh store so
  callers treat "no model yet" as an empty state, not a failure. FgaGetModel
  maps it to an empty model for the dashboard's starting view. Fail-closed is
  unchanged — Check/BatchCheck/ListObjects still deny on a model-less store.
- Add authorizer_fga_checks_total, authorizer_fga_check_duration_seconds and
  authorizer_fga_operations_total, recorded across the FGA resolvers. Only
  low-cardinality constant labels are ever used as label values.
- Tests: ErrNoModel sentinel (engine), empty-model GraphQL state + metric
  recording (integration), metric helpers (unit).
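
A sentinel error lets the read path and the decision path treat the same condition differently. The sketch below illustrates that split. `modelForDashboard` and `checkAllowed` are hypothetical helpers, `Model` is an assumed shape, and the sentinel's message text is an assumption borrowed for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoModel mirrors the idea of engine.ErrNoModel; the message is an assumption.
var ErrNoModel = errors.New("no authorization model written yet")

// Model is an assumed shape for this sketch.
type Model struct{ ID, DSL string }

// modelForDashboard maps "no model yet" to an empty model so a UI can show
// its starting view; every other error is still reported.
func modelForDashboard(read func() (Model, error)) (Model, error) {
	m, err := read()
	switch {
	case errors.Is(err, ErrNoModel):
		return Model{}, nil
	case err != nil:
		return Model{}, err
	}
	return m, nil
}

// checkAllowed stays fail-closed: any error, including ErrNoModel, denies.
func checkAllowed(check func() (bool, error)) bool {
	allowed, err := check()
	return err == nil && allowed
}

func main() {
	fresh := func() (Model, error) { return Model{}, fmt.Errorf("read model: %w", ErrNoModel) }
	m, err := modelForDashboard(fresh)
	fmt.Printf("%q %v\n", m.DSL, err)
	fmt.Println(checkAllowed(func() (bool, error) { return false, ErrNoModel }))
}
```

Using `errors.Is` rather than `==` means the mapping still works when the engine wraps the sentinel with extra context.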
…ubject

- Step 1 is now two-mode: a roles × permissions matrix (RbacBuilder, the
  default for non-developers) that generates a standard OpenFGA RBAC model,
  plus the Advanced (DSL) editor. No syntax to learn to define a model.
- Example catalogs (model examples and grant patterns) moved into modal
  popups so the editor and the add-tuple form stay the focus.
- Tester gains a User (subject) field so a super-admin can check any subject;
  result copy reflects the checked subject. Server already gates the override
  to admins.
- Grant page guards against writing tuples before a model exists, and only
  blocks on a genuine no-model error — never on a transient failure.
- Drop the dead _env.ROLES / AdminRolesQuery fetch.
- Add vitest + modelDsl.test.ts unit coverage (rbacModel, parse, summarize,
  example catalog).

- Add admin-only _admin_meta query (AdminMeta type) returning the configured
  roles / default_roles / protected_roles. Super-admin gated; the non-deprecated
  replacement for the role bits of _env (deprecated in v2).
- Dashboard model builder seeds its roles × permissions matrix from the real
  configured roles via _admin_meta, falling back to a generic set. The builder
  mounts only after the roles fetch settles so it never locks in the fallback.
- Test: admin_meta_test.go (super-admin gated, returns configured roles).
…tion

- Add docs/fga-rebac-guide.md: app vs FGA roles, identifying subjects by
  user:<id> (not names), org→project→resource hierarchy (grant once, inherit
  everywhere), and fine-grained grants that coexist with inheritance.
- Add "Org → project → resource" and "Company roles (RBAC)" model examples;
  make both concentric (editor implies viewer; permissions reference the next
  more-powerful one) per OpenFGA's concentric-relationships guidance.
- Add hierarchy_test.go proving inheritance from one org-level grant, scoped
  fine-grained grants, and concentric view, all keyed by user:<id>.
- Grant form nudges admins to use the user's id, not a name.
…date in CI

Reviewed every shipped model against openfga/agent-skills (the official
OpenFGA modeling rules):

- Folder hierarchy example: chain owner down (`owner from parent_folder`) so a
  folder owner can edit its documents — was the documented "parent role
  forgotten on child types" anti-pattern; rename parent → parent_folder per
  the naming convention; add folder can_view.
- Organizations & teams example: add can_view so apps check a permission, not
  the member relation directly.
- Model editor placeholder: concentric (editor implies viewer) instead of
  independent viewer/editor unioned in can_view.
- Add examples_validation_test.go: extracts every DSL from the dashboard
  catalog, the editor placeholder, and docs/fga-rebac-guide.md and writes each
  through the real embedded engine — the in-repo equivalent of
  `fga model validate`, so a malformed example can never ship.
- Replace every user:alice example, placeholder and grant-pattern prefill with
  the user:<id> / user:<user-id> convention the docs recommend — names aren't
  unique or stable; point admins at the Users page for the id.
- Fix the Grant access form alignment: the id hint under the User column made
  it taller than the other columns in the items-end grid; the hint is now a
  full-width row below the inputs so all fields and the Add button align.
…, id-only examples

- The model builder now always starts from the standard admin/editor/viewer
  matrix; the instance's configured roles are offered as one-click suggestion
  chips instead of being forced in as the seed (app roles like "user" make
  poor object-scoped FGA roles).
- Grant-pattern prefill uses folder:<folder-id>; ReBAC guide examples now use
  numeric object ids (organization:101, project:201, resource:301) — objects,
  like users, are identified by id, never by name. role:* objects stay keyed
  by role name by design.
…lver per file

BREAKING (branch-only, never released): replaces fga_check, fga_batch_check
and fga_list_objects.

- Public surface is now exactly two operations:
  - check_permissions(checks: [{relation, object, contextual_tuples?}], user?)
    → results echo each pair with allowed (a single check is a batch of one).
  - list_permissions(relation, object_type, user?) → objects.
- Subject trust gate (resolveFgaSubject): defaults to the caller's token
  subject; an explicit `user` (bare id normalized to user:<id>) is honored
  only for super-admins or when it equals the caller's own subject — anything
  else is rejected, never silently ignored.
- Resolvers restructured one-per-file: fga.go (shared helpers + gate),
  check_permissions.go, list_permissions.go, fga_write_model.go,
  fga_get_model.go, fga_write_tuples.go, fga_delete_tuples.go,
  fga_read_tuples.go, fga_list_users.go, fga_expand.go, fga_reset.go.
- Dashboard: Access Tester page removed (the wizard is now 2 steps); per-user
  verification moved to Users table → "View Permissions" modal, which calls
  list_permissions with an explicit subject under the admin session.
- Metrics labels: check_permissions / list_permissions.
- Integration tests rewritten, including a new self-specification case
  (non-admin passing their own subject is honored).
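
The subject trust gate described in this commit can be illustrated with a small sketch. It is not the real `resolveFgaSubject`: the function name `resolveSubject`, its parameters, the `normalize` helper, and the error values are assumptions chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical errors for this sketch.
var (
	errNoSubject         = errors.New("no subject: caller has no token subject and supplied no user")
	errSubjectNotAllowed = errors.New("explicit user is not allowed for this caller")
)

// normalize turns a bare id into user:<id>; typed ids pass through unchanged.
func normalize(id string) string {
	if strings.Contains(id, ":") {
		return id
	}
	return "user:" + id
}

// resolveSubject defaults to the caller's token subject. An explicit user is
// honored only for super-admins or when it equals the caller's own subject;
// anything else is rejected rather than silently ignored.
func resolveSubject(tokenSubject string, isSuperAdmin bool, requested string) (string, error) {
	if requested == "" {
		if tokenSubject == "" {
			return "", errNoSubject
		}
		return normalize(tokenSubject), nil
	}
	want := normalize(requested)
	if isSuperAdmin || (tokenSubject != "" && want == normalize(tokenSubject)) {
		return want, nil
	}
	return "", errSubjectNotAllowed
}

func main() {
	fmt.Println(resolveSubject("42", false, ""))        // own subject by default
	fmt.Println(resolveSubject("42", false, "user:42")) // self-specification
	fmt.Println(resolveSubject("42", false, "43"))      // another user: rejected
	fmt.Println(resolveSubject("", true, "43"))         // super-admin override
}
```

Returning an error for a disallowed `user`, instead of falling back to the token subject, keeps a caller from mistaking their own result for someone else's.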

Adding a tuple whose relation or object type isn't in the active model
surfaced OpenFGA's raw gRPC error ("rpc error: code = Code(2000) desc =
Invalid tuple ..."), which read as "can't add grant access".

- Map tuple-validation errors in _fga_write_tuples/_fga_delete_tuples to a
  friendly message that keeps OpenFGA's reason and points at Step 1; raw
  error stays in the debug log. Covered by an integration test (also asserts
  no gRPC internals leak).
- Grant-pattern modal now states tuples must match YOUR model; the folder
  pattern notes it needs a folder type.

All program design docs (FGA migration plan, agentic delegation design,
enterprise authz model, implementation agents, migration-tool spec) and the
ReBAC guide now live in the authorizer-docs repo under specs/. References in
CLAUDE.md and ROADMAP_V2.md point there. The docs-guide DSL validation
subtest is removed with the guide; dashboard example validation stays.

check_permissions accepted unbounded contextual-tuple arrays from any
authenticated caller, relying on the embedded OpenFGA default limit as the
only guard. Enforce an explicit cap (100) in toContextualTuples with unit
coverage so the boundary no longer depends on engine configuration.
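
An explicit cap of this kind is usually a length check before any conversion work. The following is a hedged sketch only: `capContextualTuples`, `TupleInput`, and the error wording are hypothetical, and the real `toContextualTuples` presumably also converts GraphQL input into engine tuples.

```go
package main

import "fmt"

// maxContextualTuples matches the cap of 100 stated in the commit message.
const maxContextualTuples = 100

// TupleInput is an assumed input shape for this sketch.
type TupleInput struct{ User, Relation, Object string }

// capContextualTuples rejects oversized input up front, so the limit does
// not depend on how the embedded engine happens to be configured.
func capContextualTuples(in []TupleInput) ([]TupleInput, error) {
	if len(in) > maxContextualTuples {
		return nil, fmt.Errorf("too many contextual tuples: %d (max %d)", len(in), maxContextualTuples)
	}
	out := make([]TupleInput, len(in))
	copy(out, in) // copy so later mutation of the input cannot affect the check
	return out, nil
}

func main() {
	_, err := capContextualTuples(make([]TupleInput, 101))
	fmt.Println(err)
}
```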
…init

The engine created a fresh OpenFGA store on every boot whenever no StoreID
was passed — and no caller ever persisted one — so on SQL-backed deployments
a restart orphaned the model and every tuple, and all checks failed with
'no authorization model written yet' until an admin rebuilt everything.

New() now recovers the existing store by exact name via ListStores and
adopts the store's latest authorization model, so persistent deployments
survive restarts with zero operator action. Covered by a restart-continuity
test that boots a second engine on the same SQLite file and asserts the
original decisions still hold.
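
The recover-by-name behavior can be sketched as "look up first, create only if absent". The code below is an illustration under assumptions, not the engine's `New()`: `storeBackend`, `recoverOrCreateStore`, and `memBackend` are hypothetical, and a real backend that pages its store listing would need to walk every page before deciding to create.

```go
package main

import "fmt"

// Store is an assumed shape for this sketch.
type Store struct{ ID, Name string }

// storeBackend is a hypothetical, simplified view of a store API.
type storeBackend interface {
	ListStores() ([]Store, error)
	CreateStore(name string) (Store, error)
}

// recoverOrCreateStore adopts an existing store with exactly this name and
// creates one only when none exists. The bool reports whether it was created.
func recoverOrCreateStore(b storeBackend, name string) (Store, bool, error) {
	stores, err := b.ListStores()
	if err != nil {
		return Store{}, false, err
	}
	for _, s := range stores {
		if s.Name == name { // exact match, not prefix or case-insensitive
			return s, false, nil
		}
	}
	s, err := b.CreateStore(name)
	return s, err == nil, err
}

// memBackend stands in for a persistent datastore that survives "restarts".
type memBackend struct{ stores []Store }

func (m *memBackend) ListStores() ([]Store, error) { return m.stores, nil }
func (m *memBackend) CreateStore(name string) (Store, error) {
	s := Store{ID: fmt.Sprintf("store-%d", len(m.stores)+1), Name: name}
	m.stores = append(m.stores, s)
	return s, nil
}

func main() {
	b := &memBackend{}
	first, created, _ := recoverOrCreateStore(b, "authorizer")
	fmt.Println(first.ID, created)
	second, created, _ := recoverOrCreateStore(b, "authorizer") // simulated reboot
	fmt.Println(second.ID, created)
}
```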

Engine-init failure no longer log.Fatal()s the instance: FGA is optional,
so init errors (e.g. missing DDL rights for OpenFGA migrations) now log
and leave the engine nil — permission APIs fail closed, core auth keeps
serving. Also inlines the no-op strconvItoa wrapper.
…ers omitted

relation and object_type are now optional on list_permissions. When either
is omitted, every matching (type, relation) pair of the active model is
enumerated — an empty input answers "what can this user access?" in one
call. Pairs come from the new TypeRelations engine SPI method and are
expanded via ListObjects with bounded concurrency (5) so a single request
cannot saturate the embedded engine.

The response now carries (object, relation) detail in permissions[] and an
explicit truncated flag when the 1000-entry cap is hit, replacing the
previous silent truncation. The subject trust gate is unchanged: callers
enumerate their own access unless super-admin.
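
Bounded fan-out with an explicit truncation flag can be sketched with a counting semaphore. This is an illustration only: `enumerate`, `Pair`, `Permission`, and the `list` callback (standing in for a ListObjects call) are hypothetical, and failing the whole request on the first lookup error is an assumption, since the commit message does not say how errors are handled.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Pair and Permission are assumed shapes for this sketch.
type Pair struct{ Type, Relation string }
type Permission struct{ Object, Relation string }

// enumerate calls list once per (type, relation) pair with at most `workers`
// calls in flight, then caps the merged result at `limit` entries.
// workers must be at least 1.
func enumerate(pairs []Pair, list func(Pair) ([]string, error), workers, limit int) ([]Permission, bool, error) {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		all      []Permission
		firstErr error
	)
	sem := make(chan struct{}, workers) // counting semaphore
	for _, p := range pairs {
		sem <- struct{}{} // blocks while `workers` lookups are running
		wg.Add(1)
		go func(p Pair) {
			defer wg.Done()
			defer func() { <-sem }()
			objects, err := list(p)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			for _, o := range objects {
				all = append(all, Permission{Object: o, Relation: p.Relation})
			}
		}(p)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, false, firstErr // assumption: any failed lookup fails the request
	}
	sort.Slice(all, func(i, j int) bool { // deterministic order regardless of scheduling
		if all[i].Object != all[j].Object {
			return all[i].Object < all[j].Object
		}
		return all[i].Relation < all[j].Relation
	})
	if len(all) > limit {
		return all[:limit], true, nil
	}
	return all, false, nil
}

func main() {
	pairs := []Pair{{"document", "viewer"}, {"document", "editor"}, {"folder", "viewer"}}
	fake := func(p Pair) ([]string, error) { return []string{p.Type + ":1", p.Type + ":2"}, nil }
	perms, truncated, err := enumerate(pairs, fake, 5, 4)
	fmt.Println(len(perms), truncated, err)
}
```

Sorting before truncating makes the capped result stable across calls, which matters when a client compares two truncated responses.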

The Users-table permissions modal now treats both filters as optional,
matching the new list_permissions API: an empty form lists every permission
the user holds. Results render as (object, permission) rows instead of bare
object ids, and a notice appears when the server truncated at 1000 entries.

FGA tuples and permission lookups need the user's UUID; admins previously
had to open the user detail view to get it. The ID now shows muted and
monospaced under the email with a one-click copy button (existing
clipboard + toast pattern); the click does not trigger the row's detail
view.

The Users-table permissions modal now fetches everything the user can
access the moment it opens — no filter input or button click required.
The form is purely a narrowing filter (Apply filters / Refresh), skeleton
rows show while loading, and all state resets on close so the next open
starts fresh for any user.
…verride

TestFGADisabled now asserts that ALL admin FGA ops — including every write
path (_fga_write_model, _fga_write_tuples, _fga_delete_tuples, _fga_reset)
plus _fga_get_model, _fga_read_tuples, _fga_list_users, _fga_expand and the
public list_permissions — return the not-enabled error when no engine is
configured, even for a super admin. This proves no FGA record can be
created via the API on an unsupported database without --fga-store, and is
the exact error that switches the dashboard's Authorization tab into its
FgaNotEnabled state.

TestFGAExplicitStoreOverrideForUnsupportedDB proves the other direction at
the config→engine seam: a mongodb main DB with explicit --fga-store/
--fga-store-url resolves to an enabled FGA config, and an engine built from
it exactly as cmd/root.go wires it serves model writes, tuple writes, and
checks.

Adds the first component-level dashboard tests: FgaNotEnabled (what the
Authorization tab shows on databases without OpenFGA support and no
--fga-store) must explain the state and surface the exact flags that fix
it, and isFgaNotEnabledError — the single decision point that switches the
tab into that state — is covered for the backend message, case variants,
unrelated errors, and missing input. Component tests opt into jsdom per
file; pure DSL tests stay on the node environment.

New dev-only deps: jsdom, @testing-library/react, @testing-library/dom.