
serve: viz scans entire project root — ignores flows_dir, .gitignore, and non-flow YAML #29

@nullhack

Summary

flowr serve discovers every *.yaml file under the project root and treats each as a flow. Discovery ignores [tool.flowr].flows_dir, ignores .gitignore, and has no exclude mechanism, so archived or copied flow files and non-flow YAML anywhere in the tree show up in the viz alongside the real flows. Non-flow YAML either renders as a bogus "invalid" flow or fails to parse in the UI.

The CLI commands already scope themselves to flows_dir (default .flowr/flows); only the viz server scans the whole root.

Steps to Reproduce

mkdir -p /tmp/flowr-scan-demo && cd /tmp/flowr-scan-demo
mkdir -p .flowr/flows archive config

# one real flow
cat > .flowr/flows/main.yaml <<'EOF'
flow: main
version: "1.0.0"
states:
  - id: start
    next:
      done: done
  - id: done
EOF

# an archived copy of a flow, elsewhere in the tree (gitignored in real use)
cp .flowr/flows/main.yaml archive/old-main.yaml

# an unrelated YAML config file
cat > config/app.yaml <<'EOF'
runtime: python
region: us-east-1
EOF

flowr serve --path .

Observed

The viz lists three flows: main, old-main, and config/app. config/app.yaml has no flow/states keys, yet it is indistinguishable from a real flow in the list and only surfaces as a parse error when opened.

Expected

Only main is discovered. Discovery should be bounded by flows_dir (matching the CLI) and/or exclude .gitignore'd paths and files that don't structurally look like flows.

Root Cause

flowr/server/scanner.py, FlowRegistry._refresh():

def _refresh(self) -> None:
    self._files = []
    for yf in sorted(self._root.rglob("*.yaml")):
        rp = yf.relative_to(self._root)
        self._files.append(FlowFile(name=yf.stem, relative_path=str(rp)))

The registry is constructed with the raw --path value (. by default) from cli/serve.py:cmd_serve, and discover_flows(path) passes it straight through. Nothing in the scan path consults:

  • FlowrConfig.flows_dir / [tool.flowr].flows_dir,
  • .gitignore, or
  • the existing _check_structure() validator (which already requires flow + states and is only used by write_flow/create_flow, not by discovery).
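
The unscoped scan can be reproduced outside flowr with nothing but pathlib. The following is a minimal sketch, not flowr's code: unscoped_scan, build_demo_tree, and demo_unscoped are hypothetical names, and the function only mirrors the rglob("*.yaml") call quoted above (it sorts relative path strings rather than Path objects).

```python
import tempfile
from pathlib import Path


def unscoped_scan(root: Path) -> list[str]:
    """Mirror the rglob in _refresh(): every *.yaml under root, no filtering."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.yaml"))


def build_demo_tree(root: Path) -> None:
    """Recreate the three files from the reproduction steps."""
    files = {
        ".flowr/flows/main.yaml": "flow: main\nstates:\n  - id: start\n",
        "archive/old-main.yaml": "flow: main\nstates:\n  - id: start\n",
        "config/app.yaml": "runtime: python\nregion: us-east-1\n",
    }
    for rel, text in files.items():
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)


def demo_unscoped() -> list[str]:
    """Build the tree in a temporary directory and scan it from the root."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_tree(root)
        return unscoped_scan(root)


if __name__ == "__main__":
    # All three YAML files are returned, including the archive copy
    # and the unrelated config file.
    print(demo_unscoped())
```

Path.rglob descends into dot-directories and does not read .gitignore, so every YAML file under the root is returned regardless of where the real flows live.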

Proposed Fix

Pick one (or combine):

  1. Scope discovery to flows_dir — read [tool.flowr].flows_dir (default .flowr/flows) and rglob only under it. This aligns the viz with the CLI's existing behaviour and is the smallest change. Falls back to . only if flows_dir is unset.
  2. Honor .gitignore — skip paths matched by the project's .gitignore (e.g. via pathspec). Handles the common "archived/backup flow copies" case without config.
  3. Filter by structure at discovery time — reuse _check_structure() to drop any YAML lacking flow + states before adding it to the registry. Cheap defense-in-depth that also rules out unrelated config YAML.

Option 1 alone fixes the reported case; options 2–3 make it robust against edge layouts.
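
For option 3, a structural filter might be sketched as follows. This is not the existing _check_structure() implementation, whose body is not shown in this issue; looks_like_flow and filter_flows are hypothetical names, and the only rule encoded is the one stated above (a mapping containing both flow and states). The YAML loader is passed in as a parameter so the sketch stays independent of any particular parser.

```python
from collections.abc import Callable, Iterable, Mapping
from pathlib import Path

REQUIRED_KEYS = ("flow", "states")  # keys this issue says a flow must have


def looks_like_flow(data: object) -> bool:
    """True when the parsed document is a mapping with every required key."""
    return isinstance(data, Mapping) and all(key in data for key in REQUIRED_KEYS)


def filter_flows(
    paths: Iterable[Path], load: Callable[[Path], object]
) -> list[Path]:
    """Keep only the paths whose parsed content structurally looks like a flow."""
    kept = []
    for path in paths:
        try:
            data = load(path)
        except Exception:
            # Any loader failure (unreadable file, parse error) means
            # the file is not treated as a flow.
            continue
        if looks_like_flow(data):
            kept.append(path)
    return kept
```

With PyYAML installed, load could be `lambda p: yaml.safe_load(p.read_text())`. Applied to the reproduction, config/app.yaml would be dropped, while archive/old-main.yaml would still pass because it is a structurally valid flow, which is why the structure check complements rather than replaces options 1 and 2.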

Impact

  • Any project that keeps non-flow YAML (config, compose files, schema, vendored docs) or archived flow copies anywhere under the root gets a polluted, misleading flow list in the viz.
  • Users have no workaround except moving those files out of the project tree or renaming their extensions, because serve exposes no --flows-dir / --ignore flag and --path doubles as the scan root.
