Commit 9a566ec

Merge branch 'tagparse' of https://github.com/csbobby/mellea_clean into tagparse
2 parents def3e77 + 995541b commit 9a566ec

160 files changed

Lines changed: 2526 additions & 1314 deletions

.agents/skills/audit-markers/SKILL.md

Lines changed: 902 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
+---
+name: skill-author
+description: >
+  Draft, validate, and install new agent skills. Use when asked to create a new
+  skill, automate a workflow, or add a capability. Produces cross-compatible
+  SKILL.md files that work in both Claude Code and IBM Bob.
+argument-hint: "[skill-name]"
+compatibility: "Claude Code, IBM Bob"
+metadata:
+  version: "2026-03-25"
+  capabilities: [bash, read_file, write_file]
+---
+
+# Skill Authoring Meta-Skill
+
+Create new agent skills that work across Claude Code (CLI/IDE) and IBM Bob.
+
+## Skill Location
+
+Skills live under `.agents/skills/<name>/SKILL.md`.
+
+Discovery configuration varies by tool:
+- **Claude Code:** Add `"skillLocations": [".agents/skills"]` to `.claude/settings.json`.
+  Without this, Claude Code looks in `.claude/skills/` by default.
+- **IBM Bob:** Discovers `.agents/skills/` natively per agentskills.io convention.
+
+Both tools read the same `SKILL.md` format. Use the frontmatter schema below
+to maximise compatibility.
+
+## Workflow
+
+1. **Name the skill** — kebab-case, max 64 chars (e.g. `api-tester`, `audit-markers`).
+
+2. **Scaffold the directory:**
+   ```
+   .agents/skills/<name>/
+   ├── SKILL.md # Required — frontmatter + instructions
+   ├── scripts/ # Optional — helper scripts
+   └── templates/ # Optional — output templates
+   ```
+
+3. **Write SKILL.md** — YAML frontmatter + markdown body (see schema below).
+
+4. **Dry-run review** — mentally execute the skill against a realistic scenario
+   before finalising. Walk through the procedure on a concrete example (a real
+   file in the repo, not a hypothetical) and check for:
+   - **Scaling gaps:** Does the procedure work for 1 file AND 100 files? If the
+     skill accepts a directory or glob, it needs a triage strategy (e.g., "grep
+     first to find candidates, then deep-read only files with issues") — not
+     just "read every file fully."
+   - **Boundary ambiguity:** If the skill defines categories or classifications,
+     test the boundaries between adjacent categories with a real example. The
+     edges are where agents will disagree or ask the user. Sharpen definitions
+     until two agents reading the same test would classify it the same way.
+   - **Stale references:** If the skill describes project state ("this hook needs
+     to be added", "this marker is not yet registered"), verify those statements
+     are still true. Embed checks ("read conftest.py to confirm") rather than
+     assertions that rot.
+   - **Output format at scale:** Run the report template mentally against the
+     largest expected input. A per-function report for 5 files is fine; for 165
+     files it's unusable. Design output for the largest scope — summary table
+     first, per-item detail only where issues exist.
+   - **Format coverage:** If the skill operates on multiple input formats (e.g.,
+     `pytestmark` lists AND `# pytest:` comments), verify each format is
+     explicitly addressed in the procedure. Implicit coverage causes agents to
+     skip or guess.
+   - **Rigid rules:** If you wrote "always X" or "never Y", find the edge case
+     where the rule is wrong. Add the escape hatch. E.g., "per-function only"
+     should say "module-level is acceptable when every function qualifies."
+
+5. **Validate:**
+   - Check the skill is discoverable: list files in `.agents/skills/`.
+   - Confirm no frontmatter warnings from the IDE.
+   - Verify the skill does not conflict with existing skills or `AGENTS.md`.
+
+## SKILL.md Frontmatter Schema
+
+Use only fields from the **cross-compatible** set to avoid IDE warnings.
+
+### Cross-compatible fields (use these)
+
+| Field | Type | Purpose |
+|-------|------|---------|
+| `name` | string | Kebab-case identifier. Becomes the `/slash-command`. Max 64 chars. |
+| `description` | string | What the skill does and when to trigger it. Be specific — agents use this to decide whether to invoke the skill automatically. |
+| `argument-hint` | string | Autocomplete hint. E.g. `"[file] [--dry-run]"`, `"[issue-number]"`. |
+| `compatibility` | string | Which tools support this skill. E.g. `"Claude Code, IBM Bob"`. |
+| `disable-model-invocation` | boolean | `true` = manual `/name` only, no auto-invocation. |
+| `user-invocable` | boolean | `false` = hidden from `/` menu. Use for background knowledge skills. |
+| `license` | string | SPDX identifier if publishing. E.g. `"Apache-2.0"`. |
+| `metadata` | object | Free-form key-value pairs for tool-specific or custom fields. |
+
+### Tool-specific fields (put under `metadata`)
+
+These are useful but not universally supported — nest them under `metadata`:
+
+```yaml
+metadata:
+  version: "2026-03-25"
+  capabilities: [bash, read_file, write_file] # Bob/agentskills.io
+```
+
+Claude Code's `allowed-tools` and `context`/`agent` fields are recognised by
+Claude Code but may trigger warnings in Bob's validator. If needed, add them
+to `metadata` or accept the warnings.
+
+### Example frontmatter
+
+```yaml
+---
+name: my-skill
+description: >
+  Does X when Y. Use when asked to Z.
+argument-hint: "[target] [--flag]"
+compatibility: "Claude Code, IBM Bob"
+metadata:
+  version: "2026-03-25"
+  capabilities: [bash, read_file, write_file]
+---
+```
+
+## SKILL.md Body Structure
+
+After frontmatter, write clear markdown instructions the agent follows:
+
+1. **Context section** — what the skill operates on, key reference files.
+2. **Procedure** — numbered steps the agent follows. Be explicit about decisions and edge cases.
+3. **Rules / constraints** — hard rules the agent must not break.
+4. **Output format** — what the agent should produce (report, edits, summary).
+
+### Guidelines
+
+- **Be specific.** Vague instructions produce inconsistent results across models.
+  "Check if markers are correct" is worse than "Compare the test's assertions
+  to the qualitative decision rule in section 3."
+- **Reference project files.** Point to docs, configs, and examples by relative
+  path so the agent can read them. E.g. "See `test/MARKERS_GUIDE.md` for the
+  full marker taxonomy."
+- **Declare scope boundaries.** State what the skill does NOT do. E.g. "This
+  skill does not modify conftest.py — flag infrastructure issues as notes."
+- **Use `$ARGUMENTS`** for user input. `$ARGUMENTS` is the full argument string;
+  `$1`, `$2` etc. are positional.
+- **Keep SKILL.md under 500 lines.** Use supporting files for large reference
+  material (link to them from the body).
+- **Portability:** use relative paths from the repo root, never absolute paths.
+- **Formatting:** use YYYY-MM-DD for dates, 24-hour clock for times, metric units.
+- **Design for variable scope.** If the skill can operate on a single file or an
+  entire directory, provide a triage strategy for the large case. Agents given
+  "audit everything" with no prioritisation will either read every file (slow)
+  or skip files (incomplete).
+- **Sharpen category boundaries.** When defining classifications, the boundary
+  between adjacent categories causes the most disagreement. Add a "key
+  distinction from X" sentence for each pair of adjacent tiers.
+- **Avoid temporal assertions.** Don't write "this conftest hook needs to be
+  added" — write "check whether conftest.py already has the hook." State that
+  goes stale silently is worse than no guidance at all.
+- **Qualify absolutes.** "Always X" and "never Y" rules need escape hatches for
+  the common exception. E.g., "per-function only — unless every function in the
+  file qualifies, in which case module-level is acceptable."
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+"""Validate SKILL.md frontmatter for agent skills."""
+
+import json
+import os
+import sys
+
+import yaml
+
+
+def validate_skill(skill_path: str) -> dict:
+    """Check that a skill directory has valid SKILL.md with required frontmatter keys."""
+    skill_file = os.path.join(skill_path, "SKILL.md")
+
+    if not os.path.exists(skill_file):
+        return {"status": "error", "message": "Missing SKILL.md"}
+
+    try:
+        with open(skill_file) as f:
+            # safe_load_all handles the --- delimiters correctly and won't
+            # break on markdown horizontal rules later in the file.
+            frontmatter = next(yaml.safe_load_all(f))
+
+        if not isinstance(frontmatter, dict):
+            return {"status": "error", "message": "Frontmatter is not a YAML mapping"}
+
+        # Root-level required keys
+        for key in ("name", "description"):
+            if key not in frontmatter:
+                return {"status": "error", "message": f"Missing root key: {key}"}
+
+        # version lives under metadata (per skill-author guide)
+        meta = frontmatter.get("metadata")
+        if not isinstance(meta, dict) or "version" not in meta:
+            return {
+                "status": "error",
+                "message": "Missing nested key: metadata.version",
+            }
+
+        return {"status": "success", "data": frontmatter}
+
+    except yaml.YAMLError as e:
+        return {"status": "error", "message": f"Invalid YAML: {e}"}
+    except StopIteration:
+        return {"status": "error", "message": "No YAML frontmatter found"}
+
+
+if __name__ == "__main__":
+    if len(sys.argv) < 2:
+        print("Usage: python3 validate_skill.py <skill-directory>", file=sys.stderr)
+        sys.exit(1)
+    result = validate_skill(sys.argv[1])
+    print(json.dumps(result))
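The validator above checks only that `name`, `description`, and `metadata.version` are present. The skill-author guide in this commit also states two rules it does not enforce: names are kebab-case with at most 64 characters, and SKILL.md should stay under 500 lines. A minimal sketch of those extra checks follows, assuming the frontmatter has already been parsed into a dict. `check_guide_rules`, the regex reading of "kebab-case", and the error wording are illustrative assumptions, not part of the commit.

```python
import re

# Assumed reading of "kebab-case": lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
MAX_NAME_CHARS = 64   # "max 64 chars" in the guide
MAX_FILE_LINES = 500  # "Keep SKILL.md under 500 lines" in the guide


def check_guide_rules(frontmatter: dict, skill_text: str) -> dict:
    """Return a result dict in the same status/message shape the validator uses."""
    name = frontmatter.get("name")
    if not isinstance(name, str) or not KEBAB_CASE.fullmatch(name):
        return {"status": "error", "message": f"Name is not kebab-case: {name!r}"}
    if len(name) > MAX_NAME_CHARS:
        return {"status": "error", "message": f"Name exceeds {MAX_NAME_CHARS} chars"}

    line_count = len(skill_text.splitlines())
    if line_count >= MAX_FILE_LINES:
        return {
            "status": "error",
            "message": f"SKILL.md has {line_count} lines; keep it under {MAX_FILE_LINES}",
        }
    return {"status": "success"}
```

Keeping these checks in a separate function leaves the committed `validate_skill` unchanged; a caller could run both and report the first error.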

.claude/settings.json

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+{
+  "skillLocations": [".agents/skills"]
+}

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -451,7 +451,8 @@ pyrightconfig.json
 
 # AI agent configs
 .bob/
-.claude/
+.claude/*
+!.claude/settings.json
 
 # Generated API documentation (built by tooling/docs-autogen/)
 docs/docs/api/
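The switch from `.claude/` to `.claude/*` matters because Git cannot re-include a file whose parent directory is itself excluded. Ignoring the directory's contents instead leaves room for the `!.claude/settings.json` exception. The sketch below is a deliberately simplified model of that rule (the last matching pattern wins, and nothing under an excluded directory can be re-included). It is not Git's actual matcher: `is_ignored` and its helpers are hypothetical, and `fnmatch` only approximates gitignore globbing.

```python
from fnmatch import fnmatchcase


def _matches(pattern: str, path: str, is_dir: bool) -> bool:
    """Approximate one pattern; a trailing slash restricts it to directories."""
    if pattern.endswith("/"):
        return is_dir and fnmatchcase(path, pattern.rstrip("/"))
    return fnmatchcase(path, pattern)


def _excluded(patterns: list[str], path: str, is_dir: bool) -> bool:
    """Apply patterns in order: the last match wins, and a leading '!' re-includes."""
    excluded = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        if _matches(body, path, is_dir):
            excluded = not negated
    return excluded


def is_ignored(patterns: list[str], path: str) -> bool:
    """Check parent directories first: once one is excluded, nothing below it can return."""
    parts = path.split("/")
    for depth in range(1, len(parts)):
        if _excluded(patterns, "/".join(parts[:depth]), is_dir=True):
            return True
    return _excluded(patterns, path, is_dir=False)
```

Under this model, `[".claude/", "!.claude/settings.json"]` still ignores `settings.json`, while `[".claude/*", "!.claude/settings.json"]` keeps it tracked and ignores everything else in the directory.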

AGENTS.md

Lines changed: 36 additions & 41 deletions
@@ -25,7 +25,6 @@ uv run pytest # Default: qualitative tests, skip slow te
 uv run pytest -m "not qualitative" # Fast tests only (~2 min)
 uv run pytest -m slow # Run only slow tests (>5 min)
 uv run pytest --co -q # Run ALL tests including slow (bypass config)
-uv run pytest --isolate-heavy # Enable GPU process isolation (opt-in)
 uv run ruff format . # Format code
 uv run ruff check . # Lint code
 uv run mypy . # Type check
@@ -44,49 +43,44 @@ uv run mypy . # Type check
 | `cli/` | CLI commands (`m serve`, `m alora`, `m decompose`, `m eval`) |
 | `test/` | All tests (run from repo root) |
 | `docs/examples/` | Example code (run as tests via pytest) |
+| `.agents/skills/` | Agent skills ([agentskills.io](https://agentskills.io) standard) |
 | `scratchpad/` | Experiments (git-ignored) |
 
 ## 3. Test Markers
-All tests and examples use markers to indicate requirements. The test infrastructure automatically skips tests based on system capabilities.
-
-**Backend Markers:**
-- `@pytest.mark.ollama` — Requires Ollama running (local, lightweight)
-- `@pytest.mark.huggingface` — Requires HuggingFace backend (local, heavy)
-- `@pytest.mark.vllm` — Requires vLLM backend (local, GPU required)
-- `@pytest.mark.openai` — Requires OpenAI API (requires API key)
-- `@pytest.mark.watsonx` — Requires Watsonx API (requires API key)
-- `@pytest.mark.litellm` — Requires LiteLLM backend
-
-**Capability Markers:**
-- `@pytest.mark.requires_gpu` — Requires GPU
-- `@pytest.mark.requires_heavy_ram` — Requires 48GB+ RAM
-- `@pytest.mark.requires_api_key` — Requires external API keys
-- `@pytest.mark.qualitative` — LLM output quality tests (skipped in CI via `CICD=1`)
-- `@pytest.mark.llm` — Makes LLM calls (needs at least Ollama)
-- `@pytest.mark.slow` — Tests taking >5 minutes (skipped via `SKIP_SLOW=1`)
-
-**Execution Strategy Markers:**
-- `@pytest.mark.requires_gpu_isolation` — Requires OS-level process isolation to clear CUDA memory (use with `--isolate-heavy` or `CICD=1`)
-
-**Examples in `docs/examples/`** use comment-based markers for clean code:
+Tests use a four-tier granularity system (`unit`, `integration`, `e2e`, `qualitative`) plus backend and resource markers. The `unit` marker is auto-applied by conftest — never write it explicitly. The `llm` marker is deprecated; use `e2e` instead.
+
+See **[test/MARKERS_GUIDE.md](test/MARKERS_GUIDE.md)** for the full marker reference (tier definitions, backend markers, resource gates, auto-skip logic, common patterns).
+
+**Examples in `docs/examples/`** use comment-based markers:
 ```python
-# pytest: ollama, llm, requires_heavy_ram
+# pytest: e2e, ollama, qualitative
 """Example description..."""
-
-# Your clean example code here
 ```
 
-Tests/examples automatically skip if system lacks required resources. Heavy examples (e.g., HuggingFace) are skipped during collection to prevent memory issues.
+⚠️ Don't add `qualitative` to trivial tests — keep the fast loop fast.
+⚠️ Mark tests taking >1 minute with `slow`.
+
+## 4. Agent Skills
+
+Skills live in `.agents/skills/` following the [agentskills.io](https://agentskills.io) open standard. Each skill is a directory with a `SKILL.md` file (YAML frontmatter + markdown instructions).
+
+**Tool discovery:**
 
-**Default behavior:**
-- `uv run pytest` skips slow tests (>5 min) but runs qualitative tests
-- Use `pytest -m "not qualitative"` for fast tests only (~2 min)
-- Use `pytest -m slow` or `pytest` (without config) to include slow tests
+| Tool | Project skills | Global skills | Config needed |
+| ----------------- | ----------------- | ------------------- | ------------------------------------------------------------------ |
+| Claude Code | `.agents/skills/` | `~/.claude/skills/` | `"skillLocations": [".agents/skills"]` in `.claude/settings.json` |
+| IBM Bob | `.bob/skills/` | `~/.bob/skills/` | Symlink: `.bob/skills` → `.agents/skills` |
+| VS Code / Copilot | `.agents/skills/` | — | None (auto-discovered) |
 
-⚠️ Don't add `qualitative` to trivial tests—keep the fast loop fast.
-⚠️ Mark tests taking >5 minutes with `slow` (e.g., dataset loading, extensive evaluations).
+**Bob users:** create the symlink once per clone:
 
-## 4. Coding Standards
+```bash
+mkdir -p .bob && ln -s ../.agents/skills .bob/skills
+```
+
+**Available skills:** `/audit-markers`, `/skill-author`
+
+## 5. Coding Standards
 - **Types required** on all core functions
 - **Docstrings are prompts** — be specific, the LLM reads them
 - **Google-style docstrings** — `Args:` on the **class docstring only**; `__init__` gets a single summary sentence. Add `Attributes:` only when a stored value differs in type/behaviour from its constructor input (type transforms, computed values, class constants). See CONTRIBUTING.md for a full example.
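The comment-based marker line shown in the hunk above (`# pytest: e2e, ollama, qualitative`) can be read mechanically. The sketch below shows one way a collector might extract the marker names and flag the deprecated `llm` marker mentioned in the same hunk. It assumes the first matching comment line holds comma-separated names, which is a guess at the format rather than the project's actual conftest logic. `parse_comment_markers` and `deprecated_markers` are hypothetical helpers.

```python
import re

# Assumed format: a comment line of the form "# pytest: name, name, ..."
MARKER_LINE = re.compile(r"#\s*pytest:\s*(.+)")

# Per the AGENTS.md text in this diff: `llm` is deprecated in favour of `e2e`.
DEPRECATED = {"llm": "e2e"}


def parse_comment_markers(source: str) -> list[str]:
    """Return marker names from the first '# pytest:' comment line, or an empty list."""
    for line in source.splitlines():
        match = MARKER_LINE.fullmatch(line.strip())
        if match:
            return [name.strip() for name in match.group(1).split(",") if name.strip()]
    return []


def deprecated_markers(markers: list[str]) -> dict[str, str]:
    """Map each deprecated marker that was found to its suggested replacement."""
    return {name: DEPRECATED[name] for name in markers if name in DEPRECATED}
```

Run against the removed example line (`# pytest: ollama, llm, requires_heavy_ram`), this would report `llm` with `e2e` as the replacement, matching the change made in the hunk.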
@@ -96,37 +90,38 @@ Tests/examples automatically skip if system lacks required resources. Heavy exam
 - **Friendly Dependency Errors**: Wraps optional backend imports in `try/except ImportError` with a helpful message (e.g., "Please pip install mellea[hf]"). See `mellea/stdlib/session.py` for examples.
 - **Backend telemetry fields**: All backends must populate `mot.usage` (dict with `prompt_tokens`, `completion_tokens`, `total_tokens`), `mot.model` (str), and `mot.provider` (str) in their `post_processing()` method. Metrics are automatically recorded by `TokenMetricsPlugin` — don't add manual `record_token_usage_metrics()` calls.
 
-## 5. Commits & Hooks
+## 6. Commits & Hooks
 [Angular format](https://github.com/angular/angular/blob/main/CONTRIBUTING.md#commit): `feat:`, `fix:`, `docs:`, `test:`, `refactor:`, `release:`
 
 Pre-commit runs: ruff, mypy, uv-lock, codespell
 
-## 6. Timing
+## 7. Timing
 > **Don't cancel**: `pytest` (full) and `pre-commit --all-files` may take minutes. Canceling mid-run can corrupt state.
 
-## 7. Common Issues
+## 8. Common Issues
 | Problem | Fix |
 |---------|-----|
 | `ComponentParseError` | Add examples to docstring |
 | `uv.lock` out of sync | Run `uv sync` |
 | Ollama refused | Run `ollama serve` |
 | Telemetry import errors | Run `uv sync` to install OpenTelemetry deps |
 
-## 8. Self-Review (before notifying user)
+## 9. Self-Review (before notifying user)
 1. `uv run pytest test/ -m "not qualitative"` passes?
 2. `ruff format` and `ruff check` clean?
 3. New functions typed with concise docstrings?
 4. Unit tests added for new functionality?
 5. Avoided over-engineering?
 
-## 9. Writing Tests
+## 10. Writing Tests
+
 - Place tests in `test/` mirroring source structure
 - Name files `test_*.py` (required for pydocstyle)
 - Use `gh_run` fixture for CI-aware tests (see `test/conftest.py`)
 - Mark tests checking LLM output quality with `@pytest.mark.qualitative`
 - If a test fails, fix the **code**, not the test (unless the test was wrong)
 
-## 10. Writing Docs
+## 11. Writing Docs
 
 If you are modifying or creating pages under `docs/docs/`, follow the writing
 conventions in [`docs/docs/guide/CONTRIBUTING.md`](docs/docs/guide/CONTRIBUTING.md).
@@ -144,7 +139,7 @@ Key rules that differ from typical Markdown habits:
   mellea source; mark forward-looking content with `> **Coming soon:**`
 - **No visible TODOs** — if content is missing, open a GitHub issue instead
 
-## 11. Feedback Loop
+## 12. Feedback Loop
 
 Found a bug, workaround, or pattern? Update the docs:
 
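The one-time Bob symlink step added to AGENTS.md in this diff (`mkdir -p .bob && ln -s ../.agents/skills .bob/skills`) is not safe to repeat: once `.bob/skills` exists, a second `ln -s` typically fails or places a stray link inside the target directory. A repeatable variant could be sketched as below. `ensure_skills_link` is a hypothetical helper that is not part of the commit, and it assumes a platform where unprivileged symlinks are available.

```python
import os
from pathlib import Path


def ensure_skills_link(repo_root: Path) -> Path:
    """Create .bob/skills -> ../.agents/skills, doing nothing if it is already correct."""
    link = repo_root / ".bob" / "skills"
    target = Path("..") / ".agents" / "skills"  # relative, as in the ln -s command

    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        if Path(os.readlink(link)) == target:
            return link  # already points at the right place
        link.unlink()  # stale link: replace it
    elif link.exists():
        # A real file or directory is in the way; refuse rather than clobber it.
        raise FileExistsError(f"{link} exists and is not a symlink")

    link.symlink_to(target, target_is_directory=True)
    return link
```

Keeping the target relative mirrors the shell command, so the link stays valid if the clone is moved.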

CLAUDE.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Claude Code Directives
+@AGENTS.md
+
+## Execution
+- If instructed to create a new capability, strictly trigger the `skill-author` meta-skill to ensure cross-compatibility.
