Add the triage skills and strategies experiments by ralphbean · Pull Request #12 · fullsend-ai/experiments

ralphbean · 2026-04-29T18:03:19Z

These came from fullsend-ai/fullsend#170 and were used to form the basis of our real triage agent from fullsend-ai/fullsend#279

waynesun09

Reviewed with a 10-agent review squad. Posting the top 5 most actionable findings inline: 2 are script bugs (one crashes on every run, one fails on macOS), 2 are data integrity issues affecting experiment results, and 1 is a JSON extraction bug that mishandles nested objects.

The $SCENARIO_NAME_ unbound variable (github-adapter.sh:76) and grep -oP portability issue (github-adapter.sh:80) are the quickest wins. The data integrity findings in the README and judge.sh are worth addressing before drawing conclusions from the experiment results.

waynesun09 · 2026-05-19T22:41:14Z

+
+---
+_This issue was created by the triage-skill-comparison experiment._
+_Strategy: $STRATEGY_NAME | Scenario: $SCENARIO_NAME_" \


Bug — Unbound variable crash on every run

$SCENARIO_NAME_ (with trailing underscore) is parsed by bash as a single variable name, SCENARIO_NAME_, since _ is a valid identifier character. That variable is never set, so with set -euo pipefail (line 3) the -u option aborts the script on every invocation with SCENARIO_NAME_: unbound variable.

Suggested change

-_Strategy: $STRATEGY_NAME | Scenario: $SCENARIO_NAME_" \
+_Strategy: $STRATEGY_NAME | Scenario: ${SCENARIO_NAME}_" \

Use ${SCENARIO_NAME}_ to explicitly delimit the variable name from the trailing underscore literal.
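A minimal sketch of the difference, using made-up values for both variables. With the -u option set, the unbraced form looks up a variable named SCENARIO_NAME_ and aborts, while the braced form expands SCENARIO_NAME and keeps the underscore as literal text.

```shell
set -eu

# Hypothetical values, for illustration only.
STRATEGY_NAME="structured-triage"
SCENARIO_NAME="crash-on-save"

# Braced form: expands SCENARIO_NAME, then appends a literal underscore.
fixed="_Strategy: $STRATEGY_NAME | Scenario: ${SCENARIO_NAME}_"
echo "$fixed"

# Unbraced form, run in a subshell so the failure does not abort this script.
# The shell reads SCENARIO_NAME_ as the variable name, and that name is unset.
if ! (echo "Scenario: $SCENARIO_NAME_") 2>/dev/null; then
  echo "unbraced form failed: SCENARIO_NAME_ is unbound"
fi
```

The subshell is only there to keep the demonstration running; in the adapter script the same expansion happens at the top level, so the whole script exits.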

waynesun09 · 2026-05-19T22:41:16Z

+  --label "$LABEL_TRIAGE" \
+  2>/dev/null)"
+
+ISSUE_NUMBER="$(echo "$ISSUE_URL" | grep -oP '\d+$')"


Bug — grep -oP is GNU-only, fails on macOS

grep -P (Perl regex) is not available on macOS's default BSD grep. This will fail with grep: invalid option -- P on any macOS contributor's machine.

Suggested change

-ISSUE_NUMBER="$(echo "$ISSUE_URL" | grep -oP '\d+$')"
+ISSUE_NUMBER="$(echo "$ISSUE_URL" | grep -oE '[0-9]+$')"

grep -oE with POSIX extended regex achieves the same result and works on both GNU and BSD grep.
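A quick check of the portable form against a sample URL. The URL is a placeholder, not one from the experiment. The sketch also shows a parameter-expansion alternative that avoids grep entirely; that variant is not part of the suggestion above and assumes the URL ends in the issue number.

```shell
set -eu

# Hypothetical issue URL in the usual GitHub shape.
ISSUE_URL="https://github.com/example-org/example-repo/issues/42"

# Portable extraction: -o and -E are supported by both GNU and BSD grep.
ISSUE_NUMBER="$(echo "$ISSUE_URL" | grep -oE '[0-9]+$')"
echo "$ISSUE_NUMBER"

# Alternative without grep: strip everything up to the last slash.
# This assumes no trailing slash, query string, or fragment on the URL.
ISSUE_NUMBER_ALT="${ISSUE_URL##*/}"
echo "$ISSUE_NUMBER_ALT"
```

The grep form is the stricter of the two, since it only matches trailing digits and exits non-zero when there are none.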

waynesun09 · 2026-05-19T22:41:17Z

+| Rank | Strategy | Mean score | Reliability |
+|------|----------|-----------|-------------|
+| 1 (tie) | omo-prometheus | 4.38 | 98% |
+| 1 (tie) | omc-deep-interview | 4.38 | 97% |


Data integrity — Results table is incomplete and reliability numbers don't match trial data

Two issues with this rankings table:

Incomplete results presented as final rankings: slow-search and wrong-search-results scenarios have zero results, and silent-data-corruption only has 2 of 5 strategies. The rankings here are drawn from partial data and may change significantly once all scenarios are run.

Reliability percentages contradict trial data: The table shows values like 98% and 97%, but examining the actual result files, all trials show parse_failures: 0 — suggesting either 100% reliability or a different calculation method that isn't documented.

Consider either marking this table as preliminary/partial, or holding it until all scenario × strategy combinations have results.

waynesun09 · 2026-05-19T22:41:18Z

+}
+
+echo "$JUDGE_JSON" | jq '.' > "$TRIAL_DIR/judge-assessment.json"
+SCORE="$(echo "$JUDGE_JSON" | jq -r '.weighted_total // 0')"


Data integrity — weighted_total values are unreliable

Two problems with trusting the LLM-provided weighted_total:

Arithmetic drift: Spot-checking ~33 of 120 judge assessment files shows 0.05–0.15 point discrepancies between the LLM's weighted_total and the sum you'd get from applying the stated weights to the individual scores. These small errors can change rankings.

Inconsistent nesting: At least one file (crash-on-save/structured-triage/trial-8/judge-assessment.json) has weighted_total nested inside .scores instead of at the top level, causing this jq expression to return 0 via the // 0 fallback — silently zeroing out the score.

Consider computing weighted_total deterministically from the component scores rather than trusting the LLM's arithmetic, and normalize the JSON structure before reading it.
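One way this could look, as a sketch only. The criterion names, the weights, and the sample JSON are invented for illustration, since the actual rubric is not shown in this review. The first jq call reads the reported total from either location; the second recomputes it from the component scores and fails loudly if a score is missing.

```shell
set -eu

# Hypothetical judge output with weighted_total misplaced under .scores,
# mirroring the nesting problem described above.
JUDGE_JSON='{"scores":{"accuracy":4,"completeness":5,"clarity":3,"weighted_total":4.1}}'

# Normalize the lookup: accept the reported total at the top level or under .scores.
REPORTED="$(echo "$JUDGE_JSON" | jq -r '.weighted_total // .scores.weighted_total // "missing"')"

# Recompute the total from the component scores (hypothetical weights),
# rounded to two decimals. A missing or non-numeric score raises a jq error
# instead of silently counting as 0.
SCORE="$(echo "$JUDGE_JSON" | jq -r '
  {accuracy: 0.5, completeness: 0.3, clarity: 0.2} as $weights
  | .scores as $s
  | [ $weights | to_entries[]
      | .key as $k
      | ($s[$k] | if type == "number" then . else error("missing score: " + $k) end) * .value ]
  | add
  | . * 100 | round | . / 100
')"

echo "reported=$REPORTED computed=$SCORE"
```

Keeping both values makes it possible to log the discrepancy per trial while ranking only on the computed score.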

waynesun09 · 2026-05-19T22:41:20Z

+  # Try first { ... } block
+  local braced
+  braced="$(echo "$raw" | awk '/{/{found=1} found{print} /}/{if(found) exit}')"
+  if [[ -n "$braced" ]] && echo "$braced" | jq . &>/dev/null; then
+    echo "$braced"; return 0
+  fi
+
+  echo "$raw"
+  return 1


Bug — extract_json truncates nested JSON objects

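The awk program starts printing at the first line that contains { and exits after the first line that contains }, so it stops at the first closing brace it sees rather than the one that matches the opening brace. Because awk works line by line, a JSON object that fits on a single line passes through intact, but pretty-printed JSON with nested objects (the expected output format for triage responses) is cut at the line that closes the first nested object.

For example, given:

{
  "priority": {
    "level": "high",
    "reason": "crash"
  },
  "component": "auth"
}

awk stops after the }, line, dropping "component" and the outer closing brace. The truncated block is not valid JSON, so the jq check rejects it and the function falls through to echo "$raw" and return 1, even though the input contained a complete JSON object.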

Consider using a brace-depth counter in awk, or piping through jq to extract the first valid JSON object from the mixed output.
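A minimal sketch of the brace-depth approach. The function name extract_first_object and the sample text are made up for illustration. It counts braces character by character and stops when the depth returns to zero. As a known simplification, braces inside JSON string values are counted too, so input such as {"msg": "use } here"} would be cut short.

```shell
set -eu

# Print the first balanced { ... } block found in mixed text on stdin.
# Simplification: braces inside string values are not treated specially.
extract_first_object() {
  awk '
    {
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == "{") depth++
        if (depth > 0) out = out c          # collect only while inside an object
        if (c == "}" && depth > 0) {
          depth--
          if (depth == 0) { print out; exit }
        }
      }
      if (depth > 0) out = out "\n"         # keep line breaks inside the object
    }
  '
}

# Hypothetical model output: prose around a pretty-printed JSON object.
raw='Here is the triage result:
{
  "priority": {
    "level": "high",
    "reason": "crash"
  },
  "component": "auth"
}
Let me know if you need more detail.'

braced="$(printf '%s\n' "$raw" | extract_first_object)"
printf '%s\n' "$braced"
```

The extracted block can still be validated with jq afterwards, as the existing function does, so a miscount caused by braces inside strings would be rejected by that check rather than passed on as valid JSON.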

Add the triage skills and strategies experiments

f40693c

These came from fullsend-ai/fullsend#170 and were used to form the basis of our real triage agent from fullsend-ai/fullsend#279

ralphbean requested a review from a team as a code owner April 29, 2026 18:03

waynesun09 requested changes May 19, 2026

ralphbean wants to merge 1 commit into main from triage-skills-and-strategies
