
feat: initial steps towards phase5 manual ranking improvement #130

Merged: nikilok merged 4 commits into main from feat/phase5-improving-manual-verification on May 27, 2026


@nikilok nikilok commented May 27, 2026

Summary by CodeRabbit

  • New Features

    • Improved inline candidate scoring and tie-breaking (locality/company-status effects and UK-presence preference).
    • New CLI scripts to compare/apply drain decisions and to hydrate missing company profiles.
  • Refactor

    • Sweep workflow now uses sponsor-aware lookups and performs many resolutions inline; CLI output metrics updated.
  • Tests

    • Added comprehensive tests for scoring, tie-resolution, route-type compatibility and edge cases.
  • Documentation

    • Updated Phase 5 docs, added drain-comparison report and a follow-ups checklist.



vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| learn-tanstack-start | Ready | Preview, Comment | May 27, 2026 7:09pm |



coderabbitai Bot commented May 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 66953df3-8565-41e2-9728-249ae2ccecf0

📥 Commits

Reviewing files that changed from the base of the PR and between 04312dc and a6e00a4.

📒 Files selected for processing (11)
  • apps/web/scripts/drain-review-queue.ts
  • apps/web/scripts/hydrate-queue-proposed-profiles.ts
  • apps/web/scripts/phase5-sweep.ts
  • apps/web/src/lib/phase5/compare-candidates.test.ts
  • apps/web/src/lib/phase5/compare-candidates.ts
  • apps/web/src/lib/phase5/decide.test.ts
  • apps/web/src/lib/phase5/decide.ts
  • apps/web/src/lib/phase5/sql.ts
  • apps/web/src/lib/phase5/sweep.test.ts
  • apps/web/src/lib/phase5/sweep.ts
  • docs/followups.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • apps/web/src/lib/phase5/compare-candidates.ts
  • apps/web/scripts/hydrate-queue-proposed-profiles.ts
  • apps/web/scripts/drain-review-queue.ts

📝 Walkthrough

Walkthrough

Implements Phase 5 inline candidate comparison (route-type gate, sponsor-fit scoring, pairwise resolution with succession and UK-presence adjustments), adds drain/hydration scripts, refines SQL optimistic-lock precision, updates docs, and adds tests and a generated drain comparison report.

Changes

Phase 5 inline resolution pipeline

| Layer / File(s) | Summary |
| --- | --- |
| Route-type compatibility mapping (apps/web/src/lib/phase5/route-type-compat.ts) | HmrcRoute and CHCompanyType unions; COMMERCIAL_FORMS / NOT_FOR_PROFIT_FORMS sets; ROUTE_TYPE_COMP mapping; routeTypeCompatible() defaults to true when route/type unknown. |
| Candidate scoring implementation and tests (apps/web/src/lib/phase5/score-candidate.ts, apps/web/src/lib/phase5/score-candidate.test.ts) | Adds ScorerCandidate/ScorerSponsor, normaliseLocality() for case-insensitive locality, scoreCandidate() hard-gates incompatible route/type to -Infinity, otherwise scores locality (+3) + status weighting (+1 / -2); tests cover gating, locality, status, and combined totals. |
| Pairwise inline resolution implementation and tests (apps/web/src/lib/phase5/compare-candidates.ts, apps/web/src/lib/phase5/compare-candidates.test.ts) | Exports STATUS_QUO_BONUS, SCORE_MARGIN (2), SUCCESSION_WEIGHT, UK_PRESENCE_WEIGHT; defines CompareCandidate/CompareAction/CompareResult; canonicalises names and detects previous-name succession (both directions); applies succession and UK-presence adjustments and returns promote/keep/inconclusive; tests validate bias, succession, hard-gate, UK-presence, regression, and canonical name matching. |
| Algorithm documentation and generated report (docs/phase5-sweep-algorithm.md, docs/phase5-drain-comparison.md) | Docs updated to reflect case-insensitive locality, removed postcode-area scoring, max score +4, SCORE_MARGIN = 2, route-type refactor, UK-presence behavior, drain residue note; adds generated drain comparison markdown with per-row tallies and disagreement-first decisions. |
| SQL optimistic-lock precision (apps/web/src/lib/phase5/sql.ts) | Truncates verified_at to milliseconds in optimistic-lock predicates (date_trunc('milliseconds', verified_at) IS NOT DISTINCT FROM ...) to avoid microsecond mismatch lock misses; adds makeLookupSponsor/makeGetProfile factories and adjusts resolveSponsor locality handling. |
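The lock-precision change can be illustrated with a small simulation. This is only a sketch under assumptions: timestamps are modelled as plain numbers of microseconds since the epoch, and the helper names (`toJsMillis`, `naiveLockMatches`, `truncatedLockMatches`) are hypothetical, not taken from sql.ts. It shows why a value that round-trips through a millisecond-precision JavaScript Date no longer equals the stored microsecond value unless the stored side is truncated first.

```typescript
// Hypothetical model: a stored verified_at is a number of microseconds since
// the epoch (or null), mimicking a microsecond-precision timestamp column.
type StoredMicros = number | null;

// A JavaScript Date keeps whole milliseconds only, so reading the column into
// a Date drops the microsecond tail.
function toJsMillis(stored: StoredMicros): number | null {
  return stored === null ? null : Math.floor(stored / 1000);
}

// IS NOT DISTINCT FROM semantics: two NULLs compare equal.
function isNotDistinctFrom(a: number | null, b: number | null): boolean {
  return a === b;
}

// Naive predicate: verified_at IS NOT DISTINCT FROM <value seen by the app>.
function naiveLockMatches(stored: StoredMicros, seenMillis: number | null): boolean {
  const seenMicros = seenMillis === null ? null : seenMillis * 1000;
  return isNotDistinctFrom(stored, seenMicros);
}

// Truncating predicate, in the spirit of
// date_trunc('milliseconds', verified_at) IS NOT DISTINCT FROM <value seen by the app>.
function truncatedLockMatches(stored: StoredMicros, seenMillis: number | null): boolean {
  const truncated = stored === null ? null : Math.floor(stored / 1000) * 1000;
  const seenMicros = seenMillis === null ? null : seenMillis * 1000;
  return isNotDistinctFrom(truncated, seenMicros);
}

const stored = 1780000000123456; // ends in 456 microseconds
const seen = toJsMillis(stored);

console.log(naiveLockMatches(stored, seen)); // false: the lock is missed
console.log(truncatedLockMatches(stored, seen)); // true: the lock is honoured
```

The row was never modified between read and write in this example, yet the naive comparison reports a mismatch; truncating the stored side removes that false negative while a genuine change (a different millisecond) would still fail the check.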

Queue maintenance scripts

| Layer / File(s) | Summary |
| --- | --- |
| Drain review-queue script, compare/apply modes (apps/web/scripts/drain-review-queue.ts) | Adds one-shot CLI script with --compare/--apply and `--strategy=trust |
| Hydrate missing proposed profiles (apps/web/scripts/hydrate-queue-proposed-profiles.ts) | Adds rate-limited CH fetcher with timeout/retries/backoff, upserts fetched companies_house_profiles, supports --limit and --dry-run, logs stats, and exits non-zero if error rate >10%. |
| Sweep CLI wiring and tests (apps/web/scripts/phase5-sweep.ts, apps/web/src/lib/phase5/sweep.ts, apps/web/src/lib/phase5/sweep.test.ts) | Rewires sweep deps to use lookupSponsor and getProfile, removes enqueueReview, implements log_and_bump and inline_score dispatch paths, tracks inlineResolved/inlineInconclusive/warned, and updates tests accordingly. |
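The retry/backoff and error-rate behaviour described for the hydration script could look roughly like the sketch below. Everything here is an assumption for illustration: the constants, the doubling schedule, and the names `backoffDelayMs`, `withRetries` and `exitCodeFor` are hypothetical; only the ">10% error rate exits non-zero" rule comes from the summary above.

```typescript
// Assumed values; the script's real settings are not shown in this PR page.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 8000;
const ERROR_RATE_LIMIT = 0.1; // "exits non-zero if error rate >10%"

// Exponential backoff: 500ms, 1s, 2s, ... capped at MAX_DELAY_MS.
function backoffDelayMs(attempt: number): number {
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
}

// Retries a task up to maxAttempts times, sleeping between failed attempts.
// The sleep function is injected so callers (and tests) control timing.
async function withRetries<T>(
  task: () => Promise<T>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}

// Process exit code derived from run statistics: 1 when strictly more than
// 10% of attempted fetches failed, otherwise 0.
function exitCodeFor(stats: { fetched: number; errors: number }): number {
  const total = stats.fetched + stats.errors;
  if (total === 0) return 0;
  return stats.errors / total > ERROR_RATE_LIMIT ? 1 : 0;
}
```

With these assumed values, a run of 90 successes and 10 errors sits exactly on the limit and still exits 0, whereas 89 successes and 11 errors exits 1.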

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant scoreCandidate
  participant routeTypeCompatible
  Caller->>scoreCandidate: candidate, sponsor
  scoreCandidate->>routeTypeCompatible: sponsor.route, candidate.type
  alt route incompatible with company type
    routeTypeCompatible-->>scoreCandidate: false
    scoreCandidate-->>Caller: -Infinity
  else route compatible
    scoreCandidate->>scoreCandidate: normalise locality, compute feature scores
    scoreCandidate-->>Caller: sum of locality + status contributions
  end
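The flow in the diagram above can be sketched as follows. This is a minimal illustration, not the code in score-candidate.ts: the set memberships, the single route entry, the field names, and the choice of which statuses earn +1 or -2 are all assumptions. Only the hard gate to -Infinity, the +3 locality weight, the +1 / -2 status weights, and the "unknown defaults to compatible" rule come from the walkthrough.

```typescript
type Candidate = { type?: string; locality?: string; status?: string };
type Sponsor = { route?: string; townCity?: string };

// Assumed memberships, for illustration only.
const COMMERCIAL_FORMS = new Set(['ltd', 'plc', 'llp']);
const NOT_FOR_PROFIT_FORMS = new Set([
  'charitable-incorporated-organisation',
  'private-limited-guarant-nsc',
]);

// Hypothetical route -> allowed company types mapping (one invented entry).
const ROUTE_TYPES: Record<string, Set<string>> = {
  'Charity Worker': NOT_FOR_PROFIT_FORMS,
};

// Unknown route or type defaults to compatible, as the walkthrough states.
function routeTypeCompatible(route?: string, type?: string): boolean {
  if (!route || !type) return true;
  const allowed = ROUTE_TYPES[route];
  if (!allowed) return true;
  return allowed.has(type);
}

// Case-insensitive locality; trimming is an extra assumption.
function normaliseLocality(value: string): string {
  return value.trim().toLowerCase();
}

function scoreCandidate(candidate: Candidate, sponsor: Sponsor): number {
  // Hard gate: an incompatible route/type pair can never win.
  if (!routeTypeCompatible(sponsor.route, candidate.type)) return -Infinity;

  let score = 0;
  if (
    candidate.locality &&
    sponsor.townCity &&
    normaliseLocality(candidate.locality) === normaliseLocality(sponsor.townCity)
  ) {
    score += 3; // locality match
  }
  // Assumed status mapping: active earns +1, dissolved costs -2.
  if (candidate.status === 'active') score += 1;
  if (candidate.status === 'dissolved') score -= 2;
  return score;
}
```

Under these assumptions the maximum score is +4 (locality match plus active status), which agrees with the "max score +4" note in the documentation row of the table.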
sequenceDiagram
  participant Resolver
  participant scoreCandidate as scoreCandidate (existing)
  participant scoreCandidate2 as scoreCandidate (proposed)
  participant SuccessionMatch
  Resolver->>scoreCandidate: existing candidate
  scoreCandidate-->>Resolver: score_existing
  Resolver->>scoreCandidate2: proposed candidate
  scoreCandidate2-->>Resolver: score_proposed
  Resolver->>SuccessionMatch: canonicalise names, check previous_company_names
  SuccessionMatch-->>Resolver: succession forward/reverse
  Resolver->>Resolver: apply SUCCESSION_WEIGHT adjustments and UK_PRESENCE_WEIGHT
  alt adjusted_delta >= SCORE_MARGIN
    Resolver-->>Resolver: promote or keep
  else adjusted_delta < SCORE_MARGIN
    Resolver-->>Resolver: inconclusive
  end
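The margin decision in the diagram above can be sketched as below. Only SCORE_MARGIN = 2 and the promote/keep/inconclusive outcomes are taken from the walkthrough. The values of STATUS_QUO_BONUS and SUCCESSION_WEIGHT, the meaning of the 'forward' and 'reverse' directions, and the function name `resolvePair` are assumptions; the UK-presence adjustment is left out for brevity.

```typescript
const SCORE_MARGIN = 2; // from the walkthrough
const STATUS_QUO_BONUS = 1; // assumed value
const SUCCESSION_WEIGHT = 3; // assumed value

type CompareAction = 'promote' | 'keep' | 'inconclusive';

// Assumption: 'forward' means the proposed company is the successor of the
// existing one (its previous names include the existing name); 'reverse' is
// the opposite direction.
type Succession = 'forward' | 'reverse' | 'none';

function resolvePair(
  scoreExisting: number,
  scoreProposed: number,
  succession: Succession,
): CompareAction {
  // The incumbent gets a small bias so near-ties do not flip the mapping.
  let sE = scoreExisting + STATUS_QUO_BONUS;
  let sP = scoreProposed;

  if (succession === 'forward') sP += SUCCESSION_WEIGHT;
  if (succession === 'reverse') sE += SUCCESSION_WEIGHT;

  const delta = sP - sE;
  // Both candidates hard-gated (-Infinity - -Infinity is NaN): no decision.
  if (isNaN(delta)) return 'inconclusive';
  if (delta >= SCORE_MARGIN) return 'promote';
  if (delta <= -SCORE_MARGIN) return 'keep';
  return 'inconclusive';
}

console.log(resolvePair(4, 4, 'none')); // inconclusive
console.log(resolvePair(1, 4, 'none')); // promote
console.log(resolvePair(4, 4, 'forward')); // promote
```

Requiring the adjusted difference to clear the margin in either direction means a one-point lead is never enough to swap or confirm a mapping on its own.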

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 A rabbit hops through scoring fields,
Where routes and companies match and yield,
Locality whispers, old names guide the way,
Scores sway, ties break, and queues shrink each day.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title references 'phase5 manual ranking improvement' but the changeset implements comprehensive Phase 5 inline scoring, candidate comparison, and drain automation—only partially addressing manual verification. | Clarify whether 'manual ranking improvement' accurately describes the primary scope, or revise to emphasize inline scoring (e.g., 'Phase 5 inline scoring and drain automation') or tie-resolution logic. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/web/scripts/drain-review-queue.ts`:
- Around line 191-200: The SELECT DISTINCT ON (organisation_name) query is
non-deterministic for tied counts because ORDER BY ends with "route" only;
update the ORDER BY in that query (the block producing { organisation_name,
town_city, route }) to include town_city as a final tie-breaker (e.g. ORDER BY
organisation_name, n DESC, route, COALESCE(town_city, '') or ORDER BY ... route,
town_city NULLS LAST) so rows with equal n and route are chosen
deterministically.
- Around line 551-554: The code increments the wrong counter when a mapping is
missing: replace the increment of orphaned (orphaned += 1) with the stale
counter (stale += 1) so the branch that calls markResolved(r.id,
`drain_${strategy}_stale`, changedBy) and logs "stale (no mapping row)"
correctly updates the stale tally; locate the block where mapping is checked
(the mapping variable) and the orphaned/stale counters and change the increment
to stale += 1.
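The determinism point in the first item can be shown outside SQL. The sketch below is an in-memory analogue of DISTINCT ON with a full ordering, using the NULLS LAST variant of the suggested tie-breaker. The `Row` shape mirrors the columns named in the comment; the functions `compareRows` and `pickFirstPerOrganisation` are hypothetical, and JavaScript string comparison stands in for whatever collation the database uses.

```typescript
type Row = {
  organisation_name: string;
  town_city: string | null;
  route: string;
  n: number;
};

// Mirrors ORDER BY organisation_name, n DESC, route, town_city NULLS LAST.
function compareRows(a: Row, b: Row): number {
  if (a.organisation_name !== b.organisation_name) {
    return a.organisation_name < b.organisation_name ? -1 : 1;
  }
  if (a.n !== b.n) return b.n - a.n; // higher counts first
  if (a.route !== b.route) return a.route < b.route ? -1 : 1;
  // Final tie-breaker: without it, equal (n, route) rows have no defined order.
  if (a.town_city === b.town_city) return 0;
  if (a.town_city === null) return 1; // NULLS LAST
  if (b.town_city === null) return -1;
  return a.town_city < b.town_city ? -1 : 1;
}

// DISTINCT ON (organisation_name): keep the first row per organisation.
function pickFirstPerOrganisation(rows: Row[]): Row[] {
  const sorted = [...rows].sort(compareRows);
  const seen = new Set<string>();
  return sorted.filter((row) => {
    if (seen.has(row.organisation_name)) return false;
    seen.add(row.organisation_name);
    return true;
  });
}

const rows: Row[] = [
  { organisation_name: 'Acme', town_city: 'Leeds', route: 'Skilled Worker', n: 2 },
  { organisation_name: 'Acme', town_city: null, route: 'Skilled Worker', n: 2 },
  { organisation_name: 'Acme', town_city: 'Bath', route: 'Skilled Worker', n: 2 },
];

// Both input orders pick the same row because the ordering is total.
console.log(pickFirstPerOrganisation(rows)[0].town_city); // Bath
console.log(pickFirstPerOrganisation([...rows].reverse())[0].town_city); // Bath
```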

In `@apps/web/scripts/hydrate-queue-proposed-profiles.ts`:
- Around line 161-165: The current fetch branch treats all non-OK responses the
same so 401/403 (invalid API key) keeps the loop running; update the non-OK
branch in the fetch logic that uses res and path to special-case authentication
failures: if res.status === 401 || res.status === 403, log a clear message
including the status and path and return a distinct fatal/auth result (e.g., {
kind: 'auth_error', status: res.status }) or throw an Error so the caller of
this function can abort the whole hydration process immediately; keep the
existing return { kind: 'error' } for other non-OK statuses and still return {
kind: 'ok', data: await res.json() } for success.
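One way to express the suggested special case is sketched below. It is an illustration only: the `ResponseLike` interface, `classifyStatus` and `toOutcome` are hypothetical names, and the result shapes follow the `{ kind: ... }` examples in the comment above rather than the script's actual types.

```typescript
type FetchOutcome =
  | { kind: 'ok'; data: unknown }
  | { kind: 'auth_error'; status: number }
  | { kind: 'error' };

// Minimal structural stand-in for the parts of a fetch Response used here.
type ResponseLike = {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
};

// Pure classification: 401/403 are fatal authentication failures, any other
// non-OK status is an ordinary per-row error.
function classifyStatus(ok: boolean, status: number): 'ok' | 'auth_error' | 'error' {
  if (ok) return 'ok';
  if (status === 401 || status === 403) return 'auth_error';
  return 'error';
}

async function toOutcome(res: ResponseLike, path: string): Promise<FetchOutcome> {
  const kind = classifyStatus(res.ok, res.status);
  if (kind === 'ok') {
    return { kind: 'ok', data: await res.json() };
  }
  if (kind === 'auth_error') {
    // A distinct result lets the caller abort the whole run immediately.
    console.error(`auth failure ${res.status} for ${path}; aborting hydration`);
    return { kind: 'auth_error', status: res.status };
  }
  return { kind: 'error' };
}
```

The caller's loop would then stop on the first `auth_error` instead of counting it as one more failed row, since every later request with the same key would fail the same way.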

In `@apps/web/src/lib/phase5/compare-candidates.ts`:
- Around line 115-124: The UK-presence boost is currently applied based only on
types (isUkEstablishment / isForeignEntity) which can wrongly favor unrelated
pairs; change the two-if block so the boost is only added when the two
candidates are the same legal entity — e.g., require an identity guard such as
canonical-name equality or a previous-name linkage before adding
UK_PRESENCE_WEIGHT to s_e or s_p; update the checks around
isUkEstablishment(existing.type) && isForeignEntity(proposed.type) and its
symmetric counterpart to include a same-entity predicate (for example:
(existing.canonicalName === proposed.canonicalName ||
hasPreviousNameLink(existing, proposed)) && ...), so only confirmed same-entity
pairs receive the boost.
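A guarded version of the boost might look like the sketch below. The names isUkEstablishment, isForeignEntity, hasPreviousNameLink and UK_PRESENCE_WEIGHT appear in the comment above, but their bodies and values here are assumptions, as are the `Cand` shape, the canonicalisation rule, and the company-type strings.

```typescript
type Cand = { name: string; type: string; previousNames: string[] };

const UK_PRESENCE_WEIGHT = 2; // assumed value

// Assumed canonical form: lower-case, punctuation collapsed to single spaces.
function canonicalise(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
}

// Assumed company-type strings for the two entity kinds.
function isUkEstablishment(type: string): boolean {
  return type === 'uk-establishment';
}
function isForeignEntity(type: string): boolean {
  return type === 'oversea-company';
}

// True when either candidate lists the other's current name as a previous name.
function hasPreviousNameLink(a: Cand, b: Cand): boolean {
  const nameA = canonicalise(a.name);
  const nameB = canonicalise(b.name);
  return (
    a.previousNames.some((p) => canonicalise(p) === nameB) ||
    b.previousNames.some((p) => canonicalise(p) === nameA)
  );
}

function sameEntity(a: Cand, b: Cand): boolean {
  return canonicalise(a.name) === canonicalise(b.name) || hasPreviousNameLink(a, b);
}

// Returns the boost to add to each side; zero unless the pair is one legal
// entity seen as a UK establishment on one side and its foreign parent on the other.
function ukPresenceBoost(existing: Cand, proposed: Cand): { existing: number; proposed: number } {
  const boost = { existing: 0, proposed: 0 };
  if (!sameEntity(existing, proposed)) return boost;
  if (isUkEstablishment(existing.type) && isForeignEntity(proposed.type)) {
    boost.existing = UK_PRESENCE_WEIGHT;
  }
  if (isUkEstablishment(proposed.type) && isForeignEntity(existing.type)) {
    boost.proposed = UK_PRESENCE_WEIGHT;
  }
  return boost;
}
```

Two unrelated companies that merely happen to have the right type pairing receive no boost, which is the failure mode the comment describes.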

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5f50d35e-8294-498e-9bed-499927221df7

📥 Commits

Reviewing files that changed from the base of the PR and between 3a43995 and 04312dc.

📒 Files selected for processing (9)
  • apps/web/scripts/drain-review-queue.ts
  • apps/web/scripts/hydrate-queue-proposed-profiles.ts
  • apps/web/src/lib/phase5/compare-candidates.test.ts
  • apps/web/src/lib/phase5/compare-candidates.ts
  • apps/web/src/lib/phase5/score-candidate.test.ts
  • apps/web/src/lib/phase5/score-candidate.ts
  • apps/web/src/lib/phase5/sql.ts
  • docs/phase5-drain-comparison.md
  • docs/phase5-sweep-algorithm.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/web/src/lib/phase5/score-candidate.ts
  • apps/web/src/lib/phase5/score-candidate.test.ts

Comment thread apps/web/scripts/drain-review-queue.ts Outdated
Comment thread apps/web/scripts/drain-review-queue.ts
Comment on lines +582 to +598
const result = await applyPromotion(
  mapping,
  proposedResolution,
  changedBy,
  applyDeps,
);

if (!result.ok) {
  lockMissed += 1;
  console.log(
    ` ${idx} ${r.organisation_name} → lock_missed (mapping verified_at changed; queue row stays unresolved)`,
  );
  continue;
}

swapped += 1;
await markResolved(r.id, `drain_${strategy}_swap`, changedBy);

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Resolve the queue row in the same write unit as the promotion.

Lines 582-598 do two separate writes: first applyPromotion, then markResolved. If the process dies after the promotion succeeds but before the queue update, the mapping is already swapped while the queue row stays unresolved; the next run will then classify that row as stale instead of recording a successful drain.

Please fold the queue resolution into the same DB transaction/CTE as the promotion, or add a recovery path that marks rows resolved when the live mapping already matches the proposed number.
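The second option, a recovery path, could be sketched as follows. This is an assumption-laden illustration: the `QueueRow` and `Mapping` shapes, the field names, and the functions `classifyBeforePromotion` and `drainRow` are hypothetical. Only the reason strings `drain_${strategy}_swap` and `drain_${strategy}_stale` and the two-step promote-then-resolve order come from the quoted code and comments.

```typescript
type QueueRow = { id: number; proposedNumber: string };
type Mapping = { companyNumber: string } | undefined;

type PreCheck = 'stale_no_mapping' | 'already_swapped' | 'needs_promotion';

// Decide what to do before attempting a promotion. 'already_swapped' covers a
// previous run that promoted the mapping but died before resolving the row.
function classifyBeforePromotion(row: QueueRow, mapping: Mapping): PreCheck {
  if (!mapping) return 'stale_no_mapping';
  if (mapping.companyNumber === row.proposedNumber) return 'already_swapped';
  return 'needs_promotion';
}

// Injected dependencies keep the sketch independent of any database client.
type Deps = {
  promote: () => Promise<{ ok: boolean }>;
  markResolved: (id: number, reason: string) => Promise<void>;
};

async function drainRow(
  row: QueueRow,
  mapping: Mapping,
  strategy: string,
  deps: Deps,
): Promise<'stale' | 'swapped' | 'lock_missed'> {
  const check = classifyBeforePromotion(row, mapping);
  if (check === 'stale_no_mapping') {
    await deps.markResolved(row.id, `drain_${strategy}_stale`);
    return 'stale';
  }
  if (check === 'already_swapped') {
    // Recovery: record the earlier successful swap instead of calling it stale.
    await deps.markResolved(row.id, `drain_${strategy}_swap`);
    return 'swapped';
  }
  const result = await deps.promote();
  if (!result.ok) return 'lock_missed'; // queue row stays unresolved
  await deps.markResolved(row.id, `drain_${strategy}_swap`);
  return 'swapped';
}
```

This does not make the two writes atomic; it only makes a rerun converge on the right tally. Folding both writes into one transaction remains the stronger fix when the database layer allows it.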

Comment thread apps/web/scripts/hydrate-queue-proposed-profiles.ts
Comment thread apps/web/src/lib/phase5/compare-candidates.ts
- this ensures we no longer keep populating data into the human review
table
@nikilok nikilok merged commit b195be1 into main May 27, 2026
5 checks passed
@nikilok nikilok deleted the feat/phase5-improving-manual-verification branch May 27, 2026 20:34