feat: inital steps towards phase5 manual ranking improvement by nikilok · Pull Request #130 · nikilok/learn-tanstack-start

nikilok · 2026-05-27T12:50:19Z

Summary by CodeRabbit

New Features
- Improved inline candidate scoring and tie-breaking (locality/company-status effects and UK-presence preference).
- New CLI scripts to compare/apply drain decisions and to hydrate missing company profiles.

Refactor
- Sweep workflow now uses sponsor-aware lookups and performs many resolutions inline; CLI output metrics updated.

Tests
- Added comprehensive tests for scoring, tie-resolution, route-type compatibility and edge cases.

Documentation
- Updated Phase 5 docs, added drain-comparison report and a follow-ups checklist.

vercel · 2026-05-27T12:50:21Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
learn-tanstack-start	Ready	Preview, Comment	May 27, 2026 7:09pm

coderabbitai · 2026-05-27T12:54:27Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 66953df3-8565-41e2-9728-249ae2ccecf0

📥 Commits

Reviewing files that changed from the base of the PR and between 04312dc and a6e00a4.

📒 Files selected for processing (11)

apps/web/scripts/drain-review-queue.ts
apps/web/scripts/hydrate-queue-proposed-profiles.ts
apps/web/scripts/phase5-sweep.ts
apps/web/src/lib/phase5/compare-candidates.test.ts
apps/web/src/lib/phase5/compare-candidates.ts
apps/web/src/lib/phase5/decide.test.ts
apps/web/src/lib/phase5/decide.ts
apps/web/src/lib/phase5/sql.ts
apps/web/src/lib/phase5/sweep.test.ts
apps/web/src/lib/phase5/sweep.ts
docs/followups.md

🚧 Files skipped from review as they are similar to previous changes (3)

apps/web/src/lib/phase5/compare-candidates.ts
apps/web/scripts/hydrate-queue-proposed-profiles.ts
apps/web/scripts/drain-review-queue.ts

📝 Walkthrough

Implements Phase 5 inline candidate comparison (route-type gate, sponsor-fit scoring, pairwise resolution with succession and UK-presence adjustments), adds drain/hydration scripts, refines SQL optimistic-lock precision, updates docs, and adds tests and a generated drain comparison report.

Changes

Phase 5 inline resolution pipeline

Layer / File(s)	Summary
Route-type compatibility mapping `apps/web/src/lib/phase5/route-type-compat.ts`	`HmrcRoute` and `CHCompanyType` unions; `COMMERCIAL_FORMS` / `NOT_FOR_PROFIT_FORMS` sets; `ROUTE_TYPE_COMP` mapping; `routeTypeCompatible()` defaults to `true` when route/type unknown.
Candidate scoring implementation and tests `apps/web/src/lib/phase5/score-candidate.ts`, `apps/web/src/lib/phase5/score-candidate.test.ts`	Adds `ScorerCandidate`/`ScorerSponsor`, `normaliseLocality()` for case-insensitive locality, `scoreCandidate()` hard-gates incompatible route/type to `-Infinity`, otherwise scores locality (+3) + status weighting (+1 / -2); tests cover gating, locality, status, and combined totals.
Pairwise inline resolution implementation and tests `apps/web/src/lib/phase5/compare-candidates.ts`, `apps/web/src/lib/phase5/compare-candidates.test.ts`	Exports `STATUS_QUO_BONUS`, `SCORE_MARGIN` (2), `SUCCESSION_WEIGHT`, `UK_PRESENCE_WEIGHT`; defines `CompareCandidate`/`CompareAction`/`CompareResult`; canonicalises names and detects previous-name succession (both directions); applies succession and UK-presence adjustments and returns `promote`/`keep`/`inconclusive`; tests validate bias, succession, hard-gate, UK-presence, regression, and canonical name matching.
Algorithm documentation and generated report `docs/phase5-sweep-algorithm.md`, `docs/phase5-drain-comparison.md`	Docs updated to reflect case-insensitive locality, removed postcode-area scoring, max score +4, `SCORE_MARGIN = 2`, route-type refactor, UK-presence behavior, drain residue note; adds generated drain comparison markdown with per-row tallies and disagreement-first decisions.
SQL optimistic-lock precision `apps/web/src/lib/phase5/sql.ts`	Truncates `verified_at` to milliseconds in optimistic-lock predicates (`date_trunc('milliseconds', verified_at) IS NOT DISTINCT FROM ...`) to avoid lock misses caused by microsecond mismatches; adds `makeLookupSponsor`/`makeGetProfile` factories and adjusts `resolveSponsor` locality handling.
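
The optimistic-lock row above can be illustrated in isolation. The following is a minimal sketch of why a millisecond-truncated comparison avoids spurious lock misses, assuming timestamps are exchanged as ISO-8601 strings; `truncateToMillis`, `lockMatchesExact`, and `lockMatchesTruncated` are hypothetical names that do not appear in the PR.

```typescript
// Hypothetical helper: drop fractional-second digits beyond the third,
// mirroring what date_trunc('milliseconds', ...) does on the SQL side.
function truncateToMillis(iso: string): string {
  return iso.replace(/(\.\d{3})\d+/, "$1");
}

// Naive predicate: fails whenever the stored value carries microseconds
// that the client-side value (millisecond precision) has lost.
function lockMatchesExact(stored: string | null, expected: string | null): boolean {
  return stored === expected;
}

// Truncated predicate with IS NOT DISTINCT FROM semantics:
// two nulls match, a null never matches a non-null value.
function lockMatchesTruncated(stored: string | null, expected: string | null): boolean {
  if (stored === null || expected === null) return stored === expected;
  return truncateToMillis(stored) === truncateToMillis(expected);
}

const storedVerifiedAt = "2026-05-27T12:50:19.123456Z"; // microsecond precision
const clientVerifiedAt = "2026-05-27T12:50:19.123Z"; // millisecond precision

console.log(lockMatchesExact(storedVerifiedAt, clientVerifiedAt)); // false
console.log(lockMatchesTruncated(storedVerifiedAt, clientVerifiedAt)); // true
```

The null handling matters because a plain `=` comparison in SQL yields NULL rather than true when both sides are NULL, which is presumably why the predicate uses `IS NOT DISTINCT FROM`.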

Queue maintenance scripts

Layer / File(s)	Summary
Drain review-queue script (compare/apply modes) `apps/web/scripts/drain-review-queue.ts`	Adds one-shot CLI script with `--compare`/`--apply` and `--strategy=trust
Hydrate missing proposed profiles `apps/web/scripts/hydrate-queue-proposed-profiles.ts`	Adds rate-limited CH fetcher with timeout/retries/backoff, upserts fetched `companies_house_profiles`, supports `--limit` and `--dry-run`, logs stats, and exits non-zero if error rate >10%.
Sweep CLI wiring and tests `apps/web/scripts/phase5-sweep.ts`, `apps/web/src/lib/phase5/sweep.ts`, `apps/web/src/lib/phase5/sweep.test.ts`	Rewires sweep deps to use `lookupSponsor` and `getProfile`, removes `enqueueReview`, implements `log_and_bump` and `inline_score` dispatch paths, tracks `inlineResolved`/`inlineInconclusive`/`warned`, and updates tests accordingly.
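
The hydration row above mentions retries with backoff and a non-zero exit when the error rate exceeds 10%. A rough sketch of those two pieces follows; the backoff base and cap, the `HydrateStats` shape, and both function names are assumptions made for illustration, and only the 10% threshold comes from the summary.

```typescript
// Hypothetical stats shape; the script's real counters are not shown in the PR summary.
interface HydrateStats {
  fetched: number;
  errors: number;
}

// Exponential backoff with a cap. The 500 ms base and 8 s cap are assumed values.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Exit code 1 only when strictly more than 10% of attempted fetches failed.
function exitCodeFor(stats: HydrateStats): number {
  const attempted = stats.fetched + stats.errors;
  if (attempted === 0) return 0;
  return stats.errors / attempted > 0.1 ? 1 : 0;
}

console.log(backoffDelayMs(0), backoffDelayMs(3), backoffDelayMs(10)); // 500 4000 8000
console.log(exitCodeFor({ fetched: 90, errors: 10 })); // 0 (exactly 10% is not above the threshold)
console.log(exitCodeFor({ fetched: 89, errors: 11 })); // 1
```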

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant scoreCandidate
  participant routeTypeCompatible
  Caller->>scoreCandidate: candidate, sponsor
  scoreCandidate->>routeTypeCompatible: sponsor.route, candidate.type
  alt route incompatible with company type
    routeTypeCompatible-->>scoreCandidate: false
    scoreCandidate-->>Caller: -Infinity
  else route compatible
    scoreCandidate->>scoreCandidate: normalise locality, compute feature scores
    scoreCandidate-->>Caller: sum of locality + status contributions
  end
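
The scoring flow in the diagram above can be sketched as follows. The +3 locality bonus, the +1 / -2 status weighting, the `-Infinity` hard gate, and the default of `true` for unknown route/type come from the walkthrough; the concrete route names, company types, status labels (`active`, `dissolved`), and type shapes are assumptions and may not match `score-candidate.ts`.

```typescript
// Sketch only: shapes and labels below are illustrative assumptions.
type SketchCandidate = { type?: string; locality?: string | null; status?: string };
type SketchSponsor = { route?: string; townCity?: string | null };

// Hypothetical compatibility table: route -> company types it accepts.
const ROUTE_TYPE_COMPAT = new Map<string, ReadonlySet<string>>([
  ["example-charity-route", new Set(["example-not-for-profit"])],
]);

// Unknown route or type is treated as compatible, as the walkthrough states.
function routeTypeCompatible(route?: string, type?: string): boolean {
  if (!route || !type) return true;
  const allowed = ROUTE_TYPE_COMPAT.get(route);
  return allowed === undefined || allowed.has(type);
}

// Case-insensitive locality comparison key.
function normaliseLocality(value?: string | null): string {
  return (value ?? "").trim().toLowerCase();
}

function scoreCandidate(candidate: SketchCandidate, sponsor: SketchSponsor): number {
  // Hard gate: an incompatible route/type pair can never win.
  if (!routeTypeCompatible(sponsor.route, candidate.type)) return -Infinity;

  let score = 0;
  const a = normaliseLocality(candidate.locality);
  if (a !== "" && a === normaliseLocality(sponsor.townCity)) score += 3;
  if (candidate.status === "active") score += 1; // assumed label
  else if (candidate.status === "dissolved") score -= 2; // assumed label
  return score;
}

console.log(scoreCandidate({ locality: "LEEDS", status: "active" }, { townCity: "Leeds" })); // 4
```

Under these assumptions the maximum score is +4, which agrees with the "max score +4" note in the documentation row.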

sequenceDiagram
  participant Resolver
  participant scoreCandidate as scoreCandidate (existing)
  participant scoreCandidate2 as scoreCandidate (proposed)
  participant SuccessionMatch
  Resolver->>scoreCandidate: existing candidate
  scoreCandidate-->>Resolver: score_existing
  Resolver->>scoreCandidate2: proposed candidate
  scoreCandidate2-->>Resolver: score_proposed
  Resolver->>SuccessionMatch: canonicalise names, check previous_company_names
  SuccessionMatch-->>Resolver: succession forward/reverse
  Resolver->>Resolver: apply SUCCESSION_WEIGHT adjustments and UK_PRESENCE_WEIGHT
  alt adjusted_delta >= SCORE_MARGIN
    Resolver-->>Resolver: promote or keep
  else adjusted_delta < SCORE_MARGIN
    Resolver-->>Resolver: inconclusive
  end
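
A minimal sketch of the pairwise resolution in the diagram above, assuming the per-candidate scores have already been computed. Only `SCORE_MARGIN = 2` is stated in the walkthrough; the values chosen for `STATUS_QUO_BONUS` and `SUCCESSION_WEIGHT`, the direction in which succession favours a candidate, and the `canonicalise` rule are assumptions. The UK-presence adjustment is omitted here.

```typescript
const SCORE_MARGIN = 2; // stated in the walkthrough
const STATUS_QUO_BONUS = 1; // assumed value
const SUCCESSION_WEIGHT = 2; // assumed value

type ScoredSide = { companyName: string; previousNames: string[]; score: number };
type SketchAction = "promote" | "keep" | "inconclusive";

// Assumed canonical form: lower-case, punctuation collapsed to single spaces.
function canonicalise(value: string): string {
  return value.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// True when `holder` lists `other`'s current name among its previous names.
function listsAsPrevious(holder: ScoredSide, other: ScoredSide): boolean {
  const target = canonicalise(other.companyName);
  return holder.previousNames.some((p) => canonicalise(p) === target);
}

function resolvePair(existing: ScoredSide, proposed: ScoredSide): SketchAction {
  let sE = existing.score + STATUS_QUO_BONUS; // bias towards the current mapping
  let sP = proposed.score;

  // Assumption: the candidate that carries the other's name as a previous
  // name is treated as the successor and receives the succession weight.
  if (listsAsPrevious(proposed, existing)) sP += SUCCESSION_WEIGHT;
  if (listsAsPrevious(existing, proposed)) sE += SUCCESSION_WEIGHT;

  const delta = sP - sE;
  if (delta >= SCORE_MARGIN) return "promote";
  if (-delta >= SCORE_MARGIN) return "keep";
  return "inconclusive"; // also reached when both scores are -Infinity (delta is NaN)
}

const oldCo: ScoredSide = { companyName: "Old Co Ltd", previousNames: [], score: 1 };
const newCo: ScoredSide = { companyName: "New Co Ltd", previousNames: ["OLD CO LTD."], score: 2 };
console.log(resolvePair(oldCo, newCo)); // "promote"
```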

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

nikilok/learn-tanstack-start#85: Related Phase 5 sweep wiring and CLI that overlaps with updated sweep orchestration and documentation.
nikilok/learn-tanstack-start#99: Overlaps phase5-sweep-algorithm documentation and SCORE_MARGIN/inline resolution behavior changes.

Poem

🐰 A rabbit hops through scoring fields,
Where routes and companies match and yield,
Locality whispers, old names guide the way,
Scores sway, ties break, and queues shrink each day.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title references 'phase5 manual ranking improvement' but the changeset implements comprehensive Phase 5 inline scoring, candidate comparison, and drain automation—only partially addressing manual verification.	Clarify whether 'manual ranking improvement' accurately describes the primary scope, or revise to emphasize inline scoring (e.g., 'Phase 5 inline scoring and drain automation') or tie-resolution logic.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/web/scripts/drain-review-queue.ts`:
- Around line 191-200: The SELECT DISTINCT ON (organisation_name) query is
non-deterministic for tied counts because ORDER BY ends with "route" only;
update the ORDER BY in that query (the block producing { organisation_name,
town_city, route }) to include town_city as a final tie-breaker (e.g. ORDER BY
organisation_name, n DESC, route, COALESCE(town_city, '') or ORDER BY ... route,
town_city NULLS LAST) so rows with equal n and route are chosen
deterministically.
- Around line 551-554: The code increments the wrong counter when a mapping is
missing: replace the increment of orphaned (orphaned += 1) with the stale
counter (stale += 1) so the branch that calls markResolved(r.id,
`drain_${strategy}_stale`, changedBy) and logs "stale (no mapping row)"
correctly updates the stale tally; locate the block where mapping is checked
(the mapping variable) and the orphaned/stale counters and change the increment
to stale += 1.
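
For illustration, the deterministic tie-break requested for the `DISTINCT ON` query can be mirrored in TypeScript. This sketch follows the `COALESCE(town_city, '')` variant (nulls sort first); the `GroupedRow` shape and function names are hypothetical, and the sample route value is only example data.

```typescript
type GroupedRow = { organisation_name: string; town_city: string | null; route: string; n: number };

function cmp(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Total order: organisation_name, n DESC, route, COALESCE(town_city, '').
function compareRows(a: GroupedRow, b: GroupedRow): number {
  return (
    cmp(a.organisation_name, b.organisation_name) ||
    b.n - a.n ||
    cmp(a.route, b.route) ||
    cmp(a.town_city ?? "", b.town_city ?? "")
  );
}

// Equivalent of DISTINCT ON (organisation_name): keep the first row per organisation.
function pickOnePerOrganisation(rows: GroupedRow[]): GroupedRow[] {
  const picked = new Map<string, GroupedRow>();
  for (const row of rows.slice().sort(compareRows)) {
    if (!picked.has(row.organisation_name)) picked.set(row.organisation_name, row);
  }
  return Array.from(picked.values());
}

const leeds: GroupedRow = { organisation_name: "Acme", town_city: "Leeds", route: "Skilled Worker", n: 2 };
const bristol: GroupedRow = { organisation_name: "Acme", town_city: "Bristol", route: "Skilled Worker", n: 2 };

// Input order no longer affects which tied row is chosen.
console.log(pickOnePerOrganisation([leeds, bristol])[0].town_city); // "Bristol"
console.log(pickOnePerOrganisation([bristol, leeds])[0].town_city); // "Bristol"
```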

In `@apps/web/scripts/hydrate-queue-proposed-profiles.ts`:
- Around line 161-165: The current fetch branch treats all non-OK responses the
same so 401/403 (invalid API key) keeps the loop running; update the non-OK
branch in the fetch logic that uses res and path to special-case authentication
failures: if res.status === 401 || res.status === 403, log a clear message
including the status and path and return a distinct fatal/auth result (e.g., {
kind: 'auth_error', status: res.status }) or throw an Error so the caller of
this function can abort the whole hydration process immediately; keep the
existing return { kind: 'error' } for other non-OK statuses and still return {
kind: 'ok', data: await res.json() } for success.
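
A sketch of the suggested special-casing, reduced to a synchronous status classifier so that the abort behaviour is easy to see. The `FetchOutcome` type, `classifyStatus`, and `runHydration` are hypothetical; in the real script the classification would sit inside the fetch branch that inspects `res.status`.

```typescript
type FetchOutcome = "ok" | "auth_error" | "error";

// 401/403 indicate a bad or unauthorised API key, so retrying is pointless.
function classifyStatus(httpStatus: number): FetchOutcome {
  if (httpStatus === 401 || httpStatus === 403) return "auth_error";
  return httpStatus >= 200 && httpStatus < 300 ? "ok" : "error";
}

// Hypothetical driver: each number stands in for one upstream response.
// The loop stops at the first authentication failure instead of continuing.
function runHydration(statuses: number[]): { processed: number; aborted: boolean } {
  let processed = 0;
  for (const httpStatus of statuses) {
    if (classifyStatus(httpStatus) === "auth_error") {
      return { processed, aborted: true };
    }
    processed += 1; // both "ok" and ordinary "error" count as processed
  }
  return { processed, aborted: false };
}

console.log(runHydration([200, 500, 401, 200])); // { processed: 2, aborted: true }
```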

In `@apps/web/src/lib/phase5/compare-candidates.ts`:
- Around line 115-124: The UK-presence boost is currently applied based only on
types (isUkEstablishment / isForeignEntity) which can wrongly favor unrelated
pairs; change the two-if block so the boost is only added when the two
candidates are the same legal entity — e.g., require an identity guard such as
canonical-name equality or a previous-name linkage before adding
UK_PRESENCE_WEIGHT to s_e or s_p; update the checks around
isUkEstablishment(existing.type) && isForeignEntity(proposed.type) and its
symmetric counterpart to include a same-entity predicate (for example:
(existing.canonicalName === proposed.canonicalName ||
hasPreviousNameLink(existing, proposed)) && ...), so only confirmed same-entity
pairs receive the boost.
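
One way the same-entity guard could look, as a sketch under stated assumptions: the `GuardCandidate` shape, the weight value, the company-type strings, and the `sameEntity` helper are all hypothetical, and previous names are assumed to be stored in canonical form already.

```typescript
const UK_PRESENCE_WEIGHT = 1; // assumed value

type GuardCandidate = { canonicalName: string; previousNames: string[]; type: string };

// Assumed type labels; the real predicates live in compare-candidates.ts.
const isUkEstablishment = (t: string): boolean => t === "uk-establishment";
const isForeignEntity = (t: string): boolean => t === "oversea-company";

// Identity guard: same canonical name, or a previous-name link in either direction.
function sameEntity(a: GuardCandidate, b: GuardCandidate): boolean {
  return (
    a.canonicalName === b.canonicalName ||
    a.previousNames.some((p) => p === b.canonicalName) ||
    b.previousNames.some((p) => p === a.canonicalName)
  );
}

// Returns the boost to add to each side; zero for unrelated pairs.
function ukPresenceAdjust(
  existing: GuardCandidate,
  proposed: GuardCandidate,
): { existing: number; proposed: number } {
  const adj = { existing: 0, proposed: 0 };
  if (!sameEntity(existing, proposed)) return adj;
  if (isUkEstablishment(existing.type) && isForeignEntity(proposed.type)) {
    adj.existing += UK_PRESENCE_WEIGHT;
  }
  if (isUkEstablishment(proposed.type) && isForeignEntity(existing.type)) {
    adj.proposed += UK_PRESENCE_WEIGHT;
  }
  return adj;
}

const ukBranch: GuardCandidate = { canonicalName: "acme gmbh", previousNames: [], type: "uk-establishment" };
const parent: GuardCandidate = { canonicalName: "acme gmbh", previousNames: [], type: "oversea-company" };
const unrelated: GuardCandidate = { canonicalName: "other inc", previousNames: [], type: "oversea-company" };

console.log(ukPresenceAdjust(ukBranch, parent)); // { existing: 1, proposed: 0 }
console.log(ukPresenceAdjust(ukBranch, unrelated)); // { existing: 0, proposed: 0 }
```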

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5f50d35e-8294-498e-9bed-499927221df7

📥 Commits

Reviewing files that changed from the base of the PR and between 3a43995 and 04312dc.

📒 Files selected for processing (9)

apps/web/scripts/drain-review-queue.ts
apps/web/scripts/hydrate-queue-proposed-profiles.ts
apps/web/src/lib/phase5/compare-candidates.test.ts
apps/web/src/lib/phase5/compare-candidates.ts
apps/web/src/lib/phase5/score-candidate.test.ts
apps/web/src/lib/phase5/score-candidate.ts
apps/web/src/lib/phase5/sql.ts
docs/phase5-drain-comparison.md
docs/phase5-sweep-algorithm.md

🚧 Files skipped from review as they are similar to previous changes (2)

apps/web/src/lib/phase5/score-candidate.ts
apps/web/src/lib/phase5/score-candidate.test.ts

coderabbitai · 2026-05-27T15:00:30Z

+    const result = await applyPromotion(
+      mapping,
+      proposedResolution,
+      changedBy,
+      applyDeps,
+    );
+
+    if (!result.ok) {
+      lockMissed += 1;
+      console.log(
+        `  ${idx} ${r.organisation_name} → lock_missed (mapping verified_at changed; queue row stays unresolved)`,
+      );
+      continue;
+    }
+
+    swapped += 1;
+    await markResolved(r.id, `drain_${strategy}_swap`, changedBy);


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Resolve the queue row in the same write unit as the promotion.

Lines 582-598 do two separate writes: first applyPromotion, then markResolved. If the process dies after the promotion succeeds but before the queue update, the mapping is already swapped while the queue row stays unresolved; the next run will then classify that row as stale instead of recording a successful drain.

Please fold the queue resolution into the same DB transaction/CTE as the promotion, or add a recovery path that marks rows resolved when the live mapping already matches the proposed number.
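
The recovery path mentioned above could be sketched as a pre-check that runs before any promotion attempt. The field names (`companyNumber`, `proposedCompanyNumber`), the `QueueStep` labels, and `nextStep` itself are assumptions for illustration; the transactional alternative would need the real `applyPromotion` and `markResolved` internals, which are not shown in this review.

```typescript
type LiveMapping = { companyNumber: string };
type PendingQueueRow = { id: number; proposedCompanyNumber: string };
type QueueStep = "resolve_as_stale" | "resolve_as_already_swapped" | "attempt_promotion";

// Decide what to do with a queue row before touching the mapping.
function nextStep(row: PendingQueueRow, mapping: LiveMapping | undefined): QueueStep {
  // No mapping row at all: nothing to promote into.
  if (mapping === undefined) return "resolve_as_stale";
  // A previous run promoted the mapping but died before resolving the queue row.
  if (mapping.companyNumber === row.proposedCompanyNumber) return "resolve_as_already_swapped";
  return "attempt_promotion";
}

const pendingRow: PendingQueueRow = { id: 1, proposedCompanyNumber: "00000002" };
console.log(nextStep(pendingRow, { companyNumber: "00000002" })); // "resolve_as_already_swapped"
console.log(nextStep(pendingRow, { companyNumber: "00000001" })); // "attempt_promotion"
```

With a check like this, a crash between the two writes would be recorded as a completed swap on the next run rather than as a stale row.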

- this ensures we no longer keep populating data into the human review table

feat: inital steps towards phase5 manual ranking improvement

3a43995

vercel Bot deployed to Preview May 27, 2026 12:52 View deployment

nikilok added 2 commits May 27, 2026 14:26

fix: hydration to cache ch data locally

4a2d6fb

feat: draining complete

04312dc

vercel Bot deployed to Preview May 27, 2026 14:52 View deployment

coderabbitai Bot reviewed May 27, 2026

feat: incorporating the drain mechanism into the phase5 sweep

a6e00a4

- this ensures we no longer keep populating data into the human review table

vercel Bot deployed to Preview May 27, 2026 19:09 View deployment

nikilok merged commit b195be1 into main May 27, 2026
5 checks passed

nikilok deleted the feat/phase5-improving-manual-verification branch May 27, 2026 20:34

nikilok merged 4 commits into main from feat/phase5-improving-manual-verification
