Fix duplicate TaxaList names causing MultipleObjectsReturned#1108
…ate names

The TaxaList model allows multiple lists with the same name, but several places in the codebase use get_or_create(name=...), which fails with MultipleObjectsReturned when duplicates exist.

This adds a new get_or_create_for_project() method that:
- Scopes lookups to a specific project (or global lists if project=None)
- Handles existing duplicates gracefully by returning the oldest one

It also updates all callers (pipeline.py, import_taxa, update_taxa) to use the new method, and adds TODO comments to management commands about adding --project parameter support in the future.

Co-Authored-By: Claude <noreply@anthropic.com>
📝 Walkthrough
Introduces a queryset-backed …
Sequence Diagram
sequenceDiagram
participant Caller as Command / ML Code
participant Manager as TaxaListManager / QuerySet
participant DB as Database
Caller->>Manager: get_or_create_for_project(name, project=None)
alt project is None (global)
Manager->>DB: SELECT TaxaList WHERE name=? AND project_count=0
DB-->>Manager: rows or none
alt match found
Manager-->>Caller: (existing_list, False)
else no match
Manager->>DB: INSERT TaxaList(name=...), created_at=...
DB-->>Manager: new record
Manager-->>Caller: (new_list, True)
end
else project provided
Manager->>DB: SELECT TaxaList JOIN projects WHERE name=? AND project=?
DB-->>Manager: rows or none
alt match found
Manager-->>Caller: (existing_list, False)
else no match
Manager->>DB: INSERT TaxaList(name=...), then INSERT M2M(project)
DB-->>Manager: new record
Manager-->>Caller: (new_list, True)
end
end
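The flow in the diagram can be modeled without a database. The following is a minimal in-memory sketch, not the PR's Django code: `TaxaListRow`, the `rows` list, and the integer `project_id` are stand-ins invented for illustration, and a lower `pk` is assumed to mean an older row.

```python
from dataclasses import dataclass, field
from itertools import count

_next_pk = count(1)  # hypothetical pk sequence; lower pk = older row


@dataclass
class TaxaListRow:
    """Stand-in for a TaxaList row with its M2M project links."""
    name: str
    project_ids: set = field(default_factory=set)
    pk: int = field(default_factory=lambda: next(_next_pk))


def get_or_create_for_project(rows, name, project_id=None):
    """Return (row, created), scoping the lookup as in the diagram."""
    if project_id is None:
        # Global scope: only rows with zero project associations match.
        matches = [r for r in rows if r.name == name and not r.project_ids]
    else:
        # Project scope: only rows attached to this project match.
        matches = [r for r in rows if r.name == name and project_id in r.project_ids]

    if matches:
        # Duplicates are tolerated: the oldest match wins instead of raising.
        return min(matches, key=lambda r: r.pk), False

    row = TaxaListRow(name=name, project_ids=set() if project_id is None else {project_id})
    rows.append(row)
    return row, True


rows = []
first, created_first = get_or_create_for_project(rows, "Moths")
rows.append(TaxaListRow("Moths"))  # simulate a legacy duplicate global row
again, created_again = get_or_create_for_project(rows, "Moths")
scoped, created_scoped = get_or_create_for_project(rows, "Moths", project_id=7)
```

In this model the second global call returns the original row even though a duplicate exists, while the project-scoped call creates a separate row because the global rows are outside its scope.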
Pull request overview
This PR addresses a critical bug where duplicate TaxaList names were causing MultipleObjectsReturned exceptions. Since the TaxaList.name field lacks a unique constraint, the code's use of get_or_create(name=...) would fail when duplicates existed.
Changes:
- Introduces `TaxaListQuerySet.get_or_create_for_project()` method to scope TaxaList lookups by project (or globally for `project=None`)
- Updates all three callers (`pipeline.py`, `import_taxa`, `update_taxa`) to use the new method
- Handles existing duplicates gracefully by returning the oldest matching list
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ami/main/models.py | Adds TaxaListQuerySet with get_or_create_for_project() method and TaxaListManager to support project-scoped taxa list creation and retrieval |
| ami/ml/models/pipeline.py | Updates algorithm taxa list creation to use new method with explicit project=None for global lists |
| ami/main/management/commands/update_taxa.py | Updates taxa list lookup to use new method with project=None and adds TODO comment for future project parameter support |
| ami/main/management/commands/import_taxa.py | Updates taxa list lookup to use new method with project=None and adds TODO comment for future project parameter support |
Covers global lists, project-scoped lists, same-name collisions across scopes, legacy duplicate handling, and defaults-on-create semantics. Also wraps create+projects.add in transaction.atomic() so a failed projects.add rolls back the new row, and clarifies the docstring around the defaults kwarg and the MultipleObjectsReturned race path. Co-Authored-By: Claude <noreply@anthropic.com>
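The rollback behavior described in this commit can be illustrated outside Django. This is a minimal sketch under stated assumptions: the three-table schema and the helper `create_list_with_project` are hypothetical, and the standard-library `sqlite3` connection context manager stands in for `transaction.atomic()`. It is not the PR's code.

```python
import sqlite3


def create_list_with_project(conn, name, project_id):
    """Insert a list row and its project link as one unit; return the list id."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute("INSERT INTO taxalist (name) VALUES (?)", (name,))
        list_id = cur.lastrowid
        conn.execute(
            "INSERT INTO taxalist_projects (taxalist_id, project_id) VALUES (?, ?)",
            (list_id, project_id),
        )
    return list_id


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript(
    """
    CREATE TABLE project (id INTEGER PRIMARY KEY);
    CREATE TABLE taxalist (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE taxalist_projects (
        taxalist_id INTEGER NOT NULL REFERENCES taxalist(id),
        project_id INTEGER NOT NULL REFERENCES project(id)
    );
    """
)
with conn:
    conn.execute("INSERT INTO project (id) VALUES (1)")

ok_id = create_list_with_project(conn, "Moths", 1)

link_failed = False
try:
    # Project 999 does not exist, so the link insert violates the FK and
    # the preceding taxalist insert is rolled back with it.
    create_list_with_project(conn, "Orphan", 999)
except sqlite3.IntegrityError:
    link_failed = True

names = [row[0] for row in conn.execute("SELECT name FROM taxalist ORDER BY id")]
```

Without the enclosing transaction, the failed link insert would leave an unattached "Orphan" row behind, which in the PR's scoping rules would then count as a global list.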
🧹 Nitpick comments (1)
ami/main/tests.py (1)
3699-3706: Optional: add duplicate fallback test for project-scoped duplicates.
You already validate oldest-record fallback for `project=None`; adding the same check for duplicates tied to one project would fully lock in the scoped contract.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/main/tests.py` around lines 3699 - 3706, Add a new test mirroring test_handles_existing_duplicates_by_returning_oldest but scoped to a Project: create a Project instance, create three TaxaList rows with the same name and project set to that Project (saving the first as "first"), call TaxaList.objects.get_or_create_for_project(name="Duplicate Name", project=project_instance), then assert created is False and that the returned taxa_list.pk equals first.pk; name the test something like test_handles_existing_duplicates_for_project_scoped_returning_oldest and reference TaxaList and get_or_create_for_project in the assertion.
📒 Files selected for processing (3)
- ami/main/models.py
- ami/main/tests.py
- ami/ml/models/pipeline.py
Rename duplicate TaxaList rows so (name, scope) is unique within each scope (global = no projects, or per-project). Oldest row in each group keeps its name; later rows get a " (duplicate N)" suffix that is globally unique. The migration does not merge taxa from duplicate lists — it only renames them so the MultipleObjectsReturned paths in get_or_create_for_project stop firing. Operators can review renamed rows and merge/delete manually. Covered by TaxaListDedupeMigrationTestCase (8 tests). Co-Authored-By: Claude <noreply@anthropic.com>
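The renaming rule in this commit message can be sketched in plain Python. This is an illustration only, not the migration itself: the dict-based rows, the function name `dedupe_names`, the choice to treat a multi-project list as a member of each of its projects' scopes, and the way `N` is picked are all assumptions made for the sketch.

```python
from collections import defaultdict


def dedupe_names(rows):
    """Rename same-scope duplicates in place; return the pks that were renamed.

    Each row is a dict with 'pk', 'name', and 'projects' (a set of project
    ids); an empty set means the row is global. Lower pk is treated as older.
    """
    taken = {r["name"] for r in rows}  # every name in use, across all scopes
    renamed = []
    scopes = {None} | {p for r in rows for p in r["projects"]}

    # Process the global scope first, then each project in id order.
    for scope in sorted(scopes, key=lambda s: (s is not None, s or 0)):
        groups = defaultdict(list)
        for r in sorted(rows, key=lambda r: r["pk"]):
            in_scope = (not r["projects"]) if scope is None else (scope in r["projects"])
            if in_scope:
                groups[r["name"]].append(r)

        for name, group in groups.items():
            for dup in group[1:]:  # the oldest row keeps its name
                n = 1
                while f"{name} (duplicate {n})" in taken:
                    n += 1
                dup["name"] = f"{name} (duplicate {n})"
                taken.add(dup["name"])
                renamed.append(dup["pk"])
    return renamed


rows = [
    {"pk": 1, "name": "Moths", "projects": set()},
    {"pk": 2, "name": "Moths", "projects": set()},
    {"pk": 3, "name": "Moths", "projects": {1}},
    {"pk": 4, "name": "Moths", "projects": {1}},
    {"pk": 5, "name": "Birds", "projects": set()},
]
first_run = dedupe_names(rows)
second_run = dedupe_names(rows)  # a re-run finds nothing left to rename
```

Rows 1 and 3 keep the name "Moths" because they are the oldest in their respective scopes, and the suffixes on rows 2 and 4 differ because the suffix is checked against every name in use rather than only names in the same scope.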
Summary
- Adds `TaxaListQuerySet.get_or_create_for_project(name, project=None)` to scope TaxaList lookups by project and survive pre-existing duplicates.
- Updates the callers (`pipeline.py`, `import_taxa`, `update_taxa`) to use the new method (all with `project=None`; see regression note below).
- Adds data migration `0083_dedupe_taxalist_names` that renames existing duplicate rows so `(name, scope)` is unique going forward.

Problem
`TaxaList.name` has no unique constraint. Code paths using `get_or_create(name=...)` raise `MultipleObjectsReturned` in prod when duplicate-named lists exist, crashing ML pipeline runs.

Solution
New query method (`ami/main/models.py`)
`get_or_create_for_project(name, project=None, **defaults)`:
- `project=None` → scope = "global" (lists with zero project associations)
- `project=<Project>` → scope = that project (via `projects__` M2M lookup)
- `DoesNotExist`: creates inside `transaction.atomic()` so `create` + `projects.add` are a unit
- `MultipleObjectsReturned`: returns the oldest match instead of raising (backstop for the data migration and for the race where two concurrent callers both create)

Data migration (`ami/main/migrations/0083_dedupe_taxalist_names.py`)
For each scope group (same name, same scope) with >1 row, keep the oldest and rename the rest with a globally-unique `" (duplicate N)"` suffix. Does not merge taxa; operators can review and clean up manually. Idempotent.

About uniqueness at the DB level
Django `UniqueConstraint` cannot target M2M fields, so we can't directly express `unique(name, project)` on this schema. Three possible paths, not in scope for this PR:
- Swap the `projects` M2M for a nullable `project` FK on TaxaList: native `UniqueConstraint(fields=['name', 'project'], ...)` works; changes semantics to one-list-per-project. Cleanest long-term option.
- …
- … `name`: drift-prone.

This PR relies on app-level uniqueness (via `get_or_create_for_project`) plus the dedupe migration. A follow-up issue should decide whether to commit to option 1.

Prod dupe-count
Checked prod (2026-04-15): 28 total TaxaLists, 0 global dupes, 0 per-project dupes. The `MultipleObjectsReturned` crash that inspired this PR must have come from a different environment (staging/demo/dev). The dedupe migration is a no-op against prod but a useful safety net for other maintainers' envs.

Known regression risk
The three existing call sites now pass `project=None`, meaning they only match TaxaLists with zero project associations. If any env has a `"Taxa returned by <algorithm>"` list that got attached to a project historically, `pipeline.py:605` will no longer find it and will create a fresh global row alongside. The migration's dedupe does not re-merge these; it only renames same-scope duplicates. If `--project` support lands in the callers later, this scoping choice should be revisited.

Tradeoff in `pipeline.py`
Algorithm taxa lists are now declared "global" by convention. That means two projects sharing an algorithm share one taxa list, and taxa added by one project are visible to the other. If that's wrong, the fix is scoping algorithm lists per-project (requires the `--project` follow-up).

Test plan
- `get_or_create_for_project`: 8 tests covering global/project creation & retrieval, same-name collision resistance across scopes, existing-duplicate handling, and `defaults`-only-on-create semantics
- `0083_dedupe_taxalist_names`: 8 tests covering global dupes, project-scoped dupes, unique-name no-ops, cross-scope non-collision, multi-project overlap, post-migration `get_or_create_for_project` behavior, and idempotency
- `makemigrations --check` clean; `migrate` applies cleanly against a fresh DB
- … `manage.py migrate main 0083`, confirmed correct renames + idempotency on re-run
- … `import_taxa` / `update_taxa` against a local DB to confirm end-to-end
- … `MultipleObjectsReturned` in ML pipeline logs

Follow-ups (new issues)
- Add `--project <slug>` flags to `import_taxa` / `update_taxa` so operators can create project-scoped lists
- … `pipeline.py:605` once `--project` lands

🤖 Drafted with Claude Code
Summary by CodeRabbit