21 changes: 17 additions & 4 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,9 +104,10 @@ The release strategy is evidence before expansion:
```text
v0.3: evidence pack + assembly gate + provenance checksums
v0.4: compare mode for many FASTA files
-v0.5: transcriptome profile
-v0.6: protein profile
-v0.7: reference-panel profile
+v0.5: submission readiness gate
+v0.6: transcriptome profile
+v0.7: protein profile
+v0.8: reference-panel profile
later: MCP/tool-agent interface and optional local summaries
```

Expand All @@ -122,9 +123,21 @@ Default product boundaries:
Recommended next big release:

```text
-v0.3 should make FastaGuard credible as the default assembly gate before adding broad new biological profiles.
+v0.5 should make submission readiness concrete before adding broad new biological profiles.
```

The next planned feature direction is:

```text
Submission Readiness Gate: --gate submission with --submission-target generic|ncbi.
```

This should stay FASTA-level and database-free. It should check identifier
safety, duplicate first-token IDs, unsafe characters, long identifiers, gap-like
N runs, high ambiguity, and tiny-record advisories. It must not claim repository
acceptance, biological completeness, annotation correctness, or contamination
confirmation.
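
The identifier checks listed above can be illustrated with a minimal Python sketch. It is not FastaGuard's implementation: the function names, the safe-character pattern, and the 50-character limit are assumptions chosen for illustration only.

```python
import re
from collections import Counter

# Assumed limits for illustration; FastaGuard's real thresholds are not stated here.
MAX_ID_LENGTH = 50
SAFE_ID = re.compile(r"^[A-Za-z0-9_.\-]+$")


def first_token_ids(fasta_text):
    """Return the first whitespace-delimited token of every header line."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            ids.append(tokens[0] if tokens else "")
    return ids


def identifier_findings(fasta_text):
    """Report duplicate, unsafe, and overly long first-token IDs."""
    counts = Counter(first_token_ids(fasta_text))
    findings = []
    for seq_id in counts:  # Counter keeps first-seen order
        if counts[seq_id] > 1:
            findings.append(("duplicate_id", seq_id))
        if not SAFE_ID.match(seq_id):
            findings.append(("unsafe_characters", seq_id))
        if len(seq_id) > MAX_ID_LENGTH:
            findings.append(("long_identifier", seq_id))
    return findings


example = ">contig1 sample A\nACGT\n>contig1 sample B\nACGT\n>bad|id\nACGT\n"
findings = identifier_findings(example)
```

Both `contig1` headers share the same first token even though their descriptions differ, which is why the check looks only at the first token rather than the full header line.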

## Collaboration Preference

When moving the project forward, provide a clear recommendation first, then proceed when the user approves or explicitly asks to continue. The default recommendation should favor boring, stable contracts over flashy AI features.
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "fastaguard"
-version = "0.4.0"
+version = "0.5.0"
edition = "2021"
license = "MIT"
description = "FASTA preflight QC for assembly pipelines"
65 changes: 59 additions & 6 deletions README.md
Expand Up @@ -7,6 +7,8 @@ CheckM, annotation, or other expensive downstream steps. It validates structure,
flags obvious FASTA-level problems, and writes stable reports for humans,
workflow engines, and future tool agents.

Use it to validate first, fix early, and route smarter.

Run it first when you need to know:

- is this FASTA file structurally valid?
Expand All @@ -23,15 +25,35 @@ Before QUAST. Before BUSCO. Before BlobToolKit. Before annotation.
Run FastaGuard first.
```

## Why FastaGuard?

Most bioinformatics QC tools answer downstream questions: assembly quality,
biological completeness, contamination evidence, taxonomy, annotation
readiness, or report aggregation. FastaGuard runs earlier. It answers whether
the FASTA itself is valid, sane, interpretable, and safe to pass downstream.

Use FastaGuard when you need:

- FASTA preflight before expensive QC, annotation, or submission workflows
- a deterministic PASS/WARN/FAIL gate for Nextflow, Snakemake, nf-core, Galaxy,
or institutional pipelines
- batch triage across many FASTA files with `fastaguard compare`
- submission-readiness signals before official validators
- stable JSON, TSV, HTML, and MultiQC-compatible outputs for humans, workflows,
and tool agents

If FastaGuard fails, fix the FASTA first. If it passes, route to the right
downstream tool.

## Release Status

| Channel | Status |
| --- | --- |
-| Source/package metadata | `v0.4.0` is the current source release |
+| Source/package metadata | this branch/package metadata targets `v0.5.0`; `v0.4.0` is the latest tagged source release |
| GitHub release | v0.4 GitHub release binaries are built from the `v0.4.0` tag |
-| Bioconda | `v0.3.0` is live for Linux and macOS x86_64/ARM64; `v0.4.0` packaging is follow-up |
-| BioContainers | `v0.3.0` is live as a pinned workflow image; `v0.4.0` image publication is follow-up |
-| Source build | local checkout and the `v0.4.0` Git tag build report package version `0.4.0` |
+| Bioconda | `v0.3.0` is live for Linux and macOS x86_64/ARM64; v0.5 is not yet published there |
+| BioContainers | `v0.3.0` is live as a pinned workflow image; v0.5 is not yet published there |
+| Source build | local checkout builds report the package version from `Cargo.toml` |

## Install

Expand Down Expand Up @@ -100,6 +122,7 @@ fastaguard --version

The `--gate pipeline` examples below require FastaGuard `v0.3.0` or newer.
The `fastaguard compare` example requires FastaGuard `v0.4.0` or newer.
The `--gate submission` example requires FastaGuard built from v0.5 source (package version `0.5.0`) or newer.

Run the assembly preflight check:

Expand Down Expand Up @@ -133,6 +156,21 @@ fastaguard compare assemblies/*.fa --profile assembly --gate pipeline
This command is part of the v0.4 GitHub release. Bioconda and BioContainers may
still be `v0.3.0` until packaging publication follow-up is complete.

Submission-readiness preflight:

```bash
fastaguard sample.fa \
--profile assembly \
--gate submission \
--submission-target ncbi \
--json fastaguard.json \
--out fastaguard_report.html
```

FastaGuard reports FASTA-level risks before official validators. It does not
guarantee NCBI, ENA, or DDBJ acceptance and does not replace NCBI FCS,
annotation validation, QUAST, BUSCO, BlobToolKit, or CheckM.

Inspect the machine-readable contract:

```bash
Expand Down Expand Up @@ -238,15 +276,26 @@ v0.4 adds preflight readiness and compare mode:
- boundaries that keep FastaGuard upstream of QUAST, BUSCO, BlobToolKit,
CheckM, official validators, and annotation workflows

v0.5 adds the submission-readiness gate:

- `--gate submission` for stricter FASTA-level submission preflight
- `--submission-target generic|ncbi` for target-aware identifier and header
advisories
- submission-readiness fields in JSON, TSV, HTML, MultiQC, and compare outputs
- boundaries that keep FastaGuard upstream of official validators, NCBI FCS,
annotation validation, QUAST, BUSCO, BlobToolKit, and CheckM

## Positioning

FastaGuard should recommend deeper tools when they are appropriate:

- FastQC for raw-read QC
- QUAST for assembly quality evaluation
- BUSCO for biological completeness
- BlobToolKit for contamination and cobiont exploration
- CheckM for microbial genome completeness and contamination
- seqkit for ad hoc sequence operations
- MultiQC for aggregating reports

The strategic wedge is earlier:

Expand All @@ -257,6 +306,7 @@ FastaGuard catches FASTA-level assembly problems before expensive assembly QC.
## Documentation

- [Example reports](examples/reports/README.md)
- [Use cases and positioning](docs/use-cases.md)
- [Product thesis](docs/product-thesis.md)
- [Vision plan](docs/vision-plan.md)
- [MVP spec](docs/mvp-spec.md)
Expand All @@ -270,7 +320,9 @@ FastaGuard catches FASTA-level assembly problems before expensive assembly QC.
- [Benchmarking](docs/benchmarking.md)
- [v0.2 evidence pack](docs/evidence/fastaguard-v0.2-evidence.md)
- [v0.3 evidence workflow](docs/evidence/fastaguard-v0.3-evidence.md)
- [v0.5 submission readiness evidence](docs/evidence/fastaguard-v0.5-submission-readiness.md)
- [Packaging](docs/packaging.md)
- [v0.5.0 release notes](docs/releases/v0.5.0.md)
- [v0.4.0 release notes](docs/releases/v0.4.0.md)
- [v0.3.0 release notes](docs/releases/v0.3.0.md)
- [v0.2.0 release notes](docs/releases/v0.2.0.md)
Expand All @@ -281,8 +333,9 @@ FastaGuard catches FASTA-level assembly problems before expensive assembly QC.

## Status

-FastaGuard v0.4.0 is the current source and GitHub release. It adds preflight
-readiness, compare mode, and cohort-level FASTA triage outputs.
+This branch/package metadata targets FastaGuard v0.5.0. The latest tagged
+GitHub release remains v0.4.0, which adds preflight readiness, compare mode,
+and cohort-level FASTA triage outputs.

v0.3.0 remains the current Bioconda and BioContainers release until packaging
follow-up is complete.
38 changes: 38 additions & 0 deletions docs/evidence/fastaguard-v0.5-submission-readiness.md
@@ -0,0 +1,38 @@
# FastaGuard v0.5 Submission Readiness Evidence

This page records tiny local evidence cases for the v0.5 submission-readiness
gate. The goal is to show FASTA-level hazards before official validators and
expensive QC.

## Commands

```bash
mkdir -p target/evidence/v0.5

fastaguard testdata/submission_ids.fa \
--gate submission \
--submission-target ncbi \
--json target/evidence/v0.5/submission_ids.json

fastaguard testdata/submission_warnings.fa \
--gate submission \
--submission-target generic \
--json target/evidence/v0.5/submission_warnings.json
```

## Scope

FastaGuard can report parse validity, identifier safety, duplicate first-token
IDs, invalid sequence symbols, gap-like N runs, high ambiguity, and tiny-record
advisories. It cannot guarantee repository acceptance, biological completeness,
annotation correctness, or contamination status.
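
As a rough illustration of the sequence-level checks named above, the following Python sketch finds gap-like N runs and computes an ambiguity fraction. The thresholds, the function names, and the treatment of every non-ACGT symbol as ambiguous are assumptions for this example, not FastaGuard's documented behavior.

```python
import re

# Assumed thresholds for illustration only.
MIN_GAP_RUN = 10
MAX_AMBIGUITY_FRACTION = 0.05


def n_runs(sequence, min_length=MIN_GAP_RUN):
    """Return (0-based start, length) for each run of N at least min_length long."""
    return [
        (match.start(), len(match.group()))
        for match in re.finditer(r"[Nn]+", sequence)
        if len(match.group()) >= min_length
    ]


def ambiguity_fraction(sequence):
    """Fraction of symbols that are not A, C, G, or T (case-insensitive)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    unambiguous = sum(upper.count(base) for base in "ACGT")
    return (len(upper) - unambiguous) / len(upper)


seq = "ACGT" + "N" * 12 + "ACGT"
runs = n_runs(seq)
frac = ambiguity_fraction(seq)
high_ambiguity = frac > MAX_AMBIGUITY_FRACTION
```

Here the 12-base N run starts at offset 4, and 12 of the 20 symbols are ambiguous, so the record would be flagged under the assumed 5% threshold.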

Passing `--gate submission` means the FASTA passed FastaGuard's local
FASTA-level checks for the selected `--submission-target`. It does not mean
NCBI, ENA, DDBJ, or other official validators will accept the submission.

## Expected Follow-Up

After FASTA-level blockers are fixed, users should continue to official
validators, NCBI FCS, QUAST, BUSCO, BlobToolKit, CheckM, annotation, or the
next workflow step named in the report.
30 changes: 30 additions & 0 deletions docs/output-contract.md
Expand Up @@ -26,6 +26,9 @@ cohort_report.html
fastaguard_compare_mqc.json
```

v0.5 adds submission-readiness gate fields to the same JSON, TSV, HTML,
MultiQC, and compare artifacts. JSON remains the source of truth.

## JSON Contract

Example v0.3 shape:
Expand Down Expand Up @@ -341,6 +344,33 @@ Compare mode ranks and routes FASTA files before QUAST, BUSCO, BlobToolKit,
CheckM, official validators, annotation, or other interpretive QC tools; it does
not replace them.

## Submission Gate Contract

The v0.5 contract adds `--gate submission` and
`--submission-target generic|ncbi` for FASTA-level submission readiness:

```bash
fastaguard sample.fa \
--profile assembly \
--gate submission \
--submission-target ncbi \
--json fastaguard.json \
--out fastaguard_report.html
```

Pipeline authors should route on:

- `gate.mode`
- `gate.status`
- `gate.blocking_findings`
- `readiness.categories[id=submission]`

The submission gate promotes existing identifier, header, gap, ambiguity, and
tiny-record findings into a submission-readiness view. It can report
FASTA-level hazards before official validators, but it does not guarantee NCBI,
ENA, or DDBJ repository acceptance and does not replace NCBI FCS, QUAST, BUSCO,
BlobToolKit, CheckM, or annotation validation.
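
A pipeline step might route on those fields roughly as follows. This is a hedged sketch: the field names come from the list above, but the value shapes (uppercase `PASS`/`WARN`/`FAIL` strings, `blocking_findings` as a list, `categories` as a list of objects with an `id` key) and the returned route names are assumptions, so check them against a real `fastaguard.json` before relying on them.

```python
import json


def submission_category(report):
    """Return the readiness category with id 'submission', or None if absent."""
    categories = report.get("readiness", {}).get("categories", [])
    return next((c for c in categories if c.get("id") == "submission"), None)


def route(report):
    """Map a report to a hypothetical next step for the pipeline."""
    gate = report.get("gate", {})
    if gate.get("mode") != "submission":
        return "rerun_with_submission_gate"
    status = gate.get("status")
    if status == "FAIL" or gate.get("blocking_findings"):
        return "fix_fasta"
    if status == "WARN":
        return "manual_review"
    if status == "PASS":
        return "continue_to_official_validators"
    return "unknown_status"


# Hand-written sample in the assumed shape, not real FastaGuard output.
sample = json.loads(
    '{"gate": {"mode": "submission", "status": "FAIL",'
    ' "blocking_findings": ["duplicate_id"]},'
    ' "readiness": {"categories": [{"id": "submission", "status": "FAIL"}]}}'
)
decision = route(sample)
category = submission_category(sample)
```

Treating any non-empty `gate.blocking_findings` as a stop, regardless of `gate.status`, is a conservative choice that keeps the routing safe if the two fields ever disagree.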

## Machine-Actionable Contract

The JSON output should become the source of truth for humans, workflow engines, dashboards, and future tool-using LLM agents.
16 changes: 16 additions & 0 deletions docs/packaging.md
Expand Up @@ -13,6 +13,11 @@ Bioconda serves v0.3.0 on Linux and macOS x86_64/ARM64 platforms.
BioContainers provides the pinned v0.3 workflow image generated from the
Bioconda package. Docker remains useful for local smoke tests.

This branch/package metadata targets v0.5.0, including the
`--gate submission` and `--submission-target generic|ncbi` contract. Do not
document v0.5 as published on Bioconda or BioContainers until those packages
exist; the verified published package and workflow image remain v0.3.0.

## Bioconda

Recommended install:
Expand Down Expand Up @@ -60,6 +65,17 @@ Run it:
--multiqc fastaguard_mqc.json
```

Run a local v0.5 submission-readiness preflight before official validators:

```bash
./target/release/fastaguard testdata/submission_ids.fa \
--profile assembly \
--gate submission \
--submission-target ncbi \
--json fastaguard.json \
--out fastaguard_report.html
```

## Docker

Build the image:
20 changes: 20 additions & 0 deletions docs/releases/v0.5.0.md
@@ -0,0 +1,20 @@
# FastaGuard v0.5.0

FastaGuard v0.5.0 is the Submission Readiness Gate release.

## Highlights

- Adds `--gate submission`.
- Adds `--submission-target generic|ncbi`.
- Adds submission-readiness fields to JSON, TSV, HTML, MultiQC, and compare outputs.
- Promotes existing identifier, header, gap, ambiguity, and tiny-record findings into a clearer submission-readiness view.

## Boundary

FastaGuard is a FASTA-level preflight tool. It does not replace official NCBI,
ENA, or DDBJ validators and repository acceptance checks, NCBI FCS, QUAST,
BUSCO, BlobToolKit, CheckM, or annotation validation.

Passing `--gate submission` does not guarantee repository acceptance. It means
FastaGuard did not find the FASTA-level submission-readiness hazards represented
in its local, database-free contract.
32 changes: 28 additions & 4 deletions docs/roadmap.md
Expand Up @@ -94,7 +94,31 @@ Development scope:
CheckM, official validators, annotation, or other downstream tools; it does
not replace them

-## v0.5: Transcriptome Profile
+## v0.5: Submission Readiness Gate

Goal:

```text
Make assembly FASTA files safer to hand to official validators, annotation, and downstream QC.
```

Development scope:

- `--gate submission` for stricter assembly FASTA preflight
- `--submission-target <generic|ncbi>` for target-aware submission advisories
- stricter identifier and first-token ID safety checks
- gap-like `N` run summaries for submission review
- high ambiguity and tiny-record submission advisories
- submission readiness fields in JSON, TSV, HTML, and MultiQC outputs
- compare-mode aggregation of submission readiness across many FASTA files
- evidence and release notes for `--gate submission` workflows before official
validators
- clear scope boundaries: FastaGuard does not replace official NCBI, ENA, or
DDBJ validators, NCBI FCS, QUAST, BUSCO, BlobToolKit, CheckM, or annotation
validation, and it does not guarantee repository acceptance

## v0.6: Transcriptome Profile

Potential additions:

Expand All @@ -104,7 +128,7 @@ Potential additions:
- extreme GC outliers
- isoform-heavy warning heuristics

-## v0.6: Protein Profile
+## v0.7: Protein Profile

Potential additions:

Expand All @@ -114,7 +138,7 @@ Potential additions:
- low-complexity regions
- suspicious nucleotide-looking proteins

-## v0.7: Reference Panel Profile
+## v0.8: Reference Panel Profile

Potential additions:

Expand Down Expand Up @@ -166,6 +190,6 @@ Completed foundation:

Recommended next sequence:

-- extend evidence tables across future transcriptome, protein, reference, and compare modes
+- extend evidence tables across submission, transcriptome, protein, reference, and compare modes
- keep the v0.3 gate contract stable through workflow adoption examples
- explore an MCP or tool-server interface after the CLI schema is stable