We're seeing Convergence Ratio analysis runs fail with the error below. After some digging, the root cause is that the BLAST parquet file still contains self-alignments by the time the convergence ratio is calculated, which produces convergence ratio values greater than 1 (which should not be possible).
This continues the work I started a couple of months ago in #258.
The flow needs to be:
- Condense sequences. (ALL_BY_ALL workflow)
- Run all-by-all BLAST calcs. (ALL_BY_ALL workflow)
- Run blastreduce to take the top triangle of the all-by-all BLAST results. Keep self-alignments here. (ALL_BY_ALL workflow)
- Restore from the condensed sequence set to the full sequence set. Remove self-alignments here. (ALL_BY_ALL workflow)
  - A DuckDB call happens that outputs condensed.out.
  - A restore_condensed_sequences.py call happens that outputs 1.out. This is likely where the self-alignments need to be removed.
  - A transcode_restored_blast.py call happens that outputs the 1.out.parquet file.
- Calculate convergence ratio. (REPORTING workflow)
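
For illustration, here is a minimal sketch of what dropping self-alignments during the restore step could look like. It is not the actual code in restore_condensed_sequences.py. It assumes tab-separated BLAST output with the query ID in the first column and the subject ID in the second, and the helper name `drop_self_alignments` is hypothetical.

```python
import csv
import io


def drop_self_alignments(rows, query_col=0, subject_col=1):
    """Yield only rows whose query and subject IDs differ.

    Assumption: the query ID is in column 0 and the subject ID in
    column 1; adjust the indexes if the real file is laid out differently.
    """
    for row in rows:
        if row[query_col] != row[subject_col]:
            yield row


# Made-up sample data standing in for restored BLAST output.
sample = "seqA\tseqA\t100.0\nseqA\tseqB\t87.5\nseqB\tseqB\t100.0\n"
reader = csv.reader(io.StringIO(sample), delimiter="\t")
kept = list(drop_self_alignments(reader))
print(kept)
```

Filtering at this point, after the condensed set has been expanded back to the full set, keeps self-alignments available to blastreduce while making sure none reach the parquet file that the convergence ratio reads.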