diff --git a/_posts/2026-03-16-Benchmarking-Mortgage-Underwriting-Agents.md b/_posts/2026-03-16-Benchmarking-Mortgage-Underwriting-Agents.md
index 18f7aeb..bdc7991 100644
--- a/_posts/2026-03-16-Benchmarking-Mortgage-Underwriting-Agents.md
+++ b/_posts/2026-03-16-Benchmarking-Mortgage-Underwriting-Agents.md
@@ -133,7 +133,7 @@ Where `answer_instruction` is one of
 
 # Results & Discussion
 
-![F1 accuracy for SOLO-v2-beta, SOLO (Claude 4.5 backend) and Baseline Claude 4.5](/files/images/posts/benchmarking-mortgage-underwriting-agents/image.png)
+![F1 accuracy for SOLO-v2-beta, SOLO (Claude 4.5 backend) and Baseline Claude 4.5](/files/images/blog/benchmarking-mortgage-underwriting-agents/image.png)
 
 F1 accuracy for SOLO-v2-beta, SOLO (Claude 4.5 backend) and Baseline Claude 4.5
 
@@ -143,7 +143,7 @@ We also observe uniformly strong performance by SOLO-v2-beta (89%) compared to C
 
 Although strong, we note a systemic bias in SOLO-v2-beta where it tends to omit transactions.
 
-![SOLO-v2-beta failure modes. From top-right counterclockwise: answer is missing at least one transaction; answer is missing at least one transaction and includes at least one irrelevant transaction; answer contains at least one relevant transaction; boolean question is answered incorrectly.](/files/images/posts/benchmarking-mortgage-underwriting-agents/chart_(3).png)
+![SOLO-v2-beta failure modes. From top-right counterclockwise: answer is missing at least one transaction; answer is missing at least one transaction and includes at least one irrelevant transaction; answer contains at least one relevant transaction; boolean question is answered incorrectly.](/files/images/blog/benchmarking-mortgage-underwriting-agents/chart_(3).png)
 
 SOLO-v2-beta failure modes. 
 From top-right counterclockwise: answer is missing at least one transaction; answer is missing at least one transaction and includes at least one irrelevant transaction; answer contains at least one relevant transaction; boolean question is answered incorrectly.
diff --git a/files/images/posts/benchmarking-mortgage-underwriting-agents/chart_(3).png b/files/images/posts/benchmarking-mortgage-underwriting-agents/chart_(3).png
deleted file mode 100644
index 2f1703d..0000000
Binary files a/files/images/posts/benchmarking-mortgage-underwriting-agents/chart_(3).png and /dev/null differ
diff --git a/files/images/posts/benchmarking-mortgage-underwriting-agents/image.png b/files/images/posts/benchmarking-mortgage-underwriting-agents/image.png
deleted file mode 100644
index 6047a10..0000000
Binary files a/files/images/posts/benchmarking-mortgage-underwriting-agents/image.png and /dev/null differ