[Repo Assist] feat: add Imputation.kNearestWeightedImpute for distance-weighted KNN #372

Draft
github-actions[bot] wants to merge 2 commits into developer from
repo-assist/fix-issue-318-weighted-knn-impute-20260409-b76cdca75c5f3ed4

@github-actions github-actions bot commented Apr 9, 2026

🤖 This PR was created by Repo Assist, an automated AI assistant.

## Summary

Implements the distance-weighted KNN imputation variant requested in #318, adding Imputation.kNearestWeightedImpute to FSharp.Stats.ML.

## Motivation

The existing kNearestImpute treats all k neighbours equally (simple mean). Issue #318 asks for:

  • A weighted version where the contribution of each neighbour is scaled by a user-supplied function of its distance
  • Support for pluggable distance metrics (the existing function hard-codes euclideanNaNSquared)
  • Support for similarity measures like Pearson correlation (where the user passes a reciprocal converter)

## Changes

### `src/FSharp.Stats/ML/Imputation.fs`

New function:

```fsharp
val kNearestWeightedImpute :
    distanceMetric      : DistanceMetrics.Distance<float[]>
    -> distanceToWeight : (float -> float)
    -> k                : int
    -> MatrixBaseImputation<float[],float>
```

Parameters

| Parameter | Purpose |
|---|---|
| `distanceMetric` | Any `float[] -> float[] -> float` distance; use `DistanceMetrics.Array.euclideanNaNSquared` to skip NaN positions |
| `distanceToWeight` | Converts a raw distance to a non-negative weight. For Euclidean distance use `fun d -> 1.0 / (d + epsilon)`; for a correlation similarity measure pass `id` or its reciprocal |
| `k` | Number of nearest neighbours |
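
To make the two function-valued parameters concrete, here is a minimal, self-contained sketch of what a NaN-skipping squared Euclidean distance and an inverse-distance weight converter could look like. The names `squaredEuclideanSkipNaN` and `inverseDistance` are hypothetical and written only for illustration; they are not the library's `euclideanNaNSquared` nor anything added by this PR.

```fsharp
// Illustrative helpers only; these names are assumptions, not FSharp.Stats API.

/// Squared Euclidean distance that ignores any position where either value is NaN.
/// Array.fold2 raises an exception if the two arrays differ in length.
let squaredEuclideanSkipNaN (a: float[]) (b: float[]) : float =
    Array.fold2
        (fun acc x y ->
            if System.Double.IsNaN x || System.Double.IsNaN y then acc
            else acc + (x - y) * (x - y))
        0.0 a b

/// Inverse-distance weight. The offset keeps the denominator away from zero;
/// it has to be large enough that 1.0 / offset is still a finite float.
let inverseDistance (offset: float) (d: float) : float =
    1.0 / (d + offset)

// The second position is skipped because the first row holds NaN there.
let d = squaredEuclideanSkipNaN [| 1.0; nan; 3.0 |] [| 2.0; 5.0; 5.0 |]
let w = inverseDistance 1e-12 d
printfn "distance = %g, weight = %g" d w
```

Both helpers have the shapes the parameter table describes: a `float[] -> float[] -> float` distance and a `float -> float` converter once `offset` is partially applied.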

Behaviour

  • Selects the k nearest complete rows by distanceMetric.
  • Computes a weighted average of their values at the missing index, proportional to distanceToWeight(distance).
  • If totalWeight = 0 (all weights zero), falls back to an unweighted mean (graceful degradation).
  • Returns nan if the complete-rows pool is empty.
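
The behaviour listed above can be sketched as a standalone function. This is an illustration under stated assumptions, not the code in `Imputation.fs`: the name `weightedKnnValue`, its argument order, the `k <= 0` guard, and the local distance helper are all hypothetical.

```fsharp
// Hypothetical sketch of distance-weighted KNN imputation for a single missing cell.
// Nothing here is taken from the PR's implementation.

/// NaN-skipping squared Euclidean distance (illustrative stand-in).
let sqDistSkipNaN (a: float[]) (b: float[]) : float =
    Array.fold2
        (fun acc x y ->
            if System.Double.IsNaN x || System.Double.IsNaN y then acc
            else acc + (x - y) * (x - y))
        0.0 a b

let weightedKnnValue
        (distance: float[] -> float[] -> float)
        (distanceToWeight: float -> float)
        (k: int)
        (completeRows: float[][])
        (query: float[])
        (missingIndex: int) : float =
    if Array.isEmpty completeRows || k <= 0 then
        nan                                   // empty pool: nothing to average
    else
        // Pair each complete row's distance with its value at the missing index,
        // then keep the k closest rows.
        let neighbours =
            completeRows
            |> Array.map (fun row -> distance row query, row.[missingIndex])
            |> Array.sortBy fst
            |> Array.truncate k
        let weights = neighbours |> Array.map (fst >> distanceToWeight)
        let totalWeight = Array.sum weights
        if totalWeight = 0.0 then
            neighbours |> Array.averageBy snd // all weights zero: unweighted mean
        else
            let weightedSum =
                Array.fold2 (fun acc w (_, v) -> acc + w * v) 0.0 weights neighbours
            weightedSum / totalWeight

let rows = [| [| 1.0; 10.0 |]; [| 2.0; 20.0 |]; [| 4.0; 40.0 |] |]
let query = [| 0.0; nan |]
// Squared distances are 1, 4 and 16; with k = 2 and weight 1/d the weights are 1.0 and 0.25,
// so the result is (1.0 * 10 + 0.25 * 20) / 1.25 = 12.0.
let result = weightedKnnValue sqDistSkipNaN (fun d -> 1.0 / d) 2 rows query 1
printfn "%g" result
```

Using `fun d -> 1.0 / d` is safe here only because no row is at distance zero from the query; with exact matches an offset in the denominator is needed.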

Typical usage

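```fsharp
open FSharp.Stats.ML

let isMissing = System.Double.IsNaN
// Inverse-distance weighting. The offset (1e-12 here, an arbitrary small value) must be
// large enough that 1.0 / offset is finite; System.Double.Epsilon is subnormal, so
// 1.0 / System.Double.Epsilon overflows to infinity when d = 0.
let invDist d = 1.0 / (d + 1e-12)
let imputer =
    Imputation.kNearestWeightedImpute
        FSharp.Stats.DistanceMetrics.Array.euclideanNaNSquared
        invDist
        3
// rawData: the input data with NaN marking missing values (defined elsewhere)
let imputed = Imputation.imputeBy imputer isMissing rawData
```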

### `tests/FSharp.Stats.Tests/Imputation.fs` (new file)

6 unit tests covering both `kNearestImpute` and `kNearestWeightedImpute`, plus an `imputeBy` integration test:

| Test | What it checks |
|---|---|
| `kNearestImpute` unweighted mean | Simple mean of 2 nearest neighbours |
| `kNearestImpute` k = dataset size | Mean over all rows |
| `kNearestWeightedImpute` – k=1 | Single neighbour → value unchanged |
| `kNearestWeightedImpute` – inverse-distance | Analytical result: `11.0` |
| `kNearestWeightedImpute` – equal distances | Equal weights → simple mean |
| `kNearestWeightedImpute` – empty dataset | Returns `nan` gracefully |
| `imputeBy` integration test | End-to-end: NaN is replaced, result in plausible range |

## Test Status

✅ **1200 / 1200 tests pass** (0 failures, 0 ignored)

```
EXPECTO! 1,200 tests run in 00:00:02.8s – 1,200 passed, 0 ignored, 0 failed, 0 errored. Success!
```

## Notes & Trade-offs

  • No breaking changes – the existing kNearestImpute is unchanged.
  • Remaining items from #318 ([Feature Request] weighted KNN imputation) are not addressed here: the module rename (Impute is already deprecated), parameterisation of the missing-value encoding, and documentation examples. These could be tackled in follow-up PRs or directly by the maintainer.
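  • Overflow guard: when one or more complete rows match the query at every non-NaN position, euclideanNaNSquared returns 0. Callers using 1/(d+epsilon) avoid division by zero, but the result is only numerically stable if epsilon is large enough for 1/epsilon to stay finite. System.Double.Epsilon is too small for this: 1.0 / System.Double.Epsilon overflows to infinity, and infinite weights turn the weighted average into NaN. When all weights are equal, the result reduces to the arithmetic mean.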
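
The effect of the offset size at zero distance can be checked with a few lines of plain F#. This is a small sketch that relies only on IEEE 754 double arithmetic; the values 10.0 and 20.0 and the offset 1e-12 are arbitrary illustrative choices.

```fsharp
// Two neighbours at distance 0 with values 10.0 and 20.0 (illustrative numbers).
let weightedMean (w: float) = (w * 10.0 + w * 20.0) / (w + w)

// System.Double.Epsilon is the smallest positive subnormal double,
// so its reciprocal exceeds Double.MaxValue and becomes +infinity.
let tinyOffsetWeight = 1.0 / (0.0 + System.Double.Epsilon)
printfn "weight is infinite: %b" (System.Double.IsInfinity tinyOffsetWeight)

// infinity / infinity is NaN, so the weighted mean is lost.
printfn "mean is NaN: %b" (System.Double.IsNaN (weightedMean tinyOffsetWeight))

// A larger offset keeps the weight finite and the mean well defined.
let safeWeight = 1.0 / (0.0 + 1e-12)
printfn "mean with safe offset: %g" (weightedMean safeWeight)
```

With the larger offset both neighbours receive the same finite weight, so the mean falls back to the arithmetic mean of the two values.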

Closes #318

Generated by 🌈 Repo Assist, see workflow run. Learn more.

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@7ee2b60744abf71b985bead4599640f165edcd93

Adds Imputation.kNearestWeightedImpute to FSharp.Stats.ML, addressing the
weighted KNN imputation request in #318.  The new function accepts a
pluggable distance metric and a distanceToWeight converter, allowing both
inverse-Euclidean and similarity-based (e.g. Pearson correlation) weighting
strategies.

Changes:
- src/FSharp.Stats/ML/Imputation.fs: new kNearestWeightedImpute function
- tests/FSharp.Stats.Tests/Imputation.fs: 6 new tests (1200/1200 pass)
- tests/.../FSharp.Stats.Tests.fsproj: register new test file

Closes #318

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>