Skip to content

Rest of GPR Library#8

Open
vtommasini wants to merge 3 commits into
sxs-collaboration:mainfrom
vtommasini:rest-of-gpr-lib
Open

Rest of GPR Library#8
vtommasini wants to merge 3 commits into
sxs-collaboration:mainfrom
vtommasini:rest-of-gpr-lib

Conversation

@vtommasini
Copy link
Copy Markdown

@vtommasini vtommasini commented May 12, 2026

Complete GPR library with the remaining functions beyond standalone ML utilities pushed previously. Added the functions to run the full pipeline, plot kernel eigenvalue decay, leave-one-out cross-validation, and the functions to restore saved models from disk.

Copilot AI review requested due to automatic review settings May 12, 2026 22:31
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Gaussian Process Regression (GPR) module to the SimulationSupport package, aiming to support Post-Newtonian initial-parameter generation/corrections plus an end-to-end GPR training/prediction/cross-validation workflow, along with unit tests and new runtime dependencies.

Changes:

  • Added SimulationSupport.gpr with GPytorch-based training/prediction utilities, PN helper functions, cross-validation, plotting, and checkpoint loading.
  • Added a large new unit test module covering normalization, training/prediction, parsing, correction application, LOO CV, plotting residuals, and checkpoint restore.
  • Updated pyproject.toml to include ML/plotting dependencies.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/SimulationSupport/gpr.py New GPR/PN library implementation (training, prediction, CV, plotting, checkpoint restore).
tests/test_gpr.py New unit tests for the GPR module and related utilities.
pyproject.toml Adds new dependencies required by the new GPR functionality.
Comments suppressed due to low confidence (7)

src/SimulationSupport/gpr.py:44

  • normalize_data always reshapes X and Y with .reshape(-1, 1), which breaks multidimensional inputs (N, D) and contradicts the docstring claim that multi-D X is passed through unchanged. Handle X.ndim == 1/Y.ndim == 1 separately and only reshape in that case (otherwise fit/transform the 2D arrays directly).
    # Reshape X and Y into 2D arrays before fitting
    # as StandardScaler expects shape (N, 1) for 1D inputs
    X_normalized = scaler_X.fit_transform(X.reshape(-1, 1))
    Y_normalized = scaler_Y.fit_transform(Y.reshape(-1, 1))

src/SimulationSupport/gpr.py:680

  • predict_with_gpr_model does not reshape 1D raw_X to (N, 1) before converting to a tensor, while train_gpr_model reshapes 1D inputs during training. If callers pass a 1D array at prediction time, this can produce a shape mismatch in the GP model (kernels/LinearMean expect 2D inputs). Reshape 1D inputs consistently in both training and prediction paths.
    # Normalize the input using the model's stored parameters
    normalized_X = (raw_X - model.input_mean) / model.input_std
    X_tensor = torch.from_numpy(normalized_X).float()

src/SimulationSupport/gpr.py:813

  • train_model_and_eigenvalue_analysis defaults gpr=None but then unconditionally calls gpr.train_gpr_model, gpr.normalize_data, and gpr.predict_with_gpr_model, which will raise an AttributeError unless the caller always passes a module object. Either default gpr to the current module (or remove the parameter) and call the local functions directly, or raise a clear error if gpr is not provided.
    # Train GPR
    model, likelihood = gpr.train_gpr_model(X, Y)

    # Normalize values - needed for eigenvalue analysis
    X_normalized, Y_normalized, scaler_X, scaler_Y = gpr.normalize_data(X, Y)

tests/test_gpr.py:223

  • setUpClass writes to a hard-coded path under the repo (Path("SimulationSupport") / "gpr_tests"). Tests should avoid creating/cleaning directories in the working tree because failures can leave artifacts or clash with user checkouts. Prefer using tempfile.TemporaryDirectory() (like other tests here) and keep all filesystem writes inside it.
        cls.test_dir = Path("SimulationSupport") / "gpr_tests"
        shutil.rmtree(cls.test_dir, ignore_errors=True)
        cls.test_dir.mkdir(parents=True, exist_ok=True)
        cls.X, cls.Y = cls.make_data(n=20)

src/SimulationSupport/gpr.py:625

  • The training-loop comment states that “early stopping terminates training once the loss plateaus”, but the implementation always runs the full 200 iterations (it only tracks best_state). Either implement actual early stopping (break when no improvement for N steps) or adjust the comment/docstring so it doesn’t claim behavior that isn’t present.
    # Training loop for 200 iterations; 200 was chosen empirically for this dataset.
    # In practice, early stopping terminates training once the loss
    # plateaus, so the exact value isn't critical. If utilizing a larger
    # dataset where more iterations may be needed, increase this value if the loss
    # is still decreasing at iteration 200.

src/SimulationSupport/gpr.py:1076

  • save_gpr_corrected has a malformed docstring opener (""" ") which leaves a stray quote at the start of the docstring. Clean this up to a standard triple-quoted docstring to avoid confusing generated docs/help output.
):
    """ "
    Write and save GPR corrected runs to a text file.
    Each line contains RunID, ZwickyDays, q, chiA, chiB, D0, Omega0, adot0.

src/SimulationSupport/gpr.py:567

  • train_gpr_model normalizes by input_std / output_std computed from the training data without guarding against zeros. If a feature column (or the target) is constant, this will divide by zero and propagate NaNs through training/prediction. Add a small epsilon or use a safer scaling strategy (e.g., replace zeros in std with 1.0, similar to StandardScaler).
    input_mean = raw_X.mean(axis=0)
    input_std = raw_X.std(axis=0)
    output_mean = raw_Y.mean()
    output_std = raw_Y.std()

    # Normalize data column-wise; needed for all dimension > 1
    normalized_X = (raw_X - input_mean) / input_std
    normalized_Y = (raw_Y - output_mean) / output_std

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/SimulationSupport/gpr.py Outdated
Comment thread pyproject.toml
"scipy",
"matplotlib",
"torch",
"gpytorch"
Comment thread tests/test_gpr.py Outdated
Copy link
Copy Markdown
Member

@nilsvu nilsvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also rebase on the merged PR #4 .

Comment thread src/SimulationSupport/gpr.py Outdated
Copy link
Copy Markdown
Member

@nilsvu nilsvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these other eccentricity-specific functions. Also, we have quite a few diagnostic functions now that do validations and plots (eigenvalue analysis, LOO analysis). Let's put these in a separate file to keep the code manageable. I suggest you move gpr.py -> grp/core.py and put the diagnostic functions in a new file gpr/diagnostics.py. You can also add gpr/__init__.py with the code from .core import *. This way people can still just do from SimulationSupport.gpr import GPRegressionModel without having to import from core.

Comment thread src/SimulationSupport/gpr.py Outdated
Comment thread src/SimulationSupport/gpr.py Outdated
Comment thread src/SimulationSupport/gpr.py Outdated
Comment thread src/SimulationSupport/gpr.py Outdated
Comment thread src/SimulationSupport/gpr.py Outdated
Comment thread src/SimulationSupport/gpr.py Outdated
Comment thread src/SimulationSupport/gpr.py Outdated
Comment thread src/SimulationSupport/gpr.py Outdated
@vtommasini vtommasini closed this May 19, 2026
@vtommasini vtommasini reopened this May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants