Skip to content

[WIP] Add structural analysis to the SepTop protocol#1982

Open
hannahbaumann wants to merge 14 commits into
mainfrom
structural_analysis_septop
Open

[WIP] Add structural analysis to the SepTop protocol#1982
hannahbaumann wants to merge 14 commits into
mainfrom
structural_analysis_septop

Conversation

@hannahbaumann
Copy link
Copy Markdown
Contributor

@hannahbaumann hannahbaumann commented May 28, 2026

Checklist

  • All new code is appropriately documented (user-facing code must have complete docstrings).
  • Added a news entry, or the changes are not user-facing.
  • Ran pre-commit: you can run pre-commit locally or comment on this PR with pre-commit.ci autofix.

Manual Tests: these are slow so don't need to be run every commit, only before merging and when relevant changes are made (generally at reviewer-discretion).

Developers certificate of origin

@hannahbaumann hannahbaumann marked this pull request as draft May 28, 2026 10:05
@hannahbaumann hannahbaumann changed the title Add structural analysis to the SepTop protocol [WIP] Add structural analysis to the SepTop protocol May 28, 2026
@hannahbaumann
Copy link
Copy Markdown
Contributor Author

pre-commit.ci autofix

@hannahbaumann hannahbaumann self-assigned this May 28, 2026
@hannahbaumann
Copy link
Copy Markdown
Contributor Author

This is what it would look like for the complex right now.

protein_2D_RMSD ligand_B_RMSD ligand_B_COM_drift ligand_A_RMSD ligand_A_COM_drift

@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 96.85864% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.58%. Comparing base (5347082) to head (d1e3abe).

Files with missing lines Patch % Lines
src/openfe/protocols/openmm_septop/base_units.py 93.54% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1982      +/-   ##
==========================================
- Coverage   94.94%   90.58%   -4.36%     
==========================================
  Files         216      217       +1     
  Lines       20481    20670     +189     
==========================================
- Hits        19445    18724     -721     
- Misses       1036     1946     +910     
Flag Coverage Δ
fast-tests 90.58% <96.85%> (?)
slow-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@hannahbaumann
Copy link
Copy Markdown
Contributor Author

pre-commit.ci autofix

@hannahbaumann hannahbaumann marked this pull request as ready for review June 2, 2026 09:12
@hannahbaumann
Copy link
Copy Markdown
Contributor Author

pre-commit.ci autofix

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

No API break detected ✅

u_top = mda.Universe(pdb_file)
for state_idx in range(n_lambda):
universe = create_universe_single_state(u_top._topology, ds, state=state_idx)
prot = universe.select_atoms(protein_selection)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if its possible or faster but could we do this selection once on the u_top to get the indices of the atoms and then just indexing to get the atoms in each loop?

trj_file: pathlib.Path,
output_directory: pathlib.Path,
dry: bool,
simtype: str,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a literal type here with complex and solvent

n_frames = len(range(0, ds.dimensions["iteration"].size, ds.PositionInterval))
else:
n_frames = ds.dimensions["iteration"].size
skip = max(n_frames // 500, 1)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we want to keep the analysis to only 500 frames, should this be exposed?

Comment on lines +1776 to +1792
if simtype == "complex":
np.savez_compressed(
npz_file,
ligand_A_RMSD=np.asarray(data["ligand_A_RMSD"], dtype=np.float32),
ligand_B_RMSD=np.asarray(data["ligand_B_RMSD"], dtype=np.float32),
ligand_A_COM_drift=np.asarray(data["ligand_A_COM_drift"], dtype=np.float32),
ligand_B_COM_drift=np.asarray(data["ligand_B_COM_drift"], dtype=np.float32),
protein_2D_RMSD=np.asarray(data["protein_2D_RMSD"], dtype=np.float32),
time_ps=np.asarray(data["time_ps"], dtype=np.float32),
)
else:
np.savez_compressed(
npz_file,
ligand_A_RMSD=np.asarray(data["ligand_A_RMSD"], dtype=np.float32),
ligand_B_RMSD=np.asarray(data["ligand_B_RMSD"], dtype=np.float32),
time_ps=np.asarray(data["time_ps"], dtype=np.float32),
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about building a dict of shared data that will be saved in both cases so the time_ps and ligand_A/B_RMSD data and then if the simtype=="complex" then add the extra data to the dict then you can have single np.savez_compressed call?

ligand_B_indices: list[int],
rdmol_A: Chem.Mol,
rdmol_B: Chem.Mol,
protein_selection: str = "protein and name CA",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to expose this to users, and does it make sense to remove the protein_selection default from the private functions so that if you do update the default, you only have to do it in a single place, i.e single source of truth on the public function.

Comment on lines +1912 to +1918
selection_indices = np.array(setup.outputs["selection_indices"])
ligand_A_indices = np.where(np.isin(selection_indices, setup.outputs["ligand_A_indices"]))[
0
].tolist()
ligand_B_indices = np.where(np.isin(selection_indices, setup.outputs["ligand_B_indices"]))[
0
].tolist()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got lost trying to follow this through but could this go wrong if the user accidentally changes the settings to only save the protein and water by mistake?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants