Hi @Dm0216, thanks for the very detailed report. I tried to reproduce on a fresh fp64
Suggested quick workaround first: please verify that your model is
If you'd like to understand the root cause regardless, the most likely structural cause for the unstable MD is runtime
The 7.6 GB file from
Quick diagnostic to confirm and localize: please run before LAMMPS, in pure Python, against the exact starting structure LAMMPS reads.
Important: if your LAMMPS run uses periodic boundary conditions (the default for
import numpy as np
from ase import Atoms
from deepmd.infer import DeepPot
# coord (natoms, 3), atype (natoms,), box (3, 3) from your LAMMPS data file
# Set pbc=True/False matching your LAMMPS `boundary` setting.
USE_PBC = True
dp_un = DeepPot("model5.1.pth")
dp_co = DeepPot("model5.1_compressed.pth")
box_arg = box.reshape(1, 9) if USE_PBC else None
e_un, f_un, _ = dp_un.eval(coord.reshape(1, -1), box_arg, atype)
e_co, f_co, _ = dp_co.eval(coord.reshape(1, -1), box_arg, atype)
print("uncomp |F| max:", float(np.max(np.abs(f_un))))
print("comp |F| max:", float(np.max(np.abs(f_co))))
print("max force diff:", float(np.max(np.abs(f_un - f_co))))
# PBC-aware minimum-image pair distance (ase handles the cell properly)
atoms = Atoms(
positions=coord.reshape(-1, 3),
cell=box.reshape(3, 3) if USE_PBC else None,
pbc=USE_PBC,
)
d = atoms.get_all_distances(mic=USE_PBC)
np.fill_diagonal(d, np.inf) # mask self-distances
print("min pair distance (mic):", float(d.min()))
# And the min_nbor_dist baked into the .pth itself
import torch
m = torch.jit.load("model5.1.pth", map_location="cpu")
print("model min_nbor_dist: ", float(m.min_nbor_dist))
Three outcomes tell us where to look:
For reference, the internal indexing/bound calculations I traced in pt-side compression appear correct (table net-name ↔
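One way to read the last two numbers printed by the diagnostic together is sketched below, using only the standard library. It assumes that `min_nbor_dist` is the smallest neighbor distance seen in the training statistics, so a starting structure with a closer pair would sit outside the range the model was fitted on. The function names `min_pair_distance` and `below_training_range` are hypothetical helpers, not part of DeePMD-kit, and the 27-image search assumes the cell is not strongly skewed.

```python
import itertools
import math


def min_pair_distance(coords, cell=None):
    """Return the smallest interatomic distance.

    coords: list of (x, y, z) positions.
    cell:   optional 3x3 list of lattice vectors (rows). When given, the
            26 neighbouring periodic images are searched as well, which is
            adequate for cells that are not strongly skewed (assumption).
    """
    if cell is None:
        shifts = [(0.0, 0.0, 0.0)]
    else:
        shifts = [
            tuple(i * cell[0][a] + j * cell[1][a] + k * cell[2][a] for a in range(3))
            for i, j, k in itertools.product((-1, 0, 1), repeat=3)
        ]
    best = math.inf
    n = len(coords)
    for p in range(n):
        for q in range(p, n):
            for s in shifts:
                # Skip the distance of an atom to itself in the home cell.
                if p == q and s == (0.0, 0.0, 0.0):
                    continue
                image = [coords[q][a] + s[a] for a in range(3)]
                best = min(best, math.dist(coords[p], image))
    return best


def below_training_range(coords, cell, min_nbor_dist):
    """Return (flag, d_min); flag is True if some pair is closer than min_nbor_dist."""
    d_min = min_pair_distance(coords, cell)
    return d_min < min_nbor_dist, d_min
```

If the flag comes back `True` for the structure LAMMPS starts from, the close contact is worth fixing before comparing compressed and uncompressed forces any further. The value passed as `min_nbor_dist` would be whatever the diagnostic above printed for your model.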
Question
Title: PyTorch backend compression of converted TF se_e2_a models causes unstable LAMMPS MD
System Information:
se_e2_a (Model: 5 elements Pb I C H N)
Description:
Compressing a legacy TensorFlow se_e2_a potential that has been converted to the PyTorch backend consistently results in an unstable model in LAMMPS, leaving residual forces that immediately cause a "Lost atoms" error during MD. The uncompressed .pth models execute flawlessly and maintain perfect energy/force parity with the original TF model.
Steps to Reproduce:
Approach 1: Direct Conversion and Compression
model5.pth. MD test runs immediately explode for model5_compressed.pth.
Approach 2: Conversion + 0-Step Initialization + Compression
To ensure the compression tables had the correct spatial boundaries (d_low), a 0-step initialization was performed using the full dataset.
model5.1.pth. MD test runs immediately explode for model5.1_compressed.pth.
LAMMPS Failure Output (Compressed Models):
Workaround Limitation:
Attempting to force mathematically smooth splines to bypass the CG minimizer failure by increasing the grid resolution (-s 0.001 -e 10) generates a 7.6 GB .pth file. Loading this file into LAMMPS immediately triggers a C++ libtorch deserialization crash:
ERROR on proc 0: DeePMD-kit C API Error: PK (/home/dm/deepmd-kit-v3.1.4/source/lmp/pair_deepmd.cpp:572)
DeePMD-kit Version
No response
Backend and its version
No response
Python Version, CUDA Version, GCC Version, LAMMPS Version, etc
No response
Details
No response
Reproducible Example, Input Files, and Commands
No response
Further Information, Files, and Links
No response