Skip to content

API update for OpenDX and MRC#164

Open
spyke7 wants to merge 21 commits intoMDAnalysis:masterfrom
spyke7:api_update
Open

API update for OpenDX and MRC#164
spyke7 wants to merge 21 commits intoMDAnalysis:masterfrom
spyke7:api_update

Conversation

@spyke7
Copy link
Copy Markdown

@spyke7 spyke7 commented Mar 11, 2026

This is regarding #161

Implemented from_grid() a static method and native (property) inside OpenDX.py and mrc.py

  • Just simply added those two functions inside files
  • register converters
  • change export
  • write tests

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 11, 2026

Codecov Report

❌ Patch coverage is 86.41975% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.02%. Comparing base (f45d52b) to head (c042d4f).

Files with missing lines Patch % Lines
gridData/OpenDX.py 81.81% 20 Missing and 10 partials ⚠️
gridData/mrc.py 95.77% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #164      +/-   ##
==========================================
+ Coverage   88.20%   89.02%   +0.81%     
==========================================
  Files           5        5              
  Lines         814      838      +24     
  Branches      107      108       +1     
==========================================
+ Hits          718      746      +28     
+ Misses         56       54       -2     
+ Partials       40       38       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@spyke7
Copy link
Copy Markdown
Author

spyke7 commented Mar 16, 2026

@orbeckst pls check this kindly

Copy link
Copy Markdown
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already looking pretty good. Please also add doc strings and update CHANGELOG. Will have a closer look once this is done. Thank you!

@spyke7 spyke7 changed the title API update from OpenDX and MRC API update for OpenDX and MRC Mar 19, 2026
@spyke7 spyke7 requested a review from orbeckst March 19, 2026 15:09

assert isinstance(dx_field, gridData.OpenDX.field)

assert any('test_density' and 'test_user' in str(c) for c in dx_field.comments)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a logic error. 'test_density' and 'test_user' in str(c) evaluates as 'test_user' in str(c) due to Python's and short-circuit semantics — 'test_density' is truthy (not None / empty string), so it's ignored. Should be:

assert any('test_density' in str(c) for c in dx_field.comments)                                                                 
assert any('test_user' in str(c) for c in dx_field.comments)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually to get what you are doing there, you might need

assert any(('test_density' in str(c) and 'test_user' in str(c)) for c in dx_field.comments)

If you need them both to be true for a single comment.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert any(('test_density' in str(c) and 'test_user' in str(c)) for c in dx_field.comments)

This is giving error -

>       assert any(('test_density' in str(c) and 'test_user' in str(c)) for c in dx_field.comments)
E       assert False
E        +  where False = any(<generator object test_dx_from_grid.<locals>.<genexpr> at 0x0000021CD04CF220>)

gridData\tests\test_dx.py:104: AssertionError

Either I can do them separately or I can do this -
assert any(('test_density' and 'test_user') in str(c) for c in dx_field.comments)

gridData/core.py Outdated

converter = {
'MRC': mrc.MRC.from_grid,
'VDB': None,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VDB is going to be implemented but it should be removed for now. Currently the self.converter[]() has potential to try and call None() so best to just not include until the VDB support is added (or add a guard for such in the later usage).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can comment out this line.

gridData/core.py Outdated
grid, edges = g.histogramdd()
self._load(grid=grid, edges=edges, metadata=self.metadata)

def convert_to(self, format_specifier, tolerance=None, **kwargs):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tolerance is currently not used at all. Should be removed and left for the PR that introduces VDB.

gridData/mrc.py Outdated
if filename is not None:
self.read(filename, assume_volumetric=assume_volumetric)

@staticmethod
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be classmethods instead of staticmethods?

Copy link
Copy Markdown
Author

@spyke7 spyke7 Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the issue it was staticmethod and it's fine for basic usage as well. But classmethod is more good, if somebody subclasses mrc/dx
Should I change it?

Copy link
Copy Markdown
Member

@BradyAJohnston BradyAJohnston Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have thought it makes more sense to make it classmethod, but @orbeckst was specific about it being static method in the issue. @orbeckst I'm interested why?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did't know any better ;-). After some reading I also recommend going with @classmethod — if we need to work with inheritance or use it from another classmethod then classmethod will work better.

Copy link
Copy Markdown
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good but there are some necessary changes. In particular, the MRC.native should return a MrcFile.

Could you please also add a section to the gridData.core docs about using from_grid() and .native? Have a look at the issue text for #161 and you can probably copy and paste from there. We just want to document somewhere for users that this API exists.

CHANGELOG Outdated
* 1.1.1

Fixes
Changes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enhancements (not changes)

CHANGELOG Outdated
??/??/???? orbeckst
??/??/???? orbeckst, spyke7

* 1.1.1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to 1.2.0 (under semantic versioning, we must increase minor version for new features)

Comment on lines +568 to +570
grid : Grid
type : str, optional
typequote : str, optional
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add minimal explanations (can copy from other docs)


@property
def native(self):
"""Return native object"""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Add a sentence "The "native" object is the :class:gridData.OpenDX.field itself."
  • add a .. versionadded:: 1.2.0 (with two blank lines above)

Comment on lines +107 to +108
assert_equal(dx_field.components['data'].array, data)
assert_equal(dx_field.components['positions'].origin, g.origin)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These may be floats so use assert_allclose.

g = Grid(data, origin=[0, 0, 0], delta=[1, 1, 1])

dx_field = gridData.OpenDX.field.from_grid(g)
assert dx_field.native is dx_field No newline at end of file
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checks that it's the same object. Add another assertion that checks the type explicitly

assert isinstance(dx_field.native, OpenDX.field)


mrc_obj = mrc.MRC.from_grid(g)

assert mrc_obj.native is mrc_obj
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to change as the native object should be the mrcfile.MrcFile.

  • check type
  • check some values


assert isinstance(mrc_obj, mrc.MRC)

assert_equal(mrc_obj.array, data)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the data are integers, assert_equal is technically correct but for consistency and robustness, also use assert_allclose, please.

@spyke7
Copy link
Copy Markdown
Author

spyke7 commented Mar 26, 2026

@orbeckst @BradyAJohnston what to do exactly for native for mrc.py, as I am not able to understand how to return a mrcfie.MrcFile?
Please review other changes, but kept that part same as previous.

@orbeckst
Copy link
Copy Markdown
Member

Good point, MrcFile is tied to a file on disk. We could simply link it to a tmp file or maybe it works with StringIO? Or pass a filename in convert_to() and as optional kwarg? I have to think about this a little bit more and suggestions are welcome!

@spyke7
Copy link
Copy Markdown
Author

spyke7 commented Apr 2, 2026

Good point, MrcFile is tied to a file on disk. We could simply link it to a tmp file or maybe it works with StringIO?

inside write() the filename can be a BytesIO object, as

if filename is not None:
            self.filename = filename

So ig it will work

Will also try the tempfile as well.

@spyke7
Copy link
Copy Markdown
Author

spyke7 commented Apr 2, 2026

I used the tempfile at last cause of an error while using BytesIO

 def _open_file(self, name):
        """Open a file object to use as the I/O stream."""
>       self._iostream = open(name, self._mode + 'b')

mrcfile uses the open() function and it requires string not bytesIO

@spyke7 spyke7 requested a review from orbeckst April 2, 2026 18:00
@orbeckst
Copy link
Copy Markdown
Member

orbeckst commented Apr 4, 2026

@spyke7 thanks for doing the experiments. Could we use mrcfile.mrcinterpreter.MrcInterpreter (instead of MrcFile) as it is built to take a stream?

Copy link
Copy Markdown
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good that you got it to work! I would like to try and get MRC to work with a stream and avoid a temp file that is not cleaned up. Please see comments.

Tests need a bit more work and the coverage is also below threshold. Have a look at the coverage report to see which lines are not tested.

Good work!

gridData/mrc.py Outdated
Comment on lines +153 to +158
fd, temp_path = tempfile.mkstemp(suffix=".mrc")

os.close(fd)
self.write(temp_path)

return mrcfile.open(temp_path)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reasonable workaround but it's a bit clunky, we have to parse the data twice (once in write, then after open) and we leave a tempfile around that will fill up the disk.

A slightly better solution would be to self.write() to a buffer and then have mrcfile.mrcinterpreter.MrcInterpreter directly read the buffer. That at least avoids the file on disk issue.

If that works then perhaps we can move almost all code from write() into native and the write() would just write the internal stream buffer to a file and in this way we only parse the data once with mrcfile.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try at least the "slightly better" solution — if that works then we can use it for now.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions,
What I am doing is that inside of write I will check whether a stream is given or not and then inside it only, call the MrcInterpreter

assert mrc_obj.rank == 3

def test_mrc_native_property(self):
data = np.ones((3, 3, 3))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a non-symmetric grid eg arange(60).reshape(3,4,5) ; with only ones you can miss issues with axes orientations.


def test_mrc_native_property(self):
data = np.ones((3, 3, 3))
g = Grid(data, origin=[0, 0, 0], delta=[1, 1, 1])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Varydelta = [1., 3., 0.5]. Generally you want as little symmetry as possible so that you can catch errors that just transpose indices.

err_msg="deltas of written grid do not match original")

def test_dx_from_grid():
data = np.ones((5, 5, 5), dtype=np.float32)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a non-symmetric grid eg arange(60).reshape(3,4,5)

assert_allclose(dx_field.components['positions'].origin, g.origin)

def test_dx_native():
data = np.ones((5, 5, 5))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a non-symmetric grid eg arange(60).reshape(3,4,5)


def test_dx_native():
data = np.ones((5, 5, 5))
g = Grid(data, origin=[0, 0, 0], delta=[1, 1, 1])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vary delta

assert_allclose(grid.grid, ccp4data.array)

def test_mrc_from_grid(self):
data = np.arange(27, dtype=np.float32).reshape((3, 3, 3))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even better: use a non-symmetric grid eg arange(60).reshape(3,4,5)

@orbeckst
Copy link
Copy Markdown
Member

orbeckst commented Apr 4, 2026

Regarding coverage: Just check that the lines that you wrote get covered. You ran black on the whole file so this picks up any of those lines. You are not responsible for writing other tests if this was just a formatting change (this is one of the reasons why we normally do reformatting in separate PRs.)

@orbeckst
Copy link
Copy Markdown
Member

orbeckst commented Apr 4, 2026

Looking at https://app.codecov.io/gh/MDAnalysis/GridDataFormats/pull/164 it seems that the Grid.convert_to() method is not tested. This one is important :-) so make sure there are explicit tests for OpenDX and MRC.

Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>
@spyke7
Copy link
Copy Markdown
Author

spyke7 commented Apr 4, 2026

So, my plan is this -

  • check whether the passed filename is io.BytesIO or not using isinstance
  • create a header with numpy using the header data type
  • that header will be given values - nx, ny, nz, mode, map, machst and then the inside the filename(i.e the stream) the header and the data_for_file will be written
  • else all other things same for write

mrcfile utils

problem is -
from a mrcfile.mrcinterpreter.MrcInterpreter object we need to convert it to mrcfile.mrcfile.Mrcfile
for that i guess, we need to tweak values inside __init__ of mrcfile.mrcfile.Mrcfile. That way we can convert it.

There is no other way of writing a stream except the interpreter. It would be helpful if I get some info regarding the Mrcfile codebase @orbeckst @BradyAJohnston pls see once when you have got time
Thanks

- MRC.native generates a new mrcfile.mrcinterpreter.MrcInterpreter instance
- use the MrcInterpreter instance for writing out
  Note: only uncompressed files currently supported.
@orbeckst
Copy link
Copy Markdown
Member

orbeckst commented Apr 7, 2026

I updated with how I think this could work (git pull the changes). The tests passed locally. Please have a quick look and let me know what you think.

@spyke7
Copy link
Copy Markdown
Author

spyke7 commented Apr 8, 2026

I updated with how I think this could work (git pull the changes). The tests passed locally. Please have a quick look and let me know what you think.

Really thanks for the code of Mrcinterpreter. (I didn't have to do much heavylifting 🙏)
I wasn't able to spot the _create_default_attributes func, that is why I told about importing mode, map, etc from the constant and utils. Now the thing is done. And here directly in write() we are giving the _iostream

I was looking if is there any way to still return a Mrcfile object.
I saw into the mrcfile.py and one thing is that if we use __new__ we could get the raw Mrcfile object and then update the _mode, _read and then call the _create_default_attributes then we can return the Mrcfile object.
I have edited the mrc.py and the above thing works.

This is how the __init__ of mrcfile.py looks -

       super(MrcFile, self).__init__(permissive=permissive, **kwargs)
        
        if mode not in ['r', 'r+', 'w+']:
            raise ValueError("Mode '{0}' not supported".format(mode))

        name = str(name)  # in case name is a pathlib Path
        if ('w' in mode and os.path.exists(name) and not overwrite):
            raise ValueError("File '{0}' already exists; set overwrite=True "
                             "to overwrite it".format(name))
        
        self._mode = mode
        self._read_only = (self._mode == 'r')
        
        self._open_file(name)
        
        try:
            if 'w' in mode:
                self._create_default_attributes()
            else:
                self._read(header_only)
        except Exception:
            self._close_file()
            raise

But it will require the io.BytesIO() stream. (The idea we talked about)
So the native looks something like this -

mrc = mrcfile.mrcfile.MrcFile.__new__(mrcfile.mrcfile.MrcFile)
mrc._iostream = io.BytesIO()
mrc._mode = 'r+'
mrc._read_only = False
mrc.permissive = True

mrc._create_default_attributes()

But when I'm doing the isinstance check then it works for both mrcfile.mrcfile.MrcFile and mrcfile.mrcinterpreter.MrcInterpreter 😅

@spyke7
Copy link
Copy Markdown
Author

spyke7 commented Apr 8, 2026

Plz give any suggestions or thoughts, like should I do this?

@orbeckst
Copy link
Copy Markdown
Member

orbeckst commented Apr 9, 2026

Your approach to make the native object MrcFile is interesting. I don't think we really need to return MrcFile, though. There's not that much to be gained by having an underlying IO buffer connected to it and MrcInterpreter works just as we need it to and even though the assigning an open stream to _iostream is not very elegant, it's how the mrcfile library documents use of the class so it's good enough for us.

Copy link
Copy Markdown
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep MrcInterpreter.

Could you please improve the tests as outlined in the comments? I worry that issues are slipping through because we use symmetrical arrays as test inputs. Once that's done (and no other issues come up), the PR is done!

@spyke7 spyke7 requested a review from orbeckst April 9, 2026 07:08
Copy link
Copy Markdown
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready to be merged! Good work!! Thank you for addressing the test issues. (There's some missing coverage but that's not your code.)

I am adding two comments and then I'll merge.

@orbeckst
Copy link
Copy Markdown
Member

orbeckst commented Apr 9, 2026

@BradyAJohnston do you want to have a final look? Anything that would prevent this PR to be merged?

@orbeckst orbeckst self-assigned this Apr 9, 2026
@orbeckst orbeckst linked an issue Apr 9, 2026 that may be closed by this pull request
@orbeckst
Copy link
Copy Markdown
Member

@spyke7 please add yourself to AUTHORS for this PR as this will become your first contribution to GridDataFormats.

Copy link
Copy Markdown
Member

@BradyAJohnston BradyAJohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good, but some minor things that need addressing / answering (np.diag, __comsume).


assert isinstance(dx_field, gridData.OpenDX.field)

assert any(("test_density" and "test_user") in str(c) for c in dx_field.comments)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logical assert is still broken. It's changed from this comment but still doesn't work as you expect.

In [4]: ("test_density" and "test_user") in ["test_density"]
Out[4]: False

Because the initial brackets resolves to this:

In [1]: ("test_density" and "test_user")
Out[1]: 'test_user'

Keeping them combined into a single assert is obviously not great for us to avoid complexity, so just split into separate asserts for each string.

assert any("test_density" in str(c) for c in dx_field.comments)                                                                         
assert any("test_user" in str(c) for c in dx_field.comments)  

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! 👍

gridData/core.py Outdated

converter = {
'MRC': mrc.MRC.from_grid,
# 'VDB': None,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just remove this commented out code from the PR. Cleaner that way.

mrc_obj = cls()

mrc_obj.array = grid.grid
mrc_obj.delta = np.diag(grid.delta)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong here @orbeckst but a Grid is already storing the grid.delta as a 3x3 array, so np.diag(grid.delta) would return a 1D array? So we should just omit the np.diag?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spyke7 and others added 4 commits April 10, 2026 19:45
Co-authored-by: Brady Johnston <36021261+BradyAJohnston@users.noreply.github.com>
Co-authored-by: Brady Johnston <36021261+BradyAJohnston@users.noreply.github.com>
Co-authored-by: Brady Johnston <36021261+BradyAJohnston@users.noreply.github.com>
@orbeckst
Copy link
Copy Markdown
Member

orbeckst commented Apr 10, 2026

@spyke7 don't forget AUTHORS.

EDIT: oops. overlooked that you'd already done it

@spyke7 spyke7 requested a review from BradyAJohnston April 11, 2026 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API-interoperability: Grid.convert_to()

4 participants