Refactor averaging.py and manipulations.py to use quantities by DrPaulSharp · Pull Request #207 · SasView/sasdata

DrPaulSharp · 2026-05-19T11:28:10Z

This PR updates the modules refactored in #47 to use Quantity rather than Data1D/Data2D objects. It should serve as an example of how we use these objects in the refactoring project.

codescene-delta-analysis

No quality gates enabled for this code.

See analysis details in CodeScene

Quality Gate Profile: Custom Configuration
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.

krzywon

Most of my comments are related to three concepts.

The differences between three_dim vs two_dim data and how the former would be handled in the current implementation. Ideally, the Qz values would be used if they exist.
Variances and taking the square root of them versus using the bare values.
x * x vs x ** 2

Overall, this is a good port, but bypasses the mesh paradigm added in the sasdata.slicing package. Where do you see that coming into play?

krzywon · 2026-06-04T19:09:39Z

-    dqy_at_z_max = data2d.dqy_data[np.argmax(data2d.q_data)]
-    dqy_at_z_min = data2d.dqy_data[np.argmin(data2d.q_data)]
+
+    q_data = np.sqrt(data2d._data_contents["Qx"].value**2 + data2d._data_contents["Qy"].value**2)


For three_dim data, would this slice across all Qz spaces?

krzywon · 2026-06-04T19:22:18Z

+    dqx_data = np.sqrt(data2d._data_contents["Qx"].variance.value)
+    dqy_data = np.sqrt(data2d._data_contents["Qy"].variance.value)


Why the square root of the variance? Aren't the variances already the dq values?

From line 1579 of quantity.py we have: self._variance = standard_error**2, where standard_error is the input of a Quantity object. I'll look through and see if there is the opportunity to refactor and use variances here if possible, but my understanding is that this is the correct transcription of the existing code.

Okay, then you are doing the correct operation here, but from a fundamental standpoint, by taking the square of the standard error, aren't we changing the value of the data that should be immutable? And be squaring the data, how do we know which root is correct, the positive or the negative. Also, if future developers aren't aware of this, taking the raw variance will create an issue in our error propagation which will be difficult to debug.

Fundamentally, I'm not sure I agree with the underlying structure for many reasons, but this is something we can discuss.

Yes, I'm having the same thoughts. I don't understand why the Quantity object takes in an error and records a variance. I'll ask around and see what I can find, and we can discuss.

Quantity does have a standard_deviation method, which is:

def standard_deviation(self) -> "Quantity": return self.variance**0.5

which just seems to add to the mystery of why it's structured like this.

In error propagation, the variances are used most often, so I think I understand the reasoning now, but we should be storing the standard_error as-presented, for reproducibility reasons.

krzywon · 2026-06-04T19:35:20Z

+        data_contents = {
+            "Qx": Quantity(np.arange(100), per_angstrom),
+            "Qy": Quantity(np.arange(100), per_angstrom),
+            "I": Quantity(x_0, per_centimeter),
+            "dI": Quantity(dx_0, per_centimeter)
+        }


I'm curious to see what would happen with a Qz term in here. Will the calculation succeed?

pkienzle · 2026-06-05T21:24:20Z

-    if dq_overlap_x > min(data2d.dqx_data):
-        dq_overlap_x = min(data2d.dqx_data)
+    if dq_overlap_x > min(dqx_data):
+        dq_overlap_x = min(dqx_data)


You probably want np.min/np.max which operate directly on numpy arrays rather than builtin min/max which try the arrays as an iterable.

pkienzle · 2026-06-05T21:27:05Z

-    dqx_data = data2d.dqx_data[np.isfinite(data2d.data)]
-    dqy_data = data2d.dqy_data[np.isfinite(data2d.data)] - dq_overlap
+    dqx_data = dqx_data[np.isfinite(data2d.ordinate.value)]
+    dqy_data = dqy_data[np.isfinite(data2d.ordinate.value)] - dq_overlap


Save a little bit of time by computing the index once. Also, easier to maintain since you only have to update the index condition in one place.

pkienzle · 2026-06-05T21:27:43Z

        super().__init__(qx_range=qx_range, qy_range=qy_range)

-    def __call__(self, data2d: Data2D) -> tuple[float, float, float]:
+    def __call__(self, data2d: SasData = None) -> tuple[float, float, float]:


type is SasData|None here and elsewhere.

pkienzle · 2026-06-05T22:12:50Z

-        mask_all = data2D.mask[finite_mask]
+        data_all = data2D.ordinate.value[finite_mask]
+        qx_all = data2D._data_contents["Qx"].value[finite_mask]
+        qy_all = data2D._data_contents["Qy"].value[finite_mask]


You shouldn't be peeking at the internals of SasData from outside the class. You could make data_contents a public attribute. Since users of the class need to compose data_content for the constructor this is appropriate. Or use data2D.abscissae to get an array of (qx, qy) points.

Note: data2D.abscissae makes a copy but data2D.ordinate does not. This will trip somebody up some day. Also, properties should not be doing a lot of work. Maybe turn abscissae into function call, or maybe cache it.

Your type system allows calling with 1D data. Are you sure you don't want a subclass of SasData that you know to be 2D? Then you can have qx,qy accessors and the (qx,qy) -> (q, phi) transforms built into the class.

For q_all below, you already have masked qx_all and qy_all. If you are not using abscissae with the 2-norm or a SasData2D subclass with a q method you can use qx_all and qy_all to calculate q.

Moving the masking operation into SasData seems appropriate. It is an operation that many users of the data will want. You want to be clear on the copy semantics: is it masking in place, or returning a masked copy, even if the copy contains everything.

Comments on the preexisting code:

It is suspicious that this slicer has an isfinite mask but others do not.

Why the _all tag on the variable names?

pkienzle · 2026-06-05T22:14:46Z

+                                                 minor_axis=phi_axis,
+                                                 lims=(major_lims, minor_lims),
+                                                 nbins=self.nbins,
+                                                 base=self.base)


I'm surprised by the indentation style. Doesn't ruff break after '(' and indent by 4?

DrPaulSharp changed the base branch from master to refactor_24 May 19, 2026 11:28

This comment was marked as outdated.

Sign in to view

DrPaulSharp force-pushed the refactor_24_averaging branch from b149a6b to d0e7867 Compare May 19, 2026 15:30

This comment was marked as outdated.

Sign in to view

DrPaulSharp added 5 commits May 20, 2026 12:47

Refactors sasmanipulations to use new data objects for input

ebb93a5

Refactors sasmanipulations to use new data objects for output

068ee67

Adds new dataset type "angle_dim"

f7053ec

Uses ordinate property in manipulations code

0d76d66

Fixes use of "center" quantity value in tests

8d3eb43

DrPaulSharp force-pushed the refactor_24_averaging branch from d0e7867 to 8d3eb43 Compare May 20, 2026 11:49

codescene-delta-analysis Bot approved these changes May 20, 2026

View reviewed changes

DrPaulSharp marked this pull request as ready for review May 20, 2026 11:51

DrPaulSharp requested a review from krzywon May 20, 2026 11:51

krzywon reviewed Jun 4, 2026

View reviewed changes

pkienzle reviewed Jun 5, 2026

View reviewed changes

		dqx_data = np.sqrt(data2d._data_contents["Qx"].variance.value)
		dqy_data = np.sqrt(data2d._data_contents["Qy"].variance.value)

Conversation

DrPaulSharp commented May 19, 2026

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

codescene-delta-analysis Bot left a comment

Choose a reason for hiding this comment

Uh oh!

krzywon left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants