fix: allow a column to share its name with the source relation#6020
fix: allow a column to share its name with the source relation#6020lukapeschke wants to merge 1 commit into
Conversation
A derived/selected column whose name matched the source relation (e.g.
`from sales.sales | derive {sales = ...}`) clobbered the relation's input
namespace, which holds the inference template and redirect used to resolve
every other column — so all sibling columns failed with "Unknown name".
Nest the shadowing column under a reserved `NS_SHADOWING_COL` key inside the
input namespace instead of overwriting it. Leaf access (`this.sales`) resolves
to the shadowing column with a clean emitted name, while qualified access
(`sales.col`) and column inference still go through the preserved relation
namespace. Also fix `deduplicate_select_items`, which conflated a table
qualifier with a same-named column alias and dropped the aliased column.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@max-sixty This one was trickier to fix than I first thought, because it's easy to introduce a regression here 😬 which is why there are so many new tests I believe the change is now robust, but if another human could have a look I'd feel safer 😄 |
| // `x2 = this.bar` resolves to the relation `bar` (the shadowing column | ||
| // is only added later in the tuple), expanding to `*` and losing `x2`. | ||
| assert_snapshot!(crate::tests::compile( | ||
| "from bar | derive { x2 = this.bar, bar = this.a }" |
There was a problem hiding this comment.
I'm not sure this makes sense. If I as a user wrote from bar | derive { x2 = this.bar } I would not expect the generated SQL to be SELECT *, * FROM BAR. I would be expecting that this.bar would be referring to a column inside the bar relation named bar, so the generated SQL ought to be SELECT *, bar AS x2 FROM bar. This is what's generated if the derive references any other name -- from bar | derive { x2 = this.baz } results in SELECT *, baz AS x2 FROM bar.
There was a problem hiding this comment.
I guess maybe the broader question is, why does this.bar resolve to the parent relation bar? It seems that specifying this.anything should always resolve anything to only within the namespace of the current relation and never to the parent namespace. Although maybe I'm missing some currently established bit of PRQL functionality...
There was a problem hiding this comment.
Two things worth separating here, with the behavior verified against the compiler:
This is pre-existing — not introduced by this PR. On main (aab7438, before this branch) the same query already produces the *:
from bar | derive { x2 = this.bar }
→ SELECT *, * FROM bar
The PR documents it in test_column_shadows_relation_name_forward_reference rather than changing it, and it's orthogonal to what this PR actually fixes (a shadowing derived column like derive { bar = ... } previously broke resolution of all its sibling columns).
Why this.bar resolves to the relation: from bar binds the relation itself to the name bar in the current (this) namespace. this.bar looks bar up there, finds the relation, and expands it to *. A name that doesn't collide with the relation — this.baz — finds nothing under that name and is inferred as a column, giving baz AS x2. So the asymmetry you're seeing is the bare leaf colliding with the relation's own name in this.
To reach a column named bar today, the qualified form does the right thing:
from bar | derive { x2 = bar.bar }
→ SELECT *, bar AS x2 FROM bar
Your broader question — should this.X only ever resolve within the relation's columns and never to the relation name itself — is a real design question, but a wider change than this PR's scope. That call is the maintainers'; flagging that the answer to "am I missing established functionality?" is: yes, from <rel> putting <rel> into the this namespace is the current established behavior, and <rel>.<col> is the qualified escape hatch.
There was a problem hiding this comment.
There's no sense in adding a test to the test suite that encodes bad behavior -- that's why this question is relevant against this PR. I'm fine with this test if the behavior is intended but would suggest dropping the test if we'd like to fix this in the future.
That said, what design intent is met by adding the name of the current relation to the this namespace? Is there a reason why I would want to refer to this.bar and have that reflect a relation rather than a column?
There was a problem hiding this comment.
Good question — I traced this through the resolver and verified each result against the compiler on this branch.
Short version: this isn't a flat bag of columns. Each input relation is registered in this as a named sub-namespace, keyed by the relation's name. That registration is exactly what makes qualified access work — bar.x, and after a join the disambiguation between bar.x and foo.y (you can even write this.bar.x). So bar lives in this for the same reason foo does after a join: it's an input. The bare leaf this.bar then collides with that input name, and an input resolves to its wildcard; bar.bar is the escape hatch to the column. So it isn't a parent-namespace leak — the relation is a first-class member of this, colliding with a column of the same name inside that one namespace.
On "encoding bad behavior": this PR actually moves toward the resolution you're describing. When a derived column realizes a name that collides with the relation, the column now wins:
from bar | derive { bar = this.a } | select { this.x, this.bar }
→ SELECT x, a AS bar FROM bar # this.bar resolves to the column, not *
The residual that test_column_shadows_relation_name_forward_reference documents is narrower than the general case — it's a forward reference within the same tuple, where this.bar is read before the shadowing column is added:
from bar | derive { x2 = this.bar, bar = this.a }
→ SELECT *, *, a AS bar FROM bar # x2 = this.bar still hits the relation
Whether that narrow residual is "bad behavior to drop the test for" or "intentional, with bar.bar as the documented escape hatch" is genuinely a maintainer design call, so I won't adjudicate it. The two readings are clear though: if the intent is that leaf this.<rel> should always prefer a column (your broader proposal), then the test does encode behavior you'd want to change, and dropping it or marking it a known-limitation TODO would signal that; if the wildcard expansion is accepted as-is, keeping the test as documentation of the edge is reasonable.
Mechanism + evidence
- Input relations are inserted into
thisby name when a pipeline step resolves:resolver/functions.rs:290(insert_frame(frame, NS_THIS)). - Leaf-name precedence is in
module.rslookup: a column nested underNS_SHADOWING_COLwins; otherwise the bare relation/input name is returned and expands to*. The shadowing column is nested rather than overwriting the input atmodule.rs:487-502. - Every SQL block above was produced by
prqlc compileon this branch (b992209), not hand-derived.
There was a problem hiding this comment.
@kgutwin Thank you for your review :) The test was mostly added to flag existing behavior that I'll try to address later. If you want me to, I can delete the test, or put the correct expected output in it and skip it with an associated issue number ?
Regarding your question:
That said, what design intent is met by adding the name of the current relation to the this namespace? Is there a reason why I would want to refer to this.bar and have that reflect a relation rather than a column?
I believe this is a side effect of a bare this meaning * to have the count this resolve to COUNT(*): https://prql-lang.org/book/reference/syntax/keywords.html#this--that
I grepped through teh docs a little and this.bar referring to a "everything in the bar relation" is not documented so I'd be in favor of removing that behavior. We need to keep supporting count this though since it's in the docs.
Also, I'm not sure what a bare this should mean in other contetxts ? for example what should the behavior be for from bar | select this ? Could be SELECT * FROM bar, could be an error where we suggest using select this.* or explicit column names instead.
I'm happy to help on that but the design is not my decision to make, so I think the work on it should happen in another PR.
Please let me know what you think the behavior should be. I'd be happy over input from other maintainers as well 😄
A derived/selected column whose name matched the source relation (e.g.
from sales.sales | derive {sales = ...}) clobbered the relation's input namespace, which holds the inference template and redirect used to resolve every other column — so all sibling columns failed with "Unknown name".Nest the shadowing column under a reserved
NS_SHADOWING_COLkey inside the input namespace instead of overwriting it. Leaf access (this.sales) resolves to the shadowing column with a clean emitted name, while qualified access (sales.col) and column inference still go through the preserved relation namespace. Also fixdeduplicate_select_items, which conflated a table qualifier with a same-named column alias and dropped the aliased column.