cl.col() and expression syntax #779

genedan · 2026-05-14T02:12:07Z

genedan
May 14, 2026
Maintainer

Pandas 3.0 comes with pd.col(), which lets you replace lambda functions, form expressions, and chain methods. Other DataFrame libraries such as PySpark and Polars already have an equivalent.

If Chainladder were to adopt this, we could do something like:

triangle.assign(
    reported=cl.col('paid') + cl.col('case'),
    reported_sev=cl.col('reported') / cl.col('reported_claim_count'),
)
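
To make the idea concrete, here is a minimal sketch of how a deferred column expression could work. It is not chainladder or pandas code: the `Expr` class, the `col` helper, and the dict of lists standing in for a triangle are all hypothetical, chosen only to show that `col('paid') + col('case')` can build a recipe that is evaluated later against the data.

```python
import operator


class Expr:
    """Deferred computation over named columns (hypothetical, not a real API)."""

    def __init__(self, fn):
        # fn maps a dict of {column name: list of values} to a list of values
        self._fn = fn

    def evaluate(self, columns):
        return self._fn(columns)

    def _binary(self, other, op):
        def fn(columns):
            left = self.evaluate(columns)
            # Broadcast plain scalars so that col('paid') / 2 also works
            if isinstance(other, Expr):
                right = other.evaluate(columns)
            else:
                right = [other] * len(left)
            return [op(a, b) for a, b in zip(left, right)]

        return Expr(fn)

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __truediv__(self, other):
        return self._binary(other, operator.truediv)


def col(name):
    """Build an expression that looks up `name` only when evaluated."""
    return Expr(lambda columns: columns[name])


# Stand-in data: a dict of lists, not a real Triangle
data = {"paid": [100.0, 250.0], "case": [50.0, 30.0]}

reported = col("paid") + col("case")  # nothing is computed yet
print(reported.evaluate(data))  # [150.0, 280.0]
```

The expression object holds no data of its own, which is what would let a method like `assign` decide when and against which columns to evaluate it.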

If we borrow from Polars, you'd be able to use filtering syntax such as:

triangle.filter(cl.col("company") == 'Equity Funding Corporation of America')

I'm sure there are more creative examples of cl.col that you can think of. It's expected that existing Pandas functions will accept column syntax, so we should keep an eye on how popular usage changes and incorporate it if people come to expect it.

henrydingliu · 2026-05-16T17:22:28Z

henrydingliu
May 16, 2026
Maintainer

the first example you supplied would be exceedingly difficult to implement. neither spark's withColumns nor polars' with_columns allows you to create reported and then use it to calculate reported_sev in the same invocation. new users occasionally complain, but semi-old heads don't mind because sql works the same way (i.e. if you define reported in one select statement, you have to use a new select statement to reference it). i say semi-old heads because real old heads know that SAS did support it, and i know people who resisted moving off SAS for years because of it. i just don't see why we need to tackle it for our little niche package when much more general-purpose packages haven't.
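
a rough sketch of the two semantics being contrasted here, using plain dicts of lists and lambdas as stand-ins. `assign_simultaneous` and `assign_sequential` are hypothetical helpers written for illustration; they are not spark, polars, or chainladder functions.

```python
def assign_simultaneous(columns, **exprs):
    """Every expression sees only the input columns."""
    new = {name: fn(columns) for name, fn in exprs.items()}
    return {**columns, **new}


def assign_sequential(columns, **exprs):
    """Each expression also sees the columns created before it."""
    result = dict(columns)  # copy so the input is left untouched
    for name, fn in exprs.items():
        result[name] = fn(result)
    return result


data = {
    "paid": [100.0, 250.0],
    "case": [50.0, 30.0],
    "reported_claim_count": [3, 4],
}

exprs = {
    "reported": lambda c: [p + k for p, k in zip(c["paid"], c["case"])],
    # Depends on "reported", which is defined in the same call
    "reported_sev": lambda c: [
        r / n for r, n in zip(c["reported"], c["reported_claim_count"])
    ],
}

seq = assign_sequential(data, **exprs)
print(seq["reported_sev"])  # [50.0, 70.0]

try:
    assign_simultaneous(data, **exprs)
except KeyError as err:
    # "reported" does not exist in the input columns
    print("simultaneous evaluation cannot see column", err)
```

the sequential version only works because it commits to an evaluation order and re-reads the intermediate result after every step.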

i also question how useful cl.col would be in practice. triangle manipulation comes in two buckets: simple manipulation (case reserve + payment = reported) or some gnarly origin- or development-level adjustment like trend. basically, we never do anything too funky at the triangle level. anything we do at the origin/development level should have a dedicated function (like trend) to handle it, rather than asking users to wrangle it on their own through cl.col().

(reading this back, i realize i didn't make it clear that i'm assuming we should never let cl.col pull from the index. df['index_name'] doesn't work today and pd.col('index_name') probably doesn't work either, so cl.col('company') also shouldn't work. the analogue would only be cl.col('paid') or cl.col('incurred').)

col (and its complement lit) exist out of necessity in spark and polars because these dataframes have to deal with strings as values, and having col and lit makes the syntax unambiguous. otherwise filter('status' == 'closed') becomes impossible to parse. so we introduce col and lit into the library and then write filter(col('status') == lit('closed')) instead. we don't have strings as values in chainladder. we have strings as index and column names only. so we can entirely avoid needing col and lit.
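
a small sketch of that ambiguity in plain python. `Col`, `Lit`, and `filter_rows` are hypothetical stand-ins written for this example (a list of dicts plays the role of the dataframe); they are not the spark or polars implementations.

```python
class Lit:
    """Marks a string as a literal value rather than a column name."""

    def __init__(self, value):
        self.value = value


class Col:
    """Marks a string as a column name; == builds a predicate instead of a bool."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        value = other.value if isinstance(other, Lit) else other
        # Return a function to be applied to each row later
        return lambda row: row[self.name] == value


def filter_rows(rows, predicate):
    return [row for row in rows if predicate(row)]


rows = [
    {"status": "closed", "paid": 100.0},
    {"status": "open", "paid": 40.0},
]

# Two bare strings are compared immediately by Python; no column is consulted.
print("status" == "closed")  # False

predicate = Col("status") == Lit("closed")
print(filter_rows(rows, predicate))  # [{'status': 'closed', 'paid': 100.0}]
```

without the wrapper, the filter would receive a plain `False` and have no way to know a column was intended.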
