Replies: 1 comment
-
|
the first example you supplied would be exceedingly difficult to implement. neither spark's withColumns nor polars' with_columns allow you to create reported then use it to calculate reported_sev in the same invocation. new users occasionally complain, but semi-old heads don't mind it because it's the same way in sql (i.e. if you define reported in one select statement, you have to use a new select statement to reference it). i say semi-old heads because real old heads know that SAS did support it, and i know people who resisted to move off of SAS for years because of it. i just don't see why we need to tackle it for our little niche package when much more general purpose packages haven't. i also question how useful cl.col would be in practice. triangle manipulation comes in two buckets. simple manipulation (case reserve + payment = reported) or some gnarly origin or development level adjustment like trend. basically, we are never do anything too funky at the triangle level. anything we do at the origin/development level should have a dedicated function (like trend) to handle, rather than asking user to wrangle on their own through cl.col(). (reading this back, i realize i didn't make it clear that i'm automatically assuming that we should never let cl.col to pull from the index. pd['index_name'] doesn't work today and pd.col('index_name') probably doesn't work either. so cl.col('company') also shouldn't work. the analogue would only be cl.col('paid') or cl.col('incurred').) col (and its complement lit) exists as a necessity for spark and polars because these dataframes have to deal with strings as values. so having col and lit makes the syntax unambiguous. otherwise filter('status' == 'closed') becomes impossible to parse. so we introduce col and lit into the library and then write filter(col('status') == lit('closed')) instead. we don't have strings as values in chainladder. we have strings as index and column names only. so we can entirely avoid needing col and lit. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Pandas 3.0 comes with
pd.col()which allows you to replace lambda functions, form expressions, and do method chaining. Other DataFrame libraries such as pyspark and Polars already have this.If Chainladder were to adopt this we could do something like
If we borrow from Polars you'd be able to use filtering syntax such as:
I'm sure there are more creative examples of
cl.colthat you can think of. It's expected that existing Pandas functions will accept column syntax, so we should keep an eye out for how popular usage will change and incorporate it if people are expecting it.Beta Was this translation helpful? Give feedback.
All reactions