Closed
15 changes: 15 additions & 0 deletions .github/workflows/build.yml
@@ -1388,6 +1388,21 @@ jobs:
if: always()
run: |
cat ./test/robot/reports/output.xml

- name: Upload Robot Reports Artifact
uses: actions/upload-artifact@v4.3.1
if: always() && github.repository == 'stackql/stackql-devel'
with:
name: stackql_darwin_amd64_robot_reports
path: test/robot/reports


- name: Upload Robot Tmp Artifact
uses: actions/upload-artifact@v4.3.1
if: always() && github.repository == 'stackql/stackql-devel'
with:
name: stackql_darwin_amd64_robot_tmp
path: test/robot/functional/tmp

- name: Run robot integration tests
if: env.AZURE_CLIENT_SECRET != '' && startsWith(env.STATE_SOURCE_TAG, 'build-release')
1 change: 1 addition & 0 deletions .vscode/launch.json
@@ -198,6 +198,7 @@
"select lhs.proj, lhs.bucket from (select 'testing-project' as proj, 'silly-bucket' as bucket) lhs LEFT OUTER join (select name from google.storage.buckets where project = 'testing-project') rhs on lhs.bucket = rhs.name where rhs.name;",
"insert into google.storage.buckets( project, data__name) select lhs.proj, lhs.bucket from (select 'testing-project' as proj, 'silly-bucket' as bucket) lhs LEFT OUTER join (select name from google.storage.buckets where project = 'testing-project') rhs on lhs.bucket = rhs.name where rhs.name is null returning *;",
"select description, price_monthly, price_hourly from digitalocean.sizes.sizes where price_monthly = 7.0 order by description desc;",
"create or replace view vw_repos_name as select name from stackql_repositories; create or replace view vw_repos_url as select name, url from stackql_repositories; select v1.name from vw_repos_name v1 inner join vw_repos_url v2 on v1.name = v2.name;",
],
"default": "show providers;"
},
46 changes: 46 additions & 0 deletions docs/data_flow.md
@@ -0,0 +1,46 @@

# Data flow analysis in stackql

Data flow analysis is implemented as multiple passes on:

- An initial abstract syntax tree (AST) from the parser.
- Annotated derivatives of the AST.
- `any-sdk` `{ provider, service, resource, method, schema... }` graphs.
- `gonum` DAG adaptations with data flow dependencies representing edges.
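
The last bullet can be pictured as a dependency graph in which each node is a data acquisition step and each edge means "this step consumes output from that one". Below is a minimal sketch of ordering such a graph using only the standard library. It does not use `gonum`, and the `topoOrder` helper and the node names are hypothetical, not taken from the stackql code base.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// topoOrder returns nodes so that every dependency precedes its dependents.
// deps maps a node to the nodes whose output it consumes (hypothetical names).
func topoOrder(deps map[string][]string) ([]string, error) {
	inDegree := map[string]int{}
	dependents := map[string][]string{}
	for node, ds := range deps {
		if _, ok := inDegree[node]; !ok {
			inDegree[node] = 0
		}
		for _, d := range ds {
			if _, ok := inDegree[d]; !ok {
				inDegree[d] = 0
			}
			inDegree[node]++
			dependents[d] = append(dependents[d], node)
		}
	}
	// Seed the queue with nodes that depend on nothing.
	var ready []string
	for n, deg := range inDegree {
		if deg == 0 {
			ready = append(ready, n)
		}
	}
	sort.Strings(ready) // deterministic output despite map iteration order
	var order []string
	for len(ready) > 0 {
		n := ready[0]
		ready = ready[1:]
		order = append(order, n)
		next := dependents[n]
		sort.Strings(next)
		for _, m := range next {
			inDegree[m]--
			if inDegree[m] == 0 {
				ready = append(ready, m)
			}
		}
	}
	if len(order) != len(inDegree) {
		return nil, errors.New("cycle detected: data flow graph is not a DAG")
	}
	return order, nil
}

func main() {
	deps := map[string][]string{
		"instances": {"projects"},
		"disks":     {"projects", "instances"},
		"projects":  nil,
	}
	order, err := topoOrder(deps)
	fmt.Println(order, err)
}
```

A cycle leaves some nodes with a non-zero in-degree, which is why the length comparison at the end is enough to detect it.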

Some other aspects of data flow analysis:

- Relational algebra is implemented in a coupled RDBMS (embedded `sqlite` or `postgres` over TCP). There is a query rewriting process to stringify "containers" for this.
- There are `transaction control counter` objects and corresponding RDBMS columns to bound relational algebra "containers" and future-proof for garbage collection. Some mutex protection is in place.
- Views in `stackql` permit clobbering of where clause arguments from outside the view. The canonical case is a document-based view in a provider document. A good example is in [test/registry/src/aws/v0.1.0/services/pseudo_s3.yaml](/test/registry/src/aws/v0.1.0/services/pseudo_s3.yaml) at `...s3_bucket_list_and_detail.config.views.select`; one can overwrite `region` here.
- Views, subqueries, materialized views and user space tables are modelled as "indirections".
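
The "clobbering" behaviour in the views bullet can be read as a merge in which `WHERE` arguments supplied by the caller take precedence over the values baked into the view. A rough sketch follows, assuming parameters are plain string key/value pairs. The real parameter types are richer, and `mergeViewParams` is a hypothetical helper, not a stackql function.

```go
package main

import "fmt"

// mergeViewParams returns the effective parameters for a view:
// values from the outer query override the view's own values.
func mergeViewParams(viewDefaults, outer map[string]string) map[string]string {
	effective := make(map[string]string, len(viewDefaults)+len(outer))
	for k, v := range viewDefaults {
		effective[k] = v
	}
	for k, v := range outer {
		effective[k] = v // the outer WHERE argument clobbers the view's value
	}
	return effective
}

func main() {
	viewDefaults := map[string]string{"region": "ap-southeast-2"}
	outer := map[string]string{"region": "us-east-1"}
	fmt.Println(mergeViewParams(viewDefaults, outer)["region"]) // us-east-1
}
```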


## Open Issues

## Indirection Data Flow Analysis and Query Execution

Data flow analysis for indirections is not composable:

- It is impossible to join heterogeneous collections of these with each other or with conventional resources. There is no recursive and stable data flow analysis.
- While `stackql` does have a `max depth` parameter, I do not believe it is stably enforced eagerly; i.e. queries that are too complex should fail at analysis time. Cannot remember the param name or default.

The expected fix for this issue:

- Joins, unions etc on indirections work to arbitrary and configurable depth. For depth violations, failure is eager in the analysis phase and error message is plain and in the canonical err stream already widely used.
- Data flow analysis includes assurance on required params and viability of projections, joins, etc.
- Support for CTEs internal to these indirections is in place.
- Mocked robot tests are added to the canonical test suite, covering off this function.
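
The "eager failure" requirement above can be sketched as a depth check that runs before each descent, so an over-deep query is rejected during analysis and nothing is executed. This is an illustration only: the `indirection` type and `checkDepth` function are hypothetical stand-ins, and the error text is abbreviated from the message used elsewhere in this change.

```go
package main

import "fmt"

// indirection is a hypothetical stand-in for a view, subquery, etc.
type indirection struct {
	name     string
	children []*indirection
}

// checkDepth walks the indirection tree and fails as soon as a node
// sits deeper than maxDepth, before any of its children are visited.
func checkDepth(node *indirection, depth, maxDepth int) error {
	if depth > maxDepth {
		return fmt.Errorf(
			"query error: indirection chain length %d > %d and is therefore disallowed",
			depth, maxDepth,
		)
	}
	for _, child := range node.children {
		if err := checkDepth(child, depth+1, maxDepth); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	inner := &indirection{name: "v3"}
	mid := &indirection{name: "v2", children: []*indirection{inner}}
	outer := &indirection{name: "v1", children: []*indirection{mid}}
	fmt.Println(checkDepth(outer, 1, 2)) // v3 sits at depth 3, so this reports an error
	fmt.Println(checkDepth(outer, 1, 3)) // <nil>
}
```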


## Glossary of terms

| Term | Expansion |
|---|---|
| AST | Abstract Syntax Tree |
| CTE | Common Table Expression |
| DAG | Directed Acyclic Graph |
| GC | Garbage Collection |
| RDBMS | Relational Database Management System |
| TCP | Transmission Control Protocol |
| | |
54 changes: 50 additions & 4 deletions docs/views.md
@@ -2,7 +2,7 @@

# Views

## *a priori*
## *a priori*

At definition time, it is apparent:

@@ -24,22 +24,68 @@ The runtime representation of views must support:
- StackQL views DDL stored in some special stackql table designated for this purpose.
- Physical table name such as `__iql__.views`.
- Views need not exist until the `SELECT ... FROM <view>` portion of the query is executed.
This is advantagesous on RDBMS systems where view creation will fail if physical tables do not exist.
This is advantageous on RDBMS systems where view creation will fail if physical tables do not exist.
- We may need a layer of indirection for views to execute, wrt table names containing generation ID.
Simplest option is input table name.
- SQL view definitions (translated to physical tables) are stored in the RDBMS.
- This implies that even quite early in analysis, it must be known that a view is being referenced.
- Some part of the namespace must be reserved for these views; configurable using existing regex / template namespacing?
- Quite possibly some specialised object(s) or extension of the `table` interface stages are used for view analysis and parameter routing.
- Once analysis is complete:
- Acquistion occurs as normal through primitive DAG.
- Acquisition occurs as normal through primitive DAG.
- Selection phase uses physical views.

## Materialized views

Materialized views are similar in nature to views, although they are eagerly executed and their internal `WHERE` clauses cannot be mutated from outside.

## User space tables

These map to RDBMS tables. The DDL is somewhat impaired; we imagine these are useful for staging in general and applications across: ELT, IAC.


## Subqueries

Some aspects of subquery analysis and execution will be similar to views, but not all. What are the considerations for view implementation in the short term such that subsequent subquery implmentation is expedited and natural.
Some aspects of subquery analysis and execution will be similar to views, but not all. What are the considerations for view implementation in the short term such that subsequent subquery implementation is expedited and natural?

To be continued...


## Joins and aliasing on Views etc

### Views (lazy evaluated)

Views are rendered as inline subqueries `( SELECT ... ) AS "alias"` in the final SQL. When a user alias is provided (e.g. `FROM my_view v1`), the alias `v1` replaces the view name in the `AS` clause.
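
As a sketch of that rendering rule (the helper name `renderViewFrom` is hypothetical; the actual rewrite happens in the `FROM` rewrite visitor shown later in this change):

```go
package main

import "fmt"

// renderViewFrom wraps a view's SELECT body as an inline subquery.
// The user alias wins when present; otherwise the view name is used.
func renderViewFrom(viewName, userAlias, selectBody string) string {
	alias := viewName
	if userAlias != "" {
		alias = userAlias
	}
	return fmt.Sprintf(`( %s ) AS "%s"`, selectBody, alias)
}

func main() {
	body := "SELECT name FROM stackql_repositories"
	fmt.Println(renderViewFrom("vw_repos_name", "v1", body))
	// ( SELECT name FROM stackql_repositories ) AS "v1"
	fmt.Println(renderViewFrom("vw_repos_name", "", body))
	// ( SELECT name FROM stackql_repositories ) AS "vw_repos_name"
}
```

Emitting the alias exactly once matters: if the generic alias handling appended `as v1` after the inline subquery as well, the resulting SQL would carry two aliases for the same expression.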

**Supported:**
- View aliased and selected from: `SELECT * FROM my_view v1`.
- View JOIN view: `SELECT ... FROM v1 INNER JOIN v2 ON ...`.
- View JOIN provider table: `SELECT ... FROM my_view v1 INNER JOIN provider.svc.resource r ON ...`.
- View JOIN subquery: `SELECT ... FROM my_view v1 INNER JOIN (SELECT ...) sq ON ...`.
- View JOIN materialized view: `SELECT ... FROM my_view v1 INNER JOIN mv ON ...`.
- Nested views (view wrapping a view): supported up to configurable depth (`--indirect-depth-max`, default 5).
- WHERE clause parameter clobbering from outside the view, using **unqualified** parameters (e.g. `WHERE region = 'us-east-1'`).

**Not supported:**
- Table-qualified parameter clobbering into views (e.g. `WHERE v1.region = 'us-east-1'` will not override the view's internal `region` parameter).
- Joins of three or more heterogeneous indirections (e.g. `view JOIN subquery JOIN provider_table`). Binary joins work; three-way and beyond fail with parameter count mismatches in the SQL composition layer.
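
The split between unqualified and table-qualified parameters can be illustrated with a small filter: only parameters without a table alias are handed down into the view. The `paramKey` type and `unqualifiedOnly` function are hypothetical simplifications of the parameter map used in the analysis code.

```go
package main

import "fmt"

// paramKey is a hypothetical stand-in for a WHERE parameter key:
// Alias is the table qualifier ("" when unqualified), Name is the column.
type paramKey struct {
	Alias string
	Name  string
}

// unqualifiedOnly keeps the parameters that carry no table alias,
// since an outer alias cannot be resolved inside the view.
func unqualifiedOnly(params map[paramKey]string) map[paramKey]string {
	filtered := make(map[paramKey]string)
	for k, v := range params {
		if k.Alias == "" {
			filtered[k] = v
		}
	}
	return filtered
}

func main() {
	params := map[paramKey]string{
		{Name: "region"}:            "us-east-1", // WHERE region = 'us-east-1'
		{Alias: "r", Name: "org"}:   "stackql",   // WHERE r.org = 'stackql'
	}
	fmt.Println(len(unqualifiedOnly(params))) // 1
}
```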

### Materialized views (eager evaluated)

Materialized views are persisted as physical tables in the RDBMS. They are referenced by their table name directly (not as inline subqueries).

**Supported:**
- Materialized view aliased and selected from.
- Materialized view joined with provider tables, user space tables, views and subqueries.
- `CREATE`, `DROP`, `REFRESH`, `CREATE OR REPLACE` lifecycle.

**Not supported:**
- WHERE clause parameter clobbering from outside (materialized views are snapshot-based).

### Subqueries

Subqueries appear as inline `( SELECT ... )` expressions. CTEs (`WITH ... AS`) are converted to subqueries at AST level and handled identically.

### User space tables

User space tables are RDBMS-resident tables created via `CREATE TABLE`. They can participate in joins with any other indirection type.
5 changes: 3 additions & 2 deletions internal/stackql/acid/tsm_physio/best_effort_orchestrator.go
@@ -2,8 +2,8 @@ package tsm_physio //nolint:revive,stylecheck // prefer this nomenclature

import (
"fmt"
"strings"

"github.com/stackql/stackql-parser/go/vt/sqlparser"
"github.com/stackql/stackql/internal/stackql/acid/binlog"
"github.com/stackql/stackql/internal/stackql/acid/tsm"
"github.com/stackql/stackql/internal/stackql/acid/txn_context"
@@ -42,7 +42,8 @@ func (orc *bestEffortOrchestrator) processQueryOrQueries(
) ([]internaldto.ExecutorOutput, bool) {
var retVal []internaldto.ExecutorOutput
cmdString := handlerCtx.GetRawQuery()
for _, s := range strings.Split(cmdString, ";") {
splitQueries, _ := sqlparser.SplitStatementToPieces(cmdString)
for _, s := range splitQueries {
response, hasResponse := orc.processQuery(handlerCtx, s)
if hasResponse {
retVal = append(retVal, response...)
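
For context on why `strings.Split(cmdString, ";")` is replaced here: a plain split also cuts at semicolons inside string literals. The sketch below shows the difference with a deliberately simplified quote-aware splitter. `splitStatements` is a hypothetical helper written for this illustration; it is not the implementation of `sqlparser.SplitStatementToPieces`, which works on the parser's token stream.

```go
package main

import (
	"fmt"
	"strings"
)

// splitStatements splits on semicolons that sit outside single-quoted literals.
// Simplification: comments, double quotes and backslash escapes are not handled.
func splitStatements(input string) []string {
	var pieces []string
	var current strings.Builder
	inQuote := false
	for _, r := range input {
		switch {
		case r == '\'':
			inQuote = !inQuote
			current.WriteRune(r)
		case r == ';' && !inQuote:
			if stmt := strings.TrimSpace(current.String()); stmt != "" {
				pieces = append(pieces, stmt)
			}
			current.Reset()
		default:
			current.WriteRune(r)
		}
	}
	if tail := strings.TrimSpace(current.String()); tail != "" {
		pieces = append(pieces, tail)
	}
	return pieces
}

func main() {
	query := "select 'a;b' as col; show providers;"
	fmt.Println(len(strings.Split(query, ";"))) // 4: the literal is cut in half
	fmt.Println(len(splitStatements(query)))    // 2
}
```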
5 changes: 3 additions & 2 deletions internal/stackql/acid/tsm_physio/txn_orchestrator.go
@@ -2,9 +2,9 @@ package tsm_physio //nolint:stylecheck,revive // prefer this nomenclature

import (
"fmt"
"strings"

"github.com/stackql/any-sdk/pkg/constants"
"github.com/stackql/stackql-parser/go/vt/sqlparser"
"github.com/stackql/stackql/internal/stackql/acid/tsm"
"github.com/stackql/stackql/internal/stackql/acid/txn_context"
"github.com/stackql/stackql/internal/stackql/handler"
@@ -68,7 +68,8 @@ func (orc *standardOrchestrator) processQueryOrQueries(
) ([]internaldto.ExecutorOutput, bool) {
var retVal []internaldto.ExecutorOutput
cmdString := handlerCtx.GetRawQuery()
for _, s := range strings.Split(cmdString, ";") {
splitQueries, _ := sqlparser.SplitStatementToPieces(cmdString)
for _, s := range splitQueries {
response, hasResponse := orc.processQuery(handlerCtx, s)
if hasResponse {
retVal = append(retVal, response...)
22 changes: 19 additions & 3 deletions internal/stackql/astanalysis/earlyanalysis/ast_expand.go
@@ -4,7 +4,6 @@ import (
"fmt"
"strings"

"github.com/stackql/any-sdk/pkg/constants"
"github.com/stackql/any-sdk/pkg/logging"
"github.com/stackql/stackql/internal/stackql/astanalysis/annotatedast"
"github.com/stackql/stackql/internal/stackql/astindirect"
@@ -141,14 +140,31 @@ func (v *indirectExpandAstVisitor) processCTEReference(
}

func (v *indirectExpandAstVisitor) processIndirect(node sqlparser.SQLNode, indirect astindirect.Indirect) error {
// Eager depth check: fail before recursively analyzing an indirection that would exceed the limit.
if v.indirectionDepth+1 > v.handlerCtx.GetRuntimeContext().IndirectDepthMax {
return fmt.Errorf(
"query error: indirection chain length %d > %d and is therefore disallowed; please do not cite views at too deep a level", //nolint:lll
v.indirectionDepth+1,
v.handlerCtx.GetRuntimeContext().IndirectDepthMax,
)
}
err := indirect.Parse()
if err != nil {
return nil //nolint:nilerr //TODO: investigate
}
// Filter parent WHERE params to only pass down unqualified (alias-free) entries.
// Aliased params like "r.org" reference specific outer tables and must not
// leak into child indirection analysis, where the alias would be unresolvable.
filteredWhereParams := parserutil.NewParameterMap()
for k, val := range v.whereParams.GetMap() {
if k.Alias() == "" {
filteredWhereParams.Set(k, val) //nolint:errcheck // best effort
}
}
childAnalyzer, err := NewEarlyScreenerAnalyzer(
v.primitiveGenerator,
v.annotatedAST,
v.whereParams.Clone(),
filteredWhereParams,
v.indirectionDepth+1,
)
if err != nil {
@@ -178,7 +194,7 @@ func (v *indirectExpandAstVisitor) processIndirect(node sqlparser.SQLNode, indir
return fmt.Errorf(
"query error: indirection chain length %d > %d and is therefore disallowed; please do not cite views at too deep a level", //nolint:lll
maxIndirectCount,
constants.LimitsIndirectMaxChainLength,
v.handlerCtx.GetRuntimeContext().IndirectDepthMax,
)
}
indirectPrimitiveGenerator.GetPrimitiveComposer().GetAst()
13 changes: 11 additions & 2 deletions internal/stackql/astvisit/from_rewrite.go
@@ -650,6 +650,7 @@ func (v *standardFromRewriteAstVisitor) Visit(node sqlparser.SQLNode) error {

case *sqlparser.AliasedTableExpr:
var exprStr, partitionStr string
aliasHandledByIndirect := false
if node.Expr != nil {
anCtx, ok := v.annotations[node]
if !ok {
@@ -664,9 +665,17 @@ func (v *standardFromRewriteAstVisitor) Visit(node sqlparser.SQLNode) error {
indirectType := indirect.GetType()
switch indirectType {
case astindirect.ViewType:
templateString := fmt.Sprintf(` ( %%s ) AS "%s" `, name)
// Use the user-specified alias if present, otherwise the view name.
// The alias is embedded in the template to prevent double aliasing
// when the node.As fallthrough at the end of this case would append it again.
viewAlias := name
if !node.As.IsEmpty() {
viewAlias = node.As.GetRawVal()
}
templateString := fmt.Sprintf(` ( %%s ) AS "%s" `, viewAlias)
v.rewrittenQuery = templateString
v.indirectContexts = append(v.indirectContexts, indirect.GetSelectContext())
aliasHandledByIndirect = true
case astindirect.SubqueryType:
// Note: CTEs are converted to SubqueryType at AST level,
// so this path handles both regular subqueries and CTEs.
@@ -726,7 +735,7 @@ func (v *standardFromRewriteAstVisitor) Visit(node sqlparser.SQLNode) error {
partitionStr = v.GetRewrittenQuery()
}
q := fmt.Sprintf("%s%s", exprStr, partitionStr)
if !node.As.IsEmpty() {
if !node.As.IsEmpty() && !aliasHandledByIndirect {
node.As.Accept(v)
asStr := v.GetRewrittenQuery()
q = fmt.Sprintf("%s as %v", q, asStr)
34 changes: 21 additions & 13 deletions internal/stackql/cmd/shell.go
@@ -25,6 +25,7 @@ import (

"github.com/stackql/any-sdk/pkg/dto"
"github.com/stackql/any-sdk/pkg/logging"
"github.com/stackql/stackql-parser/go/vt/sqlparser"
"github.com/stackql/stackql/internal/stackql/config"
"github.com/stackql/stackql/internal/stackql/driver"
"github.com/stackql/stackql/internal/stackql/entryutil"
@@ -225,20 +226,27 @@ var shellCmd = &cobra.Command{
if inlineCommentIdx > -1 {
line = line[:inlineCommentIdx]
}
semiColonIdx := strings.Index(line, ";")
if semiColonIdx > -1 {
line = strings.TrimSpace(line[:semiColonIdx+1])
subSemiColonIdx := strings.Index(line, ";")
sb.WriteString(" " + line[:subSemiColonIdx+1])
rawQuery := sb.String()
queryToExecute, qErr := entryutil.PreprocessInline(runtimeCtx, rawQuery)
if qErr != nil {
io.WriteString(outErrFile, "\r\n"+qErr.Error()+"\r\n") //nolint:errcheck // TODO: investigate
hasRHSSemiColon := strings.HasSuffix(strings.TrimSpace(line), ";")
splitQueries, _ := sqlparser.SplitStatementToPieces(line)
if len(splitQueries) > 0 {
for i, s := range splitQueries {
if i == len(splitQueries)-1 && !hasRHSSemiColon {
// Last piece has no trailing semicolon;
// accumulate for multi-line continuation.
sb.Reset()
sb.WriteString(s)
continue
}
sb.WriteString(" " + s)
rawQuery := sb.String()
queryToExecute, qErr := entryutil.PreprocessInline(runtimeCtx, rawQuery)
if qErr != nil {
io.WriteString(outErrFile, "\r\n"+qErr.Error()+"\r\n") //nolint:errcheck // TODO: investigate
}
l.WriteToHistory(rawQuery) //nolint:errcheck // TODO: investigate
sessionRunnerInstance.RunCommand(queryToExecute)
sb.Reset()
}
l.WriteToHistory(rawQuery) //nolint:errcheck // TODO: investigate
sessionRunnerInstance.RunCommand(queryToExecute)
sb.Reset()
sb.WriteString(line[subSemiColonIdx+1:])
} else {
sb.WriteString(" " + line)
}
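
The shell loop above combines two concerns: running every statement that is terminated on the current line, and carrying an unterminated tail over to the next line. A compact way to picture that is a small buffer type, sketched below under stated assumptions: `statementBuffer` and `Feed` are hypothetical names, and for brevity the sketch splits on every `;` and ignores quoting, which the real code delegates to `sqlparser.SplitStatementToPieces`.

```go
package main

import (
	"fmt"
	"strings"
)

// statementBuffer accumulates shell input until a statement is terminated.
type statementBuffer struct {
	pending strings.Builder
}

// Feed consumes one input line and returns the statements it completes.
// Simplification: it splits on every ';' and ignores quoting.
func (b *statementBuffer) Feed(line string) []string {
	var complete []string
	parts := strings.Split(line, ";")
	for i, part := range parts {
		if trimmed := strings.TrimSpace(part); trimmed != "" {
			if b.pending.Len() > 0 {
				b.pending.WriteString(" ")
			}
			b.pending.WriteString(trimmed)
		}
		if i == len(parts)-1 {
			break // no terminator after the last part; keep it for the next line
		}
		if b.pending.Len() > 0 {
			complete = append(complete, b.pending.String()+";")
		}
		b.pending.Reset()
	}
	return complete
}

func main() {
	var buf statementBuffer
	fmt.Println(buf.Feed("select name"))              // []
	fmt.Println(buf.Feed("from stackql_repositories")) // []
	fmt.Println(len(buf.Feed("where name = 'x'; show providers;"))) // 2
}
```

In this sketch an unterminated tail is appended to whatever is already pending rather than replacing it, so a statement typed across three or more lines is kept whole.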