feat(cimcheck): add CIM/CGMES SPARQL and SHACL validation toolchain #10
Open
spah-soptim wants to merge 59 commits into
…update validation
…te related classes
…L shape analyzer
rdf:type and other standard W3C terms appearing in SHACL sh:path expressions (e.g. sequence paths like (cim:Prop rdf:type)) were incorrectly flagged as UNKNOWN_PROPERTY because ShaclShapeAnalyzer lacked the exempt-namespace guard that AlgebraAnalysisVisitor already applies to SPARQL queries.
…es for embedded SPARQL
Two improvements to embedded SPARQL validation in SHACL shapes:
1. Prefix fallback: ShaclSparqlExtractor.resolvePrefixes() now seeds the prefix map from the graph's own @prefix declarations before applying sh:declare entries. This fixes SYNTAX_ERROR ("Unresolved prefixed name") for SHACL files that use sh:prefixes <X> without a corresponding sh:declare block on node X; the cim: prefix is still available from the Turtle file header.
2. $PATH substitution: EmbeddedSparql gains a shPaths field populated with the simple-URI sh:path values of enclosing sh:PropertyShape nodes. SparqlValidationApi replaces $PATH / ?PATH variable predicates with the concrete URI before analysis, eliminating the UNSUPPORTED_DYNAMIC_PROPERTY warning for the SHACL-standard $PATH pattern.
…prefix fallback
Reflects three improvements shipped in the preceding commits:
- sh:path validation: standard vocabulary terms (rdf:type, rdfs:*, owl:*, etc.) are now exempt from UNKNOWN_PROPERTY checks; updated the SHACL structural checks table and the existing known-limitation comment.
- Dynamic predicates: added an exception note to the "Dynamic predicates and classes" section explaining that SHACL $PATH variables are resolved via the enclosing sh:PropertyShape.sh:path before analysis. Updated the SHACL API description to mention $PATH substitution alongside $this typing.
- Prefix fallback: clarified in the Pass 2 description that graph-level Turtle @prefix declarations are used as a fallback when sh:prefixes target nodes carry no sh:declare blocks.
Intermediate variables in SHACL embedded constraints are transient bindings, not entities the author is expected to annotate with rdf:type. Filtering QUERY_IMPLIED_TYPE (INFO) from embedded results eliminates noise when validating ENTSO-E CGMES 3.0 SHACL files where every SELECT uses pattern variables alongside domain-having properties.
…ePath
ENTSO-E CGMES SHACL files use sh:alternativePath to list the same property under multiple CIM namespace URIs (e.g. cim16, CIM100, ucaiug.io) for cross-version compatibility. Checking each alternative independently flagged the aliases as UNKNOWN_PROPERTY even when one variant was valid for the loaded profiles.
An unknown alternative is now silently suppressed when at least one sibling in the same sh:alternativePath group is a known property with the same local name. Alternatives whose local names differ from every known sibling are still flagged, preserving detection of genuine typos.
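The suppression rule described in this commit can be sketched roughly as follows. This is an illustrative reconstruction, not the code from ShaclShapeAnalyzer: the class name, the method names, and the choice to split local names at the last '#' or '/' are all assumptions.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of alias suppression inside one sh:alternativePath group.
final class AlternativePathSketch {

    /** Returns the part of an IRI after the last '#' or '/' (assumed local-name rule). */
    static String localName(String iri) {
        int cut = Math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'));
        return iri.substring(cut + 1);
    }

    /**
     * Returns the alternatives that should still be reported as unknown:
     * those not in the schema AND not sharing a local name with a known sibling.
     */
    static List<String> unknownToReport(List<String> alternatives, Set<String> knownProperties) {
        // Local names of the siblings that the loaded schema does know.
        Set<String> knownLocalNames = alternatives.stream()
                .filter(knownProperties::contains)
                .map(AlternativePathSketch::localName)
                .collect(Collectors.toSet());
        return alternatives.stream()
                .filter(iri -> !knownProperties.contains(iri))
                .filter(iri -> !knownLocalNames.contains(localName(iri)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("http://iec.ch/TC57/CIM100#IdentifiedObject.name");
        List<String> group = List.of(
                "http://iec.ch/TC57/2013/CIM-schema-cim16#IdentifiedObject.name", // alias, suppressed
                "http://iec.ch/TC57/CIM100#IdentifiedObject.name",                // known
                "http://iec.ch/TC57/CIM100#IdentifiedObject.nmae");               // typo, still flagged
        System.out.println(unknownToReport(group, known));
    }
}
```

With this rule a misspelled alternative is still reported, because its local name matches no known sibling.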
…tVocabulary
Dead code removed:
- SparqlValidationCode.TERM_EXISTS_IN_OTHER_PROFILE (never emitted)
- GraphReference.Source.UPDATE_TEMPLATE (never used)
- SparqlQueryValidator.intersect() (orphaned private method)
Correctness fixes:
- SemanticChecks.anySubclassMatch: removed incorrect reverse-direction check that caused false-positive PATH_CHAIN_INCOMPATIBLE suppressions
- SparqlValidationApi: use String.replace() instead of replaceAll() for $PATH/?PATH substitution to avoid regex back-reference interpretation of URIs with '$'
- SparqlValidationApi: merge profileDeps/updateProfileDeps into single collectProfileDeps to eliminate duplicated logic
API additions:
- ShaclValidationResult.isValid(StrictnessLevel): overload for caller-controlled strictness filtering; zero-arg isValid() now delegates to it
Refactoring:
- Extract ExemptVocabulary shared class from duplicate EXEMPT_NAMESPACES lists in AlgebraAnalysisVisitor and ShaclShapeAnalyzer
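The replace() versus replaceAll() point can be illustrated with a small sketch. String.replaceAll() treats its first argument as a regular expression and reads a '$' followed by a digit in the replacement as a group reference, whereas String.replace() treats both arguments as plain text. The class and method names below are hypothetical, and the substitution is deliberately naive (for instance, it would also rewrite a longer variable name that merely starts with $PATH).

```java
// Hypothetical sketch of literal $PATH / ?PATH substitution; not the SparqlValidationApi code.
final class PathSubstitutionSketch {

    /** Replaces the SHACL $PATH / ?PATH placeholder with a concrete IRI term, literally. */
    static String substitutePath(String sparql, String pathIri) {
        String term = "<" + pathIri + ">";
        // String.replace performs plain-text replacement, so '$' in the
        // placeholder or inside the IRI carries no special meaning.
        return sparql.replace("$PATH", term).replace("?PATH", term);
    }

    public static void main(String[] args) {
        String query = "SELECT $this WHERE { $this $PATH ?value }";
        // An IRI containing "$1" would be misread as a back-reference by replaceAll.
        System.out.println(substitutePath(query, "http://example.org/ns$1#prop"));
    }
}
```

A regex-based version would need Pattern.quote() on the placeholder and Matcher.quoteReplacement() on the IRI to behave the same way.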
LSP fixes:
- SparqlWorkspaceService.didChangeWatchedFiles: guard against null params.getChanges() (LSP spec allows omitting the array; this caused NPE on some clients)
- SparqlTextDocumentService: remove pending.remove(uri) at start of validateSparql/validateShacl, which raced with newer scheduled tasks and removed their cancellation entry
- SparqlTextDocumentService.shutdown: use shutdownNow() so pending debounce tasks are not waited on during LSP server shutdown
- SparqlTextDocumentService.convertSparqlAnnotation: use tokenLengthInSource() for term-based paths instead of delegating to DiagnosticConverter (which uses full URI length + 2, wrong for prefixed-name tokens in source)
- SparqlTextDocumentService.turtleParseErrorDiagnostic: guard e.getMessage() null (some Jena parse exceptions have no message; rendered as "null" in UI)
- SchemaManager.shutdown: awaitTermination(2s) before shutdownNow to allow in-flight schema loads to complete gracefully
- DefinitionIndex.findSymbols: collect all matches then sort-and-cap at MAX_SYMBOLS so properties are not starved when many classes match the query
CLI fixes:
- SchemaLoader: pass FileVisitOption.FOLLOW_LINKS to Files.walk so symlinked schema directories are traversed correctly
- ValidateCommand: read stdin only once; cache in stdinText so '-' passed multiple times returns the same content rather than empty on the second read
VS Code:
- buildClient: pass context to allow traceOutputChannel to be registered in
subscriptions, preventing leak on extension deactivation
- Remove redundant { dispose: () => client?.stop() } subscription — deactivate()
already calls client.stop(), so this caused double-stop on VS Code shutdown
- JDWP debug address changed from 5005 to 127.0.0.1:5005 to bind only to
localhost and prevent remote debug access
CI:
- Rename "Setup Node.js 18" step labels to "Node.js 20" to match the actual
node-version: "20" that was already in use
- SchemaManager: isolate on-loaded callbacks in per-callback try/catch so a failing callback cannot clear a successfully-loaded schema
- SparqlValidationApi: remove getGraphDependencies(String, Collection<VersionIri>) overload that silently ignored its profiles parameter; remove the corresponding test that only documented the dead-parameter behaviour
- SparqlTextDocumentService: replace wildcard java.util.concurrent.* import with explicit imports (CompletableFuture, ConcurrentHashMap, Executors, ScheduledExecutorService, ScheduledFuture, TimeUnit)
- ShaclSparqlExtractor: fix stray brace left by iterator-close refactor
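The per-callback isolation mentioned for SchemaManager might look something like the sketch below. Everything here is assumed for illustration: the class name, the use of a String in place of the real schema index, and the Consumer-based callback type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: one failing on-loaded callback must not affect the schema or its siblings.
final class CallbackIsolationSketch {
    private final List<Consumer<String>> onLoaded = new ArrayList<>();
    private String schema; // stands in for the loaded schema index

    void addOnLoaded(Consumer<String> callback) {
        onLoaded.add(callback);
    }

    String schema() {
        return schema;
    }

    /** Stores the schema first, then notifies each callback in its own try/catch. */
    int publish(String loaded) {
        schema = loaded;
        int failures = 0;
        for (Consumer<String> callback : onLoaded) {
            try {
                callback.accept(loaded);
            } catch (RuntimeException e) {
                // Log and continue: the schema stays set and later callbacks still run.
                failures++;
                System.err.println("on-loaded callback failed: " + e.getMessage());
            }
        }
        return failures;
    }
}
```

Wrapping the whole loop in a single try/catch would stop at the first failure; catching inside the loop is what keeps the remaining callbacks running.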
Resolve three named-graph problems:
1. Relative graph names (<EQ>, <TP>) were resolved against the JVM working directory by Jena, producing file:// URIs that never matched any configured profile. SparqlQueryAnalyzer now passes a stable urn:x-cimcheck:base/ base to QueryFactory/UpdateFactory so relative refs always resolve predictably.
2. namedGraphs was dead config in the LSP: SchemaManager now builds a NamedGraphProfileScope from the config after each schema load and exposes it via namedGraphScope(). SparqlTextDocumentService uses the scope when present, falling back to AllProfilesScope (no GRAPH_NOT_CONFIGURED warnings) when not configured.
3. namedGraphs value type changed from string to array of strings (Map<String,String> -> Map<String,List<String>>) so a single graph can be mapped to multiple profiles. JSON schema updated accordingly.
Relative config keys (e.g. "EQ") are matched against the urn:x-cimcheck:base/ base so users can write the same short name in both the query and the config without needing full URIs.
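To make the key matching concrete, here is a rough sketch of how short config keys could be normalised against the stable base. It only approximates the behaviour with string handling; the actual resolution is done by Jena, and the class name, method names, and the "contains a colon means absolute" rule are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of matching relative namedGraphs keys against urn:x-cimcheck:base/.
final class NamedGraphKeySketch {
    static final String BASE = "urn:x-cimcheck:base/";

    /** Leaves names that already carry a scheme untouched; prefixes short names with BASE. */
    static String normalize(String graphName) {
        return graphName.contains(":") ? graphName : BASE + graphName;
    }

    /** Rewrites config keys so they compare equal to graph IRIs resolved from the query. */
    static Map<String, List<String>> normalizeKeys(Map<String, List<String>> namedGraphs) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        namedGraphs.forEach((key, profiles) -> out.put(normalize(key), List.copyOf(profiles)));
        return out;
    }

    public static void main(String[] args) {
        // Profile names on the right-hand side are placeholders, not values taken from the PR.
        Map<String, List<String>> config = Map.of("EQ", List.of("profile-a", "profile-b"));
        System.out.println(normalizeKeys(config).keySet());
    }
}
```

Under this scheme a query that says FROM NAMED <EQ> and a config key "EQ" end up as the same IRI.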
- Update namedGraphs examples from string values to array of strings
- Document short relative graph names ("EQ", "TP") matching FROM NAMED <EQ>
- Clarify default behaviour: no namedGraphs = all profiles, no warnings
- Add named graph scope summary table in core README
- Update VS Code README namedGraphs description and example
Common prefixes (rdf, rdfs, owl, xsd, sh, cim, md) are now automatically
prepended to any SPARQL query or update that does not already declare them,
so users no longer need to repeat PREFIX lines in every file.
Custom prefixes can be supplied via "prefixes" in .cgmes/validation.json;
an explicit object replaces the built-in set entirely, and {} disables all
injection. Line numbers in annotations and Jena error messages are adjusted
back to the original query coordinates after injection.
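A minimal sketch of the injection and line adjustment described above is shown here. It is not the project's implementation: the class, the regex used to detect an existing PREFIX declaration (which assumes one declaration per line), and the clamping of adjusted line numbers to 1 are all illustrative assumptions.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: prepend missing PREFIX lines and map line numbers back afterwards.
final class PrefixInjectionSketch {

    /** Result of injection: the rewritten text plus the number of header lines added. */
    static final class Injected {
        final String text;
        final int addedLines;

        Injected(String text, int addedLines) {
            this.text = text;
            this.addedLines = addedLines;
        }

        /** Converts a 1-based line in the injected text back to the original query. */
        int toOriginalLine(int injectedLine) {
            return Math.max(1, injectedLine - addedLines);
        }
    }

    static Injected inject(String query, Map<String, String> defaults) {
        StringBuilder header = new StringBuilder();
        int added = 0;
        for (Map.Entry<String, String> entry : defaults.entrySet()) {
            // Skip prefixes the query already declares at the start of a line.
            Pattern declared = Pattern.compile(
                    "(?im)^\\s*PREFIX\\s+" + Pattern.quote(entry.getKey()) + ":");
            if (!declared.matcher(query).find()) {
                header.append("PREFIX ").append(entry.getKey())
                      .append(": <").append(entry.getValue()).append(">\n");
                added++;
            }
        }
        return new Injected(header + query, added);
    }
}
```

An empty defaults map adds nothing, which mirrors the documented effect of setting "prefixes" to {}.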
Instead of hardcoding cim: -> http://iec.ch/TC57/CIM100# (CGMES 3.0 only), scan all class and property IRIs in the loaded schema index, tally their namespace prefixes, and use the dominant one as the cim: default prefix. A namespace wins when it has >= 10 terms and > 2x as many as the next candidate — so mixed-version schemas (2.4.15 + 3.0 loaded together) produce no winner and leave cim: to the user's explicit prefixes config. The detection runs in DefaultPrefixes.withDetectedCimPrefix() and is called from the single-arg SparqlValidationApi constructor, SchemaManager, and ValidateCommand whenever the user has not supplied an explicit prefixes map.
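The winner rule (at least 10 terms and more than twice the runner-up) can be sketched as below. This is an assumption-laden illustration rather than DefaultPrefixes.withDetectedCimPrefix() itself: the class name, the namespace split at the last '#' or '/', and the absence of any filtering of non-CIM vocabularies are simplifications.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical sketch of dominant-namespace detection for the cim: default prefix.
final class CimNamespaceDetectionSketch {

    /** Returns the IRI up to and including the last '#' or '/' (assumed namespace rule). */
    static String namespaceOf(String iri) {
        int cut = Math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'));
        return iri.substring(0, cut + 1);
    }

    /** Picks a namespace only if it has >= 10 terms and > 2x the next candidate. */
    static Optional<String> dominantNamespace(Collection<String> termIris) {
        Map<String, Long> tally = termIris.stream()
                .collect(Collectors.groupingBy(
                        CimNamespaceDetectionSketch::namespaceOf, Collectors.counting()));
        List<Map.Entry<String, Long>> ranked = tally.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toList());
        if (ranked.isEmpty()) {
            return Optional.empty();
        }
        long best = ranked.get(0).getValue();
        long runnerUp = ranked.size() > 1 ? ranked.get(1).getValue() : 0;
        boolean wins = best >= 10 && best > 2 * runnerUp;
        return wins ? Optional.of(ranked.get(0).getKey()) : Optional.empty();
    }
}
```

When two CIM versions are loaded with comparable term counts, neither passes the 2x test, so the sketch returns an empty Optional and the choice is left to explicit configuration.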
Summary
This PR introduces CIMCheck, a static analysis toolchain for SPARQL queries and SHACL shapes written against CIM/CGMES schemas. It ships as three Maven modules plus a VS Code extension:
- cimcheck/core: validation engine (SPARQL algebra visitor, semantic domain/range checks, SHACL shape analyzer, RDFS schema index, and a high-level SparqlValidationApi façade)
- cimcheck/lsp: Language Server Protocol server with debounced diagnostics, hover documentation, go-to-definition, and CIM term completion
- cimcheck/cli: batch validation CLI (cimcheck) with text and JSON output, configurable strictness, and auto-discovery of .cgmes/validation.json
- cimcheck/vscode: VS Code extension wrapping the LSP; ships the fat JAR or accepts an explicit serverJar setting
Additionally:
- cimxml (rdfs:Literal blank-node style)
What gets validated
sh:targetClass, sh:class, sh:path existence; sh:nodeKind/range compatibility; sh:datatype/sh:class vs range; sh:minCount/sh:maxCount contradiction; all embedded sh:select/sh:ask/sh:construct SPARQL fragments
Strictness is configurable (permissive/default/strict/pedantic) per file via .cgmes/validation.json or via a CLI flag.
Notable design decisions
- validateSparql() tries query parse first, falls back to update, then attempts ;-separated multi-query splitting; no caller configuration is required.
- $PATH substitution: embedded SPARQL constraints that reference $PATH/?PATH have the enclosing sh:path URI substituted before static analysis, avoiding false UNSUPPORTED_DYNAMIC_PROPERTY warnings.
- sh:alternativePath alias suppression: an unknown alternative in a cross-version compatibility path (cim:Foo | <cim100#Foo> | <ucaiug#Foo>) is suppressed when a sibling with the same local name is known, avoiding noise on multi-namespace shapes.
Test coverage
Unit and integration tests cover algebra traversal, semantic validation, squiggle position mapping, SHACL constraint checks, $this binding, SPARQL UPDATE, source locator, and full CGMES 2.4 / 3.0 profile integration.