Node.js CLI for DataBook semantic documents — Markdown files that carry typed RDF/SPARQL/SHACL payloads alongside human-readable prose and self-describing YAML frontmatter.
Spec: github.com/kurtcagle/databook · Namespace: https://w3id.org/databook/ns#
# From the package directory
npm install -g .
# Or run directly
node bin/databook.js <command>Requires Node.js ≥ 18.0.0 (uses native fetch).
A guide to the YAML Front Matter and data block properties for databooks is available here.
Extracts frontmatter and block metadata. Never modifies input. Useful for pipeline inspection and conditional branching.
# Default: frontmatter + block summary as JSON
databook head source.databook.md
# Specific block metadata
databook head source.databook.md --block-id primary-block
# All output formats
databook head source.databook.md --format json # default
databook head source.databook.md --format yaml
databook head source.databook.md --format xml
databook head source.databook.md --format turtle
# Pipeline use: extract block ids with role=primary
databook head source.databook.md --format json \
| jq -r '.blocks[] | select(.role == "primary") | .id'
# Check triple count before processing
TRIPLE_COUNT=$(databook head source.databook.md --format json \
| jq '.frontmatter.graph.triple_count')
# Stdin
cat source.databook.md | databook head --format yamlPushes RDF blocks to a SPARQL-compatible triplestore via the SPARQL 1.1 Graph Store Protocol (GSP). Each block becomes a discrete named graph. Frontmatter provenance is pushed to a #meta graph by default.
Pushable block types: turtle, turtle12, trig, json-ld, shacl, sparql-update
# Push all RDF blocks to Fuseki
databook push ontology.databook.md \
--endpoint http://localhost:3030/ds/sparql
# Push one block with an explicit graph IRI
databook push ontology.databook.md \
--block-id primary-block \
--graph https://example.org/my-graph \
--endpoint http://localhost:3030/ds/sparql
# Merge (POST) instead of replace (PUT)
databook push ontology.databook.md --endpoint ... --merge
# Suppress meta graph
databook push ontology.databook.md --endpoint ... --no-meta
# Dry-run to see what would be sent
databook push ontology.databook.md --endpoint ... --dry-run
# Auth via env var (recommended for CI)
DATABOOK_FUSEKI_AUTH="Basic YWRtaW46cGFzc3dvcmQ=" \
databook push file.databook.md --endpoint http://host/ds/sparqlNamed graph assignment (priority order):
--graph <iri>(single-block only)frontmatter.graph.named_graph(single-block documents only)- Fragment-addressing rule:
{document.id}#{block-id}
GSP endpoint inference: /sparql → /data, /query → /data. Override with --gsp-endpoint for non-Fuseki stores.
Retrieves RDF from a SPARQL endpoint into a DataBook. Operates in three modes:
| Mode | Trigger | Protocol |
|---|---|---|
| Named graph fetch | Default | GSP GET |
| External query | --query <file> |
SPARQL POST |
| Fragment-ref | --fragment <block-id> |
SPARQL POST using embedded block |
# Fetch named graph to stdout
databook pull sensors.databook.md \
--endpoint http://localhost:3030/ds/sparql
# Fetch specific graph IRI
databook pull sensors.databook.md \
--endpoint http://localhost:3030/ds/sparql \
--graph https://example.org/sensors
# Execute embedded SPARQL block and replace data block in-place
databook pull sensors.databook.md \
--endpoint http://localhost:3030/ds/sparql \
--fragment sensor-construct \
--block-id sensor-graph \
--stats \
--out sensors.databook.md # same path = atomic in-place update
# External .sparql/.rq file
databook pull onto.databook.md \
--endpoint http://localhost:3030/ds/sparql \
--query queries/extract.sparql \
-o result.ttl
# Dry-run shows the extracted SPARQL query
databook pull sensors.databook.md \
--fragment sensor-construct \
--dry-run--stats recomputes graph.triple_count and graph.subjects in frontmatter using N3.js after a successful Turtle/TriG pull.
Executes a processor-registry DataBook as a DAG pipeline against a source DataBook. Supports:
- Full pipeline mode (
-P <process-databook>): Multi-stage DAG withbuild:dependsOnordering - Single-operation shorthand:
--sparql,--shapes,--xslt,--xquery
Supported processors (configured via processors.toml):
| Type | Tool | processors.toml key |
|---|---|---|
sparql |
Apache Jena ARQ | jena-sparql |
shacl |
Apache Jena SHACL | jena-shacl |
xslt |
Saxon HE 12 | saxon-xslt |
xquery |
Saxon HE 12 | saxon-xquery |
sparql-anything |
SPARQL Anything 0.9 | sparql-anything |
# Full pipeline
databook process source.databook.md \
-P pipeline.databook.md \
-o output.databook.md
# Single SPARQL CONSTRUCT
databook process source.databook.md \
--sparql queries.databook.md#construct-graph \
-o output.databook.md
# Single SHACL validation
databook process source.databook.md \
--shapes shapes.databook.md#person-shapes \
-o report.databook.md
# With VALUES parameter injection
databook process source.databook.md \
--sparql queries.databook.md#typed-query \
--params '{"type":"ex:Person"}' \
-o people.databook.md
# Dry-run shows execution plan
databook process source.databook.md -P pipeline.databook.md --dry-runDAG execution model: Stages are topologically sorted by build:dependsOn edges. Within a topological layer, build:order is the tiebreaker.
Declares deployment details for processors and endpoints. Three-layer discovery chain (later layers override earlier):
{package}/processors.default.toml ← shipped template (read-only)
~/.config/databook/processors.toml ← user-level
{project}/.databook/processors.toml ← project-level
Do not commit
processors.tomlto version control. Add it to.gitignore.
Example:
[default_endpoint]
sparql = "http://localhost:3030/ds/sparql"
[endpoints."http://localhost:3030"]
auth = "Basic YWRtaW46cGFzc3dvcmQ="
[processor."https://w3id.org/databook/plugins/core#jena-sparql"]
command = "/usr/local/jena/bin/sparql"
version = "6.0.0"
jvm_flags = "-Xmx4g"
[processor."https://w3id.org/databook/plugins/core#jena-shacl"]
command = "/usr/local/jena/bin/shacl"
version = "6.0.0"
jvm_flags = "-Xmx4g"
[processor."https://w3id.org/databook/plugins/core#saxon-xslt"]
jar = "/usr/local/lib/saxon-he-12.jar"
version = "12.0"
jvm_flags = "-Xmx2g"See processors.default.toml for the full schema with all supported keys.
The auth credential is resolved in priority order:
--auth <credential>flagDATABOOK_FUSEKI_AUTHenvironment variableprocessors.toml[endpoints."<url>"].author.auth_env
Credential forms: Basic <base64>, Bearer <token>, or bare <base64> (auto-prefixed with Basic).
| Code | Meaning |
|---|---|
0 |
Success |
1 |
Runtime error (partial failure, processor error) |
2 |
Usage / configuration error (bad args, missing flags) |
3 |
Authentication failure (401 / 403) |
4 |
Endpoint unreachable (connection refused, DNS failure) |
5 |
Empty result — query succeeded but returned no triples/rows |
All commands are POSIX-composable:
# Inspect output of a process pipeline
databook process source.databook.md -P pipeline.databook.md \
| databook head --format json
# Round-trip: push, transform in store, pull back
databook push sensors.databook.md --endpoint http://host/ds/sparql
# ... external store transformations ...
databook pull sensors.databook.md \
--endpoint http://host/ds/sparql \
--fragment sensor-construct \
--block-id sensor-graph \
--stats \
--out sensors.databook.mdAny payload reference uses the form {document}#{block-id}:
# Relative path
--sparql queries.databook.md#construct-graph
# Absolute path
--sparql /data/queries.databook.md#construct-graph
# Current document (same-document fragment)
--fragment sensor-construct
# Full IRI
--sparql https://example.org/queries-v1#select-all| Package | Role |
|---|---|
commander |
CLI argument parsing |
js-yaml |
YAML frontmatter parsing |
@iarna/toml |
processors.toml parsing |
n3 |
Turtle parsing for stats; process DataBook catalogue parsing |
Requires Node.js ≥ 18 for native fetch.
- Conventions — stdin/stdout, fragment addressing,
processors.toml, exit codes - head spec
- push spec
- pull spec
- process spec