Aurora DSQL Data Loader

Fast, parallel data loader for Aurora DSQL. Load CSV, TSV, and Parquet files into DSQL with automatic schema detection and progress tracking.

Migrating from Python v1? See CHANGELOG.md.

Quick Start

1. Install

Download a pre-built binary from the Latest releases page.

Or build from source:

# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone and install
git clone https://github.com/aws-samples/aurora-dsql-loader.git
cd aurora-dsql-loader
cargo install --path .

2. Configure AWS credentials

aws configure
# or
aws sso login

3. Load data

aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri data.csv \
  --table my_table

That's it! The loader will:

Auto-detect the file format (CSV, TSV, or Parquet)
Infer the schema from your data
Load data in parallel with progress tracking
Handle retries and errors automatically

Common Examples

Load from S3:

aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri s3://my-bucket/data.parquet \
  --table analytics_data

Create table automatically:

aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri data.csv \
  --table new_table \
  --if-not-exists

Validate without loading:

aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri data.csv \
  --table my_table \
  --dry-run
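One way to combine validation and loading is to run the dry run first and start the real load only if it succeeds. The wrapper below is a minimal sketch, not part of the loader: `validate_then_load` and the `LOADER` variable are hypothetical names, and it assumes that `--dry-run` exits with a non-zero status when validation fails, which this README does not confirm.

```shell
# Hypothetical wrapper; not part of aurora-dsql-loader.
# Assumption: `--dry-run` exits non-zero when validation fails.
LOADER="${LOADER:-aurora-dsql-loader}"

validate_then_load() {
  local endpoint="$1"
  local source_uri="$2"
  local table="$3"

  # Run the dry run first; `&&` starts the real load only on success.
  "$LOADER" load --endpoint "$endpoint" --source-uri "$source_uri" \
      --table "$table" --dry-run \
    && "$LOADER" load --endpoint "$endpoint" --source-uri "$source_uri" \
      --table "$table"
}
```

After sourcing the snippet, a call such as `validate_then_load your-cluster.dsql.us-east-1.on.aws data.csv my_table` would run both steps in order.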

Use DB-side defaults (e.g. server-generated UUIDs):

aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri data.csv \
  --table my_table \
  --exclude-columns pk_id,created_at

Listed columns are omitted from the INSERT, so DSQL applies each column's DEFAULT expression (e.g. gen_random_uuid(), CURRENT_TIMESTAMP). Source records must still contain these columns in their original positions.

Key Features

Fast: Parallel loading with configurable workers
Smart: Auto-detects file format and region
Reliable: Automatic retries and fault-tolerant loading
Flexible: Works with local files or S3 URIs
Formats: CSV, TSV, and Parquet support

Resuming Failed Loads

If a load fails mid-execution, resume from where it left off using --resume-job-id:

# Initial load (note the Job ID in output)
aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri large-file.csv \
  --table my_table \
  --manifest-dir ./my-load-manifest

# Resume after failure
aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri large-file.csv \
  --table my_table \
  --manifest-dir ./my-load-manifest \
  --resume-job-id abc-123-def-456

Resume retries failed chunks and skips completed ones. To guard against duplicate rows on retry, define unique constraints on your table; the loader inserts with ON CONFLICT DO NOTHING, so rows that already exist are skipped.

Performance Tuning

The loader parallelizes work on two axes. Total in-flight INSERTs ≈ workers × batch-concurrency.

| Flag | Default | Effect |
| --- | --- | --- |
| `--workers` | `8` | Worker threads competing for file chunks |
| `--batch-concurrency` | `32` | Concurrent INSERT batches per worker |
| `--batch-size` | `2000` | Records per INSERT statement |
| `--chunk-size` | `10MB` | File is split into chunks of this size; workers claim one chunk at a time |

To load faster, raise --workers and/or --batch-concurrency. For very large files, a smaller --chunk-size (e.g. 5MB) produces more chunks and lets workers balance load better.

aurora-dsql-loader load \
  --endpoint your-cluster.dsql.us-east-1.on.aws \
  --source-uri big.csv \
  --table my_table \
  --workers 16 \
  --batch-concurrency 64 \
  --chunk-size 5MB
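To get a feel for what these settings imply, the arithmetic can be sketched in a few lines of shell. The helpers `in_flight_inserts` and `chunk_count` are hypothetical and not part of the loader. The chunk count is a rough estimate that works in whole megabytes and assumes chunks are cut purely by size.

```shell
# Hypothetical sizing helpers; not part of aurora-dsql-loader.

# Approximate in-flight INSERTs: workers x batch-concurrency.
in_flight_inserts() {
  local workers="$1"
  local batch_concurrency="$2"
  echo $(( workers * batch_concurrency ))
}

# Estimated number of chunks for a file, rounding up.
# Both sizes are given in whole MB (an assumption made for simplicity).
chunk_count() {
  local file_mb="$1"
  local chunk_mb="$2"
  echo $(( (file_mb + chunk_mb - 1) / chunk_mb ))
}

in_flight_inserts 8 32     # defaults: prints 256
in_flight_inserts 16 64    # tuned example above: prints 1024
chunk_count 1000 10        # 1000 MB file, default chunk size: prints 100
chunk_count 1000 5         # same file with --chunk-size 5MB: prints 200
```

Under these assumptions, the tuned example quadruples the number of in-flight INSERTs and doubles the number of chunks available for workers to claim.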

Watch for:

Parameter limit: DSQL caps a statement at 65,535 parameters, so batch-size × column count must stay below that. If a load fails with "too many arguments for query", the loader prints a suggested smaller --batch-size.
Rate limits: very high concurrency can cause DSQL to throttle requests. Reduce --workers or --batch-concurrency if the loader spends much of its time retrying.
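The parameter limit can be checked before a load with simple integer division. The helper below is a hypothetical sketch, not part of the loader: `max_batch_size` returns the largest batch size for which batch-size × column count does not exceed 65,535. It assumes one parameter per column per record, as the description above implies.

```shell
# Hypothetical helper; not part of aurora-dsql-loader.
# Largest batch size such that batch_size * columns <= 65535,
# assuming one statement parameter per column per record.
max_batch_size() {
  local columns="$1"
  local limit=65535
  echo $(( limit / columns ))
}

max_batch_size 10   # prints 6553
max_batch_size 32   # prints 2047
max_batch_size 50   # prints 1310
```

By this estimate, the default --batch-size of 2000 fits tables of up to 32 columns; wider tables need a smaller value.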

Requirements

Rust: 1.85 or later (for building)
AWS: An Aurora DSQL cluster and an identity with dsql:DbConnectAdmin or dsql:DbConnect permission
Credentials: Configured via AWS CLI, SSO, or IAM role

Options

See all options with:

aurora-dsql-loader load --help

Troubleshooting

Authentication error?

aws sts get-caller-identity  # Verify credentials

Build error?

rustup update stable  # Update Rust

Connection error? See Aurora DSQL Troubleshooting

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

MIT-0 License. See LICENSE file.
