Complete guide for storing logs, metrics, traces, and audit logs in ClickHouse.
TelemetryFlow Core uses ClickHouse as a high-performance storage backend for:
- Audit Logs - IAM audit trail (user actions, entity changes)
- Application Logs - All application and infrastructure logs
- Metrics - Performance and business metrics
- Traces - Distributed tracing data (OpenTelemetry)
```
Application    → ClickHouse Client    → ClickHouse
Winston Logger → ClickHouse Transport → ClickHouse
OTEL Collector → ClickHouse Exporter  → ClickHouse
```
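The Winston path goes through a transport that maps log records onto the `logs` table and writes them with the ClickHouse client. The sketch below is illustrative only, assuming `winston-transport`'s base class; the class name, column mapping, and batch size are assumptions, not TelemetryFlow's actual transport:

```typescript
import Transport from 'winston-transport';
import { createClient, ClickHouseClient } from '@clickhouse/client';

// Hypothetical transport - TelemetryFlow's shipped transport may differ.
class ClickHouseTransport extends Transport {
  private client: ClickHouseClient;
  private buffer: Record<string, unknown>[] = [];

  constructor(opts = {}) {
    super(opts);
    this.client = createClient({
      url: `http://${process.env.CLICKHOUSE_HOST}:${process.env.CLICKHOUSE_PORT}`,
      username: process.env.CLICKHOUSE_USER,
      password: process.env.CLICKHOUSE_PASSWORD,
    });
  }

  log(info: any, callback: () => void): void {
    // Map the Winston record onto the logs table columns.
    this.buffer.push({
      timestamp: new Date().toISOString(),
      severity_text: String(info.level).toUpperCase(),
      body: info.message,
      service_name: process.env.SERVICE_NAME ?? 'unknown',
    });
    // Flush in batches rather than per record (see batch inserts below).
    if (this.buffer.length >= 100) {
      const rows = this.buffer.splice(0);
      void this.client.insert({
        table: 'telemetryflow_db.logs',
        values: rows,
        format: 'JSONEachRow',
      });
    }
    callback();
  }
}
```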
Migrations are located in `src/database/clickhouse/migrations/`:
| Migration | Description | Tables/Views |
|---|---|---|
| `1704240000001-CreateAuditLogsTable.ts` | Audit logs with materialized views | `audit_logs`, `audit_logs_stats`, `audit_logs_user_activity` |
| `1704240000002-CreateLogsTable.ts` | Application logs with error tracking | `logs`, `logs_stats`, `logs_errors` |
| `1704240000003-CreateMetricsTable.ts` | Metrics with 1m/1h aggregations | `metrics`, `metrics_1m`, `metrics_1h` |
| `1704240000004-CreateTracesTable.ts` | Distributed traces with statistics | `traces`, `traces_stats`, `traces_errors` |
Stores IAM audit trail for user actions and entity changes.
Retention: 90 days. Partition: Monthly (YYYYMM).
Key Columns:
- `id` - UUID (auto-generated)
- `timestamp` - Event timestamp (DateTime64)
- `user_id`, `user_email`, `user_first_name`, `user_last_name` - User info
- `event_type` - AUTH, AUTHZ, DATA, SYSTEM
- `action` - Action performed (e.g., CREATE, UPDATE, DELETE)
- `resource` - Resource affected (e.g., users, roles)
- `result` - SUCCESS, FAILURE, DENIED
- `error_message` - Error details if failed
- `ip_address`, `user_agent` - Request metadata
- `tenant_id`, `workspace_id`, `organization_id` - Multi-tenancy
- `session_id` - Session tracking
- `duration_ms` - Operation duration
Materialized Views:
- `audit_logs_stats` - Statistics by event type and result
- `audit_logs_user_activity` - User activity summary
Stores application and infrastructure logs.
Retention: 30 days. Partition: Daily (YYYYMMDD).
Columns:
- `timestamp` - Log timestamp
- `observed_timestamp` - Ingestion timestamp
- `trace_id` - Trace correlation ID
- `span_id` - Span correlation ID
- `severity_text` - Log level (ERROR, WARN, INFO, etc.)
- `severity_number` - Numeric severity (1-21)
- `service_name` - Service identifier
- `organization_id`, `workspace_id`, `tenant_id` - Multi-tenancy
- `body` - Log message
- `resource_attributes` - Resource metadata
- `log_attributes` - Additional attributes
Materialized Views:
- `logs_stats` - Log statistics by service and severity
- `logs_errors` - Error logs only (severity >= 17)
Stores performance and business metrics.
Retention: 90 days. Partition: Daily (YYYYMMDD).
Columns:
- `timestamp` - Metric timestamp
- `metric_name` - Metric identifier
- `metric_type` - gauge, counter, histogram, summary
- `value` - Metric value
- `service_name` - Service identifier
- `organization_id`, `workspace_id`, `tenant_id` - Multi-tenancy
- `resource_attributes` - Resource metadata
- `metric_attributes` - Metric labels
- `unit` - Measurement unit
Materialized Views:
- `metrics_1m` - 1-minute aggregations
- `metrics_1h` - 1-hour aggregations
Stores distributed tracing spans.
Retention: 7 days. Partition: Daily (YYYYMMDD).
Columns:
- `timestamp` - Span timestamp
- `trace_id` - Trace identifier
- `span_id` - Span identifier
- `parent_span_id` - Parent span ID
- `span_name` - Operation name
- `span_kind` - INTERNAL, SERVER, CLIENT, etc.
- `service_name` - Service identifier
- `organization_id`, `workspace_id`, `tenant_id` - Multi-tenancy
- `status_code` - UNSET, OK, ERROR
- `duration_ns` - Span duration in nanoseconds
- `resource_attributes` - Resource metadata
- `span_attributes` - Span metadata
Materialized Views:
- `traces_stats` - Trace statistics by service
- `traces_errors` - Error traces only
```bash
# Run all ClickHouse migrations (recommended)
pnpm db:migrate:clickhouse

# Run all migrations (PostgreSQL + ClickHouse)
pnpm db:migrate

# Run migrations + seeds
pnpm db:migrate:seed
```

Migrations are TypeScript files that use `@clickhouse/client` and are located in `src/database/clickhouse/migrations/`. Each migration exports `up()` and `down()` functions for schema changes.
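For orientation, here is a hedged sketch of a migration's shape; the real files define the complete column set, and the signature (a connected client passed in by the runner), abbreviated DDL, and sort key below are assumptions based on the schema summaries in this guide:

```typescript
import type { ClickHouseClient } from '@clickhouse/client';

// Illustrative migration - abbreviated; the shipped file defines all columns.
export async function up(client: ClickHouseClient): Promise<void> {
  await client.command({
    query: `
      CREATE TABLE IF NOT EXISTS telemetryflow_db.audit_logs (
        id UUID DEFAULT generateUUIDv4(),
        timestamp DateTime64(3),
        user_email String,
        event_type LowCardinality(String),
        action LowCardinality(String),
        resource String,
        result LowCardinality(String)
        -- ...remaining columns from the schema summary above
      )
      ENGINE = MergeTree
      PARTITION BY toYYYYMM(timestamp)             -- monthly partitions
      ORDER BY (timestamp, event_type)             -- assumed sort key
      TTL toDateTime(timestamp) + INTERVAL 90 DAY  -- 90-day retention
    `,
  });
}

export async function down(client: ClickHouseClient): Promise<void> {
  await client.command({
    query: 'DROP TABLE IF EXISTS telemetryflow_db.audit_logs',
  });
}
```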
```bash
# Run all ClickHouse seeds
pnpm db:seed:clickhouse

# Run all seeds (PostgreSQL + ClickHouse)
pnpm db:seed
```

Seeds are located in `src/database/clickhouse/seeds/`.
Sample data includes:
- 5 audit log entries
- 240 metrics (last 1 hour)
- 30 trace spans (10 traces, last 30 minutes)
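A seed can generate rows programmatically and insert them in a single batch. This is a minimal sketch under assumed names (the shipped seeds may differ); for instance, 4 metric series × 60 one-minute points would cover the 240 metrics above:

```typescript
import { createClient } from '@clickhouse/client';

// Hypothetical seed - metric names and cadence are assumptions.
export async function seedMetrics(): Promise<void> {
  const client = createClient({
    url: `http://${process.env.CLICKHOUSE_HOST}:${process.env.CLICKHOUSE_PORT}`,
    username: process.env.CLICKHOUSE_USER,
    password: process.env.CLICKHOUSE_PASSWORD,
  });

  const names = ['cpu_usage', 'memory_usage', 'http_requests_total', 'http_latency_ms'];
  const now = Date.now();
  const rows: Record<string, unknown>[] = [];

  // 4 series x 60 one-minute points = 240 rows over the last hour.
  for (const metric_name of names) {
    for (let i = 0; i < 60; i++) {
      rows.push({
        timestamp: new Date(now - i * 60_000).toISOString(),
        metric_name,
        metric_type: 'gauge',
        value: Math.random() * 100,
        service_name: 'seed-service',
      });
    }
  }

  // One batch insert (see the batch-insert guidance below).
  await client.insert({
    table: 'telemetryflow_db.metrics',
    values: rows,
    format: 'JSONEachRow',
  });
  await client.close();
}
```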
Configuration is provided via environment variables:

```bash
# ClickHouse Configuration
CLICKHOUSE_HOST=172.151.151.40
CLICKHOUSE_PORT=8123
CLICKHOUSE_DB=telemetryflow_db
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=telemetryflow123
```

```bash
# Check ClickHouse is running
docker ps | grep clickhouse
# Check tables exist
docker exec telemetryflow_core_clickhouse clickhouse-client \
--query "SHOW TABLES FROM telemetryflow_db"
# Expected output:
# audit_logs
# audit_logs_stats
# audit_logs_user_activity
# logs
# logs_errors
# logs_stats
# metrics
# metrics_1h
# metrics_1m
# traces
# traces_errors
# traces_stats
```

Usage with the Node.js client:

```typescript
import { createClient } from '@clickhouse/client';
const client = createClient({
url: `http://${process.env.CLICKHOUSE_HOST}:${process.env.CLICKHOUSE_PORT}`,
username: process.env.CLICKHOUSE_USER,
password: process.env.CLICKHOUSE_PASSWORD,
});
// Insert audit log
await client.insert({
table: 'telemetryflow_db.audit_logs',
values: [{
timestamp: new Date().toISOString(),
user_id: 'user-123',
user_email: 'user@example.com',
event_type: 'DATA',
action: 'CREATE',
resource: 'users',
result: 'SUCCESS',
tenant_id: 'tenant-123',
organization_id: 'org-123',
}],
format: 'JSONEachRow',
});
// Query logs
const result = await client.query({
query: `
SELECT * FROM telemetryflow_db.logs
WHERE severity_text = 'ERROR'
AND timestamp >= now() - INTERVAL 1 HOUR
ORDER BY timestamp DESC
LIMIT 100
`,
format: 'JSONEachRow',
});
const logs = await result.json();
```

Audit log queries:

```sql
-- Recent audit events
SELECT
timestamp,
user_email,
event_type,
action,
resource,
result
FROM telemetryflow_db.audit_logs
WHERE timestamp >= now() - INTERVAL 1 HOUR
ORDER BY timestamp DESC
LIMIT 100;
-- Failed operations
SELECT
timestamp,
user_email,
action,
resource,
error_message
FROM telemetryflow_db.audit_logs
WHERE result = 'FAILURE'
AND timestamp >= today()
ORDER BY timestamp DESC;
-- User activity summary
SELECT
user_email,
event_type,
count() AS event_count
FROM telemetryflow_db.audit_logs
WHERE timestamp >= now() - INTERVAL 24 HOUR
GROUP BY user_email, event_type
ORDER BY event_count DESC;
-- Audit statistics (materialized view)
SELECT * FROM telemetryflow_db.audit_logs_stats
WHERE date = today()
ORDER BY event_count DESC;
```

Log queries:

```sql
-- Recent error logs
SELECT timestamp, service_name, body, trace_id
FROM telemetryflow_db.logs
WHERE severity_text = 'ERROR'
AND timestamp >= now() - INTERVAL 1 HOUR
ORDER BY timestamp DESC
LIMIT 100;
-- Logs by organization
SELECT timestamp, severity_text, body
FROM telemetryflow_db.logs
WHERE organization_id = 'org-123'
AND timestamp >= today()
ORDER BY timestamp DESC;
-- Log statistics
SELECT
toStartOfHour(timestamp) AS hour,
service_name,
severity_text,
count() AS count
FROM telemetryflow_db.logs
WHERE timestamp >= now() - INTERVAL 24 HOUR
GROUP BY hour, service_name, severity_text
ORDER BY hour DESC;
-- Error logs (materialized view)
SELECT * FROM telemetryflow_db.logs_errors
WHERE date = today()
ORDER BY timestamp DESC;
```

Metric queries:

```sql
-- Metric values over time
SELECT
toStartOfMinute(timestamp) AS minute,
metric_name,
avg(value) AS avg_value,
max(value) AS max_value
FROM telemetryflow_db.metrics
WHERE metric_name = 'http_requests_total'
AND timestamp >= now() - INTERVAL 1 HOUR
GROUP BY minute, metric_name
ORDER BY minute DESC;
-- Aggregated metrics (1-minute)
SELECT
timestamp_1m,
metric_name,
avgMerge(avg_value) AS avg,
maxMerge(max_value) AS max
FROM telemetryflow_db.metrics_1m
WHERE timestamp_1m >= now() - INTERVAL 1 HOUR
GROUP BY timestamp_1m, metric_name
ORDER BY timestamp_1m DESC;
-- Aggregated metrics (1-hour)
SELECT
timestamp_1h,
metric_name,
avgMerge(avg_value) AS avg,
maxMerge(max_value) AS max
FROM telemetryflow_db.metrics_1h
WHERE timestamp_1h >= now() - INTERVAL 24 HOUR
GROUP BY timestamp_1h, metric_name
ORDER BY timestamp_1h DESC;
```

Trace queries:

```sql
-- Slow traces
SELECT
timestamp,
trace_id,
span_name,
duration_ns / 1000000 AS duration_ms
FROM telemetryflow_db.traces
WHERE duration_ns > 1000000000 -- > 1 second
AND timestamp >= now() - INTERVAL 1 HOUR
ORDER BY duration_ns DESC
LIMIT 100;
-- Error traces (materialized view)
SELECT *
FROM telemetryflow_db.traces_errors
WHERE date = today()
ORDER BY timestamp DESC;
-- Trace statistics
SELECT
service_name,
span_name,
count() AS count,
avg(duration_ns) / 1000000 AS avg_duration_ms,
max(duration_ns) / 1000000 AS max_duration_ms
FROM telemetryflow_db.traces
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY service_name, span_name
ORDER BY count DESC;
-- Trace statistics (materialized view)
SELECT * FROM telemetryflow_db.traces_stats
WHERE date = today()
ORDER BY span_count DESC;
```

Always use batch inserts for better performance:

```typescript
import { createClient } from '@clickhouse/client';
const client = createClient({
url: `http://${process.env.CLICKHOUSE_HOST}:${process.env.CLICKHOUSE_PORT}`,
username: process.env.CLICKHOUSE_USER,
password: process.env.CLICKHOUSE_PASSWORD,
});
// Good - Batch insert
const logs = [
{ timestamp: new Date(), severity_text: 'INFO', body: 'Log 1' },
{ timestamp: new Date(), severity_text: 'INFO', body: 'Log 2' },
// ... 100 logs
];
await client.insert({
table: 'telemetryflow_db.logs',
values: logs,
format: 'JSONEachRow',
});
// Bad - Individual inserts (slow!)
for (const log of logs) {
await client.insert({
table: 'telemetryflow_db.logs',
values: [log],
format: 'JSONEachRow',
});
}
```
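For bursty log volume, a small write buffer keeps batches at a healthy size without blocking callers. An illustrative sketch (the row threshold and flush interval are assumptions, tune them for your load):

```typescript
import type { ClickHouseClient } from '@clickhouse/client';

// Illustrative buffered writer - not part of TelemetryFlow Core.
class BufferedLogWriter {
  private buffer: Record<string, unknown>[] = [];
  private timer: ReturnType<typeof setInterval>;

  constructor(
    private client: ClickHouseClient,
    private maxRows = 500,
    flushIntervalMs = 5_000,
  ) {
    // Flush on a timer so quiet periods still drain the buffer.
    this.timer = setInterval(() => void this.flush(), flushIntervalMs);
  }

  add(row: Record<string, unknown>): void {
    this.buffer.push(row);
    if (this.buffer.length >= this.maxRows) void this.flush();
  }

  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const rows = this.buffer.splice(0); // take ownership of the batch
    await this.client.insert({
      table: 'telemetryflow_db.logs',
      values: rows,
      format: 'JSONEachRow',
    });
  }

  async close(): Promise<void> {
    clearInterval(this.timer);
    await this.flush(); // drain whatever is left
  }
}
```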
Use materialized views for pre-aggregated data:

```sql
-- Query pre-aggregated 1-minute metrics (fast)
SELECT * FROM telemetryflow_db.metrics_1m
WHERE timestamp_1m >= now() - INTERVAL 1 HOUR;
-- Instead of aggregating raw data (slow)
SELECT toStartOfMinute(timestamp), avg(value)
FROM telemetryflow_db.metrics
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY toStartOfMinute(timestamp);
```

| Table | Materialized View | Purpose |
|---|---|---|
| `audit_logs` | `audit_logs_stats` | Event statistics by type and result |
| `audit_logs` | `audit_logs_user_activity` | User activity summary |
| `logs` | `logs_stats` | Log statistics by service and severity |
| `logs` | `logs_errors` | Error logs only (severity >= 17) |
| `metrics` | `metrics_1m` | 1-minute aggregations |
| `metrics` | `metrics_1h` | 1-hour aggregations |
| `traces` | `traces_stats` | Trace statistics by service |
| `traces` | `traces_errors` | Error traces only |
Check table sizes:

```sql
SELECT
table,
formatReadableSize(sum(bytes)) AS size,
sum(rows) AS rows
FROM system.parts
WHERE database = 'telemetryflow_db'
AND active
GROUP BY table
ORDER BY sum(bytes) DESC;
```

Check partition sizes:

```sql
SELECT
table,
partition,
sum(rows) AS rows,
formatReadableSize(sum(bytes)) AS size
FROM system.parts
WHERE database = 'telemetryflow_db'
AND active
GROUP BY table, partition
ORDER BY table, partition DESC;
```

Check the age of stored data:

```sql
SELECT
table,
partition,
min(min_date) AS oldest_data,
max(max_date) AS newest_data
FROM system.parts
WHERE database = 'telemetryflow_db'
AND active
GROUP BY table, partition
ORDER BY table, oldest_data;
```

Error: `container telemetryflow_core_clickhouse is unhealthy`
Cause: Old incompatible data from ClickHouse version < 20.7
Solution:
```bash
# Stop container
docker stop telemetryflow_core_clickhouse
# Clean data directory
sudo rm -rf /opt/data/docker/telemetryflow-core/clickhouse/*
# Recreate directories with proper permissions
sudo mkdir -p /opt/data/docker/telemetryflow-core/clickhouse/{data,logs}
sudo chown -R 101:101 /opt/data/docker/telemetryflow-core/clickhouse
sudo chmod -R 777 /opt/data/docker/telemetryflow-core/clickhouse
# Start container
docker start telemetryflow_core_clickhouse
# Wait for healthy status
sleep 10 && docker ps --filter name=clickhouse
# Re-run migrations
pnpm db:migrate:clickhouse
```
If the application cannot connect or tables are missing:

1. Check ClickHouse is running:
   ```bash
   docker ps | grep clickhouse
   ```
2. Check the connection:
   ```bash
   docker exec telemetryflow_core_clickhouse clickhouse-client --query "SELECT 1"
   ```
3. Verify environment variables:
   ```bash
   grep CLICKHOUSE_ .env
   ```
4. Run migrations manually:
   ```bash
   pnpm db:migrate:clickhouse
   ```
5. Check migrations ran successfully:
   ```bash
   docker exec telemetryflow_core_clickhouse clickhouse-client \
     --query "SHOW TABLES FROM telemetryflow_db"
   ```
   Expected tables:
   - audit_logs, audit_logs_stats, audit_logs_user_activity
   - logs, logs_stats, logs_errors
   - metrics, metrics_1m, metrics_1h
   - traces, traces_stats, traces_errors
6. If tables are missing, re-run migrations:
   ```bash
   pnpm db:migrate:clickhouse
   ```
Error: `mkdir: cannot create directory '/var/lib/clickhouse/': Permission denied`
Solution:
```bash
# Fix directory permissions
sudo chown -R 101:101 /opt/data/docker/telemetryflow-core/clickhouse
sudo chmod -R 777 /opt/data/docker/telemetryflow-core/clickhouse
# Restart container
docker restart telemetryflow_core_clickhouse
```

If disk usage grows faster than expected:

- Check table sizes (see the monitoring queries above)
- Verify TTL is working (old data being deleted)
- Consider reducing retention periods in migrations (a sketch follows this list)
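Retention can also be tightened on an existing table without rewriting the migration. A hedged sketch, assuming the `logs` table from this guide and a hypothetical 14-day TTL; adjust names and intervals to your deployment:

```typescript
import { createClient } from '@clickhouse/client';

const client = createClient({
  url: `http://${process.env.CLICKHOUSE_HOST}:${process.env.CLICKHOUSE_PORT}`,
  username: process.env.CLICKHOUSE_USER,
  password: process.env.CLICKHOUSE_PASSWORD,
});

// Shorten log retention from 30 days to an assumed 14 days.
// MODIFY TTL applies lazily as parts merge; ALTER TABLE ... MATERIALIZE TTL
// forces existing parts to be re-evaluated.
await client.command({
  query: `
    ALTER TABLE telemetryflow_db.logs
    MODIFY TTL toDateTime(timestamp) + INTERVAL 14 DAY
  `,
});
```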
If queries are slow:

- Use materialized views for aggregations
- Add appropriate indexes
- Partition by time for better performance
- Use a `LIMIT` clause to restrict result sets (see the sketch below)
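Time-bounded, parameterized queries touch only the relevant partitions and cap the result set. A minimal sketch using the client's `query_params` binding; the table and filters mirror the earlier examples:

```typescript
import { createClient } from '@clickhouse/client';

const client = createClient({
  url: `http://${process.env.CLICKHOUSE_HOST}:${process.env.CLICKHOUSE_PORT}`,
  username: process.env.CLICKHOUSE_USER,
  password: process.env.CLICKHOUSE_PASSWORD,
});

// The time filter prunes daily partitions; LIMIT caps the result set.
const result = await client.query({
  query: `
    SELECT timestamp, service_name, body
    FROM telemetryflow_db.logs
    WHERE severity_text = {severity: String}
      AND timestamp >= now() - INTERVAL 1 HOUR
    ORDER BY timestamp DESC
    LIMIT {limit: UInt32}
  `,
  query_params: { severity: 'ERROR', limit: 100 },
  format: 'JSONEachRow',
});
const rows = await result.json();
```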
- ClickHouse Documentation
- ClickHouse Node.js Client
- ClickHouse SQL Reference
- ClickHouse MergeTree Engine
- ClickHouse Materialized Views
- Last Updated: 2025-12-05
- Retention Policies: Audit Logs (90d), Logs (30d), Metrics (90d), Traces (7d)