ClickHouse Logging - TelemetryFlow Core

Complete guide for storing logs, metrics, traces, and audit logs in ClickHouse.

Overview

TelemetryFlow Core uses ClickHouse as a high-performance storage backend for:

  • Audit Logs - IAM audit trail (user actions, entity changes)
  • Application Logs - All application and infrastructure logs
  • Metrics - Performance and business metrics
  • Traces - Distributed tracing data (OpenTelemetry)

Architecture

Application → ClickHouse Client → ClickHouse
Winston Logger → ClickHouse Transport → ClickHouse
OTEL Collector → ClickHouse Exporter → ClickHouse
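
The Winston path above runs through a custom transport that batches records into the logs table. Below is a minimal sketch; the class name, flush policy, and column mapping are illustrative, not TelemetryFlow's actual transport:

import Transport from 'winston-transport';
import { createClient, ClickHouseClient } from '@clickhouse/client';

// Illustrative transport: buffers log records and flushes them to
// telemetryflow_db.logs in batches every 5 seconds.
class ClickHouseTransport extends Transport {
  private client: ClickHouseClient;
  private buffer: Record<string, unknown>[] = [];

  constructor() {
    super();
    this.client = createClient({
      url: `http://${process.env.CLICKHOUSE_HOST}:${process.env.CLICKHOUSE_PORT}`,
      username: process.env.CLICKHOUSE_USER,
      password: process.env.CLICKHOUSE_PASSWORD,
    });
    // unref() so the flush timer never keeps the process alive on its own.
    setInterval(() => void this.flush(), 5000).unref();
  }

  log(info: { level: string; message: string }, callback: () => void): void {
    this.buffer.push({
      timestamp: new Date().toISOString(),
      severity_text: info.level.toUpperCase(),
      body: info.message,
      service_name: process.env.SERVICE_NAME ?? 'unknown',
    });
    callback();
  }

  private async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0);
    await this.client.insert({
      table: 'telemetryflow_db.logs',
      values: batch,
      format: 'JSONEachRow',
    });
  }
}

Such a transport would be registered with winston.createLogger({ transports: [new ClickHouseTransport()] }). Batching here matters for the same reason covered under Performance Optimization below.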

Database Schema

Migration Files

Located in src/database/clickhouse/migrations/:

| Migration | Description | Tables/Views |
| --- | --- | --- |
| 1704240000001-CreateAuditLogsTable.ts | Audit logs with materialized views | audit_logs, audit_logs_stats, audit_logs_user_activity |
| 1704240000002-CreateLogsTable.ts | Application logs with error tracking | logs, logs_stats, logs_errors |
| 1704240000003-CreateMetricsTable.ts | Metrics with 1m/1h aggregations | metrics, metrics_1m, metrics_1h |
| 1704240000004-CreateTracesTable.ts | Distributed traces with statistics | traces, traces_stats, traces_errors |

1. Audit Logs Table

Stores IAM audit trail for user actions and entity changes.

Retention: 90 days | Partition: Monthly (YYYYMM)

Key Columns:

  • id - UUID (auto-generated)
  • timestamp - Event timestamp (DateTime64)
  • user_id, user_email, user_first_name, user_last_name - User info
  • event_type - AUTH, AUTHZ, DATA, SYSTEM
  • action - Action performed (e.g., CREATE, UPDATE, DELETE)
  • resource - Resource affected (e.g., users, roles)
  • result - SUCCESS, FAILURE, DENIED
  • error_message - Error details if failed
  • ip_address, user_agent - Request metadata
  • tenant_id, workspace_id, organization_id - Multi-tenancy
  • session_id - Session tracking
  • duration_ms - Operation duration

Materialized Views:

  • audit_logs_stats - Statistics by event type and result
  • audit_logs_user_activity - User activity summary

2. Logs Table

Stores application and infrastructure logs.

Retention: 30 days | Partition: Daily (YYYYMMDD)

Columns:

  • timestamp - Log timestamp
  • observed_timestamp - Ingestion timestamp
  • trace_id - Trace correlation ID
  • span_id - Span correlation ID
  • severity_text - Log level (ERROR, WARN, INFO, etc.)
  • severity_number - Numeric severity (1-21)
  • service_name - Service identifier
  • organization_id, workspace_id, tenant_id - Multi-tenancy
  • body - Log message
  • resource_attributes - Resource metadata
  • log_attributes - Additional attributes

Materialized Views:

  • logs_stats - Log statistics by service and severity
  • logs_errors - Error logs only (severity >= 17)

3. Metrics Table

Stores performance and business metrics.

Retention: 90 days | Partition: Daily (YYYYMMDD)

Columns:

  • timestamp - Metric timestamp
  • metric_name - Metric identifier
  • metric_type - gauge, counter, histogram, summary
  • value - Metric value
  • service_name - Service identifier
  • organization_id, workspace_id, tenant_id - Multi-tenancy
  • resource_attributes - Resource metadata
  • metric_attributes - Metric labels
  • unit - Measurement unit

Materialized Views:

  • metrics_1m - 1-minute aggregations
  • metrics_1h - 1-hour aggregations

4. Traces Table

Stores distributed tracing spans.

Retention: 7 days | Partition: Daily (YYYYMMDD)

Columns:

  • timestamp - Span timestamp
  • trace_id - Trace identifier
  • span_id - Span identifier
  • parent_span_id - Parent span ID
  • span_name - Operation name
  • span_kind - INTERNAL, SERVER, CLIENT, etc.
  • service_name - Service identifier
  • organization_id, workspace_id, tenant_id - Multi-tenancy
  • status_code - UNSET, OK, ERROR
  • duration_ns - Span duration in nanoseconds
  • resource_attributes - Resource metadata
  • span_attributes - Span metadata

Materialized Views:

  • traces_stats - Trace statistics by service
  • traces_errors - Error traces only

Setup

1. Run Migrations

# Run all ClickHouse migrations (recommended)
pnpm db:migrate:clickhouse

# Run all migrations (PostgreSQL + ClickHouse)
pnpm db:migrate

# Run migrations + seeds
pnpm db:migrate:seed

Migrations are TypeScript files that use @clickhouse/client and are located in:

  • src/database/clickhouse/migrations/

Each migration exports up() and down() functions for schema changes.
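
As a sketch of the shape such a file might take (the runner's exact signature and the example table are assumptions, not the real migration contents):

import { ClickHouseClient } from '@clickhouse/client';

// Hypothetical migration: command() is the @clickhouse/client method for DDL.
export async function up(client: ClickHouseClient): Promise<void> {
  await client.command({
    query: `
      CREATE TABLE IF NOT EXISTS telemetryflow_db.example_events (
        timestamp DateTime64(3),
        body String
      )
      ENGINE = MergeTree
      PARTITION BY toYYYYMMDD(timestamp)
      ORDER BY timestamp
      TTL toDateTime(timestamp) + INTERVAL 30 DAY
    `,
  });
}

export async function down(client: ClickHouseClient): Promise<void> {
  await client.command({
    query: 'DROP TABLE IF EXISTS telemetryflow_db.example_events',
  });
}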

2. Seed Sample Data (Optional)

# Run all ClickHouse seeds
pnpm db:seed:clickhouse

# Run all seeds (PostgreSQL + ClickHouse)
pnpm db:seed

Seeds are located in:

  • src/database/clickhouse/seeds/

Sample data includes:

  • 5 audit log entries
  • 240 metrics (last 1 hour)
  • 30 trace spans (10 traces, last 30 minutes)
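
A seed file follows the same pattern. Below is a hypothetical example that generates one counter sample per minute for the past hour; the exported seed() signature and values are illustrative:

import { ClickHouseClient } from '@clickhouse/client';

// Hypothetical seed: column names follow the metrics table documented above.
export async function seed(client: ClickHouseClient): Promise<void> {
  const now = Date.now();
  const rows = Array.from({ length: 60 }, (_, i) => ({
    timestamp: new Date(now - i * 60_000).toISOString(),
    metric_name: 'http_requests_total',
    metric_type: 'counter',
    value: Math.floor(Math.random() * 100),
    service_name: 'seed-service',
    unit: '1',
  }));
  await client.insert({
    table: 'telemetryflow_db.metrics',
    values: rows,
    format: 'JSONEachRow',
  });
}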

3. Configure Environment

# ClickHouse Configuration
CLICKHOUSE_HOST=172.151.151.40
CLICKHOUSE_PORT=8123
CLICKHOUSE_DB=telemetryflow_db
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=telemetryflow123

4. Verify Setup

# Check ClickHouse is running
docker ps | grep clickhouse

# Check tables exist
docker exec telemetryflow_core_clickhouse clickhouse-client \
  --query "SHOW TABLES FROM telemetryflow_db"

# Expected output:
# audit_logs
# audit_logs_stats
# audit_logs_user_activity
# logs
# logs_errors
# logs_stats
# metrics
# metrics_1h
# metrics_1m
# traces
# traces_errors
# traces_stats

Usage

Direct ClickHouse Client

import { createClient } from '@clickhouse/client';

const client = createClient({
  url: `http://${process.env.CLICKHOUSE_HOST}:${process.env.CLICKHOUSE_PORT}`,
  username: process.env.CLICKHOUSE_USER,
  password: process.env.CLICKHOUSE_PASSWORD,
});

// Insert audit log
await client.insert({
  table: 'telemetryflow_db.audit_logs',
  values: [{
    timestamp: new Date().toISOString(),
    user_id: 'user-123',
    user_email: 'user@example.com',
    event_type: 'DATA',
    action: 'CREATE',
    resource: 'users',
    result: 'SUCCESS',
    tenant_id: 'tenant-123',
    organization_id: 'org-123',
  }],
  format: 'JSONEachRow',
});

// Query logs
const result = await client.query({
  query: `
    SELECT * FROM telemetryflow_db.logs
    WHERE severity_text = 'ERROR'
    AND timestamp >= now() - INTERVAL 1 HOUR
    ORDER BY timestamp DESC
    LIMIT 100
  `,
  format: 'JSONEachRow',
});

const logs = await result.json();
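
When filter values come from request input, prefer the client's query_params binding over string interpolation; placeholders use the {name: Type} syntax supported by @clickhouse/client. A short sketch:

// Parameters are bound server-side, which avoids SQL injection.
const errors = await client.query({
  query: `
    SELECT timestamp, body
    FROM telemetryflow_db.logs
    WHERE organization_id = {orgId: String}
      AND severity_text = {level: String}
    ORDER BY timestamp DESC
    LIMIT {limit: UInt32}
  `,
  query_params: { orgId: 'org-123', level: 'ERROR', limit: 100 },
  format: 'JSONEachRow',
});
const rows = await errors.json();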

Querying Data

Query Audit Logs

-- Recent audit events
SELECT
  timestamp,
  user_email,
  event_type,
  action,
  resource,
  result
FROM telemetryflow_db.audit_logs
WHERE timestamp >= now() - INTERVAL 1 HOUR
ORDER BY timestamp DESC
LIMIT 100;

-- Failed operations
SELECT
  timestamp,
  user_email,
  action,
  resource,
  error_message
FROM telemetryflow_db.audit_logs
WHERE result = 'FAILURE'
  AND timestamp >= today()
ORDER BY timestamp DESC;

-- User activity summary
SELECT
  user_email,
  event_type,
  count() AS event_count
FROM telemetryflow_db.audit_logs
WHERE timestamp >= now() - INTERVAL 24 HOUR
GROUP BY user_email, event_type
ORDER BY event_count DESC;

-- Audit statistics (materialized view)
SELECT * FROM telemetryflow_db.audit_logs_stats
WHERE date = today()
ORDER BY event_count DESC;

Query Logs

-- Recent error logs
SELECT timestamp, service_name, body, trace_id
FROM telemetryflow_db.logs
WHERE severity_text = 'ERROR'
  AND timestamp >= now() - INTERVAL 1 HOUR
ORDER BY timestamp DESC
LIMIT 100;

-- Logs by organization
SELECT timestamp, severity_text, body
FROM telemetryflow_db.logs
WHERE organization_id = 'org-123'
  AND timestamp >= today()
ORDER BY timestamp DESC;

-- Log statistics
SELECT
  toStartOfHour(timestamp) AS hour,
  service_name,
  severity_text,
  count() AS count
FROM telemetryflow_db.logs
WHERE timestamp >= now() - INTERVAL 24 HOUR
GROUP BY hour, service_name, severity_text
ORDER BY hour DESC;

-- Error logs (materialized view)
SELECT * FROM telemetryflow_db.logs_errors
WHERE date = today()
ORDER BY timestamp DESC;

Query Metrics

-- Metric values over time
SELECT
  toStartOfMinute(timestamp) AS minute,
  metric_name,
  avg(value) AS avg_value,
  max(value) AS max_value
FROM telemetryflow_db.metrics
WHERE metric_name = 'http_requests_total'
  AND timestamp >= now() - INTERVAL 1 HOUR
GROUP BY minute, metric_name
ORDER BY minute DESC;

-- Aggregated metrics (1-minute)
SELECT
  timestamp_1m,
  metric_name,
  avgMerge(avg_value) AS avg,
  maxMerge(max_value) AS max
FROM telemetryflow_db.metrics_1m
WHERE timestamp_1m >= now() - INTERVAL 1 HOUR
GROUP BY timestamp_1m, metric_name
ORDER BY timestamp_1m DESC;

-- Aggregated metrics (1-hour)
SELECT
  timestamp_1h,
  metric_name,
  avgMerge(avg_value) AS avg,
  maxMerge(max_value) AS max
FROM telemetryflow_db.metrics_1h
WHERE timestamp_1h >= now() - INTERVAL 24 HOUR
GROUP BY timestamp_1h, metric_name
ORDER BY timestamp_1h DESC;

Query Traces

-- Slow traces
SELECT
  timestamp,
  trace_id,
  span_name,
  duration_ns / 1000000 AS duration_ms
FROM telemetryflow_db.traces
WHERE duration_ns > 1000000000 -- > 1 second
  AND timestamp >= now() - INTERVAL 1 HOUR
ORDER BY duration_ns DESC
LIMIT 100;

-- Error traces (materialized view)
SELECT *
FROM telemetryflow_db.traces_errors
WHERE date = today()
ORDER BY timestamp DESC;

-- Trace statistics
SELECT
  service_name,
  span_name,
  count() AS count,
  avg(duration_ns) / 1000000 AS avg_duration_ms,
  max(duration_ns) / 1000000 AS max_duration_ms
FROM telemetryflow_db.traces
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY service_name, span_name
ORDER BY count DESC;

-- Trace statistics (materialized view)
SELECT * FROM telemetryflow_db.traces_stats
WHERE date = today()
ORDER BY span_count DESC;

Performance Optimization

Batch Inserts

Each INSERT creates a new data part on disk, so always batch many rows into a single insert instead of writing them one at a time:

import { createClient } from '@clickhouse/client';

const client = createClient({
  url: `http://${process.env.CLICKHOUSE_HOST}:${process.env.CLICKHOUSE_PORT}`,
  username: process.env.CLICKHOUSE_USER,
  password: process.env.CLICKHOUSE_PASSWORD,
});

// Good - Batch insert
const logs = [
  { timestamp: new Date(), severity_text: 'INFO', body: 'Log 1' },
  { timestamp: new Date(), severity_text: 'INFO', body: 'Log 2' },
  // ... 100 logs
];

await client.insert({
  table: 'telemetryflow_db.logs',
  values: logs,
  format: 'JSONEachRow',
});

// Bad - Individual inserts (slow!)
for (const log of logs) {
  await client.insert({
    table: 'telemetryflow_db.logs',
    values: [log],
    format: 'JSONEachRow',
  });
}
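
When many small producers make client-side batching impractical, ClickHouse can batch on the server instead. A sketch using asynchronous inserts (async_insert and wait_for_async_insert are standard ClickHouse settings passed per request; check the durability trade-offs before enabling):

// Server-side batching: ClickHouse buffers small inserts and flushes them
// as one part. wait_for_async_insert: 1 blocks until the flush completes.
await client.insert({
  table: 'telemetryflow_db.logs',
  values: logs,
  format: 'JSONEachRow',
  clickhouse_settings: {
    async_insert: 1,
    wait_for_async_insert: 1,
  },
});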

Materialized Views

Use materialized views for pre-aggregated data:

-- Query pre-aggregated 1-minute metrics (fast)
SELECT * FROM telemetryflow_db.metrics_1m
WHERE timestamp_1m >= now() - INTERVAL 1 HOUR;

-- Instead of aggregating raw data (slow)
SELECT toStartOfMinute(timestamp), avg(value)
FROM telemetryflow_db.metrics
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY toStartOfMinute(timestamp);

Available Materialized Views

| Table | Materialized View | Purpose |
| --- | --- | --- |
| audit_logs | audit_logs_stats | Event statistics by type and result |
| audit_logs | audit_logs_user_activity | User activity summary |
| logs | logs_stats | Log statistics by service and severity |
| logs | logs_errors | Error logs only (severity >= 17) |
| metrics | metrics_1m | 1-minute aggregations |
| metrics | metrics_1h | 1-hour aggregations |
| traces | traces_stats | Trace statistics by service |
| traces | traces_errors | Error traces only |

Monitoring

Check Table Sizes

SELECT
  table,
  formatReadableSize(sum(bytes)) AS size,
  sum(rows) AS rows
FROM system.parts
WHERE database = 'telemetryflow_db'
  AND active
GROUP BY table
ORDER BY sum(bytes) DESC;

Check Partitions

SELECT
  table,
  partition,
  sum(rows) AS rows,
  formatReadableSize(sum(bytes)) AS size
FROM system.parts
WHERE database = 'telemetryflow_db'
  AND active
GROUP BY table, partition
ORDER BY table, partition DESC;

TTL Status

SELECT
  table,
  partition,
  min(min_date) AS oldest_data,
  max(max_date) AS newest_data
FROM system.parts
WHERE database = 'telemetryflow_db'
  AND active
GROUP BY table, partition
ORDER BY table, oldest_data;

Troubleshooting

ClickHouse Container Unhealthy

Error: container telemetryflow_core_clickhouse is unhealthy

Cause: Old incompatible data from ClickHouse version < 20.7

Solution:

# Stop container
docker stop telemetryflow_core_clickhouse

# Clean data directory
sudo rm -rf /opt/data/docker/telemetryflow-core/clickhouse/*

# Recreate directories with proper permissions
sudo mkdir -p /opt/data/docker/telemetryflow-core/clickhouse/{data,logs}
sudo chown -R 101:101 /opt/data/docker/telemetryflow-core/clickhouse
sudo chmod -R 777 /opt/data/docker/telemetryflow-core/clickhouse

# Start container
docker start telemetryflow_core_clickhouse

# Wait for healthy status
sleep 10 && docker ps --filter name=clickhouse

# Re-run migrations
pnpm db:migrate:clickhouse

Migrations Not Running

  1. Check ClickHouse is running:

    docker ps | grep clickhouse
  2. Check connection:

    docker exec telemetryflow_core_clickhouse clickhouse-client --query "SELECT 1"
  3. Verify environment variables:

    grep CLICKHOUSE_ .env
  4. Run migrations manually:

    pnpm db:migrate:clickhouse

Tables Not Appearing

  1. Check migrations ran successfully:

    docker exec telemetryflow_core_clickhouse clickhouse-client \
      --query "SHOW TABLES FROM telemetryflow_db"
  2. Expected tables:

    • audit_logs, audit_logs_stats, audit_logs_user_activity
    • logs, logs_stats, logs_errors
    • metrics, metrics_1m, metrics_1h
    • traces, traces_stats, traces_errors
  3. If missing, re-run migrations:

    pnpm db:migrate:clickhouse

Permission Denied Errors

Error: mkdir: cannot create directory '/var/lib/clickhouse/': Permission denied

Solution:

# Fix directory permissions
sudo chown -R 101:101 /opt/data/docker/telemetryflow-core/clickhouse
sudo chmod -R 777 /opt/data/docker/telemetryflow-core/clickhouse

# Restart container
docker restart telemetryflow_core_clickhouse

High Memory Usage

  • Check table sizes (see Monitoring section)
  • Verify TTL is working (old data being deleted)
  • Consider reducing retention periods in migrations (see the TTL sketch below)
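
For an existing table, retention can also be tightened in place instead of re-running a migration. A sketch using ALTER TABLE ... MODIFY TTL; the table and interval shown are illustrative:

// Shrinks logs retention from 30 to 14 days; parts that are now expired
// are removed on the next TTL merge.
await client.command({
  query: `
    ALTER TABLE telemetryflow_db.logs
    MODIFY TTL toDateTime(timestamp) + INTERVAL 14 DAY
  `,
});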

Slow Queries

  • Use materialized views for aggregations
  • Add appropriate data-skipping indexes
  • Partition tables by time for better pruning
  • Use a LIMIT clause to restrict result sets (see the sketch below)
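
Putting the last two tips together, a query can cap both result size and server-side runtime; max_execution_time is a standard ClickHouse setting, in seconds:

// Caps the result size (LIMIT) and the server-side runtime (10 s).
const recent = await client.query({
  query: `
    SELECT timestamp, service_name, body
    FROM telemetryflow_db.logs
    WHERE timestamp >= now() - INTERVAL 1 HOUR
    ORDER BY timestamp DESC
    LIMIT 100
  `,
  format: 'JSONEachRow',
  clickhouse_settings: { max_execution_time: 10 },
});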

Resources

Migration & Seed Documentation


  • Last Updated: 2025-12-05
  • Retention Policies: Audit Logs (90d), Logs (30d), Metrics (90d), Traces (7d)