aragossa/pii-shield

PII-Shield 🛡️

Zero-code log sanitization sidecar for Kubernetes. Prevents data leaks (GDPR/SOC2) by redacting PII from logs before they leave the pod.

"Don't let PII poison your AI models." PII-Shield ensures that sensitive data never reaches your training dataset, saving you from GDPR-forced model retraining.

Warning

Upgrading to v2.0.0? We have moved entirely to a Helm-based distribution and Distroless Native Sidecars. Kustomize deployment and /bin/sh access inside the sidecar are no longer supported. Read the Migration Guide.

Two Deployment Models

PII-Shield offers two distinct ways to integrate into your stack:

  1. Kubernetes Operator (Zero-code): Our flagship deployment model. A fully automated K8s Operator that injects a highly-secure Distroless Sidecar into your pods to intercept and sanitize logs on the fly.
  2. In-Process WASM (For core integrations): For extreme performance, the core engine can be embedded directly via WASM, providing <1ms latency without network hops.

Why PII-Shield?

Developers often forget to mask sensitive data. Traditional regex filters in Fluentd/Logstash are slow, hard to maintain, and consume expensive CPU on log aggregators.

PII-Shield sits right next to your app container:

  • Production Ready: Optimized for Kubernetes sidecars with ultra-low memory allocations (zero-GC overhead on hot paths) and deterministic O(1) regex matching.
  • Context-Aware Entropy Analysis: Detects high-entropy secrets even without keys (e.g. Error: ... 44saCk9...) by analyzing context keywords.
  • Custom Regex Rules: Deterministic redaction for structured data (UUIDs, IDs) that overrides entropy checks, ensuring 100% compliance for known patterns.
  • 100% Accuracy: Verified against the project's "Wild" stress tests, which include binary garbage, nested JSON, and multilingual logs.
  • Deterministic Hashing: Replaces secrets with unique hashes (e.g., [HIDDEN:a1b2c]), allowing QA to correlate errors without seeing the raw data.
  • Drop-in: No code changes required. Works with any language (Node, Python, Java, Go).
  • Whitelist Support: Explicitly allow safe patterns (e.g., git hashes, system IDs) using PII_SAFE_REGEX_LIST to prevent false positives.
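
The deterministic hashing idea above can be illustrated with a short Go sketch. It is not the project's implementation: the use of HMAC-SHA256, the 6-character truncation, and the function name redactToken are assumptions made for illustration. Only the HMAC salt (PII_SALT) and the [HIDDEN:...] output shape come from this README.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// redactToken replaces a secret with a short, salted, deterministic tag.
// Assumption: HMAC-SHA256 truncated to 6 hex characters; the real tool's
// hash function and tag length may differ.
func redactToken(salt []byte, secret string) string {
	mac := hmac.New(sha256.New, salt)
	mac.Write([]byte(secret))
	digest := hex.EncodeToString(mac.Sum(nil))
	return "[HIDDEN:" + digest[:6] + "]"
}

func main() {
	// Hypothetical salt; a production setup would take it from PII_SALT.
	salt := []byte("example-salt")

	first := redactToken(salt, "MySecretPass123!")
	second := redactToken(salt, "MySecretPass123!")

	// The same secret and salt always produce the same tag, so two log
	// lines can be correlated without exposing the raw value.
	fmt.Println(first, first == second)
}
```

Because the tag depends on the salt, changing the salt changes every tag, which is one reason a stable, private salt matters in production.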

Managing PII-Shield across dozens of clusters?

We are building a hosted Control Plane with centralized rule management, Slack alerting, and redaction analytics. Join the Waitlist

Trusted By

GuardSpine (AI Governance Kernel) integrated PII-Shield's In-Process WASM to sanitize sensitive evidence trails directly within their Node.js and Python agents.

We chose the WASM architecture to ensure zero network overhead and <1ms latency. PII-Shield runs directly in-process, preserving the referential integrity of our hash chains while keeping logs compliant.

Performance Considerations

While PII-Shield is highly optimized, deep inspection of complex logs requires careful attention to configuration.

  • Text Logs: Extremely fast (>100k lines/s).
  • JSON Logs: Zero-allocation parsing (no encoding/json overhead). The scanner manually parses JSON structures to ensure high throughput (~7MB/s) without memory spikes.
  • Recommendation: Safe to run on high-throughput workloads. Recursion safeguards prevent stack overflows on deeply nested JSON.
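
One way a nesting safeguard could work is to check depth while scanning bytes, without building a parsed tree. The Go sketch below is illustrative only: the function name exceedsDepth and the limit values are hypothetical, and the project's scanner may enforce its limit differently.

```go
package main

import "fmt"

// exceedsDepth reports whether JSON-like input nests objects or arrays
// deeper than limit. It scans bytes directly and skips string contents,
// so braces inside quoted values are not counted.
func exceedsDepth(data []byte, limit int) bool {
	depth := 0
	inString := false
	escaped := false
	for _, b := range data {
		if inString {
			switch {
			case escaped:
				escaped = false // the escaped byte is consumed as-is
			case b == '\\':
				escaped = true
			case b == '"':
				inString = false
			}
			continue
		}
		switch b {
		case '"':
			inString = true
		case '{', '[':
			depth++
			if depth > limit {
				return true
			}
		case '}', ']':
			if depth > 0 {
				depth--
			}
		}
	}
	return false
}

func main() {
	fmt.Println(exceedsDepth([]byte(`{"a":{"b":{"c":1}}}`), 2)) // true: three levels of nesting
	fmt.Println(exceedsDepth([]byte(`{"a":"{{{{"}`), 2))         // false: braces inside a string are ignored
}
```

A scanner can run a check like this first, or keep the same counter during its own descent, and fall back to treating the line as plain text once the limit is hit.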

Installation

Helm Chart (Kubernetes Operator)

The official and recommended way to deploy PII-Shield in Kubernetes is via our fully-automated Operator:

helm repo add pii-shield https://aragossa.github.io/pii-shield/
helm repo update
helm install pii-shield-operator pii-shield/pii-shield-operator -n operator-system --create-namespace

This deploys the PII-Shield Operator which automatically injects highly-secure, distroless sidecars into your Pods without requiring any code or Dockerfile changes.

Docker

Get the latest lightweight image from Docker Hub or GHCR:

docker pull thelisdeep/pii-shield:v2.0.0
# OR from GitHub Container Registry (Enterprise):
docker pull ghcr.io/aragossa/pii-shield:v2.0.0

Build from Source

You can build the binary directly from the source code:

go build -o pii-shield ./cmd/cleaner/main.go

Configuration

See CONFIGURATION.md for a full list of environment variables, including:

  • PII_SALT: Custom HMAC salt (Required for production).
  • PII_ADAPTIVE_THRESHOLD: Enable dynamic entropy baselines.
  • PII_DISABLE_BIGRAM_CHECK: Optimize for non-English logs.
  • PII_CUSTOM_REGEX_LIST: Custom regex rules for deterministic redaction.
  • PII_SAFE_REGEX_LIST: Whitelist regex rules to ignore (matches are returned as-is).
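
To show how safe-list and custom rules might interact with the entropy check, here is a minimal Go sketch. The rule patterns, the classify function, and the order in which rules are checked (safe list first, then custom rules, then entropy) are all assumptions for illustration. The actual variable format and precedence are described in CONFIGURATION.md.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical rule sets. A real deployment would load these from
// PII_SAFE_REGEX_LIST and PII_CUSTOM_REGEX_LIST.
var (
	// Safe list: 40-character hex strings such as git commit hashes.
	safeRules = []*regexp.Regexp{
		regexp.MustCompile(`^[0-9a-f]{40}$`),
	}
	// Custom rules: UUIDs are always redacted, regardless of entropy.
	customRules = []*regexp.Regexp{
		regexp.MustCompile(`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`),
	}
)

// classify decides what to do with a token. The precedence shown here
// is an assumption, not confirmed behavior of PII-Shield.
func classify(token string) string {
	for _, re := range safeRules {
		if re.MatchString(token) {
			return "keep"
		}
	}
	for _, re := range customRules {
		if re.MatchString(token) {
			return "redact"
		}
	}
	return "entropy-check"
}

func main() {
	fmt.Println(classify("da39a3ee5e6b4b0d3255bfef95601890afd80709")) // keep
	fmt.Println(classify("123e4567-e89b-12d3-a456-426614174000"))     // redact
	fmt.Println(classify("hello"))                                    // entropy-check
}
```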

Entropy Sensitivity Table (Default Threshold: 3.6)

| Entropy | Data Type | Example |
| --- | --- | --- |
| 0.0 - 3.0 | Common words, repeats | password, admin, 111111 |
| 3.0 - 3.6 | CamelCase, partial hashes | ProgramCampaignInstanceJob, 8f3a11b2c |
| 3.6 - 4.5 | Paths, UUIDs, Weak Passwords | /opt/application/runtime, P@ssw0rd2026! |
| 4.5 - 5.0 | Medium Tokens | E8s9d_2kL1 |
| 5.0+ | High Entropy Keys | SHA-256, API Keys |
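
For intuition about the scores in the table above, the following Go sketch computes plain Shannon entropy in bits per character. This is an assumption about the underlying measure: PII-Shield's scoring may also take context keywords, bigram checks, and adaptive baselines into account, so values from this sketch will not necessarily match the table. The function name shannonEntropy is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy of s in bits per character,
// computed from the frequency of each rune in the string.
func shannonEntropy(s string) float64 {
	counts := make(map[rune]int)
	total := 0
	for _, r := range s {
		counts[r]++
		total++
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	const threshold = 3.6 // default threshold from the table above

	for _, tok := range []string{"111111", "password"} {
		h := shannonEntropy(tok)
		fmt.Printf("%-10s entropy=%.2f flagged=%v\n", tok, h, h > threshold)
	}
	// 111111   scores 0.00 (a single repeated character)
	// password scores 2.75 (seven distinct characters, "s" repeated)
}
```

Both sample values fall in the 0.0 - 3.0 band, below the default threshold, so neither would be flagged on entropy alone.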

Quick Start

  1. Test Locally (CLI): You can pipe any log output through PII-Shield to see it in action immediately:
# Emulate a log with a sensitive password
echo "Error: User password=MySecretPass123! failed login" | docker run -i --rm ghcr.io/aragossa/pii-shield:v2.0.0

# Output: Error: User password=[HIDDEN:8f3a11] failed login
  2. Kubernetes (Automated Sidecar Injection): With the PII-Shield Operator installed, protecting an application is as simple as creating a PiiPolicy and labeling your Pods.

Create a Policy:

apiVersion: core.pii-shield.io/v1alpha1
kind: PiiPolicy
metadata:
  name: strict-policy
  namespace: default
spec:
  injectionMode: "file"

Label your Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
spec:
  template:
    metadata:
      labels:
        pii-shield.io/inject: "true"
      annotations:
        pii-shield.io/policy: "strict-policy"
# ...

The Operator will automatically inject the pii-shield-agent using the Native Sidecar pattern (K8s 1.28+) and securely mask all logs!

Verification

This project is verified with a comprehensive testing suite, ensuring production-readiness for v2.0.0:

  1. Unit Tests: Cover edge cases, multilingual support, and JSON integrity with >85% coverage.
  2. Fuzzing: Native Go fuzzing ensures crash safety against invalid and random binary inputs.
  3. Smoke Testing: ./run_smoke.sh validates 100% detection accuracy on mixed workloads.
  4. End-to-End (E2E) Testing: The operator/tests/run_e2e.sh suite performs full-stack validation using Minikube and Helm. It builds local images, provisions the Operator without cert-manager, deploys target Jobs, and verifies actual log redaction by intercepting sidecar outputs.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.