Zero-code log sanitization sidecar for Kubernetes. Prevents data leaks (GDPR/SOC2) by redacting PII from logs before they leave the pod.
"Don't let PII poison your AI models." PII-Shield ensures that sensitive data never reaches your training dataset, saving you from GDPR-forced model retraining.
Warning
Upgrading to v2.0.0?
We have moved entirely to a Helm-based distribution and Distroless Native Sidecars. Kustomize deployment and /bin/sh access inside the sidecar are no longer supported. Read the Migration Guide.
PII-Shield offers two distinct ways to integrate into your stack:
- Kubernetes Operator (Zero-code): Our flagship deployment model. A fully automated K8s Operator that injects a highly-secure Distroless Sidecar into your pods to intercept and sanitize logs on the fly.
- In-Process WASM (For core integrations): For extreme performance, the core engine can be embedded directly via WASM, providing
<1mslatency without network hops.
Developers often forget to mask sensitive data. Traditional regex filters in Fluentd/Logstash are slow, hard to maintain, and consume expensive CPU on log aggregators.
PII-Shield sits right next to your app container:
- Production Ready: Optimized for Kubernetes sidecars with ultra-low memory allocations (zero-GC overhead on hot paths) and deterministic O(1) regex matching.
- Context-Aware Entropy Analysis: Detected high-entropy secrets even without keys (e.g.
Error: ... 44saCk9...) by analyzing context keywords. - Custom Regex Rules: Deterministic redaction for structured data (UUIDs, IDs) that overrides entropy checks, ensuring 100% compliance for known patterns.
- 100% Accuracy: Verified against "Wild" stress tests including binary garbage, JSON nesting, and multilingual logs.
- Deterministic Hashing: Replaces secrets with unique hashes (e.g.,
[HIDDEN:a1b2c]), allowing QA to correlate errors without seeing the raw data. - Drop-in: No code changes required. Works with any language (Node, Python, Java, Go).
- Whitelist Support: Explicitly allow safe patterns (e.g., git hashes, system IDs) using
PII_SAFE_REGEX_LISTto prevent false positives.
We are building a hosted Control Plane with centralized rule management, Slack alerting, and redaction analytics.
GuardSpine (AI Governance Kernel) integrated PII-Shield's In-Process WASM to sanitize sensitive evidence trails directly within their Node.js and Python agents.
We chose the WASM architecture to ensure zero network overhead and <1ms latency. PII-Shield runs directly in-process, preserving the referential integrity of our hash chains while keeping logs compliant.
While PII-Shield is highly optimized, deep inspection of complex logs requires careful attention to configuration.
- Text Logs: Extremely fast (>100k lines/s).
- JSON Logs: Zero-allocation parsing (no
encoding/jsonoverhead). The scanner manually parses JSON structures to ensure high throughput (~7MB/s) without memory spikes. - Recommendation: Usage is safe for high throughput. We use recursion safeguards to prevent stack overflows on deeply nested JSON.
The official and recommended way to deploy PII-Shield in Kubernetes is via our fully-automated Operator:
helm repo add pii-shield https://aragossa.github.io/pii-shield/
helm repo update
helm install pii-shield-operator pii-shield/pii-shield-operator -n operator-system --create-namespaceThis deploys the PII-Shield Operator which automatically injects highly-secure, distroless sidecars into your Pods without requiring any code or Dockerfile changes.
Get the latest lightweight image from Docker Hub or GHCR:
docker pull thelisdeep/pii-shield:v2.0.0
# OR from GitHub Container Registry (Enterprise):
docker pull ghcr.io/aragossa/pii-shield:v2.0.0You can build the binary directly from the source code:
go build -o pii-shield ./cmd/cleaner/main.goSee CONFIGURATION.md for a full list of environment variables, including:
PII_SALT: Custom HMAC salt (Required for production).PII_ADAPTIVE_THRESHOLD: Enable dynamic entropy baselines.PII_DISABLE_BIGRAM_CHECK: Optimize for non-English logs.PII_CUSTOM_REGEX_LIST: Custom regex rules for deterministic redaction.PII_SAFE_REGEX_LIST: Whitelist regex rules to ignore (matches are returned as-is).
| Entropy | Data Type | Example |
|---|---|---|
| 0.0 - 3.0 | Common words, repeats | password, admin, 111111 |
| 3.0 - 3.6 | CamelCase, partial hashes | ProgramCampaignInstanceJob, 8f3a11b2c |
| 3.6 - 4.5 | Paths, UUIDs, Weak Passwords | /opt/application/runtime, P@ssw0rd2026! |
| 4.5 - 5.0 | Medium Tokens | E8s9d_2kL1 |
| 5.0+ | High Entropy Keys | (SHA-256, API Keys) |
- Test Locally (CLI) You can pipe any log output through PII-Shield to see it in action immediately:
# Emulate a log with a sensitive password
echo "Error: User password=MySecretPass123! failed login" | docker run -i --rm ghcr.io/aragossa/pii-shield:v2.0.0
# Output: Error: User password=[HIDDEN:8f3a11] failed login- Kubernetes (Automated Sidecar Injection)
With the PII-Shield Operator installed, protecting an application is as simple as creating a
PiiPolicyand labeling your Pods.
Create a Policy:
apiVersion: core.pii-shield.io/v1alpha1
kind: PiiPolicy
metadata:
name: strict-policy
namespace: default
spec:
injectionMode: "file"Label your Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: secure-app
spec:
template:
metadata:
labels:
pii-shield.io/inject: "true"
annotations:
pii-shield.io/policy: "strict-policy"
# ...The Operator will automatically inject the pii-shield-agent using the Native Sidecar pattern (K8s 1.28+) and securely mask all logs!
This project is verified with a comprehensive testing suite, ensuring production-readiness for v2.0.0:
- Unit Tests: Cover edge cases, multilingual support, and JSON integrity with >85% coverage.
- Fuzzing: Native Go fuzzing ensures crash safety against invalid and random binary inputs.
- Smoke Testing:
./run_smoke.shvalidates 100% detection accuracy on mixed workloads. - End-to-End (E2E) Testing: The
operator/tests/run_e2e.shsuite performs full-stack validation using Minikube and Helm. It builds local images, provisions the Operator without cert-manager, deploys target Jobs, and verifies actual log redaction by intercepting sidecar outputs.
Distributed under the Apache 2.0 License. See LICENSE for more information.