PII-Shield 🛡️

Zero-code log sanitization sidecar for Kubernetes. Prevents data leaks (GDPR/SOC2) by redacting PII from logs before they leave the pod.

"Don't let PII poison your AI models." PII-Shield ensures that sensitive data never reaches your training dataset, saving you from GDPR-forced model retraining.

Warning

Upgrading to v2.0.0? We have moved entirely to a Helm-based distribution and Distroless Native Sidecars. Kustomize deployment and /bin/sh access inside the sidecar are no longer supported. Read the Migration Guide.

Two Deployment Models

PII-Shield offers two distinct ways to integrate into your stack:

Kubernetes Operator (Zero-code): Our flagship deployment model. A fully automated K8s Operator that injects a highly-secure Distroless Sidecar into your pods to intercept and sanitize logs on the fly.
In-Process WASM (For core integrations): For extreme performance, the core engine can be embedded directly via WASM, providing <1ms latency without network hops.

Why PII-Shield?

Developers often forget to mask sensitive data. Traditional regex filters in Fluentd/Logstash are slow, hard to maintain, and consume expensive CPU on log aggregators.

PII-Shield sits right next to your app container:

Production Ready: Optimized for Kubernetes sidecars with ultra-low memory allocations (zero-GC overhead on hot paths) and deterministic O(1) regex matching.
Context-Aware Entropy Analysis: Detected high-entropy secrets even without keys (e.g. Error: ... 44saCk9...) by analyzing context keywords.
Custom Regex Rules: Deterministic redaction for structured data (UUIDs, IDs) that overrides entropy checks, ensuring 100% compliance for known patterns.
100% Accuracy: Verified against "Wild" stress tests including binary garbage, JSON nesting, and multilingual logs.
Deterministic Hashing: Replaces secrets with unique hashes (e.g., [HIDDEN:a1b2c]), allowing QA to correlate errors without seeing the raw data.
Drop-in: No code changes required. Works with any language (Node, Python, Java, Go).
Whitelist Support: Explicitly allow safe patterns (e.g., git hashes, system IDs) using PII_SAFE_REGEX_LIST to prevent false positives.

Managing PII-Shield across dozens of clusters?

We are building a hosted Control Plane with centralized rule management, Slack alerting, and redaction analytics.

Trusted By

GuardSpine (AI Governance Kernel) integrated PII-Shield's In-Process WASM to sanitize sensitive evidence trails directly within their Node.js and Python agents.

We chose the WASM architecture to ensure zero network overhead and <1ms latency. PII-Shield runs directly in-process, preserving the referential integrity of our hash chains while keeping logs compliant.

Performance Considerations

While PII-Shield is highly optimized, deep inspection of complex logs requires careful attention to configuration.

Text Logs: Extremely fast (>100k lines/s).
JSON Logs: Zero-allocation parsing (no encoding/json overhead). The scanner manually parses JSON structures to ensure high throughput (~7MB/s) without memory spikes.
Recommendation: Usage is safe for high throughput. We use recursion safeguards to prevent stack overflows on deeply nested JSON.

Installation

Helm Chart (Kubernetes Operator)

The official and recommended way to deploy PII-Shield in Kubernetes is via our fully-automated Operator:

helm repo add pii-shield https://aragossa.github.io/pii-shield/
helm repo update
helm install pii-shield-operator pii-shield/pii-shield-operator -n operator-system --create-namespace

This deploys the PII-Shield Operator which automatically injects highly-secure, distroless sidecars into your Pods without requiring any code or Dockerfile changes.

Docker

Get the latest lightweight image from Docker Hub or GHCR:

docker pull thelisdeep/pii-shield:v2.0.0
# OR from GitHub Container Registry (Enterprise):
docker pull ghcr.io/aragossa/pii-shield:v2.0.0

Build from Source

You can build the binary directly from the source code:

go build -o pii-shield ./cmd/cleaner/main.go

Configuration

See CONFIGURATION.md for a full list of environment variables, including:

PII_SALT: Custom HMAC salt (Required for production).
PII_ADAPTIVE_THRESHOLD: Enable dynamic entropy baselines.
PII_DISABLE_BIGRAM_CHECK: Optimize for non-English logs.
PII_CUSTOM_REGEX_LIST: Custom regex rules for deterministic redaction.
PII_SAFE_REGEX_LIST: Whitelist regex rules to ignore (matches are returned as-is).

Entropy Sensitivity Table (Default Threshold: 3.6)

Entropy	Data Type	Example
0.0 - 3.0	Common words, repeats	`password`, `admin`, `111111`
3.0 - 3.6	CamelCase, partial hashes	`ProgramCampaignInstanceJob`, `8f3a11b2c`
3.6 - 4.5	Paths, UUIDs, Weak Passwords	`/opt/application/runtime`, `P@ssw0rd2026!`
4.5 - 5.0	Medium Tokens	`E8s9d_2kL1`
5.0+	High Entropy Keys	(SHA-256, API Keys)

Quick Start

Test Locally (CLI) You can pipe any log output through PII-Shield to see it in action immediately:

# Emulate a log with a sensitive password
echo "Error: User password=MySecretPass123! failed login" | docker run -i --rm ghcr.io/aragossa/pii-shield:v2.0.0

# Output: Error: User password=[HIDDEN:8f3a11] failed login

Kubernetes (Automated Sidecar Injection) With the PII-Shield Operator installed, protecting an application is as simple as creating a PiiPolicy and labeling your Pods.

Create a Policy:

apiVersion: core.pii-shield.io/v1alpha1
kind: PiiPolicy
metadata:
  name: strict-policy
  namespace: default
spec:
  injectionMode: "file"

Label your Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
spec:
  template:
    metadata:
      labels:
        pii-shield.io/inject: "true"
      annotations:
        pii-shield.io/policy: "strict-policy"
# ...

The Operator will automatically inject the pii-shield-agent using the Native Sidecar pattern (K8s 1.28+) and securely mask all logs!

Verification

This project is verified with a comprehensive testing suite, ensuring production-readiness for v2.0.0:

Unit Tests: Cover edge cases, multilingual support, and JSON integrity with >85% coverage.
Fuzzing: Native Go fuzzing ensures crash safety against invalid and random binary inputs.
Smoke Testing: ./run_smoke.sh validates 100% detection accuracy on mixed workloads.
End-to-End (E2E) Testing: The operator/tests/run_e2e.sh suite performs full-stack validation using Minikube and Helm. It builds local images, provisions the Operator without cert-manager, deploys target Jobs, and verifies actual log redaction by intercepting sidecar outputs.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

Name		Name	Last commit message	Last commit date
Latest commit History 113 Commits
.github/workflows		.github/workflows
benchmark		benchmark
charts		charts
cmd		cmd
operator		operator
pkg		pkg
sdks		sdks
.codecov.yml		.codecov.yml
.gitignore		.gitignore
.goreleaser.yaml		.goreleaser.yaml
CONFIGURATION.md		CONFIGURATION.md
Dockerfile		Dockerfile
Dockerfile.agent		Dockerfile.agent
LICENSE		LICENSE
MANUAL_TESTING.md		MANUAL_TESTING.md
README.md		README.md
artifacthub-repo.yml		artifacthub-repo.yml
go.mod		go.mod
go.sum		go.sum
run_smoke.sh		run_smoke.sh
run_unit.sh		run_unit.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PII-Shield 🛡️

Two Deployment Models

Why PII-Shield?

Managing PII-Shield across dozens of clusters?

Trusted By

Performance Considerations

Installation

Helm Chart (Kubernetes Operator)

Docker

Build from Source

Configuration

Entropy Sensitivity Table (Default Threshold: 3.6)

Quick Start

Verification

License

About

Uh oh!

Releases 26

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PII-Shield 🛡️

Two Deployment Models

Why PII-Shield?

Managing PII-Shield across dozens of clusters?

Trusted By

Performance Considerations

Installation

Helm Chart (Kubernetes Operator)

Docker

Build from Source

Configuration

Entropy Sensitivity Table (Default Threshold: 3.6)

Quick Start

Verification

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 26

Contributors

Uh oh!

Languages