Deployment Guide

This guide covers deployment options for Cyber-AutoAgent in various environments.

Invocation Methods

Cyber-AutoAgent supports four invocation methods, each suited to different use cases:

1. Python CLI (Direct Execution)

Best for: Automation, scripting, CI/CD pipelines

# Configure via environment variables
export AZURE_API_KEY="your_key"
export AZURE_API_BASE="https://your-endpoint.openai.azure.com/"
export AZURE_API_VERSION="2024-12-01-preview"
export CYBER_AGENT_LLM_MODEL="azure/gpt-5"
export CYBER_AGENT_EMBEDDING_MODEL="azure/text-embedding-3-large"
export REASONING_EFFORT="medium"
export KMP_DUPLICATE_LIB_OK="TRUE"

# Run with uv (recommended)
uv run python src/cyberautoagent.py \
  --target "https://example.com" \
  --objective "Bug bounty assessment" \
  --iterations 150 \
  --provider litellm
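
For scripted or CI/CD use, the same invocation can be assembled programmatically. The following is a minimal sketch, not part of Cyber-AutoAgent: `build_command` and `run_assessment` are hypothetical helper names, and only the flags shown above are used.

```python
import subprocess


def build_command(target: str, objective: str, iterations: int = 150,
                  provider: str = "litellm") -> list[str]:
    """Assemble the argv for the CLI invocation shown above (hypothetical helper)."""
    return [
        "uv", "run", "python", "src/cyberautoagent.py",
        "--target", target,
        "--objective", objective,
        "--iterations", str(iterations),
        "--provider", provider,
    ]


def run_assessment(target: str, objective: str) -> None:
    """Run the agent; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(target, objective), check=True)
```

Passing the arguments as a list avoids shell quoting problems with objectives that contain spaces, and `check=True` lets a CI job fail when the agent exits with an error.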

2. NPM Auto-Run (Config File)

Best for: Repeated testing with saved config, development

# Uses ~/.cyber-autoagent/config.json for settings
cd src/modules/interfaces/react
npm start -- --auto-run \
  --target "https://example.com" \
  --objective "Security assessment" \
  --iterations 50

Configure via ~/.cyber-autoagent/config.json:

{
  "modelProvider": "litellm",
  "modelId": "azure/gpt-5",
  "embeddingModel": "azure/text-embedding-3-large",
  "azureApiKey": "your_key",
  "azureApiBase": "https://your-endpoint.openai.azure.com/",
  "azureApiVersion": "2024-12-01-preview",
  "reasoningEffort": "medium"
}
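
Because this file holds an API key, it may be worth creating it with restrictive permissions. The sketch below is one way to do that from Python; `write_config` is a hypothetical helper, and the setting names are simply the ones shown in the example above.

```python
import json
import os
from pathlib import Path


def write_config(settings: dict, path: Path) -> Path:
    """Write settings as JSON; a newly created file gets mode 0o600 (hypothetical helper)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The mode only applies when the file is created; an existing file keeps
    # its current permissions. On Windows the mode is largely ignored.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(settings, fh, indent=2)
    return path
```

A typical call would pass `Path.home() / ".cyber-autoagent" / "config.json"` as the path.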

3. Docker (Standalone Container)

Best for: Isolated environments, clean tooling, reproducibility

With Interactive React Terminal:

docker run -it --rm \
  -e AZURE_API_KEY=your_key \
  -e AZURE_API_BASE=https://your-endpoint.openai.azure.com/ \
  -e CYBER_AGENT_LLM_MODEL=azure/gpt-5 \
  -v $(pwd)/outputs:/app/outputs \
  cyber-autoagent:latest

Direct Python Execution (Override Entrypoint):

docker run --rm --entrypoint python \
  -e AZURE_API_KEY=your_key \
  -e AZURE_API_BASE=https://your-endpoint.openai.azure.com/ \
  -e AZURE_API_VERSION=2024-12-01-preview \
  -e CYBER_AGENT_LLM_MODEL=azure/gpt-5 \
  -e CYBER_AGENT_EMBEDDING_MODEL=azure/text-embedding-3-large \
  -e REASONING_EFFORT=medium \
  -v $(pwd)/outputs:/app/outputs \
  cyber-autoagent:latest \
  src/cyberautoagent.py \
  --target https://example.com \
  --objective "Security assessment" \
  --iterations 50 \
  --provider litellm

4. Docker Compose (Full Stack)

Best for: Observability, team deployments, production monitoring

# Uses docker/.env for configuration
docker compose -f docker/docker-compose.yml up -d

Universal Provider Support

Cyber-AutoAgent supports 300+ LLM providers via LiteLLM. Examples:

Azure OpenAI:

-e AZURE_API_KEY=your_key
-e AZURE_API_BASE=https://your-endpoint.openai.azure.com/
-e AZURE_API_VERSION=2024-12-01-preview
-e CYBER_AGENT_LLM_MODEL=azure/gpt-5
-e CYBER_AGENT_EMBEDDING_MODEL=azure/text-embedding-3-large

AWS Bedrock:

-e AWS_ACCESS_KEY_ID=your_key
-e AWS_SECRET_ACCESS_KEY=your_secret
-e CYBER_AGENT_LLM_MODEL=us.anthropic.claude-sonnet-4-5-20250929-v1:0
-e CYBER_AGENT_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0

OpenRouter:

-e OPENROUTER_API_KEY=your_key
-e CYBER_AGENT_LLM_MODEL=openrouter/openrouter/polaris-alpha
-e CYBER_AGENT_EMBEDDING_MODEL=azure/text-embedding-3-large

Moonshot AI:

-e MOONSHOT_API_KEY=your_key
-e OPENAI_API_KEY=your_key  # Required for Mem0 OpenAI-compatible providers
-e CYBER_AGENT_LLM_MODEL=moonshot/kimi-k2-thinking
-e CYBER_AGENT_EMBEDDING_MODEL=azure/text-embedding-3-large
-e MEM0_LLM_MODEL=azure/gpt-4o  # Memory system LLM (use Azure/Anthropic/Bedrock for Mem0)
-e AZURE_API_KEY=azure_key  # Required for embeddings and Mem0
-e AZURE_API_BASE=https://your-endpoint.openai.azure.com/
-e AZURE_API_VERSION=2024-12-01-preview

Note: When using OpenAI-compatible providers (Moonshot, OpenRouter, etc.) with Mem0, you must:

Set OPENAI_API_KEY to the provider's API key for Mem0 compatibility
Use a supported Mem0 provider (Azure, OpenAI, Anthropic, Bedrock) for MEM0_LLM_MODEL

Mixed Providers: You can combine any LLM with any embedding model!

Quick Start

Using Docker

# Clone the repository
git clone https://github.com/double16/Cyber-AutoAgent-ng.git
cd Cyber-AutoAgent-ng

# Build and run with Docker Compose (includes observability)
cd docker
docker compose up -d

# Run a penetration test
docker run --rm \
  --network cyber-autoagent_default \
  -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
  -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
  -e LANGFUSE_HOST=http://langfuse-web:3000 \
  -e LANGFUSE_PUBLIC_KEY=cyber-public \
  -e LANGFUSE_SECRET_KEY=cyber-secret \
  -v $(pwd)/outputs:/app/outputs \
  cyber-autoagent \
  --target "example.com" \
  --objective "Web application security assessment"

Standalone Docker

For just the agent without observability:

# Build the image
# (optional) docker build --pull -f docker/Dockerfile.tools -t public.ecr.aws/bramblethorn/cyber-autoagent-ng/tools:latest .
docker build -t cyber-autoagent -f docker/Dockerfile .

# Run with AWS Bedrock
docker run --rm \
  -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
  -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
  -e AWS_REGION=${AWS_REGION:-us-east-1} \
  -v $(pwd)/outputs:/app/outputs \
  cyber-autoagent \
  --target "192.168.1.100" \
  --objective "Network security assessment" \
  --provider bedrock

# Run with Ollama (local)
docker run --rm \
  -e OLLAMA_HOST=http://host.docker.internal:11434 \
  -e OLLAMA_CONTEXT_LENGTH=32768 \
  -v $(pwd)/outputs:/app/outputs \
  cyber-autoagent \
  --target "testsite.local" \
  --objective "Basic security scan" \
  --provider ollama \
  --model qwen3-coder:30b-a3b-q4_K_M

Production Deployment

Security Considerations

Network Isolation: Deploy in an isolated network segment
Resource Limits: Set memory and CPU limits in docker-compose.yml

Secure Keys: Generate proper encryption keys for Langfuse:

# Generate secure keys
openssl rand -hex 32  # For ENCRYPTION_KEY
openssl rand -base64 32  # For SALT
openssl rand -base64 32  # For NEXTAUTH_SECRET
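
Where `openssl` is not available, equivalent values can be produced with Python's standard `secrets` module. This is a sketch under that assumption; `generate_langfuse_secrets` is a hypothetical helper, and the key names are taken from the comments above.

```python
import base64
import secrets


def generate_langfuse_secrets() -> dict[str, str]:
    """Generate random values shaped like the openssl commands above."""
    return {
        # 32 random bytes as 64 hex characters (like `openssl rand -hex 32`)
        "ENCRYPTION_KEY": secrets.token_hex(32),
        # 32 random bytes, base64-encoded (like `openssl rand -base64 32`)
        "SALT": base64.b64encode(secrets.token_bytes(32)).decode("ascii"),
        "NEXTAUTH_SECRET": base64.b64encode(secrets.token_bytes(32)).decode("ascii"),
    }
```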

Configuration System

Architecture

Cyber-AutoAgent uses a modular, three-tier configuration system with automatic model detection and safe token limit allocation.

Configuration Modules:

config/
├── manager.py           # Core orchestration (ConfigManager)
├── types.py             # Type definitions and dataclasses
├── models/              # Model creation and capabilities
├── system/              # Environment, logging, validation
└── providers/           # Provider-specific helpers

See src/modules/config/README.md for complete module documentation.

Configuration Precedence

Settings are applied in this priority order:

1. CLI/API Arguments (Highest)
   └─ Flags: --provider, --model, --iterations
   └─ Direct parameters to create_agent()

2. Environment Variables (Override)
   └─ CYBER_AGENT_LLM_MODEL
   └─ CYBER_AGENT_EMBEDDING_MODEL
   └─ REASONING_EFFORT
   └─ Provider-specific: AZURE_API_KEY, AWS_REGION, etc.

3. Provider Defaults (Fallback)
   └─ Safe defaults for all providers
   └─ Automatically selected based on provider

Example:

# Default: temperature=0.5 (from provider defaults)
# Override via environment: CYBER_LLM_TEMPERATURE=0.8
# Override via CLI: create_agent(..., temperature=0.7)
# Result: Uses 0.7 (CLI has highest priority)
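
The precedence rule in this example can be sketched in a few lines of Python. This is an illustration only, not the actual ConfigManager logic: `resolve_temperature` and `PROVIDER_DEFAULTS` are hypothetical names, and the default of 0.5 is the value used in the example above.

```python
import os
from typing import Mapping, Optional

# Illustrative provider default, taken from the example above
PROVIDER_DEFAULTS = {"temperature": 0.5}


def resolve_temperature(cli_value: Optional[float] = None,
                        env: Mapping[str, str] = os.environ) -> float:
    """Return the first value found: CLI argument, then environment, then default."""
    if cli_value is not None:
        return cli_value  # tier 1: CLI/API argument
    raw = env.get("CYBER_LLM_TEMPERATURE")
    if raw is not None:
        return float(raw)  # tier 2: environment variable
    return PROVIDER_DEFAULTS["temperature"]  # tier 3: provider default
```

Testing the CLI value against `None` rather than truthiness keeps an explicit `0.0` from falling through to the lower tiers.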

Models.dev Integration

Token limits are automatically detected using the models.dev API with resilient fallback:

Three-Tier Fallback:

Disk cache (~/.cache/cyber-autoagent/models.json, 24h TTL)
Live API (https://models.dev/api.json)
Embedded snapshot (models_snapshot.json, 432KB bundled)
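
The three tiers above can be sketched as follows. This is a simplified illustration rather than the project's implementation: `load_models` and `fetch_live` are hypothetical names, and error handling is reduced to "on any failure, fall through to the next tier".

```python
import json
import time
import urllib.request
from pathlib import Path
from typing import Callable, Optional

API_URL = "https://models.dev/api.json"
CACHE_TTL_SECONDS = 24 * 60 * 60  # 24h TTL, as listed above


def fetch_live(url: str = API_URL, timeout: float = 10.0) -> dict:
    """Download the model registry from the live API."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def load_models(cache_path: Path, snapshot_path: Path,
                fetch: Callable[[], dict] = fetch_live,
                now: Optional[float] = None) -> dict:
    """Return model data from the disk cache, the live API, or the snapshot."""
    now = time.time() if now is None else now
    # Tier 1: disk cache, if present and younger than the TTL
    try:
        if now - cache_path.stat().st_mtime < CACHE_TTL_SECONDS:
            return json.loads(cache_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        pass  # missing or corrupt cache: fall through
    # Tier 2: live API; refresh the cache on success
    try:
        data = fetch()
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_text(json.dumps(data), encoding="utf-8")
        return data
    except (OSError, ValueError):
        pass  # offline or bad response: fall through
    # Tier 3: embedded snapshot bundled with the application
    return json.loads(snapshot_path.read_text(encoding="utf-8"))
```

Injecting `fetch` as a parameter makes the fallback order easy to exercise in tests without network access.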

Benefits:

Accurate limits for 1,100+ models across 58 providers
Works offline (embedded snapshot)
Safe token allocation (50% of actual limit by default)
Automatic capability detection (reasoning, tools, attachments)

Safe Token Limits:

# Specialist tools use 50% of model's output limit for reliability
safe_max = model_output_limit * 0.5

# Example: Bedrock Claude 3.5 Sonnet v2
# Actual limit: 8,192 tokens
# Safe allocation: 4,096 tokens

Token Limit Resolution

Token limits are resolved using a five-tier precedence:

Explicit override - CYBER_PROMPT_LIMIT_FORCE environment variable
Context window maximum - CYBER_CONTEXT_LIMIT, if defined, caps the value resolved by the tiers below
Models.dev API - Authoritative registry (preferred)
Fallback mappings - CYBER_CONTEXT_WINDOW_FALLBACKS (JSON)
Provider defaults - Safe defaults per provider
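
One reading of this precedence is sketched below. It rests on several assumptions that the list does not spell out: that a forced value bypasses the cap, that fallback mappings name alternative model IDs whose limits are looked up in the registry, and that the cap is applied with `min`. `resolve_prompt_limit` is a hypothetical name, and the registry is a plain dict standing in for models.dev data.

```python
from typing import Mapping, Sequence


def resolve_prompt_limit(model_id: str,
                         env: Mapping[str, str],
                         registry: Mapping[str, int],
                         fallbacks: Mapping[str, Sequence[str]],
                         provider_default: int) -> int:
    """Resolve a prompt token limit following the five tiers listed above."""
    forced = env.get("CYBER_PROMPT_LIMIT_FORCE")
    if forced:
        return int(forced)  # tier 1: explicit override wins outright

    detected = registry.get(model_id)  # tier 3: registry lookup
    if detected is None:
        # tier 4: try each fallback model ID in order
        for alternative in fallbacks.get(model_id, ()):
            if alternative in registry:
                detected = registry[alternative]
                break
    if detected is None:
        detected = provider_default  # tier 5: provider default

    cap = env.get("CYBER_CONTEXT_LIMIT")  # tier 2: cap on the tiers above
    return min(detected, int(cap)) if cap else detected
```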

Example fallback configuration:

export CYBER_CONTEXT_WINDOW_FALLBACKS='[
  {"azure/gpt-5": ["azure/gpt-4o", "azure/gpt-4"]},
  {"anthropic/claude-opus": ["anthropic/claude-sonnet-4-5"]}
]'
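
The value is a JSON list of single-key objects. A minimal sketch of turning it into a lookup table follows; `parse_fallbacks` is a hypothetical helper, and merging repeated keys is an assumption about how duplicates might be handled.

```python
import json


def parse_fallbacks(raw: str) -> dict[str, list[str]]:
    """Flatten the JSON list of {model: [alternatives]} objects into one dict."""
    merged: dict[str, list[str]] = {}
    for entry in json.loads(raw or "[]"):
        for model, alternatives in entry.items():
            # Repeated keys are merged in order of appearance (assumption)
            merged.setdefault(model, []).extend(alternatives)
    return merged
```

In practice the input would come from `os.environ.get("CYBER_CONTEXT_WINDOW_FALLBACKS", "")`.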

Environment Variables

Variable	Description	Required
`CYBER_AGENT_PROVIDER`	Provider choice (bedrock/ollama/litellm)	No (auto-detected)
`CYBER_AGENT_LLM_MODEL`	Main LLM model ID	Yes
`CYBER_AGENT_EMBEDDING_MODEL`	Embedding model ID	No (provider default)
`REASONING_EFFORT`	Reasoning effort (low/medium/high)	No (default: medium)
`MAX_TOKENS`	Override LLM max (output) tokens	No (models.dev default)
`CYBER_AGENT_SWARM_MODEL`	Swarm LLM model ID	No
`CYBER_AGENT_SWARM_MAX_TOKENS`	Override specialist max tokens	No (models.dev default)
`MAX_TOKENS_LIMIT`	Override LLM output token upper bound	No (12,000 default)
`MAX_TOKENS_REASONING_LIMIT`	Override LLM output token upper bound (reasoning)	No (32,000 default)
`CYBER_CONTEXT_LIMIT`	Limit detected prompt tokens	No (auto-detected)
`CYBER_PROMPT_LIMIT_FORCE`	Force prompt token limit	No (auto-detected)
`AWS_ACCESS_KEY_ID`	AWS credentials for Bedrock	For Bedrock provider
`AWS_SECRET_ACCESS_KEY`	AWS credentials for Bedrock	For Bedrock provider
`AWS_REGION`	AWS region (default: us-east-1)	For Bedrock provider
`OLLAMA_HOST`	Ollama API endpoint	For Ollama provider
`OLLAMA_CONTEXT_LENGTH`	Ollama model context length	No (Ollama default)
`OLLAMA_TIMEOUT`	Ollama API timeout in seconds	No (default: 120)
`AZURE_API_KEY`	Azure OpenAI API key	For Azure/LiteLLM
`AZURE_API_BASE`	Azure endpoint URL	For Azure/LiteLLM
`AZURE_API_VERSION`	Azure API version	For Azure/LiteLLM
`MEM0_API_KEY`	Mem0 Platform API key	For cloud memory backend
`MEM0_LLM_MODEL`	Memory system LLM	No (auto-aligned)
`OPENSEARCH_HOST`	OpenSearch endpoint	For OpenSearch memory backend
`LANGFUSE_HOST`	Langfuse observability endpoint	For observability
`LANGFUSE_PUBLIC_KEY`	Langfuse API public key	For observability
`LANGFUSE_SECRET_KEY`	Langfuse API secret key	For observability
`ENABLE_AUTO_EVALUATION`	Enable automatic Ragas evaluation	For evaluation
`CYBER_RATE_LIMIT_REQ_PER_MIN`	Limit model requests per minute	No (no limit)
`CYBER_RATE_LIMIT_TOKENS_PER_MIN`	Limit model tokens per minute	No (no limit)
`CYBER_RATE_LIMIT_MAX_CONCURRENT`	Limit model concurrent requests	No (Ollama defaults to 1)
`CYBER_AGENT_PRICING_INPUT`	Model price per 1M input tokens	No (defaults to models.dev)
`CYBER_AGENT_PRICING_OUTPUT`	Model price per 1M output tokens	No (defaults to models.dev)
`CYBER_AGENT_PRICING_CACHE_READ`	Model price per 1M cache read tokens	No (defaults to models.dev)
`CYBER_AGENT_PRICING_CACHE_WRITE`	Model price per 1M cache write tokens	No (defaults to models.dev)
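
A pre-flight check can catch missing variables before a run starts. The sketch below is a loose reading of the Required column, not the project's own validation: `missing_variables` and the mapping are hypothetical, and Bedrock deployments that authenticate through an IAM role may not need the access key variables at all.

```python
from typing import Mapping

# Assumed mapping, derived from the Required column of the table above
ALWAYS_REQUIRED = ["CYBER_AGENT_LLM_MODEL"]
REQUIRED_BY_PROVIDER = {
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "ollama": ["OLLAMA_HOST"],
    "litellm": [],  # depends on the upstream provider, e.g. AZURE_* for Azure
}


def missing_variables(provider: str, env: Mapping[str, str]) -> list[str]:
    """Return the names of expected variables that are unset or empty."""
    names = ALWAYS_REQUIRED + REQUIRED_BY_PROVIDER.get(provider, [])
    return [name for name in names if not env.get(name)]
```

Calling `missing_variables("bedrock", os.environ)` at startup and aborting on a non-empty result gives a clearer error than a failed API call mid-run.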

Kubernetes Deployment

Example deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cyber-autoagent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cyber-autoagent
  template:
    metadata:
      labels:
        app: cyber-autoagent
    spec:
      containers:
      - name: cyber-autoagent
        image: cyber-autoagent:latest
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-credentials
              key: access-key-id
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-credentials
              key: secret-access-key
        volumeMounts:
        - name: outputs
          mountPath: /app/outputs
      volumes:
      - name: outputs
        persistentVolumeClaim:
          claimName: outputs-pvc

Monitoring

Access Langfuse UI at http://localhost:3000
Default credentials: admin@cyber-autoagent.com / changeme
View real-time traces of agent operations
Export results for reporting

React Interface Deployment

The React terminal interface provides interactive configuration and real-time monitoring:

# Install and build
cd src/modules/interfaces/react
npm install
npm run build

# Start the interface
npm start

# The interface will guide you through:
# 1. Docker environment setup
# 2. Deployment mode selection (local-cli, single-container, full-stack)
# 3. Model provider configuration (Bedrock, Ollama, LiteLLM)
# 4. First assessment execution

Access the interface at http://localhost:3000 when using full-stack deployment with observability.

Memory Backend Configuration

Cyber-AutoAgent supports three memory backends with automatic selection:

Backend	Priority	Environment Variable	Use Case
Mem0 Platform	1	`MEM0_API_KEY`	Cloud-hosted, managed service
OpenSearch	2	`OPENSEARCH_HOST`	AWS managed search, production scale
FAISS	3	None (default)	Local vector storage, development

Memory persists in outputs/<target>/memory/ for cross-operation learning.
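
The automatic selection in the table amounts to checking the environment variables in priority order. A minimal sketch, assuming that a non-empty variable is what triggers each backend; `select_memory_backend` and the returned labels are hypothetical.

```python
from typing import Mapping


def select_memory_backend(env: Mapping[str, str]) -> str:
    """Pick a backend by the priority order in the table above."""
    if env.get("MEM0_API_KEY"):
        return "mem0_platform"  # priority 1: cloud-hosted
    if env.get("OPENSEARCH_HOST"):
        return "opensearch"  # priority 2
    return "faiss"  # priority 3: local default
```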

Configuration Examples

Azure OpenAI with Reasoning

export AZURE_API_KEY=your_key
export AZURE_API_BASE=https://your-endpoint.openai.azure.com/
export AZURE_API_VERSION=2024-12-01-preview
export CYBER_AGENT_LLM_MODEL=azure/gpt-5
export CYBER_AGENT_EMBEDDING_MODEL=azure/text-embedding-3-large
export REASONING_EFFORT=high
export MAX_TOKENS=8000  # Optional: Override default

AWS Bedrock with Memory

export AWS_REGION=us-east-1
export CYBER_AGENT_LLM_MODEL=us.anthropic.claude-sonnet-4-5-20250929-v1:0
export CYBER_AGENT_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
export MEM0_API_KEY=your_mem0_key  # Cloud memory backend
export REASONING_EFFORT=medium

Moonshot AI (Mixed Providers)

export MOONSHOT_API_KEY=your_key
export CYBER_AGENT_LLM_MODEL=moonshot/kimi-k2-thinking
export CYBER_AGENT_EMBEDDING_MODEL=azure/text-embedding-3-large
export AZURE_API_KEY=your_azure_key  # For embeddings
export AZURE_API_BASE=https://your-endpoint.openai.azure.com/
export AZURE_API_VERSION=2024-12-01-preview
export MEM0_LLM_MODEL=azure/gpt-4o  # Memory system uses Azure
export OPENAI_API_KEY=your_moonshot_key  # Mem0 compatibility

Ollama with Context Window Fallbacks

export OLLAMA_HOST=http://localhost:11434
export OLLAMA_CONTEXT_LENGTH=32768
export CYBER_AGENT_LLM_MODEL=qwen3-coder:30b-a3b-q4_K_M
export CYBER_AGENT_EMBEDDING_MODEL=nomic-embed-text:latest
export CYBER_CONTEXT_WINDOW_FALLBACKS='[
  {"qwen3-coder:30b": ["qwen3-coder:14b", "llama3.2:3b"]}
]'

Troubleshooting

Common deployment issues:

Container fails to start: Check Docker logs with docker logs cyber-autoagent
AWS credentials error: Ensure IAM role has Bedrock access and correct region
Ollama connection failed: Verify Ollama is running and accessible at specified host
Out of memory: Increase Docker memory limits or reduce --iterations parameter
React interface issues: Run npm run build after any code changes
Memory backend errors: Verify environment variables and network connectivity
Model not found: Check model ID format (use provider/model for LiteLLM)
Token limit errors: Verify models.dev snapshot exists at src/modules/config/models/models_snapshot.json
Specialist failures: Check swarm max_tokens configuration (should be >100 tokens)
