Testing Guide

This document describes the testing strategy and how to run tests for Claw Codex.

Test Structure

tests/
├── test_agent_loop.py
├── test_claude_code_tool_parity.py
├── test_config.py
├── test_context_system.py
├── test_output_styles.py
├── test_porting_workspace.py
├── test_providers.py
├── test_repl.py
├── test_skills_system.py
└── test_tool_system_tools.py

Running Tests

Run All Tests

# Activate the project environment first
source .venv/bin/activate

# Using pytest (recommended)
python -m pytest tests/ -q

# Using unittest
python -m unittest discover -s tests -v

Run Specific Test File

# Test configuration
python -m pytest tests/test_config.py -q

# Test providers
python -m pytest tests/test_providers.py -q

# Test REPL
python -m pytest tests/test_repl.py -q

# Test context and agent loop
python -m pytest tests/test_context_system.py tests/test_agent_loop.py -q

Run Specific Test

# Run specific test by name
python -m pytest tests/test_config.py::TestLoadSaveConfig::test_save_and_load_config -v

# Run tests matching pattern
python -m pytest tests/ -k "api_key" -v

Run with Coverage

# Install coverage tool
uv pip install pytest-cov

# Run tests with coverage report
python -m pytest tests/ --cov=src --cov-report=html

# Open coverage report
open htmlcov/index.html  # macOS
xdg-open htmlcov/index.html  # Linux

Test Categories

1. Configuration Tests (`test_config.py`)

Tests for configuration management:

Config Path: Test config file location and directory creation
Default Config: Test default configuration values
API Key Encoding: Test base64 encoding/decoding
Load/Save: Test config persistence
Provider Config: Test provider-specific settings
Set API Key: Test API key configuration
Default Provider: Test default provider management

Example:

def test_save_and_load_config(self):
    """Test save and load roundtrip."""
    config = {
        "default_provider": "glm",
        "providers": {
            "glm": {
                "api_key": "test_key",
                "base_url": "https://example.com",
                "default_model": "glm-4"
            }
        }
    }

    save_config(config)
    loaded = load_config()

    assert loaded["default_provider"] == "glm"
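
The "API Key Encoding" item above can be illustrated with a round-trip test. This is a minimal sketch, not the project's actual code: `encode_api_key` and `decode_api_key` are hypothetical helper names defined here only so the example is self-contained.

```python
import base64
import unittest


def encode_api_key(key: str) -> str:
    """Hypothetical helper: base64-encode an API key for storage."""
    return base64.b64encode(key.encode("utf-8")).decode("ascii")


def decode_api_key(encoded: str) -> str:
    """Hypothetical helper: reverse encode_api_key."""
    return base64.b64decode(encoded.encode("ascii")).decode("utf-8")


class TestApiKeyEncoding(unittest.TestCase):
    def test_roundtrip_preserves_key(self):
        key = "sk-test-1234"
        self.assertEqual(decode_api_key(encode_api_key(key)), key)

    def test_encoded_value_differs_from_plaintext(self):
        key = "sk-test-1234"
        self.assertNotEqual(encode_api_key(key), key)
```

Replace the two helpers with the project's real encoding functions when adapting this; the test bodies should carry over unchanged.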

2. Provider Tests (`test_providers.py`)

Tests for LLM provider implementations:

ChatMessage: Test message dataclass
ChatResponse: Test response dataclass
Anthropic Provider: Test Claude integration
OpenAI Provider: Test GPT integration
GLM Provider: Test GLM integration
Provider Selection: Test provider class retrieval

Example:

@patch('anthropic.Anthropic')
def test_chat(self, mock_anthropic):
    """Test synchronous chat."""
    # Set up the mock response
    mock_response = MagicMock()
    mock_response.content = [MagicMock(text="Hello!")]
    mock_response.model = "claude-sonnet-4-20250514"
    mock_response.usage = MagicMock(input_tokens=10, output_tokens=5)
    mock_response.stop_reason = "end_turn"

    # Wire the mock client so that messages.create() returns the mock response
    mock_anthropic.return_value.messages.create.return_value = mock_response

    # Exercise the provider
    provider = AnthropicProvider(api_key="test_key")
    messages = [ChatMessage(role="user", content="Hi")]
    response = provider.chat(messages)

    assert response.content == "Hello!"

3. REPL Tests (`test_repl.py`)

Tests for interactive REPL:

REPL Initialization: Test REPL setup
Command Handling: Test slash commands
Session Management: Test save/load sessions
Conversation: Test message management
Multiline Mode: Test multiline input

Example:

def test_handle_command_multiline_toggle(self):
    """Test /multiline command."""
    repl = ClawcodexREPL(provider_name="glm")

    # Initially False
    assert repl.multiline_mode is False

    # Toggle to True
    repl.handle_command("/multiline")
    assert repl.multiline_mode is True

    # Toggle back to False
    repl.handle_command("/multiline")
    assert repl.multiline_mode is False

4. Porting Workspace Tests (`test_porting_workspace.py`)

Tests for porting completeness:

Manifest: Test file and module counts
Query Engine: Test summary generation
CLI Commands: Test command execution
Parity Audit: Test coverage verification
Session Tracking: Test turn state

Test Strategy

Unit Tests

Test individual functions and classes
Mock external dependencies (API calls)
Fast execution (< 1 second per test)
Independent and isolated

Integration Tests

Test component interactions
Use real API keys only in CI/CD (with secrets)
Longer execution time
May require cleanup
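
One way to keep real API keys out of local runs is to skip integration tests unless a key is present in the environment. The sketch below assumes a hypothetical environment variable name, `CLAW_CODEX_TEST_API_KEY`; substitute whatever secret your CI pipeline actually exports.

```python
import os
import unittest

# Hypothetical variable name; use the secret your CI pipeline exports.
API_KEY_ENV = "CLAW_CODEX_TEST_API_KEY"


class TestProviderIntegration(unittest.TestCase):
    # The condition is evaluated once, when the module is imported.
    @unittest.skipUnless(os.environ.get(API_KEY_ENV), f"{API_KEY_ENV} is not set")
    def test_live_request(self):
        api_key = os.environ[API_KEY_ENV]
        # A real test would build a provider here and send one small request.
        self.assertTrue(api_key)
```

Without the variable set, the test is reported as skipped rather than failed, so local runs stay green and offline.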

End-to-End Checks

Test complete workflows in the real REPL
Currently performed manually for provider login, REPL interaction, skills, and context behavior
Useful when validating prompt behavior or CLI UX changes

Writing Tests

Test Naming Convention

def test_<what_is_being_tested>(self):
    """Test description."""
    pass

Test Structure (AAA Pattern)

def test_feature(self):
    # Arrange - Set up test data
    config = {"default_provider": "glm"}

    # Act - Execute the code
    save_config(config)
    loaded = load_config()

    # Assert - Verify results
    assert loaded["default_provider"] == "glm"

Best Practices

One assertion per test (when practical)
Use descriptive test names
Test edge cases and error conditions
Keep tests independent
Use fixtures for common setup
Mock external dependencies

Example Test with Mock

@patch('src.providers.openai.OpenAI')
def test_openai_chat(self, mock_openai):
    """Test OpenAI chat with mock."""
    # Arrange
    mock_client = MagicMock()
    mock_response = MagicMock()
    mock_response.choices[0].message.content = "Response"
    mock_client.chat.completions.create.return_value = mock_response
    mock_openai.return_value = mock_client

    # Act
    provider = OpenAIProvider(api_key="test")
    response = provider.chat([ChatMessage(role="user", content="Hi")])

    # Assert
    self.assertEqual(response.content, "Response")

Test Coverage

Current Coverage

Coverage changes as features evolve
Use the commands below to generate up-to-date local reports
Prefer focusing on critical paths rather than preserving a stale percentage in docs

Coverage Goals

Minimum: 80%
Target: 90%+
Critical paths: 100%

Check Coverage

# Generate coverage report
python -m pytest tests/ --cov=src --cov-report=term-missing

# Show only the overall coverage total
python -m pytest tests/ --cov=src --cov-report=term-missing | grep "TOTAL"

Continuous Integration

Tests run automatically on:

Pull requests
Commits to main branch
Releases

CI Configuration

If the repository defines workflows under .github/workflows/, the test step looks like this:

- name: Run tests
  run: python -m pytest tests/ -q --cov=src

Test Data

Fixtures

Common test data is stored in fixtures:

# In test file
def setUp(self):
    """Set up test fixtures."""
    self.temp_dir = tempfile.mkdtemp()
    self.config_dir = Path(self.temp_dir) / ".clawcodex"
    self.config_dir.mkdir(parents=True, exist_ok=True)

Test Sessions

Test sessions are created in temporary directories and cleaned up after tests.
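
The cleanup described above can be sketched with `addCleanup`, which pairs the temporary directory from `setUp` with its removal. The test class name and the `session.json` file name are hypothetical.

```python
import shutil
import tempfile
import unittest
from pathlib import Path


class TestWithTempConfigDir(unittest.TestCase):
    def setUp(self):
        self.temp_dir = tempfile.mkdtemp()
        # Register cleanup right away; it runs even if the test or a later
        # setUp step fails.
        self.addCleanup(shutil.rmtree, self.temp_dir, ignore_errors=True)
        self.config_dir = Path(self.temp_dir) / ".clawcodex"
        self.config_dir.mkdir(parents=True, exist_ok=True)

    def test_session_file_is_written(self):
        session_file = self.config_dir / "session.json"  # hypothetical name
        session_file.write_text("{}", encoding="utf-8")
        self.assertTrue(session_file.exists())
```

A `tearDown` method that calls `shutil.rmtree` would also work, but `tearDown` is not called when `setUp` itself raises, whereas registered cleanups still run.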

Troubleshooting

Common Issues

Import errors: Ensure the repository root (the directory that contains src/) is on the Python path
```
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
```
API key errors: Unit tests should use mocks, not real API keys
Permission errors: Check file permissions in test directories
Slow tests: Check for network calls (should be mocked)
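
To find tests that reach the network by accident, one option is a base class that makes socket creation fail. This is a sketch under assumptions: `NoNetworkTestCase` is a hypothetical name, and the patch only affects code that looks up `socket.socket` through the `socket` module at call time.

```python
import socket
import unittest
from unittest.mock import patch


class NoNetworkTestCase(unittest.TestCase):
    """Hypothetical base class: any attempt to create a socket raises."""

    def setUp(self):
        patcher = patch(
            "socket.socket",
            side_effect=RuntimeError("network access attempted in a unit test"),
        )
        patcher.start()
        # Restore the real socket class after each test.
        self.addCleanup(patcher.stop)


class TestStaysOffline(NoNetworkTestCase):
    def test_socket_creation_is_blocked(self):
        with self.assertRaises(RuntimeError):
            socket.socket()
```

Tests that inherit from the base class fail with a clear error when an unmocked call tries to open a connection, which usually points straight at the missing mock.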

Debug Tests

# Run with verbose output
python -m pytest tests/ -v -s

# Run with pdb debugger
python -m pytest tests/ --pdb

# Run specific failing test with output
python -m pytest tests/test_config.py::TestClassName::test_name -v -s

Performance Tests

# Run performance benchmarks (requires the pytest-benchmark plugin)
python -m pytest tests/ --benchmark-only

Security Tests

API keys are never logged
Config files store API keys base64-encoded (an encoding, not encryption)
Secrets are not in git
.env is in .gitignore
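
The first check above, that API keys are never logged, can be automated with `assertLogs`. The sketch below is illustrative only: the logger name, `mask_key`, and `log_provider_setup` are hypothetical stand-ins for the project's real logging code.

```python
import logging
import unittest

logger = logging.getLogger("clawcodex.example")  # hypothetical logger name


def mask_key(key: str) -> str:
    """Hypothetical helper: show at most the last four characters."""
    if len(key) <= 4:
        return "*" * len(key)
    return "*" * (len(key) - 4) + key[-4:]


def log_provider_setup(provider: str, api_key: str) -> None:
    """Hypothetical stand-in for code that logs during provider setup."""
    logger.info("configured provider %s with key %s", provider, mask_key(api_key))


class TestApiKeyNotLogged(unittest.TestCase):
    def test_full_key_never_appears_in_logs(self):
        secret = "sk-test-abcdef123456"
        # Capture everything the logger emits at DEBUG level and above.
        with self.assertLogs(logger, level="DEBUG") as captured:
            log_provider_setup("glm", secret)
        self.assertFalse(any(secret in line for line in captured.output))
```

The same pattern applies to any code path that handles a key: capture the log output, then assert that the raw secret does not appear in it.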

Contributing Tests

When adding new features:

Write tests first (TDD approach)
Test edge cases
Document test purpose
Ensure all tests pass
Check coverage

Test Maintenance

Review and update tests when:
- Adding new features
- Fixing bugs
- Refactoring code
- Updating dependencies

Summary

Good testing practices ensure:

Code reliability
Regression prevention
Documentation of behavior
Confidence in refactoring
Better code design

Run tests before every commit!

source .venv/bin/activate
python -m pytest tests/ -q
