This document describes the testing strategy and how to run tests for Claw Codex.
tests/
├── test_agent_loop.py
├── test_claude_code_tool_parity.py
├── test_config.py
├── test_context_system.py
├── test_output_styles.py
├── test_porting_workspace.py
├── test_providers.py
├── test_repl.py
├── test_skills_system.py
└── test_tool_system_tools.py
# Activate the project environment first
source .venv/bin/activate
# Using pytest (recommended)
python -m pytest tests/ -q
# Using unittest
python -m unittest discover -s tests -v
# Test configuration
python -m pytest tests/test_config.py -q
# Test providers
python -m pytest tests/test_providers.py -q
# Test REPL
python -m pytest tests/test_repl.py -q
# Test context and agent loop
python -m pytest tests/test_context_system.py tests/test_agent_loop.py -q
# Run specific test by name
python -m pytest tests/test_config.py::TestLoadSaveConfig::test_save_and_load_config -v
# Run tests matching pattern
python -m pytest tests/ -k "api_key" -v
# Install coverage tool
uv pip install pytest-cov
# Run tests with coverage report
python -m pytest tests/ --cov=src --cov-report=html
# Open coverage report
open htmlcov/index.html # macOS
xdg-open htmlcov/index.html # Linux
Tests for configuration management:
- Config Path: Test config file location and directory creation
- Default Config: Test default configuration values
- API Key Encoding: Test base64 encoding/decoding
- Load/Save: Test config persistence
- Provider Config: Test provider-specific settings
- Set API Key: Test API key configuration
- Default Provider: Test default provider management
Example:
def test_save_and_load_config(self):
    """Test save and load roundtrip."""
    config = {
        "default_provider": "glm",
        "providers": {
            "glm": {
                "api_key": "test_key",
                "base_url": "https://example.com",
                "default_model": "glm-4"
            }
        }
    }
    save_config(config)
    loaded = load_config()
    assert loaded["default_provider"] == "glm"
Tests for LLM provider implementations:
- ChatMessage: Test message dataclass
- ChatResponse: Test response dataclass
- Anthropic Provider: Test Claude integration
- OpenAI Provider: Test GPT integration
- GLM Provider: Test GLM integration
- Provider Selection: Test provider class retrieval
Example:
from unittest.mock import MagicMock, patch

@patch('anthropic.Anthropic')
def test_chat(self, mock_anthropic):
    """Test synchronous chat."""
    # Setup mock response
    mock_response = MagicMock()
    mock_response.content = [MagicMock(text="Hello!")]
    mock_response.model = "claude-sonnet-4-20250514"
    mock_response.usage = MagicMock(input_tokens=10, output_tokens=5)
    mock_response.stop_reason = "end_turn"
    # Return the mock response from the patched client
    # (assumes the provider calls client.messages.create)
    mock_anthropic.return_value.messages.create.return_value = mock_response
    # Test
    provider = AnthropicProvider(api_key="test_key")
    messages = [ChatMessage(role="user", content="Hi")]
    response = provider.chat(messages)
    assert response.content == "Hello!"
Tests for interactive REPL:
- REPL Initialization: Test REPL setup
- Command Handling: Test slash commands
- Session Management: Test save/load sessions
- Conversation: Test message management
- Multiline Mode: Test multiline input
Example:
def test_handle_command_multiline_toggle(self):
    """Test /multiline command."""
    repl = ClawcodexREPL(provider_name="glm")
    # Initially False
    assert repl.multiline_mode is False
    # Toggle to True
    repl.handle_command("/multiline")
    assert repl.multiline_mode is True
    # Toggle back to False
    repl.handle_command("/multiline")
    assert repl.multiline_mode is False
Tests for porting completeness:
- Manifest: Test file and module counts
- Query Engine: Test summary generation
- CLI Commands: Test command execution
- Parity Audit: Test coverage verification
- Session Tracking: Test turn state
Unit tests:
- Test individual functions and classes
- Mock external dependencies (API calls)
- Fast execution (< 1 second per test)
- Independent and isolated
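The properties listed here (mocked dependencies, fast execution, isolation) can be sketched with a small self-contained test. `fetch_greeting` and the `client.complete` call are hypothetical stand-ins invented for illustration; they are not part of Claw Codex.

```python
import unittest
from unittest.mock import MagicMock


def fetch_greeting(client, name):
    """Hypothetical function under test: asks an external client for a reply."""
    reply = client.complete(prompt=f"Greet {name}")
    return reply.text.strip()


class TestFetchGreeting(unittest.TestCase):
    def test_returns_stripped_reply(self):
        # The external dependency is a mock, so the test makes no network call
        # and does not depend on any other test or shared state.
        client = MagicMock()
        client.complete.return_value = MagicMock(text="  Hello, Ada!  ")

        result = fetch_greeting(client, "Ada")

        self.assertEqual(result, "Hello, Ada!")
        client.complete.assert_called_once_with(prompt="Greet Ada")
```

Because the client is passed in as an argument, the test needs no patching at all, which tends to keep unit tests short and quick.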
Integration tests:
- Test component interactions
- Use real API keys only in CI/CD (with secrets)
- Longer execution time
- May require cleanup
Manual end-to-end tests:
- Test complete workflows in the real REPL
- Currently performed manually for provider login, REPL interaction, skills, and context behavior
- Useful when validating prompt behavior or CLI UX changes
def test_<what_is_being_tested>(self):
    """Test description."""
    pass

def test_feature(self):
    # Arrange - Set up test data
    config = {"default_provider": "glm"}
    # Act - Execute the code
    save_config(config)
    loaded = load_config()
    # Assert - Verify results
    assert loaded["default_provider"] == "glm"
- One assertion per test (when practical)
- Use descriptive test names
- Test edge cases and error conditions
- Keep tests independent
- Use fixtures for common setup
- Mock external dependencies
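As a rough illustration of descriptive names and edge-case coverage, the sketch below tests a hypothetical helper, `normalize_provider`, which is invented for this example and does not exist in the project. It uses `subTest` so that each invalid input is reported separately.

```python
import unittest


def normalize_provider(name):
    """Hypothetical helper: lower-cases a provider name and rejects empty input."""
    if not isinstance(name, str) or not name.strip():
        raise ValueError("provider name must be a non-empty string")
    return name.strip().lower()


class TestNormalizeProvider(unittest.TestCase):
    def test_strips_whitespace_and_lowercases(self):
        self.assertEqual(normalize_provider("  GLM "), "glm")

    def test_rejects_blank_or_missing_names(self):
        # Edge cases: empty string, whitespace only, and a non-string value.
        for bad in ("", "   ", None):
            with self.subTest(bad=bad):
                with self.assertRaises(ValueError):
                    normalize_provider(bad)
```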
from unittest.mock import MagicMock, patch

@patch('src.providers.openai.OpenAI')
def test_openai_chat(self, mock_openai):
    """Test OpenAI chat with mock."""
    # Arrange
    mock_client = MagicMock()
    mock_response = MagicMock()
    mock_response.choices[0].message.content = "Response"
    mock_client.chat.completions.create.return_value = mock_response
    mock_openai.return_value = mock_client
    # Act
    provider = OpenAIProvider(api_key="test")
    response = provider.chat([ChatMessage(role="user", content="Hi")])
    # Assert
    self.assertEqual(response.content, "Response")
- Coverage changes as features evolve
- Use the commands below to generate up-to-date local reports
- Prefer focusing on critical paths rather than preserving a stale percentage in docs
- Minimum coverage: 80%
- Target coverage: 90%+
- Critical paths: 100% coverage
# Generate coverage report
python -m pytest tests/ --cov=src --cov-report=term-missing
# Show only the total coverage line
python -m pytest tests/ --cov=src --cov-report=term-missing | grep "TOTAL"
Tests run automatically on:
- Pull requests
- Commits to main branch
- Releases
Tests are configured in .github/workflows/ (if it exists):
- name: Run tests
  run: python -m pytest tests/ -q --cov=src
Common test data is stored in fixtures:
# In test file
import tempfile
from pathlib import Path

def setUp(self):
    """Set up test fixtures."""
    self.temp_dir = tempfile.mkdtemp()
    self.config_dir = Path(self.temp_dir) / ".clawcodex"
    self.config_dir.mkdir(parents=True, exist_ok=True)
Test sessions are created in temporary directories and cleaned up after tests.
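The `setUp` method shown in this document creates the temporary directory but does not show how it is removed. One possible way to handle the cleanup, sketched here as an assumption rather than the project's actual approach, is to register `shutil.rmtree` with `addCleanup`. The class name and the `config.json` file are hypothetical.

```python
import shutil
import tempfile
import unittest
from pathlib import Path


class TestWithTempConfigDir(unittest.TestCase):
    def setUp(self):
        """Create an isolated config directory and schedule its removal."""
        self.temp_dir = tempfile.mkdtemp()
        # Cleanups registered here run after the test, even when it fails.
        self.addCleanup(shutil.rmtree, self.temp_dir, ignore_errors=True)
        self.config_dir = Path(self.temp_dir) / ".clawcodex"
        self.config_dir.mkdir(parents=True, exist_ok=True)

    def test_config_dir_is_writable(self):
        # "config.json" is a hypothetical file name used only for this sketch.
        marker = self.config_dir / "config.json"
        marker.write_text("{}")
        self.assertTrue(marker.exists())
```

Registering the cleanup immediately after `mkdtemp()` means the directory is still removed if a later line in `setUp` raises, which a `tearDown` method would not guarantee.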
- Import errors: Ensure src/ is importable by adding the project root to the Python path (run from the repository root):
  export PYTHONPATH="${PYTHONPATH}:$(pwd)"
- API key errors: Tests should use mocks, not real API keys
- Permission errors: Check file permissions in test directories
- Slow tests: Check for network calls (should be mocked)
# Run with verbose output
python -m pytest tests/ -v -s
# Run with pdb debugger
python -m pytest tests/ --pdb
# Run specific failing test with output
python -m pytest tests/test_config.py::TestClassName::test_name -v -s
# Run performance benchmarks (requires the pytest-benchmark plugin)
python -m pytest tests/ --benchmark-only
- API keys are never logged
- Config files use encoded keys
- Secrets are not in git
- .env is in .gitignore
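The base64 encoding of API keys mentioned in this document can be illustrated with a short roundtrip. `encode_key` and `decode_key` are hypothetical helper names, not the project's actual functions. Base64 is a reversible encoding rather than encryption, so keeping config files and `.env` out of git still matters.

```python
import base64


def encode_key(api_key: str) -> str:
    """Hypothetical helper: base64-encode an API key for storage."""
    return base64.b64encode(api_key.encode("utf-8")).decode("ascii")


def decode_key(encoded: str) -> str:
    """Hypothetical helper: reverse encode_key."""
    return base64.b64decode(encoded.encode("ascii")).decode("utf-8")


encoded = encode_key("test_key")
# The stored value differs from the raw key...
assert encoded != "test_key"
# ...but anyone holding the file can decode it again.
assert decode_key(encoded) == "test_key"
```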
When adding new features:
- Write tests first (TDD approach)
- Test edge cases
- Document test purpose
- Ensure all tests pass
- Check coverage
Review and update tests when:
- Adding new features
- Fixing bugs
- Refactoring code
- Updating dependencies
Good testing practices ensure:
- Code reliability
- Regression prevention
- Documentation of behavior
- Confidence in refactoring
- Better code design
Run tests before every commit!
source .venv/bin/activate
python -m pytest tests/ -q