Thanks for your interest in contributing. This project is early-stage, so things move fast.
git clone https://github.com/kai-linux/proof.git
cd proof
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,all]"

Run the tests with:
pytest

This project uses ruff for linting and formatting.
ruff check .
ruff format .

Tasks live in tasks/ as YAML files. See existing tasks for the schema. Each task needs:
- id: unique identifier (matches filename)
- name: human-readable name
- prompt: the prompt sent to the model
- expected: what constitutes a correct response
- scoring: how to evaluate (contains, exact, regex, json_schema)

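To make the task fields concrete, here is a minimal Python sketch of how a loaded task could be checked and scored. It is an illustration only, not the project's actual implementation: the names `REQUIRED_FIELDS`, `missing_fields`, and `score_response` are hypothetical, the sample task is invented, and the exact semantics of each scoring mode (for example, whether whitespace is trimmed) are assumptions. The `json_schema` mode is left out because it would need a third-party validator.

```python
import re

# Field names taken from the task list above.
REQUIRED_FIELDS = ("id", "name", "prompt", "expected", "scoring")


def missing_fields(task: dict) -> list[str]:
    """Return the required fields that are absent from a task mapping."""
    return [field for field in REQUIRED_FIELDS if field not in task]


def score_response(response: str, expected: str, scoring: str) -> bool:
    """Hypothetical scorer; the real semantics in the project may differ."""
    if scoring == "contains":
        # Assumed: a plain substring match.
        return expected in response
    if scoring == "exact":
        # Assumed: comparison after trimming surrounding whitespace.
        return response.strip() == expected.strip()
    if scoring == "regex":
        # Assumed: `expected` holds a pattern searched anywhere in the response.
        return re.search(expected, response) is not None
    raise ValueError(f"scoring mode not covered by this sketch: {scoring}")


# Invented example task, shown as the dict a YAML loader might produce.
example_task = {
    "id": "capital-france",
    "name": "Capital of France",
    "prompt": "What is the capital of France?",
    "expected": "Paris",
    "scoring": "contains",
}
```

With this sketch, `missing_fields(example_task)` returns an empty list, and `score_response("The capital is Paris.", "Paris", "contains")` returns `True`. Check an existing file in tasks/ for the authoritative schema before writing a new task.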
Provider integrations live in proof/runner.py. Add a new _call_<provider> async function and register it in call_model().
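As a rough sketch of that pattern, the Python below shows one way an async provider function and its registration could look. Everything here is an assumption for illustration: the `echo` provider, the `(model, prompt)` signature, and the `_PROVIDERS` dictionary are hypothetical, and the real `call_model()` in proof/runner.py may dispatch differently. Read the existing `_call_*` functions before copying this shape.

```python
import asyncio


async def _call_echo(model: str, prompt: str) -> str:
    """Hypothetical provider that echoes the prompt instead of calling an API."""
    # A real integration would await an HTTP request to the provider here.
    await asyncio.sleep(0)
    return f"[{model}] {prompt}"


# Hypothetical registry; the project may use if/elif dispatch instead.
_PROVIDERS = {
    "echo": _call_echo,
}


async def call_model(provider: str, model: str, prompt: str) -> str:
    """Dispatch to the registered provider function (assumed signature)."""
    handler = _PROVIDERS.get(provider)
    if handler is None:
        raise ValueError(f"unknown provider: {provider}")
    return await handler(model, prompt)


if __name__ == "__main__":
    print(asyncio.run(call_model("echo", "demo-model", "hello")))
```

Keeping each provider in its own small async function means a new integration touches only one new function and one registration line, which fits the small, focused PRs requested below.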
- Keep PRs small and focused
- Include a task YAML if adding a new benchmark scenario
- Run ruff check and pytest before submitting