openai · alpnix · Mar 14, 2026
diff --git a/CLI.md b/CLI.md
@@ -0,0 +1,211 @@
+# tiktoken CLI
+
+Command-line interface for counting tokens in files and directories.
+
+## Installation
+
+Installing tiktoken also installs the `tiktoken` command:
+
+```bash
+pip install tiktoken
+```
+
+## Usage
+
+### Basic Token Counting
+
+Count tokens in a single file:
+```bash
+tiktoken count file.txt
+```
+
+Output:
+```
+42
+```
+
+### Using Specific Models
+
+Count tokens using a specific model's encoding:
+```bash
+tiktoken count --model gpt-4o document.txt
+tiktoken count --model gpt-4-turbo code.py
+```
+
+### Directory Operations
+
+Count tokens in all files in a directory and its subdirectories:
+```bash
+tiktoken count --recursive ./src/
+```
+
+Use glob patterns to filter files:
+```bash
+tiktoken count --glob "*.py" ./project/
+tiktoken count --recursive --glob "*.md" ./docs/
+```
+
+### Output Formats
+
+#### JSON Output
+```bash
+tiktoken count --json file.txt
+```
+
+Output:
+```json
+{
+  "summary": {
+    "total_files": 1,
+    "total_tokens": 1250,
+    "total_characters": 5432,
+    "average_tokens_per_file": 1250
+  },
+  "files": [
+    {
+      "file": "file.txt",
+      "tokens": 1250,
+      "chars": 5432,
+      "lines": 85
+    }
+  ]
+}
+```
+
+#### CSV Output
+```bash
+tiktoken count --csv ./src/
+```
+
+Output:
+```csv
+file,tokens,characters,lines
+src/main.py,450,2100,65
+src/utils.py,320,1540,48
+src/config.py,180,850,28
+```
+
+#### Per-File Breakdown
+```bash
+tiktoken count --per-file ./src/
+```
+
+Output:
+```
+src/main.py: 450 tokens
+src/utils.py: 320 tokens
+src/config.py: 180 tokens
+
+Total files: 3
+Total tokens: 950
+Total characters: 4490
+Average tokens per file: 316
+```
+
+## Use Cases
+
+### Estimating Context Window Usage
+
+Check if your codebase fits in a model's context window:
+
+```bash
+# GPT-4 Turbo has a 128k-token context window
+tiktoken count --model gpt-4-turbo --recursive ./my-project/
+
+# Example output: Total tokens: 45,230
+# 45,230 is well under 128,000, so the project fits in the context window
+```
+
+### Cost Estimation
+
+Estimate API costs by counting tokens:
+
+```bash
+tiktoken count --json --recursive ./documents/ > token_report.json
+# Use the token count to calculate costs based on model pricing
+```
+
+### CI/CD Integration
+
+Add token counting to your CI pipeline:
+
+```bash
+#!/bin/bash
+TOKEN_COUNT=$(tiktoken count --recursive ./src/ | grep "Total tokens" | awk '{print $3}' | tr -d ',')
+MAX_TOKENS=50000
+
+if [ "$TOKEN_COUNT" -gt "$MAX_TOKENS" ]; then
+  echo "Error: Codebase exceeds $MAX_TOKENS tokens (found: $TOKEN_COUNT)"
+  exit 1
+fi
+```
+
+### Documentation Analysis
+
+Analyze documentation token usage:
+
+```bash
+tiktoken count --recursive --glob "*.md" --per-file ./docs/ | tee docs_tokens.txt
+```
+
+## Command Reference
+
+### Arguments
+
+- `paths`: One or more files or directories to process
+
+### Options
+
+- `-m, --model MODEL`: Use the encoding for a specific OpenAI model (e.g., `gpt-4o`, `gpt-4-turbo`)
+- `-e, --encoding ENCODING`: Specify encoding directly (default: `o200k_base`)
+- `-r, --recursive`: Process directories recursively
+- `-g, --glob PATTERN`: Filter files using glob pattern (e.g., `"*.py"`)
+- `--json`: Output results as JSON
+- `--csv`: Output results as CSV
+- `--summary`: Show summary statistics
+- `--per-file`: Show per-file token counts
+
+## Examples
+
+### Count tokens in Python files
+```bash
+tiktoken count --glob "*.py" --recursive ./project/
+```
+
+### Generate JSON report for multiple files
+```bash
+tiktoken count --json file1.txt file2.txt file3.txt > report.json
+```
+
+### Check specific model compatibility
+```bash
+tiktoken count --model gpt-4o --summary ./codebase/
+```
+
+### Export to CSV for analysis
+```bash
+tiktoken count --csv --recursive ./src/ > tokens.csv
+```
+
+## Tips
+
+1. **Performance**: The CLI processes files quickly thanks to tiktoken's fast Rust implementation
+2. **Binary Files**: Binary files are automatically skipped
+3. **Large Directories**: Use `--glob` to filter files and speed up processing
+4. **Shell Integration**: Pipe output to other tools for further processing
+
+## Troubleshooting
+
+**Error: "No files found to process"**
+- Check your glob pattern syntax
+- Ensure files exist in the specified path
+- Use `--recursive` for subdirectories
+
+**Error: "Unknown model 'xyz'"**
+- The model name might be incorrect
+- Use `--encoding` instead to specify encoding directly
+- Check [OpenAI's model documentation](https://platform.openai.com/docs/models) for valid model names
+
+**Binary file warnings**
+- The CLI automatically skips binary files
+- This is expected behavior and can be ignored
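The Cost Estimation section of CLI.md stops at writing `token_report.json`. A minimal sketch of the follow-up step is below, assuming the report has the `summary.total_tokens` field shown in the JSON Output example. The price constant and the `estimate_cost` helper are hypothetical and not part of this PR; substitute the current price for your model.

```python
import json

# Hypothetical price in USD per one million input tokens (an assumption,
# not a real quote); replace it with the current price for your model.
PRICE_PER_MILLION_TOKENS = 2.50


def estimate_cost(report: dict, price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """Return the estimated cost in USD for the token total in a report."""
    total_tokens = report["summary"]["total_tokens"]
    return total_tokens / 1_000_000 * price_per_million


# Sample report copied from the JSON Output example in CLI.md. In practice
# this would be loaded with json.load() from token_report.json.
sample_report = json.loads("""
{
  "summary": {
    "total_files": 1,
    "total_tokens": 1250,
    "total_characters": 5432,
    "average_tokens_per_file": 1250
  },
  "files": [
    {"file": "file.txt", "tokens": 1250, "chars": 5432, "lines": 85}
  ]
}
""")

cost = estimate_cost(sample_report)
print(f"Estimated cost: ${cost:.6f}")
```

Reading the JSON report avoids parsing the human-readable `Total tokens:` line, whose number formatting (for example thousands separators) may change.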
diff --git a/setup.py b/setup.py
@@ -15,5 +15,10 @@
     ],
     package_data={"tiktoken": ["py.typed"]},
     packages=["tiktoken", "tiktoken_ext"],
+    entry_points={
+        "console_scripts": [
+            "tiktoken=tiktoken.cli:main",
+        ],
+    },
     zip_safe=False,
 )
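The `console_scripts` entry `tiktoken=tiktoken.cli:main` requires `tiktoken/cli.py` to expose a `main` callable that can be invoked with no arguments; that module is not shown in this diff. The sketch below is one hypothetical shape for it, built only from the flags listed in the Command Reference of CLI.md. The `build_parser` helper, the argparse layout, and the placeholder message are assumptions, not the PR's actual implementation.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the flags documented in CLI.md (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="tiktoken")
    subparsers = parser.add_subparsers(dest="command", required=True)

    count = subparsers.add_parser("count", help="Count tokens in files and directories")
    count.add_argument("paths", nargs="+", help="Files or directories to process")
    count.add_argument("-m", "--model", help="Use the encoding for a specific model")
    count.add_argument("-e", "--encoding", default="o200k_base", help="Encoding name")
    count.add_argument("-r", "--recursive", action="store_true")
    count.add_argument("-g", "--glob", metavar="PATTERN")
    count.add_argument("--json", action="store_true")
    count.add_argument("--csv", action="store_true")
    count.add_argument("--summary", action="store_true")
    count.add_argument("--per-file", action="store_true")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point; argv=None makes argparse fall back to sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    # Placeholder for the real counting logic, which lives outside this sketch.
    print(f"would count tokens in {len(args.paths)} path(s) using {args.encoding}")
    return 0
```

Accepting an optional `argv` keeps `main` callable with no arguments, as the generated console script needs, while letting tests pass an explicit argument list.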
diff --git a/tests/test_cli.py b/tests/test_cli.py
@@ -0,0 +1,119 @@
+"""
+Test suite for tiktoken CLI.
+
+Run with: pytest tests/test_cli.py
+"""
+
+import os
+import sys
+import tempfile
+from pathlib import Path
+
+# Add parent directory to path for imports
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from tiktoken.cli import (
+    count_tokens_in_text,
+    count_tokens_in_file,
+    collect_files,
+    format_output_json,
+    format_output_csv,
+)
+
+
+def test_count_tokens_in_text():
+    """Test basic token counting."""
+    text = "Hello, world!"
+    count = count_tokens_in_text(text, "o200k_base")
+    assert count > 0
+    assert isinstance(count, int)
+
+
+def test_count_tokens_in_file():
+    """Test counting tokens in a file."""
+    with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:
+        f.write("This is a test file for tiktoken CLI.")
+        temp_path = f.name
+
+    try:
+        result = count_tokens_in_file(Path(temp_path), "o200k_base")
+        assert result is not None
+        assert 'tokens' in result
+        assert 'chars' in result
+        assert 'lines' in result
+        assert result['tokens'] > 0
+    finally:
+        os.unlink(temp_path)
+
+
+def test_collect_files_single_file():
+    """Test collecting a single file."""
+    with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:
+        temp_path = f.name
+
+    try:
+        files = collect_files([temp_path], False, None)
+        assert len(files) == 1
+        assert files[0] == Path(temp_path)
+    finally:
+        os.unlink(temp_path)
+
+
+def test_collect_files_directory():
+    """Test collecting files from a directory."""
+    with tempfile.TemporaryDirectory() as tmpdir:
+        # Create test files
+        test_dir = Path(tmpdir)
+        (test_dir / "file1.txt").write_text("content 1")
+        (test_dir / "file2.txt").write_text("content 2")
+
+        files = collect_files([tmpdir], False, None)
+        assert len(files) == 2
+
+
+def test_format_output_json():
+    """Test JSON output formatting."""
+    results = [
+        {'file': 'test.txt', 'tokens': 100, 'chars': 500, 'lines': 10}
+    ]
+
+    output = format_output_json(results)
+    assert 'summary' in output
+    assert 'total_tokens' in output
+    assert '100' in output
+
+
+def test_format_output_csv():
+    """Test CSV output formatting."""
+    results = [
+        {'file': 'test.txt', 'tokens': 100, 'chars': 500, 'lines': 10}
+    ]
+
+    output = format_output_csv(results)
+    assert 'file,tokens,characters,lines' in output
+    assert 'test.txt,100,500,10' in output
+
+
+if __name__ == '__main__':
+    # Run basic tests
+    print("Running tiktoken CLI tests...")
+
+    test_count_tokens_in_text()
+    print("✓ test_count_tokens_in_text")
+
+    test_count_tokens_in_file()
+    print("✓ test_count_tokens_in_file")
+
+    test_collect_files_single_file()
+    print("✓ test_collect_files_single_file")
+
+    test_collect_files_directory()
+    print("✓ test_collect_files_directory")
+
+    test_format_output_json()
+    print("✓ test_format_output_json")
+
+    test_format_output_csv()
+    print("✓ test_format_output_csv")
+
+    print("\n✅ All tests passed!")
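`test_format_output_json` only checks substrings, so it would still pass if the report were malformed or the totals were wrong. A stricter check could parse the output and compare the summary against the per-file entries. The sketch below assumes `format_output_json` returns a JSON string shaped like the JSON Output example in CLI.md (the substring assertions suggest it returns a string). The `check_json_report` helper is hypothetical, and a literal stands in for the real function call so the block runs on its own.

```python
import json


def check_json_report(output: str) -> dict:
    """Parse a JSON report and verify the summary against the file entries."""
    report = json.loads(output)
    summary = report["summary"]
    files = report["files"]
    assert summary["total_files"] == len(files)
    assert summary["total_tokens"] == sum(entry["tokens"] for entry in files)
    assert summary["total_characters"] == sum(entry["chars"] for entry in files)
    return report


# Stand-in for format_output_json(results), using the shape from CLI.md.
sample_output = json.dumps(
    {
        "summary": {
            "total_files": 1,
            "total_tokens": 100,
            "total_characters": 500,
            "average_tokens_per_file": 100,
        },
        "files": [{"file": "test.txt", "tokens": 100, "chars": 500, "lines": 10}],
    }
)

report = check_json_report(sample_output)
print(report["summary"]["total_tokens"])
```

In the test suite, `sample_output` would be replaced by the value returned from `format_output_json(results)`.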