This repository contains the official implementation of the paper:
Self-Adaptive Prompt Engineering for Cost-Efficient Language Models: Switching between Chain-of-Thought and Direct Answer
We propose Self-Adaptive Prompt Engineering (SAPE), a simple yet effective prompting method that enables instruction-tuned language models to dynamically adjust the level of reasoning depending on input complexity. SAPE consistently reduces token usage compared to Chain-of-Thought (CoT) prompting while maintaining comparable accuracy.
Modern prompting strategies like CoT improve accuracy through step-by-step reasoning, but at the cost of longer outputs. Direct-answer (DA) prompts, by contrast, are efficient but often less accurate. SAPE bridges the two: it lets the model choose whether to reason step by step or answer directly.
- DA Prompt: “Only write the final answer.”
- CoT Prompt: “Think step-by-step, then give the final answer.”
- SAPE Prompt: “Use step-by-step reasoning only if it helps. Otherwise, answer directly.”
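The prompt wording above can be wired up as simple templates. The sketch below is only an illustration of that idea; the template strings and the `build_prompt` helper are hypothetical, not the exact prompts or code used in the experiments.

```python
# Minimal sketch of the three prompting modes (illustrative wording;
# the exact prompt strings used in the paper may differ).
PROMPTS = {
    "da":   "Only write the final answer.",
    "cot":  "Think step-by-step, then give the final answer.",
    "sape": "Use step-by-step reasoning only if it helps. Otherwise, answer directly.",
}

def build_prompt(question: str, mode: str = "sape") -> str:
    """Prepend the chosen instruction to a benchmark question."""
    return f"{PROMPTS[mode]}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("What is 17 * 24?", mode="sape"))
```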
We evaluate SAPE across 3 instruction-tuned models:
- Llama-3.2-3B-Instruct
- Phi-4-mini-Instruct
- Mistral-7B-Instruct-v0.3
Using 8 benchmark datasets:
- GSM8K / GSM8K-Hard / MATH-500
- CommonSenseQA / HellaSwag / SimpleQA
- GPQA-Extended / MMLU-Pro
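For a single query, the evaluation boils down to sending one of the prompts above to an instruction-tuned model. The following is a rough sketch assuming the standard Hugging Face `transformers` chat API; the generation settings are illustrative, and the repository's own run scripts (`LLM_eval/scripts/run_experiments.sh`) handle this end to end.

```python
# Rough sketch: querying one evaluated model with a SAPE-style prompt
# (settings are illustrative, not the paper's exact configuration).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = ("Use step-by-step reasoning only if it helps. Otherwise, answer directly.\n\n"
          "Question: What is 17 * 24?\nAnswer:")
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```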
We measure:
- Accuracy
- Token length
- Averages of both metrics across benchmarks
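As a rough sketch of how these metrics are computed (the actual implementation lives in `LLM_metrics/`; the record field names `prediction`, `gold`, and `output` here are hypothetical):

```python
# Illustrative metric helpers; field names are assumptions, not the repo's schema.
def accuracy(records):
    """Fraction of examples where the extracted answer matches the gold answer."""
    return sum(r["prediction"] == r["gold"] for r in records) / len(records)

def avg_token_length(records, tokenizer):
    """Mean number of generated tokens per model response."""
    return sum(len(tokenizer.encode(r["output"])) for r in records) / len(records)
```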
- SAPE reduces average token usage to nearly one-third of CoT with minimal accuracy loss.
- Models show distinct prompt-following behaviors:
  - Llama aligns well with the intended prompts.
  - Phi ignores the prompt structure.
  - Mistral shows intermediate performance.
- SAPE adapts well to question complexity, providing flexible and cost-effective generation.
⚠️ One limitation is that models cannot reliably identify or report their own reasoning mode. This black-box behavior warrants further investigation.
📝 The evaluation framework is adapted from Evalchemy.
```
├── LLM_eval/
│   ├── benchmarks
│   ├── run scripts
│   └── evaluation scripts (.py)
├── LLM_metrics/
│   ├── run results
│   └── visualizations
└── README.md
```
```bash
# Run experiments
bash LLM_eval/scripts/run_experiments.sh

# Visualize results
python LLM_metrics/viz.py
```

Requires Python 3.8+ and access to Hugging Face models or locally downloaded checkpoints.
If you use this work, please cite:
```bibtex
@misc{kim2024sape,
  title={Self-Adaptive Prompt Engineering for Cost-Efficient Language Models},
  author={Yongjin Kim},
  year={2024},
  url={https://github.com/yjK199905/Self-Adaptive-Prompt-Engineering}
}
```

If you have questions or feedback, feel free to open an issue or contact Yongjin Kim.

