Automated Performance Bug Report Classification

Bug tracking systems receive thousands of issue reports, the vast majority of which are not performance-related. Manually triaging these to identify performance bugs — memory leaks, latency regressions, CPU/GPU bottlenecks — is time-consuming and requires domain expertise.

This project trains and evaluates four classifiers on GitHub issue reports from five open-source deep learning frameworks, using a domain-aware text feature pipeline combining TF-IDF, character n-grams, title-specific features, and a hand-crafted performance keyword lexicon.

Datasets: PyTorch · TensorFlow · Keras · Apache MXNet · Caffe (3,712 reports total)
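As a rough illustration of the keyword-lexicon, title-specific, and character n-gram features mentioned above, here is a minimal standard-library sketch. The keyword list, the feature names, and the helper functions (`tokenize`, `char_ngrams`, `lexicon_features`) are all hypothetical and are not taken from the project's code; the actual pipeline may differ substantially.

```python
import re
from collections import Counter

# Hypothetical lexicon: illustrative terms only, not the project's actual list.
PERF_KEYWORDS = {
    "slow", "latency", "leak", "memory",
    "bottleneck", "regression", "oom", "throughput",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of the lowercased text."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def lexicon_features(title, body):
    """Count lexicon hits separately for the title and the body."""
    title_tokens = tokenize(title)
    body_tokens = tokenize(body)
    title_hits = sum(t in PERF_KEYWORDS for t in title_tokens)
    body_hits = sum(t in PERF_KEYWORDS for t in body_tokens)
    return {
        "title_keyword_hits": title_hits,
        "body_keyword_hits": body_hits,
        "title_has_keyword": int(title_hits > 0),
        # Normalise by body length so long reports are not favoured.
        "body_keyword_ratio": body_hits / max(len(body_tokens), 1),
    }


# Made-up issue report, for illustration only.
features = lexicon_features(
    "Training is slow after upgrade",
    "GPU memory usage grows every epoch, looks like a memory leak.",
)
```

In a full pipeline, features like these would typically be concatenated with the TF-IDF vectors before being passed to a classifier.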

Getting Started

Python 3.13 is strictly required.

Clone the Repository

git clone https://github.com/Ayush272002/Automated-Performance-Bug-Report-Classification.git
cd Automated-Performance-Bug-Report-Classification

Install Dependencies

Using uv (recommended):

pip3 install uv
uv sync

Using pip:

python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip3 install -r requirements.txt

Usage

# Run all models on all 5 projects (30 repeats)
uv run main.py

# Run a single model alongside the baseline
uv run main.py logreg
uv run main.py linearsvc
uv run main.py cnn_w2v

# Run baseline only
uv run main.py baseline

Replace uv run with python3 if not using uv.

Results are printed to the terminal and saved to results/run_<model>.log.

Results

All three proposed models significantly outperform the Gaussian Naive Bayes baseline (p < 0.001, Vargha-Delaney effect size Â₁₂ = 1.0). Logistic Regression and LinearSVC are statistically equivalent to each other, which suggests that the feature pipeline, rather than the choice of linear classifier, is the primary driver of performance.
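For readers unfamiliar with the Â₁₂ effect size, the following is a minimal sketch of the standard Vargha-Delaney computation. The function name `vargha_delaney_a12` and the sample scores are made up for illustration; the project's own statistical tests live in main.py and may be implemented differently.

```python
def vargha_delaney_a12(xs, ys):
    """Estimate the probability that a value from xs exceeds one from ys.

    Ties count as half a win. 0.5 means no difference, and 1.0 means
    every value in xs is larger than every value in ys.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    wins = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (wins + 0.5 * ties) / (len(xs) * len(ys))


# Made-up per-run F1 scores, for illustration only (not the project's data).
model_f1 = [0.81, 0.83, 0.80, 0.82]
baseline_f1 = [0.55, 0.57, 0.54, 0.56]
a12 = vargha_delaney_a12(model_f1, baseline_f1)
```

Read this way, Â₁₂ = 1.0 indicates that every repeat of a proposed model scored higher than every repeat of the baseline.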

Project      Baseline F1   LogReg F1   LinearSVC F1   CNN F1
PyTorch      0.5624        0.8197      0.7805         0.7304
TensorFlow   0.5388        0.8672      0.8596         0.8293
Keras        0.5412        0.8154      0.8059         0.7643
MXNet        0.5159        0.8167      0.7805         0.6507
Caffe        0.4611        0.8000      0.7751         0.5330

Full results including Accuracy, Precision, Recall, AUC and statistical tests are in results/run_all.log.
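As a reminder of how F1 relates to the Precision and Recall figures in the log, here is a minimal sketch that computes it from confusion-matrix counts. The function name `f1_from_counts` and the example counts are hypothetical, and whether the project reports positive-class or averaged F1 is not stated here.

```python
def f1_from_counts(tp, fp, fn):
    """Return the harmonic mean of precision and recall.

    tp, fp, fn are true-positive, false-positive, and false-negative counts.
    Returns 0.0 when precision and recall are both undefined or zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts, for illustration only.
example_f1 = f1_from_counts(tp=40, fp=10, fn=10)
```

Because F1 ignores true negatives, it is less inflated than accuracy when most reports are not performance-related.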

Project Structure

├── main.py                  # Entry point — CLI, logging, statistical tests
├── runner.py                # Shared experiment loop and metrics
├── br_classification.py     # Baseline: Gaussian Naive Bayes + TF-IDF
├── logreg.py                # Logistic Regression
├── linearsvc.py             # LinearSVC with Platt scaling
├── cnn_w2v.py               # CNN with Word2Vec embeddings
├── datasets/                # CSV datasets (one per project)
├── results/                 # Output logs and CSVs
├── report/                  # LaTeX report
├── manual/                  # User manual
└── replication/             # Replication instructions
