Bug Classifier

A machine learning project that classifies GitHub issue reports from popular ML frameworks as performance bugs (class 1) or non-performance bugs (class 0).

Dataset

Issues were collected from five open-source ML frameworks:

Project	Total	Performance	Non-Performance
TensorFlow	1,490	279	1,211
PyTorch	752	95	657
Keras	668	135	533
Apache MXNet	516	65	451
Caffe	286	33	253
Total	3,712	607	3,105
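
The counts above imply a strong class imbalance, which is worth keeping in mind when reading the macro-F1 results later on. As a quick check, the sketch below recomputes the share of performance bugs per project from the table; the names `COUNTS` and `performance_share` are illustrative and not part of the repository.

```python
# Issue counts copied from the table above: (performance, non-performance).
COUNTS = {
    "TensorFlow": (279, 1211),
    "PyTorch": (95, 657),
    "Keras": (135, 533),
    "Apache MXNet": (65, 451),
    "Caffe": (33, 253),
}


def performance_share(performance: int, non_performance: int) -> float:
    """Return the fraction of issues labelled as performance bugs (class 1)."""
    return performance / (performance + non_performance)


total_perf = sum(perf for perf, _ in COUNTS.values())
total_non_perf = sum(non_perf for _, non_perf in COUNTS.values())

for project, (perf, non_perf) in COUNTS.items():
    print(f"{project:<13} {performance_share(perf, non_perf):.1%}")
print(f"{'Overall':<13} {performance_share(total_perf, total_non_perf):.1%}")
```

Overall, roughly 16% of the issues are performance bugs (607 of 3,712).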

The raw data lives in `data/`. Each CSV contains issue metadata (title, body, labels, comments, code snippets, etc.) plus a `class` column (0/1). The processed, training-ready file is `data/final_dataset.csv`, which has two columns: `report_text` and `class`.
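
A minimal sketch of loading the processed file with pandas, assuming it is a plain comma-separated CSV with a header row. The helper name `load_reports` is hypothetical, and dropping rows with empty text is an assumption rather than something the notebooks are known to do.

```python
import pandas as pd


def load_reports(path: str = "data/final_dataset.csv") -> pd.DataFrame:
    """Load the processed dataset and check that it has the documented columns."""
    df = pd.read_csv(path)
    missing = {"report_text", "class"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Assumption: rows without any report text are not useful for training.
    df = df.dropna(subset=["report_text"]).copy()
    df["class"] = df["class"].astype(int)
    return df[["report_text", "class"]]
```

Calling `load_reports()` from the repository root should return a two-column frame; `df["class"].value_counts()` then gives the class distribution.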

Notebooks

Run the notebooks in this order:

Notebook	Purpose
`notebooks/exploration.ipynb`	Initial EDA: class distributions, text and word lengths, label analysis
`notebooks/relevance_checks.ipynb`	Data quality and relevance checks
`notebooks/data_prep.ipynb`	Combines all project CSVs, cleans text, outputs `final_dataset.csv`
`notebooks/baseline_model_training.ipynb`	Naive Bayes + TF-IDF baseline (two configs)
`notebooks/other_model_experiments.ipynb`	SVM and Logistic Regression with TF-IDF and sentence embeddings
`notebooks/statistical_tests.ipynb`	Wilcoxon signed-rank tests comparing all models
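
To illustrate the sentence-embedding experiments listed above, here is a rough sketch of training a linear SVM on embeddings. It assumes the `all-MiniLM-L6-v2` model named in the Models section; the names `Encoder`, `minilm_encoder`, and `fit_embedding_svm` are hypothetical, and the actual notebook may be organised differently.

```python
from typing import Callable, Sequence

import numpy as np
from sklearn.svm import LinearSVC

# An encoder maps a batch of texts to a 2-D array of embeddings.
Encoder = Callable[[Sequence[str]], np.ndarray]


def minilm_encoder() -> Encoder:
    """Return an encoder backed by the all-MiniLM-L6-v2 sentence transformer.

    The import is deferred so the rest of the sketch can be used without
    sentence-transformers installed; the model is downloaded on first use.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return lambda texts: model.encode(list(texts))


def fit_embedding_svm(
    texts: Sequence[str],
    labels: Sequence[int],
    encode: Encoder,
    balanced: bool = False,
) -> LinearSVC:
    """Embed the texts and fit a linear SVM, optionally class-balanced."""
    features = encode(texts)
    clf = LinearSVC(class_weight="balanced" if balanced else None)
    clf.fit(features, labels)
    return clf
```

With the real model this would be called as `fit_embedding_svm(texts, labels, minilm_encoder(), balanced=True)`. Passing the encoder as a parameter keeps the embedding step swappable.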

Models

Trained models are saved as .pkl files in notebooks/models/:

File	Description
`baseline_default.pkl`	Naive Bayes, course-provided config (TF-IDF, 1k features, ROC AUC scoring)
`baseline_self.pkl`	Naive Bayes, custom config (TF-IDF, 18k features, F1-macro scoring)
`svm.pkl`	LinearSVC + TF-IDF
`logistic_regression.pkl`	Logistic Regression + TF-IDF
`svm_st.pkl`	LinearSVC + `all-MiniLM-L6-v2` sentence embeddings
`svm_st_balanced.pkl`	LinearSVC (class-balanced) + MiniLM embeddings
`svm_st_balanced_enhanced.pkl`	RBF SVC (class-balanced) + `all-mpnet-base-v2` embeddings
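
As an illustration of how a baseline like `baseline_default.pkl` could be built and stored, the sketch below wires TF-IDF into Naive Bayes and round-trips the model through pickle. It assumes `MultinomialNB` (the table only says "Naive Bayes") and omits any hyperparameter search; `build_baseline`, `save_model`, and `load_model` are hypothetical helper names.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def build_baseline(max_features: int = 1000) -> Pipeline:
    """TF-IDF features followed by Naive Bayes.

    Assumption: MultinomialNB is the Naive Bayes variant in use.
    """
    return Pipeline([
        ("tfidf", TfidfVectorizer(max_features=max_features)),
        ("nb", MultinomialNB()),
    ])


def save_model(model: Pipeline, path: str) -> None:
    """Serialise a fitted pipeline to a .pkl file."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path: str) -> Pipeline:
    """Load a pipeline previously written by save_model."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

A loaded pipeline accepts raw strings, for example `load_model("notebooks/models/baseline_default.pkl").predict(["training is slow"])`, provided the file was saved as a full pipeline. Pickle files can execute arbitrary code when loaded, so only open ones from a trusted source.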

Results

All models were evaluated with 5×5 repeated stratified cross-validation (5 folds, repeated 5 times) and scored on macro-averaged F1:

Model	Mean F1
NB + TF-IDF (course baseline)	0.443
NB + TF-IDF (reimplemented)	0.429
LR + TF-IDF	0.661
SVM + TF-IDF	0.653
SVM + MiniLM	0.752
SVM + MiniLM (balanced)	0.755
SVM + MiniLM (enhanced)	0.779
SVM + MPNet (best)	0.797
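
The evaluation protocol can be sketched with scikit-learn as follows. The data here is synthetic and the classifier is a stand-in, so the printed score has no relation to the table above; `repeated_cv_f1` is a hypothetical helper name and the random seed is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score


def repeated_cv_f1(model, X, y, seed: int = 0):
    """Return the 25 macro-F1 scores from 5-fold CV repeated 5 times."""
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=seed)
    return cross_val_score(model, X, y, scoring="f1_macro", cv=cv)


# Synthetic, imbalanced stand-in data (NOT the project's dataset).
X, y = make_classification(
    n_samples=300, n_features=20, weights=[0.84, 0.16], random_state=0
)
scores = repeated_cv_f1(LogisticRegression(max_iter=1000), X, y)
print(f"{len(scores)} fold scores, mean F1 (macro) = {scores.mean():.3f}")
```

Stratification keeps the class ratio roughly constant across folds, which matters when the positive class is small.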

Wilcoxon signed-rank tests indicate that every sentence-transformer model scores significantly higher than both Naive Bayes baselines (p < 0.05).
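
A paired comparison of this kind can be sketched with `scipy.stats.wilcoxon`, assuming both models were scored on the same 25 cross-validation folds. The scores below are made-up placeholders, not the project's results, and `compare_models` is a hypothetical helper name.

```python
from scipy.stats import wilcoxon


def compare_models(scores_a, scores_b, alpha: float = 0.05):
    """Two-sided Wilcoxon signed-rank test on paired per-fold scores.

    Returns the p-value and whether it falls below alpha.
    """
    result = wilcoxon(scores_a, scores_b)
    return float(result.pvalue), bool(result.pvalue < alpha)


# Placeholder per-fold F1 scores (25 = 5 folds x 5 repeats). These numbers are
# invented for illustration and are NOT taken from the notebooks.
baseline_scores = [0.42 + 0.002 * i for i in range(25)]
candidate_scores = [0.74 + 0.003 * i for i in range(25)]

p_value, significant = compare_models(baseline_scores, candidate_scores)
print(f"p = {p_value:.2e}, significant at 0.05: {significant}")
```

The test is paired, so the two score lists must be aligned fold by fold, which in practice means reusing the same cross-validation splits for both models.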

Setup

Requires Python 3.11+.

pip install pipenv
pipenv install
pipenv shell
jupyter notebook

Dependencies: numpy, pandas, matplotlib, seaborn, scikit-learn, sentence-transformers, ipykernel.
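
After `pipenv install`, a quick sanity check of the environment might look like the sketch below. It only tests whether the listed dependencies can be found and whether the interpreter meets the Python 3.11 requirement; the helper names are hypothetical, and note that scikit-learn is imported as `sklearn` and sentence-transformers as `sentence_transformers`.

```python
import sys
from importlib.util import find_spec

# Import names for the dependencies listed above.
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "matplotlib",
    "seaborn",
    "sklearn",
    "sentence_transformers",
    "ipykernel",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


def python_is_supported(minimum=(3, 11)) -> bool:
    """Check the running interpreter against the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.11+:", python_is_supported())
    print("Missing modules:", missing_modules() or "none")
```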
