A machine learning project that classifies GitHub issue reports from popular ML frameworks as performance bugs (class 1) or non-performance bugs (class 0).
Issues were collected from five open-source ML frameworks:
| Project | Total | Performance | Non-Performance |
|---|---|---|---|
| TensorFlow | 1,490 | 279 | 1,211 |
| PyTorch | 752 | 95 | 657 |
| Keras | 668 | 135 | 533 |
| Apache MXNet | 516 | 65 | 451 |
| Caffe | 286 | 33 | 253 |
| Total | 3,712 | 607 | 3,105 |
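The counts above imply a noticeable class imbalance, which is worth quantifying before picking a metric. A minimal sketch in plain Python that recomputes the share of performance bugs per project from the table (the names `COUNTS` and `performance_share` are illustrative and not part of the repository):

```python
# Counts copied from the table above: project -> (performance, non_performance).
COUNTS = {
    "TensorFlow": (279, 1211),
    "PyTorch": (95, 657),
    "Keras": (135, 533),
    "Apache MXNet": (65, 451),
    "Caffe": (33, 253),
}


def performance_share(performance: int, non_performance: int) -> float:
    """Return the fraction of issues labelled as performance bugs (class 1)."""
    return performance / (performance + non_performance)


total_perf = sum(perf for perf, _ in COUNTS.values())
total_non_perf = sum(non_perf for _, non_perf in COUNTS.values())

for project, (perf, non_perf) in COUNTS.items():
    print(f"{project:<13} {performance_share(perf, non_perf):.1%}")
print(f"{'Overall':<13} {performance_share(total_perf, total_non_perf):.1%}")
```

Overall, roughly one issue in six belongs to class 1, which is consistent with scoring on macro-averaged F1 rather than plain accuracy.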
The raw data lives in data/. Each CSV contains issue metadata (title, body, labels, comments, code snippets, etc.) plus a class column (0/1). The processed, training-ready file is data/final_dataset.csv, which has two columns: report_text and class.
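Before training, it can help to verify that the processed file matches the schema described above. The following is a sketch using pandas; `load_dataset` and `EXPECTED_COLUMNS` are hypothetical names, and the two sample rows are made up for illustration rather than taken from the dataset:

```python
import io

import pandas as pd

EXPECTED_COLUMNS = ["report_text", "class"]


def load_dataset(source) -> pd.DataFrame:
    """Read the processed dataset and check the two-column schema.

    `source` may be a path such as "data/final_dataset.csv" or a file-like object.
    """
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    if not df["class"].isin([0, 1]).all():
        raise ValueError("class column must contain only 0 or 1")
    return df[EXPECTED_COLUMNS]


# Made-up stand-in for data/final_dataset.csv (illustration only).
sample = io.StringIO(
    "report_text,class\n"
    "training step is much slower after upgrade,1\n"
    "import error on startup,0\n"
)
df = load_dataset(sample)
print(df["class"].value_counts().to_dict())
```

With the real file, the call would be `load_dataset("data/final_dataset.csv")`, assuming the notebook is run from the repository root.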
Run the notebooks in this order:
| Notebook | Purpose |
|---|---|
| notebooks/exploration.ipynb | Initial EDA: class distributions, text/word length, label analysis |
| notebooks/relevance_checks.ipynb | Data quality and relevance checks |
| notebooks/data_prep.ipynb | Combines all project CSVs, cleans text, outputs final_dataset.csv |
| notebooks/baseline_model_training.ipynb | Naive Bayes + TF-IDF baseline (two configs) |
| notebooks/other_model_experiments.ipynb | SVM and Logistic Regression with TF-IDF and sentence embeddings |
| notebooks/statistical_tests.ipynb | Wilcoxon signed-rank tests comparing all models |
Trained models are saved as .pkl files in notebooks/models/:
| File | Description |
|---|---|
| baseline_default.pkl | Naive Bayes, course-provided config (TF-IDF, 1k features, ROC AUC scoring) |
| baseline_self.pkl | Naive Bayes, custom config (TF-IDF, 18k features, F1-macro scoring) |
| svm.pkl | LinearSVC + TF-IDF |
| logistic_regression.pkl | Logistic Regression + TF-IDF |
| svm_st.pkl | LinearSVC + all-MiniLM-L6-v2 sentence embeddings |
| svm_st_balanced.pkl | LinearSVC (class-balanced) + MiniLM embeddings |
| svm_st_balanced_enhanced.pkl | RBF SVC (class-balanced) + all-mpnet-base-v2 embeddings |
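A saved model can be reloaded for inference. The sketch below assumes the files were written with Python's standard `pickle` module (the README only says they are .pkl files; if they were saved with `joblib`, `joblib.load` would be needed instead). `load_model` is a hypothetical helper, not something from the repository:

```python
import pickle
from pathlib import Path

MODELS_DIR = Path("notebooks/models")  # directory named in the README


def load_model(name: str, models_dir: Path = MODELS_DIR):
    """Unpickle a saved model from `models_dir`.

    Unpickling can execute arbitrary code, so only load trusted files.
    """
    path = models_dir / name
    with path.open("rb") as fh:
        return pickle.load(fh)
```

For example, `load_model("svm.pkl")` would return the stored object. Whether that object accepts raw text depends on whether the vectorizer was pickled together with the classifier, which is not stated here.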
All models were evaluated with 5×5 repeated stratified cross-validation (5 folds, 5 repeats) and scored on macro-averaged F1:
| Model | Mean F1 |
|---|---|
| NB + TF-IDF (course baseline) | 0.443 |
| NB + TF-IDF (reimplemented) | 0.429 |
| LR + TF-IDF | 0.661 |
| SVM + TF-IDF | 0.653 |
| SVM + MiniLM | 0.752 |
| SVM + MiniLM (balanced) | 0.755 |
| SVM + MiniLM (enhanced) | 0.779 |
| SVM + MPNet (best) | 0.797 |
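The evaluation protocol can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the notebooks' actual code: it reads "5×5" as 5 folds repeated 5 times, uses default hyperparameters, picks an arbitrary seed, and runs on a made-up toy corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def evaluate(texts, labels, seed=42):
    """Return the 25 per-fold macro-F1 scores of a TF-IDF + LinearSVC pipeline."""
    # Fitting the vectorizer inside the pipeline keeps test folds out of the vocabulary.
    pipeline = make_pipeline(TfidfVectorizer(), LinearSVC())
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=seed)
    return cross_val_score(pipeline, texts, labels, cv=cv, scoring="f1_macro")


# Made-up toy corpus, only to show the call; real input would be the
# report_text and class columns of data/final_dataset.csv.
texts = [f"training is slow and memory usage grows {i}" for i in range(10)] + [
    f"import fails with attribute error {i}" for i in range(10)
]
labels = [1] * 10 + [0] * 10

scores = evaluate(texts, labels)
print(f"{len(scores)} folds, mean macro-F1 = {scores.mean():.3f}")
```

Keeping the 25 per-fold scores, rather than only their mean, is what makes a paired comparison between models possible afterwards.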
Wilcoxon signed-rank tests indicate that all sentence-transformer models perform significantly better (p < 0.05) than both baselines.
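A paired comparison of this kind can be sketched with `scipy.stats.wilcoxon` (SciPy is installed as a dependency of scikit-learn). The helper name, the one-sided alternative, and the scores below are assumptions for illustration; the numbers are placeholders, not results from this project.

```python
from scipy.stats import wilcoxon


def is_significantly_better(scores_a, scores_b, alpha=0.05):
    """One-sided Wilcoxon signed-rank test on paired per-fold scores.

    Returns (p_value, significant) for the hypothesis that model A
    scores higher than model B on the same folds.
    """
    result = wilcoxon(scores_a, scores_b, alternative="greater")
    p_value = float(result.pvalue)
    return p_value, p_value < alpha


# Placeholder scores, NOT results from this project: 25 paired folds.
model_a = [0.70 + 0.002 * i for i in range(25)]
model_b = [0.60 + 0.001 * i for i in range(25)]

p_value, significant = is_significantly_better(model_a, model_b)
print(f"p = {p_value:.2g}, significant = {significant}")
```

The two score lists must come from the same folds in the same order, since the test works on the per-fold differences.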
Requires Python 3.11+.

    pip install pipenv
    pipenv install
    pipenv shell
    jupyter notebook

Dependencies: numpy, pandas, matplotlib, seaborn, scikit-learn, sentence-transformers, ipykernel.