This project uses a Vision Transformer (ViT) to classify images of leather samples into one of 6 categories: five defect types plus a non-defective class. It includes a working PyTorch training pipeline.
Place your dataset under the data/ directory in the following structure:
data/
├── Folding marks/
├── Grain off/
├── Growth marks/
├── loose grains/
├── non defective/
└── pinhole/
Each subfolder should contain around 600 images of that defect type.
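Before training, it can help to confirm that the folder layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `count_images` and the set of accepted file extensions are assumptions made for illustration, while the folder names are copied from the structure shown above.

```python
from pathlib import Path

# Folder names as listed in this README (case-sensitive on most systems).
EXPECTED_CLASSES = [
    "Folding marks",
    "Grain off",
    "Growth marks",
    "loose grains",
    "non defective",
    "pinhole",
]

# Assumption: the dataset uses common image extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(data_dir="data"):
    """Return {class_name: image_count}, with None for a missing folder."""
    root = Path(data_dir)
    counts = {}
    for name in EXPECTED_CLASSES:
        folder = root / name
        if not folder.is_dir():
            counts[name] = None
            continue
        counts[name] = sum(
            1
            for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    for name, n in count_images().items():
        status = "MISSING" if n is None else f"{n} images"
        print(f"{name:15s} {status}")
```

Each class should report roughly 600 images; a `MISSING` entry usually points to a misspelled or wrongly cased folder name.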
- Clone the repository:
    git clone https://github.com/chiraggarg03/leather-defect-detection
    cd leather-defect-detection
- Create a virtual environment:
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the required packages:
    pip install -r requirements.txt
  For CUDA acceleration, use:
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
- Run the Jupyter Notebook:
    jupyter notebook
- Open notebooks/baseline_vit.ipynb in your browser and run all cells to train the model.
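After installation, a quick check can show whether the CUDA build of PyTorch was picked up. This is a small sketch under the assumption that you run it inside the activated virtual environment; `pick_device` is a hypothetical helper name, not something defined in the notebook.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Training would run on: {pick_device()}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the CPU-only wheel was probably installed, and reinstalling with the CUDA index URL above is worth trying.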
- Model: Vision Transformer (vit_b_16)
- Optimizer: Adam
- Loss: CrossEntropyLoss
- Accuracy: ~73% validation after 10 epochs
- The .pth model weights are not committed to the repo due to size limits.
- If you wish to save model checkpoints, modify the notebook to save using:
torch.save(model.state_dict(), "baseline.pth")
The dataset is available on Kaggle: https://www.kaggle.com/datasets/praveen2084/leather-defect-classification/