An end-to-end, production-ready computer vision and software architecture designed to solve the domain shift problem in automated white blood cell (WBC) classification.
Vanilla deep learning models degrade severely on out-of-distribution (OOD) data produced by different microscopy hardware and staining protocols. This system reaches 98.53% in-distribution accuracy and 89.05% OOD accuracy, a gain of 32.09 percentage points over the unadapted baseline (56.96%), using an inference-time adaptation pipeline that requires no retraining.
- Medical Enhanced Filter (MEF): A deterministic, 5-step image preprocessing pipeline that normalizes cross-device brightness, exposure, and color variability at the pixel level before feature extraction.
- WBCAttention & MedSwish: A sequential, parameter-efficient CBAM-style attention block (132K params) combined with a custom activation function that uses learnable parameters ($\alpha, \beta$) to suppress the "Dying ReLU" effect on fine morphological chromatin details.
- Dynamic Training-Time XAI Guardrail: Features XAIFocusMonitor, a custom Keras callback that computes Grad-CAM foreground focus ratios during training and automatically stops execution if the model attempts to exploit spurious background correlations (shortcut learning).
- Closed-Loop Remediation Interface: Implements an agentic inference head powered by an autonomous multi-modal LLM (GPT-4o with a localized Gemini 2.5 Flash fallback) that interprets Grad-CAM heatmaps post-hoc, validating model focus against hematological criteria and dynamically triggering stain normalization if background focus is detected.
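The exact definition of the foreground focus ratio is not given here. One plausible reading is the share of Grad-CAM activation mass that falls inside a cell mask, and the sketch below assumes that reading. It uses plain nested lists in place of tensors, borrows the 0.55 threshold from the `--xai-focus-threshold` flag of the training command, and the function names are hypothetical rather than taken from the repository.

```python
from typing import Sequence

Grid = Sequence[Sequence[float]]


def foreground_focus_ratio(heatmap: Grid, mask: Grid) -> float:
    """Share of heatmap mass inside the foreground mask (assumed definition)."""
    total = 0.0
    inside = 0.0
    for heat_row, mask_row in zip(heatmap, mask):
        for heat, is_foreground in zip(heat_row, mask_row):
            heat = max(heat, 0.0)  # Grad-CAM maps are non-negative; clamp to be safe
            total += heat
            if is_foreground:
                inside += heat
    # An all-zero heatmap carries no evidence of focus, so report 0.0
    return inside / total if total > 0 else 0.0


def should_stop(ratio: float, threshold: float = 0.55) -> bool:
    """True when too little activation lies on the cell (assumed stop rule)."""
    return ratio < threshold


# Toy 3x3 heatmap whose mass sits mostly on the two masked "cell" pixels
heatmap = [[0.0, 0.1, 0.0], [0.1, 0.9, 0.7], [0.0, 0.2, 0.0]]
mask = [[0, 0, 0], [0, 1, 1], [0, 0, 0]]
ratio = foreground_focus_ratio(heatmap, mask)
print(f"focus ratio = {ratio:.2f}, stop = {should_stop(ratio)}")
```

In a Keras callback, a check like this would typically run in `on_epoch_end` on a small batch of validation images and set `self.model.stop_training = True` when the ratio falls below the threshold.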
Evaluated on Giemsa-stained peripheral blood smear images across hardware splits (Professional Laboratory Camera vs. Consumer Smartphone).
| Evaluation Set | Target Distribution | n | Base Accuracy | Proposed Pipeline Accuracy | Weighted F1 |
|---|---|---|---|---|---|
| TestA | In-Distribution (IND) | 4,339 | 97.46% | 98.53% | 0.9854 |
| TestB | Out-of-Distribution (OOD) | 2,119 | 56.96% | 89.05% | 0.9111 |
| Combined | Joint Evaluation | 6,458 | 84.17% | 95.42% | 0.9554 |
Note: TestB captures severe hardware-induced domain shift (contains Lymphocyte and Neutrophil classes collected from unseen acquisition devices).
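The Combined row is consistent with a sample-weighted pooling of the TestA and TestB rows. The short check below recomputes it from the rounded percentages in the table, so agreement is only expected to two decimal places; the variable and function names are illustrative.

```python
# (n, base accuracy %, proposed pipeline accuracy %) copied from the table above
splits = {
    "TestA": (4339, 97.46, 98.53),
    "TestB": (2119, 56.96, 89.05),
}


def pooled_accuracy(column: int) -> float:
    """Sample-weighted mean of one accuracy column across both splits."""
    total = sum(row[0] for row in splits.values())
    weighted = sum(row[0] * row[column] for row in splits.values())
    return weighted / total


base = pooled_accuracy(1)
proposed = pooled_accuracy(2)
# Gain on the OOD split, in percentage points
ood_gain = splits["TestB"][2] - splits["TestB"][1]

print(f"combined base     = {base:.2f}%")
print(f"combined proposed = {proposed:.2f}%")
print(f"OOD gain          = {ood_gain:.2f} pp")
```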
Extreme class imbalances (e.g., Basophil rarity) are managed natively via class-weighted WBCFocalLoss:
| Leukocyte Subtype | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Basophil | 1.0000 | 1.0000 | 1.0000 | 89 |
| Eosinophil | 0.9265 | 0.9783 | 0.9517 | 322 |
| Lymphocyte | 0.9865 | 0.9884 | 0.9874 | 1,034 |
| Monocyte | 0.9372 | 0.9573 | 0.9471 | 234 |
| Neutrophil | 0.9962 | 0.9868 | 0.9915 | 2,660 |
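The formula behind WBCFocalLoss is not spelled out here. As a rough illustration, the sketch below uses the standard focal loss with a per-class weight, $-w_c (1 - p_t)^{\gamma} \log p_t$. The value $\gamma = 2$ and the inverse-frequency weighting are assumptions, and the counts are the support values from the table above, used only as stand-ins for the training-set counts a real run would use.

```python
import math

# Support values from the table above, used here only as example class counts
supports = {
    "Basophil": 89,
    "Eosinophil": 322,
    "Lymphocyte": 1034,
    "Monocyte": 234,
    "Neutrophil": 2660,
}


def inverse_frequency_weights(counts: dict) -> dict:
    """Weight each class by total / (num_classes * count), so rare classes count more."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


def focal_loss(p_true: float, class_weight: float, gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Class-weighted focal loss for one sample, given the probability of its true class."""
    p = min(max(p_true, eps), 1.0 - eps)  # keep log() finite
    return -class_weight * (1.0 - p) ** gamma * math.log(p)


weights = inverse_frequency_weights(supports)
easy = focal_loss(0.95, weights["Neutrophil"])  # confident prediction, common class
hard = focal_loss(0.30, weights["Basophil"])    # poor prediction, rare class
print(f"easy sample loss = {easy:.5f}, hard sample loss = {hard:.5f}")
```

With $\gamma = 0$ and unit weights, the expression reduces to ordinary cross-entropy, which makes a convenient sanity check.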
All models were trained under identical configurations to benchmark the footprint vs. performance trade-offs.
| Core Architecture Backbone | Total Parameters | Validation Accuracy | Macro F1 | Inference Latency |
|---|---|---|---|---|
| VGG16 | 15.11 M | 98.56% | 0.9724 | 18.1 ms |
| ResNet50V2 | 24.75 M | 98.17% | 0.9704 | 103.9 ms |
| MobileNetV2 | 3.05 M | 97.90% | 0.9577 | 96.0 ms |
| EfficientNetB0 | 4.84 M | 97.05% | 0.9418 | 185.4 ms |
| DenseNet121 (Vanilla) | 7.70 M | 98.89% | 0.9803 | 232.2 ms |
| Proposed (DenseNet121 + WBCAttention + MedSwish) | 7.83 M | 98.53% | 0.9853 | 14.2 ms |
- Percentile Clipping: Standardizes luminance by stretching the 2nd–98th percentile per channel to suppress microscope exposure variations.
- Dual-Scale LAB CLAHE: Applies local contrast enhancement exclusively to the L-channel using fused tile configurations ($4\times4$ for nuclear chromatin, $8\times8$ for cytoplasmic boundaries) via Canny edge-weighted masks to block hue shifts.
- Bilateral Filtering: Cleans microscopy shot noise ($d=9, \sigma_c=65, \sigma_s=65$) while preserving membrane boundaries.
- Morphological Nucleus Highlighting: Computes blended inner ($k_{3\times3}$) and outer ($k_{7\times7}$) elliptical gradients to explicitly amplify nuclear lobation structures.
- Selective LoG Sharpening: Applies localized Laplacian-of-Gaussian sharpening exclusively to edge boundaries, leaving flat backgrounds intact.
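The first of these steps, percentile clipping, can be illustrated on a single channel with nothing but the standard library. This is a minimal sketch under stated assumptions: the channel is a flat list of intensities, the output range is 0–255, and the helper names are hypothetical. The version in `src/preprocessing.py` presumably works on image arrays and may differ in detail.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("percentile of an empty channel is undefined")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def percentile_stretch(channel, low_q=2.0, high_q=98.0):
    """Map the [low_q, high_q] percentile window onto 0..255 and clip the tails."""
    ordered = sorted(channel)
    lo = percentile(ordered, low_q)
    hi = percentile(ordered, high_q)
    if hi <= lo:
        return list(channel)  # flat channel: nothing to stretch
    stretched = []
    for value in channel:
        scaled = (value - lo) / (hi - lo) * 255.0
        stretched.append(int(round(min(max(scaled, 0.0), 255.0))))
    return stretched


# One channel with a single over-exposed outlier at 250
channel = [40, 60, 80, 100, 120, 140, 160, 250]
stretched = percentile_stretch(channel)
print(stretched)
```

Applying the same function to each colour channel independently gives the per-channel behaviour described in the first bullet.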
The data demonstrates that aggressive structural tampering without calibration degrades cytoplasm-dependent subtypes:
| Preprocessing Variant Configuration | TestA (IND) | TestB (OOD) | Combined |
|---|---|---|---|
| v1 – MEF Original (Proposed Configuration) | 98.41% | 85.65% | 94.22% |
| v2 – Adaptive CLAHE TileGrid | 97.99% | 87.92% | 94.69% |
| v3 – v2 + Top-Hat / Bottom-Hat Morphology | 95.18% | 77.58% | 89.41% |
| v4 – v3 + Macenko Stain Normalization (Uncalibrated) | 57.78% | 42.28% | 52.69% |
wbc-final/
├── app.py                        # Production Flask API + Multi-Modal Agent Orchestration
├── train_main_model.py           # Two-Phase Curriculum Training + Online XAI Monitoring
├── train_baseline_comparison.py  # Comparative Benchmarking Engine for Cross-Backbones
├── eval_final.py                 # Evaluation Wrapper (TTA + Binary Routing + Reinhard)
├── eval_baseline.py              # Baseline Backbone Isolation Validation Engine
├── preprint/                     # Academic Publication Artifacts
│   ├── wbc_preprint.pdf          # Compiled arXiv preprint (Full Paper)
│   ├── main.tex                  # LaTeX source code for the manuscript
│   └── references.bib            # BibTeX citation library
├── src/
│   ├── custom_layers.py          # Tensor Definitions for WBCAttentionBlock & MedSwish
│   ├── custom_losses.py          # Class-Weighted WBCFocalLoss Matrix Definitions
│   └── preprocessing.py          # Operational Implementations of MEF (v1–v4)
├── data/
│   ├── models/                   # Local Storage Bin for Production Weights
│   └── raabin-wbc-data/          # Structural Directory for Train/TestA/TestB Partitions
└── outputs/                      # Runtime Target Directory for Classification Matrices & Reports
Clone the repository and install Python dependencies:
git clone https://github.com/frissonitte/wbc-analyzer-final.git
cd wbc-analyzer-final
pip install -r requirements.txt
Download the production model file wbc_final_model_densenet.keras and place it under:
data/models/wbc_final_model_densenet.keras
(The repository includes a data/models/ folder where production weights are expected.)
Create a .env file in the project root to store API tokens used by the multi-modal agent layers. Example:
GITHUB_TOKEN=your_github_models_token
GEMINI_API_KEY=your_gemini_api_key
Keep this file out of version control (add to .gitignore) for security.
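How `app.py` reads these values is not shown here. As a minimal sketch, the snippet below parses `KEY=VALUE` lines and picks a provider, on the assumption that `GITHUB_TOKEN` unlocks the primary GPT-4o path and `GEMINI_API_KEY` the Gemini fallback. The function names and the provider labels are hypothetical.

```python
def parse_env_file(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def pick_provider(env: dict):
    """Prefer the primary model when its token is set, else fall back (assumed order)."""
    if env.get("GITHUB_TOKEN"):
        return "gpt-4o"
    if env.get("GEMINI_API_KEY"):
        return "gemini-2.5-flash"
    return None  # no LLM report can be produced without a key


# Example .env content with the primary token left empty
sample = "# agent credentials\nGITHUB_TOKEN=\nGEMINI_API_KEY=abc123\n"
env = parse_env_file(sample)
print(pick_provider(env))
```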
Start the Flask production engine:
python app.py
The server will start on http://localhost:5000 by default. You can POST microscopy images to the /predict endpoint to receive class predictions, Grad-CAM overlays, and LLM-based analytical reports.
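A client call against the /predict endpoint might look like the following sketch. It uses only the Python standard library, assumes the server is running at the default address, and sends the image under the form field `file`. The helper names `build_multipart` and `predict` are illustrative and not part of the repository.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field, filename, payload, content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path, url="http://localhost:5000/predict"):
    """POST an image to the running server and return the decoded JSON response."""
    with open(image_path, "rb") as handle:
        body, content_type = build_multipart(
            "file", os.path.basename(image_path), handle.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the server running and a local image available (path is a placeholder):
# result = predict("smear.jpg")
# print(result["class"], result["confidence"])
```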
Note for Windows developers: for native GPU acceleration, run the scripts in WSL2 with the CUDA Toolkit configured.
Run the final evaluation (inference-time adaptation stack: Reinhard color normalization + binary routing + light TTA):
python eval_final.py \
--model-path data/models/wbc_final_model_densenet.keras \
--data-root data/raabin-wbc-data \
--output-dir outputs/final_model_results \
--testb-binary-mode main \
--tta light \
--color-normalization reinhard \
--preprocessing v1
Train the two-phase curriculum from scratch:
python train_main_model.py \
--data-root data/raabin-wbc-data \
--phase1-epochs 15 \
--phase2-epochs 15 \
--main-loss cce \
--label-smoothing 0.1 \
--crop-prob 0.2 \
--bg-randomization-prob 0.15 \
--stain-jitter-prob 0.3 \
--aux-loss-weight 1.0 \
--xai-focus-threshold 0.55 \
--xai-every-n-epochs 2 \
--model-path data/models/wbc_final_model_densenet.keras
Request (multipart/form-data):
POST /predict
Form field:
file – binary stream of the microscopy image (JPG, PNG, BMP, TIFF, WebP accepted)
Successful JSON response (200 OK) example:
{
  "class": "Neutrophil",
  "confidence": 0.977,
  "all_probabilities": {
    "Basophil": 0.001,
    "Eosinophil": 0.002,
    "Lymphocyte": 0.012,
    "Monocyte": 0.008,
    "Neutrophil": 0.977
  },
  "gradcam_image": "data:image/png;base64,iVBORw0KGgo...",
  "llm_report": "Grad-CAM confirmation report: Model focus heavily localized on primary nuclear lobation patterns and fine violet cytoplasmic granulation. Zero background shortcuts detected."
}
If you use this work, cite:
@article{yildirim2026wbc,
title={Achieving Robust Out-of-Distribution Generalization in Peripheral Blood Smears via Custom Attention Mechanisms, Medical Enhanced Filtering, and Inference-Time Domain Adaptation},
author={Yildirim, Emirhan},
publisher={ResearchGate},
doi={10.13140/RG.2.2.34201.79208},
url={https://doi.org/10.13140/RG.2.2.34201.79208},
year={2026}
}