Deep Learning Research · Research Concluded

Retinal Screening DL

Medical Imaging Classification via MobileNetV3

The Problem

Clinics in rural areas require rapid, interpretable second opinions for retinal diseases like Diabetic Macular Edema (DME) and Choroidal Neovascularization (CNV), but often lack access to specialist ophthalmologists.

Why it matters

Early detection of retinal abnormalities prevents permanent blindness. Automated edge-deployable screening democratizes access.

Who is affected

Rural patients, remote diagnostic clinics, and overburdened ophthalmologists.

Architecture & System Design

Frontend

N/A (Jupyter Notebook / Python Scripts for Research environment)

ML Pipeline

End-to-end PyTorch pipeline: Local data generators -> Augmentation -> Mixed Precision (FP16) Forward Pass -> Grad-CAM interpretability overlay.

Architectural Reasoning

Chose MobileNetV3Large over heavy models like ResNet152 specifically to support future edge-device deployment without massive VRAM requirements.

Alternatives Considered

Tested ResNet50 and EfficientNet. EfficientNet had marginally better AUC but 3x the inference latency, violating edge-compute constraints.

ML & Technical Deep Dive

Model Selection & Training
Core Architecture

MobileNetV3Large (ImageNet Pretrained)

Training Methodology

Two-stage transfer learning: a frozen feature extractor (10 epochs), followed by unfreezing and end-to-end fine-tuning at a much lower learning rate (1e-5).

Dataset

84,000+ Optical Coherence Tomography (OCT) images (Kermany et al.) split into 4 classes: CNV, DME, Drusen, Normal.

Preprocessing Pipeline
  • Bicubic interpolation resizing to 224x224
  • Stochastic data augmentation (flips, rotations, color jittering)
  • Z-score normalization
Evaluation Metrics

  • Categorical Accuracy: 96.4%
  • Precision (CNV class): 98.1%
  • Inference Time (T4 GPU): 12 ms/image
Technical Challenges
Problem: Severe Class Imbalance
Solution: Implemented weighted Cross-Entropy loss heavily penalizing misclassification on the minority disease classes.
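A minimal sketch of that weighted loss, using inverse-frequency class weights. The per-class counts below are the approximate Kermany training-split counts (CNV, DME, Drusen, Normal) and should be recomputed from the actual data.

```python
import torch
import torch.nn as nn

# Approximate Kermany training-split counts: CNV, DME, Drusen, Normal.
class_counts = torch.tensor([37205.0, 11348.0, 8616.0, 26315.0])

# Inverse-frequency weights, normalized so they average to 1.0.
# Rare classes (Drusen, DME) get weights > 1, common classes < 1.
weights = class_counts.sum() / (len(class_counts) * class_counts)

criterion = nn.CrossEntropyLoss(weight=weights)

# Toy batch: 3 samples, 4 class logits each.
logits = torch.randn(3, 4)
targets = torch.tensor([1, 2, 0])
loss = criterion(logits, targets)
```

Misclassifying a minority disease class now contributes proportionally more to the loss, counteracting the imbalance.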
Problem: Doctor 'Black-Box' Mistrust
Solution: Engineered Grad-CAM overlays mapping model activations back to the original image, visually pointing to the fluid/lesions the model detected.

Core Features

4-Class Classification

Efficiently distinguishes between Normal, Choroidal Neovascularization, Diabetic Macular Edema, and Drusen.

Grad-CAM Heatmaps

Generates interpretable visual heatmaps highlighting the exact retinal topology causing the prediction.

Edge-Ready Optimization

Mixed precision training and native lightweight convolutions make the model directly exportable to ONNX for mobile.

Results & Impact

Transitioned from a computationally heavy proof-of-concept to an optimized, interpretable, clinical-grade pipeline.

Proves viability for low-cost, automated medical screening in regions lacking immediate specialized care.

AUC of 0.98+ across all disease states

40% VRAM reduction during FP16 training
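A minimal sketch of the FP16 training step behind that VRAM saving, using PyTorch automatic mixed precision. The tiny linear model is a stand-in for the MobileNetV3 pipeline; on CPU the sketch falls back to full precision.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(128, 4).to(device)  # stand-in for the real backbone
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(8, 128, device=device)
targets = torch.randint(0, 4, (8,), device=device)

optimizer.zero_grad()
# Forward pass runs in FP16 inside autocast (on GPU), halving activation memory.
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = criterion(model(inputs), targets)
scaler.scale(loss).backward()  # loss scaling guards against FP16 gradient underflow
scaler.step(optimizer)
scaler.update()
```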

Takeaways & Learnings

What I Learned

Mastered PyTorch optimization loops. Learned how crucial Explainable AI (XAI) is in the medical domain; accuracy alone does not drive adoption.

Trade-Offs Made

MobileNet inherently trades a fraction of a percent of accuracy for massive gains in latency and compute efficiency.

Future Improvements

Incorporate Vision Transformers (ViT) to test whether global attention mechanisms outperform spatial convolutions on OCT texture patterns.

Tech Stack Foundation

ML / AI
  • PyTorch
  • MobileNetV3
  • Transfer Learning
  • Grad-CAM
Tools
  • Jupyter
  • Matplotlib
  • scikit-learn
Data
  • NumPy
  • Pandas
  • OpenCV
Deployment
  • ONNX

Interested in this architecture?

Let's talk about how I can build something similar for your team.