Model Architecture: Building a TFLite-Ready MobileNetV3 Classifier
- Anie Etor-Udofia
- Apr 1
Why MobileNetV3?
NOMA AI needed to run on a Raspberry Pi 4 (4GB RAM, no GPU). After evaluating several architectures, MobileNetV3 emerged as the clear choice:
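Depthwise Separable Convolutions: Reduce parameters by close to 90% (roughly 8 to 9× fewer) compared with standard 3×3 convolutions at typical channel widths
H-Swish Activation: A piecewise-linear approximation of swish that avoids the sigmoid, making it cheaper to compute on edge hardware
Hardware-Aware Architecture Search: The layer configuration was found with platform-aware neural architecture search and NetAdapt, targeting low latency on mobile CPUs
TFLite Compatibility: Converts using only built-in TFLite ops (no SELECT_TF_OPS required), ensuring smooth conversion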
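To make the first two points concrete, here is a small sketch of the arithmetic behind them. The layer size (3×3 kernel, 128 input and output channels) is an illustrative assumption, not a specific layer from MobileNetV3, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def hard_swish(x):
    """H-Swish: x * ReLU6(x + 3) / 6, a piecewise-linear stand-in for swish."""
    relu6 = min(max(x + 3.0, 0.0), 6.0)
    return x * relu6 / 6.0


# Hypothetical layer size chosen only for illustration
k, c_in, c_out = 3, 128, 128
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
print(f"standard: {std:,}  separable: {sep:,}  reduction: {1 - sep / std:.1%}")
```

The saving depends on the kernel size and channel count: the ratio of separable to standard weights is 1/c_out + 1/k², so for 3×3 kernels it approaches 1/9 as the number of output channels grows.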
The Model Architecture
Here's the complete architecture I built:
text
Layer (type)                    Output Shape          Param #
================================================================
InputLayer                      (None, 224, 224, 3)   0
Rescaling (to [-1, 1])          (None, 224, 224, 3)   0
MobileNetV3Small (pretrained)   (None, 7, 7, 576)     939,120
GlobalAveragePooling2D          (None, 576)           0
Dropout (0.3)                   (None, 576)           0
Dense (512, ReLU)               (None, 512)           295,424
BatchNormalization              (None, 512)           2,048
Dropout (0.4)                   (None, 512)           0
Dense (256, ReLU)               (None, 256)           131,328
Dropout (0.3)                   (None, 256)           0
Dense (24, Softmax)             (None, 24)            6,168
================================================================
Total params: 1,374,088 (5.24 MB)
Trainable params: 433,944 (1.66 MB)
Non-trainable params: 940,144 (3.59 MB)
The Classification Head: Designed for Accuracy
The MobileNetV3 base extracts features, but the classification head transforms them into predictions:
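Global Average Pooling: Averages the 7×7×576 feature map over its spatial positions to produce a single 576-dimensional vector. Compared with flattening (28,224 values), this discards spatial layout but needs far fewer weights in the next Dense layer and is less prone to overfitting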
Dropout (0.3): Initial regularization to prevent overfitting
Dense (512) + BatchNorm: Learns complex patterns with batch normalization for training stability
Dropout (0.4): Aggressive regularization—the model must generalize, not memorize
Dense (256): Further feature refinement
Dropout (0.3): Final regularization before classification
Softmax (24): Multi-class probability distribution
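The layer stack described above could be assembled in Keras roughly as follows. This is a sketch under assumptions, not the exact training script: the function name `build_classifier` is hypothetical, inputs are assumed to be raw pixels in [0, 255], and `include_preprocessing=False` is my choice so that the explicit Rescaling layer is the only input scaling applied.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_classifier(num_classes=24, weights="imagenet"):
    """Build MobileNetV3Small plus the custom classification head."""
    inputs = tf.keras.Input(shape=(224, 224, 3))

    # Assumption: raw pixel values in [0, 255] are mapped to [-1, 1]
    x = layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)

    base = tf.keras.applications.MobileNetV3Small(
        input_shape=(224, 224, 3),
        include_top=False,
        weights=weights,
        include_preprocessing=False,  # assumption: avoid scaling the input twice
    )
    base.trainable = False  # frozen for the first training stage
    x = base(x)

    # Classification head, in the order listed in the summary
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    return tf.keras.Model(inputs, outputs), base
```

Returning `base` alongside the model keeps a handle on the backbone, which makes it easy to freeze or unfreeze its layers later.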
Three-Stage Training Strategy
Stage 1: Classifier Training (30 epochs, LR=0.001)
Base MobileNetV3 layers frozen
Only custom head trained
Purpose: Initialize the classification layers without disturbing pre-trained features
Stage 2: Fine-Tuning (40 epochs, LR=0.0001)
Last 40 layers of MobileNetV3 unfrozen
Reduced learning rate to prevent catastrophic forgetting
Purpose: Adapt ImageNet features to skin lesion patterns
Stage 3: Full Training (30 epochs, LR=0.00001, conditional)
Only activated if validation accuracy < 75%
Full model unfreezing with very low learning rate
Purpose: Fine-tune entire network when needed
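The three stages above could be sketched as follows. The helper names, the Adam optimizer, the sparse categorical cross-entropy loss, and the `train_ds`/`val_ds` datasets are all assumptions made for illustration; only the epoch counts, learning rates, the 40-layer unfreeze, and the 75% threshold come from the description.

```python
import tensorflow as tf


def compile_for_stage(model, learning_rate):
    """Recompile so that changes to layer.trainable take effect."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),  # assumed optimizer
        loss="sparse_categorical_crossentropy",  # assumes integer class labels
        metrics=["accuracy"],
    )


def unfreeze_last_layers(base, n):
    """Make only the last n layers of the backbone trainable."""
    base.trainable = True
    for layer in base.layers[:-n]:
        layer.trainable = False


def train_in_stages(model, base, train_ds, val_ds,
                    epochs=(30, 40, 30), n_unfrozen=40, stage3_threshold=0.75):
    # Stage 1: backbone frozen, only the custom head learns
    base.trainable = False
    compile_for_stage(model, 1e-3)
    model.fit(train_ds, validation_data=val_ds, epochs=epochs[0])

    # Stage 2: fine-tune the last layers of the backbone at a lower rate
    unfreeze_last_layers(base, n_unfrozen)
    compile_for_stage(model, 1e-4)
    model.fit(train_ds, validation_data=val_ds, epochs=epochs[1])

    # Stage 3: only if validation accuracy is still below the threshold
    _, val_acc = model.evaluate(val_ds, verbose=0)
    if val_acc < stage3_threshold:
        base.trainable = True
        compile_for_stage(model, 1e-5)
        model.fit(train_ds, validation_data=val_ds, epochs=epochs[2])
    return model
```

Recompiling before each stage matters: Keras only picks up changes to `trainable` flags when the model is compiled again.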
TFLite Optimization
The final model needed to run on the Pi 4. Key conversion steps:
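python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Default optimization applies dynamic-range quantization to the weights
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
Critical Note: I made sure the conversion did not enable SELECT_TF_OPS. A model that depends on select TensorFlow ops needs the Flex delegate, which a standalone TensorFlow Lite runtime on the Pi 4 typically does not include, so the model would fail to load there.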
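One way to make the built-ins-only constraint explicit, and to sanity-check the converted model before copying it to the Pi, is sketched below. The function names are hypothetical. `tf.lite.Interpreter` is used here for a desktop check; on the Pi itself the equivalent interpreter from a standalone runtime package would be the likely choice, which is an assumption about the deployment setup.

```python
import numpy as np
import tensorflow as tf


def convert_builtin_only(model):
    """Convert a Keras model while allowing only built-in TFLite ops."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Conversion fails if the model would need SELECT_TF_OPS
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
    return converter.convert()


def predict_tflite(tflite_bytes, image_batch):
    """Run one batch through the converted model and return its output."""
    interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
    interpreter.allocate_tensors()
    input_info = interpreter.get_input_details()[0]
    output_info = interpreter.get_output_details()[0]
    interpreter.set_tensor(input_info["index"],
                           image_batch.astype(input_info["dtype"]))
    interpreter.invoke()
    return interpreter.get_tensor(output_info["index"])
```

With this setup, an unsupported op surfaces as a conversion error on the development machine rather than as a load failure on the device.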
Performance Results
Training Accuracy: 96.88%
Validation Accuracy: 61.51%
The gap of about 35 percentage points indicates overfitting: the model memorized much of the training data. This was expected given the class imbalance and limited data for rare conditions. Even so, the validation accuracy is well above the roughly 4% that uniform random guessing would achieve on a 24-class task.
Class-Level Performance Highlights
Class | Precision | Recall | F1-Score | Support |
Melanoma | 0.938 | 0.927 | 0.933 | 82 |
Normal | 0.930 | 0.930 | 0.930 | 71 |
Vitiligo | 0.860 | 0.854 | 0.857 | 137 |
Basal Cell Carcinoma | 0.857 | 0.795 | 0.825 | 83 |
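As a quick consistency check on the table, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded values shown above; because the inputs are already rounded, the last digit can differ slightly from the reported score.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above
reported = {
    "Melanoma": (0.938, 0.927, 0.933),
    "Normal": (0.930, 0.930, 0.930),
    "Vitiligo": (0.860, 0.854, 0.857),
    "Basal Cell Carcinoma": (0.857, 0.795, 0.825),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.3f}, reported = {f1:.3f}")
```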
Key Insight
The model excels at detecting malignant conditions (melanoma, basal cell carcinoma) and normal skin, exactly what you want in a screening tool. Lower performance on some benign conditions is acceptable since the goal is catching cancer, not perfect classification of every rash.



