What is Label-Aware Quantization (LAQ)?
Post-training quantization (PTQ) techniques typically calibrate on data drawn from the same distribution as the training data to retain high accuracy under memory constraints. LAQ, in contrast, calibrates the quantized model using data restricted to the subset of class labels the deployed model will actually see, a distribution that deliberately differs from the full training distribution.
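As a concrete illustration, the sketch below builds a calibration loader restricted to a chosen label subset using torchvision's CIFAR-100. The class IDs, transforms, and batch size are hypothetical choices, not the configuration used in this study.

```python
# A minimal sketch of label-aware calibration data: keep only images whose
# labels belong to the target subtask's class subset. Class IDs below are
# hypothetical placeholders.
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.ToTensor()])
full_train = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=transform
)

subset_labels = {8, 13, 48, 58, 90}  # hypothetical class IDs for the target subtask
indices = [i for i, y in enumerate(full_train.targets) if y in subset_labels]

# Calibration data drawn only from the subset's distribution,
# not from the full CIFAR-100 training distribution.
calib_set = torch.utils.data.Subset(full_train, indices)
calib_loader = torch.utils.data.DataLoader(calib_set, batch_size=64, shuffle=True)
```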
Why LAQ?
Greedy Path-Following Quantization (GPFQ):
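The sketch below shows the greedy per-weight update at the core of GPFQ for a single neuron: each weight is rounded to the alphabet element that best cancels the error accumulated so far on calibration activations. It assumes the original and quantized networks receive the same inputs (as in the first layer), and the alphabet construction is an illustrative choice, not the study's exact setup.

```python
# A minimal NumPy sketch of the GPFQ greedy update for one neuron of a
# fully connected layer. Layer-by-layer looping and activation propagation
# through the quantized network are omitted for brevity.
import numpy as np

def round_to_alphabet(x, alphabet):
    """Round a scalar to the nearest element of the quantization alphabet."""
    return alphabet[np.argmin(np.abs(alphabet - x))]

def gpfq_neuron(w, X, alphabet):
    """Greedily quantize one weight vector w using calibration activations X.

    w        : (N,)   original weights of the neuron
    X        : (m, N) calibration activations; column t feeds weight w[t]
    alphabet : (K,)   allowed quantized values (e.g., a uniform 4-bit grid)
    """
    m, N = X.shape
    q = np.zeros(N)
    u = np.zeros(m)                      # running error between analog and quantized pre-activations
    for t in range(N):
        x_t = X[:, t]
        target = u + w[t] * x_t          # contribution the quantized weight should reproduce
        q[t] = round_to_alphabet(x_t @ target / (x_t @ x_t + 1e-12), alphabet)
        u = target - q[t] * x_t          # carry the residual error forward to the next weight
    return q

# Example with a symmetric 15-level grid scaled to the weight range
# (an illustrative stand-in for a 4-bit alphabet).
w = np.random.randn(128)
X = np.random.randn(512, 128)            # 512 hypothetical calibration samples
delta = np.max(np.abs(w)) / 7
alphabet = delta * np.arange(-7, 8)
q = gpfq_neuron(w, X, alphabet)
```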
Research Question: Does LAQ improve the accuracy of quantized CNNs on subset classification tasks?
CIFAR-100
Subset Generation (From Section 3):
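One natural way to construct such subsets, shown below as an illustration rather than the exact procedure of Section 3, is to take all five fine classes of a CIFAR-100 superclass as a "similar" subset and one class from each of several different superclasses as a "dissimilar" subset.

```python
# Illustrative subset generation from CIFAR-100's superclass structure.
# The superclass excerpt follows the standard CIFAR-100 grouping; the
# sampling scheme is an assumption, not the study's Section 3 procedure.
import random

SUPERCLASSES = {
    "vehicles_1": ["bicycle", "bus", "motorcycle", "pickup_truck", "train"],
    "flowers": ["orchid", "poppy", "rose", "sunflower", "tulip"],
    "fruit_and_vegetables": ["apple", "mushroom", "orange", "pear", "sweet_pepper"],
    "household_furniture": ["bed", "chair", "couch", "table", "wardrobe"],
    "aquatic_mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
}

def similar_subset(superclass):
    """All five fine classes from one superclass: high inter-class similarity."""
    return list(SUPERCLASSES[superclass])

def dissimilar_subset(k=5, seed=0):
    """One fine class from each of k different superclasses: low inter-class similarity."""
    rng = random.Random(seed)
    chosen = rng.sample(list(SUPERCLASSES), k)
    return [rng.choice(SUPERCLASSES[s]) for s in chosen]
```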
Model Variations (From Section 4):
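For reference, the four backbones compared in this work can be instantiated from torchvision as sketched below; the models here are randomly initialized with 100 output classes, not the trained or fine-tuned variants described in Section 4.

```python
# Illustrative instantiation of the four CNN architectures compared in the study.
import torchvision.models as models

BACKBONES = {
    "ResNet50": models.resnet50,
    "VGG16": models.vgg16,
    "GoogleNet": models.googlenet,
    "MobileNetV2": models.mobilenet_v2,
}

# CIFAR-100 has 100 fine classes; weights here are untrained placeholders.
float_models = {name: ctor(num_classes=100) for name, ctor in BACKBONES.items()}
```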
Our study demonstrates that the effectiveness of label-aware quantization is strongly influenced by inter-class similarity: across all tested architectures, accuracy is better maintained when quantizing for dissimilar class subsets. While ResNet50, VGG16, and GoogleNet showed strong resilience to 4-bit precision reduction, MobileNetV2 was markedly more sensitive, indicating that architecture design itself shapes how well a network tolerates quantization. We also observed that fine-tuning provides greater benefits for similar-class tasks, whereas direct label-aware quantization performs best with distinct classes. These findings offer practical guidance for resource-constrained environments: developers can apply label-aware quantization directly for tasks involving distinct objects, while fine-tuning before quantization is better suited to similar-class tasks. This approach enables substantial model compression while maintaining high accuracy on specific subtasks, underscoring the viability of neural network quantization on resource-constrained edge devices.