When Less is More: Surprising Gains from Label-Aware Quantization

Halıcıoğlu Data Science Institute, UC San Diego

Introduction

What is Label-Aware Quantization (LAQ)?

Post-training quantization (PTQ) techniques typically calibrate on data drawn from the same distribution as the training data to preserve accuracy under memory constraints. LAQ, in contrast, calibrates the quantizer using data only from the subset of labels the deployed model will actually need to classify.

Why LAQ?

Edge deployments often need to distinguish only a handful of classes. Calibrating the quantizer on that subset alone may permit more aggressive compression while preserving accuracy on the task that actually matters.

Greedy Path-Following Quantization (GPFQ): the PTQ algorithm used here. GPFQ quantizes a layer's weights sequentially, greedily choosing each quantized weight so that the quantized pre-activations track the full-precision pre-activations on the calibration data (a simplified sketch follows).
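A minimal NumPy sketch of the GPFQ update for a single fully connected layer, assuming a fixed symmetric alphabet and one neuron at a time; the function and variable names are illustrative, not the implementation used in the experiments.

```python
import numpy as np

def gpfq_quantize_neuron(w, X, alphabet):
    """Greedily quantize one neuron's weights w (shape [d]) so the quantized
    running pre-activation tracks the full-precision one on calibration data.

    X: calibration activations, shape [m, d]; alphabet: allowed quantized values.
    """
    q = np.zeros_like(w)
    u = np.zeros(X.shape[0])                 # accumulated error between FP and quantized pre-activations
    for t in range(w.shape[0]):
        x_t = X[:, t]
        denom = x_t @ x_t + 1e-12
        target = x_t @ (u + w[t] * x_t) / denom                  # greedy choice that best cancels the error
        q[t] = alphabet[np.argmin(np.abs(alphabet - target))]    # round to the nearest alphabet element
        u = u + (w[t] - q[t]) * x_t                              # update the running error
    return q

# Illustrative 4-bit symmetric alphabet scaled to the weight range
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))                # 16 neurons, 64 inputs
X = rng.normal(size=(256, 64))               # calibration activations (drawn from the label subset under LAQ)
alphabet = np.linspace(-1, 1, 2**4 - 1) * np.abs(W).max()
W_q = np.vstack([gpfq_quantize_neuron(w, X, alphabet) for w in W])
```

Under LAQ, the calibration activations X come only from the chosen label subset rather than the full training distribution.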

Research Question: Does LAQ boost CNN performance on subset classification tasks?

Dataset

CIFAR-100: 100 classes of 32×32 color images, with 500 training and 100 test images per class.
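A minimal torchvision sketch of loading CIFAR-100 and restricting it to a 10-class subset; the ImageNet-style preprocessing and the specific class indices are placeholder assumptions, and the actual subsets come from the selection procedure in Methods.

```python
from torch.utils.data import Subset, DataLoader
from torchvision import datasets, transforms

# ImageNet-style preprocessing so pretrained backbones (e.g. ResNet-50) can be reused (an assumption)
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

train_set = datasets.CIFAR100(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR100(root="./data", train=False, download=True, transform=transform)

# Placeholder class indices; the real 10-class subsets are chosen by the greedy procedure in Methods
subset_classes = {3, 11, 24, 37, 42, 55, 61, 78, 85, 97}
keep = [i for i, y in enumerate(train_set.targets) if y in subset_classes]
train_loader = DataLoader(Subset(train_set, keep), batch_size=64, shuffle=True)
```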

Dataset Visualization
Figure 1: Visualizing Class Centers

Methods

Subset Generation (From Section 3):

  1. Feature Extraction (FE): Flattened output of convolutional layers from a pre-trained ResNet-50 (2048 dimensions).
  2. Dimensionality Reduction (DR): UMAP preserves cluster structure & location (2 dimensions).
  3. Inter-Class Distance (ICD): KL divergence (Gaussian closed-form) for every unique pair of classes.
  4. Subset Selection: Greedily selecting 10 classes based on inter-class similarity, so the three subsets are spread along the median-distance axis (see the code sketch after Figure 2):
    • Similar Classes: Low Median Distance
    • Dissimilar Classes: High Median Distance
    • Random Classes: Intermediate Median Distance
Data Processing
Figure 2: Data Preprocessing & Generation
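A rough sketch of the four subset-generation steps, assuming umap-learn for the 2-D embedding and an ImageNet-pretrained ResNet-50 backbone; the greedy selection shown (growing the subset by median KL distance to the classes already chosen) is one plausible reading of the procedure, and all helper names are illustrative.

```python
import numpy as np
import torch
import umap                                  # umap-learn package
from torchvision import models

# 1. Feature Extraction: 2048-d vectors from a pretrained ResNet-50 with its classifier removed
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def extract_features(loader):
    feats, labels = [], []
    for x, y in loader:
        feats.append(backbone(x))
        labels.append(y)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

# 2. Dimensionality Reduction: UMAP down to 2 dimensions, preserving cluster structure
def reduce_2d(features):
    return umap.UMAP(n_components=2, random_state=0).fit_transform(features)

# 3. Inter-Class Distance: closed-form KL divergence between per-class 2-D Gaussians
def gaussian_kl(mu0, cov0, mu1, cov1):
    k = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def pairwise_kl(embedding, labels):
    classes = np.unique(labels)
    stats = {c: (embedding[labels == c].mean(0), np.cov(embedding[labels == c].T)) for c in classes}
    D = np.zeros((len(classes), len(classes)))
    for i, a in enumerate(classes):
        for j, b in enumerate(classes):
            if i != j:
                D[i, j] = gaussian_kl(*stats[a], *stats[b])
    return classes, D

# 4. Subset Selection: greedily grow a 10-class subset by median distance to the classes already chosen
def greedy_subset(classes, D, k=10, mode="similar"):
    chosen = [0]                              # arbitrary seed class (an illustrative choice)
    while len(chosen) < k:
        remaining = [i for i in range(len(classes)) if i not in chosen]
        scores = [np.median(D[i, chosen]) for i in remaining]
        pick = remaining[int(np.argmin(scores) if mode == "similar" else np.argmax(scores))]
        chosen.append(pick)
    return classes[chosen]
```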

Model Variations (From Section 4):

  • Original: Uses pre-trained weights.
  • Quant: Quantized using the training split of the subset.
  • Fine-Tuned (FT): Fine-tuned using the training split of the subset.
  • FT + Quant: Fine-tuned first, then quantized (the sketch after Figure 3 shows how the four variants relate).
Workflow
Figure 3: Experimental Setup
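One way the four variants could be assembled from a single pretrained backbone, assuming a GPFQ-style quantize routine (sketched in the Introduction) and a standard fine-tuning loop; function names and hyperparameters are illustrative.

```python
import copy
import torch

def fine_tune(model, loader, epochs=5, lr=1e-4):
    """Fine-tune on the subset's training split (illustrative hyperparameters)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.eval()

def quantize(model, calib_loader, bits=4):
    """Placeholder for GPFQ-style PTQ calibrated on the subset's training split."""
    # In the full pipeline, each layer's weights would be replaced by their greedy
    # path-following quantization (see the GPFQ sketch in the Introduction).
    return model

def build_variants(pretrained_model, train_loader):
    """Return the four experimental conditions built from one pretrained backbone."""
    variants = {
        "Original": pretrained_model,
        "Quant": quantize(copy.deepcopy(pretrained_model), train_loader),
        "FT": fine_tune(copy.deepcopy(pretrained_model), train_loader),
    }
    variants["FT + Quant"] = quantize(copy.deepcopy(variants["FT"]), train_loader)
    return variants
```

The same four variants are built for each architecture (ResNet-50, VGG16, GoogLeNet, MobileNetV2) and for each class subset.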

Results

Evaluation Method: accuracy reported on All Classes and on the Subset Classes.

Conclusion

Our study demonstrates that the effectiveness of label-aware quantization is strongly influenced by inter-class similarity: across all tested architectures, accuracy was better maintained when quantizing for subsets of dissimilar classes. While ResNet-50, VGG16, and GoogLeNet showed strong resilience to 4-bit precision reduction, MobileNetV2 was significantly more sensitive, indicating that architecture design strongly affects how well a model tolerates quantization. We also observed that fine-tuning provides greater benefits for similar-class tasks, whereas direct label-aware quantization performs best on distinct classes. These findings offer practical guidance for resource-constrained environments: developers can apply label-aware quantization directly for tasks involving distinct objects, and fine-tune before quantizing when the target classes are similar. This enables substantial model compression while maintaining high accuracy on specific subtasks, underscoring the viability of neural network quantization for resource-constrained edge devices.