quantization docs update

author Manu Mathew <a0393608@ti.com>

Fri, 1 May 2020 15:40:35 +0000 (21:10 +0530)

committer Manu Mathew <a0393608@ti.com>

Fri, 1 May 2020 15:40:49 +0000 (21:10 +0530)
diff --git a/docs/Quantization.md b/docs/Quantization.md

index 4b10cfc69d4cafa5ecf60b138a89a625c0b688d8..2198c2032cb3704c099cfd5b1812dbe207fc1e35 100644 (file)
--- a/docs/Quantization.md
+++ b/docs/Quantization.md
@@ -24,13 +24,16 @@ To get best accuracy at the quantization stage, it is important that the model i
  - Ensure that the Convolution layers in the network have Batch Normalization layers immediately after that. The only exception allowed to this rule is for the very last Convolution layer in the network (for example the prediction layer in a segmentation network or detection network, where adding Batch normalization might hurt the floating point accuracy).<br>
  
  ## Implementation Notes, Limitations & Recommendations
-- **Please read carefully** - closely following these recommendations can save hours & hours of debug related to quantization accuracy issues.
+- **Please read carefully** - closely following these recommendations can save hours or days of debugging related to quantization accuracy issues.
+- **Use Modules instead of functions** (by Module we mean classes derived from torch.nn.Module). Our quantization tools rely heavily on Modules: to collect ranges, to merge Convolution/BN/ReLU, to decide whether to quantize a certain tensor, and so on. For example, use torch.nn.ReLU instead of torch.nn.functional.relu(), torch.nn.AdaptiveAvgPool2d() instead of torch.nn.functional.adaptive_avg_pool2d(), torch.nn.Flatten() instead of torch.flatten(), etc.<br>
+- **The same module should not be re-used multiple times within the model**, so that the feature map range estimation is correct. Unfortunately, in the torchvision ResNet models, the ReLU module in the BasicBlock and Bottleneck blocks is re-used multiple times. We have corrected this by defining separate ReLU modules. This change is minor and **does not** affect the loading of existing pretrained weights. See [our modified ResNet model definition here](./modules/pytorch_jacinto_ai/vision/models/resnet.py).<br>
+- If you have done QAT and are getting poor accuracy, either in the Python code or during inference on the platform, please inspect your model carefully to see whether the above recommendations have been followed. Some of these are easy to miss by oversight and can result in painful debugging that could have been avoided.<br>
+- An exception to the "Modules instead of functions" rule: if a function does not change the range of the feature map, it is not critical to use it in Module form. An example of this is torch.nn.functional.interpolate.<br>
  - **Multi-GPU training/calibration/validation with DataParallel is not yet working with our quantization modules** QuantTrainModule/QuantCalibrateModule/QuantTestModule. We recommend not to wrap the modules in DataParallel if you are training/calibrating/testing with quantization - i.e. if your model is wrapped in QuantTrainModule/QuantCalibrateModule/QuantTestModule.<br>
  - If you get an error during training related to weights and input not being in the same GPU, please check and ensure that you are not using DataParallel with QuantTrainModule/QuantCalibrateModule/QuantTestModule. This may not be such a problem as calibration and quantization may not take as much time as the original floating point training. The original floating point training (without quantization) can use Multi-GPU as usual and we do not have any restrictions on that.<br>
  - If your calibration/training crashes with insufficient GPU memory, reduce the batch size and try again.
-- **The same module should not be re-used multiple times within the module** in order that the activation range estimation is correct. Unfortunately, in the torchvision ResNet models, the ReLU module in the BasicBlock and BottleneckBlock are re-used multiple times. We have corrected this by defining separate ReLU modules. This change is minor and **does not** affect the loading of existing pretrained weights. See the [our modified ResNet model definition here](./modules/pytorch_jacinto_ai/vision/models/resnet.py).<br>
-- **Use Modules instead of functions** (we make use of modules to decide whether to do activation range clipping or not). For example use torch.nn.reLU instead of torch.nn.functional.relu(), torch.nn.AdaptiveAvgPool2d() instead of torch.nn.functional.adaptive_avg_pool2d(), torch.nn.Flatten() instead of torch.nn.functional.flatten() etc. If you are using functions in your model and is giving poor quantized accuracy, then consider replacing those functions by the corresponding modules.<br>
  - If you are using TIDL to infer a model trained using QAT (or calibrated using PTQ) tools provided in this repository, please set **quantizationStyle = 3** in TIDL import config to use power of 2 quantization.
+- We have provided several useful functions and Modules as part of the xnn python module in this repository. The most notable ones are: [xnn.layers.resize_with, xnn.layers.ResizeWith](../modules/pytorch_jacinto_ai/xnn/resize_blocks.py) to export a clean resize/interpolate/upsample graph, and [xnn.layers.AddBlock, xnn.layers.CatBlock](../modules/pytorch_jacinto_ai/xnn/common_blocks.py) to do elementwise addition & concatenation in torch.nn.Module form.
  
  ## Post Training Calibration For Quantization (PTQ a.k.a. Calibration)
  **Note: this is not our recommended method in PyTorch.**<br>
  
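
The two recommendations added in this commit ("Use Modules instead of functions" and "the same module should not be re-used") can be illustrated with a small sketch. This is not the repository's `resnet.py` and not the range collection used by the quantization tools. `BasicBlockSketch` and `collect_ranges` are hypothetical names, and the forward hooks are only an assumed stand-in for how per-module range statistics might be gathered. The point is that each activation gets its own `nn.ReLU` instance, so statistics keyed by module stay separate.

```python
import torch
import torch.nn as nn


class BasicBlockSketch(nn.Module):
    """Residual block with one module instance per activation (illustrative only)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        # A separate instance, rather than calling self.relu1 a second time.
        self.relu2 = nn.ReLU()

    def forward(self, x):
        out = self.relu1(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The addition is left as a plain operator to keep the sketch short.
        return self.relu2(out + x)


def collect_ranges(model: nn.Module, x: torch.Tensor) -> dict:
    """Record (min, max) of every ReLU output, keyed by module name (assumed scheme)."""
    ranges = {}

    def make_hook(name):
        def hook(module, inputs, output):
            lo, hi = output.min().item(), output.max().item()
            old = ranges.get(name)
            # Merge with any earlier call of the same module.
            ranges[name] = (lo, hi) if old is None else (min(old[0], lo), max(old[1], hi))
        return hook

    handles = [
        module.register_forward_hook(make_hook(name))
        for name, module in model.named_modules()
        if isinstance(module, nn.ReLU)
    ]
    with torch.no_grad():
        model(x)
    for handle in handles:
        handle.remove()
    return ranges


if __name__ == "__main__":
    torch.manual_seed(0)
    block = BasicBlockSketch(channels=4).eval()
    print(collect_ranges(block, torch.randn(2, 4, 8, 8)))
```

If a single ReLU instance were used at both places, the two call sites would share one dictionary entry and their ranges would be merged, which is the estimation problem the recommendation describes. Because `nn.ReLU` has no parameters, adding a second instance does not change the `state_dict` keys, consistent with the note that pretrained weights still load.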
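
The diff above mentions setting **quantizationStyle = 3** to use power of 2 quantization. As a generic illustration of what a power-of-2 scale means (an assumption for explanation only, not TIDL's implementation and not the code in this repository), the sketch below picks the largest power-of-2 scale that keeps a tensor's maximum magnitude inside a signed 8-bit range. All function names are hypothetical.

```python
import math


def power_of_2_scale(max_abs: float, bits: int = 8) -> float:
    """Largest power-of-2 scale such that max_abs * scale still fits in a signed integer."""
    qmax = 2 ** (bits - 1) - 1
    # floor() guarantees max_abs * scale <= qmax.
    return 2.0 ** math.floor(math.log2(qmax / max_abs))


def quantize(values, scale: float, bits: int = 8):
    """Scale, round, and clamp to the signed integer range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(qmin, min(qmax, round(v * scale))) for v in values]


def dequantize(qvalues, scale: float):
    """Map integers back to floating point."""
    return [q / scale for q in qvalues]


if __name__ == "__main__":
    data = [0.5, -1.25, 3.0]
    scale = power_of_2_scale(max(abs(v) for v in data))
    q = quantize(data, scale)
    print(scale, q, dequantize(q, scale))
```

With a power-of-2 scale, rescaling between layers reduces to bit shifts, which is a common motivation for this style of quantization on fixed-point hardware.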