Quantization documentation updated

[jacinto-ai/pytorch-jacinto-ai-devkit.git] / docs / Quantization.md
diff --git a/docs/Quantization.md b/docs/Quantization.md

index f999b5d124688952642d2fcf52f21f0927c7354e..7e03c52601d900a8a9b9a9af975ebf9ea7ab3c1a 100644
--- a/docs/Quantization.md
+++ b/docs/Quantization.md
@@ -33,12 +33,17 @@ To get best accuracy at the quantization stage, it is important that the model i
  - However, if a function does not change the range of the feature map, it is not critical to use it in Module form. An example of this is torch.nn.functional.interpolate<br>
  - **Multi-GPU training/validation with DataParallel** is supported with our QAT module QuantTrainModule and our test module QuantTestModule. This addresses a major concern that previously existed when doing QAT with QuantTrainModule. (However, it is not supported for QuantCalibrateModule. Calibration takes much less time, so hopefully this is not a big issue. In our example training scripts train_classification.py and train_pixel2pixel.py in pytorch_jacinto_ai/engine, we do not wrap the model in DataParallel if the model is a QuantCalibrateModule, but we do so for QuantTrainModule and QuantTestModule).<br>
  - If your training/calibration crashes because of insufficient GPU memory, reduce the batch size and try again.
-- If you are using TIDL to infer a model trained using QAT (or calibratied using PTQ) tools provided in this repository, please set the following in the import config file for best accuracy: **quantizationStyle = 3** to use power of 2 quantization. **foldPreBnConv2D = 0** to avoid a slight accuracy degradation due to incorrect folding of BatchNormalization that comes before Convolution (input mean/scale is implemented in TIDL as a PreBN - so this affects most networks).
+- If you are using TIDL to run inference on a model trained with the QAT tools provided in this repository (or a model calibrated with the PTQ Calibration that is simulated here), please set the following in the TIDL import config file for best accuracy: <br>
+  **quantizationStyle = 3** to use power of 2 quantization. <br> 
+  **foldPreBnConv2D = 0** to avoid a slight accuracy degradation due to incorrect folding of BatchNormalization that comes before Convolution (input mean/scale is implemented in TIDL as a PreBN - so this affects most networks). <br> 
+  **calibrationOption = 0** to avoid further Calibration in TIDL. <br>
  
  ## Post Training Calibration For Quantization (PTQ a.k.a. Calibration)
-**Note: this is not our recommended method in PyTorch.**<br>
-Post Training Calibration or simply Calibration is a method to reduce the accuracy loss with quantization. This is an approximate method and does not require ground truth or back-propagation - hence it is suitable for implementation in an Import/Calibration tool. We have simulated this in PyTorch and can be used as fast method to improve the accuracy with Quantization. If you are interested, you can take a look at the [documentation of Calibration here](Calibration.md).<br>
-However, in a training frame work such as PyTorch, it is possible to get better accuracy with Quantization Aware Training and we recommend to use that (next section).
+- **Note: this is not our recommended method in PyTorch.**<br>
+- Post Training Calibration or simply Calibration is a method to reduce the accuracy loss with quantization. This is an approximate method and does not require ground truth or back-propagation - hence it is suitable for implementation in an Import/Calibration tool. 
+- For example, PTQ with Advanced Calibration can be enabled in TIDL by setting **calibrationOption = 7**. Please consult the TIDL documentation for further explanation of this option.
+- We have simulated PTQ with Advanced Calibration in PyTorch. If you are interested, you can take a look at the [documentation of Calibration here](Calibration.md).<br>
+- However, in a training framework such as PyTorch, it is possible to get better accuracy with Quantization Aware Training (QAT), and we recommend using it (see the next section).
  
  ## Quantization Aware Training (QAT)
  Quantization Aware Training (QAT) is easy to incorporate into existing PyTorch training code. We provide a wrapper module called QuantTrainModule to automate all the tasks required for QAT. The user simply needs to wrap their model in QuantTrainModule and do the training.