doc update

[jacinto-ai/caffe-jacinto.git] / README.md
diff --git a/README.md b/README.md

index 6b45624fe50b89264e6dbfff25a49a36bc326226..ee5eee89e5249f30594f3b2b632d9067d9794ec8 100644
--- a/README.md
+++ b/README.md
@@ -1,115 +1,120 @@
-[Caffe: Convolutional Architecture for Fast Feature Extraction](http://caffe.berkeleyvision.org)
+# Caffe-jacinto
  
-Created by [Yangqing Jia](http://daggerfs.com), UC Berkeley EECS department.
-In active development by the Berkeley Vision and Learning Center ([BVLC](http://bvlc.eecs.berkeley.edu/)).
+###### Notice: 
+- If you have not visited our landing page on GitHub, please do so: [https://github.com/TexasInstruments/jacinto-ai-devkit](https://github.com/TexasInstruments/jacinto-ai-devkit)
+- **Issue Tracker for jacinto-ai-devkit:** You can file issues or ask questions at **e2e**: [https://e2e.ti.com/support/processors/f/791/tags/jacinto_2D00_ai_2D00_devkit](https://e2e.ti.com/support/processors/f/791/tags/jacinto_2D00_ai_2D00_devkit). While creating a new issue kindly include **jacinto-ai-devkit** in the tags (as you create a new issue, there is a space to enter tags, at the bottom of the page). 
+- **Issue Tracker for TIDL:** [https://e2e.ti.com/support/processors/f/791/tags/TIDL](https://e2e.ti.com/support/processors/f/791/tags/TIDL). Please include the tag **TIDL** (as you create a new issue, there is a space to enter tags, at the bottom of the page). 
+- If you do not get a reply within two days, please contact us at: jacinto-ai-devkit@list.ti.com
  
-## Introduction
+###### Caffe-jacinto - embedded deep learning framework
  
-Caffe aims to provide computer vision scientists with a **clean, modifiable
-implementation** of state-of-the-art deep learning algorithms. Network structure
-is easily specified in separate config files, with no mess of hard-coded
-parameters in the code. Python and Matlab wrappers are provided.
+Caffe-jacinto is a fork of [NVIDIA/caffe](https://github.com/NVIDIA/caffe), which in turn is derived from [BVLC/Caffe](https://github.com/BVLC/caffe). The modifications in this fork enable training of sparse, quantized CNN models, resulting in low-complexity models that can be used on embedded platforms. 
  
-At the same time, Caffe fits industry needs, with blazing fast C++/Cuda code for
-GPU computation. Caffe is currently the fastest GPU CNN implementation publicly
-available, and is able to process more than **40 million images per day** on a
-single NVIDIA K40 GPU (or 20 million per day on a K20)\*.
+For example, the semantic segmentation example (see below) shows how to train a model that is nearly 80% sparse (only 20% non-zero coefficients) and 8-bit quantized. This reduces the complexity of convolution layers by <b>5x</b>. An inference engine designed to efficiently take advantage of sparsity can run <b>significantly faster</b> by using such a model. 
  
-Caffe also provides **seamless switching between CPU and GPU**, which allows one
-to train models with fast GPUs and then deploy them on non-GPU clusters with one
-line of code: `Caffe::set_mode(Caffe::CPU)`.
+Care has to be taken to strike the right balance between quality and speedup. We have obtained more than 4x overall speedup for CNN inference on an embedded device by applying sparsity. Since an 8-bit multiplier is sufficient (instead of floating point), the speedup can be even higher on some platforms. See the section on quantization below for more details.
  
-Even in CPU mode, computing predictions on an image takes only 20 ms when images
-are processed in batch mode.
+**Important note - Support for SSD Object detection has been added. The relevant SSD layers have been ported over from the [original Caffe SSD implementation](https://github.com/weiliu89/caffe/tree/ssd).** This is probably the first time that SSD object detection is added to a fork of [NVIDIA/caffe](https://github.com/NVIDIA/caffe). This enables fast training of SSD object detection with all the additional speedup benefits that [NVIDIA/caffe](https://github.com/NVIDIA/caffe) offers. 
  
-* [Caffe introductory presentation](https://www.dropbox.com/s/10fx16yp5etb8dv/caffe-presentation.pdf)
-* [Installation instructions](http://caffe.berkeleyvision.org/installation.html)
+**Examples for training and inference (image classification, semantic segmentation and SSD object detection) are in [tidsp/caffe-jacinto-models](https://github.com/tidsp/caffe-jacinto-models).**
  
-\* When measured with the [SuperVision](http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf) model that won the ImageNet Large Scale Visual Recognition Challenge 2012.
+### Installation
+* After cloning the source code, switch to the branch caffe-0.17, if it is not checked out already.
+-- *git checkout caffe-0.17*
  
-## License
+* Please see the [**installation instructions**](INSTALL.md) for installing the dependencies and building the code. 
  
-Caffe is BSD 2-Clause licensed (refer to the
-[LICENSE](http://caffe.berkeleyvision.org/license.html) for details).
+### Training procedure
+**After cloning and building this source code, please visit [tidsp/caffe-jacinto-models](https://github.com/tidsp/caffe-jacinto-models) to do the training.**
  
-The pretrained models published by the BVLC, such as the
-[Caffe reference ImageNet model](https://www.dropbox.com/s/n3jups0gr7uj0dv/caffe_reference_imagenet_model)
-are licensed for academic research / non-commercial use only. However, Caffe is
-a full toolkit for model training, so start brewing your own Caffe model today!
+### Additional Information (can be skipped)
  
-## Citing Caffe
+**SSD Object detection is supported. The relevant SSD layers have been ported over from the [original Caffe SSD implementation](https://github.com/weiliu89/caffe/tree/ssd).** Note: the caffe-0.16 branch allows us to set different types (float, float16) for forward, backward and math. However, for the SSD-specific layers, forward, backward and math must use the same type. This limitation can probably be overcome by spending some more time on the porting, but it does not look like a serious limitation.
  
-Please kindly cite Caffe in your publications if it helps your research:
+New layers and options have been added to support sparsity and quantization. A brief explanation is given in this section, but more details can be found by [clicking here](FEATURES.md). 
  
-    @misc{Jia13caffe,
-      Author = {Yangqing Jia},
-      Title = { {Caffe}: An Open Source Convolutional Architecture for Fast Feature Embedding},
-      Year  = {2013},
-      Howpublished = {\url{http://caffe.berkeleyvision.org/}}
-    }
+Note that Caffe-jacinto does not directly support any embedded/low-power device. But the models trained by it can be used for fast inference on such a device due to the sparsity and quantization.
  
-## Documentation
+###### Additional layers
+* ImageLabelData and IOUAccuracy layers have been added to train for semantic segmentation.
  
-Tutorials and general documentation are written in Markdown format in the `docs/` folder.
-While the format is quite easy to read directly, you may prefer to view the whole thing as a website.
-To do so, simply run `jekyll serve -s docs` and view the documentation website at `http://0.0.0.0:4000` (to get [jekyll](http://jekyllrb.com/), you must have ruby and do `gem install jekyll`).
+###### Sparsity
+* Sparse training methods: zeroing out of small coefficients during training, or fine tuning without updating the zero coefficients - similar to caffe-scnn [paper](https://arxiv.org/abs/1608.03665), [code](https://github.com/wenwei202/caffe/tree/scnn). It is possible to set a target sparsity and the training will try to achieve that.
+* Measuring sparsity in convolution layers while training is in progress. 
+* Thresholding tool to zero out some convolution weights in each layer to attain a certain sparsity in each layer.
  
-We strive to provide provide lots of usage examples, and to document all code in docstrings.
-We'd appreciate your contribution to this effort!
+###### Quantization
+* **Estimate the accuracy drop by simulating quantization. Note that caffe-jacinto does not actually do quantization. It only simulates the accuracy loss due to quantization, by quantizing the coefficients and activations and then converting them back to float.** An embedded implementation can use the methods used here to achieve speedup by using only integer arithmetic.
+* Various options are supported to control the quantization. Important features include: power-of-2 quantization, non-power-of-2 quantization, bitwidths, and applying an offset to control bias around zero. See the definition of NetQuantizationParameter for more details.
+* Dynamic 8-bit fixed point quantization, improved from Ristretto [paper](https://arxiv.org/abs/1605.06402), [code](https://github.com/pmgysel/caffe).
  
-## Development
+###### Absorbing Batch Normalization into convolution weights
+* A tool is provided to absorb batch norm values into convolution weights. This may help to speed up inference. It will also help if Batch Norm layers are not supported in an embedded implementation.
  
-Caffe is developed with active participation of the community by the [Berkeley Vision and Learning Center](http://bvlc.eecs.berkeley.edu/).
-We welcome all contributions!
+### Acknowledgements
+This repository was forked from [NVIDIA/caffe](http://www.github.com/NVIDIA/caffe) and we have added several enhancements on top of it. We acknowledge use of code from other sources as listed below, and sincerely thank their authors. See the [LICENSE](/LICENSE) file for the COPYRIGHT and LICENSE notices.
+* [BVLC/caffe](https://github.com/BVLC/caffe) - base code.
+* [NVIDIA/caffe](http://www.github.com/NVIDIA/caffe) - base code.
+* [weiliu89/caffe/tree/ssd](https://github.com/weiliu89/caffe/tree/ssd) - Caffe SSD Object Detection source code and related scripts, which were later incorporated into [NVIDIA/caffe](http://www.github.com/NVIDIA/caffe).
+* [Ristretto](https://github.com/pmgysel/caffe) - Quantization accuracy simulation
+* [dilation](https://github.com/fyu/dilation) - Semantic Segmentation data loading layer ImageLabelListData layer (not used in the latest branch) and some parameters.
+* [MobileNet-Caffe](https://github.com/shicai/MobileNet-Caffe) - Mobilenet scripts are inspired by Mobilenet-Caffe pre-trained models and scripts.
+* [sp2823/caffe](https://github.com/sp2823/caffe/tree/convolution-depthwise), [BVLC/caffe/pull/5665](https://github.com/BVLC/caffe/pull/5665) - ConvolutionDepthwise layer for faster depthwise separable convolutions.
+<!---
+* [drnikolaev/caffe](https://github.com/drnikolaev/caffe) - experimental commits that are not yet integrated into [NVIDIA/caffe](http://www.github.com/NVIDIA/caffe).
+--->
+<br>
  
-### The release cycle
+The following sections are kept as is from the original Caffe.
+# Caffe
  
-- The `dev` branch is for new development, including community contributions. We aim to keep it in a functional state, but large changes may occur and things may get broken every now and then. Use this if you want the "bleeding edge".
-- The `master` branch is handled by BVLC, which will integrate changes from `dev` on a roughly monthly schedule, giving it a release tag. Use this if you want more stability.
+Caffe is a deep learning framework made with expression, speed, and modularity in mind.
+It is developed by the Berkeley Vision and Learning Center ([BVLC](http://bvlc.eecs.berkeley.edu))
+and community contributors.
  
-### Setting priorities
+# NVCaffe
  
-- Make GitHub Issues for bugs, features you'd like to see, questions, etc.
-- Development work is guided by [milestones](https://github.com/BVLC/caffe/issues?milestone=1), which are sets of issues selected for concurrent release (integration from `dev` to `master`).
-- Please note that since the core developers are largely researchers, we may work on a feature in isolation from the open-source community for some time before releasing it, so as to claim honest academic contribution. We do release it as soon as a reasonable technical report may be written about the work, and we still aim to inform the community of ongoing development through Issues.
+NVIDIA Caffe ([NVIDIA Corporation &copy;2017](http://nvidia.com)) is an NVIDIA-maintained fork
+of BVLC Caffe tuned for NVIDIA GPUs, particularly in multi-GPU configurations.
+Here are the major features:
+* **16 bit (half) floating point train and inference support**.
+* **Mixed-precision support**. It allows storing and/or computing data in 
+64, 32 or 16 bit formats. Precision can be defined for every layer (forward and 
+backward passes might be different too), or it can be set for the whole Net.
+* **Layer-wise Adaptive Rate Control (LARC) and adaptive global gradient scaler** for better
+ accuracy, especially in 16-bit training.
+* **Integration with  [cuDNN](https://developer.nvidia.com/cudnn) v7**.
+* **Automatic selection of the best cuDNN convolution algorithm**.
+* **Integration with v2.2 of [NCCL library](https://github.com/NVIDIA/nccl)**
+ for improved multi-GPU scaling.
+* **Optimized GPU memory management** for data and parameters storage, I/O buffers 
+and workspace for convolutional layers.
+* **Parallel data parser, transformer and image reader** for improved I/O performance.
+* **Parallel back propagation and gradient reduction** on multi-GPU systems.
+* **Fast solvers implementation with fused CUDA kernels for weights and history update**.
+* **Multi-GPU test phase** for even memory load across multiple GPUs.
+* **Backward compatibility with BVLC Caffe and NVCaffe 0.15 and higher**.
+* **Extended set of optimized models** (including 16 bit floating point examples).
  
-### Contibuting
  
-- Do new development in [feature branches](https://www.atlassian.com/git/workflows#!workflow-feature-branch) with descriptive names.
-- Bring your work up-to-date by [rebasing](http://git-scm.com/book/en/Git-Branching-Rebasing) onto the latest `dev`. (Polish your changes by [interactive rebase](https://help.github.com/articles/interactive-rebase), if you'd like.)
-- [Pull request](https://help.github.com/articles/using-pull-requests) your contribution to BVLC/caffe's `dev` branch for discussion and review.
-  * PRs should live fast, die young, and leave a beautiful merge. Pull request sooner than later so that discussion can guide development.
-  * Code must be accompanied by documentation and tests at all times.
-  * Only fast-forward merges will be accepted.
+## License and Citation
  
-See our [development guidelines](http://caffe.berkeleyvision.org/development.html) for further details–the more closely these are followed, the sooner your work will be merged.
+Caffe is released under the [BSD 2-Clause license](https://github.com/BVLC/caffe/blob/master/LICENSE).
+The BVLC reference models are released for unrestricted use.
  
-#### [Shelhamer's](https://github.com/shelhamer) “life of a branch in four acts”
+Please cite Caffe in your publications if it helps your research:
  
-Make the `feature` branch off of the latest `bvlc/dev`
-```
-git checkout dev
-git pull upstream dev
-git checkout -b feature
-# do your work, make commits
-```
+    @article{jia2014caffe,
+      Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
+      Journal = {arXiv preprint arXiv:1408.5093},
+      Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
+      Year = {2014}
+    }
  
-Prepare to merge by rebasing your branch on the latest `bvlc/dev`
-```
-# make sure dev is fresh
-git checkout dev
-git pull upstream dev
-# rebase your branch on the tip of dev
-git checkout feature
-git rebase dev
-```
+## Useful notes
  
-Push your branch to pull request it into `dev`
+Libturbojpeg library is used since 0.16.5. It has a packaging bug. Please execute the following (required for Makefile, optional for CMake):
  ```
-git push origin feature
-# ...make pull request to dev...
+sudo apt-get install libturbojpeg
+sudo ln -s /usr/lib/x86_64-linux-gnu/libturbojpeg.so.0.1.0 /usr/lib/x86_64-linux-gnu/libturbojpeg.so
  ```
-
-Now make a pull request! You can do this from the command line (`git pull-request -b dev`) if you install [hub](https://github.com/github/hub).
-
-The pull request of `feature` into `dev` will be a clean merge. Applause.
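
The sparsity thresholding and quantization simulation described in the new README text can be illustrated with a small sketch. This is plain Python written for illustration, not code from caffe-jacinto: the function names `threshold_to_sparsity` and `fake_quantize`, the magnitude-based pruning rule, and the way the power-of-2 scale is chosen are all assumptions made for the example.

```python
import math


def threshold_to_sparsity(weights, target_sparsity):
    """Zero the smallest-magnitude weights until roughly
    target_sparsity (a fraction in [0, 1]) of them are zero.

    Hypothetical helper: magnitude pruning is assumed here, it is
    not taken from the caffe-jacinto sources.
    """
    num_zero = round(len(weights) * target_sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_zero]:
        pruned[i] = 0.0
    return pruned


def fake_quantize(values, bitwidth=8):
    """Simulate signed fixed-point quantization with a power-of-2
    scale: quantize to integers, clamp, then convert back to float.

    Hypothetical helper: the scale is picked so that the largest
    magnitude still fits in the signed integer range.
    """
    if not values:
        return []
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    qmax = 2 ** (bitwidth - 1) - 1      # 127 for 8 bits
    qmin = -(2 ** (bitwidth - 1))       # -128 for 8 bits
    # Number of fractional bits such that max_abs * 2**frac_bits <= qmax.
    frac_bits = math.floor(math.log2(qmax / max_abs))
    scale = 2.0 ** frac_bits
    result = []
    for v in values:
        q = round(v * scale)            # integer code (ties round to even)
        q = max(qmin, min(qmax, q))     # clamp to the representable range
        result.append(q / scale)        # back to float
    return result
```

Calling `threshold_to_sparsity(weights, 0.8)` leaves about 20% of the coefficients non-zero, which corresponds to the 5x reduction in convolution complexity that the README mentions. Passing the pruned weights through `fake_quantize` then gives float values that carry the rounding error an 8-bit integer implementation would see, which is the sense in which the accuracy loss is simulated rather than the model actually being quantized.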