Clarify imagenet example output

diff --git a/docs/source/example.rst b/docs/source/example.rst

index 316a6ae316f3cf6e5c1f7d8ac5e6b28aa73b364c..7ba9bc7d3761368f0d9b5fae8b5b7a2eeb9472d6 100644
--- a/docs/source/example.rst
+++ b/docs/source/example.rst
@@ -2,11 +2,251 @@
  Examples
  ********
  
+We ship three end-to-end examples within the tidl-api package
+to demonstrate three categories of deep learning networks.  The first
+two examples can run on AM57x SoCs with either EVE or DSP devices.  The last
+example requires AM57x SoCs with both EVE and DSP.  The performance
+numbers that we present here were obtained on an AM5729 EVM, which
+includes 2 ARM A15 cores running at 1.5GHz, 4 EVE cores at 535MHz, and
+2 DSP cores at 750MHz.
+
+For each example, we report device processing time, host processing time,
+and TIDL API overhead.  **Device processing time** is measured on the device,
+from the moment processing of a frame starts until it finishes.
+**Host processing time** is measured on the host, from the moment
+``ProcessFrameStartAsync()`` is called until ``ProcessFrameWait()`` returns
+in the user application.  It includes the TIDL API overhead, the OpenCL runtime
+overhead, and the time to copy user input data into padded TIDL internal
+buffers.
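
As a rough illustration of how the three reported numbers relate, the sketch below assumes that API overhead is the share of host processing time that is not spent on the device, that is ``(host - device) / host``. This formula is an assumption that happens to agree with the sample logs in this document; it is not taken from the TIDL API source, and the function name is hypothetical.

```python
def api_overhead_percent(device_ms: float, host_ms: float) -> float:
    """Share of host time not spent on the device, in percent.

    Assumed definition: (host - device) / host * 100.
    """
    if host_ms <= 0:
        raise ValueError("host_ms must be positive")
    return (host_ms - device_ms) / host_ms * 100.0


# Times taken from the segmentation run logged in this document.
print(f"{api_overhead_percent(296.5, 303.2):.2f} %")  # prints "2.21 %"
```

With the device and host times of the segmentation run (296.5 ms and 303.2 ms), the sketch gives 2.21 %, the same value that the example prints.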
+
  Imagenet
  --------
  
+The imagenet example takes an image as input and outputs 1000 probabilities,
+one for each of the 1000 object classes that the network was pre-trained on.
+Our example reports the top 5 predictions, that is, the five classes that
+the input image most likely belongs to.
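
Selecting the top 5 entries from a score vector can be sketched as follows. This is a minimal illustration in Python, not the code used by the imagenet example: the helper ``top_k`` is hypothetical, the toy data has six classes instead of 1000, and the sixth class and its score are made up (the other five are copied from the sample run in this document).

```python
import heapq


def top_k(scores, labels, k=5):
    """Return the k (label, score) pairs with the highest scores, best first."""
    best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [(labels[i], scores[i]) for i in best]


# Toy data: six classes instead of 1000. "beagle" and its score are made up.
labels = ["tabby", "lynx", "Persian_cat", "tiger_cat", "Egyptian_cat", "beagle"]
scores = [0.996, 0.941, 0.922, 0.973, 0.977, 0.100]

for rank, (label, score) in enumerate(top_k(scores, labels), start=1):
    print(f"{rank}: {label}, prob = {score:.3f}")
```

``heapq.nlargest`` avoids sorting the full vector, which matters little for 1000 classes but keeps the intent (pick a few best entries) explicit.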
+
+The following figure and tables show an input image, the top 5 predicted
+objects as output, and the processing time on either EVE or DSP.
+
+.. image:: ../../examples/test/testvecs/input/objects/cat-pet-animal-domestic-104827.jpeg
+   :width: 600
+
+.. table::
+
+    ==== ==============
+    Rank Object Classes
+    ==== ==============
+    1    tabby
+    2    Egyptian_cat
+    3    tiger_cat
+    4    lynx
+    5    Persian_cat
+    ==== ==============
+
+.. table::
+
+   ====================== ==================== ============
+   Device Processing Time Host Processing Time API Overhead
+   ====================== ==================== ============
+   EVE: 123.1 ms          124.7 ms             1.34 %
+   **OR**
+   DSP: 117.9 ms          119.3 ms             1.14 %
+   ====================== ==================== ============
+
+The particular network that we ran in this category, jacintonet11v2,
+has 14 layers.  Users can specify whether to run the network on EVE or DSP
+for acceleration.  The EVE time is slightly higher than the DSP time,
+and the API overhead is less than 1.5% in both cases.
+
+.. note::
+    The predictions reported here are based on the output of the softmax
+    layer in the network, which is not normalized to real probabilities.
+
  Segmentation
  ------------
  
+The segmentation example takes an image as input and performs pixel-level
+classification according to pre-trained categories.  The following figures
+show a street scene as input and the scene overlaid with pixel-level
+classifications as output: road in green, pedestrians in red, vehicles
+in blue and background in gray.
+
+.. image:: ../../examples/test/testvecs/input/roads/pexels-photo-972355.jpeg
+   :width: 600
+
+.. image:: images/pexels-photo-972355-seg.jpg
+   :width: 600
+
+The network we ran in this category is jsegnet21v2, which has 26 layers.
+From the reported time in the following table, we can see that this network
+runs significantly faster on EVE than on DSP.
+
+.. table::
+
+   ====================== ==================== ============
+   Device Processing Time Host Processing Time API Overhead
+   ====================== ==================== ============
+   EVE: 296.5 ms          303.3 ms             2.26 %
+   **OR**
+   DSP: 812.0 ms          818.4 ms             0.79 %
+   ====================== ==================== ============
+
+.. _ssd-example:
+
  SSD
  ---
+
+SSD is the abbreviation for Single Shot multi-box Detector.
+The ssd_multibox example takes an image as input and detects multiple
+objects with bounding boxes according to pre-trained categories.
+The following figures show another street scene as input and the scene
+with recognized objects boxed as output: pedestrians in red,
+vehicles in blue and road signs in yellow.
+
+.. image:: ../../examples/test/testvecs/input/roads/pexels-photo-378570.jpeg
+   :width: 600
+
+.. image:: images/pexels-photo-378570-ssd.jpg
+   :width: 600
+
+The network can be run entirely on either EVE or DSP.  For this particular
+jdetnet_ssd network, however, the best performance comes from running the
+first 30 layers on EVE and the next 13 layers on DSP.
+Note the **AND** in the following table for the reported time.
+Our end-to-end example shows how easy it is to assign a layers group id
+to an *Executor* and how easy it is to connect the output of one
+*ExecutionObject* to the input of another *ExecutionObject*.
+
+.. table::
+
+   ====================== ==================== ============
+   Device Processing Time Host Processing Time API Overhead
+   ====================== ==================== ============
+   EVE: 175.2 ms          179.1 ms             2.14 %
+   **AND**
+   DSP:  21.1 ms           22.3 ms             5.62 %
+   ====================== ==================== ============
+
+Test
+----
+This example tests the pre-converted networks included in the TIDL API package (``test/testvecs/config/tidl_models``). When run without any arguments, the program ``test_tidl`` runs all available networks on the C66x DSPs and EVEs available on the SoC. Use the ``-c`` option to specify a single network. Run ``test_tidl -h`` for details.
+
+Running Examples
+----------------
+
+The examples are located in ``/usr/share/ti/tidl/examples`` on
+the EVM file system.  Each example needs to be run from its own directory.
+Running an example with ``-h`` shows a help message listing its options.
+The following code section shows how to run the examples and
+the test program that tests all supported TIDL network configs.
+
+.. code:: shell
+
+   root@am57xx-evm:~# cd /usr/share/ti/tidl-api/examples/imagenet/
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/imagenet# make -j4
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/imagenet# ./imagenet -t d
+   Input: ../test/testvecs/input/objects/cat-pet-animal-domestic-104827.jpeg
+   frame[0]: Time on device:  117.9ms, host:  119.3ms API overhead:   1.17 %
+   1: tabby, prob = 0.996
+   2: Egyptian_cat, prob = 0.977
+   3: tiger_cat, prob = 0.973
+   4: lynx, prob = 0.941
+   5: Persian_cat, prob = 0.922
+   imagenet PASSED
+
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/imagenet# cd ../segmentation/; make -j4
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/segmentation# ./segmentation -i ../test/testvecs/input/roads/pexels-photo-972355.jpeg
+   Input: ../test/testvecs/input/roads/pexels-photo-972355.jpeg
+   frame[0]: Time on device:  296.5ms, host:  303.2ms API overhead:   2.21 %
+   Saving frame 0 overlayed with segmentation to: overlay_0.png
+   segmentation PASSED
+
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/segmentation# cd ../ssd_multibox/; make -j4
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/ssd_multibox# ./ssd_multibox -i ../test/testvecs/input/roads/pexels-photo-378570.jpeg
+   Input: ../test/testvecs/input/roads/pexels-photo-378570.jpeg
+   frame[0]: Time on EVE:  175.2ms, host:    179ms API overhead:    2.1 %
+   frame[0]: Time on DSP:  21.06ms, host:  22.43ms API overhead:   6.08 %
+   Saving frame 0 with SSD multiboxes to: multibox_0.png
+   Loop total time (including read/write/print/etc):  423.8ms
+   ssd_multibox PASSED
+
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/ssd_multibox# cd ../test; make -j4
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/test# ./test_tidl
+   API Version: 01.00.00.d91e442
+   Running dense_1x1 on 2 devices, type EVE
+   frame[0]: Time on device:  134.3ms, host:  135.6ms API overhead:  0.994 %
+   dense_1x1 : PASSED
+   Running j11_bn on 2 devices, type EVE
+   frame[0]: Time on device:  176.2ms, host:  177.7ms API overhead:  0.835 %
+   j11_bn : PASSED
+   Running j11_cifar on 2 devices, type EVE
+   frame[0]: Time on device:  53.86ms, host:  54.88ms API overhead:   1.85 %
+   j11_cifar : PASSED
+   Running j11_controlLayers on 2 devices, type EVE
+   frame[0]: Time on device:  122.9ms, host:  123.9ms API overhead:  0.821 %
+   j11_controlLayers : PASSED
+   Running j11_prelu on 2 devices, type EVE
+   frame[0]: Time on device:  300.8ms, host:  302.1ms API overhead:  0.437 %
+   j11_prelu : PASSED
+   Running j11_v2 on 2 devices, type EVE
+   frame[0]: Time on device:  124.1ms, host:  125.6ms API overhead:   1.18 %
+   j11_v2 : PASSED
+   Running jseg21 on 2 devices, type EVE
+   frame[0]: Time on device:    367ms, host:    374ms API overhead:   1.88 %
+   jseg21 : PASSED
+   Running jseg21_tiscapes on 2 devices, type EVE
+   frame[0]: Time on device:  302.2ms, host:  308.5ms API overhead:   2.02 %
+   frame[1]: Time on device:  301.9ms, host:  312.5ms API overhead:   3.38 %
+   frame[2]: Time on device:  302.7ms, host:  305.9ms API overhead:   1.04 %
+   frame[3]: Time on device:  301.9ms, host:    305ms API overhead:   1.01 %
+   frame[4]: Time on device:  302.7ms, host:  305.9ms API overhead:   1.05 %
+   frame[5]: Time on device:  301.9ms, host:  305.5ms API overhead:   1.17 %
+   frame[6]: Time on device:  302.7ms, host:  305.9ms API overhead:   1.06 %
+   frame[7]: Time on device:  301.9ms, host:    305ms API overhead:   1.02 %
+   frame[8]: Time on device:    297ms, host:  300.3ms API overhead:   1.09 %
+   Comparing frame: 0
+   jseg21_tiscapes : PASSED
+   Running smallRoi on 2 devices, type EVE
+   frame[0]: Time on device:  2.548ms, host:  3.637ms API overhead:   29.9 %
+   smallRoi : PASSED
+   Running squeeze1_1 on 2 devices, type EVE
+   frame[0]: Time on device:  292.9ms, host:  294.6ms API overhead:  0.552 %
+   squeeze1_1 : PASSED
+
+   Multiple Executor...
+   Running network tidl_config_j11_v2.txt on EVEs: 1  in thread 0
+   Running network tidl_config_j11_cifar.txt on EVEs: 0  in thread 1
+   Multiple executors: PASSED
+   Running j11_bn on 2 devices, type DSP
+   frame[0]: Time on device:  170.5ms, host:  171.5ms API overhead:  0.568 %
+   j11_bn : PASSED
+   Running j11_controlLayers on 2 devices, type DSP
+   frame[0]: Time on device:  416.4ms, host:  417.1ms API overhead:  0.176 %
+   j11_controlLayers : PASSED
+   Running j11_v2 on 2 devices, type DSP
+   frame[0]: Time on device:    118ms, host:  119.2ms API overhead:   1.01 %
+   j11_v2 : PASSED
+   Running jseg21 on 2 devices, type DSP
+   frame[0]: Time on device:   1123ms, host:   1128ms API overhead:  0.443 %
+   jseg21 : PASSED
+   Running jseg21_tiscapes on 2 devices, type DSP
+   frame[0]: Time on device:  812.3ms, host:  817.3ms API overhead:  0.614 %
+   frame[1]: Time on device:  812.6ms, host:  818.6ms API overhead:  0.738 %
+   frame[2]: Time on device:  812.3ms, host:  815.1ms API overhead:  0.343 %
+   frame[3]: Time on device:  812.7ms, host:  815.2ms API overhead:  0.312 %
+   frame[4]: Time on device:  812.3ms, host:  815.1ms API overhead:  0.353 %
+   frame[5]: Time on device:  812.6ms, host:  815.1ms API overhead:  0.302 %
+   frame[6]: Time on device:  812.2ms, host:  815.1ms API overhead:  0.357 %
+   frame[7]: Time on device:  812.6ms, host:  815.2ms API overhead:  0.315 %
+   frame[8]: Time on device:    812ms, host:    815ms API overhead:  0.367 %
+   Comparing frame: 0
+   jseg21_tiscapes : PASSED
+   Running smallRoi on 2 devices, type DSP
+   frame[0]: Time on device:  14.21ms, host:  14.94ms API overhead:   4.89 %
+   smallRoi : PASSED
+   Running squeeze1_1 on 2 devices, type DSP
+   frame[0]: Time on device:    960ms, host:  961.1ms API overhead:  0.116 %
+   squeeze1_1 : PASSED
+   tidl PASSED
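
To compare runs, the per-frame timing lines in the output above can be collected with a small script. The sketch below is a hypothetical helper, not part of the TIDL API package; the regular expression is written against the line format shown in this document and may need adjusting if the examples print a different format.

```python
import re

# Pattern for the per-frame timing lines printed by the examples, e.g.
# "frame[0]: Time on device:  117.9ms, host:  119.3ms API overhead:   1.17 %"
FRAME_RE = re.compile(
    r"frame\[(\d+)\]: Time on (\w+):\s*([\d.]+)ms, host:\s*([\d.]+)ms\s+"
    r"API overhead:\s*([\d.]+) %"
)


def parse_frame_line(line):
    """Return a dict of timing fields, or None if the line is not a frame line."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    frame, target, device_ms, host_ms, overhead = match.groups()
    return {
        "frame": int(frame),
        "target": target,  # "device", "EVE", or "DSP" in the output above
        "device_ms": float(device_ms),
        "host_ms": float(host_ms),
        "overhead_pct": float(overhead),
    }


# Lines copied from the ssd_multibox run shown above.
log = """\
Input: ../test/testvecs/input/roads/pexels-photo-378570.jpeg
frame[0]: Time on EVE:  175.2ms, host:    179ms API overhead:    2.1 %
frame[0]: Time on DSP:  21.06ms, host:  22.43ms API overhead:   6.08 %
ssd_multibox PASSED
"""
records = [r for r in map(parse_frame_line, log.splitlines()) if r is not None]
for r in records:
    print(r["target"], r["device_ms"], r["host_ms"], r["overhead_pct"])
```

Lines that do not match, such as the ``Input:`` and ``PASSED`` lines, are skipped, so the full console output of an example can be passed through unchanged.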