From: Yuan Zhao
Date: Wed, 23 May 2018 17:17:58 +0000 (-0500)
Subject: Add documentation for examples
X-Git-Tag: v01.00.00.00^2~2
X-Git-Url: https://git.ti.com/gitweb?p=tidl%2Ftidl-api.git;a=commitdiff_plain;h=56c2ee5d812a88b59f8df5569dff386b0012da70

Add documentation for examples

- MCT-982
---
diff --git a/docs/source/example.rst b/docs/source/example.rst
index 316a6ae..961e830 100644
--- a/docs/source/example.rst
+++ b/docs/source/example.rst
@@ -2,11 +2,290 @@
 Examples
 ********

+We ship three end-to-end examples within the tidl-api package
+to demonstrate three categories of deep learning networks. The first
+two examples can run on AM57x SoCs with either DLA or DSP. The last
+example requires AM57x SoCs with both DLA and DSP. The performance
+numbers that we present here were obtained on an AM5729 EVM, which
+includes 2 ARM A15 cores running at 1.5GHz, 4 DLA cores at 535MHz, and
+2 DSP cores at 750MHz.
+
 Imagenet
 --------

+The imagenet example takes an image as input and outputs 1000 probabilities.
+Each probability corresponds to one of the 1000 object classes that the
+network is pre-trained with. Our example outputs the top 5 probabilities
+as the most likely objects contained in the input image.
+
+The following figure and tables show an input image, the top 5 predicted
+objects as output, and the processing time on either DLA or DSP.
+
+.. image:: ../../examples/test/testvecs/input/objects/cat-pet-animal-domestic-104827.jpeg
+   :width: 600
+
+.. table::
+
+   ==== ============== ============
+   Rank Object Classes Probability
+   ==== ============== ============
+   1    tabby          0.996
+   2    Egyptian_cat   0.977
+   3    tiger_cat      0.973
+   4    lynx           0.941
+   5    Persian_cat    0.922
+   ==== ============== ============
+
+.. table::
+
+   ====================== ==================== ============
+   Device Processing Time Host Processing Time API Overhead
+   ====================== ==================== ============
+   DLA: 123.1 ms          124.7 ms             1.34 %
+   **OR**
+   DSP: 117.9 ms          119.3 ms             1.14 %
+   ====================== ==================== ============
+
+The particular network that we ran in this category, jacintonet11v2,
+has 14 layers. Users can specify whether to run the network on DLA or DSP
+for acceleration. The DLA time is slightly higher than the DSP time.
+Host time includes the OpenCL runtime overhead and the time to copy user
+input data into padded TIDL buffers. On either device, the API overhead
+is less than 1.5%.
+
 Segmentation
 ------------

+The segmentation example takes an image as input and performs pixel-level
+classification according to pre-trained categories. The following figures
+show a street scene as input and the scene overlaid with pixel-level
+classifications as output: road in green, pedestrians in red, vehicles
+in blue, and background in gray.
+
+.. image:: ../../examples/test/testvecs/input/roads/pexels-photo-972355.jpeg
+   :width: 600
+
+.. image:: images/pexels-photo-972355-seg.jpg
+   :width: 600
+
+The network we ran in this category is jsegnet21v2, which has 26 layers.
+As the times reported in the following table show, this network runs
+about 2.7 times faster on DLA than on DSP.
+
+.. table::
+
+   ====================== ==================== ============
+   Device Processing Time Host Processing Time API Overhead
+   ====================== ==================== ============
+   DLA: 296.5 ms          303.3 ms             2.26 %
+   **OR**
+   DSP: 812.0 ms          818.4 ms             0.79 %
+   ====================== ==================== ============
+
 SSD
 ---
+
+SSD stands for Single Shot MultiBox Detector.
+The ssd_multibox example takes an image as input and detects multiple
+objects with bounding boxes according to pre-trained categories.
+The following figures show another street scene as input and the scene
+with recognized objects boxed as output: pedestrians in red,
+vehicles in blue, and road signs in yellow.
+
+.. image:: ../../examples/test/testvecs/input/roads/pexels-photo-378570.jpeg
+   :width: 600
+
+.. image:: images/pexels-photo-378570-ssd.jpg
+   :width: 600
+
+The network can be run entirely on either DLA or DSP. However, for this
+particular jdetnet_ssd network, the best performance comes from running
+the first 30 layers on DLA and the remaining 13 layers on DSP.
+Note the **AND** in the following table: each frame passes through both devices.
+Our end-to-end example shows how easy it is to assign a layer group ID
+to an *Executor* and to connect the output of one
+*ExecutionObject* to the input of another *ExecutionObject*.
+
+.. table::
+
+   ====================== ==================== ============
+   Device Processing Time Host Processing Time API Overhead
+   ====================== ==================== ============
+   DLA: 175.2 ms          179.1 ms             2.14 %
+   **AND**
+   DSP: 21.1 ms           22.3 ms              5.62 %
+   ====================== ==================== ============
+
+Running Examples
+----------------
+
+The examples are located in ``/usr/share/ti/tidl-api/examples`` on
+the EVM filesystem. Each example needs to be run in its own directory.
+Running an example with ``-h`` shows a help message listing its options.
+The following shell session shows how to build and run each example, as well
+as the test program that exercises all supported TIDL network configurations.
+
+.. code:: shell
+
+   root@am57xx-evm:~# cd /usr/share/ti/tidl-api/examples/imagenet/
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/imagenet# make -j4
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/imagenet# ./imagenet -t d
+   Input: ../test/testvecs/input/objects/cat-pet-animal-domestic-104827.jpeg
+   frame[0]: Time on device: 117.9ms, host: 119.3ms API overhead: 1.17 %
+   1: tabby, prob = 0.996
+   2: Egyptian_cat, prob = 0.977
+   3: tiger_cat, prob = 0.973
+   4: lynx, prob = 0.941
+   5: Persian_cat, prob = 0.922
+   imagenet PASSED
+
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/imagenet# cd ../segmentation/; make -j4
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/segmentation# ./segmentation -i ../test/testvecs/input/roads/pexels-photo-972355.jpeg
+   Input: ../test/testvecs/input/roads/pexels-photo-972355.jpeg
+   frame[0]: Time on device: 296.5ms, host: 303.2ms API overhead: 2.21 %
+   Saving frame 0 overlayed with segmentation to: overlay_0.png
+   segmentation PASSED
+
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/segmentation# cd ../ssd_multibox/; make -j4
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/ssd_multibox# ./ssd_multibox -i ../test/testvecs/input/roads/pexels-photo-378570.jpeg
+   Input: ../test/testvecs/input/roads/pexels-photo-378570.jpeg
+   frame[0]: Time on DLA: 175.2ms, host: 179ms API overhead: 2.1 %
+   frame[0]: Time on DSP: 21.06ms, host: 22.43ms API overhead: 6.08 %
+   Saving frame 0 with SSD multiboxes to: multibox_0.png
+   Loop total time (including read/write/print/etc): 423.8ms
+   ssd_multibox PASSED
+
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/ssd_multibox# cd ../test; make -j4
+   root@am57xx-evm:/usr/share/ti/tidl-api/examples/test# ./test_tidl
+   API Version: 01.00.00.d91e442
+   Running dense_1x1 on 2 devices, type EVE
+   frame[0]: Time on device: 134.3ms, host: 135.6ms API overhead: 0.994 %
+   dense_1x1 : PASSED
+   Running j11_bn on 2 devices, type EVE
+   frame[0]: Time on device: 176.2ms, host: 177.7ms API overhead: 0.835 %
+   j11_bn : PASSED
+   Running j11_cifar on 2 devices, type EVE
+   frame[0]: Time on device: 53.86ms, host: 54.88ms API overhead: 1.85 %
+   j11_cifar : PASSED
+   Running j11_controlLayers on 2 devices, type EVE
+   frame[0]: Time on device: 122.9ms, host: 123.9ms API overhead: 0.821 %
+   j11_controlLayers : PASSED
+   Running j11_prelu on 2 devices, type EVE
+   frame[0]: Time on device: 300.8ms, host: 302.1ms API overhead: 0.437 %
+   j11_prelu : PASSED
+   Running j11_v2 on 2 devices, type EVE
+   frame[0]: Time on device: 124.1ms, host: 125.6ms API overhead: 1.18 %
+   j11_v2 : PASSED
+   Running jseg21 on 2 devices, type EVE
+   frame[0]: Time on device: 367ms, host: 374ms API overhead: 1.88 %
+   jseg21 : PASSED
+   Running jseg21_tiscapes on 2 devices, type EVE
+   frame[0]: Time on device: 302.2ms, host: 308.5ms API overhead: 2.02 %
+   frame[1]: Time on device: 301.9ms, host: 312.5ms API overhead: 3.38 %
+   frame[2]: Time on device: 302.7ms, host: 305.9ms API overhead: 1.04 %
+   frame[3]: Time on device: 301.9ms, host: 305ms API overhead: 1.01 %
+   frame[4]: Time on device: 302.7ms, host: 305.9ms API overhead: 1.05 %
+   frame[5]: Time on device: 301.9ms, host: 305.5ms API overhead: 1.17 %
+   frame[6]: Time on device: 302.7ms, host: 305.9ms API overhead: 1.06 %
+   frame[7]: Time on device: 301.9ms, host: 305ms API overhead: 1.02 %
+   frame[8]: Time on device: 297ms, host: 300.3ms API overhead: 1.09 %
+   Comparing frame: 0
+   jseg21_tiscapes : PASSED
+   Running smallRoi on 2 devices, type EVE
+   frame[0]: Time on device: 2.548ms, host: 3.637ms API overhead: 29.9 %
+   smallRoi : PASSED
+   Running squeeze1_1 on 2 devices, type EVE
+   frame[0]: Time on device: 292.9ms, host: 294.6ms API overhead: 0.552 %
+   squeeze1_1 : PASSED
+
+   Multiple Executor...
+   Running network tidl_config_j11_v2.txt on EVEs: 1 in thread 0
+   Running network tidl_config_j11_cifar.txt on EVEs: 0 in thread 1
+   Multiple executors: PASSED
+   Running j11_bn on 2 devices, type DSP
+   frame[0]: Time on device: 170.5ms, host: 171.5ms API overhead: 0.568 %
+   j11_bn : PASSED
+   Running j11_controlLayers on 2 devices, type DSP
+   frame[0]: Time on device: 416.4ms, host: 417.1ms API overhead: 0.176 %
+   j11_controlLayers : PASSED
+   Running j11_v2 on 2 devices, type DSP
+   frame[0]: Time on device: 118ms, host: 119.2ms API overhead: 1.01 %
+   j11_v2 : PASSED
+   Running jseg21 on 2 devices, type DSP
+   frame[0]: Time on device: 1123ms, host: 1128ms API overhead: 0.443 %
+   jseg21 : PASSED
+   Running jseg21_tiscapes on 2 devices, type DSP
+   frame[0]: Time on device: 812.3ms, host: 817.3ms API overhead: 0.614 %
+   frame[1]: Time on device: 812.6ms, host: 818.6ms API overhead: 0.738 %
+   frame[2]: Time on device: 812.3ms, host: 815.1ms API overhead: 0.343 %
+   frame[3]: Time on device: 812.7ms, host: 815.2ms API overhead: 0.312 %
+   frame[4]: Time on device: 812.3ms, host: 815.1ms API overhead: 0.353 %
+   frame[5]: Time on device: 812.6ms, host: 815.1ms API overhead: 0.302 %
+   frame[6]: Time on device: 812.2ms, host: 815.1ms API overhead: 0.357 %
+   frame[7]: Time on device: 812.6ms, host: 815.2ms API overhead: 0.315 %
+   frame[8]: Time on device: 812ms, host: 815ms API overhead: 0.367 %
+   Comparing frame: 0
+   jseg21_tiscapes : PASSED
+   Running smallRoi on 2 devices, type DSP
+   frame[0]: Time on device: 14.21ms, host: 14.94ms API overhead: 4.89 %
+   smallRoi : PASSED
+   Running squeeze1_1 on 2 devices, type DSP
+   frame[0]: Time on device: 960ms, host: 961.1ms API overhead: 0.116 %
+   squeeze1_1 : PASSED
+   tidl PASSED
+
+Possible runtime errors: out of memory
+""""""""""""""""""""""""""""""""""""""
+
+.. code:: shell
+
+   tidl: device_alloc.h:31: T* tidl::malloc_ddr(size_t) [with T = char; size_t = unsigned int]: Assertion `val != nullptr' failed
+
+One possible reason is that previously aborted runs did not properly release
+their allocations. Use ``ti-mct-heap-check`` with the ``-c`` option to clean up.
+
+.. code:: shell
+
+   root@am57xx-evm:~# ti-mct-heap-check -c
+   -- ddr_heap1 ------------------------------
+   Addr : 0xa2000000
+   Size : 0xa000000
+   Avail: 0xa000000
+   Align: 0x80
+   -----------------------------------------
+
+Another possible reason is that the total memory requirement exceeds the
+default memory allocated for OpenCL. The steps below patch the device tree to
+double the CMEM block size from 0xc000000 (192 MiB) to 0x18000000 (384 MiB).
+
+.. code:: shell
+
+   $ sudo apt-get install device-tree-compiler # In case dtc is not already installed
+   $ scp root@am57:/boot/am57xx-evm-reva3.dtb .
+   $ dtc -I dtb -O dts am57xx-evm-reva3.dtb -o am57xx-evm-reva3.dts
+   $ cp am57xx-evm-reva3.dts am57xx-evm-reva3.dts.orig
+   $ # edit am57xx-evm-reva3.dts to increase the cmem block size, as shown below
+   $ diff -u am57xx-evm-reva3.dts.orig am57xx-evm-reva3.dts
+   --- am57xx-evm-reva3.dts.orig   2018-01-11 14:47:51.491572739 -0600
+   +++ am57xx-evm-reva3.dts        2018-01-16 15:43:33.981431971 -0600
+   @@ -5657,7 +5657,7 @@
+            };
+
+            cmem_block_mem@a0000000 {
+   -            reg = <0x0 0xa0000000 0x0 0xc000000>;
+   +            reg = <0x0 0xa0000000 0x0 0x18000000>;
+                no-map;
+                status = "okay";
+                linux,phandle = <0x13c>;
+   @@ -5823,7 +5823,7 @@
+            cmem_block@0 {
+                reg = <0x0>;
+                memory-region = <0x13c>;
+   -            cmem-buf-pools = <0x1 0x0 0xc000000>;
+   +            cmem-buf-pools = <0x1 0x0 0x18000000>;
+            };
+
+            cmem_block@1 {
+   $ dtc -I dts -O dtb am57xx-evm-reva3.dts -o am57xx-evm-reva3.dtb
+   $ scp am57xx-evm-reva3.dtb root@am57:/boot/
+   # reboot the EVM to make the memory changes effective (run "cat /proc/iomem" to check)
diff --git a/docs/source/images/pexels-photo-378570-ssd.jpg b/docs/source/images/pexels-photo-378570-ssd.jpg
new file mode 100644
index 0000000..9977132
Binary files /dev/null and b/docs/source/images/pexels-photo-378570-ssd.jpg differ
diff --git a/docs/source/images/pexels-photo-972355-seg.jpg b/docs/source/images/pexels-photo-972355-seg.jpg
new file mode 100644
index 0000000..9d7c728
Binary files /dev/null and b/docs/source/images/pexels-photo-972355-seg.jpg differ