********
Examples
********

.. list-table:: TIDL API Examples
   :header-rows: 1
   :widths: 12 43 20 25

   * - Example
     - Description
     - Compute cores
     - Input image
   * - one_eo_per_frame
     - Processes a single frame with one :term:`EO` using the j11_v2
       network. Throughput is increased by distributing frame processing
       across EOs. Refer to :ref:`use-case-1`.
     - EVE or C66x
     - Pre-processed image read from file.
   * - two_eo_per_frame
     - Processes a single frame with an :term:`EOP` using the j11_v2
       network to reduce per-frame processing latency. Also increases
       throughput by distributing frame processing across EOPs. The EOP
       consists of two EOs. Refer to :ref:`use-case-2`.
     - EVE and C66x (network is split across both EVE and C66x)
     - Pre-processed image read from file.
   * - two_eo_per_frame_opt
     - Builds on ``two_eo_per_frame``. Adds double buffering to improve
       performance. Refer to :ref:`use-case-3`.
     - EVE and C66x (network is split across both EVE and C66x)
     - Pre-processed image read from file.
   * - imagenet
     - Classification example
     - EVE or C66x
     - OpenCV used to read input image from file or capture from camera.
   * - segmentation
     - Pixel-level segmentation example
     - EVE or C66x
     - OpenCV used to read input image from file or capture from camera.
   * - ssd_multibox
     - Object detection
     - EVE and C66x (network is split across both EVE and C66x)
     - OpenCV used to read input image from file or capture from camera.
   * - classification
     - Classification example, called from the Matrix GUI.
     -
     - OpenCV used to read input image from file or capture from camera.
   * - layer_output
     - Illustrates using TIDL APIs to access output buffers of
       intermediate :term:`layers` in the network.
     - EVE or C66x
     - Pre-processed image read from file.
   * - test
     - Tests the pre-converted networks included in the TIDL API package
       (``test/testvecs/config/tidl_models``). When run without any
       arguments, the program ``test_tidl`` runs all available networks
       on the C66x DSPs and EVEs available on the SoC. Use the ``-c``
       option to specify a single network. Run ``test_tidl -h`` for
       details.
     - C66x and EVEs (if available)
     - Pre-processed image read from file.

The included examples demonstrate three categories of deep learning networks: classification, segmentation and object detection. ``imagenet`` and ``segmentation`` can run on AM57x processors with either EVE or C66x cores. ``ssd_multibox`` requires AM57x processors with both EVE and C66x. The examples are available at ``/usr/share/ti/tidl/examples`` on the EVM file system and in the Linux devkit.

The performance numbers were obtained using:

* `AM574x IDK EVM`_ with the Sitara `AM5749`_ Processor (2 Arm Cortex-A15 cores running at 1.0 GHz, 2 EVE cores at 650 MHz, and 2 C66x cores at 750 MHz).
* `Processor SDK Linux`_ v5.1 with TIDL API v1.1

For each example, device processing time, host processing time, and TIDL API overhead are reported.

* **Device processing time** is measured on the device, from the moment processing starts for a frame until processing finishes.
* **Host processing time** is measured on the host, from the moment ``ProcessFrameStartAsync()`` is called until ``ProcessFrameWait()`` returns in the user application. It includes the TIDL API overhead, the OpenCL runtime overhead, and the time to copy user input data into padded TIDL internal buffers. ``Host processing time = Device processing time + TIDL API overhead``.

Imagenet
--------

The imagenet example takes an image as input and outputs 1000 probabilities. Each probability corresponds to one of the 1000 objects that the network is pre-trained with. The example outputs the top 5 predictions for a given input image.

The following figure and tables show an input image, the top 5 predicted objects as output, and the processing time on either EVE or C66x.

.. image:: ../../examples/test/testvecs/input/objects/cat-pet-animal-domestic-104827.jpeg
   :width: 600

====  ==============
Rank  Object Classes
====  ==============
1     tabby
2     Egyptian_cat
3     tiger_cat
4     lynx
5     Persian_cat
====  ==============

=======  ======================  ====================  ============
Device   Device Processing Time  Host Processing Time  API Overhead
=======  ======================  ====================  ============
EVE      106.5 ms                107.9 ms              1.37 %
C66x     117.9 ms                118.7 ms              0.93 %
=======  ======================  ====================  ============

The :term:`network` used in the example is jacintonet11v2. It has 14 layers. The input to the network is an RGB image of size 224x224. Users can specify whether to run the network on EVE or C66x.

The example code sets ``buffer_factor`` to 2 to create duplicate ExecutionObjectPipelines with identical ExecutionObjects for double buffering, so that host pre/post-processing can be overlapped with device processing (see comments in the code for details). The following table shows the loop overall time over 10 frames with single buffering and double buffering, obtained with ``./imagenet -f 10 -d <num> -e <num>``.

.. list-table:: Loop overall time over 10 frames
   :header-rows: 1

   * - Device(s)
     - Single Buffering (buffer_factor=1)
     - Double Buffering (buffer_factor=2)
   * - 1 EVE
     - 1744 ms
     - 1167 ms
   * - 2 EVEs
     - 966 ms
     - 795 ms
   * - 1 C66x
     - 1879 ms
     - 1281 ms
   * - 2 C66xs
     - 1021 ms
     - 814 ms

.. note::
   The predictions reported here are based on the output of the softmax
   layer in the network, which is not normalized to real probabilities.

Segmentation
------------

The segmentation example takes an image as input and performs pixel-level classification according to pre-trained categories. The following figures show a street scene as input and the scene overlaid with pixel-level classifications as output: road in green, pedestrians in red, vehicles in blue and background in gray.

.. image:: ../../examples/test/testvecs/input/roads/pexels-photo-972355.jpeg
   :width: 600

.. image:: images/pexels-photo-972355-seg.jpg
   :width: 600

The :term:`network` used in the example is jsegnet21v2. It has 26 layers. Users can specify whether to run the network on EVE or C66x. The input to the network is an RGB image of size 1024x512. The output is 1024x512 values, each indicating which pre-trained category the corresponding pixel belongs to. The example takes the network output, creates an overlay, and blends the overlay onto the original input image to create an output image. The processing times in the following table show that this network runs more than three times faster on EVE than on C66x.

=======  ======================  ====================  ============
Device   Device Processing Time  Host Processing Time  API Overhead
=======  ======================  ====================  ============
EVE      251.8 ms                254.2 ms              0.96 %
C66x     812.7 ms                815.0 ms              0.27 %
=======  ======================  ====================  ============

The example code sets ``buffer_factor`` to 2 to create duplicate ExecutionObjectPipelines with identical ExecutionObjects for double buffering, so that host pre/post-processing can be overlapped with device processing (see comments in the code for details). The following table shows the loop overall time over 10 frames with single buffering and double buffering, obtained with ``./segmentation -f 10 -d <num> -e <num>``.

.. list-table:: Loop overall time over 10 frames
   :header-rows: 1

   * - Device(s)
     - Single Buffering (buffer_factor=1)
     - Double Buffering (buffer_factor=2)
   * - 1 EVE
     - 5233 ms
     - 3017 ms
   * - 2 EVEs
     - 3032 ms
     - 3015 ms
   * - 1 C66x
     - 10890 ms
     - 8416 ms
   * - 2 C66xs
     - 5742 ms
     - 4638 ms

.. _ssd-example:

SSD
---

SSD is the abbreviation for Single Shot multi-box Detector. The ssd_multibox example takes an image as input and detects multiple objects with bounding boxes according to pre-trained categories.
The following figures show another street scene as input and the scene with recognized objects boxed as output: pedestrians in red, vehicles in blue and road signs in yellow.

.. image:: ../../examples/test/testvecs/input/roads/pexels-photo-378570.jpeg
   :width: 600

.. image:: images/pexels-photo-378570-ssd.jpg
   :width: 600

The network used in this example is jdenet_ssd, which has 43 layers. The input to the network is an RGB image of size 768x320. The output is a list of up to 20 boxes, each with its coordinates and the pre-trained category of the object inside it. The example takes the network output, draws the boxes accordingly, and creates an output image.

The network can be run entirely on either EVE or C66x. However, the best performance comes from running the first 30 layers as a group on EVE and the next 13 layers as another group on C66x. The end-to-end example shows how to assign a :term:`Layer Group` id to an :term:`Executor` and how to construct an :term:`ExecutionObjectPipeline` that connects the output of one *Executor*'s :term:`ExecutionObject` to the input of another *Executor*'s *ExecutionObject*.

========  ======================  ====================  ============
Device    Device Processing Time  Host Processing Time  API Overhead
========  ======================  ====================  ============
EVE+C66x  169.5 ms                172.0 ms              1.68 %
========  ======================  ====================  ============

The example code sets ``pipeline_depth`` to 2 to create duplicate ExecutionObjectPipelines with identical ExecutionObjects for pipelined execution at the ExecutionObject level. As a side effect, this also overlaps host pre/post-processing with device processing (see comments in the code for details). The following table shows the loop overall time over 10 frames with pipelining at the ExecutionObjectPipeline level versus the ExecutionObject level, obtained with ``./ssd_multibox -f 10 -d <num> -e <num>``.

.. list-table:: Loop overall time over 10 frames
   :header-rows: 1

   * - Device(s)
     - pipeline_depth=1
     - pipeline_depth=2
   * - 1 EVE + 1 C66x
     - 2900 ms
     - 1735 ms
   * - 2 EVEs + 2 C66xs
     - 1630 ms
     - 1408 ms

Running Examples
----------------

The examples are located in ``/usr/share/ti/tidl/examples`` on the EVM file system. **Each example needs to be run in its own directory** because it uses relative paths to configuration files. Running an example with ``-h`` shows a help message listing the available options. The following listing illustrates running the examples.

.. code-block:: shell

   root@am57xx-evm:~/tidl-api/examples/imagenet# ./imagenet
   Input: ../test/testvecs/input/objects/cat-pet-animal-domestic-104827.jpeg
   frame[ 0]: Time on EVE0: 106.50 ms, host: 107.96 ms API overhead: 1.35 %
   1: tabby
   2: Egyptian_cat
   3: tiger_cat
   4: lynx
   5: Persian_cat
   Loop total time (including read/write/opencv/print/etc): 202.6ms
   imagenet PASSED

   root@am57xx-evm:~/tidl-api/examples/segmentation# ./segmentation
   Input: ../test/testvecs/input/000100_1024x512_bgr.y
   frame[ 0]: Time on EVE0: 251.74 ms, host: 258.02 ms API overhead: 2.43 %
   Saving frame 0 to: frame_0.png
   Saving frame 0 overlayed with segmentation to: overlay_0.png
   frame[ 1]: Time on EVE0: 251.76 ms, host: 255.79 ms API overhead: 1.58 %
   Saving frame 1 to: frame_1.png
   Saving frame 1 overlayed with segmentation to: overlay_1.png
   ...
   frame[ 8]: Time on EVE0: 251.75 ms, host: 254.21 ms API overhead: 0.97 %
   Saving frame 8 to: frame_8.png
   Saving frame 8 overlayed with segmentation to: overlay_8.png
   Loop total time (including read/write/opencv/print/etc): 4809ms
   segmentation PASSED

   root@am57xx-evm:~/tidl-api/examples/ssd_multibox# ./ssd_multibox
   Input: ../test/testvecs/input/preproc_0_768x320.y
   frame[ 0]: Time on EVE0+DSP0: 169.44 ms, host: 173.56 ms API overhead: 2.37 %
   Saving frame 0 to: frame_0.png
   Saving frame 0 with SSD multiboxes to: multibox_0.png
   Loop total time (including read/write/opencv/print/etc): 320.2ms
   ssd_multibox PASSED

Image input
^^^^^^^^^^^

The image input option, ``-i <image>``, takes an image file as input. You can supply an image file in any format that OpenCV can read, since OpenCV is used for image pre/post-processing. When the ``-f <number>`` option is used, the same image is processed repeatedly.

Camera (live video) input
^^^^^^^^^^^^^^^^^^^^^^^^^

The input option, ``-i camera<number>``, enables live frame input from a camera. ``<number>`` is the video input port number of your camera in Linux. Use the following command to check the video input ports. The number defaults to ``1`` for the TMDSCM572X camera module used on AM57x EVMs. You can use ``-f <number>`` to specify the number of frames you want to process.

.. code-block:: shell

   root@am57xx-evm:~# v4l2-ctl --list-devices
   omapwb-cap (platform:omapwb-cap):
           /dev/video11

   omapwb-m2m (platform:omapwb-m2m):
           /dev/video10

   vip (platform:vip):
           /dev/video1

   vpe (platform:vpe):
           /dev/video0

Pre-recorded video (mp4/mov/avi) input
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The input option, ``-i <name>.{mp4,mov,avi}``, enables frame input from a pre-recorded video file in mp4, mov or avi format. If you have a video in a different OpenCV-supported format or suffix, you can create a softlink with one of the mp4, mov or avi suffixes and feed it into the example. Again, use ``-f <number>`` to specify the number of frames you want to process.

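
The softlink approach described above can be sketched as follows. The file name ``clip.mkv`` is hypothetical, and the sketch assumes it is run from the ``ssd_multibox`` example directory with a video that OpenCV can decode. Only the ``-i`` and ``-f`` options are taken from the text above.

.. code-block:: shell

   # Hypothetical input: a video whose suffix is not mp4, mov or avi.
   video=clip.mkv

   # Create a softlink that carries one of the accepted suffixes.
   link="${video%.*}.mp4"
   ln -sf "$video" "$link"

   # Feed the softlink to the example and process 100 frames.
   ./ssd_multibox -i "$link" -f 100

The softlink only changes the file name that the example sees. The video data itself is untouched, so decoding still depends on OpenCV supporting the underlying format.
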

Displaying video output
^^^^^^^^^^^^^^^^^^^^^^^

When using video input, live or pre-recorded, the example displays the output in a window using OpenCV. If you have an LCD screen attached to the EVM, you need to stop ``matrix-gui`` first in order to see the example display window, as shown in the following example.

.. code-block:: shell

   root@am57xx-evm:/usr/share/ti/tidl/examples/ssd_multibox# /etc/init.d/matrix-gui-2.0 stop
   Stopping Matrix GUI application.
   root@am57xx-evm:/usr/share/ti/tidl/examples/ssd_multibox# ./ssd_multibox -i camera -f 100
   Input: camera
   init done
   Using Wayland-EGL
   wlpvr: PVR Services Initialised
   Using the 'xdg-shell-v5' shell integration
   ...
   ...
   root@am57xx-evm:/usr/share/ti/tidl/examples/ssd_multibox# /etc/init.d/matrix-gui-2.0 start
   /usr/share/ti/tidl/examples/ssd_multibox
   Removing stale PID file /var/run/matrix-gui-2.0.pid.
   Starting Matrix GUI application.

.. _AM574x IDK EVM: http://www.ti.com/tool/tmdsidk574
.. _AM5749: http://www.ti.com/product/AM5749/
.. _Processor SDK Linux: http://software-dl.ti.com/processor-sdk-linux/esd/AM57X/latest/index_FDS.html