************
Introduction
************

The TI Deep Learning (TIDL) API brings deep learning to the edge by enabling applications to leverage TI's proprietary, highly optimized CNN/DNN implementation on Deep Learning Accelerator (DLA) and C66x DSP compute engines. TIDL initially targets Vision/2D use cases on AM57x SoCs.

This User's Guide covers the TIDL API. For information on TIDL such as the overall development flow, techniques to optimize the performance of CNN/DNN on TI's SoCs, performance/benchmarking data, and the list of supported layers, see the TIDL section in the `Processor SDK Linux Software Developer's Guide`_.

.. note::
    The TIDL API is available only on AM57x SoCs. It requires OpenCL version 1.1.15.1 or higher.

Key Features
------------
Ease of use
+++++++++++
* Integrates easily with other frameworks such as `OpenCV`_
* Provides a common host abstraction for user applications across multiple compute engines (DLAs and C66x DSPs)

Low overhead
++++++++++++
The execution time of TIDL APIs on the host is a small percentage of the overall per-frame execution time. For example, with the jseg21 network and a 1024x512 input frame with 3 channels, the APIs account for ~1.5% of the overall per-frame processing time.
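
As a rough illustration of how such a percentage could be measured, the sketch below accumulates host-side API time and total per-frame time and reports their ratio. ``OverheadTracker`` and its methods are hypothetical helpers, not part of the TIDL API, and this is not necessarily how the figure above was obtained.

.. code-block:: c++

    #include <cassert>
    #include <chrono>

    // Hypothetical helper (not part of the TIDL API): accumulates time spent
    // in host-side API calls and total per-frame time.
    class OverheadTracker {
    public:
        using Duration = std::chrono::duration<double, std::milli>;

        // Add time spent inside host API calls for one frame.
        void AddHostApiTime(Duration d) { host_api_ += d; }

        // Add the total wall-clock time spent on one frame.
        void AddFrameTime(Duration d) { frame_ += d; }

        // Host API time as a percentage of total per-frame time.
        double HostOverheadPercent() const {
            assert(frame_.count() > 0.0);
            return 100.0 * host_api_.count() / frame_.count();
        }

    private:
        Duration host_api_{0.0};
        Duration frame_{0.0};
    };

A caller would typically take ``std::chrono::steady_clock::now()`` before and after each host API call and pass the difference to ``AddHostApiTime``. It would do the same around the whole frame for ``AddFrameTime``.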

Software Architecture
---------------------
The TIDL API leverages TI's `OpenCL`_ product to offload deep learning applications to both DLA(s) and DSP(s).  The TIDL API significantly improves the out-of-box deep learning experience for users and enables them to focus on their overall use case. They do not have to spend time on the mechanics of ARM ↔ DSP/DLA communication or on implementing optimized network layers on DLA(s) and/or DSP(s).  The API allows customers to easily integrate frameworks such as OpenCV and rapidly prototype deep learning applications.

.. _`TIDL Development flow`:

.. figure:: images/tidl-development-flow.png
    :align: center
    :scale: 50

    Development flow with TIDL APIs

:numref:`TIDL Development flow` shows the overall development process. Deep learning consists of two stages: training at the development stage and inference at the deployment stage.  Training involves designing a neural network model and running training data through the network to tune the model parameters.  Inference takes the pre-trained model, including its parameters, applies it to new input, and produces output.  Training is computationally intensive and is done using frameworks such as Caffe or TensorFlow. Once the network is trained, the TIDL converter tool can be used to translate the network and parameters to TIDL. The `Processor SDK Linux Software Developer's Guide`_ provides details on the development flow and the converter tool. The converter tool generates a TIDL network binary file and a model (parameter) file. The network file specifies the network graph. The parameter file specifies the weights.

:numref:`TIDL API Software Architecture` shows the TIDL API software architecture.

.. _`TIDL API Software Architecture`:

.. figure:: images/tidl-api.png
    :align: center
    :scale: 60

    TIDL API Software Architecture

TIDL APIs provide three intuitive C++ classes.  ``Configuration`` encapsulates a network configuration, including pointers to the network and parameter binary files.  ``Executor`` encapsulates on-device memory allocation, network setup and initialization.  ``ExecutionObject`` encapsulates TIDL processing on a single DSP or DLA core.  The implementation of these classes calls into the OpenCL runtime to offload network processing onto DLA/DSP devices, abstracting these details from the user.

:numref:`simple-example` illustrates how easy it is to use TIDL APIs to add deep learning to user applications.  In this example, a configuration object is created by reading a TIDL network config file.  An executor object is created with two DLA devices.  It uses the configuration object to set up and initialize the TIDL network on the DLAs.  Each of the two execution objects dispatches TIDL processing to a different DLA core.  Because OpenCL kernel execution is asynchronous, frames can be pipelined across the two DLAs: while one frame is being processed by one DLA, the next frame can be processed by the other.

.. code-block:: c++
    :caption: Application using TIDL APIs
    :name: simple-example

    // Read a TI DL network configuration file
    Configuration configuration;
    bool status = configuration.ReadFromFile("./tidl_j11v2_net");

    // Create an executor with 2 DLAs and configuration
    DeviceIds ids = {DeviceId::ID0, DeviceId::ID1};
    Executor executor(DeviceType::DLA, ids, configuration);

    // Query Executor for set of ExecutionObjects created
    const ExecutionObjects& eos = executor.GetExecutionObjects();
    int num_eos = eos.size();  // 2 DLAs

    // Allocate input and output buffers for each execution object
    for (auto &eo : eos)
    {
        ArgInfo in(eo->GetInputBufferSizeInBytes());
        ArgInfo out(eo->GetOutputBufferSizeInBytes());
        eo->SetInputOutputBuffer(in, out);
    }

    // Pipelined processing with 2 DLA cores.
    // The extra num_eos iterations drain frames that are still in flight.
    for (int idx = 0; idx < configuration.numFrames + num_eos; idx++)
    {
        ExecutionObject* eo = eos[idx % num_eos].get();

        // Wait for previous frame on the same eo to finish processing
        if (eo->ProcessFrameWait())  WriteFrameOutput(*eo);

        // Read a frame and start processing it with current eo
        if (ReadFrameInput(*eo, idx))  eo->ProcessFrameStartAsync();
    }

``ReadFrameInput`` and ``WriteFrameOutput`` functions are used to read an input frame and write the result of processing. For example, with OpenCV, ``ReadFrameInput`` is implemented using OpenCV APIs to capture a frame. To execute the same network on DSPs, the only change to :numref:`simple-example` is to replace ``DeviceType::DLA`` with ``DeviceType::DSP``.
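
The round-robin scheduling used by the pipelined loop in :numref:`simple-example` can be examined in isolation with a mock. The sketch below is only an illustration under stated assumptions. ``MockExecutionObject``, ``Completion`` and ``RunPipeline`` are hypothetical names, the mock does no real work and completes synchronously, and only the order of the start and wait calls mirrors the example.

.. code-block:: c++

    #include <cassert>
    #include <vector>

    // Hypothetical stand-in for an execution object. It does no real work
    // and only remembers which frame is "in flight".
    class MockExecutionObject {
    public:
        // Record the frame as started (the mock returns immediately).
        void ProcessFrameStartAsync(int frame) { frame_ = frame; busy_ = true; }

        // Returns false if no frame was started; otherwise reports the frame.
        bool ProcessFrameWait(int& finished_frame) {
            if (!busy_) return false;
            finished_frame = frame_;
            busy_ = false;
            return true;
        }

    private:
        int  frame_ = -1;
        bool busy_  = false;
    };

    // One completed frame and the index of the execution object that ran it.
    struct Completion { int frame; int eo_index; };

    // Round-robin schedule over num_eos mock execution objects.
    std::vector<Completion> RunPipeline(int num_frames, int num_eos) {
        assert(num_eos > 0);
        std::vector<MockExecutionObject> eos(num_eos);
        std::vector<Completion> completed;

        // The extra num_eos iterations drain frames that are still in flight.
        for (int idx = 0; idx < num_frames + num_eos; idx++) {
            const int eo_index = idx % num_eos;
            MockExecutionObject& eo = eos[eo_index];

            // Collect the previous frame on this execution object, if any.
            int finished = -1;
            if (eo.ProcessFrameWait(finished))
                completed.push_back({finished, eo_index});

            // Start the next frame while input frames remain.
            if (idx < num_frames) eo.ProcessFrameStartAsync(idx);
        }
        return completed;
    }

Calling ``RunPipeline(5, 2)`` yields frames 0 through 4 in order, with even frames on execution object 0 and odd frames on execution object 1.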

Section :ref:`using-tidl-api` contains details on using the APIs. The APIs themselves are documented in section :ref:`api-documentation`.

Sometimes it is beneficial to partition a network and run different parts on different cores because some types of layers could run faster on DLAs while other types could run faster on DSPs.  TIDL APIs provide the flexibility to run a partitioned network across DLAs and DSPs. Refer to the :ref:`ssd-example` example for details.
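
The idea of partitioning can be sketched independently of the API as grouping consecutive layers by the device they are assigned to. Everything in the sketch below is an assumption made for illustration. ``Layer``, ``LayerGroup``, ``PreferredDevice`` and ``PartitionLayers`` are hypothetical names, and the policy (convolution on the DLA, everything else on the DSP) is a placeholder rather than measured TIDL behavior. The :ref:`ssd-example` example shows the actual mechanism.

.. code-block:: c++

    #include <string>
    #include <vector>

    // Hypothetical layer description; not part of the TIDL API.
    struct Layer { std::string type; };

    enum class Device { DLA, DSP };

    // Placeholder policy for illustration only, not measured data.
    Device PreferredDevice(const Layer& layer) {
        return layer.type == "Convolution" ? Device::DLA : Device::DSP;
    }

    // A run of consecutive layers [first, last] assigned to one device.
    struct LayerGroup { Device device; int first; int last; };

    // Split a layer sequence into consecutive groups sharing a device.
    std::vector<LayerGroup> PartitionLayers(const std::vector<Layer>& layers) {
        std::vector<LayerGroup> groups;
        for (int i = 0; i < static_cast<int>(layers.size()); i++) {
            const Device device = PreferredDevice(layers[i]);
            if (groups.empty() || groups.back().device != device)
                groups.push_back({device, i, i});  // start a new group
            else
                groups.back().last = i;            // extend the current group
        }
        return groups;
    }

For the layer sequence Convolution, Convolution, Softmax, the function returns two groups: layers 0 to 1 on the DLA and layer 2 on the DSP.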

.. _Processor SDK Linux Software Developer's Guide: http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/index.html
.. _OpenCV: http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/Foundational_Components.html#opencv
.. _OpenCL: http://software-dl.ti.com/mctools/esd/docs/opencl/index.html