.. _using-tidl-api:

******************
Using the TIDL API
******************

This example shows how to use the TIDL API to offload deep learning network processing from a Linux application to the C66x DSPs or EVEs on AM57x devices. The API consists of three classes: ``Configuration``, ``Executor`` and ``ExecutionObject``.

Step 1
======

Determine whether there are any TIDL-capable devices on the AM57x SoC:

.. code-block:: c++

    uint32_t num_eve = Executor::GetNumDevices(DeviceType::EVE);
    uint32_t num_dsp = Executor::GetNumDevices(DeviceType::DSP);

.. note::
    By default, the OpenCL runtime is configured with sufficient global memory
    (via CMEM) to offload TIDL networks to 2 OpenCL devices. On devices where
    ``Executor::GetNumDevices`` returns 4 (e.g., AM5729 with 4 EVE OpenCL
    devices), the amount of memory available to the runtime must be increased.
    Refer to :ref:`opencl-global-memory` for details.

Step 2
======
Create a ``Configuration`` object by reading it from a file or by initializing it directly. The example below parses a configuration file and initializes the ``Configuration`` object. See ``examples/test/testvecs/config/infer`` for examples of configuration files.

.. code-block:: c++

    Configuration configuration;
    // Check the returned status before using the configuration
    bool status = configuration.ReadFromFile(config_file);

.. note::
    Refer to the `Processor SDK Linux Software Developer's Guide`_ for information on creating TIDL network and parameter binary files from TensorFlow and Caffe.

Step 3
======
Create an ``Executor`` with the appropriate device type, set of devices and a configuration. In the snippet below, an ``Executor`` is created on 2 EVEs.

.. code-block:: c++

    DeviceIds ids = {DeviceId::ID0, DeviceId::ID1};
    Executor executor(DeviceType::EVE, ids, configuration);

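Steps 1 and 3 can be tied together by deriving the device set from the count returned in Step 1. The following is a minimal sketch and not part of the TIDL API: ``FirstDevices`` is a hypothetical helper, and the ``DeviceId`` and ``DeviceIds`` definitions are stand-ins (assumed to resemble the real types) so that the snippet compiles without the TIDL headers.

.. code-block:: c++

    #include <cstdint>
    #include <set>

    // Stand-ins for the TIDL types (assumption: the real API exposes
    // equivalent definitions). Remove these when including the TIDL headers.
    enum class DeviceId : int { ID0 = 0, ID1, ID2, ID3 };
    using DeviceIds = std::set<DeviceId>;

    // Hypothetical helper: returns the first `count` device IDs,
    // capped at the four IDs defined above.
    DeviceIds FirstDevices(uint32_t count)
    {
        DeviceIds ids;
        for (uint32_t i = 0; i < count && i < 4; i++)
            ids.insert(static_cast<DeviceId>(i));
        return ids;
    }

With the default memory configuration described in the Step 1 note, an application might cap the count at 2, for example ``FirstDevices(std::min<uint32_t>(num_eve, 2))``.
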
Step 4
======
Get the set of available execution objects and allocate input and output buffers for each of them.

.. code-block:: c++

    const ExecutionObjects& execution_objects = executor.GetExecutionObjects();
    int num_eos = execution_objects.size();

    // Allocate input and output buffers for each execution object.
    // The pointers are recorded in buffers so they can be freed later.
    std::vector<void *> buffers;
    for (auto &eo : execution_objects)
    {
        ArgInfo in  = { ArgInfo(malloc(frame_sz), frame_sz)};
        ArgInfo out = { ArgInfo(malloc(frame_sz), frame_sz)};
        eo->SetInputOutputBuffer(in, out);

        buffers.push_back(in.ptr());
        buffers.push_back(out.ptr());
    }

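The snippet above records every allocated pointer in ``buffers`` but does not show the buffers being released. A minimal, self-contained sketch of that bookkeeping follows. ``AllocateBuffers`` and ``FreeBuffers`` are hypothetical helpers that use plain ``malloc`` and ``free``; they do not involve the TIDL API.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    // Hypothetical helper: allocates one input and one output buffer per
    // execution object and records the pointers, mirroring the loop above.
    std::vector<void *> AllocateBuffers(std::size_t num_eos, std::size_t frame_sz)
    {
        assert(frame_sz > 0);
        std::vector<void *> buffers;
        for (std::size_t i = 0; i < num_eos; i++)
        {
            buffers.push_back(std::malloc(frame_sz));  // input buffer
            buffers.push_back(std::malloc(frame_sz));  // output buffer
        }
        return buffers;
    }

    // Hypothetical helper: frees every recorded buffer and empties the vector.
    void FreeBuffers(std::vector<void *> &buffers)
    {
        for (void *b : buffers)
            std::free(b);
        buffers.clear();
    }

In an application, the equivalent of ``FreeBuffers(buffers)`` would run only after all frames have been processed, since the execution objects still refer to these buffers while work is in flight.
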
Step 5
======
Run the network on each input frame. The frames are processed by the available execution objects in a pipelined manner, with ``num_eos`` additional iterations to flush the pipeline (epilogue).

.. code-block:: c++

    for (int frame_idx = 0; frame_idx < configuration.numFrames + num_eos; frame_idx++)
    {
        ExecutionObject* eo = execution_objects[frame_idx % num_eos].get();

        // Wait for previous frame on the same eo to finish processing
        if (eo->ProcessFrameWait())
            WriteFrame(*eo, output_data_file);

        // Read a frame and start processing it with current eo
        if (ReadFrame(*eo, frame_idx, configuration, input_data_file))
            eo->ProcessFrameStartAsync();
    }

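To see how the loop above interleaves work, the sketch below simulates it without any TIDL dependency. ``MockEO`` and ``RunPipeline`` are hypothetical stand-ins: ``Wait`` and ``Start`` only mimic the roles of ``ProcessFrameWait`` and ``ProcessFrameStartAsync``, and reading a frame is assumed to succeed exactly for the first ``num_frames`` iterations.

.. code-block:: c++

    #include <cassert>
    #include <vector>

    // Hypothetical stand-in for an ExecutionObject: it only remembers which
    // frame (if any) is currently in flight.
    struct MockEO
    {
        int frame_in_flight = -1;  // -1 means idle

        // Mimics ProcessFrameWait(): returns true if a frame was pending
        bool Wait(int &finished_frame)
        {
            if (frame_in_flight < 0)
                return false;
            finished_frame  = frame_in_flight;
            frame_in_flight = -1;
            return true;
        }

        // Mimics ProcessFrameStartAsync()
        void Start(int frame_idx) { frame_in_flight = frame_idx; }
    };

    // Runs the pipelined loop and returns the order in which frames complete
    std::vector<int> RunPipeline(int num_frames, int num_eos)
    {
        assert(num_eos > 0);
        std::vector<MockEO> eos(num_eos);
        std::vector<int> completed;

        for (int frame_idx = 0; frame_idx < num_frames + num_eos; frame_idx++)
        {
            MockEO &eo = eos[frame_idx % num_eos];

            // Collect the result of the previous frame on this eo, if any
            int finished = -1;
            if (eo.Wait(finished))
                completed.push_back(finished);

            // Stand-in for ReadFrame(): no input remains after num_frames
            if (frame_idx < num_frames)
                eo.Start(frame_idx);
        }
        return completed;
    }

With 5 frames and 2 execution objects, the loop runs 7 iterations: the first 2 only start frames, the next 3 each collect a result and start a new frame, and the final 2 only collect results. These last ``num_eos`` iterations are the epilogue that flushes the pipeline.
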
For a complete example of using the API, refer to any of the examples available at ``/usr/share/ti/tidl/examples`` on the EVM file system.

.. _Processor SDK Linux Software Developer's Guide: http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/index.html