******************
Using the TIDL API
******************

This example illustrates using the TIDL API to offload deep learning network processing from a Linux application to the C66x DSPs or DLAs on AM57x devices.

Step 1
======

Determine if there are any TIDL-capable devices on the AM57x SoC:

.. code-block:: c++

    uint32_t num_dla = Executor::GetNumDevices(DeviceType::DLA);
    uint32_t num_dsp = Executor::GetNumDevices(DeviceType::DSP);

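The two counts are typically used to decide where the network will run. The following is a minimal, self-contained sketch of that decision. ``SketchDeviceType`` and ``SelectDeviceType`` are hypothetical names that are not part of the TIDL API, and the preference for DLAs over DSPs is an arbitrary choice made for illustration.

.. code-block:: c++

    #include <cstdint>

    // Hypothetical stand-in for the API's DeviceType, declared locally so
    // that this sketch compiles without the TIDL headers.
    enum class SketchDeviceType { DSP, DLA };

    // Pick a device type from the counts obtained in Step 1.
    // Returns false when neither DLAs nor DSPs are available; in that case
    // 'selected' is left unchanged and the application should not offload.
    bool SelectDeviceType(std::uint32_t num_dla, std::uint32_t num_dsp,
                          SketchDeviceType &selected)
    {
        if (num_dla > 0)
        {
            selected = SketchDeviceType::DLA;  // assumed preference, not mandated by the API
            return true;
        }
        if (num_dsp > 0)
        {
            selected = SketchDeviceType::DSP;
            return true;
        }
        return false;
    }

An application would call such a helper with ``num_dla`` and ``num_dsp`` and exit early, or fall back to running on the host, when it returns ``false``.
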
Step 2
======
Create a Configuration object by reading it from a file or by initializing it directly. The example below parses a configuration file and initializes the Configuration object. See ``examples/test/testvecs/config/infer`` for examples of configuration files.

.. code-block:: c++

    Configuration configuration;
    // Check status before using the configuration
    bool status = configuration.ReadFromFile(config_file);

.. note::
    Refer to the TIDL Translation Tool documentation for creating TIDL network and parameter binary files from TensorFlow and Caffe.

Step 3
======
Create an Executor with the appropriate device type, set of devices, and a configuration. In the snippet below, an Executor is created on two DLAs.

.. code-block:: c++

    DeviceIds ids = {DeviceId::ID0, DeviceId::ID1};
    Executor executor(DeviceType::DLA, ids, configuration);

Step 4
======
Get the set of available ExecutionObjects and allocate input and output buffers for each ExecutionObject.

.. code-block:: c++

    const ExecutionObjects& execution_objects = executor.GetExecutionObjects();
    int num_eos = execution_objects.size();

    // Allocate input and output buffers for each execution object
    std::vector<void *> buffers;
    for (auto &eo : execution_objects)
    {
        ArgInfo in  = { ArgInfo(malloc(frame_sz), frame_sz)};
        ArgInfo out = { ArgInfo(malloc(frame_sz), frame_sz)};
        eo->SetInputOutputBuffer(in, out);

        buffers.push_back(in.ptr());
        buffers.push_back(out.ptr());
    }

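The snippet above records the raw ``malloc`` pointers in ``buffers`` so that they can be released later. One way to make that release automatic is sketched below using standard C++ only. ``FreeDeleter``, ``BufferPtr``, and ``AllocateFrameBuffers`` are hypothetical names introduced for this sketch and are not part of the TIDL API.

.. code-block:: c++

    #include <cstddef>
    #include <cstdlib>
    #include <memory>
    #include <utility>
    #include <vector>

    // Deleter that releases memory obtained from malloc.
    struct FreeDeleter
    {
        void operator()(void *p) const { std::free(p); }
    };

    using BufferPtr = std::unique_ptr<void, FreeDeleter>;

    // Allocate one input and one output buffer per execution object.
    // Returns an empty vector if any allocation fails; buffers allocated
    // before the failure are freed when the local vector is destroyed.
    std::vector<BufferPtr> AllocateFrameBuffers(std::size_t num_eos,
                                                std::size_t frame_sz)
    {
        std::vector<BufferPtr> buffers;
        buffers.reserve(2 * num_eos);
        for (std::size_t i = 0; i < 2 * num_eos; ++i)
        {
            BufferPtr b(std::malloc(frame_sz));
            if (!b)
                return {};
            buffers.push_back(std::move(b));
        }
        return buffers;
    }

With this approach, the raw pointers handed to the API would come from ``buffers[i].get()``, and the vector would need to outlive every use of the buffers by the execution objects. This is a design suggestion under those assumptions, not how ``examples/test/main.cpp`` manages its buffers.
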
Step 5
======
Run the network on each input frame. Frames are processed by the available execution objects in a pipelined manner. The loop runs for an additional ``num_eos`` iterations to flush the pipeline (the epilogue).

.. code-block:: c++

    for (int frame_idx = 0; frame_idx < configuration.numFrames + num_eos; frame_idx++)
    {
        ExecutionObject* eo = execution_objects[frame_idx % num_eos].get();

        // Wait for previous frame on the same eo to finish processing
        if (eo->ProcessFrameWait())
            WriteFrame(*eo, output_data_file);

        // Read a frame and start processing it with current eo
        if (ReadFrame(*eo, frame_idx, configuration, input_data_file))
            eo->ProcessFrameStartAsync();
    }

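To see why the loop needs the extra ``num_eos`` iterations, the scheduling can be simulated without any hardware. The sketch below is self-contained and does not call the TIDL API. ``SimulatePipeline`` is a hypothetical helper, and it assumes that ``ProcessFrameWait`` reports a result only when a frame was previously started on that execution object and that ``ReadFrame`` fails once all frames have been consumed.

.. code-block:: c++

    #include <string>
    #include <vector>

    // Record the order of "start" and "wait" events produced by the
    // round-robin loop for num_frames frames and num_eos execution objects.
    std::vector<std::string> SimulatePipeline(int num_frames, int num_eos)
    {
        std::vector<std::string> events;
        if (num_eos <= 0)
            return events;  // nothing to schedule on

        // in_flight[k] is the frame currently owned by eo k, or -1 if idle.
        std::vector<int> in_flight(num_eos, -1);

        for (int frame_idx = 0; frame_idx < num_frames + num_eos; frame_idx++)
        {
            int k = frame_idx % num_eos;

            // Stand-in for ProcessFrameWait() followed by WriteFrame().
            if (in_flight[k] >= 0)
            {
                events.push_back("wait eo" + std::to_string(k) +
                                 " frame" + std::to_string(in_flight[k]));
                in_flight[k] = -1;
            }

            // Stand-in for ReadFrame() followed by ProcessFrameStartAsync().
            if (frame_idx < num_frames)
            {
                events.push_back("start eo" + std::to_string(k) +
                                 " frame" + std::to_string(frame_idx));
                in_flight[k] = frame_idx;
            }
        }
        return events;
    }

For three frames and two execution objects, the recorded order is: start eo0 frame0, start eo1 frame1, wait eo0 frame0, start eo0 frame2, wait eo1 frame1, wait eo0 frame2. The last two iterations start nothing and only collect results, which is the epilogue described above.
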
Putting it together
===================
The code snippet :ref:`tidl_main` illustrates using the API to offload a network.

.. literalinclude:: ../../examples/test/main.cpp
    :name: tidl_main
    :caption: examples/test/main.cpp
    :lines: 155-189,208-213,215-220
    :linenos:

For a complete example of using the API, refer to ``examples/test/main.cpp`` on the EVM filesystem.