Report memory usage when device allocation fails
author Ajay Jayaraj <ajayj@ti.com>
Fri, 17 Aug 2018 20:14:15 +0000 (15:14 -0500)
committer Ajay Jayaraj <ajayj@ti.com>
Tue, 21 Aug 2018 18:32:28 +0000 (13:32 -0500)

TIDL API creates 2 device side heaps:
1. Parameter heap
2. Network heap

The sizes of these heaps are specified in the Configuration object, via
PARAM_HEAP_SIZE and NETWORK_HEAP_SIZE.
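
As a rough illustration of how the two sizes add up, the sketch below
computes the total heap memory for a given number of Executors and
ExecutionObjects. HeapConfig and TotalHeapBytes are hypothetical stand-ins
written for this note, not part of the TIDL API; only the two field names
and their defaults (9MB and 64MB) come from this change.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for tidl::Configuration, reduced to the two heap-size fields.
// The defaults mirror the values set by the Configuration constructor
// (9MB and 64MB).
struct HeapConfig
{
    std::size_t PARAM_HEAP_SIZE   = 9u  << 20;   // one per Executor
    std::size_t NETWORK_HEAP_SIZE = 64u << 20;   // one per ExecutionObject
};

// Hypothetical helper: total bytes of device heap needed for the given
// number of Executors and ExecutionObjects.
std::size_t TotalHeapBytes(const HeapConfig& c,
                           std::size_t num_executors,
                           std::size_t num_execution_objects)
{
    assert(num_executors > 0 && num_execution_objects > 0);
    return num_executors * c.PARAM_HEAP_SIZE +
           num_execution_objects * c.NETWORK_HEAP_SIZE;
}
```

With the defaults, one Executor with four ExecutionObjects needs
9MB + 4 x 64MB = 265MB, which is more than the 192MB of CMEM that the FAQ
lists as the default. With the sizes reported for the j11_v2 network in the
updated documentation (2881004 and 20061184 bytes), the same setup needs
about 79MB.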

Existing behavior: If the heaps are not large enough, allocation on the
device triggers an assertion failure with no indication of how large the
heaps need to be for successful allocation.

To improve the usability of the API, provide feedback to the user on the
heap sizes required to satisfy device side allocations when any
allocation fails.
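
The device-side allocator is not part of this change, so the following is
only a sketch of one way such feedback can be produced: keep a running
total of every request, including the ones that fail, so the total is the
minimum heap size. TrackedHeap is a hypothetical class, and it ignores
alignment padding, which a real allocator would have to count as well.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical heap model: a bump allocator that never frees and that
// keeps counting requests after a failure.
class TrackedHeap
{
    public:
        explicit TrackedHeap(std::size_t size) : size_m(size)
        {
            assert(size > 0);
        }

        // Returns true if the request fits. The request is added to the
        // running total either way, so the total stays meaningful after
        // a failed allocation.
        bool Alloc(std::size_t bytes)
        {
            total_requested_m += bytes;
            if (bytes > size_m - used_m)
            {
                failed_m = true;
                return false;
            }
            used_m += bytes;
            return true;
        }

        std::size_t Size()           const { return size_m; }
        std::size_t Free()           const { return size_m - used_m; }
        std::size_t TotalRequested() const { return total_requested_m; }
        bool        Failed()         const { return failed_m; }

    private:
        std::size_t size_m;
        std::size_t used_m            = 0;
        std::size_t total_requested_m = 0;
        bool        failed_m          = false;
};
```

When no allocation fails, Size() - Free() equals TotalRequested(), which
matches the sample output in the updated documentation (9437184 - 6556180 =
2881004). When an allocation fails, TotalRequested() exceeds Size() and is
the value to report as the required heap size.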

Also added `-Wall -Werror` when building examples and fixed failures.
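
Two of the failures fixed in the examples are shown below in their
corrected form. With GCC, -Wall enables -Wreorder and -Wparentheses, and
-Werror turns both into build errors. DeviceIds is replaced by a stand-in
type and DevicesToUse is a hypothetical helper, both introduced only to
keep the sketch self-contained.

```cpp
#include <cassert>
#include <set>
#include <string>

// Stand-in for tidl::DeviceIds (assumed here to be a set of integer ids).
using DeviceIds = std::set<int>;

// -Wreorder: members are initialized in declaration order, so they are
// declared in the same order as the constructor's initializer list.
struct ThreadArg
{
    ThreadArg(const DeviceIds& device_ids, const std::string& s):
        ids(device_ids), config_file(s) {}

    DeviceIds   ids;
    std::string config_file;
};

// -Wparentheses: `if (num_eve = 4)` assigns 4 and is always true. The
// comparison below is what was intended.
int DevicesToUse(int num_eve)
{
    assert(num_eve >= 0);
    if (num_eve == 4)
        return 2;   // fall back to 2 devices when 4 are available
    return num_eve;
}
```

The other fixes in this change follow the same pattern: unused variables
are removed and signed/unsigned comparisons are made explicit.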

(MCT-1035)

19 files changed:
docs/source/api.rst
docs/source/faq/out_of_memory.rst
docs/source/using_api.rst
examples/classification/main.cpp
examples/classification/multiple_executors.cpp
examples/make.common
examples/segmentation/main.cpp
examples/segmentation/object_classes.cpp
examples/ssd_multibox/main.cpp
examples/test/main.cpp
examples/test/multiple_executors.cpp
tidl_api/inc/configuration.h
tidl_api/inc/execution_object.h
tidl_api/src/configuration.cpp
tidl_api/src/execution_object.cpp
tidl_api/src/executor.cpp
tidl_api/src/ocl_device.cpp
tidl_api/src/trace.cpp
tidl_api/src/trace.h

diff --git a/docs/source/api.rst b/docs/source/api.rst
index 6fd383ca164aa275ea68d734b8ed294687b7e968..3a3d221fd1cdb5d370a76a163e10f4ba5bd3799b 100644
@@ -56,9 +56,9 @@ The ``Configuration`` object specifies the sizes of 2 heaps. These heaps are all
 
     This field is used to specify the size of the device heap used for network parameters. The size depends on the size of the parameter binary file. For example, ``jsegnet21v2``'s parameter file, ``tidl_param_jsegnet21v2.bin`` is 2.6MB. Due to alignment reasons, the parameter heap must be 10% larger than the binary file size - in this case, 2.9MB. The constructor for ``Configuration`` sets PARAM_HEAP_SIZE to 9MB. There is one parameter heap for each instance of ``Executor`` .
 
-.. data:: std::size_t Configuration.EXTMEM_HEAP_SIZE
+.. data:: std::size_t Configuration.NETWORK_HEAP_SIZE
 
-    This field is used to specify the size of the device heap used for all allocations other than network parameters. The constructor for ``Configuration`` sets EXTMEM_HEAP_SIZE to 64MB.  There is one external memory heap for each instance of ``ExecutionObject``
+    This field is used to specify the size of the device heap used for all allocations other than network parameters. The constructor for ``Configuration`` sets NETWORK_HEAP_SIZE to 64MB.  There is one external memory heap for each instance of ``ExecutionObject``
 
 Debug
 +++++
diff --git a/docs/source/faq/out_of_memory.rst b/docs/source/faq/out_of_memory.rst
index cba902cc68f7dfbd95197e559b76df4d710a15b8..ee97e58eb00d5b66cc7c3afda189aafee94934a6 100644
@@ -26,7 +26,7 @@ One possible reason is that previous runs of the application were aborted (e.g.
 
 Insufficient OpenCL global memory
 +++++++++++++++++++++++++++++++++
-Another possible reason is that total memory requirement specified in the ``Configuration`` using EXTMEM_HEAP_SIZE and PARAM_HEAP_SIZE exceeds default memory available for OpenCL.  Follow the instructions below to increase the amount of CMEM (contiguous memory available for OpenCL)
+Another possible reason is that total memory requirement specified in the ``Configuration`` using NETWORK_HEAP_SIZE and PARAM_HEAP_SIZE exceeds default memory available for OpenCL.  Follow the instructions below to increase the amount of CMEM (contiguous memory available for OpenCL) from 192MB (0xc000000) to 384MB (0x18000000):
 
 .. code:: bash
 
diff --git a/docs/source/using_api.rst b/docs/source/using_api.rst
index 752bd4d3f10e4fbbf7fd42ce6ad21296539f671c..b41fe85e1beead9594cbabe7bc844f20a9d84983 100644
@@ -89,6 +89,101 @@ Run the network on each input frame.  The frames are processed with available ex
 
 For a complete example of using the API, refer any of the examples available at ``/usr/share/ti/tidl/examples`` on the EVM file system.
 
+Sizing device side heaps
+++++++++++++++++++++++++
+
+TIDL API allocates 2 heaps for device side allocations during network setup/initialization:
+
++-----------+-----------------------------------+-----------------------------+
+| Heap Name | Configuration parameter           | Default size                |
++-----------+-----------------------------------+-----------------------------+
+| Parameter | Configuration::PARAM_HEAP_SIZE    | 9MB,  1 per Executor        |
++-----------+-----------------------------------+-----------------------------+
+| Network   | Configuration::NETWORK_HEAP_SIZE  | 64MB, 1 per ExecutionObject |
++-----------+-----------------------------------+-----------------------------+
+
+Depending on the network being deployed, these defaults may be smaller or larger than required. In order to determine the exact sizes for the heaps, the following approach can be used:
+
+Start with the default heap sizes. The API displays heap usage statistics when Configuration::showHeapStats is set to true.
+
+.. code-block:: c++
+
+    Configuration configuration;
+    bool status = configuration.ReadFromFile(config_file);
+    configuration.showHeapStats = true;
+
+If the heap size is larger than required by device side allocations, the API displays usage statistics. When ``Free`` > 0, the heaps are larger than required.
+
+.. code-block:: bash
+
+    # ./test_tidl -n 1 -t e -c testvecs/config/infer/tidl_config_j11_v2.txt
+    API Version: 01.01.00.00.e4e45c8
+    [eve 0]         TIDL Device Trace: PARAM heap: Size 9437184, Free 6556180, Total requested 2881004
+    [eve 0]         TIDL Device Trace: NETWORK heap: Size 67108864, Free 47047680, Total requested 20061184
+
+
+Update the application to set the heap sizes to the "Total requested size" displayed:
+
+.. code-block:: c++
+
+    configuration.PARAM_HEAP_SIZE   = 2881004;
+    configuration.NETWORK_HEAP_SIZE = 20061184;
+
+.. code-block:: bash
+
+    # ./test_tidl -n 1 -t e -c testvecs/config/infer/tidl_config_j11_v2.txt
+    API Version: 01.01.00.00.e4e45c8
+    [eve 0]         TIDL Device Trace: PARAM heap: Size 2881004, Free 0, Total requested 2881004
+    [eve 0]         TIDL Device Trace: NETWORK heap: Size 20061184, Free 0, Total requested 20061184
+
+Now, the heaps are sized as required by network execution (i.e. ``Free`` is 0)
+and the ``configuration.showHeapStats = true`` line can be removed.
+
+.. note::
+
+    If the default heap sizes are smaller than required, the device will report an allocation failure and indicate the required minimum size. E.g.
+.. code-block:: bash
+
+    # ./test_tidl -n 1 -t e -c testvecs/config/infer/tidl_config_j11_v2.txt
+    API Version: 01.01.00.00.0ba86d4
+    [eve 0]         TIDL Device Error:  Allocation failure with NETWORK heap, request size 161472, avail 102512
+    [eve 0]         TIDL Device Error: Network heap must be >= 20061184 bytes, 19960944 not sufficient. Update Configuration::NETWORK_HEAP_SIZE
+    TIDL Error: [src/execution_object.cpp, Wait, 548]: Allocation failed on device
+
+.. note::
+
+    The memory for parameter and network heaps is itself allocated from OpenCL global memory (CMEM). Refer to :ref:`opencl-global-memory` for details.
+
+
+Configuration file
+++++++++++++++++++
+TIDL API allows the user to create a Configuration object by reading from a file or by initializing it directly. Configuration settings supported by ``Configuration::ReadFromFile``:
+
+    * numFrames
+    * inWidth
+    * inHeight
+    * inNumChannels
+    * preProcType
+    * layerIndex2LayerGroupId
+
+    * inData
+    * outData
+
+    * netBinFile
+    * paramsBinFile
+
+    * enableTrace
+
+An example configuration file:
+
+.. literalinclude:: ../../examples/layer_output/j11_v2_trace.txt
+    :language: bash
+
+.. note::
+
+    Refer to :ref:`api-documentation` for the complete set of parameters in the ``Configuration`` class and their description.
+
+
 Overriding layer group assignment
 +++++++++++++++++++++++++++++++++
 The `TIDL device translation tool`_ assigns layer group ids to layers during the translation process. TIDL API 1.1 and higher allows the user to override this assignment by specifying explicit mappings. There are two ways for the user to provide an updated mapping:
diff --git a/examples/classification/main.cpp b/examples/classification/main.cpp
index e38376a814908db9dc1c9d5d312c5f5e8c6d888d..9b1b2cc732b43f2f39dc5985aca594447ece8c29 100644
@@ -60,22 +60,22 @@ int live_input = 1;
 char video_clip[320];
 
 #ifdef TWO_ROIs
-#define RES_X 400                                                              
-#define RES_Y 300                                                            
-#define NUM_ROI_X 2                                                     
-#define NUM_ROI_Y 1                                                      
-#define X_OFFSET 0                                                           
-#define X_STEP   176                                                        
-#define Y_OFFSET 52                                                         
+#define RES_X 400
+#define RES_Y 300
+#define NUM_ROI_X 2
+#define NUM_ROI_Y 1
+#define X_OFFSET 0
+#define X_STEP   176
+#define Y_OFFSET 52
 #define Y_STEP   224
 #else
 #define RES_X 244
-#define RES_Y 244                                                            
-#define NUM_ROI_X 1                                                     
-#define NUM_ROI_Y 1                                                      
-#define X_OFFSET 10                                                         
-#define X_STEP   224                                                     
-#define Y_OFFSET 10                                                    
+#define RES_Y 244
+#define NUM_ROI_X 1
+#define NUM_ROI_Y 1
+#define X_OFFSET 10
+#define X_STEP   224
+#define Y_OFFSET 10
 #define Y_STEP   224
 #endif
 
@@ -235,7 +235,7 @@ bool RunConfiguration(const std::string& config_file, int num_devices,
         }
 
 #ifdef LIVE_DISPLAY
-    if(NUM_ROI > 1) 
+    if(NUM_ROI > 1)
     {
       for(int i = 0; i < NUM_ROI; i ++) {
         char tmp_string[80];
@@ -350,7 +350,6 @@ bool RunConfiguration(const std::string& config_file, int num_devices,
                 double elapsed_host =
                                 ms_diff(t0[eo->GetFrameIndex() % num_eos], t1);
                 double elapsed_device = eo->GetProcessTimeInMilliSeconds();
-                double overhead = 100 - (elapsed_device/elapsed_host*100);
 #ifdef PERF_VERBOSE
                 std::cout << "frame[" << eo->GetFrameIndex() << "]: "
                           << "Time on device: "
@@ -381,7 +380,7 @@ bool RunConfiguration(const std::string& config_file, int num_devices,
                           << elapsed_host << "ms ";
              }
 
-             for (int r = 0; r < NUM_ROI; r ++) 
+             for (int r = 0; r < NUM_ROI; r ++)
              {
                int rpt_id =  ShowRegion(selclass_history[r]);
                 if(rpt_id >= 0)
@@ -470,10 +469,10 @@ bool RunConfiguration(const std::string& config_file, int num_devices,
           if(live_input == -1) {
             //Rewind!
             cap.release();
-            cap.open(std::string(video_clip)); 
+            cap.open(std::string(video_clip));
           }
         }
+
         }
 
         for (auto b : buffers)
@@ -676,8 +675,6 @@ int tf_postprocess(uchar *in, int size, int roi_idx, int frame_idx, int f_id)
   for (int i = k-1; i >= 0; i--)
   {
       int id = sorted[i].second;
-      char res2show[320];
-      bool found = false;
 
       if (tf_expected_id(id))
       {
@@ -685,7 +682,6 @@ int tf_postprocess(uchar *in, int size, int roi_idx, int frame_idx, int f_id)
                   << k-i << ", prob=" << (float) sorted[i].first / 255 << ", "
                   << labels_classes[sorted[i].second] << " accum_in=" << accum_in << std::endl;
         rpt_id = id;
-        found  = true;
       }
   }
   return rpt_id;
@@ -701,9 +697,9 @@ void tf_preprocess(uchar *out, uchar *in, int size)
 
 int ShowRegion(int roi_history[])
 {
-  if((roi_history[0] >= 0) && (roi_history[0] == roi_history[1])) return roi_history[0];    
-  if((roi_history[0] >= 0) && (roi_history[0] == roi_history[2])) return roi_history[0];    
-  if((roi_history[1] >= 0) && (roi_history[1] == roi_history[2])) return roi_history[1];    
+  if((roi_history[0] >= 0) && (roi_history[0] == roi_history[1])) return roi_history[0];
+  if((roi_history[0] >= 0) && (roi_history[0] == roi_history[2])) return roi_history[0];
+  if((roi_history[1] >= 0) && (roi_history[1] == roi_history[2])) return roi_history[1];
   return -1;
 }
 
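diff --git a/examples/classification/multiple_executors.cpp b/examples/classification/multiple_executors.cpp
index 78a1789c74332e4c3e4f7eaad8a652f1482d5ca5..3b9f6b1987f760cfb6cff10b18e9d0dc11df796d 100644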
@@ -59,11 +59,11 @@ void* run_network(void *data);
 
 struct ThreadArg
 {
-    std::string config_file;
-    DeviceIds ids;
     ThreadArg(const DeviceIds& ids, const std::string& s):
         ids(ids), config_file(s) {}
 
+    DeviceIds ids;
+    std::string config_file;
 };
 
 bool thread_status[2];
diff --git a/examples/make.common b/examples/make.common
index 87a4f3130a1fe5a10d22033b18a22d4a80f1163b..aab1d584fac5021dd9a209e0ec2b7f5489e18d72 100644
@@ -39,7 +39,7 @@ include $(TIDL_API_DIR)/make.inc
 ifeq ($(BUILD), debug)
        CXXFLAGS += -Og -g -ggdb
 else
-       CXXFLAGS += -O3
+       CXXFLAGS += -O3 -Wall -Werror -Wno-error=ignored-attributes
 endif
 
 CXXFLAGS += -I. -I$(TIDL_API_DIR)/inc -std=c++11
diff --git a/examples/segmentation/main.cpp b/examples/segmentation/main.cpp
index 86f81e55cd8c21f464ef19320ff86ef83f5bf073..67f1a4acaeff3e80138096f1240efc50fe5fe04d 100644
@@ -338,7 +338,6 @@ bool WriteFrameOutput(const ExecutionObject &eo,
                       const Configuration& configuration)
 {
     unsigned char *out = (unsigned char *) eo.GetOutputBufferPtr();
-    int out_size       = eo.GetOutputBufferSizeInBytes();
     int width          = configuration.inWidth;
     int height         = configuration.inHeight;
     int channel_size   = width * height;
diff --git a/examples/segmentation/object_classes.cpp b/examples/segmentation/object_classes.cpp
index d930106f0aec6ddac344f04413a4851060bfb702..35aebf744d338cc841813024d96747355e25d692 100644
@@ -62,7 +62,7 @@ object_class_table_t* GetObjectClassTable(std::string &config)
 
 object_class_t* GetObjectClass(object_class_table_t *table, int index)
 {
-    if (index < 0 || index >= table->num_classes)  index = table->num_classes;
+    if (index < 0 || (unsigned int)index >= table->num_classes)  index = table->num_classes;
     return & (table->classes[index]);
 }
 
diff --git a/examples/ssd_multibox/main.cpp b/examples/ssd_multibox/main.cpp
index b302cfa7529128f69580152adcc17169af6db815..70cda0663384d2388751bd1a9a663cfb8edf55a9 100644
@@ -150,9 +150,9 @@ bool RunConfiguration(const std::string& config_file,
                       DeviceType device_type, std::string& input_file)
 {
     DeviceIds ids_eve, ids_dsp;
-    for (int i = 0; i < num_eves; i++)
+    for (unsigned int i = 0; i < num_eves; i++)
         ids_eve.insert(static_cast<DeviceId>(i));
-    for (int i = 0; i < num_dsps; i++)
+    for (unsigned int i = 0; i < num_dsps; i++)
         ids_dsp.insert(static_cast<DeviceId>(i));
 
     // Read the TI DL configuration file
@@ -378,7 +378,6 @@ bool WriteFrameOutput(const ExecutionObjectPipeline& eop,
         if (index < 0)  break;
 
         int   label = (int)  out[i * 7 + 1];
-        float score =        out[i * 7 + 2];
         int   xmin  = (int) (out[i * 7 + 3] * width);
         int   ymin  = (int) (out[i * 7 + 4] * height);
         int   xmax  = (int) (out[i * 7 + 5] * width);
@@ -449,7 +448,7 @@ void ProcessArgs(int argc, char *argv[], std::string& config,
                       break;
 
             case 'd': num_dsps = atoi(optarg);
-                      assert (num_dsps > 0 && num_dsps <= 
+                      assert (num_dsps > 0 && num_dsps <=
                                      Executor::GetNumDevices(DeviceType::DSP));
                       break;
 
diff --git a/examples/test/main.cpp b/examples/test/main.cpp
index 4c82ca0ee7494c459206f4eefce5b93a21d625e4..5fdfd641ab47daf4e46c5cb7ba43cf6cea6dd768 100644
@@ -102,7 +102,7 @@ int main(int argc, char *argv[])
         {
             // Run on 2 devices because there is not enough CMEM available by
             // default
-            if (num_eve = 4)
+            if (num_eve == 4)
             {
                 std::cout
                  << "Running on 2 EVE devices instead of the available 4 "
diff --git a/examples/test/multiple_executors.cpp b/examples/test/multiple_executors.cpp
index 78a1789c74332e4c3e4f7eaad8a652f1482d5ca5..41821b55a11e6c62e222c1b39fce8d9a91e96971 100644
@@ -59,11 +59,11 @@ void* run_network(void *data);
 
 struct ThreadArg
 {
-    std::string config_file;
-    DeviceIds ids;
-    ThreadArg(const DeviceIds& ids, const std::string& s):
-        ids(ids), config_file(s) {}
+    ThreadArg(const DeviceIds& device_ids, const std::string& s):
+        ids(device_ids), config_file(s) {}
 
+    DeviceIds   ids;
+    std::string config_file;
 };
 
 bool thread_status[2];
diff --git a/tidl_api/inc/configuration.h b/tidl_api/inc/configuration.h
index d1ebf09ed176f9c8fbe295e4c0e92b3a06999e47..8ed6be2f5d6616b23b04da7919f5e303e013e74a 100644
@@ -69,10 +69,11 @@ class Configuration
     //! outputs of previous layersGroupId, instead of from user application
     bool     enableInternalInput;
 
-    //! Size of the TI DL per Execution Object heap
-    size_t EXTMEM_HEAP_SIZE;
+    //! Size of the device side heap, used for allocating memory required to
+    //! run the network on the device. One per Execution Object.
+    size_t NETWORK_HEAP_SIZE;
 
-    //! Size of the heap used for paramter data
+    //! Size of the heap used for parameter data. One per Executor.
     size_t PARAM_HEAP_SIZE;
 
     //! @brief Location of the input file
@@ -92,6 +93,14 @@ class Configuration
     //! Enable tracing of output buffers associated with each layer
     bool enableOutputTrace;
 
+    //! Debug - Generates a trace of host and device function calls
+    bool enableApiTrace;
+
+    //! Debug - Shows total size of PARAM and NETWORK heaps. Also shows bytes
+    //! available after all allocations. Can be used to adjust the heap
+    //! size
+    bool showHeapStats;
+
     //! Map of layer index to layer group id. Used to override layer group
     //! assigment for layers. Any layer not specified in this map will
     //! retain its existing mapping.
diff --git a/tidl_api/inc/execution_object.h b/tidl_api/inc/execution_object.h
index c1d86fc126bb8e243a67df04d30bfb5c3aca63d8..dad586678cac660dc4a682abd68355d10792e5d1 100644
@@ -31,6 +31,7 @@
 #pragma once
 
 #include <memory>
+#include "configuration.h"
 #include "execution_object_internal.h"
 
 namespace tidl {
@@ -54,10 +55,8 @@ class ExecutionObject : public ExecutionObjectInternalInterface
         ExecutionObject(Device* d, uint8_t device_index,
                         const  ArgInfo& create_arg,
                         const  ArgInfo& param_heap_arg,
-                        size_t extmem_heap_size,
-                        int    layersGroupId,
-                        bool   output_trace,
-                        bool   internal_input);
+                        const  Configuration& configuration,
+                        int    layersGroupId);
         //! @private
         ~ExecutionObject();
 
diff --git a/tidl_api/src/configuration.cpp b/tidl_api/src/configuration.cpp
index f9ec0bffa97e3fec8903848aff8f6a848a3b2e03..daa15f41dcd5003dc7a559645706b0ae4d2c893a 100644
@@ -39,10 +39,12 @@ Configuration::Configuration(): numFrames(0), inHeight(0), inWidth(0),
                      noZeroCoeffsPercentage(100),
                      preProcType(0),
                      runFullNet(false),
-                     enableInternalInput(0),
-                     EXTMEM_HEAP_SIZE(64 << 20),  // 64MB for inceptionNetv1
+                     enableInternalInput(false),
+                     NETWORK_HEAP_SIZE(64 << 20),  // 64MB for inceptionNetv1
                      PARAM_HEAP_SIZE(9 << 20),    // 9MB for mobileNet1
-                     enableOutputTrace(false)
+                     enableOutputTrace(false),
+                     enableApiTrace(false),
+                     showHeapStats(false)
 {
 }
 
@@ -58,7 +60,7 @@ void Configuration::Print(std::ostream &os) const
        << "\nOutputFile               " << outData
        << "\nNetwork                  " << netBinFile
        << "\nParameters               " << paramsBinFile
-       << "\nEO Heap Size (MB)        " << (EXTMEM_HEAP_SIZE >> 20)
+       << "\nEO Heap Size (MB)        " << (NETWORK_HEAP_SIZE >> 20)
        << "\nParameter heap size (MB) " << (PARAM_HEAP_SIZE >> 20)
        << "\n";
 }
@@ -87,14 +89,11 @@ bool Configuration::Validate() const
         errors++;
     }
 
-    size_t paramsBinFileSize = 0;
     if (stat(paramsBinFile.c_str(), &buffer) != 0)
     {
         std::cerr << "paramsBinFile not found: " << paramsBinFile << std::endl;
         errors++;
     }
-    else
-        paramsBinFileSize = buffer.st_size;
 
     if (!inData.empty() && stat(inData.c_str(), &buffer) != 0)
     {
@@ -102,16 +101,6 @@ bool Configuration::Validate() const
         errors++;
     }
 
-    // Due to alignment, the parameter heap must be larger than the
-    // parameter binary. Using 1.1 as a conservative factor.
-    if (paramsBinFileSize > 0 &&
-            (paramsBinFileSize * 1.1) > PARAM_HEAP_SIZE)
-    {
-        std::cerr << "Parameter binary file larger than paramter heap. "
-                     "Increase Configuration::PARAM_HEAP_SIZE" << std::endl;
-        errors++;
-    }
-
     if (errors > 0)
         return false;
 
diff --git a/tidl_api/src/execution_object.cpp b/tidl_api/src/execution_object.cpp
index 178bbcaeb9f256c47c38df355171622c8413638f..e89a96e075a04b2ab2f0fb9843bdc24fb53a6679 100644
@@ -39,7 +39,6 @@
 #include "trace.h"
 #include "ocl_device.h"
 #include "parameters.h"
-#include "configuration.h"
 #include "common_defines.h"
 #include "tidl_create_params.h"
 #include "device_arginfo.h"
@@ -52,10 +51,8 @@ class ExecutionObject::Impl
         Impl(Device* d, uint8_t device_index,
              const DeviceArgInfo& create_arg,
              const DeviceArgInfo& param_heap_arg,
-             size_t extmem_heap_size,
-             int    layers_group_id,
-             bool   output_trace,
-             bool   internal_input);
+             const Configuration& configuration,
+             int    layers_group_id);
         ~Impl() {}
 
         bool RunAsync(CallType ct);
@@ -103,9 +100,7 @@ class ExecutionObject::Impl
 
     private:
         void SetupInitializeKernel(const DeviceArgInfo& create_arg,
-                                   const DeviceArgInfo& param_heap_arg,
-                                   size_t extmem_heap_size,
-                                   bool   internal_input);
+                                   const DeviceArgInfo& param_heap_arg);
         void EnableOutputBufferTrace();
         void SetupProcessKernel();
 
@@ -121,18 +116,20 @@ class ExecutionObject::Impl
         bool                            is_idle_m;
         std::mutex                      mutex_access_m;
         std::condition_variable         cv_access_m;
+
+        const Configuration             configuration_m;
 };
 
 
 ExecutionObject::ExecutionObject(Device* d,
                                  uint8_t device_index,
-                                 const ArgInfo& create_arg,
-                                 const ArgInfo& param_heap_arg,
-                                 size_t extmem_heap_size,
-                                 int    layers_group_id,
-                                 bool   output_trace,
-                                 bool   internal_input)
+                                 const   ArgInfo& create_arg,
+                                 const   ArgInfo& param_heap_arg,
+                                 const   Configuration& configuration,
+                                 int     layers_group_id)
 {
+    TRACE::print("-> ExecutionObject::ExecutionObject()\n");
+
     DeviceArgInfo create_arg_d(create_arg, DeviceArgInfo::Kind::BUFFER);
     DeviceArgInfo param_heap_arg_d(param_heap_arg, DeviceArgInfo::Kind::BUFFER);
 
@@ -140,21 +137,17 @@ ExecutionObject::ExecutionObject(Device* d,
               { new ExecutionObject::Impl(d, device_index,
                                           create_arg_d,
                                           param_heap_arg_d,
-                                          extmem_heap_size,
-                                          layers_group_id,
-                                          output_trace,
-                                          internal_input) };
+                                          configuration,
+                                          layers_group_id) };
+    TRACE::print("<- ExecutionObject::ExecutionObject()\n");
 }
 
 
-ExecutionObject::Impl::Impl(Device* d,
-                                 uint8_t device_index,
-                                 const DeviceArgInfo& create_arg,
-                                 const DeviceArgInfo& param_heap_arg,
-                                 size_t extmem_heap_size,
-                                 int    layers_group_id,
-                                 bool   output_trace,
-                                 bool   internal_input):
+ExecutionObject::Impl::Impl(Device* d, uint8_t device_index,
+                            const DeviceArgInfo& create_arg,
+                            const DeviceArgInfo& param_heap_arg,
+                            const Configuration& configuration,
+                            int    layers_group_id):
     device_m(d),
     device_index_m(device_index),
     tidl_extmem_heap_m (nullptr, &__free_ddr),
@@ -172,7 +165,8 @@ ExecutionObject::Impl::Impl(Device* d,
     k_initialize_m(nullptr),
     k_process_m(nullptr),
     k_cleanup_m(nullptr),
-    is_idle_m(true)
+    is_idle_m(true),
+    configuration_m(configuration)
 {
     device_name_m = device_m->GetDeviceName() + std::to_string(device_index_m);
     // Save number of layers in the network
@@ -180,10 +174,11 @@ ExecutionObject::Impl::Impl(Device* d,
                 static_cast<const TIDL_CreateParams *>(create_arg.ptr());
     num_network_layers_m = cp->net.numLayers;
 
-    SetupInitializeKernel(create_arg, param_heap_arg, extmem_heap_size,
-                          internal_input);
+    SetupInitializeKernel(create_arg, param_heap_arg);
+
+    if (configuration_m.enableOutputTrace)
+        EnableOutputBufferTrace();
 
-    if (output_trace)  EnableOutputBufferTrace();
     SetupProcessKernel();
 }
 
@@ -240,12 +235,14 @@ void ExecutionObject::SetInputOutputBuffer(const IODeviceArgInfo* in,
 
 bool ExecutionObject::ProcessFrameStartAsync()
 {
+    TRACE::print("-> ExecutionObject::ProcessFrameStartAsync()\n");
     assert(GetInputBufferPtr() != nullptr && GetOutputBufferPtr() != nullptr);
     return pimpl_m->RunAsync(ExecutionObject::CallType::PROCESS);
 }
 
 bool ExecutionObject::ProcessFrameWait()
 {
+    TRACE::print("-> ExecutionObject::ProcessFrameWait()\n");
     return pimpl_m->Wait(ExecutionObject::CallType::PROCESS);
 }
 
@@ -317,12 +314,11 @@ void ExecutionObject::ReleaseLock()
 //
 void
 ExecutionObject::Impl::SetupInitializeKernel(const DeviceArgInfo& create_arg,
-                                             const DeviceArgInfo& param_heap_arg,
-                                             size_t extmem_heap_size,
-                                             bool   internal_input)
+                                             const DeviceArgInfo& param_heap_arg)
 {
     // Allocate a heap for TI DL to use on the device
-    tidl_extmem_heap_m.reset(malloc_ddr<char>(extmem_heap_size));
+    tidl_extmem_heap_m.reset(
+                         malloc_ddr<char>(configuration_m.NETWORK_HEAP_SIZE));
 
     // Create a kernel for cleanup
     KernelArgs cleanup_args;
@@ -335,17 +331,21 @@ ExecutionObject::Impl::SetupInitializeKernel(const DeviceArgInfo& create_arg,
     memset(shared_initialize_params_m.get(), 0,
            sizeof(OCL_TIDL_InitializeParams));
 
-    shared_initialize_params_m->tidlHeapSize = extmem_heap_size;
+    shared_initialize_params_m->tidlHeapSize =configuration_m.NETWORK_HEAP_SIZE;
     shared_initialize_params_m->l2HeapSize   = tidl::internal::DMEM1_SIZE;
     shared_initialize_params_m->l1HeapSize   = tidl::internal::DMEM0_SIZE;
-    shared_initialize_params_m->enableTrace  = OCL_TIDL_TRACE_OFF;
-    shared_initialize_params_m->enableInternalInput = internal_input ? 1 : 0;
+    shared_initialize_params_m->enableInternalInput =
+                   configuration_m.enableInternalInput ? 1 : 0;
+
+    // Set up execution trace specified in the configuration
+    EnableExecutionTrace(configuration_m,
+                         &shared_initialize_params_m->enableTrace);
 
     // Setup kernel arguments for initialize
     KernelArgs args = { create_arg,
                         param_heap_arg,
                         DeviceArgInfo(tidl_extmem_heap_m.get(),
-                                      extmem_heap_size,
+                                      configuration_m.NETWORK_HEAP_SIZE,
                                       DeviceArgInfo::Kind::BUFFER),
                         DeviceArgInfo(shared_initialize_params_m.get(),
                                       sizeof(OCL_TIDL_InitializeParams),
@@ -394,11 +394,14 @@ void
 ExecutionObject::Impl::SetupProcessKernel()
 {
     shared_process_params_m.reset(malloc_ddr<OCL_TIDL_ProcessParams>());
-    shared_process_params_m->enableTrace = OCL_TIDL_TRACE_OFF;
     shared_process_params_m->enableInternalInput =
                                shared_initialize_params_m->enableInternalInput;
     shared_process_params_m->cycles = 0;
 
+    // Set up execution trace specified in the configuration
+    EnableExecutionTrace(configuration_m,
+                         &shared_process_params_m->enableTrace);
+
     KernelArgs args = { DeviceArgInfo(shared_process_params_m.get(),
                                       sizeof(OCL_TIDL_ProcessParams),
                                       DeviceArgInfo::Kind::BUFFER),
diff --git a/tidl_api/src/executor.cpp b/tidl_api/src/executor.cpp
index 914c78ab58104eeba379db5ae8305e45537d007e..4d4c1562ac0d458c2fe7791609719a4fb6258db9 100644
@@ -41,9 +41,15 @@ using std::unique_ptr;
 Executor::Executor(DeviceType core_type, const DeviceIds& ids,
                    const Configuration& configuration, int layers_group_id)
 {
+    TRACE::enabled = configuration.enableApiTrace;
+
+    TRACE::print("-> Executor::Executor()\n");
+
     pimpl_m = unique_ptr<ExecutorImpl>
               { new ExecutorImpl(core_type, ids, layers_group_id) };
     pimpl_m->Initialize(configuration);
+
+    TRACE::print("<- Executor::Executor()\n");
 }
 
 
@@ -150,10 +156,8 @@ bool ExecutorImpl::Initialize(const Configuration& configuration)
              unique_ptr<ExecutionObject>
              {new ExecutionObject(device_m.get(), index,
                                   create_arg, param_heap_arg,
-                                  configuration_m.EXTMEM_HEAP_SIZE,
-                                  layers_group_id_m,
-                                  configuration_m.enableOutputTrace,
-                                  configuration_m.enableInternalInput)} );
+                                  configuration_m,
+                                  layers_group_id_m)} );
     }
 
     for (auto &eo : execution_objects_m)
@@ -186,7 +190,9 @@ bool ExecutorImpl::InitializeNetworkParams(TIDL_CreateParams *cp)
                                             malloc_ddr<OCL_TIDL_SetupParams>(),
                                             &__free_ddr);
 
-    setupParams->enableTrace = OCL_TIDL_TRACE_OFF;
+    // Set up execution trace specified in the configuration
+    EnableExecutionTrace(configuration_m, &setupParams->enableTrace);
+
     setupParams->networkParamHeapSize = configuration_m.PARAM_HEAP_SIZE;
     setupParams->noZeroCoeffsPercentage = configuration_m.noZeroCoeffsPercentage;
     setupParams->sizeofTIDL_CreateParams = sizeof(TIDL_CreateParams);
@@ -267,6 +273,7 @@ Exception::Exception(const std::string& error, const std::string& file,
     message_m += error;
 }
 
+// Refer ti-opencl/builtins/include/custom.h for error codes
 Exception::Exception(int32_t errorCode, const std::string& file,
                      const std::string& func, uint32_t line_no)
 {
@@ -278,20 +285,28 @@ Exception::Exception(int32_t errorCode, const std::string& file,
     message_m += std::to_string(line_no);
     message_m += "]: ";
 
-    if (errorCode == OCL_TIDL_ERROR)
+    switch (errorCode)
+    {
+        case OCL_TIDL_ERROR:
         message_m += "";
-    else if (errorCode == OCL_TIDL_ALLOC_FAIL)
-        message_m += "Allocation failed on device";
-    else if (errorCode == OCL_TIDL_MEMREC_ALLOC_FAIL)
-        message_m += "Memrec allocation failed on device";
-    else if (errorCode == OCL_TIDL_PROCESS_FAIL)
+            break;
+        case OCL_TIDL_ALLOC_FAIL:
+        case OCL_TIDL_MEMREC_ALLOC_FAIL:
+            message_m += "Memory allocation failed on device";
+            break;
+        case OCL_TIDL_PROCESS_FAIL:
         message_m += "Process call failed on device";
-    else if (errorCode == OCL_TIDL_CREATE_PARAMS_MISMATCH)
-        message_m += "TIDL_CreateParams definition inconsistent across host"
-                     "and device.";
-    else
+            break;
+        case OCL_TIDL_CREATE_PARAMS_MISMATCH:
+            message_m += "TIDL API headers inconsistent with OpenCL";
+            break;
+        case OCL_TIDL_INIT_FAIL:
+            message_m += "Initialization failed on device";
+            break;
+        default:
         message_m += std::to_string(errorCode);
-
+            break;
+    }
 }
 
 const char* Exception::what() const noexcept
diff --git a/tidl_api/src/ocl_device.cpp b/tidl_api/src/ocl_device.cpp
index b3eaf36d4894a8c2f0b15f60d0d24889a9dcc4fa..1e4d27779e657cca6c687da6fbc46fb489a8f945 100644
@@ -281,7 +281,8 @@ Kernel::Kernel(Device* device, const std::string& name,
                 clSetKernelArg(kernel_m, arg_index, sizeof(cl_mem), &buffer);
                 TRACE::print("  Arg[%d]: %p\n", arg_index, buffer);
 
-                buffers_m.push_back(buffer);
+                if (buffer)
+                    buffers_m.push_back(buffer);
             }
             else if (arg.kind() == DeviceArgInfo::Kind::SCALAR)
             {
diff --git a/tidl_api/src/trace.cpp b/tidl_api/src/trace.cpp
index 2f74562beaddd71fb79dcf9a88dc2dc0c67f646c..13ff90a7fc48bbb9dcdc4b8ececdc431ad7a040b 100644
 
 
 #include "trace.h"
+#include "custom.h"
 
 using namespace tidl;
 
-
-#if defined(OA_ENABLE_TRACE)
-extern bool __attribute__((weak)) __TI_show_debug_;
+bool TRACE::enabled = false;
 
 void TRACE::print(const char *fmt, ...)
 {
-    bool enabled = (&__TI_show_debug_ ? __TI_show_debug_ : false);
     if (!enabled)
         return;
 
@@ -47,5 +45,11 @@ void TRACE::print(const char *fmt, ...)
     va_end(ap);
     std::fflush(stdout);
 }
-#endif
 
+void tidl::EnableExecutionTrace(const Configuration& config,
+                                uint32_t* enableDeviceTrace)
+{
+    if (config.showHeapStats)       *enableDeviceTrace = OCL_TIDL_TRACE_HEAP;
+    else if (config.enableApiTrace) *enableDeviceTrace = OCL_TIDL_TRACE_API;
+    else                            *enableDeviceTrace = OCL_TIDL_TRACE_OFF;
+}
diff --git a/tidl_api/src/trace.h b/tidl_api/src/trace.h
index bdbcb87a19b8fcfcb820e1a5feec5663e31a9070..27dbdec6492f1e3c224612f653c56ac728455117 100644
  * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
  * THE POSSIBILITY OF SUCH DAMAGE.
  *****************************************************************************/
-
-
 #pragma once
 
-#define OA_ENABLE_TRACE (1)
-
 #include <cstdio>
 #include <cstdarg>
+#include "configuration.h"
 
 namespace tidl {
 
@@ -41,10 +38,11 @@ class TRACE
 {
     public:
         static void print(const char *fmt, ...);
+        static bool enabled;
 };
 
-#if !defined(OA_ENABLE_TRACE)
-inline void TRACE::print(const char * __attribute__ ((unused)) fmt, ...) {}
-#endif
+void EnableExecutionTrace(const Configuration& config,
+                          uint32_t* enableDeviceTrace);
+
 }