raw | patch | inline | side by side (parent: e54b079)
author    Ajay Jayaraj <ajayj@ti.com>
          Mon, 25 Jun 2018 21:53:29 +0000 (16:53 -0500)
committer Ajay Jayaraj <ajayj@ti.com>
          Tue, 26 Jun 2018 16:12:24 +0000 (11:12 -0500)
(MCT-1005)
13 files changed:
index ed6564d2dbb9fb17712509fb414f0e29f5af527d..7036d8dff87c1169aaf00b37be9cab339e1bc433 100644 (file)
--- a/docs/source/example.rst
+++ b/docs/source/example.rst
We ship three end-to-end examples within the tidl-api package
to demonstrate three categories of deep learning networks. The first
-two examples can run on AM57x SoCs with either DLA or DSP. The last
-example requires AM57x SoCs with both DLA and DSP. The performance
+two examples can run on AM57x SoCs with either EVE or DSP. The last
+example requires AM57x SoCs with both EVE and DSP. The performance
numbers that we present here were obtained on an AM5729 EVM, which
-includes 2 ARM A15 cores running at 1.5GHz, 4 DLA cores at 535MHz, and
+includes 2 ARM A15 cores running at 1.5GHz, 4 EVE cores at 535MHz, and
2 DSP cores at 750MHz.
Imagenet
as the most likely objects that the input image can be.
The following figure and tables show an input image, top 5 predicted
-objects as output, and the processing time on either DLA or DSP.
+objects as output, and the processing time on either EVE or DSP.
.. image:: ../../examples/test/testvecs/input/objects/cat-pet-animal-domestic-104827.jpeg
:width: 600
====================== ==================== ============
Device Processing Time Host Processing Time API Overhead
====================== ==================== ============
- DLA: 123.1 ms 124.7 ms 1.34 %
+ EVE: 123.1 ms 124.7 ms 1.34 %
**OR**
DSP: 117.9 ms 119.3 ms 1.14 %
====================== ==================== ============
The particular network that we ran in this category, jacintonet11v2,
-has 14 layers. User can specify whether to run the network on DLA or DSP
-for acceleration. We can see that DLA time is slightly higher than DSP time.
+has 14 layers. User can specify whether to run the network on EVE or DSP
+for acceleration. We can see that EVE time is slightly higher than DSP time.
Host time includes the OpenCL runtime overhead and the time to copy user
input data into padded TIDL buffers. We can see that the overall overhead
is less than 1.5%.
The network we ran in this category is jsegnet21v2, which has 26 layers.
From the reported time in the following table, we can see that this network
-runs significantly faster on DLA than on DSP.
+runs significantly faster on EVE than on DSP.
.. table::
====================== ==================== ============
Device Processing Time Host Processing Time API Overhead
====================== ==================== ============
- DLA: 296.5 ms 303.3 ms 2.26 %
+ EVE: 296.5 ms 303.3 ms 2.26 %
**OR**
DSP: 812.0 ms 818.4 ms 0.79 %
====================== ==================== ============
.. image:: images/pexels-photo-378570-ssd.jpg
:width: 600
-The network can be run entirely on either DLA or DSP. But the best
-performance comes with running the first 30 layers on DLA and the
+The network can be run entirely on either EVE or DSP, but the best
+performance comes from running the first 30 layers on EVE and the
next 13 layers on DSP, for this particular jdetnet_ssd network.
Note the **AND** in the following table for the reported time.
Our end-to-end example shows how easy it is to assign a layers group id
====================== ==================== ============
Device Processing Time Host Processing Time API Overhead
====================== ==================== ============
- DLA: 175.2 ms 179.1 ms 2.14 %
+ EVE: 175.2 ms 179.1 ms 2.14 %
**AND**
DSP: 21.1 ms 22.3 ms 5.62 %
====================== ==================== ============
root@am57xx-evm:/usr/share/ti/tidl-api/examples/segmentation# cd ../ssd_multibox/; make -j4
root@am57xx-evm:/usr/share/ti/tidl-api/examples/ssd_multibox# ./ssd_multibox -i ../test/testvecs/input/roads/pexels-photo-378570.jpeg
Input: ../test/testvecs/input/roads/pexels-photo-378570.jpeg
- frame[0]: Time on DLA: 175.2ms, host: 179ms API overhead: 2.1 %
+ frame[0]: Time on EVE: 175.2ms, host: 179ms API overhead: 2.1 %
frame[0]: Time on DSP: 21.06ms, host: 22.43ms API overhead: 6.08 %
Saving frame 0 with SSD multiboxes to: multibox_0.png
Loop total time (including read/write/print/etc): 423.8ms
index 26a8a7ce0aa5de8dccd26a6e05693503d3aabcc7..59c52f2987a03ec017947c86bd83b5953c64b12b 100755 (executable)
Binary files a/docs/source/images/tidl-api.png and b/docs/source/images/tidl-api.png differ
diff --git a/docs/source/intro.rst b/docs/source/intro.rst
index 92124aa8871e25164ff486fea2ce06f1d028a64f..407a39e1ea387ca4ba030c3ce86118ab07237dab 100644 (file)
--- a/docs/source/intro.rst
+++ b/docs/source/intro.rst
Introduction
************
-TI Deep Learning (TIDL) API brings deep learning to the edge by enabling applications to leverage TI's proprietary, highly optimized CNN/DNN implementation on Deep Learning Accelerator (DLA) and C66x DSP compute engines. TIDL will initially target Vision/2D use cases on AM57x SoCs.
+TI Deep Learning (TIDL) API brings deep learning to the edge by enabling applications to leverage TI's proprietary, highly optimized CNN/DNN implementation on the EVE and C66x DSP compute engines. TIDL will initially target Vision/2D use cases on AM57x SoCs.
This User's Guide covers the TIDL API. For information on TIDL such as the overall development flow, techniques to optimize performance of CNN/DNN on TI's SoCs, performance/benchmarking data, and the list of supported layers, see the TIDL section in the `Processor SDK Linux Software Developer's Guide`_.
Ease of use
+++++++++++
* Easily integrate TIDL APIs into other frameworks such as `OpenCV`_
-* Provides a common host abstraction for user applications across multiple compute engines (DLAs and C66x DSPs)
+* Provides a common host abstraction for user applications across multiple compute engines (EVEs and C66x DSPs)
Low overhead
+++++++++++++
Software Architecture
---------------------
-The TIDL API leverages TI's `OpenCL`_ product to offload deep learning applications to both DLA(s) and DSP(s). The TIDL API significantly improves the out-of-box deep learning experience for users and enables them to focus on their overall use case. They do not have to spend time on the mechanics of ARM ↔ DSP/DLA communication or implementing optimized network layers on DLA(s) and/or DSP(s). The API allows customers to easily integrate frameworks such as OpenCV and rapidly prototype deep learning applications.
+The TIDL API leverages TI's `OpenCL`_ product to offload deep learning applications to both EVE(s) and DSP(s). The TIDL API significantly improves the out-of-box deep learning experience for users and enables them to focus on their overall use case. They do not have to spend time on the mechanics of ARM ↔ DSP/EVE communication or implementing optimized network layers on EVE(s) and/or DSP(s). The API allows customers to easily integrate frameworks such as OpenCV and rapidly prototype deep learning applications.
.. _`TIDL Development flow`:
TIDL API Software Architecture
-TIDL APIs provide three intuitive C++ classes. ``Configuration`` encapsulates a network configuration, including pointers to the network and parameter binary files. ``Executor`` encapsulates on-device memory allocation, network setup and initialization. ``ExecutionObject`` encapsulates TIDL processing on a single DSP or DLA core. Implementation of these classes will call into OpenCL runtime to offload network processing onto DLA/DSP devices, abstracting these details from the user.
+TIDL APIs provide three intuitive C++ classes. ``Configuration`` encapsulates a network configuration, including pointers to the network and parameter binary files. ``Executor`` encapsulates on-device memory allocation, network setup and initialization. ``ExecutionObject`` encapsulates TIDL processing on a single DSP or EVE core. Implementation of these classes will call into OpenCL runtime to offload network processing onto EVE/DSP devices, abstracting these details from the user.
-:numref:`simple-example` illustrates how easy it is to use TIDL APIs to leverage deep learning application in user applications. In this example, a configuration object is created from reading a TIDL network config file. An executor object is created with two DLA devices. It uses the configuration object to setup and initialize TIDL network on DLAs. Each of the two execution objects dispatches TIDL processing to a different DLA core. Because the OpenCL kernel execution is asynchronous, we can pipeline the frames across two DLAs. When one frame is being processed by a DLA, the next frame can be processed by another DLA.
+:numref:`simple-example` illustrates how easy it is to use TIDL APIs to add deep learning to user applications. In this example, a configuration object is created by reading a TIDL network config file. An executor object is created with two EVE devices. It uses the configuration object to set up and initialize the TIDL network on the EVEs. Each of the two execution objects dispatches TIDL processing to a different EVE core. Because OpenCL kernel execution is asynchronous, we can pipeline frames across the two EVEs: while one frame is being processed by one EVE, the next frame can be processed by another EVE.
.. code-block:: c++
Configuration configuration;
bool status = configuration.ReadFromFile("./tidl_j11v2_net");
- // Create an executor with 2 DLAs and configuration
+ // Create an executor with 2 EVEs and configuration
DeviceIds ids = {DeviceId::ID0, DeviceId::ID1};
- Executor executor(DeviceType::DLA, ids, configuration);
+ Executor executor(DeviceType::EVE, ids, configuration);
// Query Executor for set of ExecutionObjects created
const ExecutionObjects& eos = executor.GetExecutionObjects();
- int num_eos = eos.size(); // 2 DLAs
+ int num_eos = eos.size(); // 2 EVEs
// Allocate input and output buffers for each execution object
for (auto &eo : eos)
eo->SetInputOutputBuffer(in, out);
}
- // Pipelined processing with 2 DLA cores
+ // Pipelined processing with 2 EVE cores
for (int idx = 0; idx < configuration.numFrames + num_eos; idx++)
{
ExecutionObject* eo = eos[idx % num_eos].get();
}
-``ReadFrameInput`` and ``WriteFrameOutput`` functions are used to read an input frame and write the result of processing. For example, with OpenCV, ``ReadFrameInput`` is implemented using OpenCV APIs to capture a frame. To execute the same network on DSPs, the only change to :numref:`simple-example` is to replace ``DeviceType::DLA`` with ``DeviceType::DSP``.
+``ReadFrameInput`` and ``WriteFrameOutput`` functions are used to read an input frame and write the result of processing. For example, with OpenCV, ``ReadFrameInput`` is implemented using OpenCV APIs to capture a frame. To execute the same network on DSPs, the only change to :numref:`simple-example` is to replace ``DeviceType::EVE`` with ``DeviceType::DSP``.
Section :ref:`using-tidl-api` contains details on using the APIs. The APIs themselves are documented in section :ref:`api-documentation`.
-Sometimes it is beneficial to partition a network and run different parts on different cores because some types of layers could run faster on DLAs while other types could run faster on DSPs. TIDL APIs provide the flexibility to run partitioned network across DLAs and DSPs. Refer the :ref:`ssd-example` example for details.
+Sometimes it is beneficial to partition a network and run different parts on different cores because some types of layers could run faster on EVEs while other types could run faster on DSPs. TIDL APIs provide the flexibility to run a partitioned network across EVEs and DSPs. Refer to the :ref:`ssd-example` example for details.
.. _Processor SDK Linux Software Developer's Guide: http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/index.html
.. _OpenCV: http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/Foundational_Components.html#opencv
index f7c4eaedaf8960b6bc6c111d46df3bda60ab4887..02c24ff82438928a9f6528901f2697660ba68465 100644 (file)
Using the TIDL API
******************
-This example illustrates using the TIDL API to offload deep learning network processing from a Linux application to the C66x DSPs or DLAs on AM57x devices. The API consists of three classes: ``Configuration``, ``Executor`` and ``ExecutionObject``.
+This example illustrates using the TIDL API to offload deep learning network processing from a Linux application to the C66x DSPs or EVEs on AM57x devices. The API consists of three classes: ``Configuration``, ``Executor`` and ``ExecutionObject``.
Step 1
======
.. code-block:: c++
- uint32_t num_dla = Executor::GetNumDevices(DeviceType::DLA);
+ uint32_t num_eve = Executor::GetNumDevices(DeviceType::EVE);
uint32_t num_dsp = Executor::GetNumDevices(DeviceType::DSP);
Step 2
@@ -30,12 +30,12 @@ Create a Configuration object by reading it from a file or by initializing it di
Step 3
======
-Create an Executor with the appropriate device type, set of devices and a configuration. In the snippet below, an Executor is created on 2 DLAs.
+Create an Executor with the appropriate device type, set of devices and a configuration. In the snippet below, an Executor is created on 2 EVEs.
.. code-block:: c++
DeviceIds ids = {DeviceId::ID0, DeviceId::ID1};
- Executor executor(DeviceType::DLA, ids, configuration);
+ Executor executor(DeviceType::EVE, ids, configuration);
Step 4
======
index 8fe89127e7b9f2e336821e82201308f075a001be..57c4cb4578814848ada4f75b7d3914843e3cb625 100644 (file)
signal(SIGTERM, exit);
// If there are no devices capable of offloading TIDL on the SoC, exit
- uint32_t num_dla = Executor::GetNumDevices(DeviceType::DLA);
+ uint32_t num_eve = Executor::GetNumDevices(DeviceType::EVE);
uint32_t num_dsp = Executor::GetNumDevices(DeviceType::DSP);
- if (num_dla == 0 && num_dsp == 0)
+ if (num_eve == 0 && num_dsp == 0)
{
std::cout << "TI DL not supported on this SoC." << std::endl;
return EXIT_SUCCESS;
std::string config = "j11_v2";
std::string input_file = "../test/testvecs/input/objects/cat-pet-animal-domestic-104827.jpeg";
int num_devices = 1;
- DeviceType device_type = (num_dla > 0 ? DeviceType::DLA:DeviceType::DSP);
+ DeviceType device_type = (num_eve > 0 ? DeviceType::EVE:DeviceType::DSP);
ProcessArgs(argc, argv, config, num_devices, device_type, input_file);
std::cout << "Input: " << input_file << std::endl;
break;
case 't': if (*optarg == 'e')
- device_type = DeviceType::DLA;
+ device_type = DeviceType::EVE;
else if (*optarg == 'd')
device_type = DeviceType::DSP;
else
"Optional arguments:\n"
" -c <config> Valid configs: j11_bn, j11_prelu, j11_v2\n"
" -n <number of cores> Number of cores to use (1 - 4)\n"
- " -t <d|e> Type of core. d -> DSP, e -> DLA\n"
+ " -t <d|e> Type of core. d -> DSP, e -> EVE\n"
" -i <image> Path to the image file\n"
" -i camera Use camera as input\n"
" -v Verbose output during execution\n"
index ec183286182124ee94991f84a5d4592ed608ff6d..86f81e55cd8c21f464ef19320ff86ef83f5bf073 100644 (file)
signal(SIGTERM, exit);
// If there are no devices capable of offloading TIDL on the SoC, exit
- uint32_t num_dla = Executor::GetNumDevices(DeviceType::DLA);
+ uint32_t num_eve = Executor::GetNumDevices(DeviceType::EVE);
uint32_t num_dsp = Executor::GetNumDevices(DeviceType::DSP);
- if (num_dla == 0 && num_dsp == 0)
+ if (num_eve == 0 && num_dsp == 0)
{
std::cout << "TI DL not supported on this SoC." << std::endl;
return EXIT_SUCCESS;
std::string config = DEFAULT_CONFIG;
std::string input_file = DEFAULT_INPUT;
int num_devices = 1;
- DeviceType device_type = (num_dla > 0 ? DeviceType::DLA:DeviceType::DSP);
+ DeviceType device_type = (num_eve > 0 ? DeviceType::EVE:DeviceType::DSP);
ProcessArgs(argc, argv, config, num_devices, device_type, input_file);
if ((object_class_table = GetObjectClassTable(config)) == nullptr)
break;
case 't': if (*optarg == 'e')
- device_type = DeviceType::DLA;
+ device_type = DeviceType::EVE;
else if (*optarg == 'd')
device_type = DeviceType::DSP;
else
"Optional arguments:\n"
" -c <config> Valid configs: jseg21_tiscapes, jseg21\n"
" -n <number of cores> Number of cores to use (1 - 4)\n"
- " -t <d|e> Type of core. d -> DSP, e -> DLA\n"
+ " -t <d|e> Type of core. d -> DSP, e -> EVE\n"
" -i <image> Path to the image file\n"
" Default are 3 frames in testvecs\n"
" -i camera Use camera as input\n"
index c3d9c8e7036de5bc41935d3101509cc0422174b7..6d39dda1561658b70a54964fc7efdf8a4505a4cb 100644 (file)
signal(SIGTERM, exit);
// If there are no devices capable of offloading TIDL on the SoC, exit
- uint32_t num_dla = Executor::GetNumDevices(DeviceType::DLA);
+ uint32_t num_eve = Executor::GetNumDevices(DeviceType::EVE);
uint32_t num_dsp = Executor::GetNumDevices(DeviceType::DSP);
- if (num_dla == 0 || num_dsp == 0)
+ if (num_eve == 0 || num_dsp == 0)
{
- std::cout << "ssd_multibox requires both DLA and DSP for execution."
+ std::cout << "ssd_multibox requires both EVE and DSP for execution."
<< std::endl;
return EXIT_SUCCESS;
}
std::string config = DEFAULT_CONFIG;
std::string input_file = DEFAULT_INPUT;
uint32_t num_devices = 1;
- DeviceType device_type = DeviceType::DLA;
+ DeviceType device_type = DeviceType::EVE;
ProcessArgs(argc, argv, config, num_devices, device_type, input_file);
- // Use same number of DLAs and DSPs
- num_devices = std::min(num_devices, std::min(num_dla, num_dsp));
+ // Use same number of EVEs and DSPs
+ num_devices = std::min(num_devices, std::min(num_eve, num_dsp));
if (num_devices == 0)
{
- std::cout << "Partitioned execution requires at least 1 DLA and 1 DSP."
+ std::cout << "Partitioned execution requires at least 1 EVE and 1 DSP."
<< std::endl;
return EXIT_FAILURE;
}
{
// Create an executor with the appropriate core type, number of cores
// and configuration specified
- // DLA will run layersGroupId 1 in the network, while
+ // EVE will run layersGroupId 1 in the network, while
// DSP will run layersGroupId 2 in the network
- Executor executor_dla(DeviceType::DLA, ids, configuration, 1);
+ Executor executor_eve(DeviceType::EVE, ids, configuration, 1);
Executor executor_dsp(DeviceType::DSP, ids, configuration, 2);
// Query Executor for set of ExecutionObjects created
- const ExecutionObjects& execution_objects_dla =
- executor_dla.GetExecutionObjects();
+ const ExecutionObjects& execution_objects_eve =
+ executor_eve.GetExecutionObjects();
const ExecutionObjects& execution_objects_dsp =
executor_dsp.GetExecutionObjects();
- int num_eos = execution_objects_dla.size();
+ int num_eos = execution_objects_eve.size();
// Allocate input and output buffers for each execution object
- // Note that "out" is both the output of eo_dla and the input of eo_dsp
+ // Note that "out" is both the output of eo_eve and the input of eo_dsp
// This is how two layersGroupIds, 1 and 2, are tied together
std::vector<void *> buffers;
for (int i = 0; i < num_eos; i++)
{
- ExecutionObject *eo_dla = execution_objects_dla[i].get();
- size_t in_size = eo_dla->GetInputBufferSizeInBytes();
- size_t out_size = eo_dla->GetOutputBufferSizeInBytes();
+ ExecutionObject *eo_eve = execution_objects_eve[i].get();
+ size_t in_size = eo_eve->GetInputBufferSizeInBytes();
+ size_t out_size = eo_eve->GetOutputBufferSizeInBytes();
ArgInfo in = { ArgInfo(malloc(in_size), in_size) };
ArgInfo out = { ArgInfo(malloc(out_size), out_size) };
- eo_dla->SetInputOutputBuffer(in, out);
+ eo_eve->SetInputOutputBuffer(in, out);
ExecutionObject *eo_dsp = execution_objects_dsp[i].get();
size_t out2_size = eo_dsp->GetOutputBufferSizeInBytes();
// Process frames with available execution objects in a pipelined manner
// additional num_eos iterations to flush the pipeline (epilogue)
- ExecutionObject *eo_dla, *eo_dsp, *eo_input;
+ ExecutionObject *eo_eve, *eo_dsp, *eo_input;
for (int frame_idx = 0;
frame_idx < num_frames + num_eos; frame_idx++)
{
- eo_dla = execution_objects_dla[frame_idx % num_eos].get();
+ eo_eve = execution_objects_eve[frame_idx % num_eos].get();
eo_dsp = execution_objects_dsp[frame_idx % num_eos].get();
// Wait for previous frame on the same eo to finish processing
ms_diff(t0[finished_idx % num_eos], t1),
eo_dsp->GetProcessTimeInMilliSeconds());
- eo_input = execution_objects_dla[finished_idx % num_eos].get();
+ eo_input = execution_objects_eve[finished_idx % num_eos].get();
WriteFrameOutput(*eo_input, *eo_dsp, configuration);
}
// Read a frame and start processing it with current eo
- if (ReadFrame(*eo_dla, frame_idx, configuration, num_frames,
+ if (ReadFrame(*eo_eve, frame_idx, configuration, num_frames,
image_file, cap))
{
clock_gettime(CLOCK_MONOTONIC, &t0[frame_idx % num_eos]);
- eo_dla->ProcessFrameStartAsync();
+ eo_eve->ProcessFrameStartAsync();
- if (eo_dla->ProcessFrameWait())
+ if (eo_eve->ProcessFrameWait())
{
clock_gettime(CLOCK_MONOTONIC, &t1);
- ReportTime(frame_idx, "DLA",
+ ReportTime(frame_idx, "EVE",
ms_diff(t0[frame_idx % num_eos], t1),
- eo_dla->GetProcessTimeInMilliSeconds());
+ eo_eve->GetProcessTimeInMilliSeconds());
clock_gettime(CLOCK_MONOTONIC, &t0[frame_idx % num_eos]);
eo_dsp->ProcessFrameStartAsync();
" Will run partitioned ssd_multibox network to perform "
"multi-objects detection\n"
" and classification. First part of network "
- "(layersGroupId 1) runs on DLA,\n"
+ "(layersGroupId 1) runs on EVE,\n"
" second part (layersGroupId 2) runs on DSP.\n"
" Use -c to run a different segmentation network. "
"Default is jdetnet.\n"
diff --git a/examples/test/main.cpp b/examples/test/main.cpp
index 645bb6fd07bdaa6f8f6e17318ba0fc55106bfd2a..968b0fed4feab762a4f219374902df8020719ab7 100644 (file)
--- a/examples/test/main.cpp
+++ b/examples/test/main.cpp
signal(SIGTERM, exit);
// If there are no devices capable of offloading TIDL on the SoC, exit
- uint32_t num_dla = Executor::GetNumDevices(DeviceType::DLA);
+ uint32_t num_eve = Executor::GetNumDevices(DeviceType::EVE);
uint32_t num_dsp = Executor::GetNumDevices(DeviceType::DSP);
- if (num_dla == 0 && num_dsp == 0)
+ if (num_eve == 0 && num_dsp == 0)
{
std::cout << "TI DL not supported on this SoC." << std::endl;
return EXIT_SUCCESS;
// Process arguments
std::string config_file;
int num_devices = 1;
- DeviceType device_type = DeviceType::DLA;
+ DeviceType device_type = DeviceType::EVE;
ProcessArgs(argc, argv, config_file, num_devices, device_type);
bool status = true;
status = RunConfiguration(config_file, num_devices, device_type);
else
{
- if (num_dla > 0)
+ if (num_eve > 0)
{
//TODO: Use memory availability to determine # devices
// Run on 2 devices because there is not enough CMEM available by
// default
- if (num_dla = 4) num_dla = 2;
- status = RunAllConfigurations(num_dla, DeviceType::DLA);
+ if (num_eve == 4) num_eve = 2;
+ status = RunAllConfigurations(num_eve, DeviceType::EVE);
status &= RunMultipleExecutors(
"testvecs/config/infer/tidl_config_j11_v2.txt",
"testvecs/config/infer/tidl_config_j11_cifar.txt",
- num_dla);
+ num_eve);
}
if (num_dsp > 0)
{
std::vector<std::string> configurations;
- if (device_type == DeviceType::DLA)
+ if (device_type == DeviceType::EVE)
configurations = {"dense_1x1", "j11_bn", "j11_cifar",
"j11_controlLayers", "j11_prelu", "j11_v2",
"jseg21", "jseg21_tiscapes", "smallRoi", "squeeze1_1"};
+ config + ".txt";
std::cout << "Running " << config << " on " << num_devices
<< " devices, type "
- << ((device_type == DeviceType::DLA) ? "EVE" : "DSP")
+ << ((device_type == DeviceType::EVE) ? "EVE" : "DSP")
<< std::endl;
Configuration configuration;
break;
case 't': if (*optarg == 'e')
- device_type = DeviceType::DLA;
+ device_type = DeviceType::EVE;
else if (*optarg == 'd')
device_type = DeviceType::DSP;
else
"Optional arguments:\n"
" -c Path to the configuration file\n"
" -n <number of cores> Number of cores to use (1 - 4)\n"
- " -t <d|e> Type of core. d -> DSP, e -> DLA\n"
+ " -t <d|e> Type of core. d -> DSP, e -> EVE\n"
" -v Verbose output during execution\n"
" -h Help\n";
}
index 173ee23caee32b64ac4c26382200658d10e6ac84..78a1789c74332e4c3e4f7eaad8a652f1482d5ca5 100644 (file)
{
// Create an executor with the appropriate core type, number of cores
// and configuration specified
- Executor executor(DeviceType::DLA, ids, configuration);
+ Executor executor(DeviceType::EVE, ids, configuration);
const ExecutionObjects& execution_objects =
executor.GetExecutionObjects();
diff --git a/makefile b/makefile
index 72c0a9f108e5f96000540e40a47758f049531a9e..341d4444ff72fa3e6933601043bbf243fa66cb4c 100644 (file)
--- a/makefile
+++ b/makefile
# THE POSSIBILITY OF SUCH DAMAGE.
-# makefile for TI internal use
+# makefile for building from the tidl-api git repo
+# Requires TARGET_ROOTDIR to be set.
ifneq (,$(findstring 86, $(shell uname -m)))
DEST_DIR ?= $(CURDIR)/install/am57
build-examples: install-api
$(MAKE) -C examples
+# Build HTML from Sphinx RST, requires Sphinx to be installed
+build-docs:
+ $(MAKE) -C docs
+
install-api: build-api
mkdir -p $(INSTALL_DIR_API)
cp $(CP_ARGS) tidl_api $(INSTALL_DIR_API)/
index 9c6238fb139c30a13f27613162b679708ef7bb7f..2b20eaf9b3c5a5acd0238f76e21c776c1b193556 100644 (file)
--- a/tidl_api/inc/executor.h
+++ b/tidl_api/inc/executor.h
//! Enumerates types of devices available to offload the network.
enum class DeviceType { DSP, /**< Offload to C66x DSP */
- DLA /**< Offload to TI DLA */
+ EVE /**< Offload to TI EVE */
};
//! Enumerates IDs for devices of a given type.
-enum class DeviceId : int { ID0=0, /**< DSP1 or DLA1 */
- ID1, /**< DSP2 or DLA2 */
- ID2, /**< DLA3 */
- ID3 /**< DLA4 */
+enum class DeviceId : int { ID0=0, /**< DSP1 or EVE1 */
+ ID1, /**< DSP2 or EVE2 */
+ ID2, /**< EVE3 */
+ ID3 /**< EVE4 */
};
//! Used to specify the set of devices available to an Executor
//! Configuration configuration;
//! configuration.ReadFromFile("path to configuration file");
//! DeviceIds ids1 = {DeviceId::ID2, DeviceId::ID3};
- //! Executor executor(DeviceType::DLA, ids, configuration);
+ //! Executor executor(DeviceType::EVE, ids, configuration);
//! @endcode
//!
- //! @param device_type DSP or EVE/DLA device
+ //! @param device_type DSP or EVE device
//! @param ids Set of devices uses by this instance of the Executor
//! @param configuration Configuration used to initialize the Executor
//! @param layers_group_id Layers group that this Executor should run
//! @brief Returns the number of devices of the specified type
//! available for TI DL.
- //! @param device_type DSP or EVE/DLA device
- //! @param device_type DSP or EVE device
//! @return number of devices available
static uint32_t GetNumDevices(DeviceType device_type);
index 0e329cfb2797998f627c5a754cb01fe019ad9488..6283a98406e27d19c1f99a93781ecc7b5b7d06f2 100644 (file)
std::string name;
if (core_type_m == DeviceType::DSP)
name = "";
- else if (core_type_m == DeviceType::DLA)
+ else if (core_type_m == DeviceType::EVE)
name = STRING(SETUP_KERNEL) ";" STRING(INIT_KERNEL) ";" STRING(PROCESS_KERNEL) ";" STRING(CLEANUP_KERNEL);
device_m = Device::Create(core_type_m, ids, name);
index 2e0b0bd2775f40d4e10e6835a12318585acf1633..a3853a59a5e6bf6677e64094713b2237c035333b 100644 (file)
{
TRACE::print("\tOCL Device: %s created\n",
device_type_m == CL_DEVICE_TYPE_ACCELERATOR ? "DSP" :
- device_type_m == CL_DEVICE_TYPE_CUSTOM ? "DLA" : "Unknown");
+ device_type_m == CL_DEVICE_TYPE_CUSTOM ? "EVE" : "Unknown");
for (int i = 0; i < MAX_DEVICES; i++)
queue_m[i] = nullptr;
Device::Ptr p(nullptr);
if (core_type == DeviceType::DSP)
p.reset(new DspDevice(ids, name));
- else if (core_type == DeviceType::DLA)
+ else if (core_type == DeviceType::EVE)
p.reset(new EveDevice(ids, name));
return p;
if (!PlatformIsAM57()) return 0;
// Convert DeviceType to OpenCL device type
- cl_device_type t = (device_type == DeviceType::DLA) ?
+ cl_device_type t = (device_type == DeviceType::EVE) ?
CL_DEVICE_TYPE_CUSTOM :
CL_DEVICE_TYPE_ACCELERATOR;