From: Yuan Zhao <yuanzhao@ti.com>
Date: Tue, 10 Jul 2018 15:27:02 +0000 (-0500)
Subject: Merge tag 'v01.00.00.02' into develop
X-Git-Tag: v01.01.00.00^2~25
X-Git-Url: https://git.ti.com/gitweb?p=tidl%2Ftidl-api.git;a=commitdiff_plain;h=b6ebd3d1b482ad0ba6f1f7e265aaeb2c0f4335ff;hp=1eb38688505bb3033b5e33e4fb214e3980e3aacb

Merge tag 'v01.00.00.02' into develop

For PSDK 18Q2 (5.0) release
---

diff --git a/docs/source/building.rst b/docs/source/building.rst
new file mode 100644
index 0000000..c507550
--- /dev/null
+++ b/docs/source/building.rst
@@ -0,0 +1,5 @@
+*********************
+Building from sources
+*********************
+
+Source for the TIDL API is available at https://git.ti.com/tidl/tidl-api. ``tidl-api/makefile`` contains targets to build the API, viewer and examples. The makefile supports native compilation on the EVM and cross-compilation on x86/Linux.
diff --git a/docs/source/example.rst b/docs/source/example.rst
index ed6564d..7ba9bc7 100644
--- a/docs/source/example.rst
+++ b/docs/source/example.rst
@@ -4,54 +4,65 @@ Examples
 
 We ship three end-to-end examples within the tidl-api package
 to demonstrate three categories of deep learning networks.  The first
-two examples can run on AM57x SoCs with either DLA or DSP.  The last
-example requires AM57x SoCs with both DLA and DSP.  The performance
+two examples can run on AM57x SoCs with either EVE or DSP devices.  The last
+example requires AM57x SoCs with both EVE and DSP.  The performance
 numbers that we present here were obtained on an AM5729 EVM, which
-includes 2 ARM A15 cores running at 1.5GHz, 4 DLA cores at 535MHz, and
+includes 2 ARM A15 cores running at 1.5GHz, 4 EVE cores at 535MHz, and
 2 DSP cores at 750MHz.
 
+For each example, we report device processing time, host processing time,
+and TIDL API overhead.  **Device processing time** is measured on the device,
+from the moment processing starts for a frame until processing finishes.
+**Host processing time** is measured on the host, from the moment
+``ProcessFrameStartAsync()`` is called until ``ProcessFrameWait()`` returns
+in the user application.  It includes the TIDL API overhead, the OpenCL runtime
+overhead, and the time to copy user input data into padded TIDL internal
+buffers.
+
 Imagenet
 --------
 
 The imagenet example takes an image as input and outputs 1000 probabilities.
 Each probability corresponds to one object in the 1000 objects that the
-network is pre-trained with.  Our example outputs top 5 probabilities
+network is pre-trained with.  Our example outputs top 5 predictions
 as the most likely objects that the input image can be.
 
 The following figure and tables shows an input image, top 5 predicted
-objects as output, and the processing time on either DLA or DSP.
+objects as output, and the processing time on either EVE or DSP.
 
 .. image:: ../../examples/test/testvecs/input/objects/cat-pet-animal-domestic-104827.jpeg
    :width: 600
 
 .. table::
 
-    ==== ============== ============
-    Rank Object Classes Probability
-    ==== ============== ============
-    1    tabby          0.996
-    2    Egyptian_cat   0.977
-    3    tiger_cat      0.973
-    4    lynx           0.941
-    5    Persian_cat    0.922
-    ==== ============== ============
+    ==== ==============
+    Rank Object Classes
+    ==== ==============
+    1    tabby
+    2    Egyptian_cat
+    3    tiger_cat
+    4    lynx
+    5    Persian_cat
+    ==== ==============
 
 .. table::
 
    ====================== ==================== ============
    Device Processing Time Host Processing Time API Overhead
    ====================== ==================== ============
-   DLA: 123.1 ms          124.7 ms             1.34 %
+   EVE: 123.1 ms          124.7 ms             1.34 %
    **OR**
    DSP: 117.9 ms          119.3 ms             1.14 %
    ====================== ==================== ============
 
 The particular network that we ran in this category, jacintonet11v2,
-has 14 layers.  User can specify whether to run the network on DLA or DSP
-for acceleration.  We can see that DLA time is slightly higher than DSP time.
-Host time includes the OpenCL runtime overhead and the time to copy user
-input data into padded TIDL buffers.  We can see that the overall overhead
-is less than 1.5%.
+has 14 layers.  User can specify whether to run the network on EVE or DSP
+for acceleration.  We can see that EVE time is slightly higher than DSP time.
+We can also see that the overall overhead is less than 1.5%.
+
+.. note::
+    The predictions reported here are based on the output of the softmax
+    layer in the network, which are not normalized to the real probabilities.
 
 Segmentation
 ------------
@@ -70,14 +81,14 @@ in blue and background in gray.
 
 The network we ran in this category is jsegnet21v2, which has 26 layers.
 From the reported time in the following table, we can see that this network
-runs significantly faster on DLA than on DSP.
+runs significantly faster on EVE than on DSP.
 
 .. table::
 
    ====================== ==================== ============
    Device Processing Time Host Processing Time API Overhead
    ====================== ==================== ============
-   DLA: 296.5 ms          303.3 ms             2.26 %
+   EVE: 296.5 ms          303.3 ms             2.26 %
    **OR**
    DSP: 812.0 ms          818.4 ms             0.79 %
    ====================== ==================== ============
@@ -100,8 +111,8 @@ vehicles in blue and road signs in yellow.
 .. image:: images/pexels-photo-378570-ssd.jpg
    :width: 600
 
-The network can be run entirely on either DLA or DSP.  But the best
-performance comes with running the first 30 layers on DLA and the
+The network can be run entirely on either EVE or DSP.  But the best
+performance comes with running the first 30 layers on EVE and the
 next 13 layers on DSP, for this particular jdetnet_ssd network.
 Note the **AND** in the following table for the reported time.
 Our end-to-end example shows how easy it is to assign a layers group id
@@ -113,15 +124,19 @@ to an *Executor* and how easy it is to connect the output from one
    ====================== ==================== ============
    Device Processing Time Host Processing Time API Overhead
    ====================== ==================== ============
-   DLA: 175.2 ms          179.1 ms             2.14 %
+   EVE: 175.2 ms          179.1 ms             2.14 %
    **AND**
    DSP:  21.1 ms           22.3 ms             5.62 %
    ====================== ==================== ============
 
+Test
+----
+This example tests the pre-converted networks included in the TIDL API package (``test/testvecs/config/tidl_models``). When run without any arguments, the program ``test_tidl`` runs every available network on the C66x DSPs and EVEs present on the SoC. Use the ``-c`` option to specify a single network. Run ``test_tidl -h`` for details.
+
 Running Examples
 ----------------
 
-The examples are located in ``/usr/share/ti/tidl-api/examples`` on
+The examples are located in ``/usr/share/ti/tidl/examples`` on
 the EVM file system.  Each example needs to be run its own directory.
 Running an example with ``-h`` will show help message with option set.
 The following code section shows how to run the examples, and
@@ -151,7 +166,7 @@ the test program that tests all supported TIDL network configs.
    root@am57xx-evm:/usr/share/ti/tidl-api/examples/segmentation# cd ../ssd_multibox/; make -j4
    root@am57xx-evm:/usr/share/ti/tidl-api/examples/ssd_multibox# ./ssd_multibox -i ../test/testvecs/input/roads/pexels-photo-378570.jpeg
    Input: ../test/testvecs/input/roads/pexels-photo-378570.jpeg
-   frame[0]: Time on DLA:  175.2ms, host:    179ms API overhead:    2.1 %
+   frame[0]: Time on EVE:  175.2ms, host:    179ms API overhead:    2.1 %
    frame[0]: Time on DSP:  21.06ms, host:  22.43ms API overhead:   6.08 %
    Saving frame 0 with SSD multiboxes to: multibox_0.png
    Loop total time (including read/write/print/etc):  423.8ms
diff --git a/docs/source/images/tidl-api.png b/docs/source/images/tidl-api.png
index 26a8a7c..59c52f2 100755
Binary files a/docs/source/images/tidl-api.png and b/docs/source/images/tidl-api.png differ
diff --git a/docs/source/index.rst b/docs/source/index.rst
index a094db5..ec32a6b 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -12,6 +12,7 @@ TI Deep Learning API User's Guide
    viewer
    example
    api
+   building
    faq/index
    notice
    disclaimer
diff --git a/docs/source/intro.rst b/docs/source/intro.rst
index 92124aa..407a39e 100644
--- a/docs/source/intro.rst
+++ b/docs/source/intro.rst
@@ -2,7 +2,7 @@
 Introduction
 ************
 
-TI Deep Learning (TIDL) API brings deep learning to the edge by enabling applications to leverage TI's proprietary, highly optimized CNN/DNN implementation on Deep Learning Accelerator (DLA) and C66x DSP compute engines. TIDL will initially target Vision/2D use cases on AM57x SoCs.
+TI Deep Learning (TIDL) API brings deep learning to the edge by enabling applications to leverage TI's proprietary, highly optimized CNN/DNN implementation on the EVE and C66x DSP compute engines. TIDL will initially target Vision/2D use cases on AM57x SoCs.
 
 This User's Guide covers the TIDL API. For information on TIDL such as the overall development flow, techniques to optimize performance of CNN/DNN on TI's SoCs,performance/benchmarking data and list of supported layers, see the TIDL section in the `Processor SDK Linux Software Developer's Guide`_.
 
@@ -14,7 +14,7 @@ Key Features
 Ease of use
 +++++++++++
 * Easily integrate TIDL APIs into other frameworks such as `OpenCV`_
-* Provides a common host abstraction for user applications across multiple compute engines (DLAs and C66x DSPs)
+* Provides a common host abstraction for user applications across multiple compute engines (EVEs and C66x DSPs)
 
 Low overhead
 +++++++++++++
@@ -22,7 +22,7 @@ The execution time of TIDL APIs on the host is a fairly small percentage of the
 
 Software Architecture
 ---------------------
-The TIDL API leverages TI's `OpenCL`_ product to offload deep learning applications to both DLA(s) and DSP(s).  The TIDL API significantly improves the out-of-box deep learning experience for users and enables them to focus on their overall use case. They do not have to spend time on the mechanics of ARM ↔ DSP/DLA communication or implementing optimized network layers on DLA(s) and/or DSP(s).  The API allows customers to easily integrate frameworks such as OpenCV and rapidly prototype deep learning applications.
+The TIDL API leverages TI's `OpenCL`_ product to offload deep learning applications to both EVE(s) and DSP(s).  The TIDL API significantly improves the out-of-box deep learning experience for users and enables them to focus on their overall use case. They do not have to spend time on the mechanics of ARM ↔ DSP/EVE communication or implementing optimized network layers on EVE(s) and/or DSP(s).  The API allows customers to easily integrate frameworks such as OpenCV and rapidly prototype deep learning applications.
 
 .. _`TIDL Development flow`:
 
@@ -44,9 +44,9 @@ The TIDL API leverages TI's `OpenCL`_ product to offload deep learning applicati
 
     TIDL API Software Architecture
 
-TIDL APIs provide three intuitive C++ classes.  ``Configuration`` encapsulates a network configuration, including pointers to the network and parameter binary files.  ``Executor`` encapsulates on-device memory allocation, network setup and initialization.  ``ExecutionObject`` encapsulates TIDL processing on a single DSP or DLA core.  Implementation of these classes will call into OpenCL runtime to offload network processing onto DLA/DSP devices, abstracting these details from the user.
+TIDL APIs provide three intuitive C++ classes.  ``Configuration`` encapsulates a network configuration, including pointers to the network and parameter binary files.  ``Executor`` encapsulates on-device memory allocation, network setup and initialization.  ``ExecutionObject`` encapsulates TIDL processing on a single DSP or EVE core.  Implementation of these classes will call into OpenCL runtime to offload network processing onto EVE/DSP devices, abstracting these details from the user.
 
-:numref:`simple-example` illustrates how easy it is to use TIDL APIs to leverage deep learning application in user applications.  In this example, a configuration object is created from reading a TIDL network config file.  An executor object is created with two DLA devices.  It uses the configuration object to setup and initialize TIDL network on DLAs.  Each of the two execution objects dispatches TIDL processing to a different DLA core.  Because the OpenCL kernel execution is asynchronous, we can pipeline the frames across two DLAs.  When one frame is being processed by a DLA, the next frame can be processed by another DLA.
+:numref:`simple-example` illustrates how easy it is to use TIDL APIs to leverage deep learning in user applications.  In this example, a configuration object is created by reading a TIDL network config file.  An executor object is created with two EVE devices.  It uses the configuration object to set up and initialize the TIDL network on the EVEs.  Each of the two execution objects dispatches TIDL processing to a different EVE core.  Because OpenCL kernel execution is asynchronous, we can pipeline the frames across the two EVEs.  While one frame is being processed by one EVE, the next frame can be processed by the other EVE.
 
 
 .. code-block:: c++
@@ -57,13 +57,13 @@ TIDL APIs provide three intuitive C++ classes.  ``Configuration`` encapsulates a
     Configuration configuration;
     bool status = configuration.ReadFromFile("./tidl_j11v2_net");
 
-    // Create an executor with 2 DLAs and configuration
+    // Create an executor with 2 EVEs and configuration
     DeviceIds ids = {DeviceId::ID0, DeviceId::ID1};
-    Executor executor(DeviceType::DLA, ids, configuration);
+    Executor executor(DeviceType::EVE, ids, configuration);
 
     // Query Executor for set of ExecutionObjects created
     const ExecutionObjects& eos = executor.GetExecutionObjects();
-    int num_eos = eos.size();  // 2 DLAs
+    int num_eos = eos.size();  // 2 EVEs
 
     // Allocate input and output buffers for each execution object
     for (auto &eo : eos)
@@ -73,7 +73,7 @@ TIDL APIs provide three intuitive C++ classes.  ``Configuration`` encapsulates a
          eo->SetInputOutputBuffer(in, out);
     }
 
-    // Pipelined processing with 2 DLA cores
+    // Pipelined processing with 2 EVE cores
     for (int idx = 0; idx < configuration.numFrames + num_eos; idx++)
     {
         ExecutionObject* eo = eos[idx % num_eos].get();
@@ -86,11 +86,11 @@ TIDL APIs provide three intuitive C++ classes.  ``Configuration`` encapsulates a
     }
 
 
-``ReadFrameInput`` and ``WriteFrameOutput`` functions are used to read an input frame and write the result of processing. For example, with OpenCV, ``ReadFrameInput`` is implemented using OpenCV APIs to capture a frame. To execute the same network on DSPs, the only change to :numref:`simple-example` is to replace ``DeviceType::DLA`` with ``DeviceType::DSP``.
+``ReadFrameInput`` and ``WriteFrameOutput`` functions are used to read an input frame and write the result of processing. For example, with OpenCV, ``ReadFrameInput`` is implemented using OpenCV APIs to capture a frame. To execute the same network on DSPs, the only change to :numref:`simple-example` is to replace ``DeviceType::EVE`` with ``DeviceType::DSP``.
 
 Section :ref:`using-tidl-api` contains details on using the APIs. The APIs themselves are documented in section :ref:`api-documentation`.
 
-Sometimes it is beneficial to partition a network and run different parts on different cores because some types of layers could run faster on DLAs while other types could run faster on DSPs.  TIDL APIs provide the flexibility to run partitioned network across DLAs and DSPs. Refer the :ref:`ssd-example` example for details.
+Sometimes it is beneficial to partition a network and run different parts on different cores because some types of layers could run faster on EVEs while other types could run faster on DSPs.  TIDL APIs provide the flexibility to run a partitioned network across EVEs and DSPs. Refer to the :ref:`ssd-example` example for details.
 
 .. _Processor SDK Linux Software Developer's Guide: http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/index.html
 .. _OpenCV: http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/Foundational_Components.html#opencv
diff --git a/docs/source/using_api.rst b/docs/source/using_api.rst
index f7c4eae..b2c909e 100644
--- a/docs/source/using_api.rst
+++ b/docs/source/using_api.rst
@@ -4,7 +4,7 @@
 Using the TIDL API
 ******************
 
-This example illustrates using the TIDL API to offload deep learning network processing from a Linux application to the C66x DSPs or DLAs on AM57x devices. The API consists of three classes: ``Configuration``, ``Executor`` and ``ExecutionObject``.
+This example illustrates using the TIDL API to offload deep learning network processing from a Linux application to the C66x DSPs or EVEs on AM57x devices. The API consists of three classes: ``Configuration``, ``Executor`` and ``ExecutionObject``.
 
 Step 1
 ======
@@ -13,9 +13,16 @@ Determine if there are any TIDL capable devices on the AM57x SoC:
 
 .. code-block:: c++
 
-    uint32_t num_dla = Executor::GetNumDevices(DeviceType::DLA);
+    uint32_t num_eve = Executor::GetNumDevices(DeviceType::EVE);
     uint32_t num_dsp = Executor::GetNumDevices(DeviceType::DSP);
 
+.. note::
+    By default, the OpenCL runtime is configured with sufficient global memory 
+    (via CMEM) to offload TIDL networks to 2 OpenCL devices. On devices where
+    ``Executor::GetNumDevices`` returns 4 (e.g. AM5729 with 4 EVE OpenCL
+    devices), the amount of memory available to the runtime must be increased.
+    Refer to :ref:`opencl-global-memory` for details.
+
 Step 2
 ======
 Create a Configuration object by reading it from a file or by initializing it directly. The example below parses a configuration file and initializes the Configuration object. See ``examples/test/testvecs/config/infer`` for examples of configuration files.
@@ -30,12 +37,12 @@ Create a Configuration object by reading it from a file or by initializing it di
 
 Step 3
 ======
-Create an Executor with the appropriate device type, set of devices and a configuration. In the snippet below, an Executor is created on 2 DLAs.
+Create an Executor with the appropriate device type, set of devices and a configuration. In the snippet below, an Executor is created on 2 EVEs.
 
 .. code-block:: c++
 
         DeviceIds ids = {DeviceId::ID0, DeviceId::ID1};
-        Executor executor(DeviceType::DLA, ids, configuration);
+        Executor executor(DeviceType::EVE, ids, configuration);
 
 Step 4
 ======
@@ -77,6 +84,6 @@ Run the network on each input frame.  The frames are processed with available ex
                 eo->ProcessFrameStartAsync();
         }
 
-For a complete example of using the API, refer any of the examples available at ``/usr/share/ti/tidl-api/examples`` on the EVM file system.
+For a complete example of using the API, refer to any of the examples available at ``/usr/share/ti/tidl/examples`` on the EVM file system.
 
 .. _Processor SDK Linux Software Developer's Guide: http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/index.html
diff --git a/examples/Makefile b/examples/Makefile
index 512e2ed..69c7086 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -42,6 +42,3 @@ all:
 .PHONY: clean
 clean:
 	$(call make_in_dirs, $(DIRS), clean)
-
-realclean: clean
-	make -C ../tidl_api clean
diff --git a/examples/classification/Makefile b/examples/classification/Makefile
new file mode 100644
index 0000000..507ee00
--- /dev/null
+++ b/examples/classification/Makefile
@@ -0,0 +1,41 @@
+# Copyright (c) 2018 Texas Instruments Incorporated - http://www.ti.com/
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+# * Neither the name of Texas Instruments Incorporated nor the
+# names of its contributors may be used to endorse or promote products
+# derived from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+# THE POSSIBILITY OF SUCH DAMAGE.
+
+EXE = tidl_classification
+
+include ../make.common
+
+LIBS     += -lopencv_highgui -lopencv_imgcodecs -lopencv_videoio\
+			-lopencv_imgproc -lopencv_core
+
+SOURCES = main.cpp multiple_executors.cpp findclasses.cpp
+
+$(EXE): $(TIDL_API_LIB) $(HEADERS) $(SOURCES)
+	$(CXX) $(CXXFLAGS) $(SOURCES) $(TIDL_API_LIB) $(LDFLAGS) $(LIBS) -o $@
+
+clean::
+	$(RM) -f $(EXE)
+
diff --git a/examples/classification/classlist.txt b/examples/classification/classlist.txt
new file mode 100644
index 0000000..a21f165
--- /dev/null
+++ b/examples/classification/classlist.txt
@@ -0,0 +1,13 @@
+coffee_mug
+coffeepot
+tennis_ball
+baseball
+sunglass
+sunglasses
+water_bottle
+pill_bottle
+beer_glass
+fountain_pen
+laptop
+envelope
+notebook
diff --git a/examples/classification/clips/test1.mp4 b/examples/classification/clips/test1.mp4
new file mode 100644
index 0000000..95d3ce7
Binary files /dev/null and b/examples/classification/clips/test1.mp4 differ
diff --git a/examples/classification/clips/test1a.mp4 b/examples/classification/clips/test1a.mp4
new file mode 100644
index 0000000..cb47958
Binary files /dev/null and b/examples/classification/clips/test1a.mp4 differ
diff --git a/examples/classification/clips/test2.mp4 b/examples/classification/clips/test2.mp4
new file mode 100644
index 0000000..6952fab
Binary files /dev/null and b/examples/classification/clips/test2.mp4 differ
diff --git a/examples/classification/findclasses.cpp b/examples/classification/findclasses.cpp
new file mode 100644
index 0000000..2793295
--- /dev/null
+++ b/examples/classification/findclasses.cpp
@@ -0,0 +1,85 @@
+
+#include <signal.h>
+#include <getopt.h>
+#include <iostream>
+#include <iomanip>
+#include <fstream>
+#include <cassert>
+#include <string>
+
+#define MAX_CLASSES 1000
+#define MAX_SELECTED_ITEMS 100
+
+using namespace std;
+
+std::string labels_classes[MAX_CLASSES];
+int IMAGE_CLASSES_NUM = 0;
+int selected_items_size = 0;
+int selected_items[MAX_SELECTED_ITEMS];
+static int get_classindex(std::string str2find)
+{
+  if(selected_items_size >= MAX_SELECTED_ITEMS)
+  {
+     std::cout << "Max number of selected classes is reached! (" << selected_items_size << ")!" << std::endl;
+     return -1;
+  }
+  for (int i = 0; i < IMAGE_CLASSES_NUM; i ++)
+  {
+    if(labels_classes[i].compare(str2find) == 0)
+    {
+      selected_items[selected_items_size ++] = i;
+      return i;
+    }
+  }
+  std::cout << "Not found: " << str2find << std::endl << std::flush;
+  return -1;
+}
+
+int populate_selected_items (char *filename)
+{
+  ifstream file(filename);
+  if(file.is_open())
+  {
+    string inputLine;
+
+    while (getline(file, inputLine) )                 //while the end of file is NOT reached
+    {
+      std::cout << "Searching for " << inputLine  << std::endl;
+      int res = get_classindex(inputLine);
+      if(res >= 0) {
+        std::cout << "Found: " << res << std::endl;
+      } else {
+        std::cout << "Not Found: " << res << std::endl;
+      }
+    }
+    file.close();
+  }
+#if 0
+  std::cout << "==Total of " << selected_items_size << " items!" << std::endl;
+  for (int i = 0; i < selected_items_size; i ++)
+    std::cout << i << ") " << selected_items[i] << std::endl;
+#endif
+  return selected_items_size;
+}
+
+void populate_labels (char *filename)
+{
+  ifstream file(filename);
+  if(file.is_open())
+  {
+    string inputLine;
+
+    while (IMAGE_CLASSES_NUM < MAX_CLASSES && getline(file, inputLine))  // stop at end of file or when labels_classes is full
+    {
+      labels_classes[IMAGE_CLASSES_NUM ++] = string(inputLine);
+    }
+    file.close();
+  }
+#if 1
+  std::cout << "==Total of " << IMAGE_CLASSES_NUM << " items!" << std::endl;
+  for (int i = 0; i < IMAGE_CLASSES_NUM; i ++)
+    std::cout << i << ") " << labels_classes[i] << std::endl;
+#endif
+}
+
+
diff --git a/examples/classification/imagenet.txt b/examples/classification/imagenet.txt
new file mode 100644
index 0000000..9f6d42a
--- /dev/null
+++ b/examples/classification/imagenet.txt
@@ -0,0 +1,1000 @@
+tench
+goldfish
+great_white_shark
+tiger_shark
+hammerhead
+electric_ray
+stingray
+cock
+hen
+ostrich
+brambling
+goldfinch
+house_finch
+junco
+indigo_bunting
+robin
+bulbul
+jay
+magpie
+chickadee
+water_ouzel
+kite
+bald_eagle
+vulture
+great_grey_owl
+European_fire_salamander
+common_newt
+eft
+spotted_salamander
+axolotl
+bullfrog
+tree_frog
+tailed_frog
+loggerhead
+leatherback_turtle
+mud_turtle
+terrapin
+box_turtle
+banded_gecko
+common_iguana
+American_chameleon
+whiptail
+agama
+frilled_lizard
+alligator_lizard
+Gila_monster
+green_lizard
+African_chameleon
+Komodo_dragon
+African_crocodile
+American_alligator
+triceratops
+thunder_snake
+ringneck_snake
+hognose_snake
+green_snake
+king_snake
+garter_snake
+water_snake
+vine_snake
+night_snake
+boa_constrictor
+rock_python
+Indian_cobra
+green_mamba
+sea_snake
+horned_viper
+diamondback
+sidewinder
+trilobite
+harvestman
+scorpion
+black_and_gold_garden_spider
+barn_spider
+garden_spider
+black_widow
+tarantula
+wolf_spider
+tick
+centipede
+black_grouse
+ptarmigan
+ruffed_grouse
+prairie_chicken
+peacock
+quail
+partridge
+African_grey
+macaw
+sulphur-crested_cockatoo
+lorikeet
+coucal
+bee_eater
+hornbill
+hummingbird
+jacamar
+toucan
+drake
+red-breasted_merganser
+goose
+black_swan
+tusker
+echidna
+platypus
+wallaby
+koala
+wombat
+jellyfish
+sea_anemone
+brain_coral
+flatworm
+nematode
+conch
+snail
+slug
+sea_slug
+chiton
+chambered_nautilus
+Dungeness_crab
+rock_crab
+fiddler_crab
+king_crab
+American_lobster
+spiny_lobster
+crayfish
+hermit_crab
+isopod
+white_stork
+black_stork
+spoonbill
+flamingo
+little_blue_heron
+American_egret
+bittern
+crane
+limpkin
+European_gallinule
+American_coot
+bustard
+ruddy_turnstone
+red-backed_sandpiper
+redshank
+dowitcher
+oystercatcher
+pelican
+king_penguin
+albatross
+grey_whale
+killer_whale
+dugong
+sea_lion
+Chihuahua
+Japanese_spaniel
+Maltese_dog
+Pekinese
+Shih-Tzu
+Blenheim_spaniel
+papillon
+toy_terrier
+Rhodesian_ridgeback
+Afghan_hound
+basset
+beagle
+bloodhound
+bluetick
+black-and-tan_coonhound
+Walker_hound
+English_foxhound
+redbone
+borzoi
+Irish_wolfhound
+Italian_greyhound
+whippet
+Ibizan_hound
+Norwegian_elkhound
+otterhound
+Saluki
+Scottish_deerhound
+Weimaraner
+Staffordshire_bullterrier
+American_Staffordshire_terrier
+Bedlington_terrier
+Border_terrier
+Kerry_blue_terrier
+Irish_terrier
+Norfolk_terrier
+Norwich_terrier
+Yorkshire_terrier
+wire-haired_fox_terrier
+Lakeland_terrier
+Sealyham_terrier
+Airedale
+cairn
+Australian_terrier
+Dandie_Dinmont
+Boston_bull
+miniature_schnauzer
+giant_schnauzer
+standard_schnauzer
+Scotch_terrier
+Tibetan_terrier
+silky_terrier
+soft-coated_wheaten_terrier
+West_Highland_white_terrier
+Lhasa
+flat-coated_retriever
+curly-coated_retriever
+golden_retriever
+Labrador_retriever
+Chesapeake_Bay_retriever
+German_short-haired_pointer
+vizsla
+English_setter
+Irish_setter
+Gordon_setter
+Brittany_spaniel
+clumber
+English_springer
+Welsh_springer_spaniel
+cocker_spaniel
+Sussex_spaniel
+Irish_water_spaniel
+kuvasz
+schipperke
+groenendael
+malinois
+briard
+kelpie
+komondor
+Old_English_sheepdog
+Shetland_sheepdog
+collie
+Border_collie
+Bouvier_des_Flandres
+Rottweiler
+German_shepherd
+Doberman
+miniature_pinscher
+Greater_Swiss_Mountain_dog
+Bernese_mountain_dog
+Appenzeller
+EntleBucher
+boxer
+bull_mastiff
+Tibetan_mastiff
+French_bulldog
+Great_Dane
+Saint_Bernard
+Eskimo_dog
+malamute
+Siberian_husky
+dalmatian
+affenpinscher
+basenji
+pug
+Leonberg
+Newfoundland
+Great_Pyrenees
+Samoyed
+Pomeranian
+chow
+keeshond
+Brabancon_griffon
+Pembroke
+Cardigan
+toy_poodle
+miniature_poodle
+standard_poodle
+Mexican_hairless
+timber_wolf
+white_wolf
+red_wolf
+coyote
+dingo
+dhole
+African_hunting_dog
+hyena
+red_fox
+kit_fox
+Arctic_fox
+grey_fox
+tabby
+tiger_cat
+Persian_cat
+Siamese_cat
+Egyptian_cat
+cougar
+lynx
+leopard
+snow_leopard
+jaguar
+lion
+tiger
+cheetah
+brown_bear
+American_black_bear
+ice_bear
+sloth_bear
+mongoose
+meerkat
+tiger_beetle
+ladybug
+ground_beetle
+long-horned_beetle
+leaf_beetle
+dung_beetle
+rhinoceros_beetle
+weevil
+fly
+bee
+ant
+grasshopper
+cricket
+walking_stick
+cockroach
+mantis
+cicada
+leafhopper
+lacewing
+dragonfly
+damselfly
+admiral
+ringlet
+monarch
+cabbage_butterfly
+sulphur_butterfly
+lycaenid
+starfish
+sea_urchin
+sea_cucumber
+wood_rabbit
+hare
+Angora
+hamster
+porcupine
+fox_squirrel
+marmot
+beaver
+guinea_pig
+sorrel
+zebra
+hog
+wild_boar
+warthog
+hippopotamus
+ox
+water_buffalo
+bison
+ram
+bighorn
+ibex
+hartebeest
+impala
+gazelle
+Arabian_camel
+llama
+weasel
+mink
+polecat
+black-footed_ferret
+otter
+skunk
+badger
+armadillo
+three-toed_sloth
+orangutan
+gorilla
+chimpanzee
+gibbon
+siamang
+guenon
+patas
+baboon
+macaque
+langur
+colobus
+proboscis_monkey
+marmoset
+capuchin
+howler_monkey
+titi
+spider_monkey
+squirrel_monkey
+Madagascar_cat
+indri
+Indian_elephant
+African_elephant
+lesser_panda
+giant_panda
+barracouta
+eel
+coho
+rock_beauty
+anemone_fish
+sturgeon
+gar
+lionfish
+puffer
+abacus
+abaya
+academic_gown
+accordion
+acoustic_guitar
+aircraft_carrier
+airliner
+airship
+altar
+ambulance
+amphibian
+analog_clock
+apiary
+apron
+ashcan
+assault_rifle
+backpack
+bakery
+balance_beam
+balloon
+ballpoint
+Band_Aid
+banjo
+bannister
+barbell
+barber_chair
+barbershop
+barn
+barometer
+barrel
+barrow
+baseball
+basketball
+bassinet
+bassoon
+bathing_cap
+bath_towel
+bathtub
+beach_wagon
+beacon
+beaker
+bearskin
+beer_bottle
+beer_glass
+bell_cote
+bib
+bicycle-built-for-two
+bikini
+binder
+binoculars
+birdhouse
+boathouse
+bobsled
+bolo_tie
+bonnet
+bookcase
+bookshop
+bottlecap
+bow
+bow_tie
+brass
+brassiere
+breakwater
+breastplate
+broom
+bucket
+buckle
+bulletproof_vest
+bullet_train
+butcher_shop
+cab
+caldron
+candle
+cannon
+canoe
+can_opener
+cardigan
+car_mirror
+carousel
+carpenter's_kit
+carton
+car_wheel
+cash_machine
+cassette
+cassette_player
+castle
+catamaran
+CD_player
+cello
+cellular_telephone
+chain
+chainlink_fence
+chain_mail
+chain_saw
+chest
+chiffonier
+chime
+china_cabinet
+Christmas_stocking
+church
+cinema
+cleaver
+cliff_dwelling
+cloak
+clog
+cocktail_shaker
+coffee_mug
+coffeepot
+coil
+combination_lock
+computer_keyboard
+confectionery
+container_ship
+convertible
+corkscrew
+cornet
+cowboy_boot
+cowboy_hat
+cradle
+crane
+crash_helmet
+crate
+crib
+Crock_Pot
+croquet_ball
+crutch
+cuirass
+dam
+desk
+desktop_computer
+dial_telephone
+diaper
+digital_clock
+digital_watch
+dining_table
+dishrag
+dishwasher
+disk_brake
+dock
+dogsled
+dome
+doormat
+drilling_platform
+drum
+drumstick
+dumbbell
+Dutch_oven
+electric_fan
+electric_guitar
+electric_locomotive
+entertainment_center
+envelope
+espresso_maker
+face_powder
+feather_boa
+file
+fireboat
+fire_engine
+fire_screen
+flagpole
+flute
+folding_chair
+football_helmet
+forklift
+fountain
+fountain_pen
+four-poster
+freight_car
+French_horn
+frying_pan
+fur_coat
+garbage_truck
+gasmask
+gas_pump
+goblet
+go-kart
+golf_ball
+golfcart
+gondola
+gong
+gown
+grand_piano
+greenhouse
+grille
+grocery_store
+guillotine
+hair_slide
+hair_spray
+half_track
+hammer
+hamper
+hand_blower
+hand-held_computer
+handkerchief
+hard_disc
+harmonica
+harp
+harvester
+hatchet
+holster
+home_theater
+honeycomb
+hook
+hoopskirt
+horizontal_bar
+horse_cart
+hourglass
+iPod
+iron
+jack-o'-lantern
+jean
+jeep
+jersey
+jigsaw_puzzle
+jinrikisha
+joystick
+kimono
+knee_pad
+knot
+lab_coat
+ladle
+lampshade
+laptop
+lawn_mower
+lens_cap
+letter_opener
+library
+lifeboat
+lighter
+limousine
+liner
+lipstick
+Loafer
+lotion
+loudspeaker
+loupe
+lumbermill
+magnetic_compass
+mailbag
+mailbox
+maillot
+maillot
+manhole_cover
+maraca
+marimba
+mask
+matchstick
+maypole
+maze
+measuring_cup
+medicine_chest
+megalith
+microphone
+microwave
+military_uniform
+milk_can
+minibus
+miniskirt
+minivan
+missile
+mitten
+mixing_bowl
+mobile_home
+Model_T
+modem
+monastery
+monitor
+moped
+mortar
+mortarboard
+mosque
+mosquito_net
+motor_scooter
+mountain_bike
+mountain_tent
+mouse
+mousetrap
+moving_van
+muzzle
+nail
+neck_brace
+necklace
+nipple
+notebook
+obelisk
+oboe
+ocarina
+odometer
+oil_filter
+organ
+oscilloscope
+overskirt
+oxcart
+oxygen_mask
+packet
+paddle
+paddlewheel
+padlock
+paintbrush
+pajama
+palace
+panpipe
+paper_towel
+parachute
+parallel_bars
+park_bench
+parking_meter
+passenger_car
+patio
+pay-phone
+pedestal
+pencil_box
+pencil_sharpener
+perfume
+Petri_dish
+photocopier
+pick
+pickelhaube
+picket_fence
+pickup
+pier
+piggy_bank
+pill_bottle
+pillow
+ping-pong_ball
+pinwheel
+pirate
+pitcher
+plane
+planetarium
+plastic_bag
+plate_rack
+plow
+plunger
+Polaroid_camera
+pole
+police_van
+poncho
+pool_table
+pop_bottle
+pot
+potter's_wheel
+power_drill
+prayer_rug
+printer
+prison
+projectile
+projector
+puck
+punching_bag
+purse
+quill
+quilt
+racer
+racket
+radiator
+radio
+radio_telescope
+rain_barrel
+recreational_vehicle
+reel
+reflex_camera
+refrigerator
+remote_control
+restaurant
+revolver
+rifle
+rocking_chair
+rotisserie
+rubber_eraser
+rugby_ball
+rule
+running_shoe
+safe
+safety_pin
+saltshaker
+sandal
+sarong
+sax
+scabbard
+scale
+school_bus
+schooner
+scoreboard
+screen
+screw
+screwdriver
+seat_belt
+sewing_machine
+shield
+shoe_shop
+shoji
+shopping_basket
+shopping_cart
+shovel
+shower_cap
+shower_curtain
+ski
+ski_mask
+sleeping_bag
+slide_rule
+sliding_door
+slot
+snorkel
+snowmobile
+snowplow
+soap_dispenser
+soccer_ball
+sock
+solar_dish
+sombrero
+soup_bowl
+space_bar
+space_heater
+space_shuttle
+spatula
+speedboat
+spider_web
+spindle
+sports_car
+spotlight
+stage
+steam_locomotive
+steel_arch_bridge
+steel_drum
+stethoscope
+stole
+stone_wall
+stopwatch
+stove
+strainer
+streetcar
+stretcher
+studio_couch
+stupa
+submarine
+suit
+sundial
+sunglass
+sunglasses
+sunscreen
+suspension_bridge
+swab
+sweatshirt
+swimming_trunks
+swing
+switch
+syringe
+table_lamp
+tank
+tape_player
+teapot
+teddy
+television
+tennis_ball
+thatch
+theater_curtain
+thimble
+thresher
+throne
+tile_roof
+toaster
+tobacco_shop
+toilet_seat
+torch
+totem_pole
+tow_truck
+toyshop
+tractor
+trailer_truck
+tray
+trench_coat
+tricycle
+trimaran
+tripod
+triumphal_arch
+trolleybus
+trombone
+tub
+turnstile
+typewriter_keyboard
+umbrella
+unicycle
+upright
+vacuum
+vase
+vault
+velvet
+vending_machine
+vestment
+viaduct
+violin
+volleyball
+waffle_iron
+wall_clock
+wallet
+wardrobe
+warplane
+washbasin
+washer
+water_bottle
+water_jug
+water_tower
+whiskey_jug
+whistle
+wig
+window_screen
+window_shade
+Windsor_tie
+wine_bottle
+wing
+wok
+wooden_spoon
+wool
+worm_fence
+wreck
+yawl
+yurt
+web_site
+comic_book
+crossword_puzzle
+street_sign
+traffic_light
+book_jacket
+menu
+plate
+guacamole
+consomme
+hot_pot
+trifle
+ice_cream
+ice_lolly
+French_loaf
+bagel
+pretzel
+cheeseburger
+hotdog
+mashed_potato
+head_cabbage
+broccoli
+cauliflower
+zucchini
+spaghetti_squash
+acorn_squash
+butternut_squash
+cucumber
+artichoke
+bell_pepper
+cardoon
+mushroom
+Granny_Smith
+strawberry
+orange
+lemon
+fig
+pineapple
+banana
+jackfruit
+custard_apple
+pomegranate
+hay
+carbonara
+chocolate_sauce
+dough
+meat_loaf
+pizza
+potpie
+burrito
+red_wine
+espresso
+cup
+eggnog
+alp
+bubble
+cliff
+coral_reef
+geyser
+lakeside
+promontory
+sandbar
+seashore
+valley
+volcano
+ballplayer
+groom
+scuba_diver
+rapeseed
+daisy
+yellow_lady's_slipper
+corn
+acorn
+hip
+buckeye
+coral_fungus
+agaric
+gyromitra
+stinkhorn
+earthstar
+hen-of-the-woods
+bolete
+ear
+toilet_tissue
diff --git a/examples/classification/images/baseball.jpg b/examples/classification/images/baseball.jpg
new file mode 100644
index 0000000..836c5b4
Binary files /dev/null and b/examples/classification/images/baseball.jpg differ
diff --git a/examples/classification/images/coffe.jpg b/examples/classification/images/coffe.jpg
new file mode 100644
index 0000000..370d93c
Binary files /dev/null and b/examples/classification/images/coffe.jpg differ
diff --git a/examples/classification/images/coffe_pot.jpg b/examples/classification/images/coffe_pot.jpg
new file mode 100644
index 0000000..5d1b2dc
Binary files /dev/null and b/examples/classification/images/coffe_pot.jpg differ
diff --git a/examples/classification/images/img2clip.sh b/examples/classification/images/img2clip.sh
new file mode 100644
index 0000000..f79f2d4
--- /dev/null
+++ b/examples/classification/images/img2clip.sh
@@ -0,0 +1,2 @@
+convert ../*.jpg -delay 500 -morph 300 -scale 320x320 %05d.jpg
+ffmpeg -i %05d.jpg -vcodec libx264 -profile:v main -pix_fmt yuv420p  -r 15 test.mp4
diff --git a/examples/classification/images/tennis_ball.jpg b/examples/classification/images/tennis_ball.jpg
new file mode 100644
index 0000000..9cdd085
Binary files /dev/null and b/examples/classification/images/tennis_ball.jpg differ
diff --git a/examples/classification/labels.txt b/examples/classification/labels.txt
new file mode 100644
index 0000000..bc6c4a9
--- /dev/null
+++ b/examples/classification/labels.txt
@@ -0,0 +1,9 @@
+shar_pei
+golden_retriever
+afghan_hound
+dachshund
+german_shepherd
+labrador_retriever
+pomeranian
+rottweiler
+background
diff --git a/examples/classification/main.cpp b/examples/classification/main.cpp
new file mode 100644
index 0000000..e38376a
--- /dev/null
+++ b/examples/classification/main.cpp
@@ -0,0 +1,710 @@
+/******************************************************************************
+ * Copyright (c) 2018, Texas Instruments Incorporated - http://www.ti.com/
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions are met:
+ *       * Redistributions of source code must retain the above copyright
+ *         notice, this list of conditions and the following disclaimer.
+ *       * Redistributions in binary form must reproduce the above copyright
+ *         notice, this list of conditions and the following disclaimer in the
+ *         documentation and/or other materials provided with the distribution.
+ *       * Neither the name of Texas Instruments Incorporated nor the
+ *         names of its contributors may be used to endorse or promote products
+ *         derived from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ *   AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ *   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ *   ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ *   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ *   CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ *   SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ *   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ *   CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ *   ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ *   THE POSSIBILITY OF SUCH DAMAGE.
+ *****************************************************************************/
+#include <signal.h>
+#include <getopt.h>
+#include <iostream>
+#include <iomanip>
+#include <fstream>
+#include <cassert>
+#include <string>
+#include <functional>
+#include <queue>
+#include <algorithm>
+#include <time.h>
+#include <memory.h>
+#include <string.h>
+
+#include "executor.h"
+#include "execution_object.h"
+#include "configuration.h"
+
+#include "opencv2/core.hpp"
+#include "opencv2/imgproc.hpp"
+#include "opencv2/highgui.hpp"
+#include "opencv2/videoio.hpp"
+
+//#define TWO_ROIs
+#define LIVE_DISPLAY
+//#define PERF_VERBOSE
+
+//#define RMT_GST_STREAMER
+
+#define MAX_NUM_ROI 4
+
+int live_input = 1;
+char video_clip[320];
+
+#ifdef TWO_ROIs
+#define RES_X 400
+#define RES_Y 300
+#define NUM_ROI_X 2
+#define NUM_ROI_Y 1
+#define X_OFFSET 0
+#define X_STEP   176
+#define Y_OFFSET 52
+#define Y_STEP   224
+#else
+#define RES_X 244
+#define RES_Y 244
+#define NUM_ROI_X 1
+#define NUM_ROI_Y 1
+#define X_OFFSET 10
+#define X_STEP   224
+#define Y_OFFSET 10
+#define Y_STEP   224
+#endif
+
+int NUM_ROI = NUM_ROI_X * NUM_ROI_Y;
+
+// Number of top-scoring classes examined per frame (k in tf_postprocess)
+int TOP_CANDIDATES = 2;
+
+using namespace tidl;
+using namespace cv;
+
+#ifdef LIVE_DISPLAY
+void imagenetCallBackFunc(int event, int x, int y, int flags, void* userdata)
+{
+    if  ( event == EVENT_RBUTTONDOWN )
+    {
+        std::cout << "Right button of the mouse is clicked - position (" << x << ", " << y << ")" << " ... prepare to exit!" << std::endl;
+        exit(0);
+    }
+}
+#endif
+
+static int tf_postprocess(uchar *in, int size, int roi_idx, int frame_idx, int f_id);
+static void tf_preprocess(uchar *out, uchar *in, int size);
+static int ShowRegion(int roi_history[]);
+static int selclass_history[MAX_NUM_ROI][3];  // per ROI: index 0 is the most recent result, index 2 the oldest
+
+bool __TI_show_debug_ = false;
+
+bool RunMultipleExecutors(const std::string& config_file_1,
+                          const std::string& config_file_2,
+                          uint32_t num_devices_available);
+
+bool RunConfiguration(const std::string& config_file, int num_devices,
+                      DeviceType device_type);
+bool RunAllConfigurations(int32_t num_devices, DeviceType device_type);
+
+bool ReadFrame(ExecutionObject&     eo,
+               int                  frame_idx,
+               const Configuration& configuration,
+               std::istream&        input_file);
+
+bool WriteFrame(const ExecutionObject &eo,
+                std::ostream& output_file);
+
+static void ProcessArgs(int argc, char *argv[],
+                        std::string& config_file,
+                        int& num_devices,
+                        DeviceType& device_type);
+
+static void DisplayHelp();
+extern std::string labels_classes[];
+extern int IMAGE_CLASSES_NUM;
+extern int selected_items_size;
+extern int selected_items[];
+extern int populate_selected_items (char *filename);
+extern void populate_labels (char *filename);
+
+static double ms_diff(struct timespec &t0, struct timespec &t1)
+{ return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6; }
+
+
+int main(int argc, char *argv[])
+{
+    // Exit cleanly on SIGABRT and SIGTERM (ctrl-c raises SIGINT, which is not handled here)
+    signal(SIGABRT, exit);
+    signal(SIGTERM, exit);
+
+    // If there are no devices capable of offloading TIDL on the SoC, exit
+    uint32_t num_eve =
+                Executor::GetNumDevices(DeviceType::EVE);
+    uint32_t num_dsp =
+                Executor::GetNumDevices(DeviceType::DSP);
+    if (num_eve == 0 && num_dsp == 0)
+    {
+        std::cout << "TI DL not supported on this SoC." << std::endl;
+        return EXIT_SUCCESS;
+    }
+
+    // Process arguments
+    std::string config_file;
+    int         num_devices = 1;
+    DeviceType  device_type = DeviceType::EVE;
+    ProcessArgs(argc, argv, config_file, num_devices, device_type);
+
+    bool status = true;
+    if (!config_file.empty()) {
+        std::cout << "Run single configuration: " << config_file << std::endl;
+        status = RunConfiguration(config_file, num_devices, device_type);
+    } else
+    {
+        status = false;
+    }
+
+    if (!status)
+    {
+        std::cout << "tidl FAILED" << std::endl;
+        return EXIT_FAILURE;
+    }
+
+    std::cout << "tidl PASSED" << std::endl;
+    return EXIT_SUCCESS;
+}
+
+bool RunConfiguration(const std::string& config_file, int num_devices,
+                      DeviceType device_type)
+{
+    DeviceIds ids;
+    char imagenet_win[160];
+    for (int i = 0; i < num_devices; i++)
+        ids.insert(static_cast<DeviceId>(i));
+
+    // Read the TI DL configuration file
+    Configuration configuration;
+    bool status = configuration.ReadFromFile(config_file);
+    if (!status)
+    {
+        std::cerr << "Error in configuration file: " << config_file
+                  << std::endl;
+        return false;
+    }
+
+    std::ifstream input_data_file(configuration.inData, std::ios::binary);
+    std::ofstream output_data_file(configuration.outData, std::ios::binary);
+    assert (input_data_file.good());
+    assert (output_data_file.good());
+
+    sprintf(imagenet_win, "Imagenet_%sx%d", (device_type == DeviceType::EVE) ? "EVE" : "DSP", num_devices);
+
+    // Determine input frame size from configuration
+    size_t frame_sz_in = configuration.inWidth * configuration.inHeight *
+                         configuration.inNumChannels;
+    size_t frame_sz_out = configuration.inWidth * configuration.inHeight * 3;
+
+    try
+    {
+        // Create an executor with the appropriate core type, number of cores
+        // and configuration specified
+        Executor executor(device_type, ids, configuration);
+
+
+        // Query Executor for set of ExecutionObjects created
+        const ExecutionObjects& execution_objects =
+                                                executor.GetExecutionObjects();
+        int num_eos = execution_objects.size();
+
+        // Allocate input and output buffers for each execution object
+        std::vector<void *> buffers;
+        for (auto &eo : execution_objects)
+        {
+            ArgInfo in  = { ArgInfo(malloc_ddr<char>(frame_sz_in),  frame_sz_in)};
+            ArgInfo out = { ArgInfo(malloc_ddr<char>(frame_sz_out), frame_sz_out)};
+            eo->SetInputOutputBuffer(in, out);
+
+            buffers.push_back(in.ptr());
+            buffers.push_back(out.ptr());
+        }
+
+#ifdef LIVE_DISPLAY
+    if(NUM_ROI > 1) 
+    {
+      for(int i = 0; i < NUM_ROI; i ++) {
+        char tmp_string[80];
+        sprintf(tmp_string, "ROI[%02d]", i);
+        namedWindow(tmp_string, WINDOW_AUTOSIZE | CV_GUI_NORMAL);
+      }
+    }
+    Mat sw_stack_image = imread("/usr/share/ti/tidl/examples/classification/tidl-sw-stack-small.png", IMREAD_COLOR); // Read the file
+    if( sw_stack_image.empty() )                      // Check for invalid input
+    {
+      std::cout <<  "Could not open or find the tidl-sw-stack-small image" << std::endl ;
+    } else {
+      namedWindow( "TIDL SW Stack", WINDOW_AUTOSIZE | CV_GUI_NORMAL ); // Create a window for display.
+      cv::imshow( "TIDL SW Stack", sw_stack_image );                // Show our image inside it.
+    }
+
+    namedWindow("ClassList", WINDOW_AUTOSIZE | CV_GUI_NORMAL);
+    namedWindow(imagenet_win, WINDOW_AUTOSIZE | CV_GUI_NORMAL);
+    //set the callback function for any mouse event
+    setMouseCallback(imagenet_win, imagenetCallBackFunc, NULL);
+
+    Mat classlist_image = cv::Mat::zeros(40 + selected_items_size * 20, 220, CV_8UC3);
+    char tmp_classwindow_string[160];
+    //Erase window
+    classlist_image.setTo(Scalar::all(0));
+
+    for (int i = 0; i < selected_items_size; i ++)
+    {
+      sprintf(tmp_classwindow_string, "%2d) %12s", 1+i, labels_classes[selected_items[i]].c_str());
+      cv::putText(classlist_image, tmp_classwindow_string,
+                  cv::Point(5, 40 + i * 20),
+                  cv::FONT_HERSHEY_COMPLEX_SMALL,
+                  0.75,
+                  cv::Scalar(255,255,255), 1, 8);
+    }
+    cv::imshow("ClassList", classlist_image);
+
+#endif
+    Mat r_frame, r_mframe, r_blend;
+    Mat to_stream;
+    VideoCapture cap;
+    VideoWriter writer;  // gstreamer; declared here so the streaming code in the frame loop can use it
+
+   if(live_input >= 0)
+   {
+      cap.open(live_input);
+
+      const double fps = cap.get(CAP_PROP_FPS);
+      const int width  = cap.get(CAP_PROP_FRAME_WIDTH);
+      const int height = cap.get(CAP_PROP_FRAME_HEIGHT);
+      std::cout << "Capture camera with " << fps << " fps, " << width << "x" << height << " px" << std::endl;
+
+#ifdef RMT_GST_STREAMER
+      writer.open(" appsrc ! videoconvert ! video/x-raw, format=(string)NV12, width=(int)640, height=(int)480, framerate=(fraction)30/1 ! \
+                ducatih264enc bitrate=2000 ! queue ! h264parse config-interval=1 ! \
+                mpegtsmux ! udpsink host=158.218.102.235 sync=false port=5000",
+                0,fps,Size(640,480),true);
+
+      if (!writer.isOpened()) {
+        cap.release();
+        std::cerr << "Can't create gstreamer writer. Do you have the correct version installed?" << std::endl;
+        std::cerr << "Print out OpenCV build information" << std::endl;
+        std::cout << getBuildInformation() << std::endl;
+        return false;
+      }
+#endif
+   } else {
+     std::cout << "Video input clip: " << video_clip << std::endl;
+     cap.open(std::string(video_clip));
+      const double fps = cap.get(CAP_PROP_FPS);
+      const int width  = cap.get(CAP_PROP_FRAME_WIDTH);
+      const int height = cap.get(CAP_PROP_FRAME_HEIGHT);
+      std::cout << "Clip with " << fps << " fps, " << width << "x" << height << " px" << std::endl;
+
+   }
+   std::cout << "About to start ProcessFrame loop!!" << std::endl;
+
+
+    Rect rectCrop[MAX_NUM_ROI];  // sized by the compile-time bound; NUM_ROI is not a constant expression
+    for (int y = 0; y < NUM_ROI_Y; y ++) {
+      for (int x = 0; x < NUM_ROI_X; x ++) {
+         rectCrop[y * NUM_ROI_X + x] = Rect(X_OFFSET + x * X_STEP, Y_OFFSET + y * Y_STEP, 224, 224);
+         std::cout << "Rect[" << X_OFFSET + x * X_STEP << ", " << Y_OFFSET + y * Y_STEP << "]" << std::endl;
+      }
+    }
+    int num_frames = 99999;
+
+    if (!cap.isOpened()) {
+      std::cout << "Video input not opened!" << std::endl;
+      return false;
+    }
+    Mat in_image, image, r_image, show_image, bgr_frames[3];
+    int is_object;
+    for(int k = 0; k < NUM_ROI; k++) {
+      for(int i = 0; i < 3; i ++) selclass_history[k][i] = -1;
+    }
+
+        #define MAX_NUM_EOS  4
+        struct timespec t0[MAX_NUM_EOS], t1;
+
+        // Process frames with available execution objects in a pipelined manner
+        // additional num_eos iterations to flush the pipeline (epilogue)
+        for (int frame_idx = 0;
+             frame_idx < configuration.numFrames + num_eos; frame_idx++)
+        {
+            ExecutionObject* eo = execution_objects[frame_idx % num_eos].get();
+
+            // Wait for previous frame on the same eo to finish processing
+            if (eo->ProcessFrameWait())
+            {
+                clock_gettime(CLOCK_MONOTONIC, &t1);
+                double elapsed_host =
+                                ms_diff(t0[eo->GetFrameIndex() % num_eos], t1);
+                double elapsed_device = eo->GetProcessTimeInMilliSeconds();
+                double overhead = 100 - (elapsed_device/elapsed_host*100);
+#ifdef PERF_VERBOSE
+                std::cout << "frame[" << eo->GetFrameIndex() << "]: "
+                          << "Time on device: "
+                          << std::setw(6) << std::setprecision(4)
+                          << elapsed_device << "ms, "
+                          << "host: "
+                          << std::setw(6) << std::setprecision(4)
+                          << elapsed_host << "ms ";
+                std::cout << "API overhead: "
+                          << std::setw(6) << std::setprecision(3)
+                          << overhead << " %" << std::endl;
+#endif
+
+             int f_id = eo->GetFrameIndex();
+             int curr_roi = f_id % NUM_ROI;
+             is_object = tf_postprocess((uchar*) eo->GetOutputBufferPtr(), IMAGE_CLASSES_NUM, curr_roi, frame_idx, f_id);
+             selclass_history[curr_roi][2] = selclass_history[curr_roi][1];
+             selclass_history[curr_roi][1] = selclass_history[curr_roi][0];
+             selclass_history[curr_roi][0] = is_object;
+
+             if(is_object >= 0) {
+                  std::cout << "frame[" << eo->GetFrameIndex() << "]: "
+                          << "Time on device: "
+                          << std::setw(6) << std::setprecision(4)
+                          << elapsed_device << "ms, "
+                          << "host: "
+                          << std::setw(6) << std::setprecision(4)
+                          << elapsed_host << "ms ";
+             }
+
+             for (int r = 0; r < NUM_ROI; r ++)
+             {
+                int rpt_id =  ShowRegion(selclass_history[r]);
+                if(rpt_id >= 0)
+                {
+                  // overlay the label on the display window if the same class was reported in at least two of the last three results
+                  cv::putText(show_image, labels_classes[rpt_id].c_str(),
+                    cv::Point(rectCrop[r].x + 5,rectCrop[r].y + 20), // Coordinates
+                    cv::FONT_HERSHEY_COMPLEX_SMALL, // Font
+                    1.0, // Scale. 2.0 = 2x bigger
+                    cv::Scalar(0,0,255), // Color
+                    1, // Thickness
+                    8); // Line type
+                  cv::rectangle(show_image, rectCrop[r], Scalar(255,0,0), 3);
+                  std::cout << "ROI(" << r << ")(" << rpt_id << ")=" << labels_classes[rpt_id].c_str() << std::endl;
+
+                  classlist_image.setTo(Scalar::all(0));
+                  for (int k = 0; k < selected_items_size; k ++)
+                  {
+                     sprintf(tmp_classwindow_string, "%2d) %12s", 1+k, labels_classes[selected_items[k]].c_str());
+                     cv::putText(classlist_image, tmp_classwindow_string,
+                                 cv::Point(5, 40 + k * 20),
+                                 cv::FONT_HERSHEY_COMPLEX_SMALL,
+                                 0.75,
+                                 selected_items[k] == rpt_id ? cv::Scalar(0,0,255) : cv::Scalar(255,255,255), 1, 8);
+                  }
+                  sprintf(tmp_classwindow_string, "FPS:%5.2lf", (double)num_devices * 1000.0 / elapsed_host );
+                  cv::putText(classlist_image, tmp_classwindow_string,
+                              cv::Point(5, 20),
+                              cv::FONT_HERSHEY_COMPLEX_SMALL,
+                              0.75,
+                              cv::Scalar(0,255,0), 1, 8);
+                  cv::imshow("ClassList", classlist_image);
+               }
+             }
+#ifdef LIVE_DISPLAY
+             cv::imshow(imagenet_win, show_image);
+#endif
+
+#ifdef RMT_GST_STREAMER
+             cv::resize(show_image, to_stream, cv::Size(640,480));
+             writer << to_stream;
+#endif
+
+#ifdef LIVE_DISPLAY
+             waitKey(2);
+#endif
+
+            }
+
+
+        if (cap.grab() && frame_idx < num_frames)
+        {
+            if (cap.retrieve(in_image))
+            {
+                cv::resize(in_image, image, Size(RES_X,RES_Y));
+                r_image = Mat(image, rectCrop[frame_idx % NUM_ROI]);
+
+#ifdef LIVE_DISPLAY
+                if(NUM_ROI > 1)
+                {
+                   char tmp_string[80];
+                   sprintf(tmp_string, "ROI[%02d]", frame_idx % NUM_ROI);
+                   cv::imshow(tmp_string, r_image);
+                }
+#endif
+                //Convert from BGR pixel interleaved to BGR plane interleaved!
+                cv::split(r_image, bgr_frames);
+                tf_preprocess((uchar*) eo->GetInputBufferPtr(), bgr_frames[0].ptr(), 224*224);
+                tf_preprocess((uchar*) eo->GetInputBufferPtr()+224*224, bgr_frames[1].ptr(), 224*224);
+                tf_preprocess((uchar*) eo->GetInputBufferPtr()+2*224*224, bgr_frames[2].ptr(), 224*224);
+                eo->SetFrameIndex(frame_idx);
+                clock_gettime(CLOCK_MONOTONIC, &t0[frame_idx % num_eos]);
+                eo->ProcessFrameStartAsync();
+
+#ifdef RMT_GST_STREAMER
+                cv::resize(Mat(image, Rect(0,32,640,448)), to_stream, Size(640,480));
+                writer << to_stream;
+#endif
+
+#ifdef LIVE_DISPLAY
+                //waitKey(2);
+                image.copyTo(show_image);
+#endif
+            }
+        } else {
+          if(live_input == -1) {
+            //Rewind!
+            cap.release();
+            cap.open(std::string(video_clip)); 
+          }
+        }
+ 
+        }
+
+        for (auto b : buffers)
+            __free_ddr(b);
+
+    }
+    catch (tidl::Exception &e)
+    {
+        std::cerr << e.what() << std::endl;
+        status = false;
+    }
+
+
+    input_data_file.close();
+    output_data_file.close();
+
+    return status;
+}
+
+bool ReadFrame(ExecutionObject &eo, int frame_idx,
+               const Configuration& configuration,
+               std::istream& input_file)
+{
+    if (frame_idx >= configuration.numFrames)
+        return false;
+
+    char*  frame_buffer = eo.GetInputBufferPtr();
+    assert (frame_buffer != nullptr);
+
+    memset (frame_buffer, 0,  eo.GetInputBufferSizeInBytes());
+    input_file.read(frame_buffer, eo.GetInputBufferSizeInBytes() / (configuration.inNumChannels == 1 ? 2 : 1));
+
+    if (input_file.eof())
+        return false;
+
+    assert (input_file.good());
+
+    // Set the frame index being processed by the EO. This is used to
+    // sort the frames before they are output
+    eo.SetFrameIndex(frame_idx);
+
+    if (input_file.good())
+        return true;
+
+    return false;
+}
+
+bool WriteFrame(const ExecutionObject &eo, std::ostream& output_file)
+{
+    output_file.write(
+            eo.GetOutputBufferPtr(), eo.GetOutputBufferSizeInBytes());
+    assert(output_file.good() == true);
+
+    if (output_file.good())
+        return true;
+
+    return false;
+}
+
+void ProcessArgs(int argc, char *argv[], std::string& config_file,
+                 int& num_devices, DeviceType& device_type)
+{
+    const struct option long_options[] =
+    {
+        {"labels_classes_file", required_argument, 0, 'l'},
+        {"selected_classes_file", required_argument, 0, 's'},
+        {"config_file", required_argument, 0, 'c'},
+        {"num_devices", required_argument, 0, 'n'},
+        {"device_type", required_argument, 0, 't'},
+        {"help",        no_argument,       0, 'h'},
+        {"verbose",     no_argument,       0, 'v'},
+        {0, 0, 0, 0}
+    };
+
+    int option_index = 0;
+
+    while (true)
+    {
+        int c = getopt_long(argc, argv, "l:c:s:i:n:t:hv", long_options, &option_index);
+
+        if (c == -1)
+            break;
+
+        switch (c)
+        {
+            case 'l': populate_labels(optarg);
+                      break;
+
+            case 's': populate_selected_items(optarg);
+                      break;
+
+            case 'i': if(strlen(optarg) == 1)
+                      {
+                        live_input = atoi(optarg);
+                      } else {
+                        live_input = -1;
+                        snprintf(video_clip, sizeof(video_clip), "%s", optarg);  // bounded copy; avoids overflowing video_clip
+                      }
+                      break;
+
+            case 'c': config_file = optarg;
+                      break;
+
+            case 'n': num_devices = atoi(optarg);
+                      assert (num_devices > 0 && num_devices <= 4);
+                      break;
+
+            case 't': if (*optarg == 'e')
+                          device_type = DeviceType::EVE;
+                      else if (*optarg == 'd')
+                          device_type = DeviceType::DSP;
+                      else
+                      {
+                          std::cerr << "Invalid argument to -t, only e or d"
+                                       " allowed" << std::endl;
+                          exit(EXIT_FAILURE);
+                      }
+                      break;
+
+            case 'v': __TI_show_debug_ = true;
+                      break;
+
+            case 'h': DisplayHelp();
+                      exit(EXIT_SUCCESS);
+                      break;
+
+            case '?': // Error in getopt_long
+                      exit(EXIT_FAILURE);
+                      break;
+
+            default:
+                      std::cerr << "Unsupported option: " << static_cast<char>(c) << std::endl;
+                      break;
+        }
+    }
+}
+
+void DisplayHelp()
+{
+    std::cout << "Usage: tidl\n"
+                 "  A configuration file is required; tidl reports failure if"
+                 " none is given.\n  Use -c to select the network configuration.\n"
+                 "Arguments:\n"
+                 " -c <config file>     Path to the configuration file\n"
+                 " -n <number of cores> Number of cores to use (1 - 4)\n"
+                 " -t <d|e>             Type of core. d -> DSP, e -> EVE\n"
+                 " -l <labels file>     File with label strings (of all classes in model)\n"
+                 " -s <classes file>    File with the selected classes\n"
+                 " -i <input>           Video input: camera index (0, 1) or video clip path\n"
+                 " -v                   Verbose output during execution\n"
+                 " -h                   Help\n";
+
+}
+
+
+bool tf_expected_id(int id)
+{
+   // Filter out unexpected IDs
+   for (int i = 0; i < selected_items_size; i ++)
+   {
+       if(id == selected_items[i]) return true;
+   }
+   return false;
+}
+
+int tf_postprocess(uchar *in, int size, int roi_idx, int frame_idx, int f_id)
+{
+  // sort and get k largest values and corresponding indices
+  const int k = TOP_CANDIDATES;
+  int accum_in = 0;
+  int rpt_id = -1;
+
+  typedef std::pair<uchar, int> val_index;
+  auto cmp = [](const val_index &left, const val_index &right) { return left.first > right.first; };
+  std::priority_queue<val_index, std::vector<val_index>, decltype(cmp)> queue(cmp);
+  // initialize priority queue with smallest value on top
+  for (int i = 0; i < k; i++) {
+    queue.push(val_index(in[i], i));
+    accum_in += (int)in[i];
+  }
+  // for the rest of the input, if larger than the current minimum, pop the minimum and push the new value
+  for (int i = k; i < size; i++)
+  {
+    if (in[i] > queue.top().first)
+    {
+      queue.pop();
+      queue.push(val_index(in[i], i));
+    }
+    accum_in += (int)in[i];
+  }
+
+  // output top k values in reverse order: largest val first
+  std::vector<val_index> sorted;
+  while (! queue.empty())
+   {
+    sorted.push_back(queue.top());
+    queue.pop();
+  }
+
+  for (int i = k-1; i >= 0; i--)
+  {
+      int id = sorted[i].second;
+      char res2show[320];
+      bool found = false;
+
+      if (tf_expected_id(id))
+      {
+        std::cout << "Frame:" << frame_idx << "," << f_id << " ROI[" << roi_idx << "]: rank="
+                  << k-i << ", prob=" << (float) sorted[i].first / 255 << ", "
+                  << labels_classes[sorted[i].second] << " accum_in=" << accum_in << std::endl;
+        rpt_id = id;
+        found  = true;
+      }
+  }
+  return rpt_id;
+}
+
+void tf_preprocess(uchar *out, uchar *in, int size)
+{
+  for (int i = 0; i < size; i++)
+  {
+    out[i] = (uchar) (in[i] /*- 128*/);
+  }
+}
+
+int ShowRegion(int roi_history[])
+{
+  if((roi_history[0] >= 0) && (roi_history[0] == roi_history[1])) return roi_history[0];    
+  if((roi_history[0] >= 0) && (roi_history[0] == roi_history[2])) return roi_history[0];    
+  if((roi_history[1] >= 0) && (roi_history[1] == roi_history[2])) return roi_history[1];    
+  return -1;
+}
+
+
diff --git a/examples/classification/multiple_executors.cpp b/examples/classification/multiple_executors.cpp
new file mode 100644
index 0000000..78a1789
--- /dev/null
+++ b/examples/classification/multiple_executors.cpp
@@ -0,0 +1,216 @@
+/******************************************************************************
+ * Copyright (c) 2018, Texas Instruments Incorporated - http://www.ti.com/
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions are met:
+ *       * Redistributions of source code must retain the above copyright
+ *         notice, this list of conditions and the following disclaimer.
+ *       * Redistributions in binary form must reproduce the above copyright
+ *         notice, this list of conditions and the following disclaimer in the
+ *         documentation and/or other materials provided with the distribution.
+ *       * Neither the name of Texas Instruments Incorporated nor the
+ *         names of its contributors may be used to endorse or promote products
+ *         derived from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ *   AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ *   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ *   ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ *   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ *   CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ *   SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ *   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ *   CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ *   ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ *   THE POSSIBILITY OF SUCH DAMAGE.
+ *****************************************************************************/
+
+//! @file multiple_executors.cpp
+//! Illustrates how to set up multiple Executor instances using
+//! non-overlapping sets of device ids and running the Executor instances
+//! in parallel - each in its own thread
+
+#include <signal.h>
+#include <getopt.h>
+#include <iostream>
+#include <fstream>
+#include <cassert>
+#include <string>
+#include <functional>
+#include <algorithm>
+#include <pthread.h>
+
+#include "executor.h"
+#include "execution_object.h"
+#include "configuration.h"
+
+using namespace tidl;
+
+extern bool ReadFrame(ExecutionObject&     eo,
+               int                  frame_idx,
+               const Configuration& configuration,
+               std::istream&        input_file);
+
+extern bool WriteFrame(const ExecutionObject &eo,
+                std::ostream& output_file);
+
+void* run_network(void *data);
+
+struct ThreadArg
+{
+    std::string config_file;
+    DeviceIds ids;
+    ThreadArg(const DeviceIds& ids, const std::string& s):
+        ids(ids), config_file(s) {}
+
+};
+
+bool thread_status[2];
+
+bool RunMultipleExecutors(const std::string& config_file_1,
+                          const std::string& config_file_2,
+                          uint32_t num_devices_available)
+{
+    // If there is only 1 device available, skip
+    if (num_devices_available == 1)
+        return true;
+
+    DeviceIds ids1, ids2;
+
+    if (num_devices_available == 4)
+    {
+        ids1 = {DeviceId::ID2, DeviceId::ID3};
+        ids2 = {DeviceId::ID0, DeviceId::ID1};
+    }
+    else
+    {
+        ids1 = {DeviceId::ID0};
+        ids2 = {DeviceId::ID1};
+    }
+
+    // Set up devices and config files for each thread
+    ThreadArg arg1(ids2, config_file_1);
+    ThreadArg arg2(ids1, config_file_2);
+
+    // Run network 1 in a thread
+    std::cout << std::endl << "Multiple Executor..." << std::endl;
+    std::cout << "Running network "
+              << arg1.config_file.substr(arg1.config_file.find("tidl"))
+              << " on EVEs: ";
+    for (DeviceId id : arg1.ids)
+        std::cout << static_cast<int>(id) << " ";
+    std::cout << " in thread 0" << std::endl;
+
+    pthread_t network_thread_1;
+    pthread_create(&network_thread_1, 0, &run_network, &arg1);
+
+    // Run network 2 in a thread
+    std::cout << "Running network "
+              << arg2.config_file.substr(arg2.config_file.find("tidl"))
+              << " on EVEs: ";
+    for (DeviceId id : arg2.ids)
+        std::cout << static_cast<int>(id) << " ";
+    std::cout << " in thread 1" << std::endl;
+
+    pthread_t network_thread_2;
+    pthread_create(&network_thread_2, 0, &run_network, &arg2);
+
+    // Wait for both networks to complete
+    void *thread_return_val1;
+    void *thread_return_val2;
+    pthread_join(network_thread_1, &thread_return_val1);
+    pthread_join(network_thread_2, &thread_return_val2);
+
+    if (thread_return_val1 == 0 || thread_return_val2 == 0)
+    {
+        std::cout << "Multiple executors: FAILED" << std::endl;
+        return false;
+    }
+
+    std::cout << "Multiple executors: PASSED" << std::endl;
+    return true;
+}
+
+
+void* run_network(void *data)
+{
+    const ThreadArg* arg = static_cast<const ThreadArg *>(data);
+
+    const DeviceIds& ids = arg->ids;
+    const std::string& config_file = arg->config_file;
+
+    // Read the TI DL configuration file
+    Configuration configuration;
+    bool status = configuration.ReadFromFile(config_file);
+    assert (status != false);
+
+    configuration.outData += std::to_string(pthread_self());
+
+    // Open input and output files
+    std::ifstream input_data_file(configuration.inData, std::ios::binary);
+    std::ofstream output_data_file(configuration.outData, std::ios::binary);
+    assert (input_data_file.good());
+    assert (output_data_file.good());
+
+    // Determine input frame size from configuration
+    size_t frame_sz = configuration.inWidth * configuration.inHeight *
+                      configuration.inNumChannels;
+
+    try
+    {
+        // Create an executor with the appropriate core type, number of cores
+        // and configuration specified
+        Executor executor(DeviceType::EVE, ids, configuration);
+
+        const ExecutionObjects& execution_objects =
+                                                executor.GetExecutionObjects();
+        int num_eos = execution_objects.size();
+
+        // Allocate input and output buffers for each execution object
+        std::vector<void *> buffers;
+        for (auto &eo : execution_objects)
+        {
+            ArgInfo in  = { ArgInfo(malloc_ddr<char>(frame_sz), frame_sz)};
+            ArgInfo out = { ArgInfo(malloc_ddr<char>(frame_sz), frame_sz)};
+            eo->SetInputOutputBuffer(in, out);
+
+            buffers.push_back(in.ptr());
+            buffers.push_back(out.ptr());
+        }
+
+        // Process frames with available execution objects in a pipelined manner
+        // additional num_eos iterations to flush the pipeline (epilogue)
+        for (int frame_idx = 0;
+             frame_idx < configuration.numFrames + num_eos; frame_idx++)
+        {
+            ExecutionObject* eo = execution_objects[frame_idx % num_eos].get();
+
+            // Wait for previous frame on the same eo to finish processing
+            if (eo->ProcessFrameWait())
+                WriteFrame(*eo, output_data_file);
+
+            // Read a frame and start processing it with current eo
+            if (ReadFrame(*eo, frame_idx, configuration, input_data_file))
+                eo->ProcessFrameStartAsync();
+        }
+
+
+        for (auto b : buffers)
+            __free_ddr(b);
+    }
+    catch (tidl::Exception &e)
+    {
+        std::cerr << e.what() << std::endl;
+        status = false;
+    }
+
+    input_data_file.close();
+    output_data_file.close();
+
+    // Return 1 for true, 0 for false. void * pattern follows example from:
+    // "Advanced programming in the Unix Environment"
+    if (!status) return ((void *)0);
+
+    return ((void *)1);
+}
diff --git a/examples/classification/readme.md b/examples/classification/readme.md
new file mode 100644
index 0000000..00177b0
--- /dev/null
+++ b/examples/classification/readme.md
@@ -0,0 +1,4 @@
+# Live camera input
+./tidl_classification -n 2 -t e -l labels.txt -i 1 -s ./classlist.txt -c ./stream_config_j11_v2.txt
+# Use video clip as input stream
+./tidl_classification -n 2 -t e -l labels.txt -i ./clips/test1.mp4 -s ./classlist.txt -c ./stream_config_j11_v2.txt
diff --git a/examples/classification/stream_config_dogs.txt b/examples/classification/stream_config_dogs.txt
new file mode 100644
index 0000000..a11f130
--- /dev/null
+++ b/examples/classification/stream_config_dogs.txt
@@ -0,0 +1,8 @@
+numFrames   = 9000
+inData   = /usr/share/ti/tidl/examples/test/testvecs/input/shar_pei.raw
+outData   = "/usr/share/ti/tidl/examples/classification/stats_tool_out.bin"
+netBinFile      = "/usr/share/ti/tidl/examples/test/testvecs/config/tidl_models/dogs_net_j11v2.bin"
+paramsBinFile   = "/usr/share/ti/tidl/examples//test/testvecs/config/tidl_models/dogs_param_j11v2.bin"
+inWidth = 224
+inHeight = 224
+inNumChannels = 3
diff --git a/examples/classification/stream_config_j11_v2.txt b/examples/classification/stream_config_j11_v2.txt
new file mode 100644
index 0000000..21db243
--- /dev/null
+++ b/examples/classification/stream_config_j11_v2.txt
@@ -0,0 +1,8 @@
+numFrames   = 9000
+inData   = /usr/share/ti/tidl/examples/test/testvecs/input/preproc_0_224x224.y
+outData   = "/usr/share/ti/tidl/examples/classification/stats_tool_out.bin"
+netBinFile      = "/usr/share/ti/tidl/examples/test/testvecs/config/tidl_models/tidl_net_imagenet_jacintonet11v2.bin"
+paramsBinFile   = "/usr/share/ti/tidl/examples//test/testvecs/config/tidl_models/tidl_param_imagenet_jacintonet11v2.bin"
+inWidth = 224
+inHeight = 224
+inNumChannels = 3
diff --git a/examples/classification/tidl-sw-stack-small.png b/examples/classification/tidl-sw-stack-small.png
new file mode 100644
index 0000000..e1e430b
Binary files /dev/null and b/examples/classification/tidl-sw-stack-small.png differ
diff --git a/examples/imagenet/main.cpp b/examples/imagenet/main.cpp
index 8fe8912..5d54c66 100644
--- a/examples/imagenet/main.cpp
+++ b/examples/imagenet/main.cpp
@@ -89,9 +89,9 @@ int main(int argc, char *argv[])
     signal(SIGTERM, exit);
 
     // If there are no devices capable of offloading TIDL on the SoC, exit
-    uint32_t num_dla = Executor::GetNumDevices(DeviceType::DLA);
+    uint32_t num_eve = Executor::GetNumDevices(DeviceType::EVE);
     uint32_t num_dsp = Executor::GetNumDevices(DeviceType::DSP);
-    if (num_dla == 0 && num_dsp == 0)
+    if (num_eve == 0 && num_dsp == 0)
     {
         std::cout << "TI DL not supported on this SoC." << std::endl;
         return EXIT_SUCCESS;
@@ -101,7 +101,7 @@ int main(int argc, char *argv[])
     std::string config      = "j11_v2";
     std::string input_file  = "../test/testvecs/input/objects/cat-pet-animal-domestic-104827.jpeg";
     int         num_devices = 1;
-    DeviceType  device_type = (num_dla > 0 ? DeviceType::DLA:DeviceType::DSP);
+    DeviceType  device_type = (num_eve > 0 ? DeviceType::EVE:DeviceType::DSP);
     ProcessArgs(argc, argv, config, num_devices, device_type, input_file);
 
     std::cout << "Input: " << input_file << std::endl;
@@ -316,7 +316,7 @@ bool WriteFrameOutput(const ExecutionObject &eo)
     for (int i = k - 1; i >= 0; i--)
     {
         std::cout << k-i << ": " << imagenet_classes[sorted[i].second]
-                  << ", prob = " << (float) sorted[i].first / 256 << std::endl;
+                  << std::endl;
     }
 
     return true;
@@ -357,7 +357,7 @@ void ProcessArgs(int argc, char *argv[], std::string& config,
                       break;
 
             case 't': if (*optarg == 'e')
-                          device_type = DeviceType::DLA;
+                          device_type = DeviceType::EVE;
                       else if (*optarg == 'd')
                           device_type = DeviceType::DSP;
                       else
@@ -398,7 +398,7 @@ void DisplayHelp()
                  "Optional arguments:\n"
                  " -c <config>          Valid configs: j11_bn, j11_prelu, j11_v2\n"
                  " -n <number of cores> Number of cores to use (1 - 4)\n"
-                 " -t <d|e>             Type of core. d -> DSP, e -> DLA\n"
+                 " -t <d|e>             Type of core. d -> DSP, e -> EVE\n"
                  " -i <image>           Path to the image file\n"
                  " -i camera            Use camera as input\n"
                  " -v                   Verbose output during execution\n"
diff --git a/examples/make.common b/examples/make.common
index aca5ece..e011117 100644
--- a/examples/make.common
+++ b/examples/make.common
@@ -28,7 +28,7 @@ RM = rm
 AR = ar
 CP = cp
 
-TIDL_API_DIR = ../../tidl_api
+TIDL_API_DIR ?= ${TARGET_ROOTDIR}/usr/share/ti/tidl/tidl_api
 TIDL_API_LIB_NAME = tidl_api.a
 TIDL_API_LIB = $(TIDL_API_DIR)/$(TIDL_API_LIB_NAME)
 TIDL_API_LIB_IMGUTIL_NAME = tidl_imgutil.a
@@ -51,18 +51,6 @@ LIBS    = -lOpenCL -locl_util -lpthread
 
 all: $(EXE)
 
-.PHONY: $(TIDL_API_LIB)
-$(TIDL_API_LIB):
-	$(MAKE) -C $(TIDL_API_DIR) $(TIDL_API_LIB_NAME)
-
-.PHONY: $(TIDL_API_LIB_IMGUTIL)
-$(TIDL_API_LIB_IMGUTIL):
-	$(MAKE) -C $(TIDL_API_DIR) $(TIDL_API_LIB_IMGUTIL_NAME)
-
-realclean: clean
-	$(MAKE) -C $(TIDL_API_DIR) clean
-	$(MAKE) -C $(TIDL_API_DIR)/dsp clean
-
 clean::
 	$(RM) -f $(EXE) stats_tool_out.* *.out
 
diff --git a/examples/segmentation/main.cpp b/examples/segmentation/main.cpp
index ec18328..86f81e5 100644
--- a/examples/segmentation/main.cpp
+++ b/examples/segmentation/main.cpp
@@ -97,9 +97,9 @@ int main(int argc, char *argv[])
     signal(SIGTERM, exit);
 
     // If there are no devices capable of offloading TIDL on the SoC, exit
-    uint32_t num_dla = Executor::GetNumDevices(DeviceType::DLA);
+    uint32_t num_eve = Executor::GetNumDevices(DeviceType::EVE);
     uint32_t num_dsp = Executor::GetNumDevices(DeviceType::DSP);
-    if (num_dla == 0 && num_dsp == 0)
+    if (num_eve == 0 && num_dsp == 0)
     {
         std::cout << "TI DL not supported on this SoC." << std::endl;
         return EXIT_SUCCESS;
@@ -109,7 +109,7 @@ int main(int argc, char *argv[])
     std::string config      = DEFAULT_CONFIG;
     std::string input_file  = DEFAULT_INPUT;
     int         num_devices = 1;
-    DeviceType  device_type = (num_dla > 0 ? DeviceType::DLA:DeviceType::DSP);
+    DeviceType  device_type = (num_eve > 0 ? DeviceType::EVE:DeviceType::DSP);
     ProcessArgs(argc, argv, config, num_devices, device_type, input_file);
 
     if ((object_class_table = GetObjectClassTable(config)) == nullptr)
@@ -422,7 +422,7 @@ void ProcessArgs(int argc, char *argv[], std::string& config,
                       break;
 
             case 't': if (*optarg == 'e')
-                          device_type = DeviceType::DLA;
+                          device_type = DeviceType::EVE;
                       else if (*optarg == 'd')
                           device_type = DeviceType::DSP;
                       else
@@ -463,7 +463,7 @@ void DisplayHelp()
                  "Optional arguments:\n"
                  " -c <config>          Valid configs: jseg21_tiscapes, jseg21\n"
                  " -n <number of cores> Number of cores to use (1 - 4)\n"
-                 " -t <d|e>             Type of core. d -> DSP, e -> DLA\n"
+                 " -t <d|e>             Type of core. d -> DSP, e -> EVE\n"
                  " -i <image>           Path to the image file\n"
                  "                      Default are 3 frames in testvecs\n"
                  " -i camera            Use camera as input\n"
diff --git a/examples/ssd_multibox/main.cpp b/examples/ssd_multibox/main.cpp
index c3d9c8e..6d39dda 100644
--- a/examples/ssd_multibox/main.cpp
+++ b/examples/ssd_multibox/main.cpp
@@ -98,11 +98,11 @@ int main(int argc, char *argv[])
     signal(SIGTERM, exit);
 
     // If there are no devices capable of offloading TIDL on the SoC, exit
-    uint32_t num_dla = Executor::GetNumDevices(DeviceType::DLA);
+    uint32_t num_eve = Executor::GetNumDevices(DeviceType::EVE);
     uint32_t num_dsp = Executor::GetNumDevices(DeviceType::DSP);
-    if (num_dla == 0 || num_dsp == 0)
+    if (num_eve == 0 || num_dsp == 0)
     {
-        std::cout << "ssd_multibox requires both DLA and DSP for execution."
+        std::cout << "ssd_multibox requires both EVE and DSP for execution."
                   << std::endl;
         return EXIT_SUCCESS;
     }
@@ -111,14 +111,14 @@ int main(int argc, char *argv[])
     std::string config      = DEFAULT_CONFIG;
     std::string input_file  = DEFAULT_INPUT;
     uint32_t num_devices    = 1;
-    DeviceType  device_type = DeviceType::DLA;
+    DeviceType  device_type = DeviceType::EVE;
     ProcessArgs(argc, argv, config, num_devices, device_type, input_file);
 
-    // Use same number of DLAs and DSPs
-    num_devices = std::min(num_devices, std::min(num_dla, num_dsp));
+    // Use same number of EVEs and DSPs
+    num_devices = std::min(num_devices, std::min(num_eve, num_dsp));
     if (num_devices == 0)
     {
-        std::cout << "Partitioned execution requires at least 1 DLA and 1 DSP."
+        std::cout << "Partitioned execution requires at least 1 EVE and 1 DSP."
                   << std::endl;
         return EXIT_FAILURE;
     }
@@ -190,30 +190,30 @@ bool RunConfiguration(const std::string& config_file, uint32_t num_devices,
     {
         // Create a executor with the approriate core type, number of cores
         // and configuration specified
-        // DLA will run layersGroupId 1 in the network, while
+        // EVE will run layersGroupId 1 in the network, while
         // DSP will run layersGroupId 2 in the network
-        Executor executor_dla(DeviceType::DLA, ids, configuration, 1);
+        Executor executor_eve(DeviceType::EVE, ids, configuration, 1);
         Executor executor_dsp(DeviceType::DSP, ids, configuration, 2);
 
         // Query Executor for set of ExecutionObjects created
-        const ExecutionObjects& execution_objects_dla =
-                                            executor_dla.GetExecutionObjects();
+        const ExecutionObjects& execution_objects_eve =
+                                            executor_eve.GetExecutionObjects();
         const ExecutionObjects& execution_objects_dsp =
                                             executor_dsp.GetExecutionObjects();
-        int num_eos = execution_objects_dla.size();
+        int num_eos = execution_objects_eve.size();
 
         // Allocate input and output buffers for each execution object
-        // Note that "out" is both the output of eo_dla and the input of eo_dsp
+        // Note that "out" is both the output of eo_eve and the input of eo_dsp
         // This is how two layersGroupIds, 1 and 2, are tied together
         std::vector<void *> buffers;
         for (int i = 0; i < num_eos; i++)
         {
-            ExecutionObject *eo_dla = execution_objects_dla[i].get();
-            size_t in_size  = eo_dla->GetInputBufferSizeInBytes();
-            size_t out_size = eo_dla->GetOutputBufferSizeInBytes();
+            ExecutionObject *eo_eve = execution_objects_eve[i].get();
+            size_t in_size  = eo_eve->GetInputBufferSizeInBytes();
+            size_t out_size = eo_eve->GetOutputBufferSizeInBytes();
             ArgInfo in  = { ArgInfo(malloc(in_size),  in_size)  };
             ArgInfo out = { ArgInfo(malloc(out_size), out_size) };
-            eo_dla->SetInputOutputBuffer(in, out);
+            eo_eve->SetInputOutputBuffer(in, out);
 
             ExecutionObject *eo_dsp = execution_objects_dsp[i].get();
             size_t out2_size = eo_dsp->GetOutputBufferSizeInBytes();
@@ -231,11 +231,11 @@ bool RunConfiguration(const std::string& config_file, uint32_t num_devices,
 
         // Process frames with available execution objects in a pipelined manner
         // additional num_eos iterations to flush the pipeline (epilogue)
-        ExecutionObject *eo_dla, *eo_dsp, *eo_input;
+        ExecutionObject *eo_eve, *eo_dsp, *eo_input;
         for (int frame_idx = 0;
              frame_idx < num_frames + num_eos; frame_idx++)
         {
-            eo_dla = execution_objects_dla[frame_idx % num_eos].get();
+            eo_eve = execution_objects_eve[frame_idx % num_eos].get();
             eo_dsp = execution_objects_dsp[frame_idx % num_eos].get();
 
             // Wait for previous frame on the same eo to finish processing
@@ -247,23 +247,23 @@ bool RunConfiguration(const std::string& config_file, uint32_t num_devices,
                            ms_diff(t0[finished_idx % num_eos], t1),
                            eo_dsp->GetProcessTimeInMilliSeconds());
 
-                eo_input = execution_objects_dla[finished_idx % num_eos].get();
+                eo_input = execution_objects_eve[finished_idx % num_eos].get();
                 WriteFrameOutput(*eo_input, *eo_dsp, configuration);
             }
 
             // Read a frame and start processing it with current eo
-            if (ReadFrame(*eo_dla, frame_idx, configuration, num_frames,
+            if (ReadFrame(*eo_eve, frame_idx, configuration, num_frames,
                           image_file, cap))
             {
                 clock_gettime(CLOCK_MONOTONIC, &t0[frame_idx % num_eos]);
-                eo_dla->ProcessFrameStartAsync();
+                eo_eve->ProcessFrameStartAsync();
 
-                if (eo_dla->ProcessFrameWait())
+                if (eo_eve->ProcessFrameWait())
                 {
                     clock_gettime(CLOCK_MONOTONIC, &t1);
-                    ReportTime(frame_idx, "DLA",
+                    ReportTime(frame_idx, "EVE",
                                ms_diff(t0[frame_idx % num_eos], t1),
-                               eo_dla->GetProcessTimeInMilliSeconds());
+                               eo_eve->GetProcessTimeInMilliSeconds());
 
                     clock_gettime(CLOCK_MONOTONIC, &t0[frame_idx % num_eos]);
                     eo_dsp->ProcessFrameStartAsync();
@@ -501,7 +501,7 @@ void DisplayHelp()
                  "  Will run partitioned ssd_multibox network to perform "
                  "multi-objects detection\n"
                  "  and classification.  First part of network "
-                 "(layersGroupId 1) runs on DLA,\n"
+                 "(layersGroupId 1) runs on EVE,\n"
                  "  second part (layersGroupId 2) runs on DSP.\n"
                  "  Use -c to run a different segmentation network. "
                  "Default is jdetnet.\n"
diff --git a/examples/test/main.cpp b/examples/test/main.cpp
index 645bb6f..bc87855 100644
--- a/examples/test/main.cpp
+++ b/examples/test/main.cpp
@@ -78,9 +78,9 @@ int main(int argc, char *argv[])
     signal(SIGTERM, exit);
 
     // If there are no devices capable of offloading TIDL on the SoC, exit
-    uint32_t num_dla = Executor::GetNumDevices(DeviceType::DLA);
+    uint32_t num_eve = Executor::GetNumDevices(DeviceType::EVE);
     uint32_t num_dsp = Executor::GetNumDevices(DeviceType::DSP);
-    if (num_dla == 0 && num_dsp == 0)
+    if (num_eve == 0 && num_dsp == 0)
     {
         std::cout << "TI DL not supported on this SoC." << std::endl;
         return EXIT_SUCCESS;
@@ -90,7 +90,7 @@ int main(int argc, char *argv[])
     // Process arguments
     std::string config_file;
     int         num_devices = 1;
-    DeviceType  device_type = DeviceType::DLA;
+    DeviceType  device_type = DeviceType::EVE;
     ProcessArgs(argc, argv, config_file, num_devices, device_type);
 
     bool status = true;
@@ -98,17 +98,26 @@ int main(int argc, char *argv[])
         status = RunConfiguration(config_file, num_devices, device_type);
     else
     {
-        if (num_dla > 0)
+        if (num_eve > 0)
         {
-            //TODO: Use memory availability to determine # devices
             // Run on 2 devices because there is not enough CMEM available by
             // default
-            if (num_dla = 4) num_dla = 2;
-            status = RunAllConfigurations(num_dla, DeviceType::DLA);
+            if (num_eve == 4)
+            {
+                std::cout
+                 << "Running on 2 EVE devices instead of the available 4 "
+                 << "due to insufficient OpenCL global memory. Refer to the "
+                 << "TIDL API User's Guide, Frequently Asked Questions, "
+                 << "Section \"Insufficient OpenCL global memory\" for details "
+                 << "on increasing the amount of CMEM available for OpenCL."
+                 << std::endl;
+                num_eve = 2;
+            }
+            status = RunAllConfigurations(num_eve, DeviceType::EVE);
             status &= RunMultipleExecutors(
                      "testvecs/config/infer/tidl_config_j11_v2.txt",
                      "testvecs/config/infer/tidl_config_j11_cifar.txt",
-                     num_dla);
+                     num_eve);
         }
 
         if (num_dsp > 0)
@@ -243,7 +252,7 @@ bool RunAllConfigurations(int32_t num_devices, DeviceType device_type)
 {
     std::vector<std::string> configurations;
 
-    if (device_type == DeviceType::DLA)
+    if (device_type == DeviceType::EVE)
         configurations = {"dense_1x1",  "j11_bn", "j11_cifar",
                           "j11_controlLayers", "j11_prelu", "j11_v2",
                           "jseg21", "jseg21_tiscapes", "smallRoi", "squeeze1_1"};
@@ -259,7 +268,7 @@ bool RunAllConfigurations(int32_t num_devices, DeviceType device_type)
                                   + config + ".txt";
         std::cout << "Running " << config << " on " << num_devices
                   << " devices, type "
-                  << ((device_type == DeviceType::DLA) ? "EVE" : "DSP")
+                  << ((device_type == DeviceType::EVE) ? "EVE" : "DSP")
                   << std::endl;
 
         Configuration configuration;
@@ -367,7 +376,7 @@ void ProcessArgs(int argc, char *argv[], std::string& config_file,
                       break;
 
             case 't': if (*optarg == 'e')
-                          device_type = DeviceType::DLA;
+                          device_type = DeviceType::EVE;
                       else if (*optarg == 'd')
                           device_type = DeviceType::DSP;
                       else
@@ -404,7 +413,7 @@ void DisplayHelp()
                  "Optional arguments:\n"
                  " -c                   Path to the configuration file\n"
                  " -n <number of cores> Number of cores to use (1 - 4)\n"
-                 " -t <d|e>             Type of core. d -> DSP, e -> DLA\n"
+                 " -t <d|e>             Type of core. d -> DSP, e -> EVE\n"
                  " -v                   Verbose output during execution\n"
                  " -h                   Help\n";
 }
diff --git a/examples/test/multiple_executors.cpp b/examples/test/multiple_executors.cpp
index 173ee23..78a1789 100644
--- a/examples/test/multiple_executors.cpp
+++ b/examples/test/multiple_executors.cpp
@@ -161,7 +161,7 @@ void* run_network(void *data)
     {
         // Create a executor with the approriate core type, number of cores
         // and configuration specified
-        Executor executor(DeviceType::DLA, ids, configuration);
+        Executor executor(DeviceType::EVE, ids, configuration);
 
         const ExecutionObjects& execution_objects =
                                                 executor.GetExecutionObjects();
diff --git a/makefile b/makefile
new file mode 100644
index 0000000..e94e586
--- /dev/null
+++ b/makefile
@@ -0,0 +1,76 @@
+
+# Copyright (c) 2018 Texas Instruments Incorporated - http://www.ti.com/
+# All rights reserved.
+# 
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+# * Neither the name of Texas Instruments Incorporated nor the
+# names of its contributors may be used to endorse or promote products
+# derived from this software without specific prior written permission.
+# 
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+# THE POSSIBILITY OF SUCH DAMAGE.
+
+
+# makefile for building from the tidl-api git repo
+# Cross-compilation requires TARGET_ROOTDIR to be set.
+# E.g.
+# PSDK_LINUX=<path to Processor Linux SDK install>
+# TARGET_ROOTDIR=$PSDK_LINUX/linux-devkit/sysroots/armv7ahf-neon-linux-gnueabi
+
+ifneq (,$(findstring 86, $(shell uname -m)))
+DEST_DIR ?= $(CURDIR)/install/am57
+VIEWER_TARGET=x86
+ifeq ($(TARGET_ROOTDIR),)
+$(error Set TARGET_ROOTDIR to the ARM Linux devkit/filesystem)
+endif
+else
+VIEWER_TARGET=arm
+endif
+
+INSTALL_DIR_API = $(DEST_DIR)/usr/share/ti/tidl
+INSTALL_DIR_EXAMPLES = $(DEST_DIR)/usr/share/ti/examples/tidl
+
+CP_ARGS = -Prf
+ifneq (,$(findstring 86, $(shell uname -m)))
+CP_ARGS += --preserve=mode,timestamps --no-preserve=ownership
+endif
+
+build-api:
+	$(MAKE) -C tidl_api
+
+build-examples: install-api
+	$(MAKE) -C examples
+
+# Build HTML from Sphinx RST, requires Sphinx to be installed
+build-docs:
+	$(MAKE) -C docs
+
+install-api: build-api
+	mkdir -p $(INSTALL_DIR_API)
+	cp $(CP_ARGS) tidl_api $(INSTALL_DIR_API)/
+
+install-examples: build-examples
+	mkdir -p $(INSTALL_DIR_EXAMPLES)
+	cp $(CP_ARGS) examples/* $(INSTALL_DIR_EXAMPLES)/
+
+build-viewer:
+	$(MAKE) TARGET=$(VIEWER_TARGET) -C viewer
+
+clean:
+	$(MAKE) -C tidl_api	clean
+	$(MAKE) -C examples	clean
diff --git a/readme.md b/readme.md
index 91f447c..092f09b 100644
--- a/readme.md
+++ b/readme.md
@@ -1,4 +1,4 @@
 TI Deep Learning (TIDL) API
 ---------------------------
 
-TIDL API brings Deep Learning to the edge and enables Linux applications to leverage TIâs proprietary CNN/DNN implementation on Deep Learning Accelerators (DLAs) and C66x DSPs in AM57x SoCs.  It requires OpenCL v1.1.15.1 or newer. Refer the User's Guide for details: http://software-dl.ti.com/mctools/esd/docs/tidl-api/index.html
+TIDL API brings Deep Learning to the edge and enables Linux applications to leverage TI's proprietary CNN/DNN implementation on EVEs and C66x DSPs in AM57x SoCs.  It requires OpenCL v1.1.15.1 or newer. Refer to the User's Guide for details: http://software-dl.ti.com/mctools/esd/docs/tidl-api/index.html
diff --git a/tidl_api/inc/executor.h b/tidl_api/inc/executor.h
index 9c6238f..2b20eaf 100644
--- a/tidl_api/inc/executor.h
+++ b/tidl_api/inc/executor.h
@@ -44,14 +44,14 @@ namespace tidl {
 
 //! Enumerates types of devices available to offload the network.
 enum class DeviceType { DSP, /**< Offload to C66x DSP */
-                        DLA  /**< Offload to TI DLA */
+                        EVE  /**< Offload to TI EVE */
                       };
 
 //! Enumerates IDs for devices of a given type.
-enum class DeviceId : int { ID0=0, /**< DSP1 or DLA1 */
-                            ID1,   /**< DSP2 or DLA2 */
-                            ID2,   /**< DLA3 */
-                            ID3    /**< DLA4 */
+enum class DeviceId : int { ID0=0, /**< DSP1 or EVE1 */
+                            ID1,   /**< DSP2 or EVE2 */
+                            ID2,   /**< EVE3 */
+                            ID3    /**< EVE4 */
                           };
 
 //! Used to specify the set of devices available to an Executor
@@ -79,10 +79,10 @@ class Executor
         //!   Configuration configuration;
         //!   configuration.ReadFromFile("path to configuration file");
         //!   DeviceIds ids1 = {DeviceId::ID2, DeviceId::ID3};
-        //!   Executor executor(DeviceType::DLA, ids, configuration);
+        //!   Executor executor(DeviceType::EVE, ids1, configuration);
         //! @endcode
         //!
-        //! @param device_type DSP or EVE/DLA device
+        //! @param device_type DSP or EVE device
         //! @param ids Set of devices used by this instance of the Executor
         //! @param configuration Configuration used to initialize the Executor
         //! @param layers_group_id Layers group that this Executor should run
@@ -100,7 +100,7 @@ class Executor
 
         //! @brief Returns the number of devices of the specified type
         //! available for TI DL.
-        //! @param  device_type DSP or EVE/DLA device
+        //! @param  device_type DSP or EVE device
         //! @return number of devices available
         static uint32_t GetNumDevices(DeviceType device_type);
 
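The comments on `DeviceId` in the header above pair each ID with a core name (ID0 is DSP1 or EVE1, ID2 is EVE3, and so on). A minimal sketch of that mapping follows. The two enums are stand-ins that mirror `executor.h` so the snippet compiles on its own, and `DeviceName` is a hypothetical helper, not part of the TIDL API.

```cpp
#include <string>

// Stand-ins that mirror the enums in executor.h so the sketch compiles
// without the TIDL headers; real code would include "executor.h" instead.
enum class DeviceType { DSP, EVE };
enum class DeviceId : int { ID0 = 0, ID1, ID2, ID3 };

// Hypothetical helper (not part of the TIDL API): returns the core name
// given in the header comments, e.g. "EVE3", or "" for a pair that the
// comments do not list (DSP with ID2 or ID3).
std::string DeviceName(DeviceType type, DeviceId id)
{
    const int index = static_cast<int>(id);   // 0-based enum value
    const bool is_dsp = (type == DeviceType::DSP);
    const int count = is_dsp ? 2 : 4;         // IDs listed per type in the comments
    if (index < 0 || index >= count) return "";
    return std::string(is_dsp ? "DSP" : "EVE") + std::to_string(index + 1);
}
```

For example, `DeviceName(DeviceType::EVE, DeviceId::ID2)` yields `"EVE3"`, while a DSP paired with `ID2` yields an empty string because the header comments list only two DSP IDs.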
diff --git a/tidl_api/make.buildid b/tidl_api/make.buildid
index 4137d81..5de94ef 100644
--- a/tidl_api/make.buildid
+++ b/tidl_api/make.buildid
@@ -27,6 +27,7 @@
 MAJOR_VER=1
 MINOR_VER=1
 PATCH_VER=0
+BUILD_VER=2
 
 ifeq ($(shell git rev-parse --short HEAD 2>&1 1>/dev/null; echo $$?),0)
 BUILD_SHA?=$(shell git rev-parse --short HEAD)
@@ -34,6 +35,6 @@ endif
 
 .PHONY: $(BUILD_ID)
 BUILD_ID := -D_BUILD_VER=$(shell echo "" | \
-                awk '{ printf ("%02d.%02d.%02d", $(MAJOR_VER), \
-                $(MINOR_VER), $(PATCH_VER)); }') \
+                awk '{ printf ("%02d.%02d.%02d.%02d", $(MAJOR_VER), \
+                $(MINOR_VER), $(PATCH_VER), $(BUILD_VER)); }') \
 			-D_BUILD_SHA=$(BUILD_SHA)
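The `make.buildid` change above extends the version string from three zero-padded fields to four by adding `BUILD_VER`. The sketch below reproduces the same `printf` format in C++ to show the resulting string. `FormatBuildVersion` is a hypothetical helper used only for illustration; the build itself uses the awk one-liner.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the TIDL API): formats a four-field
// version with the same "%02d.%02d.%02d.%02d" pattern that the awk printf
// in make.buildid uses.
std::string FormatBuildVersion(int major, int minor, int patch, int build)
{
    char buf[64];  // large enough for four ints plus separators
    std::snprintf(buf, sizeof(buf), "%02d.%02d.%02d.%02d",
                  major, minor, patch, build);
    return std::string(buf);
}
```

With the values set in this diff (`MAJOR_VER=1`, `MINOR_VER=1`, `PATCH_VER=0`, `BUILD_VER=2`), `FormatBuildVersion(1, 1, 0, 2)` returns `"01.01.00.02"`.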
diff --git a/tidl_api/src/executor.cpp b/tidl_api/src/executor.cpp
index 0e329cf..6283a98 100644
--- a/tidl_api/src/executor.cpp
+++ b/tidl_api/src/executor.cpp
@@ -84,7 +84,7 @@ ExecutorImpl::ExecutorImpl(DeviceType core_type, const DeviceIds& ids,
     std::string name;
     if (core_type_m == DeviceType::DSP)
         name  = "";
-    else if (core_type_m == DeviceType::DLA)
+    else if (core_type_m == DeviceType::EVE)
         name = STRING(SETUP_KERNEL) ";" STRING(INIT_KERNEL) ";" STRING(PROCESS_KERNEL) ";" STRING(CLEANUP_KERNEL);
 
     device_m = Device::Create(core_type_m, ids, name);
diff --git a/tidl_api/src/ocl_device.cpp b/tidl_api/src/ocl_device.cpp
index 2e0b0bd..a3853a5 100644
--- a/tidl_api/src/ocl_device.cpp
+++ b/tidl_api/src/ocl_device.cpp
@@ -48,7 +48,7 @@ Device::Device(cl_device_type t, const DeviceIds& ids):
 {
     TRACE::print("\tOCL Device: %s created\n",
               device_type_m == CL_DEVICE_TYPE_ACCELERATOR ? "DSP" :
-              device_type_m == CL_DEVICE_TYPE_CUSTOM ? "DLA" : "Unknown");
+              device_type_m == CL_DEVICE_TYPE_CUSTOM ? "EVE" : "Unknown");
 
     for (int i = 0; i < MAX_DEVICES; i++)
         queue_m[i] = nullptr;
@@ -465,7 +465,7 @@ Device::Ptr Device::Create(DeviceType core_type, const DeviceIds& ids,
     Device::Ptr p(nullptr);
     if (core_type == DeviceType::DSP)
         p.reset(new DspDevice(ids, name));
-    else if (core_type == DeviceType::DLA)
+    else if (core_type == DeviceType::EVE)
         p.reset(new EveDevice(ids, name));
 
     return p;
@@ -503,7 +503,7 @@ uint32_t Device::GetNumDevices(DeviceType device_type)
     if (!PlatformIsAM57()) return 0;
 
     // Convert DeviceType to OpenCL device type
-    cl_device_type t = (device_type == DeviceType::DLA) ?
+    cl_device_type t = (device_type == DeviceType::EVE) ?
                                     CL_DEVICE_TYPE_CUSTOM :
                                     CL_DEVICE_TYPE_ACCELERATOR;