Added graph viewer for TIDL API timestamp data
authorAjay Jayaraj <ajayj@ti.com>
Wed, 24 Oct 2018 14:40:08 +0000 (09:40 -0500)
committerAjay Jayaraj <ajayj@ti.com>
Thu, 25 Oct 2018 20:55:48 +0000 (15:55 -0500)
1. Created a python matplotlib based utility for viewing timestamp data
   generated from the TIDL API (viewer/execution_graph.py)
2. Minor updates to API internals to add ExecutionObject type, index to
   timestamp output

(MCT-1073)

14 files changed:
docs/source/changelog.rst
docs/source/conf.py
docs/source/images/execution_graph_1eo.png [new file with mode: 0644]
docs/source/viewer.rst
examples/one_eo_per_frame/main.cpp
examples/two_eo_per_frame_opt/main.cpp
tidl_api/inc/execution_object.h
tidl_api/src/execution_object.cpp
tidl_api/src/executor.cpp
tidl_api/src/ocl_device.cpp
tidl_api/src/ocl_device.h
tidl_api/src/util.cpp
tidl_api/src/util.h
viewer/execution_graph.py [new file with mode: 0755]

index 93ddfedb5e3c0a7d140e8ccde794cfa5e4bc430e..17fbe83b416c4a6569b025aaa3a11bf44cef9fcf 100644 (file)
@@ -6,21 +6,44 @@ Changelog
 ===============================
 **Added**
 
-* Python 3 bindings for TIDL API
+#. Execution Graph generation
+
+    Enable a two phase approach to generating execution graphs. Use the
+    following API function to enable timestamp generation:
+
+    .. code::
+
+        bool EnableTimeStamps(const std::string& file = "timestamp.log", size_t num_frames=32);
+
+    The generated log file can be viewed by using the execution_graph.py script. Refer to :ref:`execution-graph` for details.
+
+#. Python 3 bindings for TIDL API
 
 **Removed**
 
-* Configuration::enableInternalInput, not used.
-* Execution::GetExecutionObjects. Use Execution::operator[] and Execution::GetNumExecutionObjects() instead. See :ref:`examples` for usage.
+#. Configuration::enableInternalInput. Not used by the API.
+
+#. Execution::GetExecutionObjects().
+
+   Use Execution::operator[] and Execution::GetNumExecutionObjects() instead.
+   See :ref:`examples` for usage.
+
+#. The timing methods for host execution in EOPs and EOs:
+
+   * GetProcessTimeInMilliSeconds()
+   * GetHostProcessTimeInMilliSeconds()
+
+   These methods were replaced by a timestamp based approach because they were
+   no longer accurate with multiple ExecutionObject contexts and pipelining.
 
 1.1.0 [Processor Linux SDK 5.1]
 ===============================
 **Added**
 
-* :term:`ExecutionObjectPipeline` class to hide complexity of executing network across C66x/EVE
-* API methods for tracing outputs from intermediate network layers - see :ref:`network_layer_output`.
-* Support for updating layer group id assignment before execution - see :ref:`layer-group-override`.
-* Provide feedback to the user on parameter and network heap size requirements - see :ref:`sizing_device_heaps`.
+#. :term:`ExecutionObjectPipeline` class to hide complexity of executing network across C66x/EVE
+#. API methods for tracing outputs from intermediate network layers - see :ref:`network_layer_output`.
+#. Support for updating layer group id assignment before execution - see :ref:`layer-group-override`.
+#. Provide feedback to the user on parameter and network heap size requirements - see :ref:`sizing_device_heaps`.
 
 
 1.0.0 [Processor Linux SDK 5.0]
@@ -29,15 +52,16 @@ First release of the TI Deep Learning API. TIDL API brings deep learning to the
 
 **Supported AM57x Sitara Processors**
 
+ * `AM5749`_ (offload to EVEs and C66x DSPs)
  * `AM571x`_ (offload to C66x DSPs)
  * `AM5728`_ (offload to C66x DSPs)
- * `AM574x`_ (offload to EVEs and C66x DSPs)
+ * `AM5748`_ (offload to C66x DSPs)
 
 **Supported Evaluation Modules (EVMs)**
 
+ * `AM574x IDK EVM`_
  * `AM572x EVM`_
  * `AM571x IDK EVM`_
- * `AM574x IDK EVM`_
 
 
 .. _AM572x EVM:  http://www.ti.com/tool/tmdsevm572x
@@ -45,4 +69,6 @@ First release of the TI Deep Learning API. TIDL API brings deep learning to the
 .. _AM574x IDK EVM:  http://www.ti.com/tool/tmdsidk574
 .. _AM571x:     http://www.ti.com/processors/sitara/arm-cortex-a15/am57x/products.html#p2098=1%20C66x&p809=2;2
 .. _AM5728:     http://www.ti.com/product/AM5728
+.. _AM5748:     http://www.ti.com/product/am5748
+.. _AM5749:     http://www.ti.com/product/am5749
 .. _AM574x:     http://www.ti.com/processors/sitara/arm-cortex-a15/am57x/products.html#p2098=2%20C66x&p815=ECC
index f01f94295767c39df43401d50f311badda5c501d..203aabfa69bedab3af25eee9916771986b5e15c6 100644 (file)
@@ -60,9 +60,9 @@ copyright = u'2018, Texas Instruments Incorporated'
 # built documents.
 #
 # The short X.Y version.
-version = '1.1'
+version = '1.2.x'
 # The full version, including alpha/beta/rc tags.
-release = '1.1.0'
+release = '1.2.x'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
diff --git a/docs/source/images/execution_graph_1eo.png b/docs/source/images/execution_graph_1eo.png
new file mode 100644 (file)
index 0000000..86759a3
Binary files /dev/null and b/docs/source/images/execution_graph_1eo.png differ
index bae591d1a8da0430f58bf983cdd4db65f0a1052a..280a7c756ef940097dc2ce2f6a1f17e0b67fe1cf 100644 (file)
@@ -1,6 +1,9 @@
-**************
-Network Viewer
-**************
+*******
+Viewers
+*******
+
+Network graph viewer
+++++++++++++++++++++
 
 The TIDL network viewer utility, ``tidl_viewer``, can be used to view the network graph. If the ``dot`` utility is available, ``tidl_viewer`` uses it to convert the dot file to svg. ``dot`` is included in the ``Graphviz`` Ubuntu package and typically installed to ``/usr/bin/dot``.
 
@@ -25,3 +28,45 @@ On x86/Linux, if ``/usr/bin/dot`` is available, ``tidl_viewer`` also generates a
 
     * `Graphviz <https://www.graphviz.org/>`_
     * `Ubuntu Graphviz package <https://packages.ubuntu.com/search?keywords=graphviz>`_
+
+
+
+.. _execution-graph:
+
+Execution Graph viewer
+++++++++++++++++++++++
+
+TIDL API v1.2 enables generation of a log file with timestamps for the following methods in the :term:`ExecutionObjectPipeline` and :term:`ExecutionObject` classes:
+
+* ProcessFrameStartAsync (start)
+* ProcessFrameStartAsync (end)
+* ProcessFrameWait (start)
+* ProcessFrameWait (end)
+
+Add the following call to the TIDL API application to enable timestamp generation:
+
+    .. code:: c++
+
+        bool EnableTimeStamps(const std::string& file, size_t num_frames);
+
+The generated log file can be viewed on x86/Linux by using the ``execution_graph.py`` script located at ``viewer/execution_graph.py``.
+
+    .. code:: bash
+
+        -> <install path>/viewer/execution_graph.py timestamp.log
+
+:numref:`Execution Graph Output` shows the output of ``execution_graph.py`` for the ``one_eo_per_frame`` example when run on an AM5749 with 2 EVEs and 2 C66x DSPs:
+
+.. _`Execution Graph Output`:
+
+.. figure:: images/execution_graph_1eo.png
+    :align: center
+    :width: 80%
+    :alt: Output from execution_graph.py
+
+    Graph of frame index vs. execution time generated by execution_graph.py (click to enlarge)
+
+
+.. note::
+    ``execution_graph.py`` requires the Python `Matplotlib <https://matplotlib.org/users/installing.html>`_ module.
+
index 36197c82905040efc85dbb0eaeba9e817abdfd1b..8872c06438d3aed18c54d8ebbcbd0bcf6fa320cd 100644 (file)
@@ -74,6 +74,11 @@ int main(int argc, char *argv[])
 
     unique_ptr<const char> reference_output(ReadReferenceOutput(ref_file));
 
+    // Enable time stamp generation. The timestamp file is post processed
+    // by execution_graph.py to generate graphical view of frame execution.
+    // Refer to the User's Guide for details.
+    EnableTimeStamps("1eo.log");
+
     bool status = Run(config_file, num_eve, num_dsp, reference_output.get());
 
     if (!status)
index 659485e1dbf7e1c053593a2bc4a2bad6d5cb6dd0..43310c034f1596b0c028c1889c9e7c1cfb800e9f 100644 (file)
@@ -69,6 +69,11 @@ int main(int argc, char *argv[])
     string ref_file ="../test/testvecs/reference/j11_v2_ref.bin";
     unique_ptr<const char> reference_output(ReadReferenceOutput(ref_file));
 
+    // Enable time stamp generation. The timestamp file is post processed
+    // by execution_graph.py to generate graphical view of frame execution.
+    // Refer to the User's Guide for details.
+    EnableTimeStamps("2eo_opt.log");
+
     bool status = Run(num_eve, num_dsp, reference_output.get());
 
     if (!status)
index 5abe33d15370315ebc388dbb49ef8626218d70a0..d3fb16cb1bc24236c313e616d048d1b024028499 100644 (file)
@@ -52,7 +52,7 @@ class ExecutionObject : public ExecutionObjectInternalInterface
 
         //! @private
         // Used by the Executor to construct an ExecutionObject
-        ExecutionObject(Device* d, uint8_t device_index,
+        ExecutionObject(Device* d, DeviceType t, uint8_t device_index,
                         const  ArgInfo& create_arg,
                         const  ArgInfo& param_heap_arg,
                         const  Configuration& configuration,
index b6292da6761a35f962707839f01f7065a39d09cc..95c7aa25ea8034a78d64437977c82b382f351c78 100644 (file)
@@ -49,7 +49,7 @@ using namespace tidl;
 class ExecutionObject::Impl
 {
     public:
-        Impl(Device* d, uint8_t device_index,
+        Impl(Device* d, DeviceType t, uint8_t device_index,
              const DeviceArgInfo& create_arg,
              const DeviceArgInfo& param_heap_arg,
              const Configuration& configuration,
@@ -65,12 +65,21 @@ class ExecutionObject::Impl
         void AcquireContext(uint32_t& context_idx);
         void ReleaseContext(uint32_t  context_idx);
 
+        // Trace related
+        void WriteLayerOutputsToFile (const std::string& filename_prefix) const;
+        const LayerOutput* GetOutputFromLayer (uint32_t layer_index,
+                                               uint32_t output_index) const;
+        const LayerOutputs* GetOutputsFromAllLayers() const;
+
+
         Device*                         device_m;
+        DeviceType                      device_type_m;
+
         // Index of the OpenCL device/queue used by this EO
         uint8_t                         device_index_m;
         std::string                     device_name_m;
 
-        up_malloc_ddr<char>             tidl_extmem_heap_m;
+        up_malloc_ddr<char>                      tidl_extmem_heap_m;
         up_malloc_ddr<OCL_TIDL_InitializeParams> shared_initialize_params_m;
         up_malloc_ddr<OCL_TIDL_ProcessParams>    shared_process_params_m;
 
@@ -85,13 +94,6 @@ class ExecutionObject::Impl
         // LayersGroupId being processed by the EO
         int layers_group_id_m;
 
-        // Trace related
-        void WriteLayerOutputsToFile (const std::string& filename_prefix) const;
-
-        const LayerOutput* GetOutputFromLayer (uint32_t layer_index,
-                                               uint32_t output_index) const;
-        const LayerOutputs* GetOutputsFromAllLayers() const;
-
         uint32_t                          num_network_layers_m;
         up_malloc_ddr<OCL_TIDL_BufParams> trace_buf_params_m;
         size_t                            trace_buf_params_sz_m;
@@ -120,12 +122,13 @@ class ExecutionObject::Impl
 };
 
 
-ExecutionObject::ExecutionObject(Device* d,
-                                 uint8_t device_index,
-                                 const   ArgInfo& create_arg,
-                                 const   ArgInfo& param_heap_arg,
-                                 const   Configuration& configuration,
-                                 int     layers_group_id)
+ExecutionObject::ExecutionObject(Device*    d,
+                                 DeviceType t,
+                                 uint8_t    device_index,
+                                 const      ArgInfo& create_arg,
+                                 const      ArgInfo& param_heap_arg,
+                                 const      Configuration& configuration,
+                                 int        layers_group_id)
 {
     TRACE::print("-> ExecutionObject::ExecutionObject()\n");
 
@@ -133,7 +136,7 @@ ExecutionObject::ExecutionObject(Device* d,
     DeviceArgInfo param_heap_arg_d(param_heap_arg, DeviceArgInfo::Kind::BUFFER);
 
     pimpl_m = std::unique_ptr<ExecutionObject::Impl>
-              { new ExecutionObject::Impl(d, device_index,
+              { new ExecutionObject::Impl(d, t, device_index,
                                           create_arg_d,
                                           param_heap_arg_d,
                                           configuration,
@@ -142,12 +145,13 @@ ExecutionObject::ExecutionObject(Device* d,
 }
 
 
-ExecutionObject::Impl::Impl(Device* d, uint8_t device_index,
+ExecutionObject::Impl::Impl(Device* d, DeviceType t, uint8_t device_index,
                             const DeviceArgInfo& create_arg,
                             const DeviceArgInfo& param_heap_arg,
                             const Configuration& configuration,
                             int    layers_group_id):
     device_m(d),
+    device_type_m(t),
     device_index_m(device_index),
     tidl_extmem_heap_m (nullptr, &__free_ddr),
     shared_initialize_params_m(nullptr, &__free_ddr),
@@ -569,7 +573,9 @@ bool ExecutionObject::Impl::RunAsync(CallType ct, uint32_t context_idx)
         {
             RecordEvent(current_frame_idx_m[context_idx],
                         (layers_group_id_m == 1) ? TimeStamp::EO1_PFSA_START:
-                                                   TimeStamp::EO2_PFSA_START);
+                                                   TimeStamp::EO2_PFSA_START,
+                        static_cast<int>(device_type_m),
+                        device_index_m);
 
             OCL_TIDL_ProcessParams *p_params = shared_process_params_m.get()
                                                + context_idx;
@@ -617,15 +623,11 @@ bool ExecutionObject::Impl::Wait(CallType ct, uint32_t context_idx)
         }
         case CallType::PROCESS:
         {
-            float host_elapsed_ms = 0.0f;
-            float *p_host_elapsed_ms = tidl::internal::NUM_CONTEXTS == 1 ?
-                                       &host_elapsed_ms : nullptr;
-
             RecordEvent(current_frame_idx_m[context_idx],
                         (layers_group_id_m == 1) ? TimeStamp::EO1_PFW_START:
                                                    TimeStamp::EO2_PFW_START);
 
-            bool has_work = k_process_m->Wait(p_host_elapsed_ms, context_idx);
+            bool has_work = k_process_m->Wait(context_idx);
             if (has_work)
             {
                 OCL_TIDL_ProcessParams *p_params = shared_process_params_m.get()
index 2535f90aef11625baea0d5818d7ddecdbd3489d5..ca5969623d8f90f9f41eeba07d4be0be8d4a2246 100644 (file)
@@ -153,7 +153,7 @@ bool ExecutorImpl::Initialize(const Configuration& configuration)
         uint8_t index = static_cast<uint8_t>(ids);
         execution_objects_m.push_back(
              unique_ptr<ExecutionObject>
-             {new ExecutionObject(device_m.get(), index,
+             {new ExecutionObject(device_m.get(), core_type_m, index,
                                   create_arg, param_heap_arg,
                                   configuration_m,
                                   layers_group_id_m)} );
index 74e1150263b6d4a0ca1dcd3225e6de16e26aae5d..8e6d16b6bbe0f7bff49a901cca6d918368cd1490 100644 (file)
@@ -262,14 +262,16 @@ bool EveDevice::BuildProgramFromBinary(const std::string& kernel_names,
 
 Kernel::Kernel(Device* device, const std::string& name,
                const KernelArgs& args, uint8_t device_index):
-           name_m(name), device_m(device), device_index_m(device_index),
-           num_running_contexts_m(0)
+           name_m(name), device_m(device), device_index_m(device_index)
 {
     TRACE::print("Creating kernel %s\n", name.c_str());
     cl_int err;
     kernel_m = clCreateKernel(device_m->program_m, name_m.c_str(), &err);
     errorCheck(err, __LINE__);
 
+    for (int i=0; i < tidl::internal::NUM_CONTEXTS; i++)
+        event_m[i] = nullptr;
+
     int arg_index = 0;
     for (const auto& arg : args)
     {
@@ -319,38 +321,26 @@ Kernel& Kernel::RunAsync(uint32_t context_idx)
     cl_int ret = clEnqueueTask(device_m->queue_m[device_index_m],
                                kernel_m, 0, 0, &event_m[context_idx]);
     errorCheck(ret, __LINE__);
-    __sync_fetch_and_add(&num_running_contexts_m, 1);
 
     return *this;
 }
 
-
-bool Kernel::Wait(float *host_elapsed_ms, uint32_t context_idx)
+bool Kernel::Wait(uint32_t context_idx)
 {
     // Wait called without a corresponding RunAsync
-    if (num_running_contexts_m == 0)
+    if (event_m[context_idx] == nullptr)
         return false;
 
     TRACE::print("\tKernel: waiting context %d...\n", context_idx);
     cl_int ret = clWaitForEvents(1, &event_m[context_idx]);
     errorCheck(ret, __LINE__);
 
-    if (host_elapsed_ms != nullptr)
-    {
-        cl_ulong t_que, t_end;
-        clGetEventProfilingInfo(event_m[context_idx],
-                                CL_PROFILING_COMMAND_QUEUED,
-                                sizeof(cl_ulong), &t_que, nullptr);
-        clGetEventProfilingInfo(event_m[context_idx], CL_PROFILING_COMMAND_END,
-                                sizeof(cl_ulong), &t_end, nullptr);
-        *host_elapsed_ms = (t_end - t_que) / 1.0e6;  // nano to milli seconds
-    }
-
     ret = clReleaseEvent(event_m[context_idx]);
     errorCheck(ret, __LINE__);
+    event_m[context_idx] = nullptr;
+
     TRACE::print("\tKernel: finished execution\n");
 
-    __sync_fetch_and_sub(&num_running_contexts_m, 1);
     return true;
 }
 
@@ -365,7 +355,9 @@ void EventCallback(cl_event event, cl_int exec_status, void *user_data)
 
 bool Kernel::AddCallback(void *user_data, uint32_t context_idx)
 {
-    if (num_running_contexts_m == 0)  return false;
+    if (event_m[context_idx] == nullptr)
+        return false;
+
     return clSetEventCallback(event_m[context_idx], CL_COMPLETE, EventCallback,
                               user_data) == CL_SUCCESS;
 }
index 7b627e619ce5efce189798fd11e612a9b5b7584c..773a27e3900fce547e9ebd42c17cd5d9987effeb 100644 (file)
@@ -145,7 +145,7 @@ class Kernel
 
         bool UpdateScalarArg(uint32_t index, size_t size, const void *value);
         Kernel& RunAsync(uint32_t context_idx = 0);
-        bool Wait(float *host_elapsed_ms = nullptr, uint32_t context_idx = 0);
+        bool Wait(uint32_t context_idx = 0);
         bool AddCallback(void *user_data, uint32_t context_idx = 0);
 
     private:
@@ -156,7 +156,6 @@ class Kernel
 
         Device*             device_m;
         uint8_t             device_index_m;
-        uint32_t            num_running_contexts_m;
 };
 
 
index 711b289ec4977a952f8b9979151fe0a8dc4387b9..2c0b3727b33fbad09058836f2733c54e28a1e0a3 100644 (file)
@@ -64,12 +64,24 @@ TimeStamp::TimeStamp(const std::string& file, int num_entries):
     std::memset(entries_m, 0, sizeof(Entry)*num_entries_m);
 }
 
-void TimeStamp::Update(int frame_idx, EventKind k)
+void TimeStamp::Update(int frame_idx, EventKind k, int type, int id)
 {
     int idx = frame_idx % num_entries_m;
     entries_m[idx].frame_idx = frame_idx;
     entries_m[idx].timestamp[k] = duration_cast<microseconds>
                  (high_resolution_clock::now().time_since_epoch()).count();
+
+    if (k == EO1_PFSA_START)
+    {
+        entries_m[idx].eo1_id   = id;
+        entries_m[idx].eo1_type = type;
+    }
+    else if (k == EO2_PFSA_START)
+    {
+        entries_m[idx].eo2_id   = id;
+        entries_m[idx].eo2_type = type;
+    }
+
 }
 
 TimeStamp::~TimeStamp()
@@ -103,7 +115,21 @@ TimeStamp::~TimeStamp()
 
                     default:             ofs << "UNKNOWN"; break;
                 }
-                ofs << "," << entries_m[i].timestamp[j] << std::endl;
+                ofs << "," << entries_m[i].timestamp[j];
+
+                if (j == EO1_PFSA_START)
+                {
+                    ofs << "," << entries_m[i].eo1_type
+                        << "," << entries_m[i].eo1_id;
+                }
+                else if (j == EO2_PFSA_START)
+                {
+                    ofs << "," << entries_m[i].eo2_type
+                        << "," << entries_m[i].eo2_id;
+                }
+
+
+                ofs << std::endl;
             }
 
     ofs.close();
@@ -112,13 +138,15 @@ TimeStamp::~TimeStamp()
 }
 
 
-void tidl::RecordEvent(int frame_idx, TimeStamp::EventKind k)
+void tidl::RecordEvent(int frame_idx, TimeStamp::EventKind k,
+                       int eo_type, int eo_id)
 {
     TimeStamp* t = tidl_api_timestamps.get();
     if (t)
-        t->Update(frame_idx, k);
+        t->Update(frame_idx, k, eo_type, eo_id);
 }
 
+
 std::size_t tidl::GetBinaryFileSize(const std::string &F)
 {
     std::ifstream is;
index d40d4c9be780b676b14287e3934f9c7181c22f1f..368c2c552f5984c408b44f0e4d7747e6393a8a71 100644 (file)
@@ -53,6 +53,10 @@ class TimeStamp
                          NUM_EVENTS };
         struct Entry
         {
+            unsigned int       eo1_type;
+            unsigned int       eo1_id;
+            unsigned int       eo2_type;
+            unsigned int       eo2_id;
             unsigned long long frame_idx;
             unsigned long long timestamp[EventKind::NUM_EVENTS];
         };
@@ -60,7 +64,7 @@ class TimeStamp
 
         TimeStamp(const std::string& file, int num_entries);
         ~TimeStamp();
-        void Update(int frame_idx, EventKind k);
+        void Update(int frame_idx, EventKind k, int type=0, int id=0);
 
     private:
         Entry*            entries_m;
@@ -69,6 +73,7 @@ class TimeStamp
 };
 
 
-void RecordEvent(int frame_idx, TimeStamp::EventKind k);
+void RecordEvent(int frame_idx, TimeStamp::EventKind k,
+                 int eo_type=0, int eo_id=0);
 
 } // namespace tidl
diff --git a/viewer/execution_graph.py b/viewer/execution_graph.py
new file mode 100755 (executable)
index 0000000..bc320e2
--- /dev/null
@@ -0,0 +1,519 @@
+#!/usr/bin/env python
+
+# Copyright (c) 2018 Texas Instruments Incorporated - http://www.ti.com/
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+# * Neither the name of Texas Instruments Incorporated nor the
+# names of its contributors may be used to endorse or promote products
+# derived from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+# THE POSSIBILITY OF SUCH DAMAGE.
+
+""" Frame Execution Graph
+
+This script parses timestamp output generated from the TIDL API and
+displays a graph of frame execution.
+
+The script requires `matplotlib` to be installed.
+"""
+
+import argparse
+import matplotlib.pyplot as mpyplot
+import matplotlib.patches as mpatch
+
+USAGE = """
+    Display frame execution using trace data generated by the TIDL API.
+"""
+
+# Supported TIDL API classes
+COMPONENTS = ('eop', 'eo1', 'eo2')
+
+# APIs with timestamps
+# ProcessFrameStartAsync, ProcessFrameWait, RunAsyncNext
+KEY_PFSA = 'pfsa'
+KEY_PFW = 'pfw'
+KEY_RAN = 'ran'
+
+KEY_START = 'start'
+KEY_END = 'end'
+
+BARH_COLORS = ('lightgray', 'green', 'blue', 'yellow',
+               'black', 'orange', 'red', 'cyan')
+
+BARH_LEGEND_STR = {'eop' : 'ExecutionObjectPipeline',
+                   'eo1' : 'ExecutionObject 0',
+                   'eo2' : 'ExecutionObject 1',
+                   'pfw' : 'Process Frame Wait',
+                   'pfsa': 'Process Frame Start Async',
+                   'ran' : 'Run Async Next',
+                   'total': 'Total frame time'}
+
+class Range:
+    """ Defines a range in terms of start and stop timestamps. """
+
+    def __init__(self):
+        self.start = 0
+        self.end = 0
+
+    def set_start(self, start):
+        """Set the start time."""
+
+        self.start = start
+
+    def set_end(self, end):
+        """ Set the end time."""
+
+        self.end = end
+
+    def subtract(self, val):
+        """ Subtracl val from each start/end."""
+
+        self.start -= val
+        self.end -= val
+
+    def get_range(self):
+        """Return (start, duration)."""
+
+        return (self.start, self.end-self.start)
+
+
+class FrameInfo:
+    """ All recorded events for a single frame. """
+
+    def __init__(self, index):
+        """Set up dictionaries to store timestamp data"""
+
+        self.index = index
+        self.eo_type = {}
+        self.eo_id = {}
+
+        self.data = {}
+        for component in COMPONENTS:
+            self.data[component] = {}
+
+    def update(self, component, api, phase, val):
+        """Update the [c][api][phase] with timestamp"""
+
+        if component not in COMPONENTS:
+            print('Invalid component: {}'.format(component))
+            return
+
+        if api not in self.data[component]:
+            self.data[component][api] = Range()
+
+        if phase == KEY_START:
+            self.data[component][api].set_start(val)
+        elif phase == KEY_END:
+            self.data[component][api].set_end(val)
+        else:
+            raise Exception('Invalid key: {}'.format(phase))
+
+    def update_eo_info(self, component, eo_type, eo_id):
+        """Set the Execution Object type and index. Used to generate
+           the EVEx or DSPx labels"""
+
+        self.eo_type[component] = eo_type
+        self.eo_id[component] = eo_id
+
+    def eo_info(self):
+        """Return a string corresponding to the EO info. Device numbering in
+           TRM starts at 1. E.g. EVE1, EVE2, DSP1, DSP2 etc.
+        """
+
+        device_list = []
+        for component in ('eo1', 'eo2'):
+            device = ""
+            if not self.data[component]:
+                continue
+
+            # Corresponds to DeviceType enum in inc/executor.h
+            if self.eo_type[component] == 0:
+                device += 'DSP'
+            else:
+                device += 'EVE'
+
+            device += str(self.eo_id[component]+1)
+
+            device_list.append(device)
+
+        return '+'.join(device_list)
+
+    def get_range(self, component, api):
+        """Return the range for the specified component:api combination"""
+
+        if api in self.data[component]:
+            return self.data[component][api].get_range()
+
+        return None
+
+    def get_max_range(self):
+        """Return a tuple corresponding to the maximum range"""
+
+        return (self.min(), self.max() - self.min())
+
+    def get_total(self, component):
+        """Return the range for the specified component"""
+
+        if not self.data[component]:
+            print("{} not available".format(component))
+            return None
+
+        start = self.data[component][KEY_PFSA].start
+        end = self.data[component][KEY_PFW].end
+        return (start, int(end-start))
+
+    def min(self):
+        """ Return the lowest timestamp for a frame"""
+
+        vals = []
+        for component in self.data:
+            for api in self.data[component]:
+                vals.append(self.data[component][api].start)
+
+        return min(vals)
+
+    def max(self):
+        """ Return the highest timestamp for a frame"""
+        vals = []
+        for component in self.data:
+            for api in self.data[component]:
+                vals.append(self.data[component][api].end)
+
+        return max(vals)
+
+    def subtract(self, val):
+        """Subtract val from every timestamp in this frame. Use to
+           adjust start of frame 0 to 0.
+        """
+
+        for component in self.data:
+            for api in self.data[component]:
+                self.data[component][api].subtract(val)
+
+    def get_plot_ranges(self, components):
+        """ Return a list of (component, range) tuples for all api data
+            available for the specified components.
+        """
+
+        ranges = []
+        for component in self.data:
+            if component not in components:
+                continue
+
+            for api in self.data[component]:
+                range_tuple = self.get_range(component, api)
+                if range_tuple is not None:
+                    label = component + ':' + api
+                    ranges.append((label, range_tuple))
+
+        # Reverse sort by duration (improves graph rendering)
+        ranges.sort(key=lambda kv: kv[1][1], reverse=True)
+
+        return ranges
+
+    def get_plot_totals(self):
+        """ Return (component, range) tuples for all available components.
+            The range corresponds to the execution time for the entire
+            component.
+        """
+
+        ranges = []
+        for component in self.data:
+            range_tuple = self.get_total(component)
+            if range_tuple is not None:
+                ranges.append((component, range_tuple))
+
+        # Sort by start time stamp
+        ranges.sort(key=lambda kv: kv[1][0])
+
+        return ranges
+
+    def get_barh_ranges(self, found, verbosity):
+        """Return a list of range tuples for plotting. Also return
+           corresponding labels"""
+
+        label_range_tuples = []
+
+        if verbosity == 0:
+            if not found['eo2']:
+                if found['eop']:
+                    component = 'eop'
+                else:
+                    component = 'eo1'
+                label_range_tuples.append(('total', self.get_max_range()))
+                label_range_tuples.extend(self.get_plot_ranges((component)))
+            else:
+                label_range_tuples = self.get_plot_totals()
+
+        elif verbosity == 1:
+            label_range_tuples.append(('total', self.get_max_range()))
+            for component in COMPONENTS:
+                label_range_tuples.extend(self.get_plot_ranges(component))
+
+        labels = [i[0] for i in label_range_tuples]
+        ranges = [i[1] for i in label_range_tuples]
+
+        return ranges, labels
+
+    def __repr__(self):
+        string = '<FI:'
+
+        # Sort components based in order in COMPONENTS
+        components = list(self.data.keys())
+        components.sort(key=COMPONENTS.index)
+        #components.sort(key=lambda kv: COMPONENTS.index(kv))
+
+        for component in components:
+            if not self.data[component]:
+                continue
+
+            string += ' {} ['.format(component)
+
+            # Sort Ranges by start values
+            items = sorted(self.data[component].items(),
+                           key=lambda kv: kv[1].start)
+            for api in items:
+                string += '{}: {} '.format(api[0],
+                                           self.get_range(component, api[0]))
+            string += ']'
+
+        string += '>'
+
+        return string
+
+
+class Frames:
+    """ Per-frame data across a set of frames """
+
+    def __init__(self, v):
+        self.trace_data = {}
+        self.found = {}
+        for component in COMPONENTS:
+            self.found[component] = False
+        self.verbosity = v
+
+    def update(self, index, key, val):
+        """ Update event data for a frame """
+
+        # Canonicalize key to lower case
+        component, api, phase = key.split(':')
+
+        if index not in self.trace_data:
+            self.trace_data[index] = FrameInfo(index)
+
+        self.trace_data[index].update(component, api, phase, val)
+
+        self.found[component] = True
+
+    def adjust_to_zero(self):
+        """ Adjust timestamp sequence to start at 0"""
+
+        # Get a sorted list of frame indices
+        indices = self.indices()
+
+        # Find the smallest timestamp in the 0th frame
+        min_val = self.trace_data[indices[0]].min()
+
+        # Use this value to adjust all the timestamps
+        for i in indices:
+            self.trace_data[i].subtract(min_val)
+
+    def len(self):
+        """Returns the number of frames available."""
+
+        return len(self.trace_data)
+
+    def indices(self):
+        """ Returns a sorted list of frame indices."""
+
+        indices = list(self.trace_data.keys())
+        indices.sort()
+        return indices
+
+    def min_max_indices(self):
+        """ Returns the minimum and maximum frame index."""
+
+        indices = list(self.trace_data.keys())
+        return (min(indices), max(indices))
+
+    def microseconds_per_frame(self):
+        """Returns the average microseconds to execute a frame."""
+
+        # Get a sorted list of frame indices
+        indices = self.indices()
+
+        num_frames = len(indices)
+
+        # Find the smallest timestamp in the 0th frame
+        min_val = self.trace_data[indices[0]].min()
+
+        # Find the largest timestamp in the last frame
+        max_val = self.trace_data[indices[-1]].max()
+
+        return (max_val - min_val)/num_frames
+
+
+    def __repr__(self):
+        string = 'Frames:\n'
+        indices = self.indices()
+        for i in indices:
+            string += str(self.trace_data[i])
+            string += "\n"
+
+        return string
+
+
+def read_data(args):
+    """ Read a sequence of trace data and create a Frames object
+        Each row has the following syntax: frame_index, key, value
+        There is a row for each event that is recorded for a frame
+        E.g.
+        48,EOP:PFSA:Start,1540246078613202
+    """
+
+    frames = Frames(args.verbosity)
+
+    with open(args.input_file) as lines:
+        for line in lines:
+            info = line.rstrip().split(',')
+            if (len(info) != 3 and len(info) != 5):
+                continue
+
+            frame_index = int(info[0])
+            data_key = info[1].lower()
+
+            frames.update(index=frame_index, key=data_key, val=int(info[2]))
+
+            if len(info) == 5:
+                component = data_key.split(':')[0]
+                frames.trace_data[frame_index].update_eo_info(component,
+                                                              int(info[3]),
+                                                              int(info[4]))
+
+    frames.adjust_to_zero()
+
+    print('Found {} frames {} in timestamp data, {} microseconds per frame'.
+          format(frames.len(), frames.min_max_indices(),
+                 frames.microseconds_per_frame()))
+
+    return frames
+
+
+def insert_legend(labels, axes):
+    """ Create and insert a legend."""
+
+    bars = []
+    legend_strings = []
+
+    for label in labels:
+        index = labels.index(label)
+        # Create rectangles of the appropriate color for each label
+        bars.append(mpatch.Rectangle((0, 0), 1, 1, fc=BARH_COLORS[index]))
+
+        # Build a legend string from a ':' separated label string
+        legend_string = ""
+        for tags in label.split(':'):
+            legend_string += BARH_LEGEND_STR[tags] + ' '
+
+        legend_strings.append(legend_string)
+
+    # Insert the legend (mapping of colors to labels)
+    axes.legend(bars, legend_strings, loc='upper left')
+
+
+def plot(frames, filename):
+    """ Use matplotlib to plot the data."""
+
+    figure, axes = mpyplot.subplots()
+
+    y_labels = []
+    y_ticks = []
+    y_index = 0
+    y_increment = 5
+    legend_inserted = False
+
+    for frame_index in sorted(frames.trace_data):
+        frame = frames.trace_data[frame_index]
+        ranges, labels = frame.get_barh_ranges(frames.found, frames.verbosity)
+        axes.broken_barh(ranges,
+                         (y_index, 5),
+                         facecolors=BARH_COLORS,
+                         alpha=0.8)
+
+        if not legend_inserted:
+            insert_legend(labels, axes)
+            legend_inserted = True
+
+        # label string - overall time to execute the frame
+        bar_label = str(int(frame.max() - frame.min()))
+
+        # Add total time label
+        axes.annotate(bar_label,
+                      (frame.max(), y_index + 2),
+                      xytext=(5, 0),
+                      textcoords="offset points",
+                      fontsize=8,
+                      va='center',
+                      ha='left')
+
+        # Add EO info label
+        axes.annotate(frame.eo_info(),
+                      (frame.min(), y_index + 2),
+                      xytext=(-5, 0),
+                      textcoords="offset points",
+                      fontsize=6,
+                      va='center',
+                      ha='right')
+
+        y_index += y_increment
+        y_labels.append(str(frame_index))
+        y_ticks.append(y_index)
+
+    axes.set_xlabel('Microseconds')
+    axes.set_ylabel('Frames')
+    axes.set_yticks(y_ticks)
+    axes.set_yticklabels(y_labels, verticalalignment='center')
+    axes.grid(True)
+    axes.set_title(filename)
+
+    # Set a default size for the displayed figure
+    figure.set_size_inches(14, 8)
+
+    mpyplot.show()
+
+
+def main():
+    """ Parse arguments, read input into Frames and plot the data."""
+
+    parser = argparse.ArgumentParser(description=USAGE)
+    parser.add_argument("-v", "--verbosity", type=int, choices=[0, 1],
+                        default=0,
+                        help="Level of graph detail. 0->Summary, 1->Details")
+    parser.add_argument("input_file",
+                        default='trace.log',
+                        help='Path to trace log file')
+    args = parser.parse_args()
+
+
+    frames = read_data(args)
+    plot(frames, args.input_file)
+
+if __name__ == '__main__':
+    main()