Add TIDL_SUBGRAPH_NUM_EVES env var - The current subgraph implementation initializes and uses all available EVEs and DSPs, with streaming/batch inputs in mind. There are cases where only 1 EVE and 1 DSP are needed, for example, demonstrating subgraph offloading on a single input. This commit adds an environment variable, TIDL_SUBGRAPH_NUM_EVES, to specify the number of EVEs used for subgraph inferencing. - MCT-1243
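As a rough illustration of the behaviour described in the commit above, the sketch below reads TIDL_SUBGRAPH_NUM_EVES and falls back to all available EVEs when the variable is unset. Only the variable name comes from the commit message. The helper `resolve_num_eves`, its parameters, and the handling of invalid or out-of-range values are assumptions made for illustration, not the actual runtime code.

```python
import os

# Variable name taken from the commit message above.
ENV_VAR = "TIDL_SUBGRAPH_NUM_EVES"


def resolve_num_eves(available_eves, env=None):
    """Hypothetical helper: decide how many EVEs to use for subgraph inference.

    available_eves: number of EVEs the device reports (assumed >= 1).
    env: mapping to read the variable from; defaults to os.environ.
    """
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is None:
        # Variable not set: keep the default of using every available EVE.
        return available_eves
    try:
        requested = int(raw)
    except ValueError:
        # Assumption: a non-numeric value is ignored and the default is used.
        return available_eves
    if not 1 <= requested <= available_eves:
        # Assumption: an out-of-range request also falls back to the default.
        return available_eves
    return requested
```

With these assumptions, `resolve_num_eves(4, {"TIDL_SUBGRAPH_NUM_EVES": "1"})` selects a single EVE, which matches the single-input demonstration case mentioned in the commit.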
Merge branch 'release/v01.05.00'
Clean up required subgraph cfg file entries - Added environment variable TIDL_SUBGRAPH_DIR for locating the subgraph config files. - Updated documentation for subgraph runtime. - MCT-1227
Replace 2 DSP + 2 group layer use cases with 1 DSP, in reference to PLSDK-3189. The BBAI only has enough CMEM for 4 EVEs, 1 DSP, and 2 group layers. Across all of our networks, the difference between 1 and 2 DSPs is essentially nonexistent. The following benchmarks were run side by side (a "|" marks lines that differ between the two runs):

CMDLINE: ./mcbench -g 1 -d 2 -e 4 -c ../test/testvecs/config/   CMDLINE: ./mcbench -g 1 -d 2 -e 4 -c ../test/testvecs/config/
Input: ../test/testvecs/input/preproc_0_224x224_multi.y frame   Input: ../test/testvecs/input/preproc_0_224x224_multi.y frame
Loop total time: 1189ms                                         Loop total time: 1189ms
FPS:42.06                                                       FPS:42.06
mcbench PASSED                                                  mcbench PASSED

CMDLINE: ./mcbench -g 1 -d 2 -e 4 -c ../test/testvecs/config/   CMDLINE: ./mcbench -g 1 -d 2 -e 4 -c ../test/testvecs/config/
Input: ../test/testvecs/input/preproc_0_224x224_multi.y frame   Input: ../test/testvecs/input/preproc_0_224x224_multi.y frame
Loop total time: 3066ms                                         Loop total time: 3066ms
FPS:16.31                                                       FPS:16.31
mcbench PASSED                                                  mcbench PASSED

CMDLINE: ./mcbench -g 2 -d 1 -e 4 -c ../test/testvecs/config/ | CMDLINE: ./mcbench -g 2 -d 2 -e 4 -c ../test/testvecs/config/
Input: ../test/testvecs/input/preproc_2_224x224_multi.y frame   Input: ../test/testvecs/input/preproc_2_224x224_multi.y frame
Loop total time: 1822ms                                       | Loop total time: 1835ms
FPS:27.44                                                     | FPS:27.24
mcbench PASSED                                                  mcbench PASSED

CMDLINE: ./mcbench -g 2 -d 1 -e 4 -c ../test/testvecs/config/ | CMDLINE: ./mcbench -g 2 -d 2 -e 4 -c ../test/testvecs/config/
Input: ../test/testvecs/input/preproc_2_224x224_multi.y frame   Input: ../test/testvecs/input/preproc_2_224x224_multi.y frame
Loop total time: 1823ms                                       | Loop total time: 1841ms
FPS:27.42                                                     | FPS:27.16
mcbench PASSED                                                  mcbench PASSED

CMDLINE: ./mcbench -g 2 -d 1 -e 4 -c ../test/testvecs/config/ | CMDLINE: ./mcbench -g 2 -d 2 -e 4 -c ../test/testvecs/config/
Input: ../test/testvecs/input/preproc_2_224x224_multi.y frame   Input: ../test/testvecs/input/preproc_2_224x224_multi.y frame
Loop total time: 1793ms                                       | Loop total time: 1817ms
FPS:27.89                                                     | FPS:27.52
mcbench PASSED                                                  mcbench PASSED

CMDLINE: ./mcbench -g 2 -d 1 -e 4 -c ../test/testvecs/config/ | CMDLINE: ./mcbench -g 2 -d 2 -e 4 -c ../test/testvecs/config/
Input: ../test/testvecs/input/preproc_0_224x224_multi.y frame   Input: ../test/testvecs/input/preproc_0_224x224_multi.y frame
Loop total time: 4269ms                                       | Loop total time: 4285ms
FPS:11.71                                                     | FPS:11.67
mcbench PASSED                                                  mcbench PASSED

CMDLINE: ./mcbench -g 2 -d 1 -e 4 -c ../test/testvecs/config/ | CMDLINE: ./mcbench -g 2 -d 2 -e 4 -c ../test/testvecs/config/
Input: ../test/testvecs/input/preproc_0_224x224_multi.y frame   Input: ../test/testvecs/input/preproc_0_224x224_multi.y frame
Loop total time: 892.9ms                                      | Loop total time: 915ms
FPS:55.99                                                     | FPS:54.64
mcbench PASSED                                                  mcbench PASSED

CMDLINE: ./mcbench -g 2 -d 1 -e 4 -c ../test/testvecs/config/ | CMDLINE: ./mcbench -g 2 -d 2 -e 4 -c ../test/testvecs/config/
Input: ../test/testvecs/input/preproc_0_224x224_multi.y frame   Input: ../test/testvecs/input/preproc_0_224x224_multi.y frame
Loop total time: 2008ms                                       | Loop total time: 2014ms
FPS:24.9                                                      | FPS:24.82
mcbench PASSED                                                  mcbench PASSED
Merge branch 'release/v01.04.00'
mcbench: Add test cases for AM5729 - Add a one-line comment to each script to indicate the SoC that the script is used with - Add all_5729.sh, a script with benchmarking test cases for the AM5729 device, 2xDSP+4xEVE - PLSDK-3140 Signed-off-by: Djordje Senicic <x0157990@ti.com>
Subgraph: use Layer2Group map in config file - If a Layer2Group map exists in the subgraph config file, use it. Otherwise, try to derive the map from the network layer types. - Added TidlFreeSubgraph() for subgraph resource de-allocation - Code changes based on review comments. - MCT-1223
Subgraph example: multi-threaded batch processing - Compared different batch sizes in the subgraph execution example - Compared the async/future implementation with the thread pool implementation: async/future has slightly worse (~1%) performance, but it is much easier to program - The recommended inference approach is multi-threaded batch processing, where batch_size can be obtained from TidlGetPreferredBatchSize() and the number of threads can be set to 2. - MCT-1223
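The recommendation in the commit above (split the inputs into batches and process them on 2 threads) can be sketched as follows. This is a minimal illustration using Python's standard `concurrent.futures` thread pool, not the example code from the repository. `run_batch` is a placeholder standing in for the real subgraph inference call, and `batch_size` is passed in directly rather than queried from TidlGetPreferredBatchSize(), whose signature is not shown in this log.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batches(inputs, batch_size):
    """Cut the input list into consecutive chunks of at most batch_size items."""
    return [inputs[i:i + batch_size] for i in range(0, len(inputs), batch_size)]


def run_batch(batch):
    """Placeholder for one batched subgraph inference call (hypothetical).

    It doubles each value so the sketch is runnable without any hardware.
    """
    return [x * 2 for x in batch]


def infer_all(inputs, batch_size, num_threads=2):
    """Process all inputs in batches, using num_threads worker threads."""
    batches = split_batches(inputs, batch_size)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map yields results in submission order, so the outputs
        # line up with the inputs even if batches finish out of order.
        results = pool.map(run_batch, batches)
        return [y for batch_out in results for y in batch_out]
```

For example, `infer_all(list(range(10)), batch_size=4)` processes three batches (4, 4, and 2 items) on two threads and returns the outputs in input order.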
Parse data conversion info from subgraph config - MCT-1224
Subgraph: support batch processing - MCT-1223