/*
 * This README contains instructions for using the profiling tools both
 * manually using Code Composer Studio 6 and automatically using the bundled
 * Debug Scripting Server tools. Scroll down to see the latter.
 */
/*!
@page profCCS Project Profiling in CCS

@brief This guide steps through the manual profiling pipeline using the TI
Utils library and CCS.

##Introduction##

These instructions cover manually compiling and benchmarking the LLD example
projects included in the TI Processor Development Kit. The benchmark tool
uses the hardware clock of the PMU to measure the length of each task in
processor cycles with minimal overhead. This guide and its scripts were
written for the AM5728 RTOS platform, but they use standard C programming
hooks that may be adapted for any platform or processor architecture.

Notes:
- Functions with more than one entry point may not always map correctly to
  their symbols in the trace. This does not affect their child or parent
  functions.
- Functions that are still on the stack at the breakpoint are treated as
  closed at the last recorded timestamp, for continuity.
- BIOS functions that are not referenced in the project or its library are
  not instrumented and will not appear in the report.
- The Python script that tabulates the instrumentation logs depends on the
  file readelf.py, which must be in the same directory.
- Depending on your optimization level, some functions may be optimized out
  or may not appear on certain platforms, including:
    - Empty or single-line functions
    - `ti_sysbios_*` functions
    - `SemaphoreP_*` functions

###Part I: Project Setup###
-# Download and install the TI RTOS Processor SDK for AM572x or the desired
   product.
    - The installer for the AM572x can be found at:
      http://www.ti.com/tool/PROCESSOR-SDK-AM572X
    - All other platform SDKs can be found at:
      http://www.ti.com/lsds/ti/tools-software/sw_portal.page
-# In CCS, select the desired project in the Project Explorer and open its
   *.cfg file in a text editor.
-# Any LLD in use must have profiling enabled in order to be profiled at
   runtime. This is done by setting enableProfiling to true in the LLD
   package. For example, for GPIO:
   @code
   // Load the GPIO package
   var Gpio = xdc.loadPackage('ti.drv.gpio');
   Gpio.Settings.enableProfiling = true;
   @endcode
   Otherwise, the cycles spent in the LLD's functions are counted against
   the caller.
-# In the same file, add the following line so that the profiling library
   is also included in your project:
   @code
   var Profiling = xdc.loadPackage('ti.utils.profiling');
   @endcode
-# In the Project Explorer, right-click the desired project, select
   Properties > CCS Build > "Set Additional Flags...", and add the flags for
   the desired platform:
    - ARM: `-finstrument-functions -gdwarf-3 -g`
    - DSP: `--entry_parm=address --exit_hook=ti_utils_exit --exit_parm=address --entry_hook=ti_utils_entry -g`
    - M4: `--entry_parm=address --exit_hook=ti_utils_exit --exit_parm=address --entry_hook=ti_utils_entry -g`
-# Ensure that the optimization flags (-O1, -O2, -O3...) match those used to
   build the program you intend to measure, so that functions are
   instrumented consistently. By default, example projects and new CCS
   projects are built with no optimization.
-# Close the project properties, right-click the project, and select
   "Rebuild Project" to compile.

###Part II: Profiling Project Runtime###
-# Load the compiled program onto the desired target platform and run it to
   the desired breakpoint or for the desired time interval.
-# While the program is halted, open the Memory Browser (View > Memory
   Browser).
-# (Optional) In the search field, search for "elemlog" and confirm that the
   log array has been populated. It should consist of sets of four values,
   each beginning with 00000000, 00000001, or 00000002.
-# Save a memory snapshot by clicking "Save". In the popup, choose a
   filename and location, then set the start address to "elemlog" and the
   length to "log_idx*4".

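
The optional check in the third step can also be repeated offline on the saved
dump. The sketch below is illustrative only: it assumes the dump has already
been read into a flat list of 32-bit words, and `split_records` and
`VALID_TAGS` are hypothetical names that are not part of the profiling
library. It groups the words into sets of four and confirms that each set
begins with 0, 1, or 2.

```python
# Hypothetical helper for illustration; not part of ti.utils.profiling.
VALID_TAGS = {0, 1, 2}  # allowed first word of each set of four values
WORDS_PER_RECORD = 4


def split_records(words):
    """Group a flat list of 32-bit words into 4-word records.

    Raises ValueError if the length is not a multiple of four or if a
    record does not begin with 0, 1, or 2.
    """
    if len(words) % WORDS_PER_RECORD != 0:
        raise ValueError("dump length is not a multiple of four words")
    records = []
    for i in range(0, len(words), WORDS_PER_RECORD):
        record = tuple(words[i:i + WORDS_PER_RECORD])
        if record[0] not in VALID_TAGS:
            raise ValueError(
                "record %d starts with 0x%08X" % (i // WORDS_PER_RECORD, record[0])
            )
        records.append(record)
    return records
```

A dump that fails this check was most likely saved with the wrong start
address or length.
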
###Part III: Post-processing the Profiling Log###
-# Open a command prompt in the directory that contains the
   "decodeProfileDump.py" Python script (typically utils/profiling/scripts).
-# Assemble the processing command in the following format:
   @code
   python decodeProfileDump.py [log 1] [executable 1] [log 2] [executable 2] ...
   @endcode
   where each log is a profiling log memory dump created in Part II and each
   executable is the corresponding *.out program.
-# Append any desired flags:
    - `-v`: Display verbose output
    - `-t`: Break down function totals by their reference sites
    - `-x`: Write the tabulated results to a report.xlsx Excel file
    - `-csv`: Write the tabulated results to a report.csv file
    - `-h`: Print a histogram of the results (shown in the rightmost columns
      of the output)
    - `-off N`: Manual instrumentation offset of N cycles, subtracted from
      each function.
      Note: The instrumentation program already generates an offset for its
      own overhead, which is subtracted from the function times. Use this
      flag only if there is an additional offset you would like to subtract.

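
When several log and executable pairs are processed together, it can help to
assemble the command programmatically. The following is a minimal sketch, not
part of the bundled scripts: `build_decode_command` is a hypothetical helper,
and the file names are placeholders.

```python
# Hypothetical helper for illustration; file names below are placeholders.
def build_decode_command(pairs, flags=()):
    """Return the argument list for decodeProfileDump.py.

    pairs: iterable of (log_path, executable_path) tuples.
    flags: extra flags from the list above, for example ("-t", "-csv").
    """
    command = ["python", "decodeProfileDump.py"]
    for log_path, executable_path in pairs:
        command += [log_path, executable_path]
    command += list(flags)
    return command


command = build_decode_command(
    [("gpio_log.txt", "gpio_example.out")],  # placeholder names
    flags=("-t", "-csv"),
)
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)`
when it is run from the directory that contains the script.
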
###Part IV: Understanding the Output###
Term           | Meaning
---------------|----------------------------------------------
Function       | The name of the instrumented function
Referenced_By  | The call site of the instrumented function
Total_Cycles   | The number of processor cycles elapsed in the instrumented function, reported both inclusively (inc), counting the cycles of its child functions, and exclusively (exc), not counting them
Average_Cycles | The number of processor cycles elapsed in the instrumented function per reference, both inclusively and exclusively
Total_Calls    | The number of internal child functions referenced by the function that are part of the program or its library
Average_Calls  | The number of internal child functions referenced by the function per reference
Iterations     | The number of times the instrumented function was referenced

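
To make the inclusive (inc) and exclusive (exc) columns concrete, the sketch
below computes both from a small, made-up sequence of entry and exit events.
It is not the algorithm used by decodeProfileDump.py; the event tuples, the
trace contents, and `tally_cycles` are assumptions for illustration.
Functions still open at the end of the trace are closed at the last
timestamp, as described in the Notes.

```python
from collections import defaultdict


def tally_cycles(events):
    """events: list of (kind, function, timestamp); kind is "enter" or "exit".

    Returns {function: {"inc": cycles, "exc": cycles, "iterations": n}}.
    """
    totals = defaultdict(lambda: {"inc": 0, "exc": 0, "iterations": 0})
    stack = []  # each entry is [function, start_timestamp, child_cycles]

    def close(timestamp):
        function, start, child_cycles = stack.pop()
        inclusive = timestamp - start
        totals[function]["inc"] += inclusive
        totals[function]["exc"] += inclusive - child_cycles
        totals[function]["iterations"] += 1
        if stack:
            stack[-1][2] += inclusive  # charge this call to its parent

    last_timestamp = 0
    for kind, function, timestamp in events:
        last_timestamp = timestamp
        if kind == "enter":
            stack.append([function, timestamp, 0])
        else:
            close(timestamp)
    while stack:  # functions still on the stack at the breakpoint
        close(last_timestamp)
    return dict(totals)


trace = [  # made-up trace
    ("enter", "main", 0),
    ("enter", "GPIO_write", 100),
    ("exit", "GPIO_write", 160),
    ("enter", "GPIO_write", 200),
    ("exit", "GPIO_write", 250),
    ("exit", "main", 400),
]
result = tally_cycles(trace)
```

In this trace, `main` accumulates 400 inclusive and 290 exclusive cycles over
one iteration, while `GPIO_write` accumulates 110 of each over two
iterations.
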
-# If the histogram flag was set, the histogram is written in the ten columns
   following the measurements. These columns account for every iteration of
   the instrumented function and are followed by its high, low, and bin size
   values, in processor cycles.
-# If the histogram flag was set, the last column lists the high outlier: the
   reference that used a disproportionate number of processor cycles compared
   to the other references of the function, along with its file location.
-# The text file (generated by default) also contains a visual trace of the
   results below the table, showing each function reference and its measured
   cycle count.
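
As a rough illustration of the histogram columns, the sketch below sorts
per-iteration cycle counts into ten equal-width bins and reports the low,
high, and bin size values. How decodeProfileDump.py actually chooses its bin
edges is not stated in this guide, so the equal-width scheme, the sample
data, and the name `ten_bin_histogram` are assumptions.

```python
def ten_bin_histogram(cycle_counts, bins=10):
    """Return (counts, low, high, bin_size) for a list of cycle counts."""
    low, high = min(cycle_counts), max(cycle_counts)
    # Avoid a zero-width bin when every iteration took the same time.
    bin_size = max((high - low) / bins, 1)
    counts = [0] * bins
    for cycles in cycle_counts:
        # The maximum value would land one past the end, so clamp it.
        index = min(int((cycles - low) / bin_size), bins - 1)
        counts[index] += 1
    return counts, low, high, bin_size


samples = [100, 104, 110, 120, 125, 130, 150, 155, 160, 200]  # made-up data
counts, low, high, bin_size = ten_bin_histogram(samples)
```

Every iteration lands in exactly one bin, so the bin counts always sum to the
number of iterations.
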
*/

/*!
@page profDSS Automated Profiling with DSS

@brief This guide steps through the automated profiling pipeline using the TI
Utils library, DSS, and the loadti script.

##Introduction##

These instructions cover benchmarking the LLD example projects included in
the TI Processor Development Kit using the loadti script. The benchmark tool
uses the hardware clock of the PMU to measure the length of each task in
processor cycles with minimal overhead. This guide and its scripts were
written for the AM5728 RTOS platform, but they use standard C programming
hooks that may be adapted for any platform or processor architecture.

Notes:
- Functions with more than one entry point may not always map correctly to
  their symbols in the trace. This does not affect their child or parent
  functions.
- Functions that are still on the stack at the breakpoint are treated as
  closed at the last recorded timestamp, for continuity.
- BIOS functions that are not referenced in the project or its library are
  not instrumented and will not appear in the report.
- The Python script that tabulates the instrumentation logs depends on the
  file readelf.py, which must be in the same directory.
- Depending on your optimization level, some functions may be optimized out
  or may not appear on certain platforms, including:
    - Empty or single-line functions
    - `ti_sysbios_*` functions
    - `SemaphoreP_*` functions

###Part I: Project Setup###
-# Download and install the TI RTOS Processor SDK for AM572x or the desired
   product.
    - The installer for the AM572x can be found at:
      http://www.ti.com/tool/PROCESSOR-SDK-AM572X
    - All other platform SDKs can be found at:
      http://www.ti.com/lsds/ti/tools-software/sw_portal.page
-# In the desired project directory, open the project's *.cfg file in a text
   editor.
-# Any LLD in use must have profiling enabled in order to be profiled at
   runtime. This is done by setting enableProfiling to true in the LLD
   package. For example, for GPIO:
   @code
   // Load the GPIO package
   var Gpio = xdc.loadPackage('ti.drv.gpio');
   Gpio.Settings.enableProfiling = true;
   @endcode
   Otherwise, the cycles spent in the LLD's functions are counted against
   the caller.
-# In the same file, add the following line so that the profiling library
   is also included in your project:
   @code
   var Profiling = xdc.loadPackage('ti.utils.profiling');
   @endcode
-# Locate the configuration file for your project (typically a *.text file)
   and add the flags for the desired platform:
    - ARM: `-finstrument-functions -gdwarf-3 -g`
    - DSP: `--entry_parm=address --exit_hook=ti_utils_exit --exit_parm=address --entry_hook=ti_utils_entry -g`
    - M4: `--entry_parm=address --exit_hook=ti_utils_exit --exit_parm=address --entry_hook=ti_utils_entry -g`
-# Ensure that the optimization flags (-O1, -O2, -O3...) match those used to
   build the program you intend to measure, so that functions are
   instrumented consistently. By default, example projects and new CCS
   projects are built with no optimization.
-# Save these files and recompile your project.

###Part II: Profiling Project Runtime###
-# If you have not already done so, locate the loadti directory. It is
   typically found under:
   @code
   C:\ti\ccsv6\ccs_base\scripting\examples\loadti
   @endcode
-# Depending on the version of CCS installed, loadti may need to be patched
   so that its saveData function can evaluate expressions as well as static
   integer addresses.
    - If so, a patched version of the memXfer.js script is included in the
      utils/profiling/scripts directory.
    - Replace the memXfer.js file in the loadti directory with the patched
      memXfer.js file from the profiling library.
    - Note: This will not break existing applications that use static
      integer addresses.
-# Load the desired program onto the desired target platform and run it to
   the desired breakpoint or for the desired time interval using this format:
   @code
   loadti -v -c=[config *.ccxml] -t [time interval] -msd="0,elemlog,[output *.txt],4*log_idx,1,false" [executable *.out]
   @endcode
-# This automatically runs the program and dumps the profiling log into the
   specified text file for post-processing.

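
For repeated runs, the loadti command line can be generated from a few
parameters. The sketch below only fills in the bracketed placeholders of the
format shown above; `build_loadti_command` is a hypothetical helper, and the
file names and time interval in the example are placeholders rather than
values taken from the SDK.

```python
# Hypothetical helper for illustration; it only formats the command string.
def build_loadti_command(ccxml, executable, output, time_interval):
    """Return a loadti command line following the format shown above."""
    parts = [
        "loadti",
        "-v",
        "-c=%s" % ccxml,
        "-t",
        str(time_interval),
        # The elemlog start address and 4*log_idx length are copied
        # unchanged from the command format in this guide.
        '-msd="0,elemlog,%s,4*log_idx,1,false"' % output,
        executable,
    ]
    return " ".join(parts)


# Placeholder file names and time interval.
line = build_loadti_command("am572x.ccxml", "gpio_example.out", "gpio_log.txt", 10000)
print(line)
```

Paths that contain spaces would need additional quoting, which this sketch
does not attempt.
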
###Part III: Post-processing the Profiling Log###
-# Open a command prompt in the directory that contains the
   "decodeProfileDump.py" Python script (typically utils/profiling/scripts).
-# Assemble the processing command in the following format:
   @code
   python decodeProfileDump.py [log 1] [executable 1] [log 2] [executable 2] ...
   @endcode
   where each log is a profiling log memory dump created in Part II and each
   executable is the corresponding *.out program.
-# Append any desired flags:
    - `-v`: Display verbose output
    - `-t`: Break down function totals by their reference sites
    - `-x`: Write the tabulated results to a report.xlsx Excel file
    - `-csv`: Write the tabulated results to a report.csv file
    - `-h`: Print a histogram of the results (shown in the rightmost columns
      of the output)
    - `-off N`: Manual instrumentation offset of N cycles, subtracted from
      each function.
      Note: The instrumentation program already generates an offset for its
      own overhead, which is subtracted from the function times. Use this
      flag only if there is an additional offset you would like to subtract.

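
The effect of the manual offset can be pictured as one extra subtraction per
measurement. This is a sketch of the arithmetic only, with made-up numbers;
`adjust_cycles` is a hypothetical name, and clamping the result at zero is an
assumption rather than documented behavior of the script.

```python
def adjust_cycles(measured, built_in_offset, manual_offset=0):
    """Subtract the instrumentation offsets from one measured cycle count."""
    adjusted = measured - built_in_offset - manual_offset
    return max(adjusted, 0)  # clamping at zero is an assumption


without_flag = adjust_cycles(500, 40)  # built-in offset only
with_flag = adjust_cycles(500, 40, manual_offset=10)  # as if "-off 10" were given
```

With these made-up values, the function is reported at 460 cycles without the
flag and 450 cycles with a manual offset of 10.
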
###Part IV: Understanding the Output###
Term           | Meaning
---------------|----------------------------------------------
Function       | The name of the instrumented function
Referenced_By  | The call site of the instrumented function
Total_Cycles   | The number of processor cycles elapsed in the instrumented function, reported both inclusively (inc), counting the cycles of its child functions, and exclusively (exc), not counting them
Average_Cycles | The number of processor cycles elapsed in the instrumented function per reference, both inclusively and exclusively
Total_Calls    | The number of internal child functions referenced by the function that are part of the program or its library
Average_Calls  | The number of internal child functions referenced by the function per reference
Iterations     | The number of times the instrumented function was referenced

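
The relationship between the total, average, and iteration columns can be
sketched as a simple aggregation. The example assumes that an average is the
total divided by the number of iterations and that the `-t` breakdown groups
by (function, call site); both are readings of the table above rather than
confirmed details of the script. The `summarize` helper and the measurements
are made up for illustration.

```python
from collections import defaultdict


def summarize(calls, by_site=False):
    """calls: list of (function, referenced_by, inclusive_cycles).

    Returns {key: (total_cycles, iterations, average_cycles)}, where key is
    the function name, or (function, referenced_by) when by_site is True.
    """
    grouped = defaultdict(list)
    for function, referenced_by, cycles in calls:
        key = (function, referenced_by) if by_site else function
        grouped[key].append(cycles)
    return {
        key: (sum(values), len(values), sum(values) / len(values))
        for key, values in grouped.items()
    }


calls = [  # made-up measurements
    ("GPIO_write", "main", 60),
    ("GPIO_write", "main", 50),
    ("GPIO_write", "led_task", 40),
]
per_function = summarize(calls)
per_site = summarize(calls, by_site=True)
```

Grouping by call site splits the single `GPIO_write` row (150 cycles over
three iterations) into one row for `main` (110 cycles over two iterations)
and one for `led_task` (40 cycles over one iteration).
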
-# If the histogram flag was set, the histogram is written in the ten columns
   following the measurements. These columns account for every iteration of
   the instrumented function and are followed by its high, low, and bin size
   values, in processor cycles.
-# If the histogram flag was set, the last column lists the high outlier: the
   reference that used a disproportionate number of processor cycles compared
   to the other references of the function, along with its file location.
-# The text file (generated by default) also contains a visual trace of the
   results below the table, showing each function reference and its measured
   cycle count.
*/