IAS has a rudimentary profiler which you can use for profiling single or multi-threaded applications. The profiling is at the function (or trap) level, and can count self as well as self and children timings. IAS profiles functions in terms of instruction counts, since an instruction is the smallest quantum of execution it understands. The IAS profiler can do "self+children" kind of analyses at a function or block level, and has some controls to support multi-threading, etc.Several operations are available to control profiling, all of which are available programmatically as well as from the command line. The profiler supports creation of multiple profiles, up to the current limit of 32, with the intended use being one profile associated with each running thread, but more creative uses are always possible. Any one (or none) of these profiles may be active at a given time.
All the instructions executed during when a profile is active are charged to the profile. Each profile maintains track of the call stack during execution, and in turn charges instructions to the function whose code is currently being executed as well as to all functions below it in the call stack.
Profiling can be disabled at any time. In this state, the profiler continues to track call stack activity, but effectively does not count the instructions passing by. This is useful when you want specific parts of execution to be (not) timed.
The profiler tracks all call stack changes triggered by the instructions:
inv_*_quick
,inv_i/f
,call
,return0/1/2
,athrow
, andpriv_ret_from_trap
. The profiler assumes that the calls and returns are always in LIFO order for any given profile.Profiling can also be "frozen". This is meant for cases where software running on the chip knows that following the call stack using the usual means will confuse the profiler. For example, when a kernel uses
priv_ret_from_trap
multiple times while switching contexts.Another case where this is useful is when invoking a method using reflection, in which case dispatch may possibly be done without an explicit
invoke_*
instruction. In such cases, when software knows that profiler cannot follow the call stack accurately, it should freeze the profile and then unfreeze it in the same call stack state as when it was frozen. In this case,just disabling the profile is not sufficient.Note: A frozen profile may still be enabled. It may continue to count instructions. It just stops following call stack changes, and charges all instructions to the topmost block in the stack. A warning is printed if the profiler determines the call stack depth to go below 0, or above the maximum of 4096. In such cases the software must explicitly mark the entry into a new block.
As in most profilers, a program can control the timing of blocks within a function. A program is not limited to timing only whole functions. By inserting profiler commands around the block begin and end However, the program is then responsible for ensuring that all such block enters and exits happen in a LIFO manner. the name of all such blocks are simply listed as "USER_DEFINED_BLOCK" in the summary - we can later look into whether more flexibility is needed in naming user-defined blocks.
A profile should normally be "closed" before it's data is printed out. Closing a profile pops off the remaining stack frames, and completes attributing instruction counts to the various frames. You can print out a profile's data without closing it. However, the data will not reflect counts for functions still on the stack.
Note: Profile data is currently not part of the checkpointed state in IAS.
The runtime overhead on IAS performance due to profiling depends on the amount of activity up and down the call-chain. The overhead typically is less than 5% and should not be more than 10%. There is little to no overhead if profiling is not used even in a version containing profiling support.
You can compile IAS without any profiling code. Use:
undef
PROFILING
intools/dsv/sim/src/sim_config.h
There is some memory overhead per profile created (a few tens of KB per profile + a constant times the number of unique function blocks entered).The
dumpStats
command now also dumps out the state of all profiles. printing out the profile also gives other interesting information such as the maximum depth of the call stack.Names are assigned only for traps and Java functions. Since C code function names are not accessible, IAS prints the generic string "C-call" for C functions, along with the function address.
Profiles are identified by integer id's(0..N_MAX_PROFILES-1)
Profiles can only be created, not deleted.
Profiling commands Usage Action profile create
Create a new profile. profile freeze|unfreeze
Freeze or unfreeze a profile profile enable|disable
Enable or disable counting in a profile profile switch
Make a profile the "current" profile profile reset
Reset all data in a profile. This clears all data,
and marks it frozen and disabled, i.e. as if it had
been created.profile list
List available profiles, and their status profile markenter
Mark a block entry (in the current profile) profile markexit
Mark a block exit (in the current profile) profile close
Close a profile, total up counts for methods
still remaining in the stack.profile print
Print the statistics in a profile The same effect of controlling the profile can be achieved by an ncstore_word to the PROFILE_CMD_ADDR (0xffc0) with the following value:
Prior to this, the address
#define PROFILE_CREATE_CMD 1 #define PROFILE_RESET_CMD 2 #define PROFILE_FREEZE_CMD 3 #define PROFILE_UNFREEZE_CMD 4 #define PROFILE_ENABLE_CMD 5 #define PROFILE_DISABLE_CMD 6 #define PROFILE_SWITCH_CMD 7 #define PROFILE_PRINT_CMD 8 #define PROFILE_CLOSE_CMD 9 #define PROFILE_MARK_BLOCK_ENTER 20 #define PROFILE_MARK_BLOCK_EXIT 21
PROFILE_DATA_ADDR (0xffc4)
must have been written to with an ncstore_word, indicating the profile id that the command operates on. a profile id of (-1) signifies the "current" profile, whatever it's id.One can imagine a simple Profiler class which offers methods to perform each of these commands; application programs can then transparently make calls to create, enable, close, print profiles.<
ias % loadClass foo % disAsm foo main 0x0004779c b80004 invokestatic 0004 0x0004779f 3b istore_0 0x000477a0 1a iload_0 0x000477a1 08 iconst_5 0x000477a2 68 imul 0x000477a3 05 iconst_2 0x000477a4 60 iadd 0x000477a5 3c istore_1 0x000477a6 1b iload_1 0x000477a7 9e0005 ifle 000477ac 0x000477aa 03 iconst_0 0x000477ab ac ireturn 0x000477ac 05 iconst_2 0x000477ad ac ireturn % run2 0x4779c <---- run2 1st insn of main program % profile create 0 <---- create, unfreeze, enable profile % profile unfreeze 0 profile 0 is now unfrozen % profile enable 0 profile 0 is now enabled % profile switch 0 <---- make that the current profile switched to profile 0 % run2 0x477ab <---- run2 end of program % profile close 0 <---- close and print profile % profile print 0 Profiler hashtable: 3 function entry points registered, 3 non-empty buckets entries still on call stack: 1 max call stack depth seen: 2 Total number of instructions: 293 Instruction counts ------------------ Name PC #Calls Self+children Pct Avg Max Min Self Pct Avg Max Min foo.bar()I 0x0004775c 1 2 0.68% 2 2 2 2 0.68% 2 2 2 Program 0x0004779c 1 293 100.00% 293 293 293 12 4.10% 12 12 12 invokestatic 0x00027948 1 279 95.22% 279 279 279 279 95.22% 279 279 279 %
athrow's which are caught inside a trap frame cannot be handled.IAS