Skip to content

ThisIsNotOfficialCodeItsJustForks/PEBIL

 
 

Repository files navigation

PEBIL (pebil) - README 

2.2	Binary Instrumentation Packages for Gathering Computational Trace

The PMaC Prediction Framework relies on binary instrumentation to gather
information about the computational work done during an application run. The
information in a computational trace is the floating point operation count,
memory operation count, and simulated cache hit rates (stored per basic block
for each level of the caches in the memory hierarchy) of the target machine(s).
Each basic block in an application is given a unique ID that is assigned to it
in a consistent manner during each instrumentation pass. The unique ID for every
basic block in the application is consistent as long as the executable is kept
the same . The simulated cache hit rates are computed on-the-fly during the
trace by inputing the address stream of the instrumented running application
into a cache simulator. The specifications for the cache or caches to be
simulated are specified in the CacheStructures.h and describe in more detail in
section 2.2.1.3.  To gather computation traces, PMaC provides two
instrumentation tools. PEBIL is a binary editing tool for X86/Linux systems
(described in this document) and PMaCInstrumentor is a binary editing tool for
IBM Power/AIX systems.

2.2.1	PEBIL 

PEBIL  is a binary instrumentation toolkit that operates on ELF binaries on
Linux for x86/x86_64 processors. PEBIL has a C++ API that provides the means to
inject code and data into a binary file. It also provides several
pre-implemented tools for basic block counting and cache simulation for a set of
memory hierarchies.  

2.2.1.1	Installation for Application Tracing Host Systems:

Any X86/Linux The source code for the PEBIL library and instrumentation
tools is available at through PMaC Laboratories by e-mailing help@pmaclabs.com.
The directory structure of distribution is as follows:

$ cd /home/userid/tools/PEBIL
$ ls
bin/   include/   instcode/    Makefile    scripts/    testapps/    tools/   docs/    etc/    lib/    src/ 

The include, src, tools, and instcode directories contain the source code for
instrumentation related libraries and instrumentation tools. The bin and lib
directories contain the executables and shared libraries after building the
distribution. The scripts directory contains additional PERL scripts to assist
application tracing for high level instrumentation activities.  After retrieving
the source, users can change to the top directory of the distribution and follow
the instructions in the INSTALL to build the distribution. After building the
executables and shared libraries, it is also advised to link or copy the scripts
in the scripts directory to the bin directory of the distribution so that the
scripts will be found in user's path as well. The following is an example of
this procedure:

$ cd /home/userid/tools/PEBIL
$ gmake clean
$ gmake
$ cd bin
$ ln -s ../scripts/selectSimBlocks.pl  selectSimBlocks.pl
$ ln -s ../scripts/traceHelper.pl traceHelper.pl

The Makefile at the top directory /home/userid/tools/PEBIL iterate over sub
directories and run the appropriate make commands. Alternatively, users can
follow the same steps manually. Instead of gmake at the top directory, the users
can run the following. Note that the order of making src should come before the
order of making tools directory. For an installed distribution to work, minimal
set of steps is to make src, tools and instcode directories.

$ cd /home/userid/tools/PEBIL
$ cd ./src
$ gmake
$ cd ../tools
$ gmake
$ cd ../instcode
$ gmake

Manually following the make steps is particularly useful when the default
configuration (configuration that comes with the source code) is not used for
building the instcode directory. There are various ways of building the code
under this directory and the available compilation macros are described in
details in Section 2.2.1.2. 

To use PEBIL, users need to include the bin directory of distribution to their
path.  Thus, if a particular distribution needs to be used, the user needs to
set the path variables accordingly before using that distribution as the name of
the executable and additional PERL scripts would be the same. When installed
properly, the PEBIL executable, pebil, should be in user's path. It can be
checked using the `which` command of Unix from a different directory (you
may need to source .cshrc/.tcshrc or .profile/.bashrc files for paths to be
adjusted). The --help option will print a brief usage information.

$ source ~/.cshrc
$ which pebil
/home/userid/tools/PEBIL/bin/pebil
$ pebil --help
   usage : pebil
        --typ (ide|dat|bbt|cnt|jbb|sim|csc)
        --app <executable_path>
        --inp <block_unique_ids>    <-- valid for sim/csc
        [--lib <shared_lib_topdir>]
        [--ext <output_suffix>]
        [--dtl]
        [--lpi]                                    <-- valid for sim/csc
        [--phs <phase_no>]            <-- valid for sim/csc
        [--dfp <pattern_file>]           <-- valid for sim/csc
        [--help]

Brief Descriptions for Options:
===============================
        --typ : required for all. Instrumentation type.
        --app : required for all. Executable path.
        --lib : optional for all. Instrumentation shared library top
                directory. default is $PEBIL_LIB
        --ext : optional for all. File extension for output files.
                default is (typ)inst, such as jbbinst for type jbb.
        --dtl : optional for all. Enable detailed .static file with line numbers
                and filenames. default is no details.
        --inp : required for sim/csc. File including list of block ids to instrument
        --lpi : optional for sim/csc. Loop level block inclusion for
                cache simulation. default is no.
        --phs : optional for sim/csc. Phase number. defaults to no phase,
                otherwise, .phase.N. is included in output file names
        --dfp : optional for sim/csc. dfpattern file. defaults to no dfpattern file,

2.2.1.2	Installation for MultiMAPS Tracing

Host Systems: Any X86/Linux In the framework, the MultiMAPS data for a target
system needs to be augmented with the cooresponding hit rates. To accomplish
this, MultiMAPS needs to be traced with the target system's cache structure.
The default PEBIL installation, which comes configured for application tracing,
uses address stream sampling for cache simulation. Rather than using every
address in the address stream of an application run, it uses only a subset of
addresses in the address stream and simulates them for the target memory
subsystems. This method of sampling the address stream has been used to reduce
the overhead of instrumentation.  For MultiMAPS tracing, however, the entire
address stream needs to be consumed for cache simulations, therefore sampling
should not be used. Hence, separate installations of PEBIL are needed for
tracing MultiMAPS for cache hit rates.  To install PEBIL for MultiMAPS tracing,
you need to edit the instcode/Makefile file under the PEBIL installation
directory (a different directory than the one installed for application tracing)
and repeat the installation process described in Section 2.2.1.1 for this new
directory. To install pebil for MultiMAPS tracing, user needs to replace the
EXTENDED_SAMPLING compilation macro in instcode/Makefile with the
NO_SAMPLING_MODE compilation macro. Assuming the source code for PEBIL for
MultiMAPS tracing is under /home/userid/mmaps user needs to:

$ cd /home/userid/mmaps/PEBIL
$ vi instcode/Makefile
### change extended sampling mode to no sampling mode
#EXTRA_CFLGS = -DEXTENDED_SAMPLING -DPER_SET_RECENT -DVICTIM_CACHE
EXTRA_CFLGS = -DNO_SAMPLING_MODE -DPER_SET_RECENT -DVICTIM_CACHE
$ gmake

2.2.1.3	Using a Different Set of Memory Hierarchies (Caches)

The PEBIL source is distributed with a set of memory hierarchies to be used in
cache simulation runs. However, it also is flexible enough to easily allow
a different set of memory hierarchies with the installation.  PEBIL provides
a PERL script under the scripts directory of the distribution,
scripts/generateCaches.pl to do this.  This script takes an input file of memory
hierarchy specifications for a set of systems and generates a C header file that
contains these same specifications as C code. The user can also specify the
stride system  to use from one of the caches given in the input file if stride
information will be collected. Otherwise, the default action of the script is to
add a 2MB direct cache that is used for gathering stride information.
Information on how to use the generateCaches.pl script can be seen by using the
--help flag. The following example shows the --help flag:

$ /home/userid /PEBIL/scripts/generateCaches.pl --help
   usage : scripts/generateCaches.pl
   [--sys-file <cache_desc_file>]  <-- defaults to CacheDescriptions.txt
   [--out-file <outfile>]                   <-- defaults to CacheStructures.h
   [--stride-sys <system idx>]       <-- defaults to 0, 2 MB direct cache
   [--nostd]                                    ? static allocation of cache structures and use of safe lib functions
   [--puniq]                                    ? print unique cache information.
   [--help]
   [--readme]

A sample input to generateCaches.pl script is given in the text box below.
Everything after �#� sign is assumed to be a comment and is ignored. Each
line in the input file defines a memory hierarchy by listing the sysid, number
of cache levels and the specifications of each cache. The sysid needs to be
greater than 0 and unique. It is used by the prediction database to deferentiate
different systems and cache structures. Each cache is defined by 4 attributes:
the cache size (which can be given in bytes or in KB or MBs), the associativity,
the line size in bytes and the replacement policy.  The replacement policy can
be lru, lru_vc, dir or ran where lru is the LRU pseudo implementation, lru_vc is
the LRU pseudo implementation for victim caches, dir is the direct addressed and
ran is a specific random replacement.

Note that each cache specification in the memory hierarchy description file
needs to be per computation unit (core or processors). For instance, if L1 is
private but L2 is shared between cores or processors, the L2 specification needs
to be given per core/processor. Since there is no easy way of dividing the
shared caches per computation unit, the simplest way is to divide caches evenly
among the sharing units.
 
The output of this script is a C header file that will be compiled into the
shared libraries under the instcode directory for the cache simulator rewriting
tools. So if the user wants to use different caches structures or memory
hierarchies for application and MultiMAPS tracing than the set of hierarchies
distributed with the source (very likely), then before installing PEBIL as
described in Sections 2.2.1.1 and 2.2.1.2 they need to create memory hierarchy
specifications and generate the C header file for those specifications using
scripts/generateCaches.pl.

For instance if the user wants to install new cache structures listed in a file
named MyCaches.txt, they would perform the following steps:

$ cd /home/userid /PEBIL/instcode
$ vi MyCaches,txt
  ### add your cache specifications to this file such as 
  # id level size  assoc  line  repl  size  assoc line repl  size  assoc line   repl
  1   2    256KB   8    128  lru   4MB     8    128  lru_vc
$ ../scripts/generateCaches.pl --sys-file ./MyCaches.txt
   Processing input file ---- MyCaches.txt
   Output file will be   ---- CacheStructures.h
  *** DONE *** SUCCESS *** SUCCESS *** SUCCESS *****************
$ gmake clean
$ gmake

Note that this script generates a file named CacheStructures.h by default since
CacheStructures.h is the name of the file used in the source code of the
instrumentation libraries. If you decide to use some other file as output,
please copy that file to CacheStructures.h in the instcode directory before
building the code in the instcode directory again. More importantly, since cache
structures are embedded into the PEBIL installation as source code, for each
different set of cache structures, a new installation of PEBIL is required. 

2.4	Tracing with PEBIL or PMaCInstrumentor

The computational traces for an application run are gathered with the aid of
binary instrumentation. During application tracing, a trace file for each task
in the application is gathered. Such a trace file includes cache hit rates
across the specified set of cache structures for each basic block that was
executed by that task during the application run. Since it is not practical to
perform cache simulation on all basic blocks in the application (there might be
hundreds of thousands), the computational traces are collected in two steps.
During the first step, the executable is instrumented to count the number
executions for each basic block (JBB Tracing). During the second step, only the
most frequently executed basic blocks are chosen to be instrumented with cache
simulation code (Cache Simulation). This means that to gather the computational
traces for an application, it needs to be run twice using two different
instrumented binaries. For the complete computational trace, all of the trace
files generated during these two steps need to be used by the PMaC Prediction
Framework.

The computational traces for an application are gathered by instrumenting the
application binary using either PMaCInstrumentor or PEBIL. To help and ease the
gathering of computational traces, these tools each provide a PERL script called
traceHelper.pl under the scripts directory (and under bin directory also if the
installation steps are followed). This script is designed to guide the user to
instrument executables and to collect the computational traces that are used to
generate performance predictions. The following two subsections describe the
steps of computational trace collection in details.

For the remainder of this section we will assume that pmacInst is installed at
/site/pmac_tools/Instrumentor, the application to be instrumented is compiled at
/home/userid/app and the name of the executable is app.exe that is under the
same directory, and the executable is run from the /home/userid/run directory. 

For pmacinst or pebil to be used for computation tracing, it inserts calls to
shared libraries from the lib directory of the installation into the executable
to dump the traces to the disk in addition to inserting the cache simulation
code. The traceHelper.pl script allows the user to specify the path to those
shared libraries using --inst_lib option every time it is executed . The value
for the --inst_lib option should point to the top directory of
PMaCInstrumentor/PEBIL installation, not the lib directory under the
installation.

2.4.1	JBB Tracing

JBB Tracing instruments the executable to insert a counter at every basic block.
Thus during the execution of the instrumented executable, it counts the number
of times each basic block is executed. For JBB tracing, user needs to instrument
the executable using traceHelper.pl and run the executable in the same fashion
as original. To instrument the executable for JBB tracing, the user can run the
following:

$ /site/pmac_tools/Instrumentor/scripts/traceHelper.pl 
                        --action jbbinst
                        --application app --dataset standard --cpu_count  64 
                        --exec_file /home/userid/app/app.exe
                        --pmacinst_dir $HOME/mytraces
                        --inst_lib /site/pmac_tools/PMaCInstrumentor

If successful, the traceHelper.pl script will create a directory, named
pmacTRACE, under $HOME/mytraces. All tracing-related files that are generated
will be copied to this directory. Under this directory, using the action type
given by --action option, traceHelper.pl will create subdirectories and save the
tracing related files including the instrumented executable and the static files
that are needed later for post-processing the traces. Here is an example layout
of the directories after this step.

$ ls $HOME/mytraces/pmacTRACE/
   jbbinst
$ ls $HOME/mytraces/pmacTRACE/jbbinst/
   app_standard_0064
$ ls $HOME/mytraces/pmacTRACE/jbbinst/app_standard_0064/
   App.exe.jbbinst         app.exe.jbbinst.static 

Note that the --exec_file option points to the path to the executable file and
the --pmacinst_dir option points to the top directory where all traces will be
collected.  The app.exe.jbbinst file under
$HOME/mytraces/pmacTRACE/jbbinst/app_standard_0064/ directory is the
instrumented executable and user needs to run this executable the same as the
original to generate basic block execution counts. If run successfully, it will
generate trace files in the run directory as follows (assuming the run directory
is /home/userid/run). The number of trace files generated should match the
number of tasks in the application run. Furthermore, the user should expect the
execution of the jbb-instrumented application to take a factor of 1.5-2.0 times
longer than the original application run.

$ ls /home/userid/run/*.meta*.jbbinst
  app.exe.meta_0000.jbbinst
  app.exe.meta_0001.jbbinst
  .............................................
  App.exe.meta_0064.jbbinst

If the trace files do not include the rank id as part of name as above
(0000,0001, �etc.) but include some other non-consecutive numbers (process
IDs), it indicates that PMaC timers are not linked in to the executable
properly. Make sure the times are inserted properly. Similarly, if the
instrumented run does not generate the trace files, that indicates the
termination of run was not recognized by the instrumentation code and the timers
need to be included. In both cases, the RankPid file in the run directory is not
expected to exist, indicating that the timers are not included properly. 

The user needs to collect these trace files under the mytraces directory by
running the following command:

$ /site/pmac_tools/Instrumentor/scripts/traceHelper.pl 
                        --action jbbcoll
                        --application app --dataset standard --cpu_count  64 
                        --exec_name app.exe
                        --jbb_trc_dir /home/userid/run
                        --pmacinst_dir $HOME/mytraces

If the script runs successfully, it will generate a subdirectory under
$HOME/mytraces/pmacTRACE with the action name. Under that directory, there will
be an application-specific directory to where the script will copy the trace
files.

$ ls $HOME/mytraces/pmacTRACE/
  jbbcoll  jbbinst
$ ls $HOME/mytraces/pmacTRACE/jbbcoll/
  app_standard_0064
$ ls $HOME/mytraces/pmacTRACE/jbbcoll/app_standard_0064/
  app.exe.meta_0000.jbbinst
  app.exe.meta_0001.jbbinst
  ......................................
  app.exe.meta_0064.jbbinst

Note that rather than a path to the executable, this command takes the
executable name using the --exec_name option. Note also that the --jbb_trc_dir
points to the directory where the *.meta*.jbbinst trace files are located.

2.4.2	Cache Simulation

The JBB tracing generates files that contain basic block execution counts. Since
it is not practical to instrument all basic blocks for cache simulation, PMaC
framework chooses a subset of the most frequently executed basic blocks that
cover only a certain percentile of all basic block execution counts. The user
does not need to know anything about how they are chosen as the traceHelper.pl
script automatically calls the selectSimBlocks.pl script to choose the blocks to
instrument for cache simulation. The next steps for computational trace
collection is to instrument the executable for cache simulation, run the
instrumented executable and then collect the cache simulation traces generated. 

The user needs to ensure that the following are true before continuing with this
step; First, the selectSimBlocks.pl script is his path (please refer to Section
2.1.1.1). Second, the target cache structures and memory hierarchies are
installed for cache simulation rather than the sample structures shipped by
default with the distribution (please refer to Section 2.1.1.3).

First, the user needs to instrument the executable for cache simulation. This
can be done by running the following command:

$ /site/pmac_tools/Instrumentor/scripts/traceHelper.pl 
        --action siminst
        --application app --dataset standard --cpu_count  64 
        --exec_file /home/userid/app/app.exe
        --jbb_trc_dir $HOME/mytraces/pmacTRACE/jbbcoll/app_standard_0064/
        --jbb_static $HOME/mytraces/pmacTRACE/jbbinst/app_standard_0064/app.exe.jbbinst.static 
        --phase_no 1 --phase_count 1
        --pmacinst_dir $HOME/mytraces
        --inst_lib /site/pmac_tools/Instrumentor

If successful, this command will instrument the executable with cache simulation
code and save the generated files under the $HOME/mytraces directory as follows.

$ ls $HOME/mytraces/pmacTRACE/
  jbbcoll  jbbinst  siminst
$ ls $HOME/mytraces/pmacTRACE/siminst/
app_standard_0064
$ ls $HOME/mytraces/pmacTRACE/siminst/app_standard_0064/
p01
$ ls $HOME/mytraces/pmacTRACE/siminst/app_standard_0064/p01/
  app.phase.1o1.0064.jbbinst.lbb         
  app.exe.phase.1.0064.siminst         
  app.exe.phase.1.0064.siminst.static

The app.exe.phase.1.0064.siminst file is the instrumented executable for cache
simulations. The user needs to run this executable the same way as the original
executable to generate traces for cache simulation. Note that unlike JBB
tracing, to instrument the executable for cache simulation, we also passed
number of phases (using the --phase_count option) and phase number (using the
--phase_no option) to the helper script. This is due to fact that cache
simulation is computationally expensive and may introduce slowdowns of 6 to
10-fold during the run of the instrumented executable. In the HPC resource being
used to run the instrumented executable, the upper limit in execution imposed by
the system may not be large enough to complete tracing of the application in
full. So users can divide the cache simulation into multiple phases, instrument
the executable to perform cache simulation for only those blocks selected for
a particular phase, then run each phase's instrumented executable. Note that the
pmacTRACE directory structures uses the phase number for saving the cache
simulation files generated (but most of the tracing we have conducted up to now
has required only 1 phase).

If the user runs the instrumented executable app.exe.phase.1.0064.siminst in
a manner similar to the original executable and if the run completes
successfully, it should generate trace files as follows. The number of trace
files should match the number of tasks in the application run.

$ ls /home/userid/run/*phase*.meta*.siminst
app.exe.phase.1.meta_0000.0064.siminst
app.exe.phase.1.meta_0001.0064.siminst
...................................................................
app.exe.phase.1.meta_0063.0064.siminst

If the trace files do not include the rank id as part of name as above
(0000,0001, and son on) but include some other non-consecutive numbers (process
IDs), it indicates that PMaC timers are not linked in to the executable properly
as described in Section4.1 . Make sure the times are inserted properly.
Similarly, if the instrumented run does not generate the trace files, that
indicates the termination of run was not recognized by the instrumentation code
and the timers need to be included. In both cases, the RankPid file in the run
directory is not expected to exist, indicating that the timers are not included
properly. 

After running the instrumented executable, the user needs to collect these
traces using the simcoll action (which is functionally similar to what the
jbbcoll action does for jbb tracing).

$ /site/pmac_tools/Instrumentor/scripts/traceHelper.pl 
                        --action simcoll
                        --application app --dataset standard --cpu_count  64 
                        --exec_name app.exe
                        --sim_trc_dir /home/userid/run
                        --pmacinst_dir $HOME/mytraces
                        --phase_no 1
Note that the --phase_no option is passed to the helper script to indicate which
phase's traces are being collected under the $HOME/mytraces/pmacTRACE directory.
If ran successfully, the trace files will be copied and the directory structure
will be.

$ ls $HOME/mytraces/pmacTRACE/
  jbbcoll  jbbinst  simcoll  siminst
$ ls $HOME/mytraces/pmacTRACE/simcoll/
  app_standard_0064
$ ls $HOME/mytraces/pmacTRACE/simcoll/app_standard_0064/
  p01
$ ls $HOME/mytraces/pmacTRACE/simcoll/app_standard_0064/p01/
  app.exe.phase.1.meta_0000.0064.siminst
  app.exe.phase.1.meta_0001.0064.siminst
  ......................................................................
  app.exe.phase.1.meta_0063.0064.siminst

After this step, all traces for the application for a particular dataset, size
and CPU count should be complete. For each application or dataset/size/CPU count
that will be traced for performance prediction, similar steps are required. The
same directory for the --pmacinst_dir option can be used  for the other
applications, datasets, sizes and CPU counts to collect all trace files
conveniently under one roof even though this is not a requirement. 

About

Fast static binary instrumentation for linux/x86

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C 43.1%
  • C++ 39.1%
  • Shell 8.7%
  • Python 3.5%
  • Perl 3.2%
  • D 1.3%
  • Other 1.1%