| Field | Value | Date |
|---|---|---|
| author | Bob Wilson <bob.wilson@apple.com> | 2014-06-17 00:45:30 +0000 |
| committer | Bob Wilson <bob.wilson@apple.com> | 2014-06-17 00:45:30 +0000 |
| commit | a6264641edbd5cca1651e8117c3e0666e3757118 | |
| tree | 6f89a20da9f5cf98d0f06ef238b56e86dae62842 /docs/UsersManual.rst | |
| parent | a3bb0d84f3d9ace1ed333746550d0afcb8ddcd0a | |
Add documentation for PGO with instrumentation to clang's User's Manual.
<rdar://problem/16771671>
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@211085 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/UsersManual.rst')
-rw-r--r-- | docs/UsersManual.rst | 89 |
1 files changed, 78 insertions, 11 deletions
diff --git a/docs/UsersManual.rst b/docs/UsersManual.rst
index 63154a8d7b..2460e94664 100644
--- a/docs/UsersManual.rst
+++ b/docs/UsersManual.rst
@@ -1119,8 +1119,29 @@ are listed below.
    only. This only applies to the AArch64 architecture.
 
 
-Using Sampling Profilers for Optimization
------------------------------------------
+Profile Guided Optimization
+---------------------------
+
+Profile information enables better optimization. For example, knowing that a
+branch is taken very frequently helps the compiler make better decisions when
+ordering basic blocks. Knowing that a function ``foo`` is called more
+frequently than another function ``bar`` helps the inliner.
+
+Clang supports profile guided optimization with two different kinds of
+profiling. A sampling profiler can generate a profile with very low runtime
+overhead, or you can build an instrumented version of the code that collects
+more detailed profile information. Both kinds of profiles can provide execution
+counts for instructions in the code and information on branches taken and
+function invocation.
+
+Regardless of which kind of profiling you use, be careful to collect profiles
+by running your code with inputs that are representative of the typical
+behavior. Code that is not exercised in the profile will be optimized as if it
+is unimportant, and the compiler may make poor optimization choices for code
+that is disproportionately used while profiling.
+
+Using Sampling Profilers
+^^^^^^^^^^^^^^^^^^^^^^^^
 
 Sampling profilers are used to collect runtime information, such as
 hardware counters, while your application executes. They are typically
@@ -1128,14 +1149,6 @@ very efficient and do not incur a large runtime overhead. The
 sample data collected by the profiler can be used during compilation
 to determine what the most executed areas of the code are.
 
-In particular, sample profilers can provide execution counts for all
-instructions in the code and information on branches taken and function
-invocation. The compiler can use this information in its optimization
-cost models. For example, knowing that a branch is taken very
-frequently helps the compiler make better decisions when ordering
-basic blocks. Knowing that a function ``foo`` is called more
-frequently than another function ``bar`` helps the inliner.
-
 Using the data from a sample profiler requires some changes in the way
 a program is built. Before the compiler can use profiling information,
 the code needs to execute under the profiler. The following is the
@@ -1195,7 +1208,7 @@ usual build cycle when using sample profilers for optimization:
 
 
 Sample Profile Format
-^^^^^^^^^^^^^^^^^^^^^
+"""""""""""""""""""""
 
 If you are not using Linux Perf to collect profiles, you will need to
 write a conversion tool from your profiler to LLVM's format. This section
@@ -1279,6 +1292,60 @@ d. [OPTIONAL] Potential call targets and samples. If present, this
    with ``baz()`` being the relatively more frequently called target.
 
 
+Profiling with Instrumentation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Clang also supports profiling via instrumentation. This requires building a
+special instrumented version of the code and has some runtime
+overhead during the profiling, but it provides more detailed results than a
+sampling profiler. It also provides reproducible results, at least to the
+extent that the code behaves consistently across runs.
+
+Here are the steps for using profile guided optimization with
+instrumentation:
+
+1. Build an instrumented version of the code by compiling and linking with the
+   ``-fprofile-instr-generate`` option.
+
+   .. code-block:: console
+
+     $ clang++ -O2 -fprofile-instr-generate code.cc -o code
+
+2. Run the instrumented executable with inputs that reflect the typical usage.
+   By default, the profile data will be written to a ``default.profraw`` file
+   in the current directory. You can override that default by setting the
+   ``LLVM_PROFILE_FILE`` environment variable to specify an alternate file.
+   Any instance of ``%p`` in that file name will be replaced by the process
+   ID, so that you can easily distinguish the profile output from multiple
+   runs.
+
+   .. code-block:: console
+
+     $ LLVM_PROFILE_FILE="code-%p.profraw" ./code
+
+3. Combine profiles from multiple runs and convert the "raw" profile format to
+   the input expected by clang. Use the ``merge`` command of the llvm-profdata
+   tool to do this.
+
+   .. code-block:: console
+
+     $ llvm-profdata merge -output=code.profdata code-*.profraw
+
+   Note that this step is necessary even when there is only one "raw" profile,
+   since the merge operation also changes the file format.
+
+4. Build the code again using the ``-fprofile-instr-use`` option to specify the
+   collected profile data.
+
+   .. code-block:: console
+
+     $ clang++ -O2 -fprofile-instr-use=code.profdata code.cc -o code
+
+   You can repeat step 4 as often as you like without regenerating the
+   profile. As you make changes to your code, clang may no longer be able to
+   use the profile data. It will warn you when this happens.
+
+
 Controlling Size of Debug Information
 -------------------------------------
 