summaryrefslogtreecommitdiff
path: root/lib/Analysis
diff options
context:
space:
mode:
authorBenjamin Kramer <benny.kra@googlemail.com>2012-08-07 11:13:19 +0000
committerBenjamin Kramer <benny.kra@googlemail.com>2012-08-07 11:13:19 +0000
commitb6fdd022b7414f916ba7b08560a0f55e14863326 (patch)
tree48dcd9bf25a957fc9918f197f9c4303902332887 /lib/Analysis
parent961e1acfb275613679c0d00d4a0b4ed394b51a9d (diff)
downloadllvm-b6fdd022b7414f916ba7b08560a0f55e14863326.tar.gz
llvm-b6fdd022b7414f916ba7b08560a0f55e14863326.tar.bz2
llvm-b6fdd022b7414f916ba7b08560a0f55e14863326.tar.xz
PR13095: Give an inline cost bonus to functions using byval arguments.
We give a bonus for every argument because the argument setup is not needed anymore when the function is inlined. With this patch we interpret byval arguments as a compact representation of many arguments. The byval argument setup is implemented in the backend as an inline memcpy, so to model the cost as accurately as possible we take the number of pointer-sized elements in the byval argument and give a bonus of 2 instructions for every one of those. The bonus is capped at 8 elements, which is the number of stores at which the x86 backend switches from an expanded inline memcpy to a real memcpy. It would be better to use the real memcpy threshold from the backend, but it's not available via TargetData. This change brings the performance of c-ray in line with gcc 4.7. The included test case tries to reproduce the c-ray problem to catch regressions for this benchmark early, its performance is dominated by the inline decision of a specific call. This only has a small impact on most code, more on x86 and arm than on x86_64 due to the way the ABI works. When building LLVM for x86 it gives a small inline cost boost to virtually any function using StringRef or STL allocators, but only a 0.01% increase in overall binary size. The size of gcc compiled by clang actually shrunk by a couple bytes with this patch applied, but not significantly. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161413 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'lib/Analysis')
-rw-r--r--lib/Analysis/InlineCost.cpp30
1 files changed, 27 insertions, 3 deletions
diff --git a/lib/Analysis/InlineCost.cpp b/lib/Analysis/InlineCost.cpp
index a6bf4a86a9..bc1ecd2ea4 100644
--- a/lib/Analysis/InlineCost.cpp
+++ b/lib/Analysis/InlineCost.cpp
@@ -797,9 +797,33 @@ bool CallAnalyzer::analyzeCall(CallSite CS) {
FiftyPercentVectorBonus = Threshold;
TenPercentVectorBonus = Threshold / 2;
- // Subtract off one instruction per call argument as those will be free after
- // inlining.
- Cost -= CS.arg_size() * InlineConstants::InstrCost;
+ // Give out bonuses per argument, as the instructions setting them up will
+ // be gone after inlining.
+ for (unsigned I = 0, E = CS.arg_size(); I != E; ++I) {
+ if (TD && CS.isByValArgument(I)) {
+ // We approximate the number of loads and stores needed by dividing the
+ // size of the byval type by the target's pointer size.
+ PointerType *PTy = cast<PointerType>(CS.getArgument(I)->getType());
+ unsigned TypeSize = TD->getTypeSizeInBits(PTy->getElementType());
+ unsigned PointerSize = TD->getPointerSizeInBits();
+ // Ceiling division.
+ unsigned NumStores = (TypeSize + PointerSize - 1) / PointerSize;
+
+ // If it generates more than 8 stores it is likely to be expanded as an
+ // inline memcpy so we take that as an upper bound. Otherwise we assume
+ // one load and one store per word copied.
+ // FIXME: The maxStoresPerMemcpy setting from the target should be used
+ // here instead of a magic number of 8, but it's not available via
+ // TargetData.
+ NumStores = std::min(NumStores, 8U);
+
+ Cost -= 2 * NumStores * InlineConstants::InstrCost;
+ } else {
+ // For non-byval arguments subtract off one instruction per call
+ // argument.
+ Cost -= InlineConstants::InstrCost;
+ }
+ }
// If there is only one call of the function, and it has internal linkage,
// the cost of inlining it drops dramatically.