summaryrefslogtreecommitdiff
path: root/lib
diff options
context:
space:
mode:
authorChandler Carruth <chandlerc@gmail.com>2014-01-27 11:12:24 +0000
committerChandler Carruth <chandlerc@gmail.com>2014-01-27 11:12:24 +0000
commit424b2b00931496f61260f8e114538bd0c4b7252a (patch)
tree360725de7cbf1212b2e9076866c7ef10025f1d19 /lib
parent9f22a8788fee099c016f8bec5769c3b8f3a720dd (diff)
downloadllvm-424b2b00931496f61260f8e114538bd0c4b7252a.tar.gz
llvm-424b2b00931496f61260f8e114538bd0c4b7252a.tar.bz2
llvm-424b2b00931496f61260f8e114538bd0c4b7252a.tar.xz
[vectorizer] Teach the loop vectorizer's unroller to only unroll by
powers of two. This is essentially always the correct thing given the impact on alignment, scaling factors that can be used in addressing modes, etc. Also, fix the management of the unroll vs. small loop cost to more accurately model things with this world. Enhance a test case to actually exercise more of the unroll machinery if using synthetic constants rather than a specific target model. Before this change, with the added flags this test will unroll 3 times instead of either 2 or 4 (the two sensible answers). While I don't expect this to make a huge difference, if there are lots of loops sitting right on the edge of hitting the 'small unroll' factor, they might change behavior. However, I've benchmarked moving the small loop cost up and down in many various ways and by a huge factor (2x) without seeing more than 0.2% code size growth. Small adjustments such as the series that led up here have led to about 1% improvement on some benchmarks, but it is very close to the noise floor so I mostly checked that nothing regressed. Let me know if you see bad behavior on other targets but I don't expect this to be a sufficiently dramatic change to trigger anything. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200213 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'lib')
-rw-r--r--lib/Transforms/Vectorize/LoopVectorize.cpp9
1 files changed, 6 insertions, 3 deletions
diff --git a/lib/Transforms/Vectorize/LoopVectorize.cpp b/lib/Transforms/Vectorize/LoopVectorize.cpp
index 23dc8f39ee..5c9933a2c7 100644
--- a/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -5004,8 +5004,11 @@ LoopVectorizationCostModel::selectUnrollFactor(bool OptForSize,
// registers. These registers are used by all of the unrolled instances.
// Next, divide the remaining registers by the number of registers that is
// required by the loop, in order to estimate how many parallel instances
- // fit without causing spills.
- unsigned UF = (TargetNumRegisters - R.LoopInvariantRegs) / R.MaxLocalUsers;
+ // fit without causing spills. All of this is rounded down if necessary to be
+ // a power of two. We want power of two unroll factors to simplify any
+ // addressing operations or alignment considerations.
+ unsigned UF = PowerOf2Floor((TargetNumRegisters - R.LoopInvariantRegs) /
+ R.MaxLocalUsers);
// Clamp the unroll factor ranges to reasonable factors.
unsigned MaxUnrollSize = TTI.getMaximumUnrollFactor();
@@ -5045,7 +5048,7 @@ LoopVectorizationCostModel::selectUnrollFactor(bool OptForSize,
DEBUG(dbgs() << "LV: Loop cost is " << LoopCost << '\n');
if (LoopCost < SmallLoopCost) {
DEBUG(dbgs() << "LV: Unrolling to reduce branch cost.\n");
- unsigned NewUF = SmallLoopCost / (LoopCost + 1);
+ unsigned NewUF = PowerOf2Floor(SmallLoopCost / LoopCost);
return std::min(NewUF, UF);
}