authorHal Finkel <hfinkel@anl.gov>2014-04-03 00:53:59 +0000
committerHal Finkel <hfinkel@anl.gov>2014-04-03 00:53:59 +0000
commitd68b03bcd2b734ab38c8a590456c4faa71deeb94 (patch)
tree6437d41e174de794f9e3cb6a5d29360fa1bb23e2 /test/Analysis/CostModel
parent14ae43449c7038afa8dced4f5cb054b243b67d76 (diff)
Account for scalarization costs in BasicTTI::getMemoryOpCost for extending vector loads
When a vector type legalizes to a larger vector type, and the target does not support the associated extending load (or truncating store), then legalization will scalarize the load (or store), resulting in an associated scalarization cost. BasicTTI::getMemoryOpCost needs to account for this.

Between this and r205487, PowerPC on the P7 with VSX enabled shows:

MultiSource/Benchmarks/PAQ8p/paq8p: 43% speedup
SingleSource/Benchmarks/BenchmarkGame/puzzle: 51% speedup
SingleSource/UnitTests/Vectorizer/gcc-loops: 28% speedup

(some of these are new; some of these, such as PAQ8p, just reverse regressions that VSX support would trigger)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205495 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'test/Analysis/CostModel')
-rw-r--r--test/Analysis/CostModel/PowerPC/load_store.ll5
1 file changed, 5 insertions, 0 deletions
diff --git a/test/Analysis/CostModel/PowerPC/load_store.ll b/test/Analysis/CostModel/PowerPC/load_store.ll
index c77cce955a..8145a1dc71 100644
--- a/test/Analysis/CostModel/PowerPC/load_store.ll
+++ b/test/Analysis/CostModel/PowerPC/load_store.ll
@@ -29,6 +29,11 @@ define i32 @loads(i32 %arg) {
; CHECK: cost of 4 {{.*}} load
load i128* undef, align 4
+ ; FIXME: There actually are sub-vector Altivec loads, and so we could handle
+ ; this with a small expense, but we don't currently.
+ ; CHECK: cost of 60 {{.*}} load
+ load <4 x i16>* undef, align 2
+
ret i32 undef
}