MI-Sched: handle latency of in-order operations with the new machine model.

The per-operand machine model allows the target to define "unbuffered" processor resources. This change is a quick, cheap way to model stalls caused by the latency of operations that use such resources. This only applies when the processor's micro-op buffer size is non-zero (Out-of-Order). We can't precisely model in-order stalls during out-of-order execution, but this is an easy and effective heuristic. It benefits cortex-a9 scheduling when using the new machine model, which is not yet on by default. MI-Sched for armv7 was evaluated on Swift (and only not enabled because of a performance bug related to predication). However, we never evaluated Cortex-A9 performance on MI-Sched in its current form. This change adds MI-Sched functionality to reach performance goals on A9. The only remaining change is to allow MI-Sched to run as a PostRA pass. I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7: -mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results: (min run time over 2 runs, filtering tiny changes) Speedups: | Benchmarks/BenchmarkGame/recursive | 52.39% | | Benchmarks/VersaBench/beamformer | 20.80% | | Benchmarks/Misc/pi | 19.97% | | Benchmarks/Misc/mandel-2 | 19.95% | | SPEC/CFP2000/188.ammp | 18.72% | | Benchmarks/McCat/08-main/main | 18.58% | | Benchmarks/Misc-C++/Large/sphereflake | 18.46% | | Benchmarks/Olden/power | 17.11% | | Benchmarks/Misc-C++/mandel-text | 16.47% | | Benchmarks/Misc/oourafft | 15.94% | | Benchmarks/Misc/flops-7 | 14.99% | | Benchmarks/FreeBench/distray | 14.26% | | SPEC/CFP2006/470.lbm | 14.00% | | mediabench/mpeg2/mpeg2dec/mpeg2decode | 12.28% | | Benchmarks/SmallPT/smallpt | 10.36% | | Benchmarks/Misc-C++/Large/ray | 8.97% | | Benchmarks/Misc/fp-convert | 8.75% | | Benchmarks/Olden/perimeter | 7.10% | | Benchmarks/Bullet/bullet | 7.03% | | Benchmarks/Misc/mandel | 6.75% | | Benchmarks/Olden/voronoi | 6.26% | | Benchmarks/Misc/flops-8 | 5.77% | | Benchmarks/Misc/matmul_f64_4x4 | 5.19% | | Benchmarks/MiBench/security-rijndael | 5.15% | | Benchmarks/Misc/flops-6 | 5.10% | | Benchmarks/Olden/tsp | 4.46% | | Benchmarks/MiBench/consumer-lame | 4.28% | | Benchmarks/Misc/flops-5 | 4.27% | | Benchmarks/mafft/pairlocalalign | 4.19% | | Benchmarks/Misc/himenobmtxpa | 4.07% | | Benchmarks/Misc/lowercase | 4.06% | | SPEC/CFP2006/433.milc | 3.99% | | Benchmarks/tramp3d-v4 | 3.79% | | Benchmarks/FreeBench/pifft | 3.66% | | Benchmarks/Ptrdist/ks | 3.21% | | Benchmarks/Adobe-C++/loop_unroll | 3.12% | | SPEC/CINT2000/175.vpr | 3.12% | | Benchmarks/nbench | 2.98% | | SPEC/CFP2000/183.equake | 2.91% | | Benchmarks/Misc/perlin | 2.85% | | Benchmarks/Misc/flops-1 | 2.82% | | Benchmarks/Misc-C++-EH/spirit | 2.80% | | Benchmarks/Misc/flops-2 | 2.77% | | Benchmarks/NPB-serial/is | 2.42% | | Benchmarks/ASC_Sequoia/CrystalMk | 2.33% | | Benchmarks/BenchmarkGame/n-body | 2.28% | | Benchmarks/SciMark2-C/scimark2 | 2.27% | | Benchmarks/Olden/bh | 2.03% | | skidmarks10/skidmarks | 1.81% | | Benchmarks/Misc/flops | 1.72% | Slowdowns: | Benchmarks/llubenchmark/llu | -14.14% | | Benchmarks/Polybench/stencils/seidel-2d | -5.67% | | Benchmarks/Adobe-C++/functionobjects | -5.25% | | Benchmarks/Misc-C++/oopack_v1p8 | -5.00% | | Benchmarks/Shootout/hash | -2.35% | | Benchmarks/Prolangs-C++/ocean | -2.01% | | Benchmarks/Polybench/medley/floyd-warshall | -1.98% | | Polybench/linear-algebra/kernels/3mm | -1.95% | | Benchmarks/McCat/09-vor/vor | -1.68% | git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196516 91177308-0d34-0410-b5e6-96231b3b80d8
author: Andrew Trick <atrick@apple.com> 2013-12-05 17:55:58 +0000
committer: Andrew Trick <atrick@apple.com> 2013-12-05 17:55:58 +0000
commit: 573931394fc307a4606bd0b1854d4df5bf5638a1 (patch)
tree: 9f16093011d94726301cec49d8ded650df2cff3a /include/llvm
parent: bdbcb4dfbc4e5c0bfeafa8416c9ac1ae39e4b794 (diff)
download: llvm-573931394fc307a4606bd0b1854d4df5bf5638a1.tar.gz
llvm-573931394fc307a4606bd0b1854d4df5bf5638a1.tar.bz2
llvm-573931394fc307a4606bd0b1854d4df5bf5638a1.tar.xz
1 files changed, 13 insertions, 9 deletions
diff --git a/include/llvm/CodeGen/ScheduleDAG.h b/include/llvm/CodeGen/ScheduleDAG.h
index ccba1b0364..48ed232a15 100644
--- a/include/llvm/CodeGen/ScheduleDAG.h
+++ b/include/llvm/CodeGen/ScheduleDAG.h
@@ -292,6 +292,7 @@ namespace llvm {
     bool isScheduleHigh   : 1;          // True if preferable to schedule high.
     bool isScheduleLow    : 1;          // True if preferable to schedule low.
     bool isCloned         : 1;          // True if this node has been cloned.
+    bool isUnbuffered     : 1;          // Reads an unbuffered resource.
     Sched::Preference SchedulingPref;   // Scheduling preference.
 
   private:
@@ -316,9 +317,10 @@ namespace llvm {
         isTwoAddress(false), isCommutable(false), hasPhysRegUses(false),
         hasPhysRegDefs(false), hasPhysRegClobbers(false), isPending(false),
         isAvailable(false), isScheduled(false), isScheduleHigh(false),
-        isScheduleLow(false), isCloned(false), SchedulingPref(Sched::None),
-        isDepthCurrent(false), isHeightCurrent(false), Depth(0), Height(0),
-        TopReadyCycle(0), BotReadyCycle(0), CopyDstRC(NULL), CopySrcRC(NULL) {}
+        isScheduleLow(false), isCloned(false), isUnbuffered(false),
+        SchedulingPref(Sched::None), isDepthCurrent(false),
+        isHeightCurrent(false), Depth(0), Height(0), TopReadyCycle(0),
+        BotReadyCycle(0), CopyDstRC(NULL), CopySrcRC(NULL) {}
 
     /// SUnit - Construct an SUnit for post-regalloc scheduling to represent
     /// a MachineInstr.
@@ -330,9 +332,10 @@ namespace llvm {
         isTwoAddress(false), isCommutable(false), hasPhysRegUses(false),
         hasPhysRegDefs(false), hasPhysRegClobbers(false), isPending(false),
         isAvailable(false), isScheduled(false), isScheduleHigh(false),
-        isScheduleLow(false), isCloned(false), SchedulingPref(Sched::None),
-        isDepthCurrent(false), isHeightCurrent(false), Depth(0), Height(0),
-        TopReadyCycle(0), BotReadyCycle(0), CopyDstRC(NULL), CopySrcRC(NULL) {}
+        isScheduleLow(false), isCloned(false), isUnbuffered(false),
+        SchedulingPref(Sched::None), isDepthCurrent(false),
+        isHeightCurrent(false), Depth(0), Height(0), TopReadyCycle(0),
+        BotReadyCycle(0), CopyDstRC(NULL), CopySrcRC(NULL) {}
 
     /// SUnit - Construct a placeholder SUnit.
     SUnit()
@@ -343,9 +346,10 @@ namespace llvm {
         isTwoAddress(false), isCommutable(false), hasPhysRegUses(false),
         hasPhysRegDefs(false), hasPhysRegClobbers(false), isPending(false),
         isAvailable(false), isScheduled(false), isScheduleHigh(false),
-        isScheduleLow(false), isCloned(false), SchedulingPref(Sched::None),
-        isDepthCurrent(false), isHeightCurrent(false), Depth(0), Height(0),
-        TopReadyCycle(0), BotReadyCycle(0), CopyDstRC(NULL), CopySrcRC(NULL) {}
+        isScheduleLow(false), isCloned(false), isUnbuffered(false),
+        SchedulingPref(Sched::None), isDepthCurrent(false),
+        isHeightCurrent(false), Depth(0), Height(0), TopReadyCycle(0),
+        BotReadyCycle(0), CopyDstRC(NULL), CopySrcRC(NULL) {}
 
     /// \brief Boundary nodes are placeholders for the boundary of the
     /// scheduling region.
author	Andrew Trick <atrick@apple.com>	2013-12-05 17:55:58 +0000
committer	Andrew Trick <atrick@apple.com>	2013-12-05 17:55:58 +0000
commit	573931394fc307a4606bd0b1854d4df5bf5638a1 (patch)
tree	9f16093011d94726301cec49d8ded650df2cff3a /include/llvm
parent	bdbcb4dfbc4e5c0bfeafa8416c9ac1ae39e4b794 (diff)
download	llvm-573931394fc307a4606bd0b1854d4df5bf5638a1.tar.gz llvm-573931394fc307a4606bd0b1854d4df5bf5638a1.tar.bz2 llvm-573931394fc307a4606bd0b1854d4df5bf5638a1.tar.xz