llvm - Unofficial llvm GIT mirror used in EmbToolkit

	Commit message (Collapse)	Author	Age
*	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"	Duncan P. N. Exon Smith	2014-04-19
\| \| \| \| \| \|	This reverts commit r206704, as expected. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206707 91177308-0d34-0410-b5e6-96231b3b80d8
*	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"	Duncan P. N. Exon Smith	2014-04-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite. I've done a careful audit, added some asserts, and fixed a couple of bugs (unfortunately, they were in unlikely code paths). There's a small chance that this will appease the failing bots [1][2]. (If so, great!) If not, I have a follow-up commit ready that will temporarily add -debug-only=block-freq to the two failing tests, allowing me to compare the code path between what the failing bots and what my machines (and the rest of the bots) are doing. Once I've triggered those builds, I'll revert both commits so the bots go green again. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 <rdar://problem/14292693> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206704 91177308-0d34-0410-b5e6-96231b3b80d8
*	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)	Duncan P. N. Exon Smith	2014-04-19
\| \| \| \| \| \| \| \| \| \| \|	This reverts commit r206666, as planned. Still stumped on why the bots are failing. Sanitizer bots haven't turned anything up. If anyone can help me debug either of the failures (referenced in r206666) I'll owe them a beer. (In the meantime, I'll be auditing my patch for undefined behaviour.) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206677 91177308-0d34-0410-b5e6-96231b3b80d8
*	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)	Duncan P. N. Exon Smith	2014-04-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit r206628, reapplying r206622 (and r206626). Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce on Darwin, and Chandler can't reproduce on Linux. Asan and valgrind don't tell us anything, but we're hoping the msan bot will catch it. So, I'm applying this again to get more feedback from the bots. I'll leave it in long enough to trigger builds in at least the sanitizer buildbots (it was failing for reasons unrelated to my commit last time it was in), and hopefully a few others.... and then I expect to revert a third time. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206666 91177308-0d34-0410-b5e6-96231b3b80d8
*	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)	Duncan P. N. Exon Smith	2014-04-18
\| \| \| \| \| \| \| \| \|	This reverts commit r206622 and the MSVC fixup in r206626. Apparently the remotely failing tests are still failing, despite my attempt to fix the nondeterminism in r206621. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206628 91177308-0d34-0410-b5e6-96231b3b80d8
*	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"	Duncan P. N. Exon Smith	2014-04-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit r206556, effectively reapplying commit r206548 and its fixups in r206549 and r206550. In an intervening commit I've added target triples to the tests that were failing remotely [1] (but passing locally). I'm hoping the mystery is solved? I'll revert this again if the tests are still failing remotely. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206622 91177308-0d34-0410-b5e6-96231b3b80d8
*	[LCG] Add support for building persistent and connected SCCs to the	Chandler Carruth	2014-04-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	LazyCallGraph. This is the start of the whole point of this different abstraction, but it is just the initial bits. Here is a run-down of what's going on here. I'm planning to incorporate some (or all) of this into comments going forward, hopefully with better editing and wording. =] The crux of the problem with the traditional way of building SCCs is that they are ephemeral. The new pass manager however really needs the ability to associate analysis passes and results of analysis passes with SCCs in order to expose these analysis passes to the SCC passes. Making this work is kind-of the whole point of the new pass manager. =] So, when we're building SCCs for the call graph, we actually want to build persistent nodes that stick around and can be reasoned about later. We'd also like the ability to walk the SCC graph in more complex ways than just the traditional postorder traversal of the current CGSCC walk. That means that in addition to being persistent, the SCCs need to be connected into a useful graph structure. However, we still want the SCCs to be formed lazily where possible. These constraints are quite hard to satisfy with the SCC iterator. Also, using that would bypass our ability to actually add data to the nodes of the call graph to facilite implementing the Tarjan walk. So I've re-implemented things in a more direct and embedded way. This immediately makes it easy to get the persistence and connectivity correct, and it also allows leveraging the existing nodes to simplify the algorithm. I've worked somewhat to make this implementation more closely follow the traditional paper's nomenclature and strategy, although it is still a bit obtuse because it isn't recursive, using an explicit stack and a tail call instead, and it is interruptable, resuming each time we need another SCC. The other tricky bit here, and what actually took almost all the time and trials and errors I spent building this, is exactly what graph structure to build for the SCCs. The naive thing to build is the call graph in its newly acyclic form. I wrote about 4 versions of this which did precisely this. Inevitably, when I experimented with them across various use cases, they became incredibly awkward. It was all implementable, but it felt like a complete wrong fit. Square peg, round hole. There were two overriding aspects that pushed me in a different direction: 1) We want to discover the SCC graph in a postorder fashion. That means the root node will be the last node we find. Using the call-SCC DAG as the graph structure of the SCCs results in an orphaned graph until we discover a root. 2) We will eventually want to walk the SCC graph in parallel, exploring distinct sub-graphs independently, and synchronizing at merge points. This again is not helped by the call-SCC DAG structure. The structure which, quite surprisingly, ended up being completely natural to use is the inverse of the call-SCC DAG. We add the leaf SCCs to the graph as "roots", and have edges to the caller SCCs. Once I switched to building this structure, everything just fell into place elegantly. Aside from general cleanups (there are FIXMEs and too few comments overall) that are still needed, the other missing piece of this is support for iterating across levels of the SCC graph. These will become useful for implementing #2, but they aren't an immediate priority. Once SCCs are in good shape, I'll be working on adding mutation support for incremental updates and adding the pass manager that this analysis enables. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206581 91177308-0d34-0410-b5e6-96231b3b80d8
*	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"	Duncan P. N. Exon Smith	2014-04-18
\| \| \| \| \| \| \| \| \| \| \|	This reverts commits r206548, r206549 and r206549. There are some unit tests failing that aren't failing locally [1], so reverting until I have time to investigate. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206556 91177308-0d34-0410-b5e6-96231b3b80d8
*	blockfreq: Rewrite BlockFrequencyInfoImpl	Duncan P. N. Exon Smith	2014-04-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rewrite the shared implementation of BlockFrequencyInfo and MachineBlockFrequencyInfo entirely. The old implementation had a fundamental flaw: precision losses from nested loops (or very wide branches) compounded past loop exits (and convergence points). The @nested_loops testcase at the end of test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This function has three nested loops, with branch weights in the loop headers of 1:4000 (exit:continue). The old analysis gives non-sensical results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': ---- Block Freqs ---- entry = 1.0 for.cond1.preheader = 1.00103 for.cond4.preheader = 5.5222 for.body6 = 18095.19995 for.inc8 = 4.52264 for.inc11 = 0.00109 for.end13 = 0.0 The new analysis gives correct results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': block-frequency-info: nested_loops - entry: float = 1.0, int = 8 - for.cond1.preheader: float = 4001.0, int = 32007 - for.cond4.preheader: float = 16008001.0, int = 128064007 - for.body6: float = 64048012001.0, int = 512384096007 - for.inc8: float = 16008001.0, int = 128064007 - for.inc11: float = 4001.0, int = 32007 - for.end13: float = 1.0, int = 8 Most importantly, the frequency leaving each loop matches the frequency entering it. The new algorithm leverages BlockMass and PositiveFloat to maintain precision, separates "probability mass distribution" from "loop scaling", and uses dithering to eliminate probability mass loss. I have unit tests for these types out of tree, but it was decided in the review to make the classes private to BlockFrequencyInfoImpl, and try to shrink them (or remove them entirely) in follow-up commits. The new algorithm should generally have a complexity advantage over the old. The previous algorithm was quadratic in the worst case. The new algorithm is still worst-case quadratic in the presence of irreducible control flow, but it's linear without it. The key difference between the old algorithm and the new is that control flow within a loop is evaluated separately from control flow outside, limiting propagation of precision problems and allowing loop scale to be calculated independently of mass distribution. Loops are visited bottom-up, their loop scales are calculated, and they are replaced by pseudo-nodes. Mass is then distributed through the function, which is now a DAG. Finally, loops are revisited top-down to multiply through the loop scales and the masses distributed to pseudo nodes. There are some remaining flaws. - Irreducible control flow isn't modelled correctly. LoopInfo and MachineLoopInfo ignore irreducible edges, so this algorithm will fail to scale accordingly. There's a note in the class documentation about how to get closer. See also the comments in test/Analysis/BlockFrequencyInfo/irreducible.ll. - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting the 64-bit integer precision used downstream. - The "bias" calculation proposed on llvmdev is not incorporated here. This will be added in a follow-up commit, once comments from this review have been handled. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206548 91177308-0d34-0410-b5e6-96231b3b80d8
*	Fix a bug in which BranchProbabilityInfo wasn't setting branch weights of ↵	Akira Hatanaka	2014-04-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	basic blocks inside loops correctly. Previously, BranchProbabilityInfo::calcLoopBranchHeuristics would determine the weights of basic blocks inside loops even when it didn't have enough information to estimate the branch probabilities correctly. This patch fixes the function to exit early if it doesn't see any exit edges or back edges and let the later heuristics determine the weights. This fixes PR18705 and <rdar://problem/15991090>. Differential Revision: http://reviews.llvm.org/D3363 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206194 91177308-0d34-0410-b5e6-96231b3b80d8
*	Don't assert in BasicTTI::getMemoryOpCost for non-simple types	Hal Finkel	2014-04-14
\| \| \| \| \| \| \| \|	BasicTTI::getMemoryOpCost must explicitly check for non-simple types; setting AllowUnknown=true with TLI->getSimpleValueType is not sufficient because, for example, non-power-of-two vector types return non-simple EVTs (not MVT::Other). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206150 91177308-0d34-0410-b5e6-96231b3b80d8
*	in findGCD of multiply expr return the gcd	Sebastian Pop	2014-04-08
\| \| \| \| \| \|	we used to return 1 instead of the gcd git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205800 91177308-0d34-0410-b5e6-96231b3b80d8
*	[PowerPC] Adjust load/store costs in PPCTTI	Hal Finkel	2014-04-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This provides more realistic costs for the insert/extractelement instructions (which are load/store pairs), accounts for the cheap unaligned Altivec load sequence, and for unaligned VSX load/stores. Bad news: MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation) SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized) MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown Good news: SingleSource/Benchmarks/Shootout/ary3 - 54% speedup SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup Unfortunately, estimating the costs of the stack-based scalarization sequences is hard, and adjusting these costs is like a game of whac-a-mole :( I'll revisit this again after we have better codegen for vector extloads and truncstores and unaligned load/stores. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205658 91177308-0d34-0410-b5e6-96231b3b80d8
*	Account for scalarization costs in BasicTTI::getMemoryOpCost for extending ↵	Hal Finkel	2014-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vector loads When a vector type legalizes to a larger vector type, and the target does not support the associated extending load (or truncating store), then legalization will scalarize the load (or store) resulting in an associated scalarization cost. BasicTTI::getMemoryOpCost needs to account for this. Between this, and r205487, PowerPC on the P7 with VSX enabled shows: MultiSource/Benchmarks/PAQ8p/paq8p: 43% speedup SingleSource/Benchmarks/BenchmarkGame/puzzle: 51% speedup SingleSource/UnitTests/Vectorizer/gcc-loops 28% speedup (some of these are new; some of these, such as PAQ8p, just reverse regressions that VSX support would trigger) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205495 91177308-0d34-0410-b5e6-96231b3b80d8
*	Fix multi-register costs in BasicTTI::getCastInstrCost	Hal Finkel	2014-04-02
\| \| \| \| \| \| \| \| \| \| \| \| \|	For an cast (extension, etc.), the currently logic predicts a low cost if the associated operation (keyed on the destination type) is legal (or promoted). This is not true when the number of values required to legalize the type is changing. For example, <8 x i16> being sign extended by <8 x i32> is not generically cheap on PPC with VSX, even though sign extension to v4i32 is legal, because two output v4i32 values are required compared to the single v8i16 input value, and without custom logic in the target, this conversion will scalarize. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205487 91177308-0d34-0410-b5e6-96231b3b80d8
*	ARM64: initial backend import	Tim Northover	2014-03-29
\| \| \| \| \| \| \| \| \| \| \| \|	This adds a second implementation of the AArch64 architecture to LLVM, accessible in parallel via the "arm64" triple. The plan over the coming weeks & months is to merge the two into a single backend, during which time thorough code review should naturally occur. Everything will be easier with the target in-tree though, hence this commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205090 91177308-0d34-0410-b5e6-96231b3b80d8
*	PR15967 Fix in basicaa for faulty returning no alias.	Arnold Schwaighofer	2014-03-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit consist of two parts. The first part fix the PR15967. The wrong conclusion was made when the MaxLookup limit was reached. The fix introduce a out parameter (MaxLookupReached) to DecomposeGEPExpression that the function aliasGEP can act upon. The second part is introducing the constant MaxLookupSearchDepth to make sure that DecomposeGEPExpression and GetUnderlyingObject use the same search depth. This is a small cleanup to clarify the original algorithm. Patch by Karl-Johan Karlsson! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204859 91177308-0d34-0410-b5e6-96231b3b80d8
*	ScalarEvolution: Compute exit counts for loops with a power-of-2 step.	Benjamin Kramer	2014-03-25
\| \| \| \| \| \| \| \| \| \| \| \| \|	If we have a loop of the form for (unsigned n = 0; n != (k & -32); n += 32) {} then we know that n is always divisible by 32 and the loop must terminate. Even if we have a condition where the loop counter will overflow it'll always hold this invariant. PR19183. Our loop vectorizer creates this pattern and it's also occasionally formed by loop counters derived from pointers. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204728 91177308-0d34-0410-b5e6-96231b3b80d8
*	Reject alias to undefined symbols in the verifier.	Rafael Espindola	2014-03-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On ELF and COFF an alias is just another name for a position in the file. There is no way to refer to a position in another file, so an alias to undefined is meaningless. MachO currently doesn't support aliases. The spec has a N_INDR, which when implemented will have a different set of restrictions. Adding support for it shouldn't be harder than any other IR extension. For now, having the IR represent what is actually possible with current tools makes it easier to fix the design of GlobalAlias. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203705 91177308-0d34-0410-b5e6-96231b3b80d8
*	When analyzing vectors of element type that require legalization,	Raul E. Silvera	2014-03-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the legalization cost must be included to get an accurate estimation of the total cost of the scalarized vector. The inaccurate cost triggered unprofitable SLP vectorization on 32-bit X86. Summary: Include legalization overhead when computing scalarization cost Reviewers: hfinkel, nadav CC: chandlerc, rnk, llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2992 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203509 91177308-0d34-0410-b5e6-96231b3b80d8
*	Teach lint about address spaces	Matt Arsenault	2014-03-06
\| \| \| \|	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203132 91177308-0d34-0410-b5e6-96231b3b80d8
*	add -da-delinearize runs and checks to MIV testcases	Sebastian Pop	2014-02-21
\| \| \| \|	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201869 91177308-0d34-0410-b5e6-96231b3b80d8
*	Add extra CHECK prefix to tests with explicit prefix	Nico Rieck	2014-02-16
\| \| \| \| \| \| \|	These tests mistakenly assume that CHECK is still available even if an explicit prefix is specified. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201492 91177308-0d34-0410-b5e6-96231b3b80d8
*	Actually call FileCheck in tests	Nico Rieck	2014-02-16
\| \| \| \|	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201491 91177308-0d34-0410-b5e6-96231b3b80d8
*	Fix broken CHECK lines	Nico Rieck	2014-02-16
\| \| \| \|	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201479 91177308-0d34-0410-b5e6-96231b3b80d8
*	[Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo called	Andrea Di Biagio	2014-02-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	'OK_NonUniformConstValue' to identify operands which are constants but not constant splats. The cost model now allows returning 'OK_NonUniformConstValue' for non splat operands that are instances of ConstantVector or ConstantDataVector. With this change, targets are now able to compute different costs for instructions with non-uniform constant operands. For example, On X86 the cost of a vector shift may vary depending on whether the second operand is a uniform or non-uniform constant. This patch applies the following changes: - The cost model computation now takes into account non-uniform constants; - The cost of vector shift instructions has been improved in X86TargetTransformInfo analysis pass; - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish between non-uniform and uniform constant operands. Added a new test to verify that the output of opt '-cost-model -analyze' is valid in the following configurations: SSE2, SSE4.1, AVX, AVX2. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201272 91177308-0d34-0410-b5e6-96231b3b80d8
*	Test case I forgot to 'add' for r201126.	Craig Topper	2014-02-12
\| \| \| \|	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201207 91177308-0d34-0410-b5e6-96231b3b80d8
*	ScalarEvolution: Analyze trip count of loops with a switch guarding the exit.	Benjamin Kramer	2014-02-11
\| \| \| \|	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201159 91177308-0d34-0410-b5e6-96231b3b80d8
*	X86: add costs for 64-bit vector ext/trunc & rebalance	Tim Northover	2014-02-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The most important part of this is probably adding any cost at all for operations like zext <8 x i8> to <8 x i32>. Before they were being recorded as extremely costly (24, I believe) which made LLVM fall back on a 4-wide vectorisation of a loop. It also rebalances the values for sext, zext and trunc. Lacking any other sane metric that might work across CPU microarchitectures I went for instructions. This seems to be in reasonable accord with the rest of the table (sitofp, ...) though no doubt at least one value is sub-optimal for some bizarre reason. Finally, separate AVX and AVX2 values are provided where appropriate. The CodeGen is quite different in many cases. rdar://problem/15981990 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200928 91177308-0d34-0410-b5e6-96231b3b80d8
*	[PM] Add a new "lazy" call graph analysis pass for the new pass manager.	Chandler Carruth	2014-02-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The primary motivation for this pass is to separate the call graph analysis used by the new pass manager's CGSCC pass management from the existing call graph analysis pass. That analysis pass is (somewhat unfortunately) over-constrained by the existing CallGraphSCCPassManager requirements. Those requirements make it really hard to cleanly layer the needed functionality for the new pass manager on top of the existing analysis. However, there are also a bunch of things that the pass manager would specifically benefit from doing differently from the existing call graph analysis, and this new implementation tries to address several of them: - Be lazy about scanning function definitions. The existing pass eagerly scans the entire module to build the initial graph. This new pass is significantly more lazy, and I plan to push this even further to maximize locality during CGSCC walks. - Don't use a single synthetic node to partition functions with an indirect call from functions whose address is taken. This node creates a huge choke-point which would preclude good parallelization across the fanout of the SCC graph when we got to the point of looking at such changes to LLVM. - Use a memory dense and lightweight representation of the call graph rather than value handles and tracking call instructions. This will require explicit update calls instead of some updates working transparently, but should end up being significantly more efficient. The explicit update calls ended up being needed in many cases for the existing call graph so we don't really lose anything. - Doesn't explicitly model SCCs and thus doesn't provide an "identity" for an SCC which is stable across updates. This is essential for the new pass manager to work correctly. - Only form the graph necessary for traversing all of the functions in an SCC friendly order. This is a much simpler graph structure and should be more memory dense. It does limit the ways in which it is appropriate to use this analysis. I wish I had a better name than "call graph". I've commented extensively this aspect. This is still very much a WIP, in fact it is really just the initial bits. But it is about the fourth version of the initial bits that I've implemented with each of the others running into really frustrating problms. This looks like it will actually work and I'd like to split the actual complexity across commits for the sake of my reviewers. =] The rest of the implementation along with lots of wiring will follow somewhat more rapidly now that there is a good path forward. Naturally, this doesn't impact any of the existing optimizer. This code is specific to the new pass manager. A bunch of thanks are deserved for the various folks that have helped with the design of this, especially Nick Lewycky who actually sat with me to go through the fundamentals of the final version here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200903 91177308-0d34-0410-b5e6-96231b3b80d8
*	Fix crasher introduced in r200203 and caught by a libc++ buildbot. Don't ↵	Nick Lewycky	2014-01-27
\| \| \| \| \| \|	assume that getMulExpr returns a SCEVMulExpr, it may have simplified it to something else! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200210 91177308-0d34-0410-b5e6-96231b3b80d8
*	Teach SCEV to handle more cases of 'and X, CST', specifically where CST is ↵	Nick Lewycky	2014-01-27
\| \| \| \| \| \| \| \| \| \| \|	any number of contiguous 1 bits in a row, with any number of leading and trailing 0 bits. Unfortunately, this in turn led to some lower quality SCEVs due to some different paths through expression simplification, so add getUDivExactExpr and use it. This fixes all instances of the problems that I found, but we can make that function smarter as necessary. Merge test "xor-and.ll" into "and-xor.ll" since I needed to update it anyways. Test 'nsw-offset.ll' analyzes a little deeper, %n now gets a scev in terms of %no instead of a SCEVUnknown. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200203 91177308-0d34-0410-b5e6-96231b3b80d8
*	Fix known typos	Alp Toker	2014-01-24
\| \| \| \| \| \| \|	Sweep the codebase for common typos. Includes some changes to visible function names that were misspelt. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200018 91177308-0d34-0410-b5e6-96231b3b80d8
*	BasicAA: We need to check both access sizes when comparing a gep and an	Arnold Schwaighofer	2014-01-16
\| \| \| \| \| \| \| \|	underlying object of unknown size. Fixes PR18460. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199351 91177308-0d34-0410-b5e6-96231b3b80d8
*	Fix broken CHECK lines.	Benjamin Kramer	2014-01-11
\| \| \| \|	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199016 91177308-0d34-0410-b5e6-96231b3b80d8
*	Fixed old typo in ScalarEvolution, that caused wrong SCEVs zext operation.	Stepan Dyatkovskiy	2014-01-09
\| \| \| \| \| \| \| \| \| \| \|	Detailed description is here: http://llvm.org/bugs/show_bug.cgi?id=18000#c16 For participation in bugfix process special thanks to David Wiberg. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198863 91177308-0d34-0410-b5e6-96231b3b80d8
*	BasicAA: Use reachabilty instead of dominance for checking value equality in phi	Arnold Schwaighofer	2014-01-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cycles This allows the value equality check to work even if we don't have a dominator tree. Also add some more comments. I was worried about compile time impacts and did not implement reachability but used the dominance check in the initial patch. The trade-off was that the dominator tree was required. The llvm utility function isPotentiallyReachable cuts off the recursive search after 32 visits. Testing did not show any compile time regressions showing my worries unjustfied. No compile time or performance regressions at O3 -flto -mavx on test-suite + externals. Addresses review comments from r198290. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198400 91177308-0d34-0410-b5e6-96231b3b80d8
*	BasicAA: Fix value equality and phi cycles	Arnold Schwaighofer	2014-01-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When there are cycles in the value graph we have to be careful interpreting "Value" identity as "value" equivalence. We interpret the value of a phi node as the value of its operands. When we check for value equivalence now we make sure that the "Value" dominates all cycles (phis). %0 = phi [%noaliasval, %addr2] %l = load %ptr %addr1 = gep @a, 0, %l %addr2 = gep @a, 0, (%l + 1) store %ptr ... Before this patch we would return NoAlias for (%0, %addr1) which is wrong because the value of the load is from different iterations of the loop. Tested on x86_64 -mavx at O3 and O3 -flto with no performance or compile time regressions. PR18068 radar://15653794 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198290 91177308-0d34-0410-b5e6-96231b3b80d8
*	Use correct size for address space in BasicAA.	Matt Arsenault	2013-11-16
\| \| \| \| \| \| \| \| \| \| \| \|	The tests just hit this with a different sized address space since I haven't figured out how to use this to break it. I thought I committed this a long time ago, and I'm not sure why missing this hasn't caused any problems. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194903 91177308-0d34-0410-b5e6-96231b3b80d8
*	improve dependence analysis testcases	Sebastian Pop	2013-11-12
\| \| \| \| \| \| \|	print the name of the function on which the dependence analysis is performed such that changes to the testcase are easier to review. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194528 91177308-0d34-0410-b5e6-96231b3b80d8
*	delinearization of arrays	Sebastian Pop	2013-11-12
\| \| \| \|	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194527 91177308-0d34-0410-b5e6-96231b3b80d8
*	Rewrite SCEV's backedge taken count computation.	Andrew Trick	2013-11-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch by Michele Scandale! Rewrite of the functions used to compute the backedge taken count of a loop on LT and GT comparisons. I decided to split the handling of LT and GT cases becasue the trick "a > b == -a < -b" in some cases prevents the trip count computation due to the multiplication by -1 on the two operands of the comparison. This issue comes from the conservative computation of value range of SCEVs: taking the negative SCEV of an expression that have a small positive range (e.g. [0,31]), we would have a SCEV with a fullset as value range. Indeed, in the new rewritten function I tried to better handle the maximum backedge taken count computation when MAX/MIN expression are used to handle the cases where no entry guard is found. Some test have been modified in order to check the new value correctly (I manually check them and reasoning on possible overflow the new values seem correct). I finally added a new test case related to the multiplication by -1 issue on GT comparisons. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194116 91177308-0d34-0410-b5e6-96231b3b80d8
*	Consider (x == -1) unlikely in BranchProbabilityInfo	Hal Finkel	2013-11-01
\| \| \| \| \| \| \| \| \| \| \| \|	This adds another heuristic to BPI, similar to the existing heuristic that considers (x == 0) unlikely to be true. As suggested in the PACT'98 paper by Deitrich, Cheng, and Hwu, -1 is often used to indicate an invalid index, and equality comparisons with -1 are also unlikely to succeed. Local experimentation supports this hypothesis: This yields a 1-2% speedup in the test-suite sqlite benchmark on the PPC A2 core, with no significant regressions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193855 91177308-0d34-0410-b5e6-96231b3b80d8
*	SCEV: Make the final add of an inbounds GEP nuw if we know that the index is ↵	Benjamin Kramer	2013-10-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	positive. We can't do this for the general case as saying a GEP with a negative index doesn't have unsigned wrap isn't valid for negative indices. %gep = getelementptr inbounds i32* %p, i64 -1 But an inbounds GEP cannot run past the end of address space. So we check for the very common case of a positive index and make GEPs derived from that NUW. Together with Andy's recent non-unit stride work this lets us analyze loops like void foo3(int a, int b) { for (; a < b; a++) {} } PR12375, PR12376. Differential Revision: http://llvm-reviews.chandlerc.com/D2033 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193514 91177308-0d34-0410-b5e6-96231b3b80d8
*	Revert r193251 : Use address-taken to disambiguate global variable and ↵	Shuxin Yang	2013-10-27
\| \| \| \| \| \|	indirect memops. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193489 91177308-0d34-0410-b5e6-96231b3b80d8
*	X86: Custom lower sext v16i8 to v16i16, and the corresponding truncate.	Benjamin Kramer	2013-10-23
\| \| \| \| \| \|	Also update the cost model. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193270 91177308-0d34-0410-b5e6-96231b3b80d8
*	Use address-taken to disambiguate global variable and indirect memops.	Shuxin Yang	2013-10-23
\| \| \| \| \| \| \| \| \| \| \|	Major steps include: 1). introduces a not-addr-taken bit-field in GlobalVariable 2). GlobalOpt pass sets "not-address-taken" if it proves a global varirable dosen't have its address taken. 3). AA use this info for disambiguation. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193251 91177308-0d34-0410-b5e6-96231b3b80d8
*	Simplify testing case (Thanks Rafael for the testing case).	Manman Ren	2013-10-22
\| \| \| \|	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193177 91177308-0d34-0410-b5e6-96231b3b80d8
*	TBAA: fix PR17620.	Manman Ren	2013-10-22
\| \| \| \| \| \| \| \|	We can have a struct type with a single field and the field does not start with 0. In that case, we should correctly update the offset. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193137 91177308-0d34-0410-b5e6-96231b3b80d8
*	Fix creating bitcasts between address spaces in SCEV.	Matt Arsenault	2013-10-21
\| \| \| \| \| \| \| \|	The test before wasn't successfully testing this since it was missing the datalayout piece to change the size of the second address space. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193102 91177308-0d34-0410-b5e6-96231b3b80d8