From 91a44dd9ccd8ec3a10fa35315c381cffade91d5b Mon Sep 17 00:00:00 2001
From: Eli Friedman
Date: Fri, 12 Aug 2011 21:50:54 +0000
Subject: Some reorganization of atomic docs. Added explicit section for NonAtomic. Added example for illegal non-atomic operation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137520 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/Atomics.html | 143 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 111 insertions(+), 32 deletions(-)

diff --git a/docs/Atomics.html b/docs/Atomics.html
index 967ebdddb1..357f43167b 100644
--- a/docs/Atomics.html
+++ b/docs/Atomics.html
@@ -14,8 +14,8 @@
     Introduction
    -Load and store
    -Other atomic instructions
    +Optimization outside atomic
    +Atomic instructions
     Atomic orderings
     Atomics and IR optimization
     Atomics and Codegen
    @@ -75,51 +75,84 @@ instructions has been clarified in the IR.

    -Load and store
    +Optimization outside atomic

     The basic 'load' and 'store' allow a variety of
    -optimizations, but can have unintuitive results in a concurrent environment.
    -For a frontend writer, the rule is essentially that all memory accessed
    -with basic loads and stores by multiple threads should be protected by a
    -lock or other synchronization; otherwise, you are likely to run into
    -undefined behavior. (Do not use volatile as a substitute for atomics; it
    -might work on some platforms, but does not provide the necessary guarantees
    -in general.)
    +optimizations, but can lead to undefined results in a concurrent environment;
    +see NotAtomic. This section specifically goes
    +into the one optimizer restriction which applies in concurrent environments,
    +which gets a bit more of an extended description because any optimization
    +dealing with stores needs to be aware of it.

     From the optimizer's point of view, the rule is that if there
     are not any instructions with atomic ordering involved, concurrency does
     not matter, with one exception: if a variable might be visible to another
     thread or signal handler, a store cannot be inserted along a path where it
    -might not execute otherwise. For example, suppose LICM wants to take all the
    -loads and stores in a loop to and from a particular address and promote them
    -to registers. LICM is not allowed to insert an unconditional store after
    -the loop with the computed value unless a store unconditionally executes
    -within the loop. Note that speculative loads are allowed; a load which
    +might not execute otherwise. Take the following example:

    + +
    +/* C code, for readability; run through clang -O2 -S -emit-llvm to get
    +   equivalent IR */
    +int x;
    +void f(int* a) {
    +  for (int i = 0; i < 100; i++) {
    +    if (a[i])
    +      x += 1;
    +  }
    +}
    +
    + +

    The following is equivalent in non-concurrent situations:

    + +
    +int x;
    +void f(int* a) {
    +  int xtemp = x;
    +  for (int i = 0; i < 100; i++) {
    +    if (a[i])
    +      xtemp += 1;
    +  }
    +  x = xtemp;
    +}
    +
    +
    +However, LLVM is not allowed to transform the former to the latter: it could
    +introduce undefined behavior if another thread can access x at the same time.
    +(This example is particularly of interest because before the concurrency model
    +was implemented, LLVM would perform this transformation.)
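
    For contrast, here is a minimal sketch of a rewrite that stays within the
    rule above, assuming the optimizer tracks whether the original loop would
    have stored to x at all. The name f_guarded and the stored flag are
    hypothetical and do not come from this patch. The early read of x is a
    speculative load, which the IR model permits; written as C source that read
    would itself be a data race under the C standard, so treat this as a picture
    of the IR-level transformation rather than as portable C.

```c
#include <stdbool.h>

int x;

/* Hypothetical variant of f(): x is written back only if the original
   loop would have executed at least one store to x. */
void f_guarded(int *a) {
  int xtemp = x;        /* speculative load: allowed at the IR level */
  bool stored = false;  /* did the original loop store to x? */
  for (int i = 0; i < 100; i++) {
    if (a[i]) {
      xtemp += 1;
      stored = true;
    }
  }
  if (stored)           /* no store is introduced on a store-free path */
    x = xtemp;
}
```

    When every a[i] is zero, the original function never stores to x, and
    neither does this variant, so no new store becomes visible to another
    thread.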

    +
    +Note that speculative loads are allowed; a load which
     is part of a race returns undef, but does not have undefined behavior.

    -For cases where simple loads and stores are not sufficient, LLVM provides
    -atomic loads and stores with varying levels of guarantees.

    -Other atomic instructions
    +Atomic instructions

    +For cases where simple loads and stores are not sufficient, LLVM provides
    +various atomic instructions. The exact guarantees provided depend on the
    +ordering; see Atomic orderings.
    +
    +load atomic and store atomic provide the same
    +basic functionality as non-atomic loads and stores, but provide additional
    +guarantees in situations where threads and signals are involved.

    +

     cmpxchg and atomicrmw are essentially like an
     atomic load followed by an atomic store (where the store is conditional for
    -cmpxchg), but no other memory operation can happen between
    -the load and store. Note that our cmpxchg does not have quite as many
    -options for making cmpxchg weaker as the C++0x version.
    +cmpxchg), but no other memory operation can happen on any thread
    +between the load and store. Note that LLVM's cmpxchg does not provide quite
    +as many options as the C++0x version.
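
    As an illustration of the two read-modify-write instructions, here is a
    small C11 sketch using <stdatomic.h>. It assumes a frontend such as clang
    lowers atomic_fetch_add to an atomicrmw and atomic_compare_exchange_strong
    to a cmpxchg; that mapping, and the helper names bump and set_if_equal, are
    assumptions made for illustration and are not taken from this patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_int counter;  /* static storage, so it starts at zero */

/* atomicrmw-style operation: add 1 and return the previous value.
   The load and the store happen as one indivisible step. */
int bump(void) {
  return atomic_fetch_add(&counter, 1);
}

/* cmpxchg-style operation: store `desired` only if the current value
   equals `expected`. Returns true if the store happened. */
bool set_if_equal(int expected, int desired) {
  return atomic_compare_exchange_strong(&counter, &expected, desired);
}
```

    Both helpers use the default sequentially consistent ordering; the weaker
    orderings are selected in C11 through the _explicit variants of the same
    functions.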

     A fence provides Acquire and/or Release ordering which is not
     part of another operation; it is normally used along with Monotonic memory
    @@ -146,6 +179,54 @@ instructions has been clarified in the IR.

    each level includes all the guarantees of the previous level except for Acquire/Release.

    + +

    + NotAtomic +

    + +
    +
    +NotAtomic is the obvious case: a load or store which is not atomic. (This isn't
    +really a level of atomicity, but is listed here for comparison.) This is
    +essentially a regular load or store. If code accesses a memory location
    +from multiple threads at the same time, the resulting loads return
    +'undef'.

    + +
    +Relevant standard
    +This is intended to match shared variables in C/C++, and to be used
    +in any other context where memory access is necessary, and
    +a race is impossible.
    +Notes for frontends
    +The rule is essentially that all memory accessed with basic loads and
    +stores by multiple threads should be protected by a lock or other
    +synchronization; otherwise, you are likely to run into undefined
    +behavior. If your frontend is for a "safe" language like Java,
    +use Unordered to load and store any shared variable. Note that NotAtomic
    +volatile loads and stores are not properly atomic; do not try to use
    +them as a substitute. (Per the C/C++ standards, volatile does provide
    +some limited guarantees around asynchronous signals, but atomics are
    +generally a better solution.)
    +Notes for optimizers
    +Introducing loads to shared variables along a codepath where they would
    +not otherwise exist is allowed; introducing stores to shared variables
    +is not. See Optimization outside
    +atomic.
    +Notes for code generation
    +The one interesting restriction here is that it is not allowed to write
    +to bytes outside of the bytes relevant to a store. This is mostly
    +relevant to unaligned stores: it is not allowed in general to convert
    +an unaligned store into two aligned stores of the same width as the
    +unaligned store. Backends are also expected to generate an i8 store
    +as an i8 store, and not an instruction which writes to surrounding
    +bytes. (If you are writing a backend for an architecture which cannot
    +satisfy these restrictions and cares about concurrency, please send an
    +email to llvmdev.)
    +
    + +
    + +
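
    To make the code generation note concrete, the sketch below models in plain
    C what can go wrong when a one-byte store is lowered to a read-modify-write
    of the surrounding four bytes. All names here (word_snapshot, widened_begin,
    widened_finish, narrow_store) are hypothetical; the widened store is split
    into two steps only so that a second thread's store can be placed between
    them deterministically. It is an illustration under those assumptions, not
    a description of any particular backend.

```c
#include <stdint.h>
#include <string.h>

/* Snapshot of the four bytes that contain the target byte. */
typedef struct { uint8_t bytes[4]; } word_snapshot;

/* Step 1 of the hypothetical widened store: read all four bytes. */
word_snapshot widened_begin(const uint8_t *base) {
  word_snapshot s;
  memcpy(s.bytes, base, sizeof s.bytes);
  return s;
}

/* Step 2: patch one byte in the snapshot, then write all four bytes back.
   This also rewrites the three neighbouring bytes with stale values. */
void widened_finish(uint8_t *base, word_snapshot s, unsigned idx, uint8_t v) {
  s.bytes[idx] = v;
  memcpy(base, s.bytes, sizeof s.bytes);
}

/* What the note asks for instead: a store that touches exactly one byte. */
void narrow_store(uint8_t *base, unsigned idx, uint8_t v) {
  base[idx] = v;
}
```

    If another thread stores to base[1] between widened_begin and
    widened_finish, the write-back restores the old value of base[1] and that
    store is lost, even though the source program never wrote to base[1] from
    this thread. With narrow_store the neighbouring byte is left alone.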

     Unordered
    @@ -379,24 +460,22 @@ instructions has been clarified in the IR.

     • isSimple(): A load or store which is not volatile or atomic. This is
       what, for example, memcpyopt would check for operations it might
    -  transform.
    +  transform.
     • isUnordered(): A load or store which is not volatile and at most
       Unordered. This would be checked, for example, by LICM before hoisting
    -  an operation.
    +  an operation.
     • mayReadFromMemory()/mayWriteToMemory(): Existing predicate, but note
       that they return true for any operation which is volatile or at least
    -  Monotonic.
    +  Monotonic.
     • Alias analysis: Note that AA will return ModRef for anything Acquire or
    -  Release, and for the address accessed by any Monotonic operation.
    +  Release, and for the address accessed by any Monotonic operation.
    -There are essentially two components to supporting atomic operations. The
    -first is making sure to query isSimple() or isUnordered() instead
    -of isVolatile() before transforming an operation. The other piece is
    -making sure that a transform does not end up replacing, for example, an
    -Unordered operation with a non-atomic operation. Most of the other
    -necessary checks automatically fall out from existing predicates and
    -alias analysis queries.

    +To support optimizing around atomic operations, make sure you are using
    +the right predicates; everything should work if that is done. If your
    +pass should optimize some atomic operations (Unordered operations in
    +particular), make sure it doesn't replace an atomic load or store with
    +a non-atomic operation.

    Some examples of how optimizations interact with various kinds of atomic operations: