| ATOMIC_LOADSTORE(9) | Kernel Developer's Manual | ATOMIC_LOADSTORE(9) | 
atomic_load_relaxed, atomic_load_acquire, atomic_load_consume, atomic_store_relaxed, atomic_store_release - atomic and ordered memory operations
#include <sys/atomic.h>
T
atomic_load_relaxed(const volatile T *p);
T
atomic_load_acquire(const volatile T *p);
T
atomic_load_consume(const volatile T *p);
void
atomic_store_relaxed(volatile T *p, T v);
void
atomic_store_release(volatile T *p, T v);
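The load operations are equivalent to *p and the store operations are equivalent to *p = v, except that they are atomic and carry memory ordering constraints. The pointer p must be aligned, even on architectures like x86 which generally lack strict alignment requirements; see SIZE AND ALIGNMENT for details.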
Atomic means that the memory operations cannot be fused or torn:
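Fusing is combining multiple memory operations on a single object into one memory operation, such as replacing
	*p = v;
	x = *p;
by
	*p = v;
	x = v;
on the assumption that *p will yield v after *p = v. For atomic memory operations, the implementation will not assume that a load returns the value most recently stored to or loaded from the same object, so it will neither elide nor combine them.
    For example,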
	atomic_store_relaxed(&flag, 1);
	while (atomic_load_relaxed(&flag))
		continue;
    
    may be used to set a flag and then busy-wait until another thread clears it, whereas
	flag = 1;
	while (flag)
		continue;
    
    may be transformed into the infinite loop
	flag = 1;
	while (1)
		continue;
    
Tearing is implementing one memory operation on a larger unit of data, such as a 32-bit word, as several memory operations on smaller units. For example, if a 32-bit word w is written with atomic_store_relaxed(&w, 0x00010002), then an interrupt, other thread, or other CPU reading it with atomic_load_relaxed(&w) will never witness it partially written, whereas w = 0x00010002 might be compiled into a pair of separate 16-bit store instructions instead of a single word-sized store instruction, in which case other threads may see the intermediate state with only one of the halves written.
Atomic operations on any single object occur in a total order shared by all interrupts, threads, and CPUs, which is consistent with the program order in every interrupt, thread, and CPU. A single program without interruption or other threads or CPUs will always observe its own loads and stores in program order, but another program in an interrupt handler, in another thread, or on another CPU may issue loads that return values as if the first program's stores occurred out of program order, and vice versa. Two different threads might each observe a third thread's memory operations in different orders.
The memory ordering constraints make limited guarantees of ordering relative to memory operations on other objects as witnessed by interrupts, other threads, or other CPUs, and have the following meanings:
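relaxed
    No ordering relative to memory operations on any other objects is guaranteed. Relaxed ordering is the default for ordinary non-atomic memory operations like *p and *p = v.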
    Atomic operations with relaxed ordering are cheap: they are not read/modify/write atomic operations, and they do not involve any kind of inter-CPU ordering barriers.
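acquire
    This memory operation happens before all subsequent memory operations in program order, but prior memory operations in program order may be reordered to happen after it. For example, assuming no aliasing between the pointers, the implementation is allowed to treat
	int x = *p;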
	if (atomic_load_acquire(q)) {
		int y = *r;
		*s = x + y;
		return 1;
	}
    
    as if it were
	if (atomic_load_acquire(q)) {
		int x = *p;
		int y = *r;
		*s = x + y;
		return 1;
	}
    
    but not as if it were
	int x = *p;
	int y = *r;
	*s = x + y;
	if (atomic_load_acquire(q)) {
		return 1;
	}
    
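consume
    This memory operation happens before all subsequent memory operations in program order whose addresses depend on the value it returns. Other memory operations may still be reordered around it. For example, the implementation is allowed to treat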
	struct foo *foo0, *foo1;
	struct foo *f0 = atomic_load_consume(&foo0);
	struct foo *f1 = atomic_load_consume(&foo1);
	int x = f0->x;
	int y = f1->y;
    
    as if it were
	struct foo *foo0, *foo1;
	struct foo *f1 = atomic_load_consume(&foo1);
	struct foo *f0 = atomic_load_consume(&foo0);
	int y = f1->y;
	int x = f0->x;
    
    but loading f0->x is guaranteed to
        happen after loading foo0 even if the CPU had a
        cached value for the address that f0->x
        happened to be at, and likewise for f1->y and
        foo1.
atomic_load_consume() functions like
        atomic_load_acquire() as long as the memory
        operations that must happen after it are limited to addresses that
        depend on the value returned by it, but it is almost always as cheap as
        atomic_load_relaxed(). See
        ACQUIRE OR CONSUME? below
        for more details.
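release
    All prior memory operations in program order happen before this one, but subsequent memory operations in program order may be reordered to happen before it. For example, assuming no aliasing between the pointers, the implementation is allowed to treat
	int x = *p;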
	*q = x;
	atomic_store_release(r, 0);
	int y = *s;
	return x + y;
    
    as if it were
	int y = *s;
	int x = *p;
	*q = x;
	atomic_store_release(r, 0);
	return x + y;
    
    but not as if it were
	atomic_store_release(r, 0);
	int x = *p;
	int y = *s;
	*q = x;
	return x + y;
    
atomic_store_release() must be paired with either atomic_load_acquire() or atomic_load_consume() in order to have an effect: it is only when a release operation synchronizes with an acquire or consume operation that any ordering is guaranteed between memory operations before the release operation and memory operations after the acquire/consume operation.
For example, to set up an entry in a table and then mark the entry ready, you should:
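  1. Perform memory operations to initialize the table entry.
	tab[i].x = ...;
	tab[i].y = ...;
  2. Use atomic_store_release() to mark it ready.
	atomic_store_release(&tab[i].ready, 1);
  3. Concurrently, in another thread, use atomic_load_acquire() to ascertain whether it is ready.
	if (atomic_load_acquire(&tab[i].ready) == 0)
		return EWOULDBLOCK;
  4. Once it is ready, perform memory operations to use the data.
	do_stuff(tab[i].x, tab[i].y);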
    Similarly, if you want to create an object, initialize it, and then publish it to be used by another thread, then you should:
  1. Perform memory operations to initialize the object.
	struct mumble *m = kmem_alloc(sizeof(*m), KM_SLEEP);
	m->x = x;
	m->y = y;
	m->z = m->x + m->y;
  2. Use atomic_store_release() to publish it.
	atomic_store_release(&the_mumble, m);
  3. Concurrently, in another thread, use atomic_load_consume() to get it.
	struct mumble *m = atomic_load_consume(&the_mumble);
  4. Perform memory operations to use it.
	m->y &= m->x;
	do_things(m->x, m->y, m->z);
    In both examples, assuming that the value written by
    atomic_store_release() in step 2 is read by
    atomic_load_acquire() or
    atomic_load_consume() in step 3, this
    guarantees that all of the memory operations in step 1 complete
    before any of the memory operations in step 4 — even if they
    happen on different CPUs.
Without both the release operation in
    step 2 and the acquire or consume operation in
    step 3, no ordering is guaranteed between the memory operations in
    steps 1 and 4. In fact, without both release
    and acquire/consume, even the assignment m->z = m->x
    + m->y in step 1 might read values of
    m->x and m->y that
    were written in step 4.
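The publish-and-read pattern can also be tried outside the kernel. The following is a minimal userland sketch, assuming a C11 compiler with <stdatomic.h>. It uses atomic_store_explicit() and atomic_load_explicit() in place of the kernel macros, and memory_order_acquire on the reader side since most compilers treat consume as acquire anyway. The names publish_mumble() and read_mumble() are hypothetical and appear nowhere in NetBSD.

```c
#include <stdatomic.h>
#include <stddef.h>

struct mumble {
	int x, y, z;
};

/* Shared pointer; NULL until an object has been published. */
static struct mumble *_Atomic the_mumble = NULL;

/* Initialize the object, then publish it with a release store. */
static void
publish_mumble(struct mumble *m, int x, int y)
{
	m->x = x;
	m->y = y;
	m->z = m->x + m->y;
	atomic_store_explicit(&the_mumble, m, memory_order_release);
}

/*
 * Load the pointer with acquire ordering, then use the object.
 * Returns -1 if nothing has been published yet.
 */
static int
read_mumble(void)
{
	struct mumble *m =
	    atomic_load_explicit(&the_mumble, memory_order_acquire);

	if (m == NULL)
		return -1;
	return m->z;
}
```

If read_mumble() observes the pointer stored by publish_mumble(), it is guaranteed to observe the initialized members as well, whichever threads the two calls run in.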
ACQUIRE OR CONSUME?
You must use atomic_load_acquire() when subsequent
  memory operations in program order that must happen after the load are on
  objects at addresses that might not depend arithmetically on the
  resulting value. This applies particularly when the choice of whether to
  do the subsequent memory operation depends on a control-flow
  decision based on the resulting value:
	struct gadget {
		int ready, x;
	} the_gadget;
	/* Producer */
	the_gadget.x = 42;
	atomic_store_release(&the_gadget.ready, 1);
	/* Consumer */
	if (atomic_load_acquire(&the_gadget.ready) == 0)
		return EWOULDBLOCK;
	int x = the_gadget.x;
Here whether to load the_gadget.x at all is a control-flow decision based on the value loaded from the_gadget.ready, and loading the_gadget.x must happen after loading the_gadget.ready. Using atomic_load_acquire() guarantees that the compiler and CPU do not conspire to load the_gadget.x before we have ascertained that it is ready.
You may use atomic_load_consume() if all
    subsequent memory operations in program order that must happen after the
    load are performed on objects at addresses computed
    arithmetically from the resulting value, such as loading a pointer to a
    structure object and then dereferencing it:
	struct gizmo {
		int x, y, z;
	};
	struct gizmo null_gizmo;
	struct gizmo *the_gizmo = &null_gizmo;
	/* Producer */
	struct gizmo *g = kmem_alloc(sizeof(*g), KM_SLEEP);
	g->x = 12;
	g->y = 34;
	g->z = 56;
	atomic_store_release(&the_gizmo, g);
	/* Consumer */
	struct gizmo *g = atomic_load_consume(&the_gizmo);
	int y = g->y;
Here the address of
    g->y depends on the value of the pointer loaded
    from the_gizmo. Using
    atomic_load_consume() guarantees that we do not
    witness a stale cache for that address.
In some cases it may be unclear. For example:
	int x[2];
	bool b;
	/* Producer */
	x[0] = 42;
	atomic_store_release(&b, 0);
	/* Consumer 1 */
	int y = atomic_load_???(&b) ? x[0] : x[1];
	/* Consumer 2 */
	int y = x[atomic_load_???(&b) ? 0 : 1];
	/* Consumer 3 */
	int y = x[atomic_load_???(&b) ^ 1];
Although the three consumers seem to be equivalent, by the letter
    of C11 consumers 1 and 2 require
    atomic_load_acquire() because the value determines
    the address of a subsequent load only via control-flow decisions in the
    ?: operator, whereas consumer 3 can use
    atomic_load_consume(). However, if you're not sure,
    you should err on the side of atomic_load_acquire()
    until C11 implementations have ironed out the kinks in the semantics.
On all CPUs other than DEC Alpha,
    atomic_load_consume() is cheap — it is
    identical to atomic_load_relaxed(). In contrast,
    atomic_load_acquire() usually implies an expensive
    memory barrier.
SIZE AND ALIGNMENT
All NetBSD ports support atomic loads and
    stores on units of data up to 32 bits. Some ports additionally support
    atomic loads and stores on larger quantities, like 64-bit quantities, if
    __HAVE_ATOMIC64_LOADSTORE is defined. The macros are
    not allowed on larger quantities of data than the port supports atomically;
    attempts to use them for such quantities should result in a compile-time
    assertion failure.
For example, as long as you use
    atomic_store_*() to write a 32-bit quantity, you can
    safely use atomic_load_relaxed() to optimistically
    read it outside a lock, but for a 64-bit quantity it must be conditional on
    __HAVE_ATOMIC64_LOADSTORE — otherwise it will
    lead to compile-time errors on platforms without 64-bit atomic loads and
    stores:
	struct foo {
		kmutex_t	f_lock;
		uint32_t	f_refcnt;
		uint64_t	f_ticket;
	};
	if (atomic_load_relaxed(&foo->f_refcnt) == 0)
		return 123;
#ifdef __HAVE_ATOMIC64_LOADSTORE
	if (atomic_load_relaxed(&foo->f_ticket) == ticket)
		return 123;
#endif
	mutex_enter(&foo->f_lock);
	if (foo->f_refcnt == 0 || foo->f_ticket == ticket)
		ret = 123;
	...
#ifdef __HAVE_ATOMIC64_LOADSTORE
	atomic_store_relaxed(&foo->f_ticket, foo->f_ticket + 1);
#else
	foo->f_ticket++;
#endif
	...
	mutex_exit(&foo->f_lock);
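To illustrate how a size limit of this kind can be enforced at compile time, here is a minimal sketch. It is not the NetBSD implementation: SKETCH_ATOMIC_SIZE_MAX, SKETCH_ATOMIC_SIZE_OK(), and sketch_load_relaxed() are hypothetical names, the limit of 4 or 8 bytes is an assumption keyed on __HAVE_ATOMIC64_LOADSTORE, and __typeof__ is a GCC/Clang extension.

```c
#include <stdint.h>

/* Assumed limit: 8 bytes with 64-bit atomic loads/stores, else 4. */
#ifdef __HAVE_ATOMIC64_LOADSTORE
#define	SKETCH_ATOMIC_SIZE_MAX	8
#else
#define	SKETCH_ATOMIC_SIZE_MAX	4
#endif

/* Nonzero if x (a type or an expression) is small enough. */
#define	SKETCH_ATOMIC_SIZE_OK(x)	(sizeof(x) <= SKETCH_ATOMIC_SIZE_MAX)

/*
 * Volatile load that fails to compile for oversized objects: the
 * array type gets a negative size when the check fails.
 */
#define	sketch_load_relaxed(p)						\
	((void)sizeof(char[SKETCH_ATOMIC_SIZE_OK(*(p)) ? 1 : -1]),	\
	    *(const volatile __typeof__(*(p)) *)(p))
```

With this sketch, sketch_load_relaxed(&foo->f_refcnt) compiles everywhere, while sketch_load_relaxed(&foo->f_ticket) is rejected wherever the assumed limit is 4 bytes.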
These macros are meant to follow C11 semantics, in terms of atomic_load_explicit() and atomic_store_explicit() with the appropriate memory order specifiers, and are meant to make future adoption of the C11 atomic API easier. Eventually it may be mandatory to use the C11 _Atomic type qualifier or equivalent for the operands.
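The correspondence with the C11 generic functions can be sketched as follows. This is a userland illustration only, assuming <stdatomic.h> and _Atomic-qualified operands. The c11_* macro names and set_and_read_flag() are hypothetical and are not part of <sys/atomic.h>.

```c
#include <stdatomic.h>

/* Hypothetical userland counterparts written with the C11 API. */
#define	c11_load_relaxed(p)					\
	atomic_load_explicit((p), memory_order_relaxed)
#define	c11_load_acquire(p)					\
	atomic_load_explicit((p), memory_order_acquire)
#define	c11_load_consume(p)					\
	atomic_load_explicit((p), memory_order_consume)
#define	c11_store_relaxed(p, v)					\
	atomic_store_explicit((p), (v), memory_order_relaxed)
#define	c11_store_release(p, v)					\
	atomic_store_explicit((p), (v), memory_order_release)

static _Atomic unsigned flag;

/* Store with release ordering, then read back with acquire ordering. */
static unsigned
set_and_read_flag(unsigned v)
{
	c11_store_release(&flag, v);
	return c11_load_acquire(&flag);
}
```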
The Linux kernel provides two macros, READ_ONCE(x) and WRITE_ONCE(x, v), which are similar to atomic_load_consume(&x) and atomic_store_relaxed(&x, v), respectively. However, while Linux's READ_ONCE() and WRITE_ONCE() prevent fusing, they may in some cases be torn, and therefore fail to guarantee atomicity, because:
  - They do not require the address &x to be aligned.
  - They do not require sizeof(x) to be at most the largest size of available atomic loads and stores on the host architecture.
Atomic read/modify/write operations can be combined with memory barriers so that they pair with atomic_store_release() and atomic_load_acquire() or atomic_load_consume(): If atomic_r/m/w() is an atomic read/modify/write operation in atomic_ops(3), then
	membar_exit();
	atomic_r/m/w(obj, ...);
functions like a release operation on obj, and
	atomic_r/m/w(obj, ...);
	membar_enter();
functions like an acquire operation on obj.
WARNING: The combination of
    atomic_load_relaxed() and
    membar_enter(3)
    does not make an acquire operation; only read/modify/write
    atomics may be combined with
    membar_enter(3) this
    way.
On architectures where
    __HAVE_ATOMIC_AS_MEMBAR is defined, all the
    atomic_ops(3) imply
    release and acquire operations, so the
    membar_enter(3) and
    membar_exit(3) are
    redundant.
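A common use of the barrier and read/modify/write combinations described above is dropping a reference count. The following userland sketch shows the same shape with C11 fences, assuming <stdatomic.h>; atomic_thread_fence() stands in for the membar_ops(3) barriers, and struct obj and obj_release() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_uint	refcnt;
	int		payload;
};

/*
 * Drop one reference.  Returns true if the caller dropped the last
 * reference and may now destroy the object.
 */
static bool
obj_release(struct obj *o)
{
	/* Release: our prior accesses to *o precede the decrement. */
	atomic_thread_fence(memory_order_release);
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_relaxed) != 1)
		return false;
	/* Acquire: other threads' accesses precede the destruction. */
	atomic_thread_fence(memory_order_acquire);
	return true;
}
```

The release fence before the decrement and the acquire fence after the final decrement together ensure that every thread's use of the object happens before its destruction.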
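Maintaining a lossy event counter. Counts may be lost, because the load, add, and store as a whole are not an atomic read/modify/write operation, but no reader will witness a torn count.
	unsigned count;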
	void
	record_event(void)
	{
		atomic_store_relaxed(&count,
		    1 + atomic_load_relaxed(&count));
	}
	unsigned
	read_event_count(void)
	{
		return atomic_load_relaxed(&count);
	}
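For comparison, the difference between a separate relaxed load and store and a single atomic read/modify/write can be sketched in userland C11, assuming <stdatomic.h>. The function names record_event_lossy() and record_event_exact() are hypothetical.

```c
#include <stdatomic.h>

static atomic_uint count;

/* Separate load and store: concurrent callers may lose a count. */
static void
record_event_lossy(void)
{
	unsigned c = atomic_load_explicit(&count, memory_order_relaxed);

	atomic_store_explicit(&count, c + 1, memory_order_relaxed);
}

/* One atomic read/modify/write: no count is lost. */
static void
record_event_exact(void)
{
	atomic_fetch_add_explicit(&count, 1, memory_order_relaxed);
}

static unsigned
read_event_count(void)
{
	return atomic_load_explicit(&count, memory_order_relaxed);
}
```

The lossy variant avoids the cost of a read/modify/write instruction, which is the trade-off the relaxed load and store make.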
Initialization barrier.
	int ready;
	struct data d;
	void
	setup_and_notify(void)
	{
		setup_data(&d.things);
		atomic_store_release(&ready, 1);
	}
	void
	try_if_ready(void)
	{
		if (atomic_load_acquire(&ready))
			do_stuff(d.things);
	}
Publishing a pointer to the current snapshot of data. (Caller must arrange that only one call to take_snapshot happens at any given time; generally this should be done in coordination with pserialize(9) or similar to enable resource reclamation.)
	struct data *current_d;
	void
	take_snapshot(void)
	{
		struct data *d = kmem_alloc(sizeof(*d), KM_SLEEP);
		d->things = ...;
		atomic_store_release(&current_d, d);
	}
	struct data *
	get_snapshot(void)
	{
		return atomic_load_consume(&current_d);
	}
C11 formally specifies that all subexpressions, except the left operands of the &&, ||, ?:, and , operators and the kill_dependency() macro, carry dependencies for which memory_order_consume guarantees ordering, but most or all implementations to date simply treat memory_order_consume as memory_order_acquire and do not take advantage of data dependencies to elide costly memory barriers or load-acquire CPU instructions.
Instead, we implement
    atomic_load_consume() as
    atomic_load_relaxed() followed by
    membar_datadep_consumer(3),
    which is equivalent to
    membar_consumer(3) on
    DEC Alpha and
    __insn_barrier(3)
    elsewhere.
| November 25, 2019 | NetBSD 9.0 |