NAME
__builtin_prefetch - GNU extension to prefetch memory
SYNOPSIS
void
__builtin_prefetch(const void *addr, ...);
DESCRIPTION
The __builtin_prefetch() function prefetches memory from addr. The
rationale is to minimize cache-miss latency by trying to move data into a
cache before the data is accessed. Possible use cases include frequently
called sections of code in which it is known that the data at a given
address is likely to be accessed soon.
In addition to addr, there are two optional stdarg(3) arguments, rw and
locality. Both must be compile-time constant integers. The value of rw is
either 0 or 1, corresponding to a read or a write prefetch, respectively;
the default is 0. The value of locality is between 0 and 3; the higher the
value, the higher the temporal locality of the data. When locality is 0,
the data is assumed to have little or no temporal locality, so it need not
be left in the cache after the access. The default value of locality is 3.
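The optional arguments can be illustrated with a minimal sketch. The
function name scale_copy() and the prefetch distance PREFETCH_AHEAD are
hypothetical and chosen only for illustration; a useful distance depends on
the workload and the hardware.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; tune per workload. */
#define PREFETCH_AHEAD	16

/*
 * Multiply each element of src by factor and store it in dst.
 * src is read once and not reused (rw = 0, locality = 0);
 * dst is about to be written (rw = 1, locality = 3).
 */
void
scale_copy(int *dst, const int *src, size_t n, int factor)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (i + PREFETCH_AHEAD < n) {
			__builtin_prefetch(&src[i + PREFETCH_AHEAD], 0, 0);
			__builtin_prefetch(&dst[i + PREFETCH_AHEAD], 1, 3);
		}
		dst[i] = src[i] * factor;
	}
}
```

Both rw and locality are written as literal constants, as required. Whether
the hints improve performance can only be established by measurement.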
The __builtin_prefetch() function translates into prefetch instructions
only if the architecture supports them. If there is no support, no code is
generated for the prefetch itself: addr is evaluated only if it includes
side effects, and gcc(1) issues no warning.
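The handling of side effects can be shown with a short sketch. The function
name prefetch_and_advance() is hypothetical. The increment in the address
expression takes effect whether or not a prefetch instruction is generated.

```c
/*
 * Issue a prefetch hint for *p and return the advanced pointer.
 * The post-increment is a side effect of the address expression,
 * so it is evaluated even on targets without prefetch support.
 */
const int *
prefetch_and_advance(const int *p)
{
	__builtin_prefetch(p++);
	return p;
}
```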
EXAMPLES
The following optimization appears in the heavily used cpu_in_cksum()
function that calculates checksums for the inet(4) headers:
	while (mlen >= 32) {
		__builtin_prefetch(data + 32);
		partial += *(uint16_t *)data;
		partial += *(uint16_t *)(data + 2);
		partial += *(uint16_t *)(data + 4);
		...
		partial += *(uint16_t *)(data + 28);
		partial += *(uint16_t *)(data + 30);
		data += 32;
		mlen -= 32;
		...
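A self-contained variant of the same pattern is sketched below. It is not
the actual cpu_in_cksum() code: the names sum16_blocks() and load16() are
hypothetical, the unrolled additions are replaced by an inner loop, words
are loaded with memcpy(3) to avoid alignment assumptions, and trailing bytes
beyond a multiple of 32 are ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load one 16-bit word in host byte order from any alignment. */
static uint16_t
load16(const uint8_t *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Hypothetical helper: add up the 16-bit words of a buffer,
 * 32 bytes per iteration, hinting at the next block each time.
 */
uint64_t
sum16_blocks(const uint8_t *data, size_t mlen)
{
	uint64_t partial = 0;
	size_t off;

	while (mlen >= 32) {
		/* Hint only; data + 32 is not dereferenced here. */
		__builtin_prefetch(data + 32);
		for (off = 0; off < 32; off += 2)
			partial += load16(data + off);
		data += 32;
		mlen -= 32;
	}
	return partial;
}
```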
SEE ALSO
gcc(1), attribute(3)

Ulrich Drepper, What Every Programmer Should Know About Memory,
http://www.akkadia.org/drepper/cpumemory.pdf, November 21, 2007.
CAVEATS
This is a non-standard, compiler-specific extension.
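One common way to contain the dependency is a wrapper macro, sketched below
under the assumption that the compiler defines __GNUC__ whenever the builtin
is available. The names PREFETCH_READ, struct node and list_sum() are
hypothetical.

```c
#include <stddef.h>

/*
 * With compilers that define __GNUC__ the macro expands to the
 * builtin; elsewhere it only evaluates the address expression.
 */
#if defined(__GNUC__)
#define PREFETCH_READ(addr)	__builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH_READ(addr)	((void)(addr))
#endif

struct node {
	struct node *next;
	long value;
};

/* Sum a singly linked list, hinting at the next node while visiting. */
long
list_sum(const struct node *n)
{
	long total = 0;

	while (n != NULL) {
		if (n->next != NULL)
			PREFETCH_READ(n->next);
		total += n->value;
		n = n->next;
	}
	return total;
}
```

Code that uses the macro instead of the builtin still compiles with other
compilers, where the hint simply has no effect.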