NAME
uvm - virtual memory system external interface
SYNOPSIS
#include <sys/param.h>
#include <uvm/uvm.h>
DESCRIPTION
The UVM virtual memory system manages access to the computer's memory resources.
User processes and the kernel access these resources through UVM's external
interface. UVM's external interface includes functions that:
- initialize UVM sub-systems
- manage virtual address spaces
- resolve page faults
- memory map files and devices
- perform uio-based I/O to virtual memory
- allocate and free kernel virtual memory
- allocate and free physical memory
In addition to exporting these services, UVM has two kernel-level processes:
pagedaemon and swapper. The pagedaemon process sleeps until physical memory
becomes scarce. When that happens, pagedaemon is awoken. It scans physical
memory, paging out and freeing memory that has not been recently used. The
swapper process swaps in runnable processes that are currently swapped out, if
there is room.
There are also several miscellaneous functions.
INITIALIZATION
- void uvm_init(void);
- void uvm_init_limits(struct lwp *l);
- void uvm_setpagesize(void);
- void uvm_swap_init(void);
uvm_init() sets up the UVM system at system boot time, after the console has
been set up. It initializes global state, the page, map, kernel virtual memory
state, machine-dependent physical map, kernel memory allocator, pager and
anonymous memory sub-systems, and then enables paging of kernel objects.
uvm_init_limits() initializes process limits for the named
process. This is for use by the system startup for process zero, before any
other processes are created.
uvm_md_init() does early boot initialization. This currently includes:
- uvm_setpagesize(), which initializes the uvmexp members pagesize (if not
already done by machine-dependent code), pageshift and pagemask.
- uvm_physseg_init(), which initializes the uvm_hotplug(9) subsystem.
It should be called by machine-dependent code early in the pmap_init() call
(see pmap(9)).
uvm_swap_init() initializes the swap sub-system.
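The relationship between the three page constants set by uvm_setpagesize() can
be sketched in ordinary userland C. This is an illustration only and not the
kernel's code: struct ex_page_consts and ex_derive_page_consts() are
hypothetical names, and the derivation (pagemask as pagesize - 1, pageshift as
the base-2 logarithm of pagesize) is an assumption that follows from the page
size being a power of two.

```c
#include <assert.h>

/* Hypothetical stand-in for the uvmexp page constants. */
struct ex_page_consts {
	int pagesize;	/* size of a page, must be a power of 2 */
	int pagemask;	/* assumed: pagesize - 1 */
	int pageshift;	/* assumed: log2(pagesize) */
};

/* Derive pagemask and pageshift from a pagesize that is already set. */
void
ex_derive_page_consts(struct ex_page_consts *pc)
{
	/* Reject zero, negative, and non-power-of-two sizes. */
	assert(pc->pagesize > 0);
	assert((pc->pagesize & (pc->pagesize - 1)) == 0);

	pc->pagemask = pc->pagesize - 1;
	pc->pageshift = 0;
	while ((1 << pc->pageshift) != pc->pagesize)
		pc->pageshift++;
}
```

With pagesize set to 4096, this sketch yields a pagemask of 4095 and a
pageshift of 12.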
VIRTUAL ADDRESS SPACE MANAGEMENT
See uvm_map(9).
PAGE FAULT HANDLING
- int uvm_fault(struct vm_map *orig_map, vaddr_t vaddr, vm_prot_t access_type);
uvm_fault() is the main entry point for faults. It takes orig_map, the map in
which the fault originated; vaddr, the offset into the map at which the fault
occurred; and access_type, the type of access requested. uvm_fault() returns a
standard UVM return value.
MEMORY MAPPING FILES AND DEVICES
See ubc(9).
VIRTUAL MEMORY I/O
- int uvm_io(struct vm_map *map, struct uio *uio);
uvm_io() performs the I/O described in uio on the memory described in map.
ALLOCATION OF KERNEL MEMORY
See uvm_km(9).
ALLOCATION OF PHYSICAL MEMORY
- struct vm_page * uvm_pagealloc(struct uvm_object *uobj, voff_t off,
struct vm_anon *anon, int flags);
- void uvm_pagerealloc(struct vm_page *pg, struct uvm_object *newobj,
voff_t newoff);
- void uvm_pagefree(struct vm_page *pg);
- int uvm_pglistalloc(psize_t size, paddr_t low, paddr_t high,
paddr_t alignment, paddr_t boundary, struct pglist *rlist, int nsegs,
int waitok);
- void uvm_pglistfree(struct pglist *list);
- void uvm_page_physload(paddr_t start, paddr_t end, paddr_t avail_start,
paddr_t avail_end, int free_list);
uvm_pagealloc() allocates a page of memory at offset off in either the object
uobj or the anonymous memory anon, which must be locked by the caller. Only
one of uobj and anon can be non-NULL. Returns NULL when no page can be found.
The flags can be any of
#define UVM_PGA_USERESERVE 0x0001 /* ok to use reserve pages */
#define UVM_PGA_ZERO 0x0002 /* returned page must be zero'd */
UVM_PGA_USERESERVE means to allocate a page even if that will result in the
number of free pages being lower than uvmexp.reserve_pagedaemon (if the
current thread is the pagedaemon) or uvmexp.reserve_kernel (if the current
thread is not the pagedaemon). UVM_PGA_ZERO causes the returned page to be
filled with zeroes, either by allocating it from a pool of pre-zeroed pages or
by zeroing it in-line as necessary.
uvm_pagerealloc() reallocates page pg to a new object newobj, at a new offset
newoff.
uvm_pagefree() frees the physical page pg. If the content of the page is known
to be zero-filled, the caller should set PG_ZERO in pg->flags so that the page
allocator will use the page to serve future UVM_PGA_ZERO requests efficiently.
uvm_pglistalloc() allocates a list of pages totalling size bytes under various
constraints. low and high describe the lowest and highest addresses acceptable
for the list. If alignment is non-zero, it describes the required alignment of
the list, in power-of-two notation. If boundary is non-zero, no segment of the
list may cross this power-of-two boundary, relative to zero. nsegs is the
maximum number of physically contiguous segments. If waitok is non-zero, the
function may sleep until enough memory is available. (It also may give up in
some situations, so a non-zero waitok does not imply that uvm_pglistalloc()
cannot return an error.) The allocated memory is returned in the rlist list;
the caller only has to provide storage, because the list is initialized by
uvm_pglistalloc().
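The address constraints above can be illustrated with a small userland check
on a single physically contiguous segment. This is a sketch of the constraints
as described in the text, not the allocator itself. ex_is_pow2() and
ex_segment_ok() are hypothetical helpers, and two points are assumptions: that
high is the highest acceptable byte address, and that alignment and boundary
are given as power-of-two values in bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if x is a power of two. */
static bool
ex_is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Check one physically contiguous segment [start, start + size)
 * against the low/high/alignment/boundary constraints.
 * Assumption: "high" is the highest acceptable byte address.
 * Overflow of start + size is not handled in this sketch.
 */
bool
ex_segment_ok(uint64_t start, uint64_t size, uint64_t low, uint64_t high,
    uint64_t alignment, uint64_t boundary)
{
	uint64_t last;

	assert(size > 0);
	assert(alignment == 0 || ex_is_pow2(alignment));
	assert(boundary == 0 || ex_is_pow2(boundary));

	last = start + size - 1;
	if (start < low || last > high)
		return false;
	/* A zero alignment means "no alignment requirement". */
	if (alignment != 0 && (start & (alignment - 1)) != 0)
		return false;
	/* start and last must fall in the same boundary-sized window. */
	if (boundary != 0 && ((start ^ last) & ~(boundary - 1)) != 0)
		return false;
	return true;
}
```

For instance, with a 64 KiB boundary, a two-page segment starting at 0xf000
would be rejected by this check because it straddles address 0x10000, while
the same segment starting at 0x10000 would be accepted.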
uvm_pglistfree() frees the list of pages pointed to by list. If the content of
a page is known to be zero-filled, the caller should set PG_ZERO in pg->flags
so that the page allocator will use the page to serve future UVM_PGA_ZERO
requests efficiently.
uvm_page_physload() loads physical memory segments into VM space on the
specified free_list. It must be called at system boot time to set up physical
memory management pages. The arguments describe the start and end of the
physical addresses of the segment, and the available start and end addresses
of pages not already in use. If a system has memory banks of different speeds,
the slower memory should be given a higher free_list value.
PROCESSES
- void uvm_pageout(void);
- void uvm_scheduler(void);
uvm_pageout() is the main loop for the page daemon.
uvm_scheduler() is the process zero main loop, which is to be
called after the system has finished starting other processes. It handles the
swapping in of runnable, swapped out processes in priority order.
PAGE LOAN
- int uvm_loan(struct vm_map *map, vaddr_t start, vsize_t len, void *v,
int flags);
- void uvm_unloan(void *v, int npages, int flags);
uvm_loan() loans pages in a map out to anons or to the kernel. map should be
unlocked, and start and len should be multiples of PAGE_SIZE. The flags
argument should be one of
#define UVM_LOAN_TOANON 0x01 /* loan to anons */
#define UVM_LOAN_TOPAGE 0x02 /* loan to kernel */
v should be a pointer to an array of pointers to struct vm_anon or
struct vm_page, as appropriate. The caller has to allocate memory for the
array and ensure it is big enough to hold len / PAGE_SIZE pointers. Returns 0
on success, or an appropriate error number otherwise. Note that wired pages
cannot be loaned out, and uvm_loan() will fail in that case.
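The sizing rule for the array v can be written down as a short userland
helper. This is only a sketch of the arithmetic stated above: ex_loan_slots()
is a hypothetical name, and EX_PAGE_SIZE is an assumed 4 KiB page size
standing in for the kernel's PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PAGE_SIZE 4096u	/* assumed page size, for illustration */

/*
 * Number of pointer slots the array "v" needs for a loan of len bytes
 * beginning at start.  Both values must be multiples of the page size.
 */
size_t
ex_loan_slots(size_t start, size_t len)
{
	assert(start % EX_PAGE_SIZE == 0);
	assert(len % EX_PAGE_SIZE == 0);
	return len / EX_PAGE_SIZE;
}
```

The value computed here is also the count that would later be passed as
npages when the loan is released.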
uvm_unloan() kills loans on pages or anons. v must point to the array of
pointers initialized by a previous call to uvm_loan(). npages should match the
number of pages allocated for the loan, which is also the number of items in
the array. The flags argument should be one of
#define UVM_LOAN_TOANON 0x01 /* loan to anons */
#define UVM_LOAN_TOPAGE 0x02 /* loan to kernel */
and should match what was used for the previous call to uvm_loan().
MISCELLANEOUS FUNCTIONS
- struct uvm_object * uao_create(vsize_t size, int flags);
- void uao_detach(struct uvm_object *uobj);
- void uao_reference(struct uvm_object *uobj);
- bool uvm_chgkprot(void *addr, size_t len, int rw);
- void uvm_kernacc(void *addr, size_t len, int rw);
- int uvm_vslock(struct vmspace *vs, void *addr, size_t len, vm_prot_t prot);
- void uvm_vsunlock(struct vmspace *vs, void *addr, size_t len);
- void uvm_meter(void);
- void uvm_proc_fork(struct proc *p1, struct proc *p2, bool shared);
- int uvm_grow(struct proc *p, vaddr_t sp);
- void uvn_findpages(struct uvm_object *uobj, voff_t offset, int *npagesp,
struct vm_page **pps, int flags);
- void uvm_vnp_setsize(struct vnode *vp, voff_t newsize);
The uao_create(), uao_detach(), and uao_reference() functions operate on
anonymous memory objects, such as those used to support System V shared
memory. uao_create() returns an object of size size with flags:
#define UAO_FLAG_KERNOBJ 0x1 /* create kernel object */
#define UAO_FLAG_KERNSWAP 0x2 /* enable kernel swap */
which can only be used once each at system boot time. uao_reference() creates
an additional reference to the named anonymous memory object. uao_detach()
removes a reference from the named anonymous memory object, destroying it if
removing the last reference.
uvm_chgkprot() changes the protection of kernel memory from addr to
addr + len to the value of rw. This is primarily useful for debuggers, for
setting breakpoints. This function is only available with options KGDB.
uvm_kernacc() checks the access at address addr to addr + len for rw access in
the kernel address space.
uvm_vslock() and uvm_vsunlock() control the wiring and unwiring of pages in
the address space vs from addr to addr + len. These functions are normally
used to wire memory for I/O.
uvm_meter() calculates the load average.
uvm_proc_fork() forks a virtual address space for the (old) process p1 and the
(new) process p2. If the shared argument is true, p1 shares its address space
with p2; otherwise a new address space is created. This function currently has
no return value, and thus cannot fail. In the future, this function will be
changed to allow it to fail in low memory conditions.
uvm_grow() increases the stack segment of process p to include sp.
uvn_findpages() looks up or creates pages in uobj at offset offset, marks them
busy and returns them in the pps array. Currently uobj must be a vnode object.
The number of pages requested is pointed to by npagesp, and this value is
updated with the actual number of pages returned. The flags can be any bitwise
inclusive-or of:
UFP_ALL
- Zero pseudo-flag meaning return all pages.
UFP_NOWAIT
- Don't sleep; yield NULL for busy pages or for uncached pages for which
allocation would sleep.
UFP_NOALLOC
- Don't allocate; yield NULL for uncached pages.
UFP_NOCACHE
- Don't use cached pages; yield NULL instead.
UFP_NORDONLY
- Don't yield read-only pages; yield NULL for pages marked PG_READONLY.
UFP_DIRTYONLY
- Don't yield clean pages; stop early at the first clean one. As a side
effect, mark yielded dirty pages clean. The caller must write them to
permanent storage before unbusying.
UFP_BACKWARD
- Traverse pages in reverse order. If uvn_findpages() returns early, it will
have filled *npagesp entries at the end of pps rather than the beginning.
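The effect of UFP_BACKWARD on where results land in pps can be expressed as a
small index calculation. This is a sketch of the rule stated above, not kernel
code: ex_first_filled() is a hypothetical helper, and it assumes that a
forward traversal fills pps from index 0.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Index of the first filled entry in pps after an early return.
 * requested: value of *npagesp on entry; found: value of *npagesp on return.
 */
int
ex_first_filled(int requested, int found, bool backward)
{
	assert(requested >= 0);
	assert(found >= 0 && found <= requested);

	/* Backward traversal leaves the results at the tail of the array. */
	return backward ? requested - found : 0;
}
```

Under these assumptions, a caller that asked for 8 pages with UFP_BACKWARD and
got 3 back would look for them in pps[5] through pps[7].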
uvm_vnp_setsize() sets the size of vnode vp to newsize. The caller must hold a
reference to the vnode. If the vnode shrinks, pages no longer used are
discarded.
MISCELLANEOUS MACROS
- paddr_t atop(paddr_t pa);
- paddr_t ptoa(paddr_t pn);
- paddr_t round_page(address);
- paddr_t trunc_page(address);
The atop() macro converts a physical address pa into a page number. The ptoa()
macro does the opposite by converting a page number pn into a physical
address. The round_page() and trunc_page() macros return a page address
boundary from rounding address up and down, respectively, to the nearest page
boundary. These macros work for either addresses or byte counts.
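The arithmetic behind these macros can be sketched in userland C. The EX_
macros below are hypothetical stand-ins, not the kernel's definitions, and
they assume a 4 KiB page (a shift of 12); the real values are
machine-dependent. ex_pages_spanned() is likewise a hypothetical helper that
combines them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages. */
#define EX_PAGE_SHIFT	12
#define EX_PAGE_SIZE	((uint64_t)1 << EX_PAGE_SHIFT)
#define EX_PAGE_MASK	(EX_PAGE_SIZE - 1)

/* Address to page number, and back. */
#define EX_ATOP(pa)		((uint64_t)(pa) >> EX_PAGE_SHIFT)
#define EX_PTOA(pn)		((uint64_t)(pn) << EX_PAGE_SHIFT)

/* Round down or up to a page boundary. */
#define EX_TRUNC_PAGE(x)	((uint64_t)(x) & ~EX_PAGE_MASK)
#define EX_ROUND_PAGE(x)	(((uint64_t)(x) + EX_PAGE_MASK) & ~EX_PAGE_MASK)

/* Number of pages touched by the byte range [addr, addr + len). */
uint64_t
ex_pages_spanned(uint64_t addr, uint64_t len)
{
	assert(len > 0);
	return EX_ATOP(EX_ROUND_PAGE(addr + len) - EX_TRUNC_PAGE(addr));
}
```

Under the 4 KiB assumption, a 32-byte range starting at 0x1ff0 straddles a
page boundary, so ex_pages_spanned(0x1ff0, 0x20) evaluates to 2.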
SYSCTL
UVM provides support for the CTL_VM domain of the sysctl(3) hierarchy. It
handles the VM_LOADAVG, VM_METER, VM_UVMEXP, and VM_UVMEXP2 nodes, which
return the current load averages, the current VM totals, the uvmexp structure,
and a kernel version independent view of the uvmexp structure, respectively.
It also exports a number of tunables that control how much VM space is allowed
to be consumed by various tasks. The load averages are typically accessed from
userland using the getloadavg(3) function. The uvmexp structure has all global
state of the UVM system, and has the following members:
/* vm_page constants */
int pagesize; /* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask; /* page mask */
int pageshift; /* page shift */
/* vm_page counters */
int npages; /* number of pages we manage */
int free; /* number of free pages */
int paging; /* number of pages in the process of being paged out */
int wired; /* number of wired pages */
int reserve_pagedaemon; /* number of pages reserved for pagedaemon */
int reserve_kernel; /* number of pages reserved for kernel */
/* pageout params */
int freemin; /* min number of free pages */
int freetarg; /* target number of free pages */
int inactarg; /* target number of inactive pages */
int wiredmax; /* max number of wired pages */
/* swap */
int nswapdev; /* number of configured swap devices in system */
int swpages; /* number of PAGE_SIZE'ed swap pages */
int swpginuse; /* number of swap pages in use */
int nswget; /* number of times fault calls uvm_swap_get() */
int nanon; /* number total of anon's in system */
int nfreeanon; /* number of free anon's */
/* stat counters */
int faults; /* page fault count */
int traps; /* trap count */
int intrs; /* interrupt count */
int swtch; /* context switch count */
int softs; /* software interrupt count */
int syscalls; /* system calls */
int pageins; /* pagein operation count */
/* pageouts are in pdpageouts below */
int pgswapin; /* pages swapped in */
int pgswapout; /* pages swapped out */
int forks; /* forks */
int forks_ppwait; /* forks where parent waits */
int forks_sharevm; /* forks where vmspace is shared */
/* fault subcounters */
int fltnoram; /* number of times fault was out of ram */
int fltnoanon; /* number of times fault was out of anons */
int fltpgwait; /* number of times fault had to wait on a page */
int fltpgrele; /* number of times fault found a released page */
int fltrelck; /* number of times fault relock called */
int fltrelckok; /* number of times fault relock is a success */
int fltanget; /* number of times fault gets anon page */
int fltanretry; /* number of times fault retrys an anon get */
int fltamcopy; /* number of times fault clears "needs copy" */
int fltnamap; /* number of times fault maps a neighbor anon page */
int fltnomap; /* number of times fault maps a neighbor obj page */
int fltlget; /* number of times fault does a locked pgo_get */
int fltget; /* number of times fault does an unlocked get */
int flt_anon; /* number of times fault anon (case 1a) */
int flt_acow; /* number of times fault anon cow (case 1b) */
int flt_obj; /* number of times fault is on object page (2a) */
int flt_prcopy; /* number of times fault promotes with copy (2b) */
int flt_przero; /* number of times fault promotes with zerofill (2b) */
/* daemon counters */
int pdwoke; /* number of times daemon woke up */
int pdrevs; /* number of times daemon rev'd clock hand */
int pdfreed; /* number of pages daemon freed since boot */
int pdscans; /* number of pages daemon scanned since boot */
int pdanscan; /* number of anonymous pages scanned by daemon */
int pdobscan; /* number of object pages scanned by daemon */
int pdreact; /* number of pages daemon reactivated since boot */
int pdbusy; /* number of times daemon found a busy page */
int pdpageouts; /* number of times daemon started a pageout */
int pdpending; /* number of times daemon got a pending pageout */
int pddeact; /* number of pages daemon deactivates */
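As noted above, userland normally reads the load averages through
getloadavg(3) rather than querying the sysctl node directly. A minimal sketch
follows; ex_read_loadavg() is a hypothetical wrapper name, and the
_DEFAULT_SOURCE definition is only there for C libraries that hide the
declaration in strict standard modes.

```c
#define _DEFAULT_SOURCE	/* exposes getloadavg() on glibc in strict modes */
#include <assert.h>
#include <stdlib.h>

/*
 * Fetch the 1, 5 and 15 minute load averages into out[0..2].
 * Returns the number of samples retrieved, or -1 on failure.
 */
int
ex_read_loadavg(double out[3])
{
	assert(out != NULL);
	return getloadavg(out, 3);
}
```

A caller should check the return value before using the array, since fewer
than three samples (or none) may be available.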
NOTES
uvm_chgkprot() is only available if the kernel has been compiled with options
KGDB.
All structures and types whose names begin with “vm_” will be renamed to
“uvm_”.
SEE ALSO
swapctl(2),
getloadavg(3),
kvm(3),
sysctl(3),
ddb(4),
options(4),
memoryallocators(9),
pmap(9),
ubc(9),
uvm_km(9),
uvm_map(9)
Charles D. Cranor and Gurudatta M. Parulkar, The UVM Virtual Memory System,
Proceedings of the USENIX Annual Technical Conference, USENIX Association,
http://www.usenix.org/event/usenix99/full_papers/cranor/cranor.pdf, 117-130,
June 6-11, 1999.
HISTORY
UVM is a new VM system developed at Washington University in St. Louis
(Missouri). UVM's roots lie partly in the Mach-based 4.4BSD VM system, the
FreeBSD VM system, and the SunOS 4 VM system. UVM's basic structure is based
on the 4.4BSD VM system. UVM's new anonymous memory system is based on the
anonymous memory system found in the SunOS 4 VM (as described in papers
published by Sun Microsystems, Inc.). UVM also includes a number of features
new to BSD including page loanout, map entry passing, simplified
copy-on-write, and clustered anonymous memory pageout. UVM is also further
documented in an August 1998 dissertation by Charles D. Cranor.
UVM appeared in NetBSD 1.4.
AUTHORS
Charles D. Cranor <chuck@ccrc.wustl.edu> designed and implemented UVM.
Matthew Green <mrg@eterna.com.au> wrote the swap-space management code and
handled the logistical issues involved with merging UVM into the NetBSD source
tree.
Chuck Silvers <chuq@chuq.com> implemented the aobj pager, thus allowing UVM to
support System V shared memory and process swapping.