NAME
namei, NDINIT, NDAT, namei_simple_kernel, namei_simple_user, relookup, lookup_for_nfsd, lookup_for_nfsd_index - pathname lookup
SYNOPSIS
#include <sys/namei.h>
#include <sys/uio.h>
#include <sys/vnode.h>
NDINIT(struct nameidata *ndp, u_long op, u_long flags, struct pathbuf *pathbuf);
NDAT(struct nameidata *ndp, struct vnode *dvp);
int
namei(struct nameidata *ndp);
int
namei_simple_kernel(const char *path, namei_simple_flags_t sflags, struct vnode **ret);
int
namei_simple_user(const char *path, namei_simple_flags_t sflags, struct vnode **ret);
int
relookup(struct vnode *dvp, struct vnode **vpp, struct componentname *cnp);
int
lookup_for_nfsd(struct nameidata *ndp, struct vnode *startdir, int neverfollow);
int
lookup_for_nfsd_index(struct nameidata *ndp);
DESCRIPTION
The namei interface is used to convert pathnames to file system vnodes. The name of the interface is a contraction of the words name and inode, for name-to-inode conversion, and dates from the days before the vfs(9) interface was implemented.
All access to the namei interface must be in process context. Pathname lookups cannot be done in interrupt context.
In the general form of namei, a caller must:
- Allocate storage for a struct nameidata object nd.
- Initialize nd with NDINIT() and optionally NDAT() to specify the arguments to a lookup.
- Call namei() and handle failure if it returns a nonzero error code.
- Read the resulting vnode out of nd.ni_vp. If requested with LOCKPARENT, read the directory vnode out of nd.ni_dvp.
- For directory operations, use the struct componentname object stored at nd.ni_cnd.
The other fields of
struct nameidata should not be
examined or altered directly.
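The calling sequence above can be sketched in userland C. Everything prefixed mock_ or MOCK_ below is a hypothetical stand-in written for this illustration, not the kernel definition, and the plain string stands in for the pathbuf argument. Only the order of operations mirrors the list above.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct vnode and struct nameidata. */
struct mock_vnode { const char *path; };

struct mock_nameidata {
	const char *path;		/* stand-in for the pathbuf argument */
	int op;				/* operating mode */
	unsigned flags;			/* caller flags */
	struct mock_vnode *ni_vp;	/* result vnode */
};

enum { MOCK_LOOKUP = 0 };

/* Plays the role of NDINIT(): record the arguments, clear the result. */
#define MOCK_NDINIT(ndp, o, f, p) do {	\
	(ndp)->path = (p);		\
	(ndp)->op = (o);		\
	(ndp)->flags = (f);		\
	(ndp)->ni_vp = NULL;		\
} while (0)

/* A two-entry "file system" for the mock to resolve against. */
static struct mock_vnode mock_files[] = { { "/etc/passwd" }, { "/bin/sh" } };

/* Plays the role of namei(): 0 on success, an errno value on failure. */
static int
mock_namei(struct mock_nameidata *ndp)
{
	size_t i;

	for (i = 0; i < sizeof(mock_files) / sizeof(mock_files[0]); i++) {
		if (strcmp(mock_files[i].path, ndp->path) == 0) {
			ndp->ni_vp = &mock_files[i];
			return 0;
		}
	}
	return ENOENT;
}

/* The steps in order: allocate, initialize, call, read the result. */
static int
mock_resolve(const char *path, struct mock_vnode **vpp)
{
	struct mock_nameidata nd;		/* allocate storage */
	int error;

	MOCK_NDINIT(&nd, MOCK_LOOKUP, 0, path);	/* initialize */
	error = mock_namei(&nd);		/* call and check */
	if (error != 0)
		return error;
	*vpp = nd.ni_vp;			/* read the result */
	return 0;
}
```

A caller would invoke mock_resolve("/bin/sh", &vp) and examine vp only when the return value is 0, just as nd.ni_vp is meaningful only after namei() succeeds.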
Note that the
nfs(4) code misuses
struct nameidata and currently has an incestuous
relationship with the
namei code. This is gradually being
cleaned up.
The
struct componentname type has the following layout:
struct componentname {
/*
* Arguments to VOP_LOOKUP and directory VOP routines.
*/
uint32_t cn_nameiop; /* namei operation */
uint32_t cn_flags; /* flags to namei */
kauth_cred_t cn_cred; /* credentials */
const char *cn_nameptr; /* pointer to looked up name */
size_t cn_namelen; /* length of looked up comp */
/*
* Side result from VOP_LOOKUP.
*/
size_t cn_consume; /* chars to consume in lookup */
};
This structure contains the information about a single directory component name,
along with certain other information required by vnode operations. See
vnodeops(9) for more
information about these vnode operations.
The members:
- cn_nameiop
- The type of operation in progress; indicates the basic operating mode of namei. May be one of LOOKUP, CREATE, DELETE, or RENAME. These modes are described below.
- cn_flags
- Additional flags affecting the operation of namei. These
are described below as well.
- cn_cred
- The credentials to use for the lookup or other operation
the componentname is passed to. This may match the
credentials of the current process or it may not, depending on where the
original operation request came from and how it has been routed.
- cn_nameptr
- The name of this directory component, followed by the rest
of the path being looked up.
- cn_namelen
- The length of the name of this directory component. The
name is not in general null terminated, although the complete string (the
full remaining path) always is.
- cn_consume
- This field starts at zero; it may be set to a larger value
by implementations of
VOP_LOOKUP(9) to
indicate how many more characters beyond cn_namelen are
being consumed. New uses of this feature are discouraged and should be
discussed.
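The relationship between cn_nameptr and cn_namelen can be illustrated with a small userland sketch. The structure and functions below are hypothetical stand-ins (the real parsing lives inside namei). They only show that a component is described by a pointer into the remaining path plus a length, so it must not be treated as a null-terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in carrying only the two name fields. */
struct mock_componentname {
	const char *cn_nameptr;	/* component, followed by rest of path */
	size_t cn_namelen;	/* length of the component alone */
};

/*
 * Point cnp at the first component of path, skipping leading slashes.
 * Returns the position just past the component, or NULL if none is left.
 */
static const char *
mock_next_component(const char *path, struct mock_componentname *cnp)
{
	while (*path == '/')
		path++;
	if (*path == '\0')
		return NULL;
	cnp->cn_nameptr = path;
	cnp->cn_namelen = strcspn(path, "/");
	return path + cnp->cn_namelen;
}

/* Compare a component against a C string, honoring cn_namelen. */
static int
mock_component_is(const struct mock_componentname *cnp, const char *name)
{
	return strlen(name) == cnp->cn_namelen &&
	    memcmp(cnp->cn_nameptr, name, cnp->cn_namelen) == 0;
}
```

Comparing with strcmp() on cn_nameptr alone would be wrong here, because the pointer still refers to the component plus everything after it.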
Operating modes
Each lookup happens in one of the following modes, specified by callers of
namei with
NDINIT() and specified
internally by
namei to
VOP_LOOKUP(9):
- Callers of
namei specify the mode for the last component of a
lookup.
- Internally,
namei recursively calls
VOP_LOOKUP(9) in
LOOKUP
mode for each directory component, and then
finally calls
VOP_LOOKUP(9) in the
caller-specified mode for the last component.
Each mode can fail in different ways. For example, LOOKUP mode fails with ENOENT if no entry exists, but CREATE mode succeeds with a NULL vnode.
-
-
LOOKUP
- Yield the vnode for an existing entry. Callers specify LOOKUP for operations on existing vnodes: stat(2), open(2) without O_CREAT, etc.
File systems:
- MUST refuse if user lacks
lookup permission for directory.
- SHOULD use
namecache(9) to cache
lookup results.
- [ENOENT]
- No entry exists.
-
-
CREATE
- Yield the vnode for an existing entry; or, if there is none, yield NULL and hint that it will soon be created. Callers specify CREATE for operations that may create directory entries: mkdir(2), open(2) with O_CREAT, etc.
File systems:
- MUST refuse if user lacks
lookup permission for directory.
- MUST refuse if no entry
exists and user lacks write permission for directory.
- MUST refuse if no entry
exists and file system is read-only.
- SHOULD NOT use
namecache(9) to cache
negative lookup results.
- SHOULD save lookup hints
internally in the directory for a subsequent operation to create a
directory entry.
- [EPERM]
- The user lacks lookup permission for the directory.
- [EPERM]
- No entry exists and the user lacks write permission for the directory.
- [EROFS]
- No entry exists and the file system is read-only.
-
-
DELETE
- Yield the vnode of an existing entry, and hint that it will soon be deleted. Callers specify DELETE for operations that delete directory entries: unlink(2), rmdir(2), etc.
File systems:
- MUST refuse if user lacks
lookup permission for directory.
- MUST refuse if entry
exists and user lacks write permission for directory.
- MUST refuse if entry
exists and file system is read-only.
- SHOULD NOT use
namecache(9) to cache
lookup results.
- SHOULD save lookup hints
internally in the directory for a subsequent operation to delete a
directory entry.
- [ENOENT]
- No entry exists.
- [EPERM]
- The user lacks lookup permission for the directory.
- [EPERM]
- An entry exists and the user lacks write permission for the directory.
- [EROFS]
- An entry exists and the file system is read-only.
-
-
RENAME
- Yield the vnode of an existing entry, and hint that it will soon be overwritten; or, if there is none, yield NULL, and hint that it will soon be created. Callers specify RENAME for an entry that is about to be created or overwritten, namely for the target of rename(2).
File systems:
- MUST refuse if user lacks
lookup permission for directory.
- MUST refuse if user lacks
write permission for directory.
- MUST refuse if file system
is read-only.
- SHOULD NOT use
namecache(9) to cache
lookup results.
- SHOULD save lookup hints
internally in the directory for a subsequent operation to create or
overwrite a directory entry.
- [EPERM]
- The user lacks lookup permission for the directory.
- [EPERM]
- The user lacks write permission for the directory.
- [EROFS]
- The file system is read-only.
If a caller decides not to perform an operation it hinted at by a destructive operating mode (CREATE, DELETE, or RENAME), it SHOULD call VOP_ABORTOP(9) to release the hints. If a file system fails to perform such an operation, it SHOULD call VOP_ABORTOP(9) to release the hints. However, the current code is inconsistent about this, and every implementation of VOP_ABORTOP(9) does nothing.
Flags
The following flags may be specified by
callers of
namei, and MUST NOT be used by file systems:
-
-
FOLLOW
- Follow symbolic links in the last path component. Used by
operations that do not address symbolic links directly, such as
stat(2). (Does not affect
symbolic links found in the middle of a path.)
-
-
NOFOLLOW
- Do not follow symbolic links in the last path component.
Used by operations that address symbolic links directly, such as
lstat(2).
Note: The value of NOFOLLOW is 0. We define the constant to let callers say either FOLLOW or NOFOLLOW explicitly.
-
-
LOCKLEAF
- On successful lookup, lock the vnode, if any, in ndp->ni_vp. Without this flag, it would be unlocked.
-
-
LOCKPARENT
- On successful lookup, lock and return the directory vnode in ndp->ni_dvp. Without this flag, it is not returned at all.
-
-
TRYEMULROOT
- If set, the path is looked up in the emulation root of the
current process first. If that fails, the system root is used.
-
-
EMULROOTSET
- Indicates that the caller has set ndp->ni_erootdir prior to calling namei. This is only useful or permitted when the emulation in the current process is partway through being set up.
-
-
NOCHROOT
- Bypass normal
chroot(8) handling for
absolute paths.
-
-
NOCROSSMOUNT
- Do not cross mount points.
-
-
RDONLY
- Enforce read-only behavior.
-
-
CREATEDIR
- Accept slashes after a component name that does not exist. This only makes sense in CREATE mode and when creating a directory.
-
-
NOCACHE
- Do not cache the lookup result for the last component name. This is used only with the RENAME mode for the target; the cache entry would be invalidated immediately.
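Because NOFOLLOW is 0, as noted in its entry above, it cannot be detected with a bitwise AND. Code that needs to know which behavior was requested has to test for the absence of FOLLOW. The sketch below is userland C, and the MOCK_FOLLOW and MOCK_LOCKLEAF bit values are illustrative assumptions rather than the values defined in <sys/namei.h>.

```c
/* Illustrative values only; the real constants live in <sys/namei.h>. */
#define MOCK_NOFOLLOW	0x0000u	/* documented as 0 */
#define MOCK_FOLLOW	0x0040u	/* assumed bit */
#define MOCK_LOCKLEAF	0x0004u	/* assumed bit */

/* Returns nonzero if a trailing symbolic link should be followed. */
static int
mock_wants_follow(unsigned flags)
{
	return (flags & MOCK_FOLLOW) != 0;
}

/*
 * A broken test: ANDing with a zero-valued constant always yields 0,
 * so this can never report that NOFOLLOW was requested.
 */
static int
mock_broken_nofollow_test(unsigned flags)
{
	return (flags & MOCK_NOFOLLOW) != 0;
}
```

With these definitions, mock_wants_follow() distinguishes the two cases, while mock_broken_nofollow_test() returns 0 for every input.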
The following flag may be set by a caller of
namei and tested
by a file system in
VOP_LOOKUP(9) or other
subsequent directory operations:
-
-
DOWHITEOUT
- Allow whiteouts to be seen as objects instead of
functioning as “nothing there”.
The following flags are set by namei for calling
VOP_LOOKUP(9):
-
-
ISDOTDOT
- The current pathname component is “..”. May be tested by subsequent directory operations too.
-
-
ISLASTCN
- The current pathname component is the last component found
in the pathname. Guaranteed to remain set in subsequent directory
operations.
-
-
REQUIREDIR
- The current object to be looked up must be a directory. May
not be used by subsequent directory operations.
-
-
MAKEENTRY
- The lookup result for the current pathname component should be added to the namecache(9). May be used to make additional caching decisions, e.g. to store an mtime for determining whether our cache for a remote vnode is stale. May not be used by subsequent directory operations.
A file system may set the following flag on return from
VOP_LOOKUP(9) for use by
namei,
namecache(9), and subsequent
directory operations:
-
-
ISWHITEOUT
- The object at the current pathname component is a
whiteout.
The following additional historic flags have been removed from
NetBSD and should be handled as follows if porting
code from elsewhere:
-
-
INRENAME
- Part of a misbegotten and incorrect locking scheme. Any
file-system-level code using this is presumptively incorrect. File systems
should use the
genfs_rename(9)
interface to handle locking in VOP_RENAME().
-
-
INRELOOKUP
- Used at one point for signaling to
puffs(3) to work around a
protocol deficiency that was later rectified.
-
-
ISSYMLINK
- Useless internal state.
-
-
SAVESTART
- Unclean setting affecting vnode reference counting. Now effectively never in effect. Any code referring to this is suspect.
-
-
SAVENAME
- Unclean setting relating to responsibility for freeing
pathname buffers in the days before the pathbuf
structure. Now effectively always in effect; the caller of
namei owns the pathbuf structure and
is always responsible for destroying it.
-
-
HASBUF
- Related to SAVENAME. Any uses can be replaced with
“true”.
FUNCTIONS
-
-
- NDINIT(ndp,
op, flags,
pathbuf)
- Initialise a nameidata structure pointed to by
ndp for use by the namei
interface. The operating mode and flags (as documented above) are
specified by op and flags
respectively. The pathname is passed as a pathbuf structure, which should
be initialized using one of the
pathbuf(9) operations.
Destroying the pathbuf is the responsibility of the caller; this must not
be done until the caller is finished with all of the
namei results and all of the nameidata contents except
for the result vnode.
This routine stores the credentials of the calling thread
(curlwp) in ndp.
NDINIT() sets the credentials using
kauth_cred_get(9).
In the rare case that another set of credentials is required for the namei
operation, ndp->ni_cnd.cn_cred must be set manually
after NDINIT().
-
-
- NDAT(ndp,
dvp)
- This macro is used after NDINIT() to set
the starting directory. This supersedes the current process's current
working directory as the initial point of departure for looking up
relative paths. This mechanism is used by
openat(2) and related
calls.
-
-
- namei(ndp)
- Convert a pathname into a pointer to a vnode. The nameidata
structure pointed to by ndp should be initialized
with the NDINIT() macro, and perhaps also the
NDAT() macro. Direct initialization of members of struct
nameidata is not supported and may (will) break silently
in the future.
The vnode for the pathname is returned in ndp->ni_vp. The parent directory is returned locked in ndp->ni_dvp iff LOCKPARENT is specified.
Any or all of the flags documented above as set by the caller can be enabled by passing them (OR'd together) as the flags argument of NDINIT(). As discussed above, every such call should explicitly contain either FOLLOW or NOFOLLOW to control the behavior regarding final symbolic links.
-
-
- namei_simple_kernel(path,
sflags, ret)
- Look up the path path and translate it to a vnode, returned in ret. The path argument must be a kernel (UIO_SYSSPACE) pointer. The sflags argument chooses the precise behavior. It may be set to one of the following symbols:
NSM_NOFOLLOW_NOEMULROOT
NSM_NOFOLLOW_TRYEMULROOT
NSM_FOLLOW_NOEMULROOT
NSM_FOLLOW_TRYEMULROOT
These select (or not) the FOLLOW/NOFOLLOW and TRYEMULROOT flags. Other flags are not available through this interface, which is nonetheless sufficient for more than half the namei() usage in the kernel. Note that the encoding of sflags has deliberately been arranged to be type-incompatible with anything else. This prevents various possible accidents while the namei() interface is being rototilled.
-
-
- namei_simple_user(path,
sflags, ret)
- This function is the same as namei_simple_kernel() except that the path argument shall be a user pointer (UIO_USERSPACE) rather than a kernel pointer.
-
-
- relookup(dvp,
vpp, cnp)
- Reacquire a path name component in a directory. This is a quicker way to look up a pathname component when the parent directory is known. The locked parent directory vnode is specified by dvp and the pathname component by cnp. The vnode of the pathname is returned in the address specified by vpp. Note that one may only use relookup() to repeat a lookup of a final path component previously done by namei, and one must use the same componentname structure that call produced. Otherwise the behavior is undefined and likely adverse.
-
-
- lookup_for_nfsd(ndp,
startdir, neverfollow)
- This is a private entry point into namei
used by the NFS server code. It looks up a path starting from
startdir. If neverfollow is
set, any symbolic link (not just at the end of the path)
will cause an error. Otherwise, it follows symlinks normally. It should
not be used by new code.
-
-
- lookup_for_nfsd_index(ndp)
- This is a (second) private entry point into
namei used by the NFS server code. It looks up a single
path component. It should not be used by new code.
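The type-incompatible encoding of sflags described under namei_simple_kernel() can be arranged, for example, by making the flag type a pointer to a structure, so that an integer flag word passed by mistake is diagnosed by the compiler. The sketch below is a userland illustration of that idea. Every mock_ and MOCK_ name and every bit value is hypothetical, and the sketch is not claimed to match the kernel's actual definition.

```c
#include <stddef.h>

#define MOCK_FOLLOW		0x1u	/* assumed bit */
#define MOCK_TRYEMULROOT	0x2u	/* assumed bit */

/* The flag type is a pointer to a structure, not an integer. */
struct mock_simple_flags { unsigned bits; };
typedef const struct mock_simple_flags *mock_simple_flags_t;

/* One object per permitted combination. */
static const struct mock_simple_flags mock_sflags[4] = {
	{ 0 },
	{ MOCK_TRYEMULROOT },
	{ MOCK_FOLLOW },
	{ MOCK_FOLLOW | MOCK_TRYEMULROOT },
};

#define MOCK_NSM_NOFOLLOW_NOEMULROOT	(&mock_sflags[0])
#define MOCK_NSM_NOFOLLOW_TRYEMULROOT	(&mock_sflags[1])
#define MOCK_NSM_FOLLOW_NOEMULROOT	(&mock_sflags[2])
#define MOCK_NSM_FOLLOW_TRYEMULROOT	(&mock_sflags[3])

/* Translate a simple-flags token into an ordinary flag word. */
static unsigned
mock_simple_to_flags(mock_simple_flags_t sflags)
{
	return sflags->bits;
}
```

Under this arrangement only the four tokens are valid arguments, which matches the description that other flags are not available through the simple interface.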
INTERNALS
The
nameidata structure has the following layout:
struct nameidata {
/*
* Arguments to namei.
*/
struct vnode *ni_atdir; /* startup dir, cwd if null */
struct pathbuf *ni_pathbuf; /* pathname container */
char *ni_pnbuf; /* extra pathname buffer ref (XXX) */
/*
* Internal starting state. (But see notes.)
*/
struct vnode *ni_rootdir; /* logical root directory */
struct vnode *ni_erootdir; /* emulation root directory */
/*
* Results from namei.
*/
struct vnode *ni_vp; /* vnode of result */
struct vnode *ni_dvp; /* vnode of intermediate directory */
/*
* Internal current state.
*/
size_t ni_pathlen; /* remaining chars in path */
const char *ni_next; /* next location in pathname */
unsigned int ni_loopcnt; /* count of symlinks encountered */
/*
* Lookup parameters: this structure describes the subset of
* information from the nameidata structure that is passed
* through the VOP interface.
*/
struct componentname ni_cnd;
};
These fields are:
- ni_atdir
- The directory to use for the starting point of relative paths. If null, the current process's current directory is used. This is initialized to NULL by NDINIT() and set by NDAT().
- ni_pathbuf
- The abstract path buffer in use, passed as an argument to
NDINIT(). The name pointers that appear elsewhere, such
as in the componentname structure, point into this
buffer. It is owned by the caller and must not be destroyed until all
namei operations are complete. See
pathbuf(9).
- ni_pnbuf
- This is the name pointer used during
namei. It points into ni_pathbuf.
It is not initialized until entry into namei.
- ni_rootdir
- The root directory to use as the starting point for
absolute paths. This is retrieved from the current process's current root
directory when namei starts up. It is not initialized by
NDINIT().
- ni_erootdir
- The root directory to use as the emulation root, for processes running in emulation. This is retrieved from the current process's emulation root directory when namei starts up and not initialized by NDINIT(). As described elsewhere, it may be set by the caller if the EMULROOTSET flag is used, but this should only be done when the current process's emulation root directory is not yet initialized. (And ideally in the future things would be tidied so that this is not necessary.)
- ni_vp
-
- ni_dvp
- Returned vnodes, as described above. These only contain
valid values if namei returns successfully.
- ni_pathlen
- The length of the full current remaining path string in
ni_pnbuf. This is not initialized by
NDINIT() and is used only internally.
- ni_next
- The remaining part of the path, after the current component
found in the componentname structure. This is not
initialized by NDINIT() and is used only
internally.
- ni_loopcnt
- The number of symbolic links encountered (and traversed) so far. If this exceeds a limit, namei fails with ELOOP. This is not initialized by NDINIT() and is used only internally.
- ni_cnd
- The componentname structure holding the
current directory component, and also the mode, flags, and credentials.
The mode, flags, and credentials are initialized by
NDINIT(); the rest is not initialized until
namei runs.
There is also a
namei_state structure that is hidden within
vfs_lookup.c. This contains the following additional state:
- docache
- A flag indicating whether to cache the last pathname
component.
- rdonly
- The read-only state, initialized from the RDONLY flag.
- slashes
- The number of trailing slashes found after the current
pathname component.
- attempt_retry
- Set on some error cases (and not others) to indicate that a
failure in the emulation root should be followed by a retry in the real
system root.
The state in
namei_state is genuinely private to
namei. Note that much of the state in
nameidata should also be private, but is currently not
because it is misused in some fashion by outside code, usually
nfs(4).
The control flow within the
namei portions of
vfs_lookup.c is as follows.
-
-
- namei()
- does a complete path lookup by calling
namei_init(), namei_tryemulroot(), and
namei_cleanup().
-
-
- namei_init()
- sets up the basic internal state and makes some
(precondition-type) assertions.
-
-
- namei_cleanup()
- makes some postcondition-type assertions; it currently does
nothing besides this.
-
-
- namei_tryemulroot()
- handles TRYEMULROOT by calling namei_oneroot() once or twice as needed, and attends to making sure the original pathname is preserved for the second try.
-
-
- namei_oneroot()
- does a complete path search from a single root directory.
It begins with namei_start(), then calls
lookup_once() (and if necessary,
namei_follow()) repeatedly until done. It also handles
returning the result vnode(s) in the requested state.
-
-
- namei_start()
- sets up the initial state and locking; it calls
namei_getstartdir().
-
-
- namei_getstartdir()
- initializes the root directory state (both
ni_rootdir and ni_erootdir)
and picks the starting directory, consuming the leading slashes of an
absolute path and handling the magic “/../” string for
bypassing the emulation root. A different version
namei_getstartdir_for_nfsd() is used for lookups coming
from nfsd(8) as those are
required to have different semantics.
-
-
- lookup_once()
- calls VOP_LOOKUP() for one path
component, also handling any needed crossing of mount points (either up or
down) and coping with locking requirements.
-
-
- lookup_parsepath()
- is called prior to each lookup_once()
call to examine the pathname and find where the next component
starts.
-
-
- namei_follow()
- reads the contents of a symbolic link and updates both the
path buffer and the search directory accordingly.
As a final note, be advised that the magic return value associated with CREATE mode is different for namei than it is for VOP_LOOKUP(). The latter “fails” with EJUSTRETURN. namei translates this into succeeding and returning a null vnode.
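The translation described in this final note can be modeled in userland C. All names below are hypothetical stand-ins, the directory is a fixed array, and the numeric value chosen for MOCK_EJUSTRETURN is an assumption made for the sketch. Only the control flow follows the text.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum mock_op { MOCK_LOOKUP, MOCK_CREATE };

/* Assumed sentinel, chosen not to collide with positive errno values. */
#define MOCK_EJUSTRETURN	(-2)

struct mock_vnode { const char *name; };

static struct mock_vnode mock_dir[] = { { "a" }, { "b" } };

/* Models VOP_LOOKUP() for the last component of a path. */
static int
mock_vop_lookup(const char *name, enum mock_op op, struct mock_vnode **vpp)
{
	size_t i;

	*vpp = NULL;
	for (i = 0; i < sizeof(mock_dir) / sizeof(mock_dir[0]); i++) {
		if (strcmp(mock_dir[i].name, name) == 0) {
			*vpp = &mock_dir[i];
			return 0;
		}
	}
	/* No entry: CREATE mode "fails" with the magic value. */
	return op == MOCK_CREATE ? MOCK_EJUSTRETURN : ENOENT;
}

/* Models namei(): the magic value becomes success with a null vnode. */
static int
mock_namei_last(const char *name, enum mock_op op, struct mock_vnode **vpp)
{
	int error = mock_vop_lookup(name, op, vpp);

	if (error == MOCK_EJUSTRETURN) {
		*vpp = NULL;
		return 0;
	}
	return error;
}
```

A caller of the model therefore has to check for a null result after a successful CREATE lookup before dereferencing it, which is the same obligation a caller of namei has.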
CODE REFERENCES
The name lookup subsystem is implemented within the file
sys/kern/vfs_lookup.c.
SEE ALSO
intro(9),
namecache(9),
vfs(9),
vnode(9),
vnodeops(9)
BUGS
There should be no such thing as operating modes. Only LOOKUP is actually needed. The behavior where removing an object looks it up within namei and then calls into the file system (which must look it up again internally or cache state from VOP_LOOKUP()) is particularly contorted.
Most of the flags are equally bogus.
Most of the contents of the nameidata structure should be private and hidden within namei; currently they cannot be because of abuse elsewhere.
The EMULROOTSET flag is messy.
There is no good way to support file systems that want to use a more elaborate pathname schema than the customary slash-delimited components.