MBRTOC16(3) Library Functions Manual MBRTOC16(3)

mbrtoc16
Restartable multibyte to UTF-16 conversion

Standard C Library (libc, -lc)

#include <uchar.h>

size_t
mbrtoc16(char16_t * restrict pc16, const char * restrict s, size_t n, mbstate_t * restrict ps);

The mbrtoc16 function decodes multibyte characters in the current locale and converts them to UTF-16, keeping state so it can restart after incremental progress.

Each call to mbrtoc16:

  1. examines up to n bytes starting at s,
  2. yields a UTF-16 code unit if available by storing it at *pc16,
  3. saves state at ps, and
  4. returns either the number of bytes consumed, if any, or a special return value.
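As a minimal sketch of one such call, assume the input is plain ASCII, which decodes to the same code points in the "C" locale on common systems. The hypothetical helper decode_first below starts from a fresh mbstate_t and makes exactly one call, so its result is the byte count from step 4 (or a special value).

```c
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: make exactly one mbrtoc16 call on the first
 * n bytes of s, starting from the initial conversion state.
 */
static size_t
decode_first(const char *s, size_t n, char16_t *pc16)
{
        mbstate_t mbs;

        memset(&mbs, 0, sizeof(mbs));   /* initial conversion state */

        /*
         * Examines at most n bytes, stores a code unit at *pc16 if one
         * is available, updates mbs, and returns the number of bytes
         * consumed or one of the special (size_t) values.
         */
        return mbrtoc16(pc16, s, n, &mbs);
}
```

Called as decode_first("AB", 2, &c16), it should still return 1: one call decodes at most one multibyte character and leaves the remaining bytes for later calls.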

Specifically:

If pc16 is a null pointer, nothing is stored, but the effects on ps and the return value are unchanged.
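A null pc16 is convenient for sizing an output buffer before converting. The sketch below, under the hypothetical name utf16_units, counts the UTF-16 code units a NUL-terminated string would produce. It includes the terminating NUL in n so that a low surrogate pending after the last character is still returned, as (size_t)-3, by the following call.

```c
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: count the UTF-16 code units that the
 * NUL-terminated multibyte string s decodes to, not counting the
 * terminator.  pc16 is NULL, so nothing is stored; only the return
 * values and the conversion state are used.  Returns (size_t)-1 on
 * an encoding error or incomplete input.
 */
static size_t
utf16_units(const char *s)
{
        mbstate_t mbs;
        size_t n = strlen(s) + 1;       /* let mbrtoc16 see the NUL */
        size_t count = 0;

        memset(&mbs, 0, sizeof(mbs));
        for (;;) {
                size_t len = mbrtoc16(NULL, s, n, &mbs);

                if (len == 0)                   /* reached the NUL */
                        return count;
                if (len == (size_t)-1 || len == (size_t)-2)
                        return (size_t)-1;      /* bad or truncated input */
                count++;                        /* one code unit yielded */
                if (len == (size_t)-3)          /* low surrogate: no bytes */
                        continue;               /* were consumed */
                s += len;
                n -= len;
        }
}
```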

If s is a null pointer, the mbrtoc16 call is equivalent to:

mbrtoc16(NULL, "", 1, ps);

This always returns zero and resets ps to the initial conversion state without storing anything at *pc16, even if pc16 is nonnull.

If ps is a null pointer, mbrtoc16 uses an internal mbstate_t object with static storage duration, distinct from all other mbstate_t objects (including those used by mbrtoc8(3), mbrtoc32(3), c8rtomb(3), c16rtomb(3), and c32rtomb(3)), which is initialized at program startup to the initial conversion state.

On well-formed input, the mbrtoc16 function yields either a Unicode scalar value in the Basic Multilingual Plane (BMP), i.e., a 16-bit Unicode code point that is not a surrogate code point, or, over two successive calls, yields the high and low surrogate code points (in that order) of a Unicode scalar value outside the BMP.
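The two code units yielded for a character outside the BMP can be recombined with the standard UTF-16 arithmetic described in RFC 2781. The helper names below are hypothetical; they assume the caller has kept the high surrogate from one call and the low surrogate from the next.

```c
#include <uchar.h>

/* High surrogates lie in 0xD800-0xDBFF. */
static int
is_high_surrogate(char16_t c)
{
        return c >= 0xD800 && c <= 0xDBFF;
}

/* Low surrogates lie in 0xDC00-0xDFFF. */
static int
is_low_surrogate(char16_t c)
{
        return c >= 0xDC00 && c <= 0xDFFF;
}

/*
 * Hypothetical helper: combine a high/low surrogate pair into the
 * scalar value it encodes (0x10000-0x10FFFF).  Each surrogate carries
 * 10 bits; the caller must check both ranges first.
 */
static char32_t
combine_surrogates(char16_t hi, char16_t lo)
{
        return 0x10000 + (((char32_t)hi - 0xD800) << 10) +
            ((char32_t)lo - 0xDC00);
}
```

For example, U+1F600 is encoded in UTF-16 as 0xD83D followed by 0xDE00, and combine_surrogates(0xD83D, 0xDE00) yields 0x1F600.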

The mbrtoc16 function returns:
0             if mbrtoc16 decoded a null multibyte character.
i             where 1 ≤ i ≤ n, if mbrtoc16 consumed i bytes of input to decode the next multibyte character, yielding a UTF-16 code unit.
(size_t)-3    if mbrtoc16 consumed no new bytes of input but yielded a UTF-16 code unit that was pending from previous input.
(size_t)-2    if mbrtoc16 found only an incomplete multibyte sequence after all n bytes of input and any previous input, and saved its state to restart in the next call with ps.
(size_t)-1    if any encoding error was detected; errno(2) is set to reflect the error.

Print the UTF-16 code units of a multibyte string in hexadecimal text:
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <uchar.h>

char *s = ...;
size_t n = ...;
mbstate_t mbs = {0};    /* initial conversion state */

/* readmore and out are labels provided by the surrounding code. */
while (n) {
        char16_t c16;
        size_t len;

        len = mbrtoc16(&c16, s, n, &mbs);
        switch (len) {
        case 0:         /* NUL terminator */
                assert(c16 == 0);
                goto out;
        default:        /* len bytes consumed: scalar value or high surrogate */
                printf("U+%04"PRIx16"\n", (uint16_t)c16);
                break;
        case (size_t)-3: /* low surrogate; no new bytes consumed */
                printf("continue U+%04"PRIx16"\n", (uint16_t)c16);
                continue;       /* do not advance s or n */
        case (size_t)-2: /* incomplete; mbs keeps the partial sequence */
                printf("incomplete\n");
                goto readmore;
        case (size_t)-1: /* error */
                printf("error: %d\n", errno);
                goto out;
        }
        s += len;
        n -= len;
}
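The readmore branch above is left to the surrounding program. One way to structure it, sketched here with the hypothetical helper feed_chunk, is to decode each chunk of input as it arrives and keep passing the same mbstate_t, relying on the (size_t)-2 and (size_t)-3 behavior described above to carry partial characters across chunk boundaries.

```c
#include <stddef.h>
#include <uchar.h>

/*
 * Hypothetical helper: decode one chunk of a byte stream whose chunk
 * boundaries may fall inside a multibyte character.  Pass the same
 * *ps for every chunk of one stream.  Code units are stored in
 * out[0..cap) and their count is returned; (size_t)-1 means an
 * encoding error or a full output buffer.  Decoding stops at a null
 * character, which is not stored.
 */
static size_t
feed_chunk(const char *buf, size_t len, mbstate_t *ps,
    char16_t *out, size_t cap)
{
        size_t k = 0;

        while (len > 0) {
                char16_t c16;
                size_t r = mbrtoc16(&c16, buf, len, ps);

                if (r == (size_t)-2)    /* all len bytes went into *ps; */
                        break;          /* wait for the next chunk */
                if (r == (size_t)-1)    /* encoding error */
                        return (size_t)-1;
                if (r == 0)             /* null character: end of text */
                        break;
                if (k == cap)           /* no room for c16 */
                        return (size_t)-1;
                out[k++] = c16;
                if (r != (size_t)-3) {  /* (size_t)-3 consumed no bytes */
                        buf += r;
                        len -= r;
                }
        }
        /*
         * A low surrogate still pending here is returned, as
         * (size_t)-3, by the first call on the next chunk.
         */
        return k;
}
```

At the end of the stream, feeding the terminating NUL as a final one-byte chunk should let any pending low surrogate be returned before decoding stops.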

[]
The multibyte sequence cannot be decoded in the current locale as a Unicode scalar value.
[]
An error occurred in loading the locale's character conversions.

c16rtomb(3), c32rtomb(3), c8rtomb(3), mbrtoc32(3), mbrtoc8(3), uchar(3)

The Unicode Standard, https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf, The Unicode Consortium, September 2022, Version 15.0 — Core Specification.

P. Hoffman and F. Yergeau, UTF-16, an encoding of ISO 10646, Internet Engineering Task Force, RFC 2781, https://datatracker.ietf.org/doc/html/rfc2781, February 2000.

The mbrtoc16 function conforms to ISO/IEC 9899:2011 (“ISO C11”).

The mbrtoc16 function first appeared in NetBSD 11.0.
August 14, 2024 NetBSD 10.1