Multibyte Characters

In multibyte character sets, each character is coded as a sequence of one or more bytes. Unlike wide characters, each of which is represented by a single object of the type wchar_t, individual multibyte characters may be represented by different numbers of bytes. However, the number of bytes that represent a multibyte character, including any necessary state-shift sequences, is never more than the value of the macro MB_CUR_MAX, which is defined in the header stdlib.h.
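Because multibyte characters vary in length, code that walks a multibyte string has to ask how long each character is instead of stepping one byte at a time. The following is a minimal sketch that assumes the program has already selected a locale with setlocale(). count_mb_chars() is a hypothetical helper, not a standard function. It relies on mblen() from stdlib.h, which keeps its shift state in hidden internal storage that the call mblen(NULL, 0) resets.

```c
#include <stdlib.h>   /* mblen(), MB_CUR_MAX */
#include <string.h>   /* strlen() */

/* Hypothetical helper: count the multibyte characters in the
   null-terminated string s, interpreted in the current LC_CTYPE locale.
   Returns -1 if s contains an invalid or incomplete multibyte sequence. */
long count_mb_chars(const char *s)
{
    size_t remaining = strlen(s);   /* bytes left to examine */
    long count = 0;

    (void)mblen(NULL, 0);           /* reset mblen()'s hidden shift state */
    while (remaining > 0) {
        /* mblen() reports how many bytes the next character occupies;
           that number is never greater than MB_CUR_MAX. */
        int len = mblen(s, remaining);
        if (len < 0)
            return -1;              /* invalid or truncated sequence */
        if (len == 0)
            break;                  /* null character: end of string */
        s += len;
        remaining -= (size_t)len;
        ++count;
    }
    return count;
}
```

For example, assuming setlocale(LC_ALL, "") selects a UTF-8 locale, count_mb_chars("h\xc3\xa9llo") should yield 5 while strlen() reports 6 bytes, because é is encoded in two bytes. Since mblen() is not required to avoid data races, the restartable mbrlen() from wchar.h is the safer choice in multithreaded code.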
C provides standard functions to obtain the wide-character code, or wchar_t value, that corresponds to any given multibyte character, and to convert any wide character to its multibyte representation.
Some multibyte encoding schemes are stateful: the interpretation of a given multibyte sequence may depend on its position relative to control sequences, called shift sequences, that occur in the multibyte stream or string. In such encodings, the conversion of a multibyte character to a wide character, or of a multibyte string to a wide string, depends on the shift state in effect at the point where the first multibyte character is read. For the same reason, converting a wide character to a multibyte character, or a wide string to a multibyte string, may require inserting appropriate shift sequences into the output.
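The restartable functions declared in wchar.h keep the shift state in an mbstate_t object supplied by the caller rather than in hidden static storage. The sketch below decodes a multibyte string one character at a time with mbrtowc(), passing the same state object on every call so that a shift sequence read earlier still governs how later bytes are interpreted. The name decode_mb() and its buffer-size convention are hypothetical, and the program is assumed to have set its locale beforehand.

```c
#include <string.h>   /* memset(), strlen() */
#include <wchar.h>    /* mbrtowc(), mbstate_t, wchar_t */

/* Hypothetical helper: decode the multibyte string s into out, which
   has room for outlen wide characters including the terminating L'\0'.
   Stops early if out fills up. Returns the number of wide characters
   stored, or (size_t)-1 on an encoding error or if outlen is 0. */
size_t decode_mb(const char *s, wchar_t *out, size_t outlen)
{
    mbstate_t st;
    size_t remaining = strlen(s);
    size_t n = 0;

    if (outlen == 0)
        return (size_t)-1;
    memset(&st, 0, sizeof st);      /* a zeroed mbstate_t describes the initial shift state */

    while (remaining > 0 && n + 1 < outlen) {
        wchar_t wc;
        /* The same st is passed on every call, so the shift state
           carries over from one character to the next. */
        size_t len = mbrtowc(&wc, s, remaining, &st);
        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;      /* invalid (-1) or incomplete (-2) sequence */
        if (len == 0)
            break;                  /* decoded the null wide character */
        out[n++] = wc;
        s += len;
        remaining -= len;
    }
    out[n] = L'\0';
    return n;
}
```

Keeping the state in a local mbstate_t rather than in library-internal storage means two decoders can run independently, for example one per thread or one per input stream.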

Multibyte character functions

Purpose | Functions in stdlib.h | Functions in wchar.h
Find the length of a multibyte character | mblen( ) | mbrlen( )
Find the wide character corresponding to a given multibyte character | mbtowc( ) | mbrtowc( )
Find the multibyte character corresponding to a given wide character | wctomb( ) | wcrtomb( )
Convert a multibyte string into a wide string | mbstowcs( ) | mbsrtowcs( )
Convert a wide string into a multibyte string | wcstombs( ) | wcsrtombs( )
Convert between byte characters and wide characters | (none) | btowc( ), wctob( )
Test for the initial shift state | (none) | mbsinit( )
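Going the other way, wcrtomb() may have to emit shift sequences. When it is passed the null wide character, it writes any shift sequence needed to restore the initial shift state followed by a null byte, and mbsinit() can then confirm that the state object is back in the initial state. The sketch below uses a hypothetical helper, encode_wcs(), that writes into a caller-supplied buffer. To size a temporary buffer for one character it uses MB_LEN_MAX from limits.h, a compile-time constant that is never smaller than MB_CUR_MAX in any supported locale.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <string.h>   /* memcpy(), memset() */
#include <wchar.h>    /* wcrtomb(), mbsinit(), mbstate_t */

/* Hypothetical helper: encode the null-terminated wide string ws into
   out, a buffer of outlen bytes. Returns the number of bytes written,
   not counting the terminating null byte, or (size_t)-1 if some
   character has no multibyte representation or out is too small. */
size_t encode_wcs(const wchar_t *ws, char *out, size_t outlen)
{
    mbstate_t st;
    char tmp[MB_LEN_MAX];           /* holds any one multibyte character */
    size_t used = 0;

    memset(&st, 0, sizeof st);      /* begin in the initial shift state */
    for (;; ++ws) {
        /* For L'\0', wcrtomb() writes any shift sequence needed to
           return to the initial state, then a null byte. */
        size_t len = wcrtomb(tmp, *ws, &st);
        if (len == (size_t)-1)
            return (size_t)-1;      /* not representable in this locale */
        if (len > outlen - used)
            return (size_t)-1;      /* output buffer too small */
        memcpy(out + used, tmp, len);
        used += len;
        if (*ws == L'\0')
            break;
    }
    /* mbsinit() confirms that the conversion ended in the initial
       shift state; used - 1 excludes the terminating null byte. */
    return mbsinit(&st) ? used - 1 : (size_t)-1;
}
```

The stdlib.h function wcstombs() performs a whole-string conversion in one call; the character-by-character loop is shown here only to make visible where shift sequences and the final return to the initial state come from.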
