ImportC C23: char8_t and u8'c' character constants

Part of #23223 (M1 — standard-independent).

## Summary

C23 lets `u8` prefix a *character* constant (`u8'c'`), not just a string literal, giving it the new `char8_t` type. This is purely additive: `u8'...'` is a syntax error in C11, so accepting it cannot change the meaning of any valid C11 program. ImportC should lex `u8'c'` and map `char8_t` to D's `char` (a UTF-8 byte). No CLI flag or standard selection is needed.

## Spec

[N3220][n3220] **6.4.4.5** Character constants (adds the `u8` encoding-prefix and the `char8_t` constant type); the `char8_t` type itself is normatively "an unsigned integer type ... the same type as `unsigned char`" per **7.30** `<uchar.h>`, within the integer/character type model of **6.2.5** Types.

### Grammar (N3220 6.4.4.5; `opt` = optional)

    character-constant:
                  encoding-prefix(opt) ' c-char-sequence '

    encoding-prefix: one of
                  u8     u     U     L

    c-char-sequence:
                  c-char
                  c-char-sequence c-char

    c-char:
                  any member of the source character set except
                              the single-quote ', backslash \, or new-line character
                  escape-sequence

    escape-sequence:
                  simple-escape-sequence
                  octal-escape-sequence
                  hexadecimal-escape-sequence
                  universal-character-name

**Constraints** (N3220 6.4.4.5 p9-10):
- The value of an octal/hexadecimal escape sequence shall be in range of the corresponding type; for the `u8` prefix the corresponding type is `char8_t`.
- "A UTF-8 ... character constant shall not contain more than one character. The value shall be representable with a single UTF-8 ... code unit, respectively." (footnote 73: "u8'ab' violates this constraint.")

**Semantics relevant to the parser** (N3220 6.4.4.5 p12): "A UTF-8 character constant has type `char8_t`. If the UTF-8 character constant is not produced through a hexadecimal or octal escape sequence, the value ... is equal to its ISO/IEC 10646 code point value, provided that the code point value can be encoded as a single UTF-8 code unit."
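As a rough illustration of the constraint and semantics quoted above, the value check could be sketched in C as below. The names `u8_origin` and `u8_char_value_ok` are hypothetical and do not exist in dmd, and the sketch assumes an 8-bit `char8_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of dmd. It records how the value of a
 * u8 character constant was produced. */
typedef enum { U8_FROM_CODE_POINT, U8_FROM_ESCAPE } u8_origin;

/* Returns true if `value` may appear in a u8 character constant.
 * Assumes char8_t is an 8-bit unsigned type. */
bool u8_char_value_ok(uint32_t value, u8_origin origin)
{
    assert(origin == U8_FROM_CODE_POINT || origin == U8_FROM_ESCAPE);

    if (origin == U8_FROM_ESCAPE)
        return value <= 0xFF; /* octal/hex escape: must fit in char8_t */

    /* Plain character or universal-character-name: only code points up to
     * U+007F are encoded as a single UTF-8 code unit. */
    return value <= 0x7F;
}
```

Under this reading, `u8'a'` and `u8'\xff'` are accepted, while a constant naming U+00E9 is rejected because its UTF-8 encoding needs two code units.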

## Approach

Two independent, additive pieces:

1. `char8_t` type: provided as a typedef in `druntime/src/importc.h` (mapping to D `char`, i.e. a UTF-8 byte), matching how that header already supplies other implementation typedefs (`druntime/src/importc.h:74`). No new keyword and no frontend change is needed for the type name.

2. `u8'c'` lexing: extend the existing C prefix-scanning block in `compiler/src/dmd/lexer.d`. The `case 'u': case 'U': case 'L':` block already handles wide char constants (`compiler/src/dmd/lexer.d:497-499` calling `clexerCharConstant`) and already special-cases the `u8"..."` *string* literal (`compiler/src/dmd/lexer.d:511-516`). Add a sibling branch: when `p[0]=='u' && p[1]=='8' && p[2]=='\''`, advance past `u8` and lex a single-code-unit UTF-8 character constant. Because `char8_t` is `unsigned char`, a `u8'c'` constant fits in a `char` byte; it can be emitted as `TOK.charLiteral` (which `cparse.d` maps to `int`/`tint32`, matching the integer-promoted use of a `char8_t` value) so no new token kind is required. Enforce the 6.4.4.5 p10 constraint (single UTF-8 code unit, value range of `char8_t`) and diagnose `u8'ab'` per footnote 73.

`clexerCharConstant` (`compiler/src/dmd/lexer.d:2293`) is the natural home for the new prefix handling; note its doc comment currently cites `C11 6.4.4.4` — update it to `C23 6.4.4.5` since C23 renumbered this clause and added the `u8` prefix.
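The control flow of the new branch can be sketched as a standalone C function. This is only an illustration of the prefix check and the single-code-unit rule, not the actual `lexer.d` change: `lex_status`, `hex_digit`, and `lex_u8_char` are hypothetical names, and only plain ASCII characters and `\x` escapes are handled (other escape forms are reported as unsupported).

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    LEX_OK, LEX_NOT_U8_CHAR, LEX_ERR_EMPTY, LEX_ERR_MULTI,
    LEX_ERR_RANGE, LEX_ERR_UNTERMINATED, LEX_ERR_UNSUPPORTED
} lex_status;

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Lex a u8 character constant at p (NUL-terminated input).
 * On LEX_OK, *value holds the code unit and *end points past the
 * closing quote. */
lex_status lex_u8_char(const char *p, unsigned *value, const char **end)
{
    assert(p != NULL && value != NULL && end != NULL);

    /* Same shape as the existing u8"..." check, but for a single quote. */
    if (!(p[0] == 'u' && p[1] == '8' && p[2] == '\''))
        return LEX_NOT_U8_CHAR;
    p += 3; /* skip u8' */

    if (*p == '\'') return LEX_ERR_EMPTY;
    if (*p == '\0' || *p == '\n') return LEX_ERR_UNTERMINATED;

    unsigned v = 0;
    if (p[0] == '\\' && p[1] == 'x') {
        int digits = 0;
        for (p += 2; hex_digit(*p) >= 0; ++p, ++digits) {
            if (v <= 0xFF) /* stop accumulating once out of range */
                v = v * 16 + (unsigned)hex_digit(*p);
        }
        if (digits == 0) return LEX_ERR_UNSUPPORTED;
        if (v > 0xFF) return LEX_ERR_RANGE; /* exceeds char8_t */
    } else if (*p == '\\') {
        return LEX_ERR_UNSUPPORTED; /* other escapes omitted in this sketch */
    } else {
        if ((unsigned char)*p > 0x7F)
            return LEX_ERR_RANGE; /* needs more than one UTF-8 code unit */
        v = (unsigned char)*p++;
    }

    if (*p == '\0' || *p == '\n') return LEX_ERR_UNTERMINATED;
    if (*p != '\'') return LEX_ERR_MULTI; /* e.g. u8'ab' */

    *value = v;
    *end = p + 1;
    return LEX_OK;
}
```

In this sketch `u8'a'` yields the value 97, `u8'\xff'` yields 255, `u8'ab'` is rejected as having more than one character, and `u8"s"` falls through so the existing string-literal path can handle it.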

## Change sites

- `compiler/src/dmd/lexer.d:511-516` — alongside the existing `p[1]=='8' && p[2]=='"'` (u8 string) check, add a `p[1]=='8' && p[2]=='\''` (u8 char) check that advances past `u8` and lexes a UTF-8 character constant. // C23 6.4.4.5
- `compiler/src/dmd/lexer.d:2293` (`clexerCharConstant`) — add a `'8'`/u8 case that decodes exactly one UTF-8 code unit, enforces the single-code-unit constraint, and updates the doc comment from `C11 6.4.4.4` to `C23 6.4.4.5`. // C23 6.4.4.5
- `druntime/src/importc.h:74` — add `typedef unsigned char char8_t;` near the other implementation typedefs. // C23 6.2.5, 7.30
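For the header change, a minimal C sketch of the typedef and the properties the approach relies on might look like this. The static assertions and the `promote_char8` helper are illustrative only and are not proposed for `importc.h`.

```c
#include <assert.h>

/* The typedef proposed for importc.h. */
typedef unsigned char char8_t;

/* Properties the lexer change relies on: one byte wide and unsigned. */
_Static_assert(sizeof(char8_t) == 1, "char8_t must be a single byte");
_Static_assert((char8_t)-1 > 0, "char8_t must be unsigned");

/* Hypothetical helper: shows that a char8_t value promotes to a
 * non-negative int, which is why an int-typed literal token can carry it. */
int promote_char8(char8_t c)
{
    return c + 0; /* integer promotion to int */
}
```

Because the type is unsigned, even the largest code unit promotes to 255 rather than to a negative value.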

## Out of scope / deferred

- `u8"..."` UTF-8 *string* literals: already supported (`compiler/src/dmd/lexer.d:511-516`).
- `char16_t` / `char32_t` and `u`/`U` char constants: already handled by `clexerCharConstant`; no change here.
- Any `<uchar.h>` library surface (`mbrtoc8`, `char8_t` macro versioning): library, not the ImportC parser.
- Strict re-typing of `u8'c'` to a distinct `char8_t`-flavored token: deferred; integer-promoted `int`/`charLiteral` value suffices. Note here if a concrete need arises.

## Acceptance criteria

- [ ] `compilable` test exercising `u8'a'`, a `char8_t` variable initialized from it, and a `u8'\xNN'` escape, each with its `// C23 6.4.4.5` comment.
- [ ] `fail_compilation` test for the error path: `u8'ab'` (more than one code unit, 6.4.4.5 p10 / footnote 73), with `// C23 6.4.4.5`.
- [ ] Changelog entry (`changelog/dmd.importc-c23-char8.dd`).
- [ ] No regression for existing tests (notably the `u8"..."` string-literal and `u`/`U`/`L` char-constant paths).

[n3220]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf

