Bug 24190 – Identifier tokenizer is greedy and steals new line characters

Status: NEW
Severity: enhancement
Priority: P1
Component: dmd
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2023-10-18T00:03:16Z
Last change time: 2024-12-13T19:31:11Z
Assigned to: No Owner
Creator: Richard (Rikki) Andrew Cattermole
Moved to GitHub: dmd#20341

Comments

Comment #0 by alphaglosined — 2023-10-18T00:03:16Z
Currently, the tokenizer for identifiers is quite greedy. It consumes the non-ASCII new line characters as part of the identifier, when it should probably stop and defer to the outer loop, which can then treat the character as a line ending or report the error.

```d
$ cat lsps.d
void main ()
{
    enum b = 8;
    mixin ("enum a1 =\u2028b; pragma (msg, a1);");
    mixin ("enum a2\u2028= b; pragma (msg, a2);");
    mixin ("enum\u2028a3 = b; pragma (msg, a3);");
}
$ dmd lsps.d
8
lsps.d-mixin-5(5): Error: char 0x2028 not allowed in identifier
lsps.d-mixin-6(6): Error: char 0x2028 not allowed in identifier
```

Only the first mixin compiles, where U+2028 follows `=`. In the other two it directly follows an identifier or keyword and is rejected. The character 0x2028 (LINE SEPARATOR) is a valid new line character.
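
The behaviour asked for could be sketched roughly as below. This is only an illustration, not dmd's lexer code: `scanIdentifier` and `isUnicodeEndOfLine` are hypothetical names, and the identifier rule is simplified to "letter or underscore, then letters, digits or underscores". The point is that the identifier scanner stops in front of U+2028/U+2029 without consuming or diagnosing them, so the outer loop sees them next.

```d
import std.uni : isAlpha;
import std.utf : decode;

/// True for the two non-ASCII code points that count as line endings.
bool isUnicodeEndOfLine(dchar c) pure nothrow @nogc @safe
{
    return c == '\u2028' || c == '\u2029';
}

/// Hypothetical helper (not part of dmd): returns the length in bytes of
/// the identifier at the start of `src`, or 0 if there is none.
size_t scanIdentifier(string src) pure @safe
{
    size_t i = 0;
    while (i < src.length)
    {
        size_t next = i;
        const dchar c = decode(src, next); // advances `next` past one code point

        // Stop *before* a Unicode line ending; do not consume it and do
        // not report an error here. The outer loop handles it.
        if (isUnicodeEndOfLine(c))
            break;

        // Simplified identifier rule (an assumption of this sketch).
        const bool ok = c == '_' || isAlpha(c)
            || (i > 0 && c >= '0' && c <= '9');
        if (!ok)
            break;

        i = next;
    }
    return i;
}

unittest
{
    // The identifier ends where the line separator begins.
    assert(scanIdentifier("a2\u2028= b;") == 2);
    assert(scanIdentifier("enum\u2028a3 = b;") == 4);
    assert(scanIdentifier("\u2028b;") == 0);
}
```

Saved as, say, `sketch.d`, the checks can be run with `dmd -unittest -main -run sketch.d`.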
Comment #1 by robert.schadek — 2024-12-13T19:31:11Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/dmd/issues/20341 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT