Bug 22473 – dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Status
RESOLVED
Resolution
DUPLICATE
Severity
enhancement
Priority
P1
Component
druntime
Product
D
Version
D2
Platform
All
OS
All
Creation time
2021-11-04T02:22:59Z
Last change time
2021-11-07T10:10:56Z
Assigned to
No Owner
Creator
Walter Bright
See also
https://issues.dlang.org/show_bug.cgi?id=20134

Comments

Comment #0 by bugzilla — 2021-11-04T02:22:59Z
A simple foreach loop:

    void test(char[] a)
    {
        foreach (char c; a) { }
    }

will throw a UtfException if `a` is not a valid UTF string. Instead, it should replace the invalid sequence with replacementDchar.

The foreach code is compiled to call druntime/src/rt/aApply/_aApplycd1(), which calls druntime/src/core/internal/utf/decode(), which throws the exceptions. replacementDchar is defined in std.utf as `\uFFFD`.

The reason to make this change is that it has the same problems autodecoding has: it can't be turned off, it throws, and it may allocate with the GC. Oh, and it's slow.
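For illustration only, the requested behavior can be approximated today at the library level. The sketch below assumes Phobos is available and uses `std.utf.byDchar`, which is documented to substitute `replacementDchar` for invalid sequences instead of throwing. It is not the druntime change proposed here, and the helper name `countReplacements` is hypothetical.

```d
import std.stdio : writeln;
import std.utf : byDchar, replacementDchar;

// Hypothetical helper: decode `a` lazily with byDchar and count how many
// decoded code points are the replacement character U+FFFD.
size_t countReplacements(const(char)[] a)
{
    size_t n;
    foreach (dchar c; a.byDchar)
    {
        if (c == replacementDchar)
            ++n;
    }
    return n;
}

void main()
{
    // Same invalid byte sequence as the reproduction later in this thread.
    const(char)[] bad = "\xf0\x90\x28\xbc";
    writeln("replacements in invalid input: ", countReplacements(bad));
    writeln("replacements in valid input:   ", countReplacements("hello"));
}
```

The exact number of replacement characters emitted for a given invalid input depends on how the decoder resynchronizes, so the sketch only reports the count rather than assuming one.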
Comment #1 by acehreli — 2021-11-04T05:47:07Z
I think the foreach variable type must be dchar. The following program reproduces the issue:

    void test(char[] a)
    {
        foreach (dchar c; a) { }
    }

    void main()
    {
        char[] a = "\xf0\x90\x28\xbc".dup;
        test(a);
    }
Comment #2 by destructionator — 2021-11-04T11:53:33Z
I believe making it either wchar or dchar will cause it to try to convert, and throw on failure.
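A minimal sketch of the distinction described in the comments above: iterating a `char[]` by `char` visits raw code units without decoding, while asking for `wchar` or `dchar` triggers the conversion that can throw. The helper names `countCodeUnits` and `decodingThrows` are hypothetical, and the code catches the general `Exception` base class rather than assuming a specific exception type.

```d
import std.stdio : writeln;

// Iterating by char performs no decoding: one iteration per code unit.
size_t countCodeUnits(const(char)[] a)
{
    size_t n;
    foreach (char c; a)
        ++n;
    return n;
}

// Iterating by a wider character type T (wchar or dchar) makes foreach
// decode the UTF-8 input; report whether that decoding threw.
bool decodingThrows(T)(const(char)[] a)
{
    try
    {
        foreach (T c; a) { }
        return false;
    }
    catch (Exception)
    {
        return true;
    }
}

void main()
{
    const(char)[] bad = "\xf0\x90\x28\xbc";
    writeln("code units:            ", countCodeUnits(bad));
    writeln("wchar decoding throws: ", decodingThrows!wchar(bad));
    writeln("dchar decoding throws: ", decodingThrows!dchar(bad));
}
```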
Comment #3 by bugzilla — 2021-11-05T21:42:56Z
Ali, you're right. I can't believe I made that mistake. I was even thinking about not making that mistake when I wrote the example.
Comment #4 by dlang-bugzilla — 2021-11-06T15:54:37Z
I think this is the same as issue 20134 (except foreach loops instead of Phobos autodecoding, but they should probably behave in the same way).
Comment #5 by dlang-bugzilla — 2021-11-07T07:25:07Z
Actually this is a duplicate of 14519, which describes foreach loops specifically. I'll update the component of that from dmd to druntime (as it is correctly indicated here).

*** This issue has been marked as a duplicate of issue 14519 ***
Comment #6 by johan_forsberg_86 — 2021-11-07T10:10:56Z