Bug 19428 – std.string.indexOf wrong result with bad Unicode

Status: NEW
Severity: normal
Priority: P3
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2018-11-23T22:39:35Z
Last change time: 2024-12-01T16:34:33Z
Assigned to: No Owner
Creator: Vladimir Panteleev
Moved to GitHub: phobos#9766

Comments

Comment #0 by dlang-bugzilla — 2018-11-23T22:39:35Z
//////////////////// test.d ///////////////////
import std.algorithm.comparison;
import std.range;
import std.string;

void main()
{
    assert(indexOf(
        only('\uFFFD', '\uFFFD', '\uFFFD'),
        "\x83\x84\x85",
        CaseSensitive.yes) == -1);
}
///////////////////////////////////////////////

It looks like indexOf is replacing invalid UTF-8 with U+FFFD replacement characters under the hood. The needle "\x83\x84\x85" consists of three stray continuation bytes, each of which decodes to U+FFFD, so it spuriously matches the haystack and the assertion fails.

This gets worse when something causes the same substitution to happen to the haystack, as in this unit test:
https://github.com/dlang/phobos/blob/9bfc82130c0e4af4d1dc95bb261570c6e4f6f5d8/std/string.d#L887-L903

Note that this unittest is incorrectly annotated as nothrow/@nogc. We can't use the kind of decoding that substitutes errors with replacement characters, as that will introduce bugs like these.
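The following sketch is not part of the original report. It illustrates the conflation described above using std.utf directly, under the assumption that indexOf's internal decoding behaves like the lossy byDchar decoding shown here (the report infers this from the symptom; it does not point at the specific code path). Lossy decoding makes the invalid needle indistinguishable from three real U+FFFD characters, strict decoding rejects it, and comparing raw code units keeps the two apart.

```d
import std.algorithm.comparison : equal;
import std.algorithm.searching : countUntil;
import std.exception : assertThrown;
import std.string : representation;
import std.utf : UTFException, byDchar, decode, replacementDchar;

void main()
{
    // Invalid UTF-8: three stray continuation bytes, as in the needle above.
    string bad = "\x83\x84\x85";
    // Valid UTF-8: three encoded U+FFFD characters (EF BF BD each).
    string fffd = "\uFFFD\uFFFD\uFFFD";

    // Lossy decoding: byDchar substitutes U+FFFD for each invalid code unit,
    // so after decoding the two strings compare equal.
    assert(bad.byDchar.equal(fffd.byDchar));
    assert(bad.byDchar.equal([replacementDchar, replacementDchar, replacementDchar]));

    // Strict decoding: std.utf.decode throws on the first stray byte instead.
    size_t index = 0;
    assertThrown!UTFException(decode(bad, index));

    // Comparing raw code units (ubyte) keeps them distinct: no match is found.
    assert(countUntil(fffd.representation, bad.representation) == -1);
}
```

Whether a fix should throw on invalid input or fall back to code-unit comparison is a design question the report leaves open; the sketch only shows that either approach avoids the false match.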
Comment #1 by robert.schadek — 2024-12-01T16:34:33Z
THIS ISSUE HAS BEEN MOVED TO GITHUB: https://github.com/dlang/phobos/issues/9766
DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT.