Bug 9173 – std.string.wrap should conform to Unicode line-breaking algorithm

Status: NEW
Severity: enhancement
Priority: P4
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2012-12-17T13:24:08Z
Last change time: 2024-12-01T16:15:57Z
Assigned to: No Owner
Creator: hsteoh
Moved to GitHub: phobos#9944

Comments

Comment #0 by hsteoh — 2012-12-17T13:24:08Z
Currently, there are some issues with std.string.wrap:

1) It uses std.uni.isWhite as the criterion for line-breaking opportunities, but isWhite also accepts characters such as the non-breaking space, at which a line should *not* be broken. It also accepts things like vowel mark separators, which shouldn't be break points either.
2) It does not take zero-width characters and combining diacritics into account when counting columns, so it will sometimes wrap the line at the wrong place.
3) It does not wrap CJK text or Thai text correctly.

For reference, here's the Unicode technical report that describes proper line-breaking of Unicode text: http://www.unicode.org/reports/tr14/

(After reading through TR14, I was in awe at how insanely complicated line-wrapping in Unicode is. So I'd propose that, if nothing else, we fix items (1) and (2) above, which should be within reach of a relatively simple-to-implement European-centric line-wrapping algorithm. People who want CJK wrapping or other complicated stuff probably want to write their own algorithm anyway.)
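
As a rough illustration of what fixing items (1) and (2) might involve, here is a minimal sketch. It is written in Python rather than D purely to show the idea, and it is not Phobos code: the names NON_BREAKING, is_break_space, display_width, split_words and wrap are all hypothetical. It assumes that characters in general categories Mn, Me and Cf occupy no column, that only three no-break spaces need excluding, and it ignores tabs, indentation, East Asian wide characters and everything else in TR14.

```python
import unicodedata

# Assumption: these whitespace characters must not offer a break opportunity
# (UAX #14 class GL): no-break space, figure space, narrow no-break space.
NON_BREAKING = {"\u00A0", "\u2007", "\u202F"}


def is_break_space(ch):
    """Return True if ch is whitespace at which a line may be broken."""
    return ch.isspace() and ch not in NON_BREAKING


def display_width(text):
    """Count columns, skipping combining marks (Mn, Me) and format chars (Cf)."""
    return sum(
        1 for ch in text if unicodedata.category(ch) not in ("Mn", "Me", "Cf")
    )


def split_words(text):
    """Split text at breakable whitespace only, keeping no-break spaces in words."""
    words, current = [], []
    for ch in text:
        if is_break_space(ch):
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def wrap(text, columns=80):
    """Greedy word wrap that measures display columns instead of code points."""
    lines, line, width = [], [], 0
    for word in split_words(text):
        w = display_width(word)
        # The extra 1 accounts for the space that would join this word to the line.
        if line and width + 1 + w > columns:
            lines.append(" ".join(line))
            line, width = [], 0
        width += w + (1 if line else 0)
        line.append(word)
    if line:
        lines.append(" ".join(line))
    return "\n".join(lines)
```

With this sketch, wrap("100\u00A0km away", 8) keeps "100\u00A0km" together on the first line, and wrap("cafe\u0301 au", 7) fits on one line because the combining acute accent is not counted as a column. A greedy approach like this is only meant to cover the European-centric case proposed above.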
Comment #1 by robert.schadek — 2024-12-01T16:15:57Z
THIS ISSUE HAS BEEN MOVED TO GITHUB: https://github.com/dlang/phobos/issues/9944 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT.