Bug 14035 – string concatenation accepts ints in templates

Status: REOPENED
Severity: major
Priority: P2
Component: dmd
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2015-01-23T21:16:16Z
Last change time: 2024-12-13T18:39:19Z
Keywords: accepts-invalid, CTFE
Assigned to: Kenji Hara
Creator: Ketmar Dark
Moved to GitHub: dmd#18935

Comments

Comment #0 by ketmar — 2015-01-23T21:16:16Z
the following code happily compiles:

  template alice (usize ln=__LINE__) {
    enum alice = "{ int t_"~ln~" = 42; }";
  }
  pragma(msg, alice!());

it uses `ln` value as character code, which seems to be wrong, as trying to do the same in ordinary function fails with the following message:

  Error: incompatible types for (("{ int t_") ~ (ln)): 'string' and 'uint'
Comment #1 by yebblies — 2015-03-22T04:28:46Z
'ln' is a template parameter, so its value is known when semantic is run on the enum's initializer. Because it is a constant value known to fit in a char type, it is implicitly converted to char and the concatenation succeeds. You can see the expected error if you move the pragma(msg) line below line 255.

The implicit conversion from int to char is supported for cases like this:

  "string " ~ ('a' + 1)

which would otherwise fail.
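A minimal sketch of the behaviour described in this comment, assuming a dmd release that still exhibits this issue. The explicit arguments 65 and 300 are illustrative stand-ins for `__LINE__`, and `size_t` replaces the reporter's `usize` alias:

```d
// Same template as in the report, with size_t spelled out.
template alice(size_t ln = __LINE__)
{
    enum alice = "{ int t_" ~ ln ~ " = 42; }";
}

// 65 fits in a char, so it is narrowed and used as the character code of 'A'.
static assert(alice!(65) == "{ int t_A = 42; }");

// 300 does not fit in a char, so this instantiation is expected to be rejected.
static assert(!__traits(compiles, alice!(300)));

void main() {}
```

Passing the value explicitly makes the outcome independent of the line on which the template is instantiated.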
Comment #2 by ketmar — 2015-03-22T04:36:42Z
the same logic should allow this:

  string alice() (usize ln=__LINE__) {
    return "{ int t_"~ln~" = 42; }";
  }
  pragma(msg, alice!());

yet somehow this is not working. but why? `ln` is known in compile time too! it's inconsistent and breaking type system. but ok, it's another arcane D knowledge, which can't be logically explained and can be only remembered. one more, one less...
Comment #3 by yebblies — 2015-03-22T04:40:14Z
(In reply to Ketmar Dark from comment #2)
> the same logic should allow this:
>
> string alice() (usize ln=__LINE__) {
>   return "{ int t_"~ln~" = 42; }";
> }
> pragma(msg, alice!());
>
> yet somehow this is not working. but why? `ln` is known in compile time too!

No, ln is a run-time argument here. Semantic analysis is run on 'alice' without knowing it will be called from a compile-time context.
Comment #4 by ketmar — 2015-03-22T04:50:56Z
ln type is known in both cases. `usize` is `size_t`, if it matters. yet somehow two `ln`s have *different* types.

i can see how `(T ln=__LINE__)` can be converted to char, as compiler is free to choose any integral type for `T`, and it can be `ubyte`, if `__LINE__` is sufficiently small. but i specifically wrote `usize`, and i can't see why compiler wants to narrow it in one case, ignoring my explicit type definition, but doesn't want to narrow it in another case.

from programmer's POV both samples should not work if type system is consistent and works as expected. but type system is clearly not consistent, and to use it successfully programmer must know corner cases and compiler internals. something is wrong either with language design or with type system here.
Comment #5 by ketmar — 2015-03-22T04:57:55Z
i.e. having such template that somehow deduces type for its argument, despite the explicitly written type, and doing that differently depending on the line where template was instantiated is... very strange. it's the source of hard-to-catch bugs, which type system should *help* to catch. instead it is simply doing something i never asked it to do and carefully hides the bug from me. it reminds me of INTERCAL language.
Comment #6 by yebblies — 2015-03-22T05:06:23Z
The type of ln doesn't change. The template expands to something like

  enum uint ln = 4;
  enum alice = "{ int t_"~ln~" = 42; }";

As the compiler knows the _value_ for ln, it expands alice's initializer to

  enum alice = "{ int t_"~4~" = 42; }";

Then, because it knows that 4 can fit in a char, it allows it to become this:

  enum alice = "{ int t_"~cast(char)4~" = 42; }";

And then the concatenation is semantically valid.

As I said in my first comment, range propagation is allowed so that code like this will still work:

  enum str = "abc" ~ ('a' + 7);

Here ('a' + 7) has type int, but because we know the value at compile time and know it will fit in a char, it is allowed to be narrowed.
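The expansion steps in this comment can be checked with a short sketch, assuming a dmd release that still accepts this concatenation. The `static assert` lines are illustrative additions, not part of the original comment:

```d
enum uint ln = 4;
static assert(is(typeof(ln) == uint));           // the declared type of ln stays uint

// Accepted because the value 4 is known at compile time and fits in a char.
enum alice = "{ int t_" ~ ln ~ " = 42; }";
static assert(is(typeof(alice) == string));
static assert(alice == "{ int t_\x04 = 42; }");  // 4 ends up as character code 0x04

// The literal case from the comment: 'a' + 7 has type int and value 104, i.e. 'h'.
enum str = "abc" ~ ('a' + 7);
static assert(str == "abch");

void main() {}
```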
Comment #7 by ketmar — 2015-03-22T05:16:08Z
the type IS changed. in no way compiler does runtime analysis to allow converting `size_t` to `char`, it simply rejects such code. yet for *explicitly* defined type in template compiler, for some unknown reason, narrows the type. so we actually have THREE types here:

• "runtime `size_t`"
• "compile-time `size_t` which fits to char"
• "compile-time `size_t` which doesn't fit to char"

for another unknown reason they are all called `size_t`, which is obviously wrong.

as for your sample with `'a'+1`: 'a' is *literal* here. as i told earlier, i can accept type deduction for literals (it's wrong too in this case, but ok, let's leave it as is). but i can't see why `size_t` has three definitions, and two of those definitions *depend* *on* *instantiation* *line*. it's insane.
Comment #8 by yebblies — 2015-03-22T05:45:01Z
Feel free to make your own version of the compiler that works differently.
Comment #9 by k.hara.pg — 2015-03-22T05:53:47Z
(In reply to yebblies from comment #6)
> Then, because it knows that 4 can fit in a char, it allows it to become this:
>
> enum alice = "{ int t_"~cast(char)4~" = 42; }";

I think the problem is here. The constfold will change the value type uint to char unintentionally. I think the bare integer literal `4` and the expanded value `4` from a manifest constant `ln` should be handled differently.

- A bare integer literal `4` should work as a polysemous expression, so implicit casting it to char is possible because 4 can fit the value range of char type.

- An expanded integer `4` typed uint should work as an "integer", at least. Therefore converting it to char implicitly is unnatural behavior. (With the same logic, `ln` should not be implicitly convertible to bool.)

=====

To match what a human would naturally expect, I think D built-in types should be categorized as follows:

- numeric types: u?byte, u?short, u?int, u?long, [ic]?float, [ic]?double, [ic]?real
- character types: [wd]?char
- boolean type: bool

and disallow conversions beyond the categories for non-polymorphic literals (eg. `4` typed as uint).
Comment #10 by ketmar — 2015-03-22T05:56:44Z
i did. yet i still can't see how this helps with three different `size_t` in D. but i clearly see why D will never be popular, with its roots in old-fashioned "you have to know compiler internals to predict what compiler will do with your code". this makes me sad ketmar.

ignoring human psychology is not the best way to make language popular, and i'm not talking about myself here. humans tend to avoid unpredictable things if they have a choice, and D has no killer app or software giant to support it, so the only way to win a fight is to be consistent and predictable. yet somehow the only consistent thing is consistently ignoring those facts. sad ketmar is sad.
Comment #11 by k.hara.pg — 2015-03-22T05:59:11Z
(In reply to Kenji Hara from comment #9)
> and disallow conversions beyond the categories for non-polymorphic literals
> (eg. `4` type with uint).

Of course, even inside the same category, loss of precision should also be disallowed. eg. An implicit conversion from a floating literal `3.14` typed real to int will lose precision, so it should be disallowed.
Comment #12 by yebblies — 2015-03-22T06:08:29Z
(In reply to Kenji Hara from comment #9)
> (In reply to yebblies from comment #6)
> > Then, because it knows that 4 can fit in a char, it allows it to become this:
> >
> > enum alice = "{ int t_"~cast(char)4~" = 42; }";
>
> I think the problem is here. The constfold will change the value type uint
> to char unintentionally. I think the bare integer literal `4` and the
> expanded value `4` from a manifest constant `ln` should be handled
> differently.
>
> - A bare integer literal `4` should work as a polysemous expression, so
> implicit casting it to char is possible because 4 can fit the value range of
> char type.
>
> - An expanded integer `4` typed uint should work as an "integer", at least.
> Therefore converting it to char implicitly is unnatural behavior.
> (With the same logic, `ln` should not be implicitly convertible to bool.)
>

This is sort of the whole point of VRP, and I don't see how you can make it work without crippling it. The implicit conversion from int (etc) to char is useful and intentional. Even if you wanted to treat 'explicitly' typed declarations differently, these would have to be treated the same:

  enum c = 'a' + 1; // no explicit type, but will be inferred as int
  enum int d = 'a' + 1; // explicitly typed as int
  enum x = "xxx" ~ c ~ d;
Comment #13 by ketmar — 2015-03-22T06:15:26Z
(In reply to yebblies from comment #12)
> enum c = 'a' + 1; // no explicit type, but will be inferred as int

as *integral*, which fits to any integral type. at least that is what ordinary human expects.
Comment #14 by k.hara.pg — 2015-03-22T06:24:00Z
(In reply to yebblies from comment #12)
> This is sort of the whole point of VRP

I think it's a problem in the currently implemented semantics of VRP. Applying VRP beyond the "type categories" will introduce human-unfriendly behavior.

> The implicit conversion from int (etc) to char is useful and intentional.

It's useful only when it does not break the user's intention. Otherwise it would be harmful.

> Even if you wanted to make 'explicitly' typed
> declarations differently, these would have to be treated the same:
>
> enum c = 'a' + 1; // no explicit type, but will be inferred as int
> enum int d = 'a' + 1; // explicitly typed as int
> enum x = "xxx" ~ c ~ d;

I think the above code should be an error. If the user wants to make a character by the expression 'a' + 1, c and d should be declared with char type. And it will be consistent with runtime code behavior.

  auto c = 'a' + 1;
  int d = 'a' + 1;
  auto x = "xxx" ~ c ~ d; // Error: incompatible types: 'string' and 'int'
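The compile-time and runtime variants discussed in this exchange can be put side by side in a small sketch. It assumes a dmd release with the behaviour reported in this issue, and the names `rc` and `rd` are hypothetical, chosen only to avoid clashing with the enums:

```d
enum c = 'a' + 1;        // no explicit type, inferred as int (value 98, i.e. 'b')
enum int d = 'a' + 1;    // explicitly typed as int
enum x = "xxx" ~ c ~ d;  // currently accepted: both values are narrowed to char

static assert(is(typeof(c) == int));
static assert(x == "xxxbb");

void main()
{
    auto rc = 'a' + 1;   // runtime int
    int rd = 'a' + 1;
    // The same concatenation on runtime values is rejected by the compiler.
    static assert(!__traits(compiles, "xxx" ~ rc ~ rd));
}
```

Under the proposal in this comment, the `enum x` line would become an error as well, which would make the two halves of the sketch behave the same way.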
Comment #15 by schveiguy — 2018-02-06T04:02:13Z
*** Issue 18346 has been marked as a duplicate of this issue. ***
Comment #16 by nick — 2022-10-28T17:39:20Z
*** Issue 17333 has been marked as a duplicate of this issue. ***
Comment #17 by robert.schadek — 2024-12-13T18:39:19Z
THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/dmd/issues/18935

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB