Bug 786 – the \ EndOfFile EscapeSequence in double-quoted strings doesn't work

Status
RESOLVED
Resolution
INVALID
Severity
normal
Priority
P3
Component
dmd
Product
D
Version
D1 (retired)
Platform
x86
OS
Windows
Creation time
2007-01-02T20:35:00Z
Last change time
2014-02-15T13:21:16Z
Keywords
rejects-valid, spec
Assigned to
bugzilla
Creator
dlang-bugzilla

Comments

Comment #0 by dlang-bugzilla — 2007-01-02T20:35:45Z
Spec non-conformance, I believe.

Spec: http://www.digitalmars.com/d/lex.html#StringLiteral

Program:

void main()
{
    char[] eof_literal = "\"; // the character after the backslash is \u001A, as per the specs
}

Compiler output:

C:\...>dmd lexical.d
lexical.d(3): unterminated string constant starting at lexical.d(3)
lexical.d(3): semicolon expected, not 'EOF'
lexical.d(3): found 'EOF' instead of statement
(that last line appears 19 times in all)
Comment #1 by smjg — 2007-01-03T04:01:30Z
"End of File EndOfFile: physical end of the file \u0000 \u001A " AIUI, locating the end of the code conceptually happens before tokenization. But indeed, the spec isn't crystal clear on this.
Comment #2 by thomas-dloop — 2007-01-06T15:46:14Z
Intermingling EOF detection with tokenisation would require quite a few changes within DMD, and it makes no sense to me, as it would allow reading past the physical end of the file.
Comment #3 by bugzilla — 2007-02-02T21:34:28Z
0x1A is listed in lex.html as 'end of file', which trumps any token. I think the spec is reasonably clear on this: "The source text is terminated by whichever comes first." The reason for this is that some (old) text editors emit a 0x1A to mark the end of a file. Not a bug.
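
To illustrate the rule quoted above, here is a minimal Python sketch. It is a toy model, not DMD's code, and the name `effective_source` is hypothetical. It cuts the source text at the first \u0000 or \u001A, or at the physical end if neither occurs, and applies that to the string literal line from the original report.

```python
# Toy model of "the source text is terminated by whichever comes first".
# Not DMD's implementation; effective_source is a hypothetical helper.

END_MARKERS = ("\u0000", "\u001A")


def effective_source(text: str) -> str:
    """Return the text up to, but not including, the first end-of-file marker."""
    cut = len(text)  # physical end of the file
    for marker in END_MARKERS:
        pos = text.find(marker)
        if pos != -1:
            cut = min(cut, pos)
    return text[:cut]


# The line from the report: a backslash followed by 0x1A inside a string.
reported = 'char[] eof_literal = "\\\u001A";'
print(repr(effective_source(reported)))
```

Under this model everything from the 0x1A onward is dropped before tokenization, so the closing quote and the semicolon are never seen, which would match the "unterminated string constant" message in the report.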
Comment #4 by dlang-bugzilla — 2007-02-02T21:37:12Z
In that case, why is "\ EndOfFile" listed as a valid EscapeSequence token?
Comment #5 by bugzilla — 2007-02-02T23:19:37Z
If a \ is the last character in a file, the escape sequence will resolve to the \ character; that's what that is for.
Comment #6 by smjg — 2007-02-03T08:10:18Z
But a StringLiteral can never be the last token of a syntactically valid D source file, or can it?
Comment #7 by bugzilla — 2007-02-03T12:13:43Z
Currently, no, it can't, hence the error message about semicolon expected instead of EOF. But the lexer doesn't (and shouldn't) know syntax, it just knows tokens.
Comment #8 by smjg — 2007-02-04T07:00:21Z
Exactly. So really, EscapeSequence: \ EndOfFile has no effect except perhaps on what error message the compiler throws. Moreover, UIMS the spec gives no meaning to this EscapeSequence form. Which is probably why we're all asking.
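
The point made in comments #5 to #8 can be modelled in a few lines. The Python sketch below is a toy string-literal scanner, not DMD's lexer; `scan_string` and its return shape are invented for illustration, and escape handling is deliberately simplified. It assumes the source has already been cut at the end-of-file marker. A trailing backslash is treated as the `\ EndOfFile` escape, yielding a literal backslash as comment #5 describes, yet the literal is still reported as unterminated because no closing quote follows.

```python
# Toy scanner for a double-quoted string literal (hypothetical, simplified).


def scan_string(src: str, start: int):
    """Scan a literal whose opening quote is at src[start].

    Returns (value, terminated): the decoded characters and whether a
    closing quote was found before the end of the source.
    """
    assert src[start] == '"'
    out = []
    i = start + 1
    while i < len(src):
        c = src[i]
        if c == '"':
            return "".join(out), True
        if c == "\\":
            if i + 1 == len(src):
                # "\ EndOfFile": the backslash is the last character of
                # the source and stands for itself.
                out.append("\\")
                i += 1
                continue
            # Simplification: any other escaped character stands for itself.
            out.append(src[i + 1])
            i += 2
            continue
        out.append(c)
        i += 1
    return "".join(out), False


print(scan_string('x = "\\', 4))       # backslash is the last character
print(scan_string('x = "a\\"b";', 4))  # ordinary terminated literal
```

In this model the `\ EndOfFile` rule changes only the decoded value of a literal that is unterminated anyway, which is the observation made in comment #8.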