Comment #0 by stefan.zipproth — 2008-03-24T17:40:05Z
If the parameter of std.uri.decode contains %E4, which is the German umlaut ä, the exception "URI error" is thrown. This is wrong behaviour, as a URI may contain umlauts. Here is my C implementation, which does the job:
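#include <ctype.h>
#include <stdio.h>

/* Decode the range [src, last) into dest: '+' becomes ' ', "%XY" becomes the byte 0xXY.
   dest needs room for (last - src) + 1 chars. */
void decode(const char *src, const char *last, char *dest)
{
    for (; src != last; src++, dest++)
        if (*src == '+')
            *dest = ' ';
        else if (*src == '%')
        {
            unsigned int code = '?';   /* %x needs an unsigned int; '?' marks a bad escape */
            if (last - src >= 3        /* never read past last */
                && isxdigit((unsigned char)src[1])
                && isxdigit((unsigned char)src[2]))
            {
                sscanf(src + 1, "%2x", &code);
                src += 2;              /* skip the two hex digits */
            }
            *dest = (char)code;
        }
        else
            *dest = *src;
    *dest = 0;
}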
As I understand it, percent-decoding is nothing more than a hex-code-to-byte conversion, so there is no reason to forbid certain codes and throw exceptions. Also, from reading the documentation I expected at least std.uri.decodeComponent to be a straightforward implementation, but it throws exceptions too.
Comment #1 by stefan.zipproth — 2008-03-24T18:51:54Z
My own web pages currently use %E4 for the umlaut 'ä', which is its ISO-8859-1 (Latin-1) code, not ASCII. The standard, however, is UTF-8, which encodes it as %C3%A4. So this issue seems to be invalid, and I will have to change things for the D port of my web application.
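For reference, a small C sketch of where the two encodings of 'ä' (U+00E4) come from. The function names pct_latin1 and pct_utf8 are made up for illustration, and the UTF-8 branch only handles code points below U+0800, which is enough for this character.

```c
#include <stdio.h>

/* Hypothetical helper: percent-encode a code point in U+0000..U+00FF
   the ISO-8859-1 way, i.e. as a single byte. */
void pct_latin1(unsigned cp, char out[4])
{
    sprintf(out, "%%%02X", cp);
}

/* Hypothetical helper: percent-encode a code point below U+0800
   the UTF-8 way, i.e. as one or two bytes. */
void pct_utf8(unsigned cp, char out[7])
{
    if (cp < 0x80)
        sprintf(out, "%%%02X", cp);                 /* ASCII: one byte, same as Latin-1 */
    else
        sprintf(out, "%%%02X%%%02X",
                0xC0 | (cp >> 6),                   /* lead byte 110xxxxx */
                0x80 | (cp & 0x3F));                /* continuation byte 10xxxxxx */
}
```

With these, pct_latin1(0xE4, buf) yields "%E4" and pct_utf8(0xE4, buf) yields "%C3%A4", while a plain ASCII character such as the space (0x20) encodes as "%20" either way.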
Comment #2 by stefan.zipproth — 2014-02-25T00:56:32Z
RFC 3986, section 2.1 (Percent-Encoding), states:

    For example, "%20" is the percent-encoding for the binary octet
    "00100000" (ABNF: %x20), which in US-ASCII corresponds to the space
    character (SP).
So percent-encoding is defined on octets; it is not tied to UTF-8. Therefore, std.uri.decode should not throw an exception if its parameter contains %E4. Browsers are allowed to encode the German umlaut ä as %E4, which is how I ran into this problem: my server-side application crashed as soon as std.uri.decode was called.
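A minimal C sketch of the lenient behaviour argued for above, assuming the decoder returns raw octets rather than a UTF-8 string. The names pct_decode_octets and is_utf8 are hypothetical and not part of std.uri. It also suggests one plausible reason a decoder that must produce UTF-8 text would reject %E4: the octet 0xE4 is a UTF-8 lead byte announcing a three-byte sequence, so on its own it is not valid UTF-8, whereas %C3%A4 is.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller has already checked isxdigit. */
static int hexval(int c)
{
    return isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
}

/* Lenient decoder: every "%XY" becomes the octet 0xXY, whatever its value;
   anything else (including a malformed escape) is copied unchanged.
   out needs room for strlen(s) bytes. Returns the number of bytes written. */
size_t pct_decode_octets(const char *s, unsigned char *out)
{
    size_t n = 0;
    for (; *s; s++) {
        if (s[0] == '%' && isxdigit((unsigned char)s[1]) && isxdigit((unsigned char)s[2])) {
            out[n++] = (unsigned char)(hexval((unsigned char)s[1]) * 16
                                       + hexval((unsigned char)s[2]));
            s += 2;                      /* skip the two hex digits */
        } else {
            out[n++] = (unsigned char)*s;
        }
    }
    return n;
}

/* Returns 1 if buf[0..len) has well-formed UTF-8 lead/continuation structure.
   Simplification: overlong forms and surrogates are not rejected here. */
int is_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        int extra = c < 0x80 ? 0
                  : (c & 0xE0) == 0xC0 ? 1
                  : (c & 0xF0) == 0xE0 ? 2
                  : (c & 0xF8) == 0xF0 ? 3 : -1;
        if (extra < 0 || len - i - 1 < (size_t)extra)
            return 0;                    /* bad lead byte or truncated sequence */
        for (size_t k = 1; k <= (size_t)extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;                /* expected a continuation byte */
        i += (size_t)extra + 1;
    }
    return 1;
}
```

With this split, decoding "%E4" always succeeds and yields the single byte 0xE4; whether those octets are then interpreted as UTF-8, ISO-8859-1 or something else becomes a separate decision for the application.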
Comment #3 by andrei — 2015-11-03T19:00:09Z
It's unlikely this D1 issue will get worked on. If this bug applies to D2 and/or if anyone plans to work on it, please reopen.