Comment #0 by andrej.mitrovich — 2013-12-18T11:37:06Z
-----
import std.regex;
import std.stdio;
void main()
{
    // expected: [["3"]], but got: [["2"]]
    writeln("123456789".match("[^1--[2]]"));
    // the above is *currently* equivalent to:
    writeln("123456789".match("[^[1--[2]]]"));
    // which means: subtract [2] from [1] (leaving [1]),
    // then negate the result (so "2" is the first match in the string).
    // But I expect the first case to be equivalent to:
    writeln("123456789".match("[[^1]--[2]]"));
    // which means: negate [1] (for discussion, assume the range 2-9),
    // then subtract [2] to get 3-9, so "3" is the first match.
}
-----
I'm not sure whether this is just how ECMAScript does it (since std.regex references it), but .NET, for example, applies negation to the base class first (the "1" class above) and *then* subtracts the other class.
You can test this behavior at http://refiddle.com/ using .NET syntax:
pattern: [^01-[2]]
input:   0123456789
It matches "3".
Either way, if this report is invalid (i.e. this is the expected behavior), I think we should update the docs to state the precedence of negation.
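The two readings differ only in where `^` is applied. A minimal sketch in plain set arithmetic (not std.regex), assuming, as in the comments above, that the universe is just the digits 1 to 9; `subtract` and `negate` are hypothetical helpers:

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

// Assumption for discussion: the universe is only the digits '1'..'9'.
enum string universe = "123456789";

// Set difference ("--"): characters of `a` that are not in `b`.
string subtract(string a, string b)
{
    string result;
    foreach (char c; a)
        if (!b.canFind(c))
            result ~= c;
    return result;
}

// Negation ("^"): complement relative to the assumed universe.
string negate(string s)
{
    return subtract(universe, s);
}

void main()
{
    // Current std.regex reading of [^1--[2]], i.e. [^[1--[2]]]
    string current = negate(subtract("1", "2"));
    // Expected reading, i.e. [[^1]--[2]]
    string expected = subtract(negate("1"), "2");
    writeln(current);  // 23456789, so "2" matches first
    writeln(expected); // 3456789, so "3" matches first
}
```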
Comment #1 by andrej.mitrovich — 2013-12-18T11:38:32Z
(In reply to comment #0)
> Using .net syntax:
> [^01-[2]]
> 0123456789
>
> It matches "3".
Never mind the leading zero; I meant to use this simpler example:
pattern: [^1-[2]]
input:   123456789
It matches "3".
Comment #2 by dmitry.olsh — 2013-12-18T11:56:14Z
(In reply to comment #0)
> I'm not sure whether this is just how ECMAScript does it (since std.regex
> references it), but e.g. .NET does negation on the base class first (The "1"
> class above) and *then* it does subtraction with another class.
ECMAScript doesn't even have it AFAIK ;)
I think you (and .NET) are right: the priority of the unary '^' operator should be higher than that of any of the binary ops.
Comment #3 by andrej.mitrovich — 2013-12-19T04:55:48Z
Is the following sample caused by the same issue?
writeln("abcdefghijklmnopqrstuvwxyz".match("[a-z&&[^aeiuo]]"));
It writes [["a"]], but I was expecting the first non-vowel, [["b"]]. Ruby returns "b"; as for .NET, I haven't found the syntax it uses.
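For reference, the expected result can be computed directly as set arithmetic. This is a sketch, not std.regex code: `intersect` and `negate` are hypothetical helpers, and the complement is taken within the lowercase letters only. That is an assumption, but it does not change the result once the complement is intersected with `a-z`:

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

// Assumed universe for this sketch: lowercase ASCII letters only.
enum string lowercase = "abcdefghijklmnopqrstuvwxyz";

// Intersection ("&&"): characters of `a` that are also in `b`.
string intersect(string a, string b)
{
    string result;
    foreach (char c; a)
        if (b.canFind(c))
            result ~= c;
    return result;
}

// Negation ("^"): complement of `s` within the assumed universe.
string negate(string s)
{
    string result;
    foreach (char c; lowercase)
        if (!s.canFind(c))
            result ~= c;
    return result;
}

void main()
{
    // [a-z&&[^aeiuo]]: the letters a-z that are not vowels
    string consonants = intersect(lowercase, negate("aeiuo"));
    writeln(consonants[0]); // b
}
```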
Comment #4 by dmitry.olsh — 2013-12-19T10:27:35Z
(In reply to comment #1)
> (In reply to comment #0)
> > Using .net syntax:
> > [^01-[2]]
> > 0123456789
> >
> > It matches "3".
>
> Nevermind the leading zero, I meant to use this simpler example:
>
> [^1-[2]]
> 123456789
>
> It matches "3".
Actually, because of the single dash, that example works as if all is fine...
This one is a good test case:
[^1--[2]]
Comment #5 by dmitry.olsh — 2013-12-19T10:31:23Z
(In reply to comment #3)
> Is the following sample caused by the same issue?
>
> writeln("abcdefghijklmnopqrstuvwxyz".match("[a-z&&[^aeiuo]]"));
>
> It writes [["a"]], I was expecting the first non-vowel [["b"]]. It returns "b"
> in Ruby, as for .NET I haven't found the syntax it uses.
From the look of it, this is an unrelated bug in set intersection.
Better to split it off as a new issue.
Comment #6 by andrej.mitrovich — 2013-12-20T00:51:17Z
(In reply to comment #5)
> (In reply to comment #3)
> > Is the following sample caused by the same issue?
> >
> > writeln("abcdefghijklmnopqrstuvwxyz".match("[a-z&&[^aeiuo]]"));
> >
> > It writes [["a"]], I was expecting the first non-vowel [["b"]]. It returns "b"
> > in Ruby, as for .NET I haven't found the syntax it uses.
>
> From the look of it - an unrelated bug in set intersection.
> Better split it off as a new issue.
Filed as Issue 11784.
Comment #7 by dmitry.olsh — 2014-01-10T12:24:42Z
Ruby makes me nervous:
print /[^abc[e-f]&&[ybc]]/.match('~haystack')
Prints '~', meaning that the ^ operator has _lower_ priority than '&&'.
I'm surprised, but it is a precedent.
And indeed, the following reports an empty range and a warning about an unescaped '-', i.e. '--' is not supported...
print /[^1--[2]]/.match("0123456789")
re.rb:2: warning: character class has '-' without escape: /[^2--[1]]/
re.rb:2: empty range in char class: /[^2--[1]]/
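The two candidate readings of `[^abc[e-f]&&[ybc]]` can be compared as membership predicates over "~haystack". A minimal sketch with hypothetical helper names (`inBase`, `inOther`, `rubyReading`, `negationFirst`, `firstMatch`), treating `abc[e-f]` as the implicit union of a, b, c, e and f:

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

// Members of abc[e-f] (implicit union of a, b, c and the range e-f).
bool inBase(char c) { return "abcef".canFind(c); }

// Members of [ybc].
bool inOther(char c) { return "ybc".canFind(c); }

// Ruby's reading: ^ binds looser than &&, i.e. ^(abc[e-f] && [ybc]).
bool rubyReading(char c) { return !(inBase(c) && inOther(c)); }

// Negation-first reading: [^abc[e-f]] && [ybc].
bool negationFirst(char c) { return !inBase(c) && inOther(c); }

// First character of `haystack` accepted by `member`, or '\0' if none.
char firstMatch(bool function(char) member, string haystack)
{
    foreach (char c; haystack)
        if (member(c))
            return c;
    return '\0';
}

void main()
{
    writeln(firstMatch(&rubyReading, "~haystack"));   // ~
    writeln(firstMatch(&negationFirst, "~haystack")); // y
}
```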
> [^1-[2]]
> 123456789
>
> It matches "3".
And .NET is disappointing:
[^[2]-1]
doesn't match anything. They seem to have special-cased only the form [..-[set]]
and arbitrary nesting of it.
So we have no good precedents.
My thoughts are to make it a proper operator-precedence grammar
with these priorities:
0 - implicit union (pieces that stand together, evaluated first)
1 - ^ (negation)
2 - &&
3 - --
4 - || (explicit union, evaluated last)
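The proposed priorities can be sketched as a small recursive-descent evaluator with one parsing level per priority. This is a hypothetical model, not std.regex code: the names `ClassParser`, `binaryOps`, `combine` and `firstMatch` are made up for illustration, sets are restricted to 7-bit ASCII, and the class syntax is simplified (no escapes, no predefined classes). Under these priorities `[^1--[2]]` matches "3" as comment #0 expects, while the Ruby example above matches "y" rather than Ruby's "~":

```d
import std.stdio : writeln;

alias CharSet = bool[128]; // simplifying assumption: 7-bit ASCII only

// Binary operators, loosest first: || (priority 4), -- (3), && (2).
immutable string[3] binaryOps = ["||", "--", "&&"];

bool combine(string op, bool a, bool b)
{
    if (op == "||") return a || b;  // explicit union
    if (op == "--") return a && !b; // subtraction
    return a && b;                  // "&&": intersection
}

// Hypothetical recursive-descent evaluator with one parsing level per priority.
struct ClassParser
{
    string src;
    size_t pos;

    bool peek(string tok)
    {
        return pos + tok.length <= src.length && src[pos .. pos + tok.length] == tok;
    }
    bool accept(string tok)
    {
        if (!peek(tok)) return false;
        pos += tok.length;
        return true;
    }
    // class := '[' expr(0) ']'
    CharSet parseClass()
    {
        if (!accept("[")) throw new Exception("expected '['");
        CharSet s = parseBinary(0);
        if (!accept("]")) throw new Exception("expected ']'");
        return s;
    }
    // expr(k) := expr(k + 1) (binaryOps[k] expr(k + 1))*, left-associative
    CharSet parseBinary(size_t level)
    {
        if (level == binaryOps.length) return parseNeg();
        CharSet s = parseBinary(level + 1);
        while (accept(binaryOps[level]))
        {
            CharSet r = parseBinary(level + 1);
            foreach (i; 0 .. 128) s[i] = combine(binaryOps[level], s[i], r[i]);
        }
        return s;
    }
    // neg := '^'? union   (priority 1)
    CharSet parseNeg()
    {
        bool negated = accept("^");
        CharSet s = parseUnion();
        if (negated) foreach (i; 0 .. 128) s[i] = !s[i];
        return s;
    }
    // union := (class | char | char '-' char)+   (priority 0, evaluated first)
    CharSet parseUnion()
    {
        CharSet s;
        while (pos < src.length && !peek("]") && !peek("||") && !peek("--") && !peek("&&"))
        {
            if (peek("["))
            {
                CharSet inner = parseClass();
                foreach (i; 0 .. 128) s[i] = s[i] || inner[i];
                continue;
            }
            char lo = src[pos++], hi = lo;
            if (peek("-") && !peek("--") && pos + 1 < src.length)
            {
                hi = src[pos + 1]; // a single '-' between two chars forms a range
                pos += 2;
            }
            foreach (int c; lo .. hi + 1) s[c] = true;
        }
        return s;
    }
}

// First character of `haystack` that belongs to the class `cls`, or '\0'.
char firstMatch(string cls, string haystack)
{
    auto p = ClassParser(cls);
    CharSet s = p.parseClass();
    foreach (char c; haystack)
        if (c < 128 && s[c]) return c;
    return '\0';
}

void main()
{
    writeln(firstMatch("[^1--[2]]", "123456789"));          // 3
    writeln(firstMatch("[^abc[e-f]&&[ybc]]", "~haystack")); // y (Ruby reports ~)
    writeln(firstMatch("[a-z&&[^aeiuo]]", "abcdefghijklmnopqrstuvwxyz")); // b
}
```

Because `^` sits just above implicit union, it applies to the whole run of pieces before the first binary operator (`abc[e-f]` in the Ruby example), but not to anything after `&&`, `--` or `||`.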
Comment #8 by robert.schadek — 2024-12-01T16:19:37Z