Bug 7260 – "g" on default in std.regex

Status: RESOLVED
Resolution: WONTFIX
Severity: normal
Priority: P2
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2012-01-09T13:52:00Z
Last change time: 2013-09-22T01:12:35Z
Assigned to: nobody
Creator: bearophile_hugs

Comments

Comment #0 by bearophile_hugs — 2012-01-09T13:52:08Z
D2 code:

    import std.stdio: write, writeln;
    import std.regex: regex, match;

    void main() {
        string text = "abc312de";
        foreach (c; text.match("1|2|3|4"))
            write(c, " ");
        writeln();
        foreach (c; text.match(regex("1|2|3|4", "g")))
            write(c, " ");
        writeln();
    }

It outputs (DMD 2.058 Head):

    ["3"]
    ["3"] ["1"] ["2"]

In my code I have seen that usually the "g" option (which means "repeat over the whole input") is what I want. So what do you think about making "g" the default?

Note: I have not marked this issue as an "enhancement" because of this comment by Dmitry Olshansky (found by drey_ on IRC #D):
http://dfeed.kimsufi.thecybershadow.net/discussion/thread/[email protected]#post-jc9mag:2430tq:241:40digitalmars.com

> Yet I have to issue yet another warning about new std.regex compared
> with old one:
>
> import std.stdio;
> import std.regex;
>
> void main() {
>     string src = "4.5.1";
>     foreach (c; match(src, regex(r"(\d+)")))
>         writeln(c.hit);
> }
>
> previously this will find all matches, now it finds only first one. To
> get all of matches use "g" option.
>
> Seems like 100% compatibility was next to impossible.
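The first-match versus all-matches difference in the quoted warning can be reproduced with the matchFirst/matchAll functions mentioned in comment #10. This is a minimal sketch, assuming a Phobos release that includes those functions; firstNumber and allNumbers are hypothetical helper names used only for illustration.

```d
import std.algorithm : map;
import std.array : array;
import std.regex : matchAll, matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: like the old behaviour without "g",
// only the first match is reported.
string firstNumber(string src)
{
    return matchFirst(src, regex(r"(\d+)")).hit;
}

// Hypothetical helper: like the old behaviour with "g",
// the whole-match text of every match is collected.
string[] allNumbers(string src)
{
    return matchAll(src, regex(r"(\d+)")).map!(m => m.hit).array;
}

void main()
{
    writeln(firstNumber("4.5.1")); // 4
    writeln(allNumbers("4.5.1"));  // ["4", "5", "1"]
}
```

The choice of "one or all" is made by which function is called, not by a flag stored in the compiled pattern.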
Comment #1 by dmitry.olsh — 2012-02-24T12:21:44Z
I don't know how to "fix" this bug. Making "g" the default implies there must be a way to override it. regex("blah", "")?
Leaving it as it is now breaks old code bases that rely on "g" behavior (though there should be more legacy std.regexp code out there).
Turning "g" on by default affects old code only inside foreach and generic constructs that show or iterate over all matches; that's rare but non-zero.
Another way would be to ditch the current API, which I think is not ideal, btw ;)
Comment #2 by bearophile_hugs — 2012-02-24T12:45:08Z
(In reply to comment #1)
> I dunno how to "fix" this bug. "g" by default imples there is a way to override
> it. regex("blah","") ?
> Leaving it as is now breaks old codebases that rely on "g" (though there should
> be more of legacy std.regexp code out there).
> Making it "g" on affects old code only inside foreach and generic constructs
> that show all matches or iterate on them, it's rare but non-zero.
>
> Another way would be to ditch current API, which I is not ideal btw ;)

Fully ditching the currently used API is probably too much.

A possible idea:

    regex("blah")      <<== repeat over the whole input.
    regex("blah","")   <<== repeat over the whole input.
    regex("blah","g")  <<== repeat over the whole input.
    regex("blah","d")  <<== doesn't repeat over the whole input.

So far you have done good work on the regular expression implementation, so I trust your work. Thank you.
Comment #3 by bearophile_hugs — 2012-04-19T15:18:13Z
This is not an enhancement request (I consider it more like a little Phobos regression).
Comment #4 by bearophile_hugs — 2013-01-24T19:21:14Z
If changing std.regex.regex is not possible, then an alternative solution is to introduce a new little function, "std.regex.re", that repeats by default, like this:

    re(someString)      === regex(someString, "g")
    re(someString, "d") === regex(someString, "dg")
Comment #5 by dmitry.olsh — 2013-01-25T12:22:46Z
(In reply to comment #4)
> If changing std.regex.regex is not possible, then an alternative solution is to
> introduce the new little function "std.regex.re", that repeats on default, that
> is like:
>
> re(someString) === regex(someString, "g")
>
> re(someString, "d") === regex(someString, "dg")

Frankly this is stupid (sorry).

Obviously the wrong turn was attaching this to the pattern: people (rightfully so) associate "find all" vs. "find first" with the operation, i.e. "match"/"replace", not with the "regex", i.e. the pattern itself.

Personally I think we'd better go with explicit overrides on "match"/"replace"/etc. and very slowly deprecate the "g" switch. What the override should look like is up for debate:

    match(someString, pattern).all   // range of all matches
    match(someString, pattern).first // only the first one
    match(someString, pattern)       // use the "g" flag to decide

Or pass the override as an optional parameter to match:

    match(someString, pattern, Regex.all);
    match(someString, pattern, Regex.first);
    match(someString, pattern); // use the flag

I'll probably open a poll to pick the better one.
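The "pass the override to match" idea can be sketched as a thin wrapper over matchFirst/matchAll, the functions that eventually shipped according to comment #10. Mode and matchMode are hypothetical names and are not part of std.regex. One design consequence appears immediately: "first" yields a single Captures while "all" yields a range of them, so in this sketch the mode has to be a compile-time argument rather than the runtime Regex.all/Regex.first value proposed above.

```d
import std.algorithm : map;
import std.array : array;
import std.regex : matchAll, matchFirst;
import std.stdio : writeln;

// Hypothetical mode selector; NOT part of std.regex.
enum Mode { first, all }

// Hypothetical wrapper: the caller, not the pattern, decides how many
// matches are wanted. Mode is a template argument because the two
// branches return different types (one Captures vs. a range of them).
auto matchMode(Mode mode)(string input, string pattern)
{
    static if (mode == Mode.first)
        return matchFirst(input, pattern);
    else
        return matchAll(input, pattern);
}

void main()
{
    string text = "abc312de";
    writeln(text.matchMode!(Mode.first)("1|2|3|4").hit);                  // 3
    writeln(text.matchMode!(Mode.all)("1|2|3|4").map!(m => m.hit).array); // ["3", "1", "2"]
}
```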
Comment #6 by dmitry.olsh — 2013-03-10T10:43:30Z
(In reply to comment #4)
> If changing std.regex.regex is not possible, then an alternative solution is to
> introduce the new little function "std.regex.re", that repeats on default, that
> is like:
>
> re(someString) === regex(someString, "g")
>
> re(someString, "d") === regex(someString, "dg")

Here is a plan based on one of my previous ideas that I think is clean enough, given the circumstances and the fact that this Perl-ism (namely, attaching the mode of operation to the pattern itself, as in /pattern/mode-suffix) is fairly popular in certain circles.

First, we specify that "g" serves only as the intended default "mode" of the pattern. Then we introduce a simple and elegant way to explicitly specify which mode of matching to use: first, all, or the default for this pattern.

Then your code looks like this (I'm still pondering better names/ways for overriding the default):

    void main()
    {
        string text = "abc312de";
        foreach (c; text.match("1|2|3|4").first)
            write(c, " ");
        writeln();
        foreach (c; text.match(regex("1|2|3|4")).all) // could use a string pattern as above
            write(c, " ");
        writeln();
    }

Then I'd try to do the same with replace. Using no override would imply "use whatever the default mode is".

How does that sound?

Then we place a nice bold warning that the "g" option is discouraged, is provided only for compatibility, and is going to be deprecated in the future.

A year later, depending on the mood of people, it finally gets deprecated and slowly fades into oblivion.

I'll probably cross-post this to the NG to collect opinions, since this is the largest pain point of the otherwise fine interface.
Comment #7 by bearophile_hugs — 2013-03-10T11:09:31Z
(In reply to comment #5)
> match(someString, pattern).all //range of all matches
> match(someString, pattern).first //only the first one
> match(someString, pattern) // using the "g" flag to decide

(In reply to comment #6)
> No overrides used would imply "use whatever the default mode is".
>
> How does it sound?
>
> Then we place nice bold warning that use of "g" option is discouraged and is
> provided only for compatibilty and is going be deprecated in future.
>
> A year later and depending on the mood of people it gets finally deprecated and
> slowly shifted towards oblivion.

Once "g" is deprecated, what is match(someString, pattern) (without all and first) doing?
Comment #8 by dmitry.olsh — 2013-03-10T11:54:55Z
(In reply to comment #7)
> (In reply to comment #5)
>
> > match(someString, pattern).all //range of all matches
> > match(someString, pattern).first //only the first one
> > match(someString, pattern) // using the "g" flag to decide
>
> (In reply to comment #6)
>
> > No overrides used would imply "use whatever the default mode is".
> >
> > How does it sound?
> >
> > Then we place nice bold warning that use of "g" option is discouraged and is
> > provided only for compatibilty and is going be deprecated in future.
> >
> > A year later and depending on the mood of people it gets finally deprecated and
> > slowly shifted towards oblivion.
>
> Once "g" is deprecated what is match(someString, pattern) (without all and
> first) doing?

Could go either way. Another possibility I just thought of:

    match(...).first - the same as the current match(...).front, i.e. a simpler interface for the case when only one match is needed
    match(...).all   - the same as the current match(...) with "g" overridden, i.e. a range

Then, once "g" is gone, we could make .all a no-op.

An alternative is to make it an opaque object that has only two methods, .first and .all.

A third alternative is to add alias this to make .first implicit. I feel it won't work reliably with range-based templates, as it would make it "two ranges in one".

So only the first two are viable. I'd go with the first, upgraded to the second once people forget about the "g" switch entirely.
Comment #9 by dmitry.olsh — 2013-03-10T12:38:45Z
(In reply to comment #8)
>
> Then once "g" is off we could either make .all a nop.
>
> Alternative is to make it opaque object that has 2 methods only .first/.all.
>
> The third alternative to add alias this to make .first implicit. I feel it
> won't work reliably with range-based templates as it would make it "2 ranges in
> one".
>
> So only the first 2 are viable. I'd go with 1st that gets upgraded to the
> second once people forget about "g" switch entierly.

Typo: I meant make it an opaque object, then some time later make .all implicit. That would still have the potential to break code, so it seems that just making .all implicit is better.
Comment #10 by dmitry.olsh — 2013-08-17T05:33:49Z
The problem should now be addressed by this pull request:
https://github.com/D-Programming-Language/phobos/pull/1470

There are now matchAll/matchFirst calls, which are the preferred way to go about matching. Currently they simply override the global flag if present.

Returning to the original example:

    foreach (c; text.matchAll("1|2|3|4")) // this iterates over the captures of each match
        write(c, " ");
    writeln();
    foreach (c; text.matchFirst("1|2|3|4")) // this iterates over the submatches of the first match
        write(c, " ");
    writeln();

To me there is little else to do aside from slooowly deprecating the old flag-based match/replace interface.
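A minimal sketch of what the two loops in comment #10 iterate over, assuming a Phobos that includes matchAll/matchFirst; allHits and firstSubmatches are hypothetical helper names. Each element produced by matchAll is a Captures for one match, while iterating the result of matchFirst walks the whole match followed by its capture groups. With the group-less pattern "1|2|3|4" that is a single element, so the difference only becomes visible once the pattern has groups.

```d
import std.algorithm : map;
import std.array : array;
import std.regex : matchAll, matchFirst;
import std.stdio : writeln;

// Hypothetical helper: whole-match text of every match in the input.
string[] allHits(string text, string pattern)
{
    return text.matchAll(pattern).map!(m => m.hit).array;
}

// Hypothetical helper: submatches of the first match only.
// Element 0 is the whole match, followed by one entry per capture group.
string[] firstSubmatches(string text, string pattern)
{
    string[] result;
    foreach (s; text.matchFirst(pattern))
        result ~= s;
    return result;
}

void main()
{
    writeln(allHits("abc312de", "1|2|3|4"));          // ["3", "1", "2"]
    writeln(firstSubmatches("abc312de", "1|2|3|4"));  // ["3"]
    writeln(firstSubmatches("abc312de", `(\d)(\d)`)); // ["31", "3", "1"]
}
```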
Comment #11 by dmitry.olsh — 2013-09-22T01:12:35Z
Flags are to be gone one day, and "g" by default is not going to happen. IMHO this makes it WONTFIX. Anyhow, the core issue should now be addressed by using the new, clearer API.