Bug 18336 – Add std.algorithm.findMatchingParen

Status
NEW
Severity
enhancement
Priority
P4
Component
phobos
Product
D
Version
D2
Platform
x86_64
OS
Linux
Creation time
2018-01-30T15:05:13Z
Last change time
2024-12-01T16:32:19Z
Assigned to
No Owner
Creator
Seb
Moved to GitHub: phobos#9740 →

Comments

Comment #0 by greensunny12 — 2018-01-30T15:05:13Z
Similar to balancedParens, but returning a range (i.e. it's a generalization of it and balancedParens should be able to use it) See also: https://github.com/dlang/dlang.org/pull/2060#discussion_r163830331 https://github.com/dlang/phobos/pull/6098 A naive, non-efficient implementation: ``` // a range until the next ')', nested () are ignored auto untilClosingParentheses(R)(R rs) { return rs.cumulativeFold!((count, r){ switch(r) { case '(': count++; break; case ')': count--; break; default: } return count; })(1).zip(rs).until!(e => e[0] == 0).map!(e => e[1]); } unittest { import std.algorithm.comparison : equal; assert("aa $(foo $(bar)foobar)".untilClosingParentheses.equal("aa $(foo $(bar)foobar)")); assert("$(FOO a, b, $(ARGS e, f)))".untilClosingParentheses.equal("$(FOO a, b, $(ARGS e, f))")); } ```
Comment #1 by jack — 2018-01-30T15:49:52Z
I'm wondering how useful something like this would be in real world parsing situations like math statements or languages. If you have nested parens you'd have to do recursive calls to untilClosingParentheses to properly break up a statement like "(5 + (10 - 2))". Maybe it's be more useful to have some function that returns the following ------ assert("(5 + (10 - 2))".func().equals( [tuple(1UL, "5 + "), tuple(2UL, "10 - 2")] )); assert("7 + (4 * (2^^(-1)))".func().equals( [tuple(0UL, "7 + "), tuple(1UL, "4 * "), tuple(2UL, "2^^"), tuple(3UL, "-1")] )); ------ But at that point I don't know if it would be a good fit for Phobos.
Comment #2 by greensunny12 — 2018-01-30T16:10:13Z
I just opened this issue because balancedParens is quite limited and could be generalized. For the archive. Here's Andrei's response: > Yah we should have an algorithm findMatchingParen that assumes r.front is the opening paren and advances the range until it's positioned on the corresponding closing paren. An optional rParen can be passed for unusual closing parens. @Jack: > I'm wondering how useful something like this would be in real world There's the example from dlang.org to parse ddoc strings. Here's another for finding everything within one HTML tag or all it's enclosing ones: "<foo foo=bar/>".findMatchingParen('<', '>').writeln; // foo foo=bar/ "<foo><bar/><foobar><bar/></foobar></foo>".findMatchingParen('<', '>', 2).writeln; // foo><bar/><foobar><bar/></foobar></foo> https://run.dlang.io/is/ah7wmd Of course, it would require a better, intuitive API to allow both use cases nicely.
Comment #3 by hsteoh — 2018-01-30T16:29:31Z
As Andrei suggested, one way to implement this would be something like this: ----- R untilMatchingParens(R)(R range, dchar rightParen=')') if (isInputRange!R) { if (range.empty) return range; // base case auto leftParen = range.front; // <-- this is how we know what the starting parens is range.popFront; int nesting = 1; foreach (ch; range) { if (ch == leftParen) nesting++; else if (ch == rightParen) nesting--; if (nesting == 0) break; } return range; } ----- Basically, the start of the range determines what the opening parens is, and the optional argument specifies what the closing parens is. Of course, the above implementation could be improved (e.g., using memchr or whatever to find the parens characters), but this is just a proof-of-concept.
Comment #4 by andrei — 2018-01-30T16:43:43Z
(In reply to hsteoh from comment #3) > As Andrei suggested, one way to implement this would be something like this: > > ----- > R untilMatchingParens(R)(R range, dchar rightParen=')') > if (isInputRange!R) > { > if (range.empty) return range; // base case > auto leftParen = range.front; // <-- this is how we know what the > starting parens is > range.popFront; > > int nesting = 1; > foreach (ch; range) > { > if (ch == leftParen) nesting++; > else if (ch == rightParen) nesting--; > if (nesting == 0) break; > } > return range; > } > ----- > > Basically, the start of the range determines what the opening parens is, and > the optional argument specifies what the closing parens is. Yes, with the amendment there is no default for the closing paren. Instead, there's an overload without the closing paren: R findMatchingParen(R)(R range) { ... } That looks at range.front and then infers the closing paren as follows: ")" for "(", "}" for "{", "[" for "]", ">" for "<", and perhaps a few Unicode fancy parens too.
Comment #5 by robert.schadek — 2024-12-01T16:32:19Z
THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/phobos/issues/9740 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB