← Back to index | Original Bugzilla link

Bug 18336 – Add std.algorithm.findMatchingParen

Status: NEW
Severity: enhancement
Priority: P4
Component: phobos
Product: D
Version: D2
Platform: x86_64
OS: Linux
Creation time: 2018-01-30T15:05:13Z
Last change time: 2024-12-01T16:32:19Z
Assigned to: No Owner
Creator: Seb

Comments

Comment #0 by greensunny12 — 2018-01-30T15:05:13Z

Similar to balancedParens, but returning a range (i.e. it's a generalization of it and balancedParens should be able to use it) See also: https://github.com/dlang/dlang.org/pull/2060#discussion_r163830331 https://github.com/dlang/phobos/pull/6098 A naive, non-efficient implementation: ``` // a range until the next ')', nested () are ignored auto untilClosingParentheses(R)(R rs) { return rs.cumulativeFold!((count, r){ switch(r) { case '(': count++; break; case ')': count--; break; default: } return count; })(1).zip(rs).until!(e => e[0] == 0).map!(e => e[1]); } unittest { import std.algorithm.comparison : equal; assert("aa $(foo $(bar)foobar)".untilClosingParentheses.equal("aa $(foo $(bar)foobar)")); assert("$(FOO a, b, $(ARGS e, f)))".untilClosingParentheses.equal("$(FOO a, b, $(ARGS e, f))")); } ```

Comment #1 by jack — 2018-01-30T15:49:52Z

I'm wondering how useful something like this would be in real world parsing situations like math statements or languages. If you have nested parens you'd have to do recursive calls to untilClosingParentheses to properly break up a statement like "(5 + (10 - 2))". Maybe it's be more useful to have some function that returns the following ------ assert("(5 + (10 - 2))".func().equals( [tuple(1UL, "5 + "), tuple(2UL, "10 - 2")] )); assert("7 + (4 * (2^^(-1)))".func().equals( [tuple(0UL, "7 + "), tuple(1UL, "4 * "), tuple(2UL, "2^^"), tuple(3UL, "-1")] )); ------ But at that point I don't know if it would be a good fit for Phobos.

Comment #2 by greensunny12 — 2018-01-30T16:10:13Z

I just opened this issue because balancedParens is quite limited and could be generalized. For the archive. Here's Andrei's response: > Yah we should have an algorithm findMatchingParen that assumes r.front is the opening paren and advances the range until it's positioned on the corresponding closing paren. An optional rParen can be passed for unusual closing parens. @Jack: > I'm wondering how useful something like this would be in real world There's the example from dlang.org to parse ddoc strings. Here's another for finding everything within one HTML tag or all it's enclosing ones: "<foo foo=bar/>".findMatchingParen('<', '>').writeln; // foo foo=bar/ "<foo><bar/><foobar><bar/></foobar></foo>".findMatchingParen('<', '>', 2).writeln; // foo><bar/><foobar><bar/></foobar></foo> https://run.dlang.io/is/ah7wmd Of course, it would require a better, intuitive API to allow both use cases nicely.

Comment #3 by hsteoh — 2018-01-30T16:29:31Z

As Andrei suggested, one way to implement this would be something like this: ----- R untilMatchingParens(R)(R range, dchar rightParen=')') if (isInputRange!R) { if (range.empty) return range; // base case auto leftParen = range.front; // <-- this is how we know what the starting parens is range.popFront; int nesting = 1; foreach (ch; range) { if (ch == leftParen) nesting++; else if (ch == rightParen) nesting--; if (nesting == 0) break; } return range; } ----- Basically, the start of the range determines what the opening parens is, and the optional argument specifies what the closing parens is. Of course, the above implementation could be improved (e.g., using memchr or whatever to find the parens characters), but this is just a proof-of-concept.

Comment #4 by andrei — 2018-01-30T16:43:43Z

(In reply to hsteoh from comment #3) > As Andrei suggested, one way to implement this would be something like this: > > ----- > R untilMatchingParens(R)(R range, dchar rightParen=')') > if (isInputRange!R) > { > if (range.empty) return range; // base case > auto leftParen = range.front; // <-- this is how we know what the > starting parens is > range.popFront; > > int nesting = 1; > foreach (ch; range) > { > if (ch == leftParen) nesting++; > else if (ch == rightParen) nesting--; > if (nesting == 0) break; > } > return range; > } > ----- > > Basically, the start of the range determines what the opening parens is, and > the optional argument specifies what the closing parens is. Yes, with the amendment there is no default for the closing paren. Instead, there's an overload without the closing paren: R findMatchingParen(R)(R range) { ... } That looks at range.front and then infers the closing paren as follows: ")" for "(", "}" for "{", "[" for "]", ">" for "<", and perhaps a few Unicode fancy parens too.

Comment #5 by robert.schadek — 2024-12-01T16:32:19Z

THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/phobos/issues/9740 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB