← Back to index | Original Bugzilla link

Bug 9842 – std.algorithm.hashGroup

Status: NEW
Severity: enhancement
Priority: P4
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2013-03-30T14:32:38Z
Last change time: 2024-12-01T16:17:12Z
Assigned to: No Owner
Creator: bearophile_hugs

Comments

Comment #0 by bearophile_hugs — 2013-03-30T14:32:38Z

A little Strems/LINQ challenge, "Grouping by a Criterium": Group the elements of a collection of strings by their length. A C# LINQ solution: string[] names = {"Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"}; var groups = names.GroupBy(c => c.Length); Using the groupBy written by Andrei (https://github.com/D-Programming-Language/phobos/pull/1186 ) a D solution is: auto names = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"]; auto groups = names .schwartzSort!q{ a.length } .groupBy!q{ a.length == b.length }; But group/groupBy work by sorting. Hash-based O(n) hashGroup/hashGroupBy are also conceivable, potentially faster, and leading to simpler code, because you don't need to sort the items first: auto names = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"]; auto groups = names.hashGroupBy!q{ a.length == b.length }; (Alternative names for hashGroup/hashGroupBy are possible.)

Comment #1 by hsteoh — 2014-11-05T05:49:57Z

Isn't this just the same as data.hashSort.group? For hash sort to work, the key must map to an integral value and must be bounded by known min/max values.

Comment #2 by bearophile_hugs — 2014-11-05T10:01:42Z

(In reply to hsteoh from comment #1) > Isn't this just the same as data.hashSort.group? For hash sort to work, the > key must map to an integral value and must be bounded by known min/max > values. The output and working is different. The hashGroupBy doesn't return a range of pairs as the function "group". If you write: auto names = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"]; auto grps = names.hashGroupBy!(n => n.length); Now grps is: assert(grps == [3:["Sam"], 4:["Samu", "Ravi"], 5:["Ratna"], 6:["Samuel", "Barsha"]]); The keys can be anything, not just integers in an interval.

Comment #3 by bearophile_hugs — 2014-12-17T09:54:27Z

Another example usage. Here the "properDivs" function returns the proper divisors of the given number n. The function "classify" returns one of the three classes (Deficient number, Perfect number, Abundant number). Then a hashGroup is handy to create an associative array to count how many numbers there are of each class in the [1,10000] interval: void main() { import std.stdio, std.algorithm, std.range; static immutable properDivs = (in uint n) pure nothrow @safe => iota(1, (n + 1) / 2 + 1).filter!(x => n % x == 0 && n != x); enum Class { deficient, perfect, abundant } static Class classify(in uint n) pure nothrow @safe { immutable p = properDivs(n).sum; with (Class) return (p < n) ? deficient : ((p == n) ? perfect : abundant); } iota(1, 10000).map!classify.hashGroup.writeln; }

Comment #4 by bearophile_hugs — 2015-01-06T16:13:14Z

Now I think a "hashGroup" suffices (hashGroup can also be used without template argument, it uses an x=>x identity function on default). In Phobos we have std.array.assocArray that allows to build a built-in associative array from a range of key-value Phobos Tuples (it will become mostly legacy when we will have a good associative array in Phobos and built-in tuples in D, that is the opposite of the data structures we have today). But there is another very common pattern of creation of associative arrays, where you want to "accumulate" inside the values. The basic form of accumulation is counting: //A const a = [1, 2, 1, 3, 5, 1, 2]; uint[int] aa1; foreach (x; a) aa1[x]++; aa1.writeln; // [1:3, 5:1, 2:2, 3:1] Another example is just emumeration in an array: //B const b = [-1, 2, 1, 3, 5, -1, -2]; int[][int] aa2; foreach (x; b) aa2[x.abs] ~= x; aa2.writeln; // [1:[-1, 1, -1], 5:[5], 2:[2, -2], 3:[3]] Or even in a set (here implemented with another associative array of int-bools): //C bool[int][int] aa3; foreach (x; b) aa3[x.abs][x] = true; aa3.writeln; // [1:[1:true, -1:true], 5:[5:true], 2:[2:true, -2:true], 3:[3:true]] There are ways to perform those operations with Phobos today, but the code is so bad looking that it's better to skip this. So I suggest to add a std.array.hashGroup function to Phobos that as argument takes a range, and as template argument takes one or two functions. The first function is a mapping key unary function. The second function is optional and takes as argument the sub-range of grouped values. The patterns above become: //A a.hashGroup!(id, count).writeln; //B a.hashGroup!abs.writeln; //C a.hashGroup!(abs, items => items.zip(true.repeat).assocArray)).writeln; Where "id" is the identity function (not present in Phobos): //A a.hashGroup!(x => x, count).writeln; Once Phobos has a Set data structure with a constructor helper function that accepts a range as argument you can also write just: //C a.hashGroup!(abs, set).writeln;

Comment #5 by robert.schadek — 2024-12-01T16:17:12Z

THIS ISSUE HAS BEEN MOVED TO GITHUB https://github.com/dlang/phobos/issues/9966 DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB