Bug 17094 – std.container.binaryheap doesn't manage store length consistently when inserting
Status
RESOLVED
Resolution
INVALID
Severity
normal
Priority
P3
Component
phobos
Product
D
Version
D2
Platform
All
OS
All
Creation time
2017-01-16T05:21:56Z
Last change time
2018-01-30T07:43:27Z
Assigned to
No Owner
Creator
Jon Degenhardt
Comments
Comment #0 by jrdemail2000-dlang — 2017-01-16T05:21:56Z
When inserting into a BinaryHeap, the underlying data "store" length may or may not be updated, depending the type of container used and whether there is already capacity.
This leaves the data store in a state where cannot be used. This appears to be a bug, described used cases indicate the store can be accessed, implying the length should be correct.
A unit test example from the documentation:
import std.algorithm.comparison : equal;
int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
auto h = heapify(a);
assert(h.front == 16);
assert(equal(a, [ 16, 14, 10, 8, 7, 9, 3, 2, 4, 1 ]));
The last line is accessing the data store. This program has the above, then follows with the same notion, but done by inserting elements into an existing heap instead:
void main(string[] args)
{
import std.container.binaryheap;
import std.algorithm.comparison : equal;
/* Current unit test. */
int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
auto h = heapify(a);
assert(h.front == 16);
assert(equal(a, [ 16, 14, 10, 8, 7, 9, 3, 2, 4, 1 ]));
/* Same, but using insert. Fails on last line. */
int[] vals = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
int[] b;
auto h2 = heapify(b);
foreach (v; vals) h2.insert(v);
assert(h2.front == 16);
assert(equal(b, [ 16, 14, 10, 8, 7, 9, 3, 2, 4, 1 ])); // Fails
}
Documentation for binaryheap implies throughout that structure is being applied to the underlying data store, and that this can be accessed.
A specific use case where this is problematic is when extracting the elements to reorder the underlying data store. From the documentation:
Extracting elements from the BinaryHeap to completion leaves the
underlying store sorted in ascending order but, again, yields
elements in descending order.
This is useful when identifying a top-k set of elements, then want to access in ascending order, which is the order of the data store in the description above. However, because the data store length is not correctly set, it cannot be accessed normally.
The below program has some diagnostics. It inserts in heaps built with both dynamics arrays and std.container.array.Array. It also shows the difference when capacity has and has not been preallocated.
When run, it shows that the underlying data store gets updated when an std.container.array.Array is used and preallocated with reserve(), not otherwise the lengths are not updated.
------bug_binaryheap.h-----
void main(string[] args)
{
import std.container.binaryheap;
import std.container.array;
import std.stdio;
int[] vals = [3, 8, 9, 2, 6, 4, 5, 1, 7];
int[] storeX;
int[] storeXRes;
Array!int storeY;
Array!int storeYRes;
storeXRes.reserve(vals.length);
storeYRes.reserve(vals.length);
writeln(" X XRes Y YRes");
writeln("Before heapify");
writefln(" Store capacity: %2d %2d %2d %2d",
storeX.capacity, storeXRes.capacity, storeY.capacity, storeYRes.capacity);
auto heapX = heapify(storeX, 0);
auto heapXRes = heapify(storeXRes, 0);
auto heapY = heapify(storeY, 0);
auto heapYRes = heapify(storeYRes, 0);
writeln("After heapify");
writefln(" Heap lengths: %2d %2d %2d %2d",
heapX.length, heapXRes.length, heapY.length, heapYRes.length);
writefln(" Store lengths: %2d %2d %2d %2d",
storeX.length, storeXRes.length, storeY.length, storeYRes.length);
writefln(" Heap capacity: %2d %2d %2d %2d",
heapX.capacity, heapXRes.capacity, heapY.capacity, heapYRes.capacity);
writefln(" Store capacity: %2d %2d %2d %2d",
storeX.capacity, storeXRes.capacity, storeY.capacity, storeYRes.capacity);
foreach (v; vals)
{
heapX.insert(v);
heapXRes.insert(v);
heapY.insert(v);
heapYRes.insert(v);
}
writeln("After inserts");
writefln(" Heap lengths %2d %2d %2d %2d",
heapX.length, heapXRes.length, heapY.length, heapYRes.length);
writefln(" Store lengths: %2d %2d %2d %2d",
storeX.length, storeXRes.length, storeY.length, storeYRes.length);
writefln(" Heap capacity: %2d %2d %2d %2d",
heapX.capacity, heapXRes.capacity, heapY.capacity, heapYRes.capacity);
writefln(" Store capacity: %2d %2d %2d %2d",
storeX.capacity, storeXRes.capacity, storeY.capacity, storeYRes.capacity);
auto storeX2 = heapX.release;
auto storeX2Res = heapXRes.release;
auto storeY2 = heapY.release;
auto storeY2Res = heapYRes.release;
writeln("After release:");
writefln(" Heap lengths %2d %2d %2d %2d",
heapX.length, heapXRes.length, heapY.length, heapYRes.length);
writefln(" Store lengths: %2d %2d %2d %2d",
storeX.length, storeXRes.length, storeY.length, storeYRes.length);
writefln(" Returned lengths: %2d %2d %2d %2d",
storeX2.length, storeX2Res.length, storeY2.length, storeY2Res.length);
}
---------------------------
Note: X is built in array, Y is std.container.array.Array. XRes and YRes have been pre-allocated with with reserve().
$ dmd bug_binaryheap.d
$ ./bug_binaryheap
X XRes Y YRes
Before heapify
Store capacity: 0 15 0 9
After heapify
Heap lengths: 0 0 0 0
Store lengths: 0 0 0 0
Heap capacity: 0 15 0 9
Store capacity: 0 15 0 9
After inserts
Heap lengths 9 9 9 9
Store lengths: 0 0 0 9
Heap capacity: 15 15 11 9
Store capacity: 0 0 0 9
After release:
Heap lengths 0 0 0 0
Store lengths: 0 0 0 9
Returned lengths: 9 9 9 9
$ dmd --version
DMD64 D Compiler v2.073.0-b1
Copyright (c) 1999-2016 by Digital Mars written by Walter Bright
Comment #1 by jrdemail2000-dlang — 2018-01-29T21:54:44Z
Bump - Since it has been a year and no comments. Not suggesting raising the bug priority, mostly want to be sure it had been seen. Eventually I'll likely take a shot at this myself, but haven't had time yet.
Essentially, the 'length' property of the data store behind a binary heap is left with an incorrect value in some circumstances. It's problematic if you run into it. It applies to a common use of a binary heap: identifying a top-k set of elements, followed by accessing the top-k elements in the top-k order.
Looking at the code further - I suspect it's not all that hard to fix, but that it'll take some familiarity with the different types of possible data stores (random access ranges) to get it correct.
Comment #2 by simen.kjaras — 2018-01-29T22:37:38Z
Sorry to disappoint you, but this is not actually a bug.
According to the documentation for heapify: "initialized with s" (s being the array).
And from BinaryHeap: "If Store is a container that supports insertBack, the BinaryHeap may grow by adding elements to the container."
That is, the passed array is used simply to create the initial heap, and the heap will keep using it until it needs to reallocate. Since you gave it an empty array, that's going to happen immediately.
Basically, what you're doing is equivalent to this code:
int[] a = [];
int[] b = a;
a ~= 2;
assert(b == [2]); // Will fail since b == [].
Supposing the heap kept a reference to the exact slice that was used to initialize it, this would cause problems:
auto foo() {
int[] a = [1,2,3,4];
return heapify(a);
}
Once the scope is exited, the reference to a would no longer be valid (but the array holding the elements would).
Another part of the explanation is that slices are weird. They behave a bit like ranges and a bit like containers, and in this case that duality is tripping you up.
Comment #3 by jrdemail2000-dlang — 2018-01-29T22:58:44Z
(In reply to Simen Kjaeraas from comment #2)
> Sorry to disappoint you, but this is not actually a bug.
>
> According to the documentation for heapify: "initialized with s" (s being
> the array).
> And from BinaryHeap: "If Store is a container that supports insertBack, the
> BinaryHeap may grow by adding elements to the container."
>
> That is, the passed array is used simply to create the initial heap, and the
> heap will keep using it until it needs to reallocate. Since you gave it an
> empty array, that's going to happen immediately.
>
> Basically, what you're doing is equivalent to this code:
>
> int[] a = [];
> int[] b = a;
> a ~= 2;
> assert(b == [2]); // Will fail since b == [].
>
> Supposing the heap kept a reference to the exact slice that was used to
> initialize it, this would cause problems:
>
> auto foo() {
> int[] a = [1,2,3,4];
> return heapify(a);
> }
>
> Once the scope is exited, the reference to a would no longer be valid (but
> the array holding the elements would).
>
> Another part of the explanation is that slices are weird. They behave a bit
> like ranges and a bit like containers, and in this case that duality is
> tripping you up.
Reserve is used to preallocate the necessary space. It is not necessary to reallocate.
Comment #4 by jrdemail2000-dlang — 2018-01-29T23:33:42Z
A bit of a follow-up to the last pair of comments: The discussion of whether size was reserved, the implied notions of reference vs value semantics, etc., is in my view an incorrect focus. The key issue involved is this paragraph in the documentation:
Simply extracting elements from a $(D BinaryHeap) container is
tantamount to lazily fetching elements of $(D Store) in descending
order. Extracting elements from the $(D BinaryHeap) to completion
leaves the underlying store sorted in ascending order but, again,
yields elements in descending order.
The second sentence is the key issue - A key use case of a binary heap is to take a top-n set of items, then generate the top-n order by extracting as described. This is what fails, and why this is a bug. The documentation indicates binary heap supports this, as it should. If there are cases where it does not support it, it should be clear about this. But really, this use case should be supported.
Comment #5 by simen.kjaras — 2018-01-30T07:43:27Z
(In reply to Jon Degenhardt from comment #3)
> Reserve is used to preallocate the necessary space. It is not necessary to
> reallocate.
Try this:
int[] a;
int[] b = a;
assert(a.ptr == b.ptr);
a.reserve(10);
assert(a.ptr != b.ptr);
Then tell me that reserve() doesn't reallocate. For good measure, here's a case where a isn't initialized to null:
int[] a = new int[1];
int[] b = a;
assert(a.ptr == b.ptr);
a.reserve(256);
assert(a.ptr != b.ptr);
(In reply to Jon Degenhardt from comment #4)
> A bit of a follow-up to the last pair of comments: The discussion of whether
> size was reserved, the implied notions of reference vs value semantics,
> etc., is in my view an incorrect focus. The key issue involved is this
> paragraph in the documentation:
>
> Simply extracting elements from a $(D BinaryHeap) container is
> tantamount to lazily fetching elements of $(D Store) in descending
> order. Extracting elements from the $(D BinaryHeap) to completion
> leaves the underlying store sorted in ascending order but, again,
> yields elements in descending order.
The slice from which you initialize the heap is not its store. This is why it's not a bug.
The source of confusion, as mentioned in comment #2, is that slices aren't really containers, but the language sometimes treats them as if they are. Slices in D are basically this structure:
struct Slice(T) {
T* ptr;
size_t length;
ref T opIndex(size_t index) {
assert(index < length);
return ptr[index];
}
Slice opSlice(size_t start, size_t end) {
assert(start <= end);
assert(end < length);
return Slice(ptr + start, end - start);
}
}
When you write 'int[] a;', you allocate the above structure on the stack. Once it goes out of scope, there's no way to update its ptr or length. Since it's not even passed by ref to heapify, heapify has no idea where the original slice lives, and is utterly incapable of updating it even if it wanted to. So yeah, the binaryHeap can't, shouldn't, and doesn't update the slice it's initialized from, and if you somehow make it do that, only bad things will come of it. Nasal demons will be the least of your problems.
> The second sentence is the key issue - A key use case of a binary heap is to
> take a top-n set of items, then generate the top-n order by extracting as
> described. This is what fails, and why this is a bug. The documentation
> indicates binary heap supports this, as it should.
It does support that, but not when it's initialized from a slice, since a slice is not a container.
If you want a store that's updated to reflect the contents of the heap, you need to use a store with reference semantics, like std.container.array:
import std.algorithm.comparison;
import std.container.array;
import std.container.binaryHeap;
import std.stdio;
auto a = Array!int(new int[0]);
auto h = a.heapify;
h.insert(1);
h.insert(2);
h.insert(3);
h.insert(4);
h.insert(5);
assert(equal(a[], [5,4,2,1,3]));
> If there are cases where
> it does not support it, it should be clear about this. But really, this use
> case should be supported.
Now here we might have found some common ground. All the examples for binaryHeap use slices. As pointed out ad nauseam above, this is a Bad Idea™ if you want to look at what the binaryHeap does to that slice when you insert into it. However, there's nothing inherently wrong in initializing a binaryHeap from a slice.
The documentation does not adequately convey this information, and should be amended to more clearly point this out. Filed this as bug 18333.