Bug 2255 – AA.remove() doesn't remove AA element when iterating using foreach

Status
RESOLVED
Resolution
WONTFIX
Severity
enhancement
Priority
P3
Component
phobos
Product
D
Version
D2
Platform
x86
OS
Windows
Creation time
2008-07-30T16:28:00Z
Last change time
2015-06-09T01:20:00Z
Assigned to
nobody
Creator
dsimcha

Comments

Comment #0 by dsimcha — 2008-07-30T16:28:18Z
It appears that using the .remove function to remove an element from an associative array does not always work when iterating over the AA with a foreach loop. Some elements are removed and some aren't. This issue appears on both DMD 2.017 and DMD 1.033. It also occurs on GDC 0.24, indicating that it's a front end issue. Here are two test cases: //This one works. import std.stdio; void main () { uint[uint] stuff; for(uint i = 0; i < 10_000; i++) { stuff[i] = i; } writefln(stuff.length); //Should be 10_000. foreach(k; stuff.keys) { //Only this line is different between cases. stuff.remove(k); } writefln(stuff.length); //Should be 0. Is. } //This one doesn't. import std.stdio; void main () { uint[uint] stuff; for(uint i = 0; i < 10_000; i++) { stuff[i] = i; } writefln(stuff.length); //Should be 10_000. foreach(k, v; stuff) { //Only this line is different between cases. stuff.remove(k); } writefln(stuff.length); //Should be 0. Actually is about 4_000. }
Comment #1 by shro8822 — 2008-07-30T16:35:04Z
http://www.digitalmars.com/d/1.0/statement.html#ForeachStatement "The aggregate must be loop invariant, meaning that elements to the aggregate cannot be added or removed from it in the NoScopeNonEmptyStatement."
Comment #2 by dsimcha — 2008-07-30T16:42:39Z
Sorry, didn't think of that. My apologies.
Comment #3 by davidl — 2008-07-30T22:46:47Z
Couldn't the compiler jump out and bark? It can emit something like : you can't modify variable 'stuff'
Comment #4 by shro8822 — 2008-07-31T11:55:36Z
there are about 3 cases where it could pick up on that and about 2 billion that it can't. I'd rather see time spent on bug fixes than this.
Comment #5 by andrej.mitrovich — 2011-05-24T22:39:57Z
Is it agreed that removing keys while traversing hashes is something you should never do? I could add some documentation on the AA page about this, if that's what everyone agrees to.
Comment #6 by issues.dlang — 2011-05-24T22:53:28Z
As I understand it, it is unsafe to remove members of an AA while iterating over it - and that includes unsafe in the memory sense. Perhaps something needs to be done so that it doesn't risk memory issues, but I wouldn't expect it to ever become something that would be expected to work (just not cause problems with memory). So, adding it to the docs would probably be a good idea.
Comment #7 by bearophile_hugs — 2011-05-25T03:20:31Z
This is similar Python2.6 code: stuff = {} for i in xrange(10000): stuff[i] = i print len(stuff) # Should be 10_000 for k in stuff: # Only this line is different between cases del stuff[k] print len(stuff) # Should be 0 If you try to run it Python prints: 10000 Traceback (most recent call last): File "...\test.py", line 5, in <module> for k in stuff: # Only this line is different between cases RuntimeError: dictionary changed size during iteration The underlying C implementation of this is simple: Python initializes a boolean flag when you scan a dict/set with a for. And the del statement tests that flag every time it is called. Such flag and test were added to Python because this is a common mistake for new Python programmers, and Python tries hard (successfully) to prevent the most common bugs. That Python error message is useful not just to avoid a bug, but also to teach new programmers to be aware of this problem more in general, even in other languages that don't give this error message. In D it's probably possible to set a flag of the associative array every time the associative array iteration functions are used. But testing this flag every time AA.remove()/AA.clear() get called is a bit costly for a system language. A compromise is to set and test this flag only when you compile your code in non-release mode, using a runtime assert. I think this may be acceptable and it helps avoid some bugs (but to do this the associative array code needs to be recompiled every time, it can't be in a statically compiled library).
Comment #8 by hsteoh — 2014-09-19T21:54:17Z
Modifying a container while iterating over it will always have counterintuitive semantics (not to mention potential memory-unsafety), because it violates the very idea of iteration from beginning to end. The only case where something like this will even remotely work, is if (1) you only ever remove the current element being iterated over; (2) the container is explicitly implemented to support this operation. Consider a simple linked list, root -> A -> B -> C -> D. Intuitively speaking, the iteration order should be A, B, C, D. Now suppose you're iterating over the list, and you're on item A. Suppose you delete item A. Intuitively speaking, the next iteration of the loop should continue with B. However, that isn't what happens, because when A was removed from the list, the .next pointer is nulled, so your overall iteration is actually only A (instead of A, B, C, D). You may attempt to fix this by caching the .next pointer before executing the loop body. However, this still exhibits counterintuitive behaviour: suppose while you're iterating over A, you decide to delete B. Intuitively speaking, the next iteration should be on C. However, since .next was cached, the next iteration is actually on B, and since B was removed from the list and B.next has been nulled, your overall iteration sequence is actually A, B (instead of the expected A, C, D). You may attempt to fix this by caching the entire iteration sequence in an array before running the loop body on each item, but even that still doesn't save you: if while you're iterating over A, you decide to delete C and D, then your iteration sequence is still A, B, C, D (as opposed to the expected sequence A, B), because C and D were cached in the array. Now, this is just with linked lists alone, which, relatively speaking, is rather tame. When you're dealing with AA's, a whole can o' worms is opened when you modify the AA in the middle of iteration. For example, consider what happens if a new key is inserted into the AA while you're iterating over it. Should the new key be included in the current iteration over the AA, or not? Well, regardless of what it *should* be, what actually happens will depend on how it's implemented. If the new key hashes to a position past your current iteration point, then it will be included in your current walk over the AA; otherwise it won't. And by definition of hash functions, this is unpredictable. Even worse yet, consider what happens if during iteration over an AA, you add a new key, and that triggers a rehash. Now what happens will depend on the implementation details of the AA. If you're lucky, you'll end up with the current walk over the AA encountering items that were previously encountered, and perhaps skipping over other items that should've been encountered eventually. If you're unlucky, your loop will crash because the iteration pointer got invalidated by the rehash. Same thing goes for deleting keys from AA's while iterating over them. Basically, there is no way to get intuitive behaviour out of iterating over a container that's changing mid-iteration, except for the one case described in the beginning -- which requires the container to explicitly support such an operation. The above analysis applies to virtually all containers, not just AA's, so I submit that this bug is unfixable.
Comment #9 by yebblies — 2014-11-04T08:08:13Z
It's actually fairly easy to check for the most common cases (item removed or inserted while iterating) by saving the length at the start of iteration, and checking it hasn't changed before visiting each item. The check needs to be before visiting, because it's safe to mutate the AA if the loop is exited right after. eg diff --git a/src/rt/aaA.d b/src/rt/aaA.d index cf4f139..5991393 100644 --- a/src/rt/aaA.d +++ b/src/rt/aaA.d @@ -545,10 +545,12 @@ int _aaApply(AA aa, in size_t keysize, dg_t dg) immutable alignsize = aligntsize(keysize); //printf("_aaApply(aa = x%llx, keysize = %d, dg = x%llx)\n", aa.impl, keysi + auto startLength = aa.impl.nodes; foreach (e; aa.impl.buckets[aa.impl.firstUsedBucketCache .. $]) { while (e) { + assert(aa.impl.nodes == startLength, "AA was modified during iteration"); auto result = dg(cast(void *)(e + 1) + alignsize); if (result) return result; The big problem is - this check should only be on in debug mode, but nobody ever uses a debug build of druntime, so it would be useless. I don't even know how to make debug build of druntime on win32.