Bug 10876 – foreach and remove over associative array causes invalid data to appear

Status
RESOLVED
Resolution
DUPLICATE
Severity
major
Priority
P2
Component
dmd
Product
D
Version
D2
Platform
x86_64
OS
Linux
Creation time
2013-08-23T09:35:00Z
Last change time
2014-02-21T10:06:44Z
Assigned to
nobody
Creator
maximechevalierb

Comments

Comment #0 by maximechevalierb — 2013-08-23T09:35:40Z
This issue shows up using DMD64 D Compiler v2.063 on Linux. I have a foreach loop in which I remove values from an associative array. The behavior is completely broken. Items which are not in the associative array (including the null value!) end up appearing in the iteration. The following code snippet from my compiler produces the issue: writeln("associative array length: ", allocState.length); writeln("keys length: ", allocState.keys.length); // Remove dead values from the alloc state foreach (value, allocSt; allocState) { writeln("key: ", value); writeln((value in allocState)? true:false); if (value is null) writeln(allocState[null]); if (liveInfo.liveAtEntry(value, block) is false) allocState.remove(value); } The output is as follows: associative array length: 4 keys length: 4 key: $22 = arg 6 "v" true key: $20 = arg 4 "o" true key: $8 = add_i32 $1, $21 true key: $21 = arg 5 "i" true key: $7 = phi [call_reg(5662):0 => $14, call_cont(565C):0 => $12] false key: $12 = load_u32 $20, $11 false key: $12 = load_u32 $20, $11 false key: $1 = add_i32 32, $0 false key: $1 = add_i32 32, $0 false key: $0 = mul_i32 8, $7 false key: $0 = mul_i32 8, $7 false key: null false [email protected](1410): Range violation As you can see, most of the keys listed are not in the associative array, and the null value appears as a key, out of nowhere, which causes a crash.
Comment #1 by andrej.mitrovich — 2013-08-23T09:36:54Z
This will be marked as invalid, you can't safely iterate over an associative array and remove its keys while iterating.
Comment #2 by hsteoh — 2013-08-23T14:11:55Z
Modifying a container that you're iterating over has undefined results, because you're iterating over a set of keys that changes from under you as you go. In the case of AAs, you may end up dereferencing invalid pointers. The safe way to do it is to get an array of keys and iterate over that: foreach (k; aa.keys) { auto value = aa[k]; ... if (...) aa.remove(k); } N.B. you can only use .keys, not .byKey, because the latter also amounts to walking the data structure while it is being modified, which will cause problems. Using .keys is OK because it creates an array of keys (a snapshot of the set of keys at that point in time) before starting to modify the data structure.
Comment #3 by andrej.mitrovich — 2013-08-23T14:16:26Z
(In reply to comment #2) > N.B. you can only use .keys, not .byKey, because the latter also amounts to > walking the data structure while it is being modified, which will cause > problems. Using .keys is OK because it creates an array of keys (a snapshot of > the set of keys at that point in time) before starting to modify the data > structure. Yeah, as I learned in Issue 10821. Perhaps one could do an "!is null" check against the key if iterating via .byKey and removing at the same time? Well anyway, the safest bet is to use .keys.
Comment #4 by hsteoh — 2013-08-23T14:24:07Z
The problem with trying to iterate over a container that's being modified at the same time is that you can't guarantee much of anything. For example, you could insert !is null checks, but what happens if removing the key causes the AA to rehash itself (e.g., to optimize for space as the number of keys shrinks)? Then your subsequent iteration will be completely screwed up. Basically, such hacks can only work by relying on implementation details, which are not guaranteed by the spec. IOW, it's undefined behaviour.
Comment #5 by hsteoh — 2013-08-23T14:31:39Z
Some relevant references: http://stackoverflow.com/questions/3346696/python-why-is-it-not-safe-to-modify-sequence-being-iterated-on http://stackoverflow.com/questions/13981886/simultaneously-iterating-over-and-modifying-an-unordered-set The bottomline is that modifying the container while you're iterating over it is a bad idea (unless the container was specifically designed to support it). It almost always doesn't do what you think it should do.
Comment #6 by andrej.mitrovich — 2013-08-23T14:34:18Z
(In reply to comment #5) > The bottomline is that modifying the container while you're iterating over it > is a bad idea (unless the container was specifically designed to support it). > It almost always doesn't do what you think it should do. Btw, we should put this info up on dlang.org if it's not already there.
Comment #7 by dlang-bugzilla — 2013-12-17T15:56:25Z
Duplicate of issue 4179?
Comment #8 by hsteoh — 2013-12-17T16:58:47Z
*** This issue has been marked as a duplicate of issue 4179 ***