← Back to index | Original Bugzilla link

Bug 1832 – reading/writing an archive causes data loss; std.zip horribly broken

Status: RESOLVED
Resolution: FIXED
Severity: regression
Priority: P2
Component: phobos
Product: D
Version: D2
Platform: All
OS: All
Creation time: 2008-02-13T19:09:00Z
Last change time: 2014-02-16T12:07:26Z
Keywords: bounty
Assigned to: nobody
Creator: default_357-line

Attachments

ID	Filename	Summary	Content-Type	Size
1290	patch.txt	My patch as explained in bug's comment	text/plain	2840

Comments

Comment #0 by default_357-line — 2008-02-13T19:09:13Z

Consider, if you will, the following source. import std.zip, std.file; void main() { auto arch = new ZipArchive; auto test = new ArchiveMember; test.name = "foob"; test.expandedData = [cast(ubyte) 2, 3, 4, 5]; arch.addMember(test); auto data = arch.build; write("test.zip", data); auto arch2 = new ZipArchive(data); // arch2.expand(arch2.directory["foob"]); // this fixes it auto data2 = arch2.build; write("test2.zip", data2); // test2.zip contains a single file, "foob", zero bytes. It's horribly empty in there. } It seems when Phobos builds the archive data without decompressing in-between, it doesn't consider the already existing compressed data to be of any particular relevance and just recompresses the (empty) uncompressed data. This bug just cost me two days of work. Glee! --feep

Comment #1 by advmail — 2013-11-16T15:13:36Z

It seems it's a bug on build() function. build() function assumes that all ArchiveEntry as expandedData to be compressed. So it cycle over elements and compress expandedData to compressedData. If one of ArchiveEntry is just compressed and never expanded, expandedData will be empty, and build() tries to compress empty data. I've fixed it in this way: For each element I check if compressedSize > 0. If true it means that we have compressed data available, so we can copy it from buffer to field .compressedData and we can skip compression for that element (usually it read data from .expandedData, compress it and then copy to .compressedData). Moreover we have not to check for expandedSize or CRC: we just have them from header and we can't rely on expandedData field that is empty. IMHO std.zip design it's not perfect. Every element copy compressData[] from "global" data[] (so we have memory filled x 2). Moreover it's not useful to store expandedData[] inside every element. It could be memory expensive (hey, we're trying go compress it!), and probably it's just a copy of data we store elsewhere. I think we should compress data on the fly, and don't store internally expandedData. And that we just have to store compressedData one time.

Comment #2 by advmail — 2013-11-16T15:54:04Z

Created attachment 1290 My patch as explained in bug's comment See previous comment

Comment #3 by andrei — 2013-11-16T16:58:10Z

Please redo your patch as a pull request at https://github.com/D-Programming-Language/. Thanks!

Comment #4 by advmail — 2013-11-17T00:53:12Z

(In reply to comment #3) > Please redo your patch as a pull request at > https://github.com/D-Programming-Language/. Thanks! https://github.com/D-Programming-Language/phobos/pull/1697

Comment #5 by github-bugzilla — 2013-11-21T20:04:42Z

Commits pushed to master at https://github.com/D-Programming-Language/phobos https://github.com/D-Programming-Language/phobos/commit/95af4e3c8c78020109af75c98ed5478c1b813ba7 Fixed phobos bug 1832 https://github.com/D-Programming-Language/phobos/commit/289c10521ae8f4c36cf00cbf71dac8ae9265809e Merge pull request #1697 from trikko/master Issue 1832 - reading/writing an archive causes data loss; std.zip horribly broken

Comment #6 by bugzilla — 2013-11-24T15:52:09Z

Resolved for D2, not for D1.

Comment #7 by code — 2013-12-05T20:46:34Z

I now get a segfault for the following code. import std.zip; void main() { auto a1 = new ZipArchive(); auto m1 = new ArchiveMember(); m1.expandedData = new ubyte[](10); a1.addMember(m1); a1.build(); auto a2 = new ZipArchive(); a2.addMember(m1); a2.build(); // segfaults }

Comment #8 by advmail — 2013-12-06T00:47:18Z

This is another bug. m1 is owned by a1. m1._compressedData refers to data inside a1. So you're mixing data from two different objects/archives. I don't think it's a good idea. When you build a2, it tries to read compressed data from a2 data, but it should read from a1. There's a simply workaround, but again, there's a problem on library design. If m1 is just compressed, we can read content it has inside _compressedData, if any. It filled it before with right reference to a1 data. This solve your bug, but probalby don't solve the mixing problem and of course it's not an elegant solution! (In reply to comment #7) > I now get a segfault for the following code. > > import std.zip; > > void main() > { > auto a1 = new ZipArchive(); > auto m1 = new ArchiveMember(); > m1.expandedData = new ubyte[](10); > a1.addMember(m1); > a1.build(); > auto a2 = new ZipArchive(); > a2.addMember(m1); > a2.build(); // segfaults > }

Comment #9 by code — 2013-12-06T01:07:58Z

(In reply to comment #8) > This is another bug. m1 is owned by a1. m1._compressedData refers to data > inside a1. So you're mixing data from two different objects/archives. I don't > think it's a good idea. When you build a2, it tries to read compressed data > from a2 data, but it should read from a1. > I see. > There's a simply workaround, but again, there's a problem on library design. If > m1 is just compressed, we can read content it has inside _compressedData, if > any. It filled it before with right reference to a1 data. This solve your bug, > but probalby don't solve the mixing problem and of course it's not an elegant > solution! > I think m1._compressedData is simply a slice to (constant) GC memory which was allocated when reading a1. So here shouldn't be any problem to copy this data into a new zip archive.

Comment #10 by advmail — 2013-12-06T01:22:46Z

(In reply to comment #9) > (In reply to comment #8) > > This is another bug. m1 is owned by a1. m1._compressedData refers to data > > inside a1. So you're mixing data from two different objects/archives. I don't > > think it's a good idea. When you build a2, it tries to read compressed data > > from a2 data, but it should read from a1. > > > I see. > > > There's a simply workaround, but again, there's a problem on library design. If > > m1 is just compressed, we can read content it has inside _compressedData, if > > any. It filled it before with right reference to a1 data. This solve your bug, > > but probalby don't solve the mixing problem and of course it's not an elegant > > solution! > > > I think m1._compressedData is simply a slice to (constant) GC memory which was > allocated when reading a1. So here shouldn't be any problem to copy this data > into a new zip archive. Yes it is. But we're going to use data compressed/stored in/owned by another archive. That seems a mistake to me. I know we can do it, of course. Tonight I'll fix it then.

Comment #11 by advmail — 2013-12-06T12:54:09Z

I think it's not so easy to fix this "cross-archive" member management. Consider for example expand() function. It uses data[] from ZipArchive, but uses offset from ArchiveMember, that was created from another data[]. BTW it's not the same bug as the original one, I guess. My vote goes for a full replacement of std.zip

Comment #12 by code — 2014-01-12T12:11:55Z

(In reply to comment #11) > I think it's not so easy to fix this "cross-archive" member management. The regression was fixed with https://github.com/D-Programming-Language/phobos/pull/1775.

Comment #13 by mk — 2014-01-12T12:59:44Z

Was this really fixed ? I still get zero length data (code from comment #0) mk@bid:~/work/bugs$ unzip -lv ziptest1.zip Archive: ziptest1.zip Length Method Size Cmpr Date Time CRC-32 Name -------- ------ ------- ---- ---------- ----- -------- ---- 4 Stored 4 0% 1980-00-00 00:00 9d0d9845 ziptest.bin -------- ------- --- ------- 4 4 0% 1 file mk@bid:~/work/bugs$ unzip -lv ziptest2.zip Archive: ziptest2.zip Length Method Size Cmpr Date Time CRC-32 Name -------- ------ ------- ---- ---------- ----- -------- ---- 4 Stored 0 100% 1980-00-00 00:00 00000000 ziptest.bin -------- ------- --- ------- 4 0 100% 1 file

Comment #14 by andrej.mitrovich — 2014-02-16T05:56:28Z

New pull: https://github.com/D-Programming-Language/phobos/pull/1870

Comment #15 by github-bugzilla — 2014-02-16T12:02:03Z

Commits pushed to master at https://github.com/D-Programming-Language/phobos https://github.com/D-Programming-Language/phobos/commit/1a3f9ab273dd6280cb9b4eda4a0da50d8c89fa88 fix Issue 1832 - rebuilding ZipArchive looses data - Already need to set compressedData slice when reading the directory index not only during expand. https://github.com/D-Programming-Language/phobos/commit/4bdd5f2c6ada5bc0ed37214f0e9d017f50ca8a83 Merge pull request #1870 from MartinNowak/fix1832 fix Issue 1832 - rebuilding ZipArchive looses data

Comment #16 by andrej.mitrovich — 2014-02-16T12:07:26Z

Marking as D2 only since Phobos D1 bugs will not be fixed.