Comment #0 by default_357-line — 2008-02-13T19:09:13Z
Consider, if you will, the following source.
import std.zip, std.file;
void main() {
auto arch = new ZipArchive;
auto test = new ArchiveMember;
test.name = "foob";
test.expandedData = [cast(ubyte) 2, 3, 4, 5];
arch.addMember(test);
auto data = arch.build;
write("test.zip", data);
auto arch2 = new ZipArchive(data);
// arch2.expand(arch2.directory["foob"]); // this fixes it
auto data2 = arch2.build;
write("test2.zip", data2); // test2.zip contains a single file, "foob", zero bytes. It's horribly empty in there.
}
It seems when Phobos builds the archive data without decompressing in-between, it doesn't consider the already existing compressed data to be of any particular relevance and just recompresses the (empty) uncompressed data.
This bug just cost me two days of work. Glee!
--feep
Comment #1 by advmail — 2013-11-16T15:13:36Z
It seems it's a bug on build() function.
build() function assumes that all ArchiveEntry as expandedData to be compressed.
So it cycle over elements and compress expandedData to compressedData.
If one of ArchiveEntry is just compressed and never expanded, expandedData will be empty, and build() tries to compress empty data.
I've fixed it in this way:
For each element I check if compressedSize > 0. If true it means that we have compressed data available, so we can copy it from buffer to field .compressedData and we can skip compression for that element (usually it read data from .expandedData, compress it and then copy to .compressedData). Moreover we have not to check for expandedSize or CRC: we just have them from header and we can't rely on expandedData field that is empty.
IMHO std.zip design it's not perfect. Every element copy compressData[] from "global" data[] (so we have memory filled x 2). Moreover it's not useful to store expandedData[] inside every element. It could be memory expensive (hey, we're trying go compress it!), and probably it's just a copy of data we store elsewhere.
I think we should compress data on the fly, and don't store internally expandedData. And that we just have to store compressedData one time.
Comment #2 by advmail — 2013-11-16T15:54:04Z
Created attachment 1290
My patch as explained in bug's comment
See previous comment
I now get a segfault for the following code.
import std.zip;
void main()
{
auto a1 = new ZipArchive();
auto m1 = new ArchiveMember();
m1.expandedData = new ubyte[](10);
a1.addMember(m1);
a1.build();
auto a2 = new ZipArchive();
a2.addMember(m1);
a2.build(); // segfaults
}
Comment #8 by advmail — 2013-12-06T00:47:18Z
This is another bug. m1 is owned by a1. m1._compressedData refers to data inside a1. So you're mixing data from two different objects/archives. I don't think it's a good idea. When you build a2, it tries to read compressed data from a2 data, but it should read from a1.
There's a simply workaround, but again, there's a problem on library design. If m1 is just compressed, we can read content it has inside _compressedData, if any. It filled it before with right reference to a1 data. This solve your bug, but probalby don't solve the mixing problem and of course it's not an elegant solution!
(In reply to comment #7)
> I now get a segfault for the following code.
>
> import std.zip;
>
> void main()
> {
> auto a1 = new ZipArchive();
> auto m1 = new ArchiveMember();
> m1.expandedData = new ubyte[](10);
> a1.addMember(m1);
> a1.build();
> auto a2 = new ZipArchive();
> a2.addMember(m1);
> a2.build(); // segfaults
> }
Comment #9 by code — 2013-12-06T01:07:58Z
(In reply to comment #8)
> This is another bug. m1 is owned by a1. m1._compressedData refers to data
> inside a1. So you're mixing data from two different objects/archives. I don't
> think it's a good idea. When you build a2, it tries to read compressed data
> from a2 data, but it should read from a1.
>
I see.
> There's a simply workaround, but again, there's a problem on library design. If
> m1 is just compressed, we can read content it has inside _compressedData, if
> any. It filled it before with right reference to a1 data. This solve your bug,
> but probalby don't solve the mixing problem and of course it's not an elegant
> solution!
>
I think m1._compressedData is simply a slice to (constant) GC memory which was allocated when reading a1. So here shouldn't be any problem to copy this data into a new zip archive.
Comment #10 by advmail — 2013-12-06T01:22:46Z
(In reply to comment #9)
> (In reply to comment #8)
> > This is another bug. m1 is owned by a1. m1._compressedData refers to data
> > inside a1. So you're mixing data from two different objects/archives. I don't
> > think it's a good idea. When you build a2, it tries to read compressed data
> > from a2 data, but it should read from a1.
> >
> I see.
>
> > There's a simply workaround, but again, there's a problem on library design. If
> > m1 is just compressed, we can read content it has inside _compressedData, if
> > any. It filled it before with right reference to a1 data. This solve your bug,
> > but probalby don't solve the mixing problem and of course it's not an elegant
> > solution!
> >
> I think m1._compressedData is simply a slice to (constant) GC memory which was
> allocated when reading a1. So here shouldn't be any problem to copy this data
> into a new zip archive.
Yes it is. But we're going to use data compressed/stored in/owned by another archive. That seems a mistake to me. I know we can do it, of course. Tonight I'll fix it then.
Comment #11 by advmail — 2013-12-06T12:54:09Z
I think it's not so easy to fix this "cross-archive" member management.
Consider for example expand() function.
It uses data[] from ZipArchive, but uses offset from ArchiveMember, that was created from another data[].
BTW it's not the same bug as the original one, I guess.
My vote goes for a full replacement of std.zip