Bug 3848 – functions in std.file don't take symbolic links into account

Status
RESOLVED
Resolution
FIXED
Severity
normal
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
Other
OS
Linux
Creation time
2010-02-24T00:24:00Z
Last change time
2015-06-09T01:27:39Z
Keywords
patch, wrong-code
Assigned to
nobody
Creator
issues.dlang

Attachments

IDFilenameSummaryContent-TypeSize
572symlink.patchChanges for std.file.d to make it work with symlinks.text/plain5274
573file.dNext attempt at fixing problem.text/plain53097
574symlink_fix_file_d.patchPatch file for next attempt at fixing problem (instead of the full file).text/plain33274
577file_new.dNow fixes DirIterator and listdir() as well.text/x-dsrc54261
578std_file_symlinks_fix.patchPatch version of "Now fixes DirIterator and listdir() as well."text/plain35210
579file_new.dCurrent fix with DirIterator and listdir() fixed.text/x-dsrc54009
580std_file_symlinks_fix.patchPatch for "Current fix with DirIterator and listdir() fixed."text/plain34952
670file.dChanges based off of dmd 2.047text/x-dsrc56263
671std_file_symlinks_fix_047.patchPatch with changes based off of dmd 2.047 without needing dos2unix firsttext/plain35096
672file.dChanges based off of dmd 2.047text/x-dsrc56263
705file.dChanges based off of phobos revision 1817text/x-dsrc55375
706file_d_symlink_patch.txtPatch with changes based off of phobos revision 1817text/plain24524
769file_d_symlink_patch.txtPatch with changes based off of phobos revision 2046text/plain24196
770file.dChanges based off of phobos revision 2046text/x-dsrc55541
795file.dChanges based off of dmd 2.050.text/x-dsrc56893
796file_d_symlink_patch.txtPatch with changes based off of dmd 2.050.text/plain24930

Comments

Comment #0 by issues.dlang — 2010-02-24T00:24:31Z
Created attachment 572 Changes for std.file.d to make it work with symlinks. At present, if you call isfile() or isdir() on a symbolic link, it reports on what the link points to rather than the link itself. Presumably this is fine on Windows since as far as I know, Windows doesn't have symbolic links. But it is a big problem for Linux. The other problem is rmdirRecursive(). At present, I believe that it currently would follow symbolic links to directories rather than just deleting the symlink itself. Ideally, it would not follow symlinks. I have included a patch which I believe fixes the problems. Whether it's the best solution to the problem, I don't know. I added getLinkAttributes() which uses lstat instead of stat and make isdir() and isfile() use getLinkAttributes() rather than getAttributes(). I also added issymlink() to go with isdir() and isfile(). DirEntry was changed to use lstat as well. That should fix, isfile(), isdir(), and rmdirRecursive(). Most of the other functions like lastModified() continue to use fstat, so they continue to give information on the file which is pointed to rather than the symlink, which I think is what we want. The one wart is that having DirEntry use lstat means that all of the attributes that it gives are for the symlink itself which is not always what we want. So, I added a followLink() function which returns a DirEntry with the values for stat if it's a symbolic link or just returns a copy of the DirEntry if it's not. followLink() can almost certainly been done in a cleaner manner. I'd have used init, but I don't know how to get dirent for a single file rather than a whole directory. So, it uses stat and then checks the mode to figure out which value to set d_type for. DT_WHT is currently not used (it's commented out) because I have no clue what it represents, so I don't know what mode value from stat it corresponds to. In any case, if my patch isn't quite what you'd want to use, hopefully it's at least a step in the right direction. Hopefully the patch is in an appropriate format. It has changes for std.file.d
Comment #1 by andrei — 2010-02-24T05:11:32Z
Thanks, I'll look into it. Just like you, I'm unclear whether isdir() vs. isfile() should automatically dereference the link. I think we should keep the current behavior and add an islink() test that tells whether the thing is actually a link. I think removal should remove the actual link.
Comment #2 by issues.dlang — 2010-02-24T08:46:30Z
The main problem with having isdir() and isfile() continuing to use stat, and having the programmer just call issymlink() to check whether it's a link is that that's a call to stat _and_ a call to lstat, and from what I understand stat and lstat are a bit expensive (presumably because they have to talk to disk). Of course, if there's a high chance that a programmer will have to check whether a symlink points to a file or a directory, then they'll end up calling both isdir()/isfile() and issymlink() and so will have a call to both stat and lstat anyway, so it may be better to keep isdir() and isfile() using stat. Really, it depends on the most likely use case. rmdirRecursive() is definitely more efficient using just lstat though. Personally, I wish that there were a function which returned information for the file which is pointed to while telling you whether it's a symlink or not, but unfortunately, from what I've found, there is no such beast. From an efficiency perspective, it would be nice if there were a function which returned a DirEntry for a single file so that you can query that as to whether a file is a file, dir, or symlink without calling stat or lstat every time you call isfile(), isdir(), or issymlink(), but I couldn't figure out how to do that, so I don't know if it can be reasonably done (since presumably it would be silly and expensive to get the contents of the directory holding the file just to get the DirEntry for the file itself).
Comment #3 by issues.dlang — 2010-02-24T23:31:35Z
Actually, after playing around with this a bit more, I think that I'm leaning towards it being more useful for isdir() and isfile() to use stat. It can result in less efficient code, but it's a lot easier to deal with the symlinks that way. If isdir() and isfile() use lstat, you have to call getAttributes() on the symlink to find out what it is, and that means anding the return value with the appropriate flags, and that's not terribly user-friendly. In fact, it would actually be nice if getAttributes() returned an enum or struct rather a uint. I believe that it's quite OS-dependent as it is, and you have to look at file.d to have any clue what the return value represents. Ideally, you wouldn't have to look a function's source code to figure out how to deal with its return value. But the whole issue of getAttributes() is more or less a separate (albeit connected) issue. In any case, I think that from a user-friendliness perspective, you're right in your assessment that it would be better to have isdir() and isfile() use stat rather than lstat. It does, however, mean that you have to be that much more careful with isdir() or we risk recursive issues or removing stuff in the directory that it points to rather than just the link in functions like rmdirRecursive().
Comment #4 by issues.dlang — 2010-02-26T00:32:57Z
Created attachment 573 Next attempt at fixing problem. Okay. I tried to fix this more cleanly this time. The result is that DirEntry has changed a fair bit, but now isdir() and isfile() use stat rather than lstat, and you can now get a DirEntry for a specific file without getting all the DirEntries for a directory. I don't know whether it will be to your liking, and I have not tested the Windows code at all, but from the testing I did of the linux code, it appears to work (though it could always use more unittests). By making DirEntry as lazy/efficient as possible and making it possible to get one for any file, it should fix the efficiency issue of calling both stat and lstat as much as is possible (though it would still be nice if there were a posix function which did stat but also told you whether it was a symlink). There are also versions of isdir(), isfile(), and issymlink() which take an attributes (uint) value to make them the attributes values easier to use and facilitate rmdirRecursive() only calling lstat. I didn't add any for block devices and such, but perhaps they would be good to add as well. DirEntry is using properties now with its member variables being private, allowing for guarantees as to the state of its members. I don't know whether you'd consider that a problem or not, but I think that it's cleaner. I have attached the updated std.file.d.
Comment #5 by issues.dlang — 2010-02-26T00:35:59Z
Created attachment 574 Patch file for next attempt at fixing problem (instead of the full file). Here's a patch if you'd prefer that to the fully changed file. You'd need to apply dos2unix to the original file.d first for it work. Hopefully this at least comes close to solving the problem for you and saves you some work. Thanks.
Comment #6 by issues.dlang — 2010-02-28T02:43:05Z
I would point out that my changes appear to fix bug 1635, so fixing this bug could result in that one getting fixed at the same time. I also noticed that I should have named the direntry() function dirEntry(). I was not paying enough attention to the capitalization for dirEntries() and thought that it was direntries(), so my attempt to make their capitalization match failed.
Comment #7 by dfj1esp02 — 2010-03-01T07:04:11Z
(In reply to comment #1) > Thanks, I'll look into it. Just like you, I'm unclear whether isdir() vs. > isfile() should automatically dereference the link. I think we should keep the > current behavior and add an islink() test that tells whether the thing is > actually a link. Actually windows gone the same way: if you want to examine symlink, you should state it explicitly. (In reply to comment #2) > The main problem with having isdir() and isfile() continuing to use stat, and > having the programmer just call issymlink() to check whether it's a link is > that that's a call to stat _and_ a call to lstat, and from what I understand > stat and lstat are a bit expensive (presumably because they have to talk to > disk). They don't talk to disk, they talk to disk cache.
Comment #8 by issues.dlang — 2010-03-02T23:39:06Z
Created attachment 577 Now fixes DirIterator and listdir() as well. I don't know anything about the whole reading from disk reading from disk cache thing, but it's still I/O of some variety and, as I understand it, is not a cheap operation, so it's desirable to avoid extraneous stat and lstat calls regardless. Also, I figured out that DirIterator and listdir() did not deal with symlinks correctly. In particular, they always followed them, which could be very bad in some situations. So, I'm attaching another version which fixes that problem. It also makes DirEntry on Linux a bit better. It occurred to me that we might as well just save the result from stat rather than copying some of its values to member variables to be used for the properties. That way, we can make the resulting struct available. It makes the DirEntry struct a bit larger, but it seems useful enough to me that it should be worth it. However, I didn't add any more of stat's properties (like inode or block size) to DirEntry since those are likely used rarely enough that the programmer can just access the stat struct in such cases. That and it allows DirEntry's API to be mostly the same between platforms. As far as I've determined, this fix works on Linux (certainly, all of the unit tests pass, and all of the other testing that I've done looks good), but I haven't tested the Windows code at all. Most of the changes are to the Posix side though, so odds are that the Windows side is okay.
Comment #9 by issues.dlang — 2010-03-02T23:40:53Z
Created attachment 578 Patch version of "Now fixes DirIterator and listdir() as well." Here's the patch version of the most recent fix. Again, you'll need to run dos2unix on the original std.file.d before applying it.
Comment #10 by issues.dlang — 2010-03-02T23:51:13Z
Created attachment 579 Current fix with DirIterator and listdir() fixed. My apologies. I forgot to change rmdirRecurse() back to use SpanMode.depth after fixing DirIterator. Here's the version with it back at SpanMode.depth.
Comment #11 by issues.dlang — 2010-03-02T23:52:20Z
Created attachment 580 Patch for "Current fix with DirIterator and listdir() fixed." Patch version.
Comment #12 by issues.dlang — 2010-06-21T22:25:35Z
Created attachment 670 Changes based off of dmd 2.047 File after changes using version from dmd 2.047 as the base without running dos2uinx on it this time.
Comment #13 by issues.dlang — 2010-06-21T22:27:13Z
Created attachment 671 Patch with changes based off of dmd 2.047 without needing dos2unix first Here's the patch version based off of dmd 2.047 but without needing to run dos2unix on the original first.
Comment #14 by issues.dlang — 2010-06-21T22:31:22Z
Created attachment 672 Changes based off of dmd 2.047 Okay. It's mildly dumb to attach this again, but apparently I obsoleted the wrong file, so I'm reattaching the changed file based on dmd 2.047 in order to obsolete the correct file (I think that I obsoleted this file when attaching the patch).
Comment #15 by issues.dlang — 2010-08-07T23:19:37Z
Created attachment 705 Changes based off of phobos revision 1817 Various changes have been made to std.file since the release of dmd 2.047, so it seems prudent to update my patch. Here's the fully updated file, merging my changes and the changes since dmd 2.047 as of phobos revision 1817.
Comment #16 by issues.dlang — 2010-08-07T23:22:38Z
Created attachment 706 Patch with changes based off of phobos revision 1817 Here's the patch version of the changes. By the way, is there any particular reason that these changes have not been applied to Phobos beyond no one taking the time to look at the issue? That is, is there anything definitively wrong with my patch that I would need to fix? Or is it just that the Phobos devs are busy with other things and haven't looked at my patch?
Comment #17 by clugdbug — 2010-08-08T13:12:55Z
(In reply to comment #16) > Created an attachment (id=706) [details] > Patch with changes based off of phobos revision 1817 > > Here's the patch version of the changes. > > By the way, is there any particular reason that these changes have not been > applied to Phobos beyond no one taking the time to look at the issue? That is, > is there anything definitively wrong with my patch that I would need to fix? Or > is it just that the Phobos devs are busy with other things and haven't looked > at my patch? The latter. There's a huge backlog of quality patches -- for the compiler as well. Unfortunately a couple of issues have been grabbing all the attention.
Comment #18 by issues.dlang — 2010-08-08T16:05:16Z
I figured as much, but I didn't want to keep trying to keep my patch up-to-date if there were definitely a problem with it that made it unacceptable. So, I'll just keep making sure it's at least reasonably up-to-date until someone gets around to it. Thanks.
Comment #19 by issues.dlang — 2010-09-23T20:12:14Z
Created attachment 769 Patch with changes based off of phobos revision 2046 Okay, I've updated the patch for dmd 2.049 (the diff is off of phobos svn r2046, but there aren't any changes between 2.049 and r2046). I also added a test for bug #4929, which this patch fixes without any new changes.
Comment #20 by issues.dlang — 2010-09-23T20:13:21Z
Created attachment 770 Changes based off of phobos revision 2046 Here's the full file with my changes.
Comment #21 by issues.dlang — 2010-10-29T19:15:08Z
Created attachment 795 Changes based off of dmd 2.050. Here's the file updated for dmd 2.050. I also added a deprecated alias from fullname to name (which I should have done in the beginning). Other than that, it's essentially the same as before except that it's been updated to reflect the changes in file.d in the official release.
Comment #22 by issues.dlang — 2010-10-29T19:17:04Z
Created attachment 796 Patch with changes based off of dmd 2.050. Here's the updated patch.
Comment #23 by issues.dlang — 2011-01-19T03:15:17Z
Fixed in revision 2351.