Bug 8967 – dirEntries throws when encountering a "long path" on windows

Status
RESOLVED
Resolution
INVALID
Severity
normal
Priority
P2
Component
phobos
Product
D
Version
D2
Platform
All
OS
Windows
Creation time
2012-11-06T09:56:10Z
Last change time
2022-09-17T16:55:21Z
Assigned to
No Owner
Creator
Regan Heath

Comments

Comment #0 by regan — 2012-11-06T09:56:10Z
Exception thrown:

Bypasses std.file.FileException@std\file.d(2434)
=== Bypassed ===
std.file.FileException@std\file.d(2434): E:\basic\2012-10-16_0\abcdefghi1abcdefghi2abcdefghi3abcdefghi4abcdefghi5abcdefghi6abcdefghi7abcdefghi8abcdefghi9abcdefghi0\abcdefghi1abcdefghi2abcdefghi3abcdefghi4abcdefghi5abcdefghi6abcdefghi7abcdefghi8abcdefghi9abcdefghi0\abcdefghi1abcdefghi2abcdefghi3abcdefghi4abcdefghi5abcdefghi6abcdefghi7abcdefghi8abcdefghi9abcdefghi0: The system cannot find the path specified.

The solution is simple. FindFirstFileW will handle "long paths" in the following format:

    \\?\<path>

where <path> may be C:\folder\... or \\host\share\...

So, the simplest fix is to alter DirIteratorImpl.stepIn, line:

    string search_pattern = buildPath(directory, "*.*");

to read:

    string search_pattern = r"\\?\" ~ buildPath(directory, "*.*");

(I have tested this fix and it works for me.)

Alternatively (and better), have buildPath (on Windows) detect paths longer than 256 characters and automatically prepend \\?\ itself. This should work as long as the W functions are used, but it will cause issues with some A functions and other Windows functions, because long paths are not supported throughout. In those cases, though, an over-long path without the prefix would fail anyway, just in a different way.
Comment #1 by regan — 2012-11-06T10:01:39Z
I should add that the call to dirEntries used SpanMode.depth, and was passed a short path which contained the long path shown in the exception.
Comment #2 by dlang-bugzilla — 2012-12-21T21:24:30Z
I'm not sure whether the standard library should be adding the \\?\ prefix internally. If we are to change dirEntries, then we should also change all I/O routines that deal with paths. And even then, the resulting paths may not be usable by other components (used directly in the user's program), such as OS or C functions, and external programs. To work around this problem, the user could also use a function such as the one here [1], and pass path strings through it at the point where they enter the program's I/O logic layer. [1]: https://github.com/CyberShadow/RABCDAsm/blob/master/common.d#L25
Comment #3 by bugzilla — 2014-03-17T19:49:46Z
Comment #4 by dlang-bugzilla — 2014-03-17T20:29:07Z
It's quite simple, the path simply must be an absolute one with all forward slashes replaced with backslashes (so pretty standard normalization). Which "notes and exceptions and caveats" are you referring to, in particular?
Comment #5 by bugzilla — 2014-03-17T21:09:36Z
(In reply to comment #4)
> It's quite simple, the path simply must be an absolute one with all forward
> slashes replaced with backslashes (so pretty standard normalization).

That's a big one.

> Which "notes and exceptions and caveats" are you referring to, in particular?

All of them; I quote:

1. The "\\?\" prefix can also be used with paths constructed according to the universal naming convention (UNC). To specify such a path using UNC, use the "\\?\UNC\" prefix. For example, "\\?\UNC\server\share", where "server" is the name of the computer and "share" is the name of the shared folder. These prefixes are not used as part of the path itself. They indicate that the path should be passed to the system with minimal modification, which means that you cannot use forward slashes to represent path separators, or a period to represent the current directory, or double dots to represent the parent directory.

2. Because you cannot use the "\\?\" prefix with a relative path, relative paths are always limited to a total of MAX_PATH characters.

3. For file I/O, the "\\?\" prefix to a path string tells the Windows APIs to disable all string parsing and to send the string that follows it straight to the file system.

4. Because it turns off automatic expansion of the path string, the "\\?\" prefix also allows the use of ".." and "." in the path names, which can be useful if you are attempting to perform operations on a file with these otherwise reserved relative path specifiers as part of the fully qualified path.

5. Many but not all file I/O APIs support "\\?\"; you should look at the reference topic for each API to be sure.

6. The "\\.\" prefix will access the Win32 device namespace instead of the Win32 file namespace.

7. If you're working with Windows API functions, you should use the "\\.\" prefix to access devices only and not files.

8. This was accomplished by adding the symlink named "GLOBALROOT" to the Win32 namespace, which you can see in the "Global??" subdirectory of the WinObj browser tool previously discussed, and can access via the path "\\?\GLOBALROOT". This prefix ensures that the path following it looks in the true root path of the system object manager and not a session-dependent path.

So, no, I don't think this is so simple.
Comment #6 by dlang-bugzilla — 2014-03-17T21:18:30Z
(In reply to comment #5)
> 1. The "\\?\" prefix can also be used with paths constructed according to the
> universal naming convention (UNC). To specify such a path using UNC, use the
> "\\?\UNC\" prefix. For example, "\\?\UNC\server\share", where "server" is the
> name of the computer and "share" is the name of the shared folder. These
> prefixes are not used as part of the path itself. They indicate that the path
> should be passed to the system with minimal modification, which means that you
> cannot use forward slashes to represent path separators, or a period to
> represent the current directory, or double dots to represent the parent
> directory.

OK, so special case if the path starts with \\ but not \\?\.

> 2. Because you cannot use the "\\?\" prefix with a relative path, relative
> paths are always limited to a total of MAX_PATH characters.

Absolute path as mentioned earlier.

> 3. For file I/O, the "\\?\" prefix to a path string tells the Windows APIs to
> disable all string parsing and to send the string that follows it straight to
> the file system.

Does not apply on its own.

> 4. Because it turns off automatic expansion of the path string, the "\\?\"
> prefix also allows the use of ".." and "." in the path names, which can be
> useful if you are attempting to perform operations on a file with these
> otherwise reserved relative path specifiers as part of the fully qualified
> path.

Path normalization as mentioned earlier.

> 5. Many but not all file I/O APIs support "\\?\"; you should look at the
> reference topic for each API to be sure.

I don't see this as a concern. D unit tests will reveal any Windows APIs that don't support this syntax.

> 6. The "\\.\" prefix will access the Win32 device namespace instead of the
> Win32 file namespace.

Does not apply. Win32 devices are akin to Posix /dev/ and are rarely accessed directly.

> 7. If you're working with Windows API functions, you should use the "\\.\"
> prefix to access devices only and not files.

Same as above, does not apply.

> 8. This was accomplished by adding the symlink named "GLOBALROOT" to the Win32
> namespace, which you can see in the "Global??" subdirectory of the WinObj
> browser tool previously discussed, and can access via the path
> "\\?\GLOBALROOT". This prefix ensures that the path following it looks in the
> true root path of the system object manager and not a session-dependent path.

Same as above, does not apply.
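The prefixing rules discussed in this exchange can be sketched as a small string helper. This is only an illustration under stated assumptions, not Phobos code: the name `withLongPathPrefix` is hypothetical, and the input is assumed to be already absolute and free of "." and ".." segments, since the prefix disables that kind of parsing.

```d
import std.algorithm.searching : startsWith;
import std.array : replace;

/// Hypothetical helper (not a Phobos function).
/// Assumes `path` is already absolute and contains no "." or ".." segments.
string withLongPathPrefix(string path)
{
    // Prefixed paths reach the file system with minimal modification,
    // so forward slashes have to be converted beforehand.
    string p = path.replace("/", `\`);

    // Already in the extended-length or device namespace: leave untouched.
    if (p.startsWith(`\\?\`) || p.startsWith(`\\.\`))
        return p;

    // UNC path: \\server\share\... becomes \\?\UNC\server\share\...
    if (p.startsWith(`\\`))
        return `\\?\UNC\` ~ p[2 .. $];

    // Ordinary drive-letter path.
    return `\\?\` ~ p;
}
```

A caller would still need to make the path absolute first, for example with std.path.absolutePath, because the prefix cannot be combined with a relative path.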
Comment #7 by jayn — 2014-03-17T21:41:57Z
As has been stated in prior comments, it appears the intended solution is to prepend "\\?\" to the path string. It appears from the documentation that prepending should be done if the path is longer than MAX_PATH-12 for directory names or MAX_PATH-1 for file paths. Maybe to simplify this, just use MAX_PATH-12.

In addition, it appears from the documentation that the two cases where you would not want to prepend are if the path already begins with "\\?\" or "\\.\".

The code appears pretty consistently to use toUTF16z(name) in places where path strings are being converted to Windows paths for calling the Unicode versions of the functions. I looked through the 26 references to toUTF16z, and the only one I couldn't confirm was a call to _wfopen. So, I think perhaps these toUTF16z calls could be changed to call a modified version that checks the path length and optionally prepends.

We are using toUTF16z in the libraries when calling CreateFileW, SetCurrentDirectoryW, CopyFileW, FindFirstFileW, CreateDirectoryW, GetFileAttributesW, DeleteFileW, MoveFileExW, RemoveDirectoryW, SetFileAttributesW, GetEnvironmentVariableW, SetEnvironmentVariableW.

Searching for LPCWSTR brings up a number of Windows API definitions that are unused in the libraries, but that would be candidates for use of toUTF16z.

tempDir() uses MAX_PATH, where the larger 32K limit should be used for the buffer. The same goes for thisExePath(), and its call to GetModuleFileNameW should probably use the toUTF16z.

WIN32_FIND_DATAW should probably have the 32K buffer to be consistent, but perhaps the operations that use it don't support the longer paths.
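A rough sketch of the "modified toUTF16z" idea might look as follows. The names `needsLongPathPrefix` and `toLongPathUTF16z` and the two constants are hypothetical and not part of Phobos; 260 is the value of MAX_PATH in the Windows headers. The length is measured in UTF-8 code units, which is never smaller than the UTF-16 length, so the check errs on the side of prefixing early. UNC paths and relative paths are deliberately not handled here.

```d
import std.algorithm.searching : startsWith;
import std.utf : toUTF16z;

enum maxPath = 260;                  // value of MAX_PATH in the Windows headers
enum prefixThreshold = maxPath - 12; // the simplified limit suggested above

/// Hypothetical check: true if the prefix should be added.
/// Assumes `name` is an absolute, backslash-separated, non-UNC path.
bool needsLongPathPrefix(const(char)[] name)
{
    return name.length > prefixThreshold
        && !name.startsWith(`\\?\`)
        && !name.startsWith(`\\.\`);
}

/// Hypothetical drop-in for toUTF16z at call sites that pass paths
/// to the W functions.
const(wchar)* toLongPathUTF16z(string name)
{
    return (needsLongPathPrefix(name) ? `\\?\` ~ name : name).toUTF16z;
}
```

Short paths pass through unchanged, so existing callers would only see different behaviour once a path crosses the threshold, which is exactly the semantic shift debated in the comments that follow.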
Comment #8 by jayn — 2014-03-18T09:50:23Z
> 2. Because you cannot use the "\\?\" prefix with a relative path, relative
> paths are always limited to a total of MAX_PATH characters.

So, yes, relative paths don't work when you use that prefix (I tried). But this did work for me, where e is a DirEntry:

    nm = r"\\?\" ~ absolutePath(e.name);

There is also the issue of read-only status needing to be cleared if files or directories are to be removed. Our remove() and rmdir() don't take care of this for you, so if you are removing items with long paths, the above expansion needs to be done before these getAttributes and setAttributes calls can succeed:

    uint att = getAttributes(fn);
    att &= ~FILE_ATTRIBUTE_READONLY;
    setAttributes(fn, att);
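As a minimal sketch of the attribute step, the read-only bit can be cleared with a mask, which leaves it off regardless of its starting state (an XOR would instead toggle it, marking a writable file read-only). The names `fileAttributeReadOnly`, `withoutReadOnly` and `clearReadOnly` are hypothetical; 0x1 is the value of FILE_ATTRIBUTE_READONLY in the Windows headers.

```d
/// Value of FILE_ATTRIBUTE_READONLY in the Windows headers.
enum uint fileAttributeReadOnly = 0x1;

/// Hypothetical helper: returns `att` with the read-only bit cleared.
uint withoutReadOnly(uint att)
{
    return att & ~fileAttributeReadOnly;
}

version (Windows)
{
    import std.file : getAttributes, setAttributes;

    /// Hypothetical wrapper. `fn` is assumed to already carry the \\?\
    /// prefix if it is a long path.
    void clearReadOnly(string fn)
    {
        setAttributes(fn, withoutReadOnly(getAttributes(fn)));
    }
}
```

Calling `clearReadOnly` on the prefixed name before remove() or rmdir() follows the order described above.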
Comment #9 by bugzilla — 2014-03-18T12:46:04Z
I don't agree with the "does not apply" comments. The \\?\ has different semantics, and having those semantics suddenly shift when the path gets long will be a surprising change to a user. My take is that if the user wants \\?\, they should prepend it themselves as a deliberate action, rather than hiding it in a conventional API. After all, the Windows functions themselves don't automatically add it, either. If it was straightforward they would have done it.
Comment #10 by dlang-bugzilla — 2014-03-18T16:14:47Z
Just to clarify, there are two prefixes: "\\?\" and "\\.\". The latter is used to access the Win32 device namespace, which is a fairly under-the-hood thing. The only way I see how this applies to our problem is to not prepend "\\?\" if the path already has a "\\.\" prefix.
Comment #11 by jayn — 2014-03-18T18:01:59Z
More surprising is attempting to remove a long directory path and having an exception occur. The libraries are already copying the user's string and adding the 0 termination prior to calling the Windows API, so it seems to me to be a reasonable place to make other modifications if they are needed to accomplish the intended operation.
Comment #12 by StefanLiebig — 2021-08-18T20:44:09Z
I am currently struggling with this limitation, and I wonder how, for example, the JVM deals with it. As you probably know, this limitation does not exist when using Java IO, although the JVM uses the Win32 API. A little research (Google) leads to https://stackoverflow.com/questions/10094365/how-does-java-circumvent-the-windows-max-path-winapi-limitation

It seems that the JVM internally prefixes path names with "\\?\" before calling the Win32 API.
Comment #13 by dfj1esp02 — 2022-09-14T15:28:31Z
Another option is long path awareness in Windows 10 1607: https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
Comment #14 by Ajieskola — 2022-09-17T16:55:21Z
Resolved as invalid, as requested in this forum post: https://forum.dlang.org/thread/[email protected]

Quoting the rationale here:

> The fix was to use an application manifest file and declare the application as being long path aware (see [another forum post](https://forum.dlang.org/thread/[email protected])). The issue should be closed IMO as dirEntries should throw an exception when your program encounters a long path while not being declared as long path aware (it would then match the behavior of std::filesystem::recursive_directory_iterator when using Visual C++, which throws a runtime_error when it encounters a long path unless your program is declared to be long path aware in its manifest).