Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os.path.abspath() & posixpath.realpath() don't work for relative paths on the other platform #118345

Closed
nineteendo opened this issue Apr 27, 2024 · 19 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@nineteendo
Copy link
Contributor

nineteendo commented Apr 27, 2024

Bug report

Bug description:

Windows

>>> import posixpath
>>> posixpath.abspath('foo')
'C:\\Users\\wanne/foo'
>>> posixpath.realpath('foo')
'C:\\Users\\wanne/foo'
>>> posixpath.isabs('C:\\Users\\wanne/foo')
False

Expected an absolute path: /foo (or /Users/wanne/foo, which is more work). For example (when #117855 lands):

try:
    from posix import _path_abspath
except ImportError: # not running on Unix - mock up something sensible
    def abspath(path):
        """Return an absolute path."""
        path = os.fspath(path)
        sep = _get_sep(s)
        if not path.startswith(sep):
            path = sep + path
        return normpath(path)
else:  # use native Unix method on Unix
    def abspath(path):
        """Return an absolute path."""
        path = os.fspath(path)
        if isinstance(path, bytes):
            return os.fsencode(_path_abspath(os.fsdecode(path)))
        return _path_abspath(path)

Unix

>>> import ntpath
>>> ntpath.abspath('foo')
'\\Users\\wanne\\foo'
>>> ntpath.isabs('\\Users\\wanne\\foo')
False
>>> ntpath.realpath('foo', strict=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: realpath() got an unexpected keyword argument 'strict'

Expected an absolute path: C:\\foo (or C:\\Users\\wanne\\foo, which is more work for drive relative paths). For example:

try:
    from nt import _getfullpathname
except ImportError: # not running on Windows - mock up something sensible
    def abspath(path):
        """Return the absolute version of a path."""
        path = os.fspath(path)
        if not isabs(path):
            if isinstance(path, bytes):
                path = join(b'C:\\', path)
            else:
                path = join('C:\\', path)
        return normpath(path)
else:  # use native Windows method on Windows
    def abspath(path):
        """Return the absolute version of a path."""
        try:
            return _getfullpathname(normpath(path))
        except (OSError, ValueError):
            path = os.fspath(path)
            if not isabs(path):
                if isinstance(path, bytes):
                    path = join(os.getcwdb(), path)
                else:
                    path = join(os.getcwd(), path)
            return normpath(path)

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS, Windows

Linked PRs

@nineteendo nineteendo added the type-bug An unexpected behavior, bug, or error label Apr 27, 2024
@nineteendo
Copy link
Contributor Author

If you agree I'll make a pull request. At least for posixpath, the change doesn't lead to more code.

@eryksun
Copy link
Contributor

eryksun commented Apr 27, 2024

For comparison, note that pathlib.PurePath instances don't provide absolute() because it's not meaningful for cross-platform use. For posixpath and ntpath, I think assuming the root directory is fine, and drive "C:" for ntpath.

Here's an implementation of ntpath.abspath() with correct behavior for drive-relative paths:

try:
    from nt import _getfullpathname
except ImportError:
    # Non-Windows platforms
    def abspath(path):
        """Return the absolute version of a path."""
        path = os.fspath(path)
        if not isabs(path):
            if isinstance(path, bytes):
                drive = splitroot(path)[0] or b'C:'
                path = join(drive, b'\\', path)
            else:
                drive = splitroot(path)[0] or 'C:'
                path = join(drive, '\\', path)
        return normpath(path)
else:
    def abspath(path):
        """Return the absolute version of a path."""
        try:
            return _getfullpathname(normpath(path))
        except (OSError, ValueError):
            path = os.fspath(path)
            if not isabs(path):
                if isinstance(path, bytes):
                    cwd = os.getcwdb()
                else:
                    cwd = os.getcwd()
                cwd_drive = splitroot(cwd)[0]
                path_drive = splitroot(path)[0]
                if path_drive and path_drive != cwd_drive:
                    try:
                        cwd = _getfullpathname(path_drive)
                    except (OSError, ValueError):
                        pass
                path = join(cwd, path)
            return normpath(path)

@nineteendo
Copy link
Contributor Author

I improved your snippet: now it's a bit faster and works correctly when it fails to determine the current working directory.

@nineteendo
Copy link
Contributor Author

posixpath.realpath() also doesn't return an absolute path for relative paths on Windows:

>>> import posixpath
>>> posixpath.realpath('foo')
'C:\\Users\\wanne/foo'
>>> posixpath.isabs('C:\\Users\\wanne/foo')
False

@nineteendo nineteendo changed the title ntpath.abspath() & posixpath.abspath() don't work for relative paths on the other platform os.path.abspath() & posixpath.realpath() don't work for relative paths on the other platform Apr 27, 2024
@eryksun
Copy link
Contributor

eryksun commented Apr 27, 2024

I improved your snippet: now it's a bit faster and works correctly when it fails to determine the current working directory.

Great. If _getfullpathname(drive) fails, barring some catastrophic API failure, it's a ValueError because the drive string has an embedded null (e.g. '\x00:'), or because it's a bytes path that can't be decoded with the filesystem encoding (e.g. b'\xff:' given the filesystem encoding is UTF-8).

Actually, the latter UnicodeDecodeError is raised earlier in 3.13, by splitroot(). Maybe that's a bug. It's certainly new behavior compared to 3.12. The same applies to normpath() in 3.10 vs 3.11+.

@ronaldoussoren
Copy link
Contributor

Why do you think there's a bug here? Posixpath is documented to work on POSIX style paths and that will cause problems when working with Windows style paths. Likewise when working with POSIX style paths on using ntpath.

In particular APIs that end up interacting with the filesystem, like abspath and realpath cannot do the right thing when running on a "foreign" system. It is possible to define semantics, but I'm not convinced that's worth the extra code.

@nineteendo
Copy link
Contributor Author

Why do you think there's a bug here?

We're literally testing that ntpath.realpath(), ntpath.relpath() & posixpath.relpath() work partially for relative paths on the wrong platform. These tests broke when I tried to optimise os.path.abspath() & os.path.relpath().

And you would expect that os.path.abspath() & os.path.realpath() can't return relative paths under any circumstances. (especially since we're passing a valid path to them)

It is possible to define semantics, but I'm not convinced that's worth the extra code.

The only extra code is this function (a wrapper that adds the strict argument to posixpath.abspath()), matching ntpath.realpath():

def realpath(path, *, strict=False):
    """Return an absolute path."""
    if strict:
        raise NotImplementedError('realpath: strict unavailable on this platform')
    return abspath(path)

When we have a C implementation for posixpath.abspath(), all other functions will need a fallback anyway.
The one for ntpath.abspath() must be implemented separately to handle drive relative paths correctly, in case of an error.
It would be nice if these fallback functions (which are only used on the other platform) also followed the correct semantics.

@ronaldoussoren
Copy link
Contributor

Why do you think there's a bug here?

We're literally testing that ntpath.realpath(), ntpath.relpath() & posixpath.relpath() work partially for relative paths on the wrong platform. These tests broke when I tried to optimise os.path.abspath() & os.path.relpath().

And you would expect that os.path.abspath() & os.path.realpath() can't return relative paths under any circumstances. (especially since we're passing a valid path to them)

The implementation IMHO rightfully assumes that os.getcwd() returns an absolute path. Using ntpath on Unix or posixpath on Windows historically is only valid for functions that can work without interacting with the filesystem, e.g. APIs like dirname that only use string manipulation.

Other's don't work, and often cannot work in an expected way due to the difference in path separators on Windows and POSIX.

It is possible to define semantics, but I'm not convinced that's worth the extra code.

The only extra code is this function (a wrapper that adds the strict argument to posixpath.abspath()), matching ntpath.realpath():

def realpath(path, *, strict=False):
    """Return an absolute path."""
    if strict:
        raise NotImplementedError('realpath: strict unavailable on this platform')
    return abspath(path)

When we have a C implementation for posixpath.abspath(), all other functions will need a fallback anyway. The one for ntpath.abspath() must be implemented separately to handle drive relative paths correctly, in case of an error. It would be nice if these fallback functions (which are only used on the other platform) also followed the correct semantics.

IMHO There are no "correct" semantics when calling realpath on a foreign path. I'd say "don't do that than" is correct answer here, on Windows posixpath.abspath('d:foo') will give a nonsensical answer, as will posixpath.abpath(r'c:\foo').

Note that teaching posixpath about drive letters when used on Windows breaks the API contract because POSIX file system paths don't have drive letters.

@nineteendo
Copy link
Contributor Author

Other's don't work, and often cannot work in an expected way due to the difference in path separators on Windows and POSIX.

That still doesn't explain why we're calling these functions with relative paths in the tests. (absolute paths are fine, though)

IMHO There are no "correct" semantics when calling realpath on a foreign path.

I agree, but that's NOT what I want to change here. I want posixpath to work on posix style paths and ntpath to work on Windows style paths:

posixpath.abspath('foo/bar')  -> '/foo/bar'
posixpath.realpath('foo/bar') -> '/foo/bar'
ntpath.abspath(r'foo\bar')    -> 'C:\\foo\\bar'
ntpath.abspath(r'\foo\bar')   -> 'C:\\foo\\bar'
ntpath.abspath(r'D:foo\bar')  -> 'D:\\foo\\bar'

@nineteendo
Copy link
Contributor Author

I copied the fix for drive relative paths to the other pull request. If this gets rejected, we don't lose it.

@barneygale
Copy link
Contributor

In particular APIs that end up interacting with the filesystem, like abspath and realpath cannot do the right thing when running on a "foreign" system. It is possible to define semantics, but I'm not convinced that's worth the extra code.

This is exactly right.

Perhaps we should work towards making abspath, realpath etc unavailable from "foreign" path modules, e.g. raise a warning when a user calls posixpath.realpath() from Windows.

@nineteendo
Copy link
Contributor Author

That's too strict. ntpath.relpath() & posixpath.relpath() work correctly for absolute paths. We could only raise a warning for relative paths, though:

def abspath(path):
    """Return the absolute version of a path."""
    path = os.fspath(path)
    if not isabs(path):
        warn("passing relative paths is deprecated", DeprecationWarning, stacklevel=2)
        drive, _, path = splitroot(path)
        if isinstance(path, bytes):
            path = join(drive or b'C:', b'\\' + path)
        else:
            path = join(drive or 'C:', '\\' + path)
    return normpath(path)

@ronaldoussoren
Copy link
Contributor

In particular APIs that end up interacting with the filesystem, like abspath and realpath cannot do the right thing when running on a "foreign" system. It is possible to define semantics, but I'm not convinced that's worth the extra code.

This is exactly right.

Perhaps we should work towards making abspath, realpath etc unavailable from "foreign" path modules, e.g. raise a warning when a user calls posixpath.realpath() from Windows.

Maybe, although I'm not sure if adding warnings is worth the effort because the majority of uses of these APIs is through os.path. That said, I did a quick GitHub search on posixpath.abspath and that has more hits than I'd expect, and at least one of them is likely broken.

That's too strict. ntpath.relpath() & posixpath.relpath() work correctly for absolute paths. We could only raise a warning for relative paths, though:

relpath calls abspath and hence won't correctly when used on "foreign" paths. More importantly, I don't think we should suggest that these function do work correctly or try to define semantics for them (when using posixpath on Windows or ntpath on Unix)

@nineteendo
Copy link
Contributor Author

relpath calls abspath and hence won't correctly when used on "foreign" paths.

It DOES work correctly for absolute paths ('/foo/bar' for posixpath.relpath() & C:\\foo\\bar for ntpath.relpath()).
When a path is absolute, it's returned unchanged:

cpython/Lib/posixpath.py

Lines 409 to 418 in f5b7e39

def abspath(path):
"""Return an absolute path."""
path = os.fspath(path)
if isinstance(path, bytes):
if not path.startswith(b'/'):
path = join(os.getcwdb(), path)
else:
if not path.startswith('/'):
path = join(os.getcwd(), path)
return normpath(path)

cpython/Lib/ntpath.py

Lines 604 to 618 in f5b7e39

def _abspath_fallback(path):
"""Return the absolute version of a path as a fallback function in case
`nt._getfullpathname` is not available or raises OSError. See bpo-31047 for
more.
"""
path = os.fspath(path)
if not isabs(path):
if isinstance(path, bytes):
cwd = os.getcwdb()
else:
cwd = os.getcwd()
path = join(cwd, path)
return normpath(path)

ntpath.relpath() isn't that useful, but posixpath.relpath() can be used for urls.
People that are passing absolute paths are already doing what they need to prevent garbage output.

@nineteendo
Copy link
Contributor Author

More importantly, I don't think we should suggest that these function do work correctly or try to define semantics for them (when using posixpath on Windows or ntpath on Unix)

Again, the fallback functions I want to change are ONLY used when on a foreign platform. It's a small effort to assume the cwd is / for posixpath & C:/ for ntpath, which doesn't lead to extra code.

That being said, if you don't want ANY extra code, we could not fix posixpath.realpath() and not add a fallback function.

@barneygale
Copy link
Contributor

barneygale commented Apr 28, 2024

ntpath.relpath() isn't that useful, but posixpath.relpath() can be used for urls. People that are passing absolute paths are already doing what they need to prevent garbage output.

Relative URLs work differently to relative paths, so users doing this are asking for trouble.

It's a small effort to assume the cwd is / for posixpath & C:/ for ntpath, which doesn't lead to extra code.

This is not going to happen.

@nineteendo
Copy link
Contributor Author

Sighs, that one time I waited for some sign of approval before making a pull request.

@nineteendo nineteendo closed this as not planned Won't fix, can't repro, duplicate, stale Apr 28, 2024
@nineteendo
Copy link
Contributor Author

Hmm, this might not even have been possible when not using the c implementation of Python...

@novaTopFlex
Copy link

On my system (Ubuntu 22.04 LTS, Python 3.10.12 [GCC 11.4.0]) on linux), I am a POSIX user, not an NT user, but I still have a similar situation. In my case, I have imported both posixpath and ntpath, and when I have attempted to utilize the function ntpath.realpath() drive letters are not automatically printed. I believe that my situation is the proper and expected situation as the drive letter is unknown (would the file be on the C: drive? D: drive?). But I do agree that the POSIX path conversion on Microsoft Windows is unacceptable, as the POSIX formula seemed to remain on your interpretation. Also, the os.path.abspath() function is not intended for path conversion between POSIX and NT systems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants