Is there a convenient way to map a file uri to os.path?
Use urllib.parse.urlparse to get the path from the URI:

    import os
    from urllib.parse import urlparse

    p = urlparse('file://C:/test/doc.txt')
    final_path = os.path.abspath(os.path.join(p.netloc, p.path))
The solution from @Jakob Bowyer doesn't convert URL encoded characters to regular UTF-8 characters. For that you need to use urllib.parse.unquote.

    >>> from urllib.parse import unquote, urlparse
    >>> unquote(urlparse('file:///home/user/some%20file.txt').path)
    '/home/user/some file.txt'
Of all the answers so far, I found none that handles the edge cases, avoids branching, works on both Python 2 and 3, and is cross-platform.
In short, this does the job, using only the standard library:
    import os

    try:
        from urllib.parse import urlparse, unquote
        from urllib.request import url2pathname
    except ImportError:
        # backwards compatibility (Python 2)
        from urlparse import urlparse
        from urllib import unquote, url2pathname


    def uri_to_path(uri):
        parsed = urlparse(uri)
        # Prefix the path with the host as a UNC-style root, e.g. \\host\
        host = "{0}{0}{mnt}{0}".format(os.path.sep, mnt=parsed.netloc)
        return os.path.normpath(
            os.path.join(host, url2pathname(unquote(parsed.path)))
        )
The tricky bit (I found) was when working in Windows with paths specifying a host. This is a non-issue outside of Windows: network locations in *NIX can only be reached via paths after being mounted to the root of the filesystem.
From Wikipedia: A file URI takes the form of file://host/path, where host is the fully qualified domain name of the system on which the path is accessible [...]. If host is omitted, it is taken to be "localhost".
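To see how that form maps onto what urlparse returns, here is a minimal sketch. The helper name split_file_uri is hypothetical (it is not part of the standard library); it only wraps urlparse. Note that when the host is omitted, urlparse reports an empty netloc rather than "localhost".

```python
from urllib.parse import urlparse


def split_file_uri(uri):
    """Return (host, path) for a file URI; host is '' when it is omitted.

    Hypothetical helper for illustration only. The path is returned
    exactly as it appears in the URI, still percent-encoded.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError("not a file URI: {!r}".format(uri))
    return parsed.netloc, parsed.path


if __name__ == "__main__":
    for uri in (
        "file://networkstorage/homes/rdekleer",
        "file://localhost/c%24/WINDOWS/clock.avi",
        "file:///home/user/some%20file.txt",
    ):
        host, path = split_file_uri(uri)
        print("host={!r} path={!r}".format(host, path))
```

The host ends up in netloc and everything after it in path, which is why the two parts have to be recombined to get a usable network path on Windows.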
With that in mind, I make it a rule to ALWAYS prefix the path with the netloc provided by urlparse, before passing it to os.path.abspath, which is necessary as it removes any resulting redundant slashes (os.path.normpath, which also claims to fix the slashes, can get a little over-zealous in Windows, hence the use of abspath).
The other crucial component in the conversion is using unquote to decode the URL percent-encoding, which your filesystem won't otherwise understand. Again, this might be a bigger issue on Windows, where paths commonly contain things like $ and spaces, which will have been encoded in the file URI.
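As a quick illustration of the decoding step, the sketch below round-trips a path through percent-encoding. The function name decode_uri_path and the sample path are made up for the example; quote and unquote are the real urllib.parse functions.

```python
from urllib.parse import quote, unquote


def decode_uri_path(encoded_path):
    """Decode percent-escapes (e.g. %20, %24) in the path part of a URI.

    Hypothetical helper for illustration; it simply calls unquote.
    """
    return unquote(encoded_path)


if __name__ == "__main__":
    encoded = "/c%24/WINDOWS/paths%20with%20spaces.txt"
    decoded = decode_uri_path(encoded)
    print(decoded)
    # Encoding the decoded path again gives back the original form.
    print(quote(decoded) == encoded)
```

Without this step the literal text %24 and %20 would be looked up on disk, and no such file exists.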
For a demo:
    import os
    from pathlib import Path  # This demo requires pip install for Python < 3.4
    import sys

    try:
        from urllib.parse import urlparse, unquote
        from urllib.request import url2pathname
    except ImportError:  # backwards compatibility:
        from urlparse import urlparse
        from urllib import unquote, url2pathname

    DIVIDER = "-" * 30

    if sys.platform == "win32":  # WINDOWS
        filepaths = [
            r"C:\Python27\Scripts\pip.exe",
            r"C:\yikes\paths with spaces.txt",
            r"\\localhost\c$\WINDOWS\clock.avi",
            r"\\networkstorage\homes\rdekleer",
        ]
    else:  # *NIX
        filepaths = [
            os.path.expanduser("~/.profile"),
            "/usr/share/python3/py3versions.py",
        ]

    for path in filepaths:
        uri = Path(path).as_uri()
        parsed = urlparse(uri)
        host = "{0}{0}{mnt}{0}".format(os.path.sep, mnt=parsed.netloc)
        normpath = os.path.normpath(
            os.path.join(host, url2pathname(unquote(parsed.path)))
        )
        absolutized = os.path.abspath(
            os.path.join(host, url2pathname(unquote(parsed.path)))
        )
        result = ("{DIVIDER}"
                  "\norig path: \t{path}"
                  "\nconverted to URI:\t{uri}"
                  "\nrebuilt normpath:\t{normpath}"
                  "\nrebuilt abspath:\t{absolutized}").format(**locals())
        print(result)
        assert path == absolutized
Results (WINDOWS):
    ------------------------------
    orig path: C:\Python27\Scripts\pip.exe
    converted to URI: file:///C:/Python27/Scripts/pip.exe
    rebuilt normpath: C:\Python27\Scripts\pip.exe
    rebuilt abspath: C:\Python27\Scripts\pip.exe
    ------------------------------
    orig path: C:\yikes\paths with spaces.txt
    converted to URI: file:///C:/yikes/paths%20with%20spaces.txt
    rebuilt normpath: C:\yikes\paths with spaces.txt
    rebuilt abspath: C:\yikes\paths with spaces.txt
    ------------------------------
    orig path: \\localhost\c$\WINDOWS\clock.avi
    converted to URI: file://localhost/c%24/WINDOWS/clock.avi
    rebuilt normpath: \localhost\c$\WINDOWS\clock.avi
    rebuilt abspath: \\localhost\c$\WINDOWS\clock.avi
    ------------------------------
    orig path: \\networkstorage\homes\rdekleer
    converted to URI: file://networkstorage/homes/rdekleer
    rebuilt normpath: \networkstorage\homes\rdekleer
    rebuilt abspath: \\networkstorage\homes\rdekleer
Results (*NIX):
    ------------------------------
    orig path: /home/rdekleer/.profile
    converted to URI: file:///home/rdekleer/.profile
    rebuilt normpath: /home/rdekleer/.profile
    rebuilt abspath: /home/rdekleer/.profile
    ------------------------------
    orig path: /usr/share/python3/py3versions.py
    converted to URI: file:///usr/share/python3/py3versions.py
    rebuilt normpath: /usr/share/python3/py3versions.py
    rebuilt abspath: /usr/share/python3/py3versions.py