
Extract filename and path from URL in bash script


Bash has built-in parameter expansion operators to handle this, namely the string pattern-matching operators:

  1. '#' remove minimal matching prefixes
  2. '##' remove maximal matching prefixes
  3. '%' remove minimal matching suffixes
  4. '%%' remove maximal matching suffixes

For example:

FILE=/home/user/src/prog.c
echo "${FILE#/*/}"   # ==> user/src/prog.c
echo "${FILE##/*/}"  # ==> prog.c
echo "${FILE%/*}"    # ==> /home/user/src
echo "${FILE%%/*}"   # ==> (empty string)
echo "${FILE%.c}"    # ==> /home/user/src/prog

All of this comes from the excellent book "A Practical Guide to Linux Commands, Editors, and Shell Programming" by Mark G. Sobell (http://www.sobell.com/).
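Applied to the URL from the question, the same four operators can split out the path, the directory, and the filename. This is a minimal sketch, assuming the URL has the form scheme://[user:pass@]host/path[?query] with a non-empty path; fragments (#...) are not handled:

```shell
#!/usr/bin/env bash
# Sketch: split a URL into path, directory and filename using only the
# pattern-matching expansions (#, ##, %, %%).
# Assumption: URL looks like scheme://[user:pass@]host/path[?query]
# and has a non-empty path.
url='http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth'

no_query=${url%%\?*}        # %%: drop the longest suffix starting with '?'
no_scheme=${no_query#*://}  # #:  drop the shortest prefix ending in '://'
path=/${no_scheme#*/}       # #:  drop user:pass@host up to the first '/'
dir=${path%/*}              # %:  drop the shortest suffix starting with '/'
file=${path##*/}            # ##: drop the longest prefix ending in '/'

printf 'path: %s\ndir:  %s\nfile: %s\n' "$path" "$dir" "$file"
```

For the example URL this prints path: /one/more/dir/file.exe, dir: /one/more/dir and file: file.exe. Stripping the query first means a "://" inside the query string cannot confuse the scheme step.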


In bash:

URL='http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth'
URL_NOPRO=${URL:7}
URL_REL=${URL_NOPRO#*/}
echo "/${URL_REL%%\?*}"

This works only if the URL starts with http:// or another protocol prefix of the same length (7 characters). Otherwise, it's probably easier to use a regex with sed or grep, or to split the string with cut.
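If the scheme length varies, bash's own regex operator is one alternative to sed or grep. The following is a sketch, assuming a scheme://authority/path URL; `url_path` is a hypothetical helper name, not a standard command:

```shell
#!/usr/bin/env bash
# Sketch: pull the path out of a URL with bash's built-in regex operator
# (=~) and the BASH_REMATCH array, so the scheme may be any length.
# url_path is a hypothetical helper name.
url_path() {
  local url=$1
  # scheme "://" authority, then capture "/..." up to an optional ? or #.
  # Keeping the regex in a variable avoids quoting problems inside [[ ]].
  local re='^[A-Za-z][A-Za-z0-9+.-]*://[^/]*(/[^?#]*)'
  if [[ $url =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1   # not scheme://..., or no path component
  fi
}

url_path 'http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth'
url_path 'ftp://example.org/pub/file.tar.gz#top'
```

The two calls print /one/more/dir/file.exe and /pub/file.tar.gz; a URL with no path, such as http://example.com, makes the function return 1 instead of printing something misleading.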


This answer uses bash and cut as another way of doing it. It's ugly, but it works (at least for the example). Sometimes I like to use what I call "cut sieves" to whittle the input down to the information I am actually looking for.

Note: performance-wise, this may be a problem, because every cut in the pipeline starts a separate process.

Given those caveats:

First, let's echo the line:

echo 'http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth'

Which gives us:

http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth

Then let's cut the line at the @ as a convenient way to strip out the http://login:password@ prefix:

echo 'http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth' | \
  cut -d@ -f2

That gives us this:

example.com/one/more/dir/file.exe?a=sth&b=sth
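The @ trick relies on the URL actually containing credentials. As a quick sketch (the credential-free URL below is made up for illustration), cut passes a line that lacks the delimiter through unchanged, unless -s is given, in which case the line is dropped:

```shell
#!/usr/bin/env bash
# Sketch: how `cut -d@ -f2` behaves when a line has no '@'.
with_creds='http://login:password@example.com/one/file.exe'
no_creds='http://example.com/one/file.exe'   # hypothetical URL without credentials

echo "$with_creds" | cut -d@ -f2     # example.com/one/file.exe
echo "$no_creds"   | cut -d@ -f2     # http://example.com/one/file.exe (passed through)
echo "$no_creds"   | cut -s -d@ -f2  # prints nothing: -s drops lines without '@'
```

If the passed-through line continued into the next two steps, the result would be /example.com/one/file.exe rather than a path, so this sieve is best kept to URLs known to contain an @.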

To get rid of the hostname, let's do another cut and use the / as the delimiter while asking cut to give us the second field and everything after (essentially, to the end of the line). It looks like this:

echo 'http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth' | \
  cut -d@ -f2 | \
  cut -d/ -f2-

Which, in turn, results in:

one/more/dir/file.exe?a=sth&b=sth

And finally, we want to strip off all the parameters from the end. Again, we'll use cut and this time the ? as the delimiter and tell it to give us just the first field. That brings us to the end and looks like this:

echo 'http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth' | \
  cut -d@ -f2 | \
  cut -d/ -f2- | \
  cut -d? -f1

And the output is:

one/more/dir/file.exe

It's just another way to do it, and this approach lets you whittle away the data you don't need interactively, one step at a time, until you're left with what you do need.
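Given the performance note above, the same whittling can also be done inside bash with parameter expansion, so no extra processes are started. This is a sketch, roughly equivalent to the cut pipeline for URLs containing at most one '@' (cut -f2 would stop at a second '@', while ${url#*@} keeps everything after the first one):

```shell
#!/usr/bin/env bash
# Sketch: the three-step "sieve" done with parameter expansion.
# Each step mirrors one cut in the pipeline; no subprocesses are spawned.
url='http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth'

s=${url#*@}    # like: cut -d@ -f2   (drop through the first '@')
s=${s#*/}      # like: cut -d/ -f2-  (drop through the first '/')
s=${s%%\?*}    # like: cut -d? -f1   (drop from the first '?')

echo "$s"      # one/more/dir/file.exe
```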

If I wanted to stuff this into a variable in a script, I'd do something like this:

#!/bin/bash
url="http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth"
# Quote ${url}: unquoted, the '?' in it could be treated as a glob character.
file_path=$(echo "${url}" | cut -d@ -f2 | cut -d/ -f2- | cut -d? -f1)
echo "${file_path}"
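Because cut works line by line, one way to soften the per-process cost is to push many URLs through a single pipeline instead of running it once per URL. A sketch, assuming bash 4+ for mapfile; the second URL is a made-up example:

```shell
#!/usr/bin/env bash
# Sketch: run the cut sieve once over many URLs (one per line), so each
# cut process starts only once for the whole list.
# Assumes bash 4+ (mapfile); the second URL is hypothetical.
urls=(
  'http://login:password@example.com/one/more/dir/file.exe?a=sth&b=sth'
  'http://user:pw@example.org/a/b.txt?x=1'
)

# mapfile -t reads each output line into one array element.
mapfile -t paths < <(printf '%s\n' "${urls[@]}" | cut -d@ -f2 | cut -d/ -f2- | cut -d? -f1)

printf '%s\n' "${paths[@]}"
# one/more/dir/file.exe
# a/b.txt
```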

Hope it helps.