Trouble formulating a regular expression for use with sed to extract column values
Use the right tool for the job. If you're processing columns, awk
is a better solution:
ls -la | awk '{print $5}'
Given your ls -la
output, that should generate:
Size
26
4096
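To try this without a real directory, here is a minimal sketch. The `sample_listing` function and its rows are made up to resemble the listing in the question (a header row with Size as the fifth column); only the awk command itself comes from the answer above.

```shell
# Hypothetical stand-in for the question's listing: a header row plus two entries.
sample_listing() {
  printf '%s\n' \
    'Perms Links Owner Group Size Date Time Name' \
    '-rw-r--r-- 1 alice users 26 2013-05-01 10:00 notes.txt' \
    'drwxr-xr-x 2 alice users 4096 2013-05-01 10:05 docs'
}

# awk splits each line on runs of blanks, so $5 is the fifth column.
sizes=$(sample_listing | awk '{print $5}')
printf '%s\n' "$sizes"
```

Because awk's default field splitting treats any run of spaces or tabs as one separator, the same command works whether the columns are padded or not.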
If, for some bizarre reason, you cannot use the correct tool, the following sed
command will work, but it's rather ugly:
sed 's/[ \t]*[0-9][0-9][0-9][0-9]-.*//;s/[ \t]*Date.*//;s/^.*[ \t]//'
It works by removing everything from the year column (9999-) and any preceding tabs/spaces to the end of the line.
Then it does something similar for the header.
Then it just removes everything from line start to the final tab/space, which is now just before the size column.
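The three substitutions can be traced one at a time on a single line. This is only a sketch: the input line is invented, and it uses spaces only, because as far as I know `\t` inside a bracket expression is read as a tab by GNU sed but may be taken literally by other implementations.

```shell
# Hypothetical input line in the question's format (spaces only, ISO-style date).
line='-rw-r--r-- 1 alice users 26 2013-05-01 10:00 notes.txt'

# Step 1: drop the date (four digits and a dash) and everything after it.
step1=$(printf '%s\n' "$line" | sed 's/[ \t]*[0-9][0-9][0-9][0-9]-.*//')
# Step 2 only affects the header row, so this line passes through unchanged.
step2=$(printf '%s\n' "$step1" | sed 's/[ \t]*Date.*//')
# Step 3: drop everything up to the last blank, leaving the size.
step3=$(printf '%s\n' "$step2" | sed 's/^.*[ \t]//')

printf '%s\n' "$step1" "$step2" "$step3"
```

For this input the first two steps leave `-rw-r--r-- 1 alice users 26`, and the third reduces that to `26`.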
I know which one I'd prefer to write and maintain :-)
The general caveat applies: awk
is the better tool for the job.
Here's a simpler sed
solution:
ls -la | sed -E 's/^(([^[:space:]]+)[[:space:]]+){5}.*/\2/'
- works with both spaces and tabs between columns
- takes advantage of repeating capture groups only reporting the last captured instance - in this case, the 5th column
- caveat: will not work correctly with filenames with embedded spaces
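A small sketch of the repeated-group behavior, using made-up column values `c1` to `c6` rather than real ls output. The inner group is matched five times, and `\2` ends up holding what it matched on the last pass.

```shell
# Same sed program as above, applied to six invented columns.
prog='s/^(([^[:space:]]+)[[:space:]]+){5}.*/\2/'

# Columns separated by single spaces.
by_spaces=$(printf 'c1 c2 c3 c4 c5 c6\n' | sed -E "$prog")
# Columns separated by tabs (printf expands \t to a real tab).
by_tabs=$(printf 'c1\tc2\tc3\tc4\tc5\tc6\n' | sed -E "$prog")

printf '%s\n' "$by_spaces" "$by_tabs"
```

Both commands print `c5`, the fifth column.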
If only spaces separate the columns, which is the case with ls
output, the command simplifies to:
ls -la | sed -E 's/^(([^ ]+)[ ]+){5}.*/\2/'
To skip the first input line you have several options, but the simplest is to prepend 1d
to your sed
program:
ls -la | sed -E '1d; s/^(([^ ]+)[ ]+){5}.*/\2/'
(Other options:
Use tail
to skip the first line:
ls -la | tail -n +2 | sed -E 's/^(([^ ]+)[ ]+){5}.*/\2/'
More generically, use sed
to ignore lines that do not have at least 5 columns:
ls -la | sed -E -n 's/^(([^ ]+)[ ]+){5}.*/\2/p'
- -n suppresses default output
- appending p to the substitution command produces output only if a substitution was made
)
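To see the difference that -n and p make, here is a sketch with a made-up listing whose first line is the short "total" line that ls -la normally prints. The `listing` function and its contents are assumptions for illustration.

```shell
# Hypothetical ls -la style output: a two-column "total" line, then two entries.
listing() {
  printf '%s\n' \
    'total 8' \
    '-rw-r--r-- 1 alice users 26 May  1 10:00 notes.txt' \
    'drwxr-xr-x 2 alice users 4096 May  1 10:05 docs'
}

# Without -n, a line that does not match is printed unchanged.
plain=$(listing | sed -E 's/^(([^ ]+)[ ]+){5}.*/\2/')
# With -n and the p flag, only lines where the substitution succeeded are printed.
filtered=$(listing | sed -E -n 's/^(([^ ]+)[ ]+){5}.*/\2/p')

printf 'plain:\n%s\nfiltered:\n%s\n' "$plain" "$filtered"
```

The first variant still shows `total 8` ahead of the sizes; the second prints only `26` and `4096`, without needing to know which line numbers to delete.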
To show only the 3 largest files (a requirement added later by the OP), courtesy of @JS웃:
ls -la | sed -E '2d; s/^(([^ ]+)[ ]+){5}.*/\2/' | sort -nr | head -3
The above will not output the header line, however. To include the header line, use (courtesy of this unix.stackexchange.com answer):
ls -la | sed -E '1d; s/^(([^ ]+)[ ]+){5}.*/\2/' | { IFS= read -r l; echo "$l"; sort -nr | head -3; }
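The header-preserving part of that pipeline can be tried in isolation. In this sketch the `sizes` function is a made-up stand-in for the output of the sed step (a header followed by numbers); the brace group is the same as in the command above.

```shell
# Hypothetical output of the sed step: a header line, then unsorted sizes.
sizes() {
  printf '%s\n' Size 26 4096 300 12
}

# read consumes only the first line and echo passes it through untouched;
# sort and head then see just the remaining lines.
top3=$(sizes | { IFS= read -r l; echo "$l"; sort -nr | head -3; })
printf '%s\n' "$top3"
```

This prints `Size` followed by `4096`, `300` and `26`. The approach relies on read taking only one line from the pipe, so the rest is still available to sort.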
Here is another way with GNU sed:
ls -la | sed -r '1d;s/([^ ]+ *){4}([^ ]+).*/\2/'
If your version of sed does not support the -r option, use -E instead.