How can I get unique values from an array in Bash?


A bit hacky, but this should do it:

echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '

To save the sorted unique results back into an array, use array assignment:

sorted_unique_ids=($(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))

If your shell supports herestrings (bash should), you can spare an echo process by altering it to:

tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' '

A note as of Aug 28 2021:

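According to the ShellCheck wiki entry for SC2207, read -a (or mapfile) should be used instead of an unquoted command substitution inside an array assignment, to avoid unintended word splitting and globbing. Thus, in bash the command would be: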

IFS=" " read -r -a ids <<< "$(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')"

or

IFS=" " read -r -a ids <<< "$(tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' ')"
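The same ShellCheck page also mentions mapfile as an alternative to read -a. The following is a minimal sketch, assuming bash 4 or later and that no element contains a newline; the array name unique_ids and the extra "ac ad" element are illustrative and not from the original answer. Because it never converts spaces to newlines, elements that contain spaces survive intact:

```shell
#!/usr/bin/env bash
# Sample input; "ac ad" is a single element containing a space.
ids=(aa ab aa "ac ad" aa ad)

# Print one element per line, sort and de-duplicate, then read each
# line back into its own array element (-t strips the trailing newline).
mapfile -t unique_ids < <(printf '%s\n' "${ids[@]}" | sort -u)

# Show each element in brackets so embedded spaces are visible.
printf '<%s>\n' "${unique_ids[@]}"
```

With the tr-based pipelines above, "ac ad" would instead be split into two separate values.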

Input:

ids=(aa ab aa ac aa ad)

Output:

aa ab ac ad

Explanation:

  • "${ids[@]}" - Syntax for working with shell arrays, whether used as part of echo or a herestring. The @ part means "all elements in the array"
  • tr ' ' '\n' - Convert all spaces to newlines. This is needed because echo prints the array elements on a single line, separated by spaces, while sort expects one item per line. Note that it also splits any element that itself contains a space.
  • sort -u - sort and retain only unique elements
  • tr '\n' ' ' - convert the newlines we added in earlier back to spaces.
  • $(...) - Command Substitution
  • Aside: tr ' ' '\n' <<< "${ids[@]}" is a more efficient way of doing: echo "${ids[@]}" | tr ' ' '\n'


If you're running Bash version 4 or above (which should be the case in any modern version of Linux), you can get unique array values in bash by creating a new associative array that contains each of the values of the original array. Something like this:

$ a=(aa ac aa ad "ac ad")
$ declare -A b
$ for i in "${a[@]}"; do b["$i"]=1; done
$ printf '%s\n' "${!b[@]}"
ac ad
ac
aa
ad

This works because in any array (associative or traditional, in any language), each key can only appear once. When the for loop arrives at the second value of aa in a[2], it overwrites b[aa] which was set originally for a[0].
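One consequence of using keys this way is that "${!b[@]}" returns them in no particular order. If the original order matters, the same idea can be combined with a second, ordinary array. This is a sketch under the assumption of bash 4 or later and no empty-string elements (bash rejects an empty associative-array subscript); the names seen and unique are illustrative:

```shell
#!/usr/bin/env bash
a=(aa ac aa ad "ac ad" ac)

declare -A seen=()   # keys are the values encountered so far
unique=()            # result, in first-seen order

for item in "${a[@]}"; do
    # ${seen[...]+set} expands to "set" only if the key already exists,
    # so a value is appended the first time it is seen and skipped after.
    if [[ -z ${seen["$item"]+set} ]]; then
        seen["$item"]=1
        unique+=("$item")
    fi
done

printf '%s\n' "${unique[@]}"
```

Here the associative array is used only for membership tests, while the indexed array carries the result.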

Doing things in native bash can be faster than using pipes and external tools like sort and uniq, though for larger datasets you'll likely see better performance if you use a more powerful language like awk, python, etc.
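As an illustration of the awk route mentioned above, here is a minimal sketch using the common '!seen[$0]++' idiom. It assumes bash 4 or later for mapfile and that no element contains a newline; the name unique_awk is hypothetical. Unlike sort -u, it keeps the first occurrence of each value in its original position:

```shell
#!/usr/bin/env bash
a=(aa ac aa ad "ac ad" ac)

# seen[$0]++ evaluates to 0 the first time a line appears, so
# !seen[$0]++ is true only for first occurrences and awk prints
# just those lines.
mapfile -t unique_awk < <(printf '%s\n' "${a[@]}" | awk '!seen[$0]++')

printf '%s\n' "${unique_awk[@]}"
```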

If you're feeling confident, you can avoid the for loop by using printf's ability to recycle its format for multiple arguments, though this seems to require eval. (Stop reading now if you're fine with that.)

$ eval b=( $(printf ' ["%s"]=1' "${a[@]}") )
$ declare -p b
declare -A b=(["ac ad"]="1" [ac]="1" [aa]="1" [ad]="1" )

The reason this solution requires eval is that array values are determined before word splitting. That means that the output of the command substitution is considered a single word rather than a set of key=value pairs.

While this uses a subshell, it uses only bash builtins to process the array values. Be sure to evaluate your use of eval with a critical eye. If you're not 100% confident that chepner or glenn jackman or greycat would find no fault with your code, use the for loop instead.


I realize this was already answered, but it showed up pretty high in search results, and it might help someone.

printf "%s\n" "${IDS[@]}" | sort -u

Example:

~> IDS=( "aa" "ab" "aa" "ac" "aa" "ad" )
~> echo  "${IDS[@]}"
aa ab aa ac aa ad
~>
~> printf "%s\n" "${IDS[@]}" | sort -u
aa
ab
ac
ad
~> UNIQ_IDS=($(printf "%s\n" "${IDS[@]}" | sort -u))
~> echo "${UNIQ_IDS[@]}"
aa ab ac ad
~>
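The array assignment in the example relies on word splitting, so it only round-trips cleanly when no element contains whitespace or glob characters. A possible variant for arbitrary strings, sketched here under the assumptions of bash 4.4 or later (for mapfile -d) and GNU sort (for -z), is to separate elements with NUL bytes instead of newlines. The sample values are illustrative:

```shell
#!/usr/bin/env bash
# Elements with a space and with an embedded newline.
IDS=("aa" "a b" "aa" $'two\nlines' "a b")

# printf emits each element followed by a NUL byte; sort -z treats NUL
# as the record separator; mapfile -d '' splits on NUL when reading back.
mapfile -d '' -t UNIQ_IDS < <(printf '%s\0' "${IDS[@]}" | sort -zu)

# %q quotes each element so whitespace inside it is unambiguous.
printf '%q\n' "${UNIQ_IDS[@]}"
```

NUL is the one byte a bash string cannot contain, which is why it works as a safe delimiter here.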