How to match for multiple patterns in the specific column?

Use a regular expression:

awk '$1 ~ /^chr(1?[0-9]|2[0-2]|X|Y)$/' file

This uses $1 ~ /^pattern$/ to select the lines whose first field consists of exactly pattern (note ^ for the beginning and $ for the end).

The pattern has the form chr(..|..|..), meaning: match chr followed by any one of the |-separated alternatives within ().

These can be any of:

a single digit, optionally preceded by 1, i.e. 0 to 19 (1?[0-9])
a 2 followed by any of 0, 1, 2, i.e. 20 to 22 (2[0-2])
X
Y
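
To see the effect of the anchors and the alternation, here is a minimal sketch that runs the same awk command on a few made-up rows. The sample data and the variable names (sample, matched) are assumptions for illustration only, not part of the original question.

```shell
# Hypothetical sample data: chromosome name in column 1, a position in column 2.
sample='chr1 100
chr10 200
chr22 300
chr23 400
chrX 500
chrY 600
chrM 700
chr1_random 800'

# Keep only rows whose first field is exactly chr + one of the alternatives.
matched=$(printf '%s\n' "$sample" | awk '$1 ~ /^chr(1?[0-9]|2[0-2]|X|Y)$/')
printf '%s\n' "$matched"
```

Only the chr1, chr10, chr22, chrX and chrY rows are printed; chr1_random is rejected because of the trailing $. Note that 1?[0-9] would also accept chr0, which does not matter unless such a name occurs in your data.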

Demo with an automatic explanation: https://regex101.com/r/gH1kS4/2

bash unix awk grep pattern-matching

If you want something easier to maintain (e.g. editing or adding new lines/patterns to match) and easier to understand, especially if you have just started working with regular expressions, use the grep -f match.list input.txt form:

Create a file with the patterns you want to match (match.list):

^chr[1-9][[:space:]]\|      # this matches chr1-chr9
^chr1[0-9][[:space:]]\|     # this matches chr10-chr19
^chr2[0-2][[:space:]]\|     # this matches chr20-chr22
^chr[XY][[:space:]]\|       # this matches chrX and chrY
new_string_or_pattern\|     # ... your new pattern ...

Then just call grep like this:

grep -f match.list input.txt
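
As an end-to-end illustration, the sketch below builds a pattern file and an input file in a temporary directory and runs grep -f on them. The file contents, the workdir and hits variables, and the tab-separated layout are assumptions made for the example. The patterns are written without trailing comments and use 2[0-2] so that chr20 is included.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Hypothetical pattern file: one basic regular expression per line.
cat > "$workdir/match.list" <<'EOF'
^chr[1-9][[:space:]]
^chr1[0-9][[:space:]]
^chr2[0-2][[:space:]]
^chr[XY][[:space:]]
EOF

# Hypothetical tab-separated input.
printf 'chr1\t100\nchr20\t200\nchr23\t300\nchrX\t400\nchrM\t500\n' > "$workdir/input.txt"

# A line is printed if it matches at least one pattern in the file.
hits=$(grep -f "$workdir/match.list" "$workdir/input.txt")
printf '%s\n' "$hits"

rm -r "$workdir"
```

With this input, the chr1, chr20 and chrX lines are printed, while chr23 and chrM are not.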

As you can see above, you can even add comments to the list of patterns using the \| trick (ending each pattern with \|), so you can remember what you did yesterday or where you found the regex. This relies on \| being the alternation operator in GNU grep: the comment text simply becomes a second alternative that is unlikely to match any line. You can add new fixed strings or patterns just by adding new lines. And if you find it difficult to write a complex regex, you can simply create a pattern file with the fixed strings you want to match:

^chrX
^chrY
...

Another benefit of this approach is that you may maintain several pattern files, representing different sub-queries you may need to run daily. E.g.

grep -f chromosomes_n input.txt
grep -f chromosomes_xy input.txt
grep -f chromosomes_random input.txt

The only drawback of the approach is that grep will get slower if you add more than a dozen patterns in each file. But that will be a problem only if your input file has hundreds of thousands of lines.


You can use this simplified regex with grep:

grep "^chr\(1\?[0-9]\|2[012]\|[XY]\)[[:space:]]" filename

The logic is contained within the parentheses \(...\):

1\?[0-9] - match 0-9 optionally preceded by 1
2[012] - match 2 followed by 0, 1 or 2
[XY] - match X or Y
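
The backslashed \? and \| used above are GNU extensions to basic regular expressions. As a hedged alternative, the same pattern can be written in extended syntax with grep -E, which needs no backslashes. The sketch below is a variant of the command above, not the original; the sample lines and the lines variable are made up for illustration.

```shell
# Hypothetical tab-separated input; the chromosome name must be followed by
# whitespace for [[:space:]] to match.
lines=$(printf 'chr7\t1\nchr19\t2\nchr21\t3\nchr30\t4\nchrY\t5\nchrUn\t6\n' |
  grep -E '^chr(1?[0-9]|2[012]|[XY])[[:space:]]')
printf '%s\n' "$lines"
```

Here the chr7, chr19, chr21 and chrY lines are kept, and chr30 and chrUn are dropped.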
