Problems using awk to select group of sequences from fasta file

I would advise you to use an available fasta format parser.

For instance, in python3, you can use the pretty efficient parser from pyGATB.

You could use it as follows:

#!/usr/bin/env python3import sysfrom gatb import Bankfasta_file = sys.argv[1]pop_name = sys.argv[2]def get_pop(header):    """Extracts the population name from the fasta header."""    return header.decode("utf-8").split(" ")[1].split("_")[0][1:]for seq in Bank(fasta_file):    if get_pop(seq.comment) == pop_name:        print(">%s\n%s" % (            seq.comment.decode("utf-8"),            seq.sequence.decode("utf-8")))sys.exit(0)

Running this with your example file (which strangely has an empty sequence for CLocus_12706_Sample_46_Locus_34641_Allele_0 [JoJo_s115.fq; groupI, 125578, +]):

./extract_pop.py test.fa "JoJo">CLocus_12706_Sample_44_Locus_36326_Allele_0 [JoJo_s113.fq; groupI, 125578, +]TGCAGCATGCTGGTGAACGCGTCATCATAAGCCTGTTGGCGAGCCAGCAGAAGGCGGCATGGGCAGCACTTAATAGGACGCACGTCCTCTGTGTCA>CLocus_12706_Sample_46_Locus_34641_Allele_0 [JoJo_s115.fq; groupI, 125578, +]

If you don't have python3, you can use Biopython's SeqIO module:

#!/usr/bin/env pythonimport sysfrom Bio import SeqIOfasta_file = sys.argv[1]pop_name = sys.argv[2]def get_pop(header):    """Extracts the population name from the fasta header."""    return header.split(" ")[1].split("_")[0][1:]for seq in SeqIO.parse(fasta_file, format="fasta"):    if get_pop(seq.description) == pop_name:        print(">%s\n%s" % (seq.description, seq.seq))sys.exit(0)

CodeHunter

Problems using awk to select group of sequences from fasta file

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last