Merge two large text files by common row to one mapping file

unix


In my opinion, the easiest way would be to use BLAST+...

Set up the larger file as a BLAST database and use the smaller file as the query...

Then just write a small script to analyse the output, i.e. take the top hit or two, to create the mapping file.

BTW, you might find SequenceServer (Google it) helpful for setting up a custom BLAST database and your BLAST environment...


Biopython should be able to read in large FASTA files.

from Bio import SeqIO
from collections import defaultdict

mapping = defaultdict(list)
for stool_record in SeqIO.parse('stool.fasta', 'fasta'):
    stool_seq = str(stool_record.seq)
    # Note: this re-parses libs.fasta once per stool record.
    for lib_record in SeqIO.parse('libs.fasta', 'fasta'):
        lib_seq = str(lib_record.seq)
        # Record a match when the stool sequence starts with the library sequence.
        if stool_seq.startswith(lib_seq):
            mapping[lib_record.id.split(';')[0]].append(stool_record.id)
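Since the inner loop re-reads the library file for every stool record, it can be slow on large inputs. If the library file fits in memory, you can parse it once and run the prefix match on plain (id, sequence) pairs. A sketch of that core logic (the function name is mine, not from the answer above; feed it records from `SeqIO.parse` in practice):

```python
from collections import defaultdict

def map_by_prefix(lib_records, stool_records):
    """Map each library id (before any ';') to the ids of stool records
    whose sequence starts with that library sequence.
    Both arguments are iterables of (id, sequence) string pairs."""
    # Materialise the (smaller) library once so it can be scanned repeatedly.
    lib_pairs = [(lib_id.split(';')[0], seq) for lib_id, seq in lib_records]
    mapping = defaultdict(list)
    for stool_id, stool_seq in stool_records:
        for lib_id, lib_seq in lib_pairs:
            if stool_seq.startswith(lib_seq):
                mapping[lib_id].append(stool_id)
    return mapping

# With Biopython, you would call it roughly like this (filenames assumed):
# from Bio import SeqIO
# mapping = map_by_prefix(
#     ((r.id, str(r.seq)) for r in SeqIO.parse('libs.fasta', 'fasta')),
#     ((r.id, str(r.seq)) for r in SeqIO.parse('stool.fasta', 'fasta')),
# )
```

Keeping the matching separate from the file parsing also makes it easy to test on small hand-made inputs before running it on the full files.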