Merge two large text files by common row to one mapping file
In my opinion, the easiest way would be to use BLAST+...
Set up the larger file as a BLAST database and use the smaller file as the query...
Then just write a small script to analyse the output, e.g. take the top hit or two per query to create the mapping file.
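As a rough sketch of that parsing step: if you run BLAST+ with `-outfmt 6` (tabular output), column 1 is the query ID and column 2 the subject ID, and hits for each query are sorted best-first, so keeping the first line seen per query gives you the top hit. The filenames and IDs below are made up for illustration.

```python
import csv

def top_hits(blast_tabular_lines):
    """Build a query -> best-subject mapping from BLAST+ -outfmt 6 lines."""
    mapping = {}
    for row in csv.reader(blast_tabular_lines, delimiter='\t'):
        query_id, subject_id = row[0], row[1]
        # setdefault keeps only the first (i.e. best-scoring) hit per query.
        mapping.setdefault(query_id, subject_id)
    return mapping

# Hypothetical example: two queries, the first with two hits.
sample = [
    "q1\ts9\t100.0\t50\t0\t0\t1\t50\t1\t50\t1e-20\t90",
    "q1\ts3\t95.0\t50\t2\t0\t1\t50\t1\t50\t1e-15\t80",
    "q2\ts7\t99.0\t40\t0\t0\t1\t40\t1\t40\t1e-18\t85",
]
print(top_hits(sample))
```

You would feed it the real output file with `top_hits(open('results.tsv'))`, then write `mapping` out in whatever format you need.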
BTW, you might find SequenceServer (Google it) helpful in setting up a custom BLAST database and your BLAST environment...
BioPython
should be able to read in large FASTA files:
from Bio import SeqIO
from collections import defaultdict

mapping = defaultdict(list)

# Parse the library file once up front so it is not re-read for every stool record.
lib_seqs = [(rec.id, str(rec.seq)) for rec in SeqIO.parse('libs.fasta', 'fasta')]

for stool_record in SeqIO.parse('stool.fasta', 'fasta'):
    stool_seq = str(stool_record.seq)
    for lib_id, lib_seq in lib_seqs:
        # A match here means the stool sequence begins with the library sequence.
        if stool_seq.startswith(lib_seq):
            # Drop the ';'-delimited suffix from the library ID before recording.
            mapping[lib_id.split(';')[0]].append(stool_record.id)