A colleague needed to remove some individual sequences from a multi-fasta file. Googling didn’t reveal a canned way to do it, so I hacked up this script.

8.29.12 – As Jason Gallant pointed out, if your fasta file is very small you don’t need to index it. Just use the simple Biopython code he mentions in the comments.

  1. Biopython also does this pretty easily. Found this online, and used it successfully for extracting Trinity contigs from a Fasta file.

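    # Usage: %prog some.fasta wanted-list.txt
    from Bio import SeqIO
    import sys

    # Read the wanted sequence IDs, one per line
    wanted = set(line.strip() for line in open(sys.argv[2]))
    seqiter = SeqIO.parse(open(sys.argv[1]), 'fasta')
    # Write only the records whose ID appears in the wanted list
    SeqIO.write((seq for seq in seqiter if seq.id in wanted), sys.stdout, "fasta")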
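
The snippet above keeps the listed records, while the task described in the post was removing them. Below is a minimal sketch of that inverse filter using only the standard library. It is not the script from the post. The function name `filter_fasta`, the `keep` flag, and the file name in the usage comment are hypothetical, and it assumes a record's ID is the first whitespace-delimited token after `>`.

```python
import sys


def filter_fasta(lines, ids, keep=False):
    """Yield FASTA lines, dropping records whose ID is in ids.

    With keep=True the logic is inverted and only listed records are kept.
    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    emit = False  # anything before the first header line is skipped
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            # Emit the record when its membership in ids matches the keep flag
            emit = (record_id in ids) == keep
        if emit:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python filter_fasta.py some.fasta unwanted-list.txt
    with open(sys.argv[2]) as handle:
        unwanted = {line.strip() for line in handle if line.strip()}
    with open(sys.argv[1]) as fasta:
        sys.stdout.writelines(filter_fasta(fasta, unwanted))
```

Because it streams the file line by line, this should not need an index even for a large fasta, though it has not been tested on one.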

