Filtering contigs/chromosomes from a multi-fasta file

A colleague needed to remove some individual sequences from a multi-FASTA file. Googling didn’t reveal a canned way to do it, so I hacked up this script.

8.29.12 – As Jason Gallant pointed out, if your FASTA file is very small you don’t need to index it. Just use the simple Biopython code he mentions in the comments.
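For anyone who just wants the removal step without any dependencies, here is a minimal sketch in plain Python. It is not the script mentioned above: it uses no Biopython and no index, and simply streams the file line by line. The function name `filter_fasta` and the example IDs are made up for illustration, and it assumes a record's ID is the first whitespace-delimited token after the `>`.

```python
import io


def filter_fasta(in_handle, out_handle, unwanted_ids):
    """Copy FASTA records from in_handle to out_handle, skipping unwanted IDs.

    Returns the number of records that were kept.
    """
    unwanted = set(unwanted_ids)  # set membership tests are fast
    keep = True  # whether the record we are currently inside is kept
    kept = 0
    for line in in_handle:
        if line.startswith(">"):
            # Assumption: the ID is the first token on the header line
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            keep = record_id not in unwanted
            if keep:
                kept += 1
        if keep:
            out_handle.write(line)
    return kept


if __name__ == "__main__":
    # Tiny in-memory example; real use would pass open file handles instead
    demo = io.StringIO(">chr1 test\nACGT\n>chrUn_1\nGGCC\n>chr2\nTTAA\n")
    result = io.StringIO()
    filter_fasta(demo, result, ["chrUn_1"])
    print(result.getvalue(), end="")
```

Because it only ever holds one line in memory, this should cope with large files, but it does no validation of the FASTA format itself.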

2 thoughts on “Filtering contigs/chromosomes from a multi-fasta file”

  1. Biopython also does this pretty easily. Found this online, and used it successfully for extracting Trinity contigs from a Fasta file.

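    # Usage: %prog some.fasta wanted-list.txt
    from Bio import SeqIO
    import sys

    # IDs to keep, one per line; a set makes membership tests fast
    wanted = set(line.strip() for line in open(sys.argv[2]))
    seqiter = SeqIO.parse(open(sys.argv[1]), "fasta")
    # Write only the records whose ID appears in the wanted list
    SeqIO.write((seq for seq in seqiter if seq.id in wanted), sys.stdout, "fasta")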
