A colleague needed to remove some individual FASTA records from a multi-FASTA file. Googling didn't turn up a canned way to do it, so I hacked up this script.
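The idea fits in a few lines of plain Python. The sketch below is not the original script. It streams the file and drops every record whose ID appears in an "unwanted" set, and it assumes the ID is the first whitespace-delimited word after `>`, the same convention Biopython uses for `record.id`. All function names are hypothetical.

```python
import io
from typing import Iterator, List, Optional, Set, TextIO, Tuple


def fasta_records(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header_line, record_text) for each record, reading line by line."""
    header: Optional[str] = None
    lines: List[str] = []
    for line in handle:
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(lines)
            header, lines = line, [line]
        elif header is not None:  # sequence line; text before the first '>' is skipped
            lines.append(line)
    if header is not None:
        yield header, "".join(lines)


def record_id(header: str) -> str:
    """Assumed ID rule: first whitespace-delimited word after '>'."""
    parts = header[1:].split(maxsplit=1)
    return parts[0] if parts else ""


def remove_records(handle: TextIO, out: TextIO, unwanted: Set[str]) -> int:
    """Copy every record whose ID is not in `unwanted` to `out`; return how many were dropped."""
    dropped = 0
    for header, text in fasta_records(handle):
        if record_id(header) in unwanted:
            dropped += 1
        else:
            out.write(text)
    return dropped


def remove_records_from_text(fasta_text: str, unwanted: Set[str]) -> str:
    """Convenience wrapper for in-memory strings, handy for quick checks."""
    out = io.StringIO()
    remove_records(io.StringIO(fasta_text), out, unwanted)
    return out.getvalue()
```

To use it on real files, open the FASTA file and a list of IDs (one per line), build a set from the IDs, and pass `sys.stdout` or an output file as `out`. Because records are streamed one at a time, memory use stays small even for large files.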

8.29.12 – As Jason Gallant pointed out, if your FASTA file is small you don't need to index it. Just use the simple Biopython code he posted in the comments.
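For a large file, an index avoids rereading everything for each lookup. You scan the file once, record the byte offset and length of every record, and then seek straight to the records you need. Biopython's `SeqIO.index` and `samtools faidx` already do this properly. The following is only a minimal stdlib sketch of the idea, not either tool's implementation. It assumes IDs are the first word of each header, and the function names are hypothetical.

```python
import io
from typing import BinaryIO, Dict, Iterable, Optional, Tuple

# Maps record ID -> (byte offset of its '>' line, record length in bytes)
FastaIndex = Dict[bytes, Tuple[int, int]]


def build_fasta_index(handle: BinaryIO) -> FastaIndex:
    """One pass over the file, noting where each record starts and how long it is.

    Duplicate IDs: the last occurrence wins.
    """
    index: FastaIndex = {}
    handle.seek(0, io.SEEK_SET)
    current_id: Optional[bytes] = None
    start = offset = 0
    for line in handle:  # binary mode, so len(line) is the exact byte count
        if line.startswith(b">"):
            if current_id is not None:
                index[current_id] = (start, offset - start)
            parts = line[1:].split(maxsplit=1)
            current_id = parts[0] if parts else b""
            start = offset
        offset += len(line)
    if current_id is not None:
        index[current_id] = (start, offset - start)
    return index


def fetch_records(handle: BinaryIO, index: FastaIndex, ids: Iterable[bytes]) -> bytes:
    """Seek directly to each requested record instead of rescanning the file."""
    chunks = []
    for rec_id in ids:
        start, length = index[rec_id]
        handle.seek(start, io.SEEK_SET)
        chunks.append(handle.read(length))
    return b"".join(chunks)
```

To drop records rather than fetch them, pass every ID except the unwanted ones, for example `[i for i in index if i not in unwanted]`. Dicts keep insertion order, so the output stays in file order. The index pays off when you save it to disk or reuse it for many lookups. A single filtering pass over a small file gains little, which is the point of the note above.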


2 Responses to Filtering contigs/chromosomes from a multi-fasta file

  1. Biopython also does this pretty easily. I found this online and used it successfully to extract Trinity contigs from a FASTA file.

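    """
    %prog some.fasta wanted-list.txt
    """
    import sys

    from Bio import SeqIO

    # Read the IDs to keep, one per line; a set makes each membership test O(1)
    with open(sys.argv[2]) as handle:
        wanted = set(line.strip() for line in handle if line.strip())

    # Stream the FASTA records and write only those whose ID is in the list
    records = SeqIO.parse(sys.argv[1], "fasta")
    SeqIO.write((rec for rec in records if rec.id in wanted), sys.stdout, "fasta")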

    • Nick says:

      Biopython works well if you have a small concatenated FASTA file. If it's huge, you'll need an index to find contigs quickly.
