A colleague needed to remove some individual FASTA records from a multi-FASTA file. Googling didn’t reveal a canned way to do it, so I hacked up this script. 8.29.12 – As Jason Gallant pointed out, if your FASTA file is very small you don’t need to index it. Just use the simple Biopython code he mentions in […]
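As a rough illustration of the non-indexed approach for small files, here is a minimal sketch in plain Python 3 using only the standard library. It is not the script from the post and not the Biopython code mentioned above; the names read_fasta and drop_records are made up for this example, and it assumes a record's ID is the first whitespace-delimited word of its header line.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_records(lines, ids_to_remove):
    """Yield every record whose ID is NOT in ids_to_remove.

    Assumption: the ID is the first whitespace-delimited word of the header.
    """
    unwanted = set(ids_to_remove)
    for header, seq in read_fasta(lines):
        words = header.split()
        record_id = words[0] if words else ""
        if record_id not in unwanted:
            yield header, seq


# Tiny in-memory example; a real run would pass an open file handle instead.
example = [">seq1 first\n", "ACGT\n", "GGCC\n", ">seq2\n", "TTTT\n", ">seq3\n", "AAAA\n"]
kept = list(drop_records(example, ["seq2"]))
```

Because both functions are generators, the file is streamed one record at a time, which is fine for small inputs; for very large files an index is probably still the better choice.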
I’ve been working with a lot of very large files, and it has become increasingly obvious that using a single processor core is a major bottleneck to getting my data processed in a timely fashion. A MapReduce-style algorithm seemed like the way to go, but I had a hard time finding a useful example. […]
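For readers who want a feel for the idea, here is a minimal sketch of a MapReduce-style job using Python 3's standard multiprocessing module. It is not the example from the post; the task (counting bases across sequences) and the names chunked, map_chunk, merge_counts, and parallel_count are all assumptions chosen for illustration.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_chunk(sequences):
    """Map step: count the characters in one chunk of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def parallel_count(sequences, chunk_size=2, processes=2):
    """Run the map step on a pool of worker processes, then reduce serially."""
    chunks = chunked(sequences, chunk_size)
    with Pool(processes=processes) as pool:
        partials = pool.map(map_chunk, chunks)
    return reduce(merge_counts, partials, Counter())
```

When running this as a script, call parallel_count(...) from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. The map and reduce functions are deliberately plain top-level functions: that keeps them picklable for the pool and lets you check them serially before going parallel.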
Here’s a simple script for interleaving paired-end FASTQ files. You’ll need to do this if you want to create input files for velvet. Unlike velvet’s shuffleSequences_fastq.pl Perl script, this script handles gzipped input and output. It requires Python 2.7.
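The interleaving itself can be sketched roughly as follows. This is not the script from the post (which targets Python 2.7); it is a small Python 3 illustration, and the helper names open_maybe_gzip, fastq_records, and interleave are invented for it. It assumes four-line FASTQ records and decides on gzip purely from a ".gz" file extension.

```python
import gzip
from itertools import islice, zip_longest


def open_maybe_gzip(path, mode="rt"):
    """Open a text file, going through gzip when the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def fastq_records(handle):
    """Yield FASTQ records as lists of four newline-terminated lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        # Normalise line endings so a missing final newline cannot glue records together.
        yield [line.rstrip("\r\n") + "\n" for line in record]


def interleave(left_path, right_path, out_path):
    """Write records alternately from both inputs; return the number of pairs."""
    pairs = 0
    with open_maybe_gzip(left_path) as left, \
            open_maybe_gzip(right_path) as right, \
            open_maybe_gzip(out_path, "wt") as out:
        for rec1, rec2 in zip_longest(fastq_records(left), fastq_records(right)):
            if rec1 is None or rec2 is None:
                raise ValueError("input files contain different numbers of records")
            out.writelines(rec1)
            out.writelines(rec2)
            pairs += 1
    return pairs
```

A call such as interleave("reads_1.fastq.gz", "reads_2.fastq.gz", "interleaved.fastq.gz") (file names hypothetical) would write the pairs in alternating order. The sketch does not check that read names in the two files actually match, so it relies on both inputs being in the same order.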
Just wanted to recommend Blue Collar Bioinformatics, a slick blog with lots of useful bioinformatics scripts. Everything is written in Python and the full working source is typically available in a Git repository.
I just spent an hour figuring out how to emulate BLAST’s “Short Sequence Search Parameters” in Biopython 1.48. To use PAM30 as your matrix you must use existence and extension parameters (i.e., gap costs) of 9 and 1. Here’s what I’ve currently got: result_handle = NCBIWWW.qblast("blastp", "nr", seq_record.seq.tostring(), matrix_name='BLOSUM62', word_size='2', expect='30000', gapcosts='9 […]