This looks neato. One of the first papers to use the Agilent Tech to do targeted re-sequencing.  I can’t wait to get my hands on a PDF.

The impressive economy of this paper is that they targeted (using Agilent chips) less than 30Mb of the human genome, which is less than 1%. They also worked with very few samples; only about 30 cases of Miller Syndrome have been reported in the literature. While I’ve expressed some reservations about “exome sequencing”, this paper does illustrate why it can be very cost effective, and my objection (perhaps not made clear enough before) is more a worry about being too restricted to “exomes” and less about targeting.

More @ Omics! Omics!

via Targeted Sequencing Bags a Rare Disease.

I’ve got a bunch of RNA-seq reads I need to analyze and for the most part I’ve been writing my own code to do the analysis.  However, a recent paper in Bioinformatics (Wang et al. 2009) describes a new R package for the identification of differentially expressed genes in RNA-seq datasets.  R is a pretty straightforward language with a built-in installation system so I should just have to type two lines of code…

source("http://bioconductor.org/biocLite.R")
biocLite("DEGseq")

Not so quick. When I run this code, R tells me it can’t find the DEGseq library. A bit more poking around on the internets and I discover that there’s an alternate download site:

source("http://bioinfo.au.tsinghua.edu.cn/software/degseq/DEGseqInstall.R")

But after installing some dependencies it also spits out a bunch of errors.  I compare the errors… Hmmm… Both installs appear to be dying on the tcl/tk install, but tcltk is a default R library.  I can see it right there in “/Library/Frameworks/R.framework/Resources/library”.  Two hours later and after trying a bunch of crap I find this helpful website:

http://cran.r-project.org/bin/macosx/tools/

A quick and dirty install of the tcltk-8.5.5-x11.dmg and now “library(tcltk)” works like a charm.  No errors.

I install DEGseq with the following set of commands:

source("http://bioconductor.org/biocLite.R")
biocLite("DEGseq")

Now, a day and a half later, I can see if it’s useful. Woo.

Citations:

L Wang, Z Feng, X Wang, X Wang, X Zhang. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics (2009)

Just wanted to thank my students, all ~44 of you, for putting up with my stammering explanations of genetics. I sure learned a ton and I hope you did too. Good luck with your finals.

I’m not all that great at RegEx, but I needed to split a line of text on commas followed by spaces and/or on whitespace alone (including tabs). 30 minutes later, after swearing and sweating with iPython, I produced the following little expression. Who needs the CSV module?

re.compile(r',\s*|\s+')

Example Usage:

import re
line = 'one, two three          four,           five'
pattern = re.compile(r',\s*|\s+')
pattern.split(line)
>>> ['one', 'two', 'three', 'four', 'five']
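For anyone who wants to paste this into a script, here is a minimal sketch of the same split wrapped in a function. The name split_fields is made up for illustration, and the whitespace branch uses \s+ rather than \s* because Python 3.7 and later also split on empty matches, which would otherwise chop the line into single characters.

```python
import re

# Split on a comma plus any trailing whitespace, or on a run of whitespace.
# \s+ (not \s*) keeps the pattern from ever matching the empty string.
FIELD_SEP = re.compile(r',\s*|\s+')


def split_fields(line):
    """Hypothetical helper: split a line on commas and/or whitespace."""
    # Dropping empty strings handles leading/trailing separators and
    # cases like 'a ,b', where the space and the comma match separately.
    return [field for field in FIELD_SEP.split(line.strip()) if field]


print(split_fields('one, two three\t\tfour,           five'))
```

Note that filtering out empty strings also throws away genuinely empty fields (two commas in a row), so for real CSV data the csv module is still the safer bet.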

No quiz for my Monday classes until March 23rd.  My Tuesday class should be prepared for a quiz on chapters 7 and 8 on Tuesday the 17th.

I wrote a very simple TextMate bundle for working with NEXUS files.

Version 1 Functionality: folds NEXUS blocks, highlights the bayes block mcmc line, and typing command-B will automatically calculate burnin at 25%. More to come as I think of it…. Probably contains bugs.
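The burnin arithmetic is easy to sketch outside of TextMate. The snippet below is not the bundle’s code, just a rough Python illustration. It assumes the mcmc line carries ngen and samplefreq settings and that burnin is counted in sampled generations (ngen divided by samplefreq), with 25% of those discarded. The function name and the regex are made up for the example.

```python
import re


def burnin_from_mcmc(mcmc_line, fraction=0.25):
    """Hypothetical helper: estimate burnin from a bayes block mcmc line.

    Assumes the line contains ngen=<int> and samplefreq=<int>.
    """
    # Collect every key=<integer> pair on the line, e.g. ngen=1000000.
    settings = {key.lower(): int(value)
                for key, value in re.findall(r'(\w+)\s*=\s*(\d+)', mcmc_line)}
    # Number of samples actually written out during the run.
    samples = settings['ngen'] // settings['samplefreq']
    # Discard the first `fraction` of those samples as burnin.
    return int(samples * fraction)


print(burnin_from_mcmc('mcmc ngen=1000000 samplefreq=100 nchains=4;'))
```

With a million generations sampled every 100, that works out to 10,000 samples and a burnin of 2,500.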

Download nexus bundle version 1. (03/05/09)