Blue Collar Bioinformatics
Just wanted to recommend Blue Collar Bioinformatics, a slick blog with lots of useful bioinformatics scripts. Everything is written in Python, and the full working source is typically available in a Git repository.
F$@%ing R: Adventures with Tcltk in OS X
I've got a bunch of RNA-seq reads I need to analyze, and for the most part I've been writing my own code to do the analysis. However, a recent paper in Bioinformatics (Wang et al. 2009) describes a new R package for identifying differentially expressed genes in RNA-seq datasets. R is a pretty straightforward language with a built-in installation system, so I should just have to type two lines of code...
source("http://bioconductor.org/biocLite.R")
biocLite("DEGseq")
Not so quick. When I run this code, R tells me it can't find the DEGseq package. A bit more poking around on the internets and I discover that there's an alternate download site:
source("http://bioinfo.au.tsinghua.edu.cn/software/degseq/DEGseqInstall.R")
But after installing some dependencies, it also spits out a bunch of errors. I compare the errors... Hmmm... Both installs appear to be dying on the tcl/tk install, but tcltk is a default R library. I can see it right there in "/Library/Frameworks/R.framework/Resources/library". Two hours later, after trying a bunch of crap, I find this helpful website:
http://cran.r-project.org/bin/macosx/tools/
A quick and dirty install of tcltk-8.5.5-x11.dmg and now "library(tcltk)" works like a charm. No errors.
I install DEGseq with the following set of commands:
source("http://bioconductor.org/biocLite.R")
biocLite("DEGseq")
Now, a day and a half later, I can see if it's useful. Woo.
Citations:
L Wang, Z Feng, X Wang, X Wang, X Zhang. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics (2009)
PYTHON Quick Links
I write a lot of code in the Python programming language. I just gave a very brief overview to a friend who has to learn it this summer. In the course of this lesson, it occurred to me that a lot of the bioinformatics resources I use every day are not collected in one place. So I've listed some of the most useful modules/packages below:
- Biopython
- This package lets you interface with NCBI, parse data files (e.g. FASTA, GenBank, BLAST output, etc.), run BLAST queries, run ClustalW, etc.
- SciPy
- N-dimensional array manipulation (via NumPy, which SciPy builds on)
- Matplotlib
- Graphing.
- Python DB API
- Database integration
- Google App Engine
- Free web hosting of Python CGI scripts. It's in beta.
- Django
- Python web application development package. It can be used in conjunction with Google App Engine.
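To give a feel for what two of these do, here is a minimal sketch using only the standard library. It is not Biopython's API: the parse_fasta and load_into_db helpers are hypothetical names made up for illustration, the FASTA text is toy data, and the code assumes a current Python 3 rather than the 2.5 release referenced below. It reads FASTA-style records by hand and stores them through the Python DB API, using the sqlite3 module as the database driver.

```python
import io
import sqlite3


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted file handle.

    Hypothetical helper for illustration; Biopython ships a far more
    complete parser.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


def load_into_db(records):
    """Store records in an in-memory SQLite database via DB API calls."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE seqs (name TEXT PRIMARY KEY, seq TEXT)")
    # '?' placeholders let the driver handle quoting of the values.
    cur.executemany("INSERT INTO seqs VALUES (?, ?)", records)
    conn.commit()
    return conn


# Made-up example data, not real sequences from any database.
EXAMPLE_FASTA = ">seq1 toy record\nACGT\nACGT\n>seq2\nGGCC\n"

records = list(parse_fasta(io.StringIO(EXAMPLE_FASTA)))
conn = load_into_db(records)

cur = conn.cursor()
cur.execute("SELECT name, LENGTH(seq) FROM seqs ORDER BY name")
lengths = cur.fetchall()
print(lengths)
```

The connect/cursor/execute/fetchall pattern is the part that carries over to other DB API drivers, though the placeholder style ('?' here) differs from driver to driver.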
Here are a few additional sites that I find useful:
- Python 2.5 Quick Reference
- Both HTML and PDF versions are available for free!
- TextMate
- OS X text editor. It's not free, but there is a student discount available.
- Forklift
- OS X FTP program. Also not free, but reasonably priced.