I recently had to design qPCR primers for some genes. I had a genome and an annotated GTF file derived from Cufflinks. Since I wanted the primers to span introns, to prevent the amplification of genomic DNA, I needed both a FASTA file of coding sequence to use as input to Primer3 and some information about where the introns were spliced out, so I could make sure the primers I designed spanned introns. So I used the Cufflinks gffread command to convert the GTF file into a set of FASTA transcripts annotated with the positions of each exon.
Once you have Cufflinks installed, type
gffread -h
to make sure everything is copacetic. Assuming that prints a bunch of documentation to the terminal, you can then run:
grep 'GeneID' my.gtf | gffread -g my.fasta -W -x GeneID.fasta
You’ll need to know how your gene or transcript is labeled in the GTF file (e.g. ‘GeneID’). The ‘-x’ flag ensures that the results only include coding sequence; the ‘-W’ flag adds the exon coordinates to the FASTA header. Then run
head GeneID.fasta
to check that sequences were added to the output file.
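One could probably automate the next steps further, but basically you just need to copy the first couple of exons (the ‘segs’) from the file and add square brackets around the bases at the splice junction you want the primers to span. Primer3 treats the bracketed region as a target that the primers must flank. You can use the ‘segs’ in the FASTA header to determine where the splice junctions are: in the example below, segs:1-333,334-490 puts the first junction between bases 333 and 334.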
>ENS0123412 gene=bactin loc:3(-)22053-225439 segs:1-333,334-490,491-518
ATGGCTCAGAGAGATGCTGACAAATACCTCTATGTGGATAGAAATCTCATCAACAACCCTCTTGCTCAGG
CCGATTGGGCAGCTAAGAAACTGGTGTGGGTCCCATCAGAAAAGAATGGCTTTGAGCCTGCTAGCTTAAA
AGAGGAAGTAGGAGATGAAGCCATTGTGGAGCTTGCAGAGAACGGGAAGAAAGTGCGAGTAAACAAAGAT
GATATCCAAAAGATGAACCCGCCTAAGTTCTCTAAAGTGGAAGACATGGCTGAATTGACCTGCCTGAATG
AGGCCTCTGTGTTGCACAACTTAAAGGAACGATACTACTCGGGGCTT[ATCTATACCTACT]CAGGCCTA
CTGTGTGGTCATAAATCCCTACAAGAACTTGCCCATCTACTCAGAAGAGATTGTGGAAATGTATAAGGGC
AAAAAGAGACACGAGATGCCCCCTCACATCTATGCCATTACAGACACAGCCTACAGGAGTATGATGCAAG
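The bracket step could be scripted rather than done by hand. Below is a minimal Python sketch, not something from my original workflow: it assumes the header carries a `segs:` field in the form shown above, and the function names and the default flank sizes (6 bases upstream, 7 downstream, chosen to reproduce the 13-base target in the example) are my own hypothetical choices.

```python
import re


def junctions_from_header(header):
    """Return the 1-based position of the last base of every exon but the final one.

    Assumes a gffread -W style header containing e.g. 'segs:1-333,334-490,491-518'.
    """
    match = re.search(r"segs:([\d,\-]+)", header)
    if match is None:
        raise ValueError("no segs: field found in header")
    segs = [tuple(int(x) for x in seg.split("-")) for seg in match.group(1).split(",")]
    return [end for _start, end in segs[:-1]]


def bracket_junction(seq, junction, flank_left=6, flank_right=7):
    """Wrap [ ] around the bases on either side of a splice junction.

    junction is the 1-based position of the last base of the upstream exon.
    flank_left / flank_right are hypothetical defaults, not Primer3 settings.
    """
    start = junction - flank_left   # 0-based index of the first bracketed base
    end = junction + flank_right    # 0-based index just past the last bracketed base
    if start < 0 or end > len(seq):
        raise ValueError("flanks run off the end of the sequence")
    return seq[:start] + "[" + seq[start:end] + "]" + seq[end:]


if __name__ == "__main__":
    header = ">ENS0123412 gene=bactin loc:3(-)22053-225439 segs:1-333,334-490,491-518"
    print(junctions_from_header(header))          # -> [333, 490]
    toy = "A" * 10 + "C" * 10                      # toy transcript, junction after base 10
    print(bracket_junction(toy, 10, 2, 3))         # -> AAAAAAAA[AACCC]CCCCCCC
```

With the real sequence, strip the line breaks first, then call `bracket_junction(seq, 333)` to bracket bases 328-340, the same region marked by hand above.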
Lastly, use the modified FASTA as input to Primer3. You’ll need to set the PCR product size range to 70-200 base pairs, but the default optimal Tm of 60 °C should be fine.
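If you would rather run the command-line primer3_core than paste into the web form, the same settings can be written as a Boulder-IO record. This is a rough Python sketch under that assumption: the helper name `boulder_record` is hypothetical, while the tag names (SEQUENCE_ID, SEQUENCE_TEMPLATE, SEQUENCE_TARGET, PRIMER_FIRST_BASE_INDEX, PRIMER_PRODUCT_SIZE_RANGE, PRIMER_OPT_TM) are taken from the Primer3 manual. The brackets are converted into a SEQUENCE_TARGET entry because the Boulder-IO template is plain sequence.

```python
def boulder_record(seq_id, bracketed_seq, size_range="70-200", opt_tm=60.0):
    """Build one Primer3 Boulder-IO record from a sequence with a single [target]."""
    open_idx = bracketed_seq.index("[")
    close_idx = bracketed_seq.index("]")
    template = bracketed_seq.replace("[", "").replace("]", "")
    target_start = open_idx + 1               # 1-based position of first bracketed base
    target_len = close_idx - open_idx - 1     # number of bases inside the brackets
    lines = [
        "PRIMER_FIRST_BASE_INDEX=1",          # report and read positions as 1-based
        f"PRIMER_PRODUCT_SIZE_RANGE={size_range}",
        f"PRIMER_OPT_TM={opt_tm}",
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        f"SEQUENCE_TARGET={target_start},{target_len}",
        "=",                                  # a lone '=' terminates the record
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy input; substitute the bracketed transcript sequence (no line breaks).
    print(boulder_record("toy", "AAAAAAAA[AACCC]CCCCCCC"), end="")
```

Writing the returned text to a file and feeding it to primer3_core on standard input should then pick primers under the same constraints, though I have only used the web form for this myself.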
The final result should look something like this:
PRIMER PICKING RESULTS FOR ENS0123412 gene=bactin loc:3(-)22053-225439 segs:1-333,334-490,491-518

No mispriming library specified
Using 1-based sequence positions
OLIGO           start  len      tm     gc%   any    3' seq
LEFT PRIMER       275   20   59.83   50.00  8.00  0.00 TGAATGAGGCCTCTGTGTTG
RIGHT PRIMER      450   20   60.03   55.00  5.00  3.00 TAGATGTGAGGGGGCATCTC
SEQUENCE SIZE: 488
INCLUDED REGION SIZE: 488

PRODUCT SIZE: 176, PAIR ANY COMPL: 4.00, PAIR 3' COMPL: 2.00
TARGETS (start, len)*: 328,13

    1 ATGGCTCAGAGAGATGCTGACAAATACCTCTATGTGGATAGAAATCTCATCAACAACCCT

   61 CTTGCTCAGGCCGATTGGGCAGCTAAGAAACTGGTGTGGGTCCCATCAGAAAAGAATGGC

  121 TTTGAGCCTGCTAGCTTAAAAGAGGAAGTAGGAGATGAAGCCATTGTGGAGCTTGCAGAG

  181 AACGGGAAGAAAGTGCGAGTAAACAAAGATGATATCCAAAAGATGAACCCGCCTAAGTTC

  241 TCTAAAGTGGAAGACATGGCTGAATTGACCTGCCTGAATGAGGCCTCTGTGTTGCACAAC
                                        >>>>>>>>>>>>>>>>>>>>
  301 TTAAAGGAACGATACTACTCGGGGCTTATCTATACCTACTCAGGCCTACTGTGTGGTCAT
                                 *************
  361 AAATCCCTACAAGAACTTGCCCATCTACTCAGAAGAGATTGTGGAAATGTATAAGGGCAA

  421 AAAGAGACACGAGATGCCCCCTCACATCTATGCCATTACAGACACAGCCTACAGGAGTAT
                <<<<<<<<<<<<<<<<<<<<
  481 GATGCAAG
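It is worth double-checking that the chosen pair really sits on either side of the junction. Here is a small Python sketch of that arithmetic; the function name is hypothetical, and it assumes Primer3's convention that the right primer's "start" is its 5' end, i.e. the rightmost template position it covers.

```python
def amplicon_check(left_start, left_len, right_start, right_len, junction):
    """Return (product_size, flanks_junction) for a primer pair.

    All positions are 1-based. junction is the last base of the upstream exon.
    right_start is the rightmost template base covered by the right primer.
    """
    left_end = left_start + left_len - 1          # last base of the left primer
    right_begin = right_start - right_len + 1     # leftmost base of the right primer
    product_size = right_start - left_start + 1
    flanks = left_end <= junction < right_begin   # primers lie in different exons
    return product_size, flanks


if __name__ == "__main__":
    # Values from the Primer3 output above; junction 333 comes from segs:1-333,334-490.
    print(amplicon_check(275, 20, 450, 20, 333))  # -> (176, True)
```

The left primer ends at base 294 and the right primer begins at base 431, so the 176 bp product crosses the exon boundary at 333/334 and a genomic template would have to include the intron.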