Finding Open Reading Frames Using the NCBI "ORF Finder"
Download the ORF Worksheet by double-clicking on the link... it should open in Word.
Examine the double-stranded sequence and notice that there are three possible ways to group the nucleotides into groups of three for each of the two strands... therefore there are six possible translations of that double-stranded sequence.
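If you would like to see the six groupings spelled out, here is a minimal Python sketch. The function names, the frame labels (+1 to +3 for the top strand, -1 to -3 for the opposite strand) and the example sequence are made up for illustration; they are not taken from the worksheet, and ORF Finder may label its frames differently.

```python
# Complement of each DNA base (uppercase, unambiguous bases only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def codons(seq, offset):
    """Group seq into complete triplets, starting at offset 0, 1, or 2."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq):
    """Return a dict mapping a frame label to that frame's list of codons."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons(seq, offset)  # top strand
        frames[f"-{offset + 1}"] = codons(rc, offset)   # opposite strand
    return frames


if __name__ == "__main__":
    example = "ATGGCCTAAGT"  # made-up sequence, not the worksheet's
    for name, triplets in six_frames(example).items():
        print(name, " ".join(triplets))
```

Each strand is read 5' to 3', so the three frames on the opposite strand are taken from the reverse complement rather than from the top strand read backwards.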
Check the translated sequences for errors by referring to the table of the genetic code.
Which of the six translations is without a stop codon? You are welcome to try working through the second page of the worksheet by hand, but there is a short cut.
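The stop-codon check can also be sketched in a few lines of Python. This is only an illustration: it assumes the standard genetic code (stop codons TAA, TAG and TGA), and the function name and frame labels are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def frames_without_stop(seq):
    """Return the labels of the reading frames that contain no stop codon."""
    seq = seq.upper()
    # "+" is the sequence as given; "-" is its reverse complement.
    strands = {"+": seq, "-": seq.translate(COMPLEMENT)[::-1]}
    open_frames = []
    for sign, strand in strands.items():
        for offset in range(3):
            triplets = [strand[i:i + 3]
                        for i in range(offset, len(strand) - 2, 3)]
            if not any(t in STOP_CODONS for t in triplets):
                open_frames.append(f"{sign}{offset + 1}")
    return open_frames


if __name__ == "__main__":
    # Made-up sequence, not the worksheet's.
    print(frames_without_stop("ATGGCCTAAGT"))
```

On a very short sequence several frames may be free of stop codons just by chance; the longer the sequence, the more unusual a stop-free frame becomes.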
Highlight the sequence from page two of the worksheet and copy it.
Google NCBI and click on the top link. Then search the NCBI website for "orf finder" as indicated below.
Paste the sequence into the window and insert a header line beginning with a ">" sign above it so that your sequence entry follows FASTA format.
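A FASTA entry is simply one header line that begins with ">" followed by one or more lines of sequence. The sketch below builds and reads such an entry; the function names, the header text and the 70-character line width are illustrative choices, not requirements of ORF Finder.

```python
def to_fasta(header, sequence, width=70):
    """Format a sequence as a FASTA record: '>' header, then wrapped sequence."""
    cleaned = "".join(sequence.split()).upper()  # drop spaces and line breaks
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse a single-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA record must begin with a '>' header line")
    return lines[0][1:].strip(), "".join(lines[1:])


if __name__ == "__main__":
    # Made-up header and sequence for illustration.
    record = to_fasta("worksheet sequence", "atg gcc\ntaa")
    print(record)
    print(parse_fasta(record))
```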
Click the "Six Frames" button. Notice the little green and red lines on the six long boxes. Those represent "start" and "stop" codons. Click on each of the sequences in turn to see the translations.
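To get a feel for what those start and stop marks mean, here is a rough Python sketch that scans one reading frame for stretches running from a start codon to the next in-frame stop codon. It assumes ATG as the only start codon and the standard stop codons; ORF Finder itself has its own settings and may report different results. The function name and example sequence are made up.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(strand, offset):
    """Return (start, end) index pairs of ORFs in one frame of one strand.

    start is the 0-based index of the A in ATG; end is the index just past
    the stop codon, so strand[start:end] is the whole ORF including the stop.
    """
    strand = strand.upper()
    orfs = []
    start = None
    for i in range(offset, len(strand) - 2, 3):
        codon = strand[i:i + 3]
        if start is None and codon == START_CODON:
            start = i                      # first ATG opens an ORF
        elif start is not None and codon in STOP_CODONS:
            orfs.append((start, i + 3))    # in-frame stop closes it
            start = None
    return orfs


if __name__ == "__main__":
    example = "ATGAAATAGCCATGCCCTGA"  # made-up sequence
    for offset in range(3):
        print("frame", offset + 1, find_orfs(example, offset))
```

To scan the other three frames, pass the reverse complement of the sequence as the strand. A start codon that is never followed by an in-frame stop is ignored in this sketch.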
Which of the six reading frames do you suppose is the real one?
A Real Example...
Go back to the main ORF Finder page and bookmark it. Then click on BLAST.
Search the Nucleotide Database for "Short nearly exact matches."
Click on the first match. It should be AY484747, which is the Mytilus edulis complete mitochondrial DNA sequence, all 16,740 base pairs.
Scroll down about one third of the record until you get to the gene for COX III which should be at nucleotides 7600..8392. Read the notes and then click on the CDS link. That automatically extracts the sequence for just the COX III gene.
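The coordinates in a GenBank record, such as 7600..8392, count from 1 and include both end points. If you ever extract a feature yourself, for example in Python, where positions count from 0, the conversion looks like the sketch below. The function names and the short test sequence are made up for illustration.

```python
def extract_feature(sequence, start, end):
    """Return the subsequence for a 1-based, inclusive range like 7600..8392."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and exclude the end index.
    return sequence[start - 1:end]


def feature_length(start, end):
    """Number of bases in a 1-based, inclusive range."""
    return end - start + 1


if __name__ == "__main__":
    print(feature_length(7600, 8392))           # length of the COX III range
    print(extract_feature("ACGTACGTAC", 3, 6))  # made-up sequence
```

For the range 7600..8392, feature_length gives 793 bases.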
Convert that extracted sequence to FASTA format... then copy and paste it to a notepad file. Notice the header is now totally wrong. Change it to > Mytilus edulis Cytochrome C Oxidase Subunit III.
Copy and paste the nucleotide sequence into ORF Finder. What happens? What does the open reading frame look like?
BREAK
Next, let us use ORF Finder to look at a smaller protein, but one that is encoded by a nuclear gene, M22877, Human Cytochrome C.