Welcome to Homework Part II. This exercise builds on all the previous sessions and provides an example of how sequences are used to answer biological questions.
Skills developed through this exercise:
Research Question:
Are whales and dolphins the sister group to the Artiodactyls (even-toed ungulates)? Or should they be placed within the Artiodactyls as the sister group to the Hippopotami? In other words, are whales a kind of even-toed ungulate, as hippos are? For background, see pages 559–561 of your textbook, "Whale Evolution: A Case History" (in Biological Science, 2nd ed., by Scott Freeman). There is also a more detailed discussion here.
To answer our research question, we will build a phylogenetic tree of relatedness using protein sequence data from the National Center for Biotechnology Information (NCBI).
STEP ONE - Obtaining an appropriate Cytochrome b protein sequence for the analysis
Try searching "All Databases" for "Mirounga" (the elephant seal genus).

Next, try clicking on "PubMed Central: free, full text journal articles."
It's easy to get side-tracked!
Go ahead and refine the search a bit by clicking "Protein" and adding the search modifier for "organism" like this:
Mirounga [orgn]
That should reduce the number of hits a bit. Adding "cytochrome b" with quotes like this should help a lot:
Mirounga [orgn] "cytochrome b"
Finally, if you add the "[prot]" search modifier, which limits "cytochrome b" to the protein name field, like this:
Mirounga [orgn] "cytochrome b" [prot]
...it should knock it down to about eight hits that include the Cytochrome b sequences for Mirounga leonina and Mirounga angustirostris.
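The same search can also be expressed as a request to NCBI's E-utilities, which is handy if you ever want to repeat it from a script. As a minimal sketch, assuming Python 3.9+ and the ESearch endpoint, the function below only builds the request URL and downloads nothing; the name `build_esearch_url` and the `retmax=20` cap are illustrative choices, not part of this exercise. If you do send such requests, NCBI asks scripted users to keep to a few requests per second.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (ESearch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, db: str = "protein", retmax: int = 20) -> str:
    """Return an ESearch URL for `term` in database `db` (no request is sent).

    retmax caps how many matching IDs the server would return (illustrative value).
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces, quotes and the [field] brackets for us.
    return ESEARCH_URL + "?" + urlencode(params)


# The same query typed into the NCBI search box above.
query = 'Mirounga [orgn] "cytochrome b" [prot]'
print(build_esearch_url(query))
```

Pasting the printed URL into a browser should return an XML list of matching protein IDs, which you can compare against the hits you saw on the web page.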
Create a folder called "SEQs" somewhere on your hard drive where you can find it again (perhaps write down the pathname). Click on YP_778785 and make sure it is Cytochrome b from Mirounga leonina and is 379 amino acids long. Scan the sequence record. If you ever need the DNA sequence that codes for this protein, you can click on the CDS link near the bottom, by the amino acid sequence. Change the display from GenPept to FASTA format using the drop-down menu. Then copy and paste the FASTA format sequence into Notepad and save it in your SEQs folder as a text file named "Cytb_M_leonina.txt". You now know how to find and retrieve a sequence from the protein sequence database at NCBI.
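A FASTA file is just a header line starting with ">" followed by one or more lines of sequence. As a minimal sketch (Python 3.9+; `read_fasta` is a hypothetical helper, not an NCBI tool), the reader below shows how a program sees your saved file and lets you double-check the sequence length. The demo string is a 20-residue fragment taken from the start of the Platypus sequence in Step Three, used only to show the call.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs.

    The header is the first line of each entry without its leading '>'.
    Whitespace inside sequence lines is removed, so a sequence pasted in
    space-separated blocks still comes back as one continuous string.
    """
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Demo fragment only; replace it with the text of your Cytb_M_leonina.txt file.
demo = ">example cytochrome b fragment\nMNNLRKTHPL\nIKIVNH SFID\n"
for header, seq in read_fasta(demo):
    print(header, len(seq))
```

Run on the real file, the sequence length should come out as 379, matching the record you checked on the NCBI page.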
STEP TWO - Using one sequence to obtain others.
We now have to retrieve about nine more sequences from the database. There is a convenient way to do this quickly.
Go back to the NCBI home page (Google "NCBI" if you have to).
Click on the "Blast" tab at the top of the NCBI home page.
The Basic Local Alignment Search Tool (BLAST) lets you search a sequence against a sequence database to find similar sequences. Kind of like "Googling" sequences.
Because BLAST is a heuristic, its alignments are rough and it is less sensitive than exhaustive methods such as Smith-Waterman alignment, but it is VERY fast.
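BLAST gets its speed by first looking for short "words" that the query and a database sequence share (seeds), and only then extending those seeds into alignments. The toy sketch below (Python 3.9+; function names are hypothetical) shows just the seeding idea. It is deliberately simplified: real protein BLAST also accepts similar "neighborhood" words scored with a substitution matrix such as BLOSUM62, extends each seed, and scores the result, none of which is done here. The word length k=3 is an illustrative choice.

```python
from collections import defaultdict


def word_index(seq: str, k: int = 3) -> dict[str, list[int]]:
    """Map every length-k word in seq to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def seed_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where both share an identical k-word."""
    index = word_index(subject, k)  # built once per database sequence
    hits: list[tuple[int, int]] = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


print(seed_hits("MNNLRK", "AMNNLQ"))  # [(0, 1), (1, 2)]
```

Looking words up in a prebuilt index is what avoids comparing every residue against every other residue, which is why BLAST trades some sensitivity for speed.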
Select Protein BLAST, since you will be searching with the elephant seal Cytochrome b protein sequence you just saved to your SEQs folder.
#1 Copy and paste the Cytochrome b sequence (FASTA format) into the query box.
#2 Select "Reference proteins" from the database drop-down menu, instead of Human or NR (non-redundant).
#3 Cytochrome b is found in a very wide range of organisms (in animals it is encoded by the mitochondrial genome). Since we will be looking at mammals to answer the question of where whales belong, it helps to narrow the search to mammals by typing "mammalia" in the Organism box. If you are looking for a specific Cytochrome b, you can type its Latin name there instead.
#4 Click "BLAST" and wait. This can take a couple of minutes, and you should see a couple of different screens. Be patient: there are thousands of people searching this database right now, and it is run by your tax dollars, not Google :)
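For reference, the same form settings can be written as a request to NCBI's BLAST URL API. This is a minimal sketch, assuming Python 3.9+ and the URL API parameter names `CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `ENTREZ_QUERY` and `HITLIST_SIZE`; check them against NCBI's BLAST URL API documentation before relying on them. The function `build_blastp_put` is hypothetical and only builds the URL; nothing is submitted, and a real submission returns a request ID that you then poll for results.

```python
from urllib.parse import urlencode

# NCBI BLAST URL API endpoint.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastp_put(query_fasta: str, organism: str = "mammalia",
                     database: str = "refseq_protein",
                     hitlist_size: int = 100) -> str:
    """Build (but do not send) a BLAST 'Put' request mirroring the web form.

    database="refseq_protein" corresponds to "Reference proteins";
    organism is passed as an Entrez query on the organism field;
    hitlist_size plays the role of the "100 results" setting.
    """
    params = {
        "CMD": "Put",                         # submit a new search
        "PROGRAM": "blastp",                  # protein query vs. protein database
        "DATABASE": database,
        "QUERY": query_fasta,
        "ENTREZ_QUERY": f"{organism}[orgn]",  # restrict hits to this taxon
        "HITLIST_SIZE": hitlist_size,
    }
    return BLAST_URL + "?" + urlencode(params)


print(build_blastp_put(">query\nMNNLRKTHPL"))
```

Changing `organism` to "Aves" or raising `hitlist_size` to 500 corresponds to the broader searches suggested in Step Four.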
The resulting screen should look like the one below. You can scroll down to check out the hits and alignments if you want, but to pick out a set of sequences for a phylogenetic study it is VERY convenient to use the crude tree of potential taxonomic relationships that BLAST produces under "Taxonomy Reports."
Click "Taxonomy Reports" to display a page like the one below. Here you can easily retrieve sequence data from representative taxa for your analysis. But wait... open a new tab in your browser, because we will save even more time by using preassembled sequences.
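Choosing "representative taxa" from a Taxonomy Report amounts to grouping the hits by taxon and keeping one sequence per group. The sketch below (Python 3.9+) illustrates that idea under stated assumptions: the `Hit` record, the grouping by family, and the rule "keep the longest, most complete sequence" are all hypothetical choices, and the demo hits are placeholders, not real BLAST output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit; the demo values below are placeholders."""
    accession: str
    organism: str
    family: str  # taxon used for grouping (hypothetical choice of rank)
    length: int  # sequence length in amino acids


def one_per_taxon(hits: list[Hit]) -> dict[str, Hit]:
    """Keep the longest (most complete) sequence from each family."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.family)
        if current is None or hit.length > current.length:
            best[hit.family] = hit
    return best


# Placeholder hits, not real BLAST output.
hits = [
    Hit("HYPO_1", "species A", "family X", 379),
    Hit("HYPO_2", "species B", "family X", 250),
    Hit("HYPO_3", "species C", "family Y", 379),
]
for family, hit in one_per_taxon(hits).items():
    print(family, hit.accession, hit.organism)
```

Preferring full-length sequences is one reasonable tie-breaker for a tree; you might instead pick the species most relevant to the whale question.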

STEP THREE - Prepare a multiple alignment of the sequences.
Copy and paste all ten of these sequences at once into the ClustalW Alignment Tool at the Kyoto University Bioinformatics Center in Kyoto, Japan, then select "Execute Multiple Alignment." Wait for the result, then copy and paste the alignment into a Word or Notepad file. Scroll all the way to the bottom of the alignment, select "N-J tree," and click the "Exec" button. Do not click "Generate Profile HMM."

>Platypus
MNNLRKTHPLIKIVNHSFIDLPTPSNISSWWNFGSLLGLCLIIQILTGLFLAMHYTSDTSTAFSSVAHIC
RDVNYGWLIRYMHANGASLFFMCIFLHIGRGLYYGSYTQTETWNIGVVLLFTVMATAFVGYVLPWGQMSF
WGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFVIAALAVIHLLFLHETGSNNPSG
LNSDPDKIPFHPYYSVKDLVGFFMTILVLLTLVLFTPDLLGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVASILILILVPLLHTSYQRGLAFRPLTQMLFWILVTDLLTLTWIGGQPVEQPFIII
GQLASILYFLLITTLIPLTGLLENDLLKW
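"N-J" stands for neighbor-joining, a method that builds a tree from pairwise distances between the aligned sequences by repeatedly joining the pair of nodes that minimizes Q(i, j) = (n - 2)d(i, j) - r(i) - r(j), where r(i) is the sum of i's distances to all other nodes. The sketch below (Python 3.9+) shows the idea; it is not ClustalW's code. It assumes uncorrected p-distances (fraction of differing sites, skipping gap columns), whereas ClustalW may apply distance corrections and treat gaps differently, so its branch lengths can differ. Function names are hypothetical, at least three taxa are required, and labels must not contain Newick punctuation such as parentheses, commas, or colons.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites that differ, skipping columns with a gap ('-')."""
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 0.0


def neighbor_joining(names: list[str], dist: dict[frozenset, float]) -> str:
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    """
    d = dict(dist)
    nodes = list(names)

    def D(a: str, b: str) -> float:
        return d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(D(i, k) for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(p[0], p[1]) - r[p[0]] - r[p[1]])
        li = 0.5 * D(i, j) + (r[i] - r[j]) / (2 * (n - 2))
        lj = D(i, j) - li
        new = f"({i}:{li:.4f},{j}:{lj:.4f})"  # new node is labeled by its subtree
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (D(i, k) + D(j, k) - D(i, j))
        nodes = [k for k in nodes if k not in (i, j)] + [new]

    # Three nodes left: connect them to one central node.
    x, y, z = nodes
    lx = (D(x, y) + D(x, z) - D(y, z)) / 2
    ly = (D(x, y) + D(y, z) - D(x, z)) / 2
    lz = (D(x, z) + D(y, z) - D(x, y)) / 2
    return f"({x}:{lx:.4f},{y}:{ly:.4f},{z}:{lz:.4f});"


def tree_from_alignment(alignment: dict[str, str]) -> str:
    """Aligned sequences (label -> gapped sequence) to an NJ tree in Newick format."""
    dist = {frozenset((a, b)): p_distance(alignment[a], alignment[b])
            for a, b in combinations(alignment, 2)}
    return neighbor_joining(list(alignment), dist)


# Toy alignment for illustration only.
toy = {"A": "MNNLRK", "B": "MNNLRR", "C": "MNALKK", "D": "MSALKK"}
print(tree_from_alignment(toy))
```

Note that neighbor-joining produces an unrooted tree; where the root goes depends on which taxon you treat as the outgroup.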
STEP FOUR - Capture a copy of the alignment and the tree that results from that alignment.
Feel free to go back to the browser tab that is open to the Taxonomy Report and check for additional interesting species. It might be worth going all the way back to the original search and broadening it beyond mammals. When you set up the Blastp parameters, change the "100 results" to "500 results", or use the elephant seal sequence to search Aves, Chondrichthyes, or whatever taxa you think might provide a good outgroup to root your tree.
When preparing files for analysis, be aware that the tree-drawing program builds each label on the tree from the first (header) line of each FASTA entry. Keep the accession numbers in your notes, but strip that first line down to the simplest description (for example, ">Platypus"). Otherwise you get a messy tree.
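Trimming headers by hand is easy to get wrong when you have many sequences. As a minimal sketch (Python 3.9+; `simplify_headers` is a hypothetical helper), the function below replaces each header with a short label you choose, keyed by the accession (the first word after ">"), and returns the original headers so they can go into your notes. The demo header and label are placeholders, not a real NCBI record; underscores are used instead of spaces because spaces in labels can confuse some tree programs.

```python
def simplify_headers(fasta_text: str,
                     labels: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Shorten every FASTA header to a simple label and record the original.

    labels maps an accession (the first word after '>') to the short name to
    use; accessions not in the map keep the accession itself as the label.
    Returns (rewritten FASTA text, notes mapping label -> original header).
    """
    out_lines: list[str] = []
    notes: dict[str, str] = {}
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            original = line[1:].strip()
            words = original.split()
            accession = words[0] if words else ""
            label = labels.get(accession, accession)
            notes[label] = original  # keep the full header for your notes
            out_lines.append(">" + label)
        else:
            out_lines.append(line)  # sequence lines pass through unchanged
    return "\n".join(out_lines) + "\n", notes


# Placeholder header and label, not a real NCBI record.
demo = ">HYPO_0001.1 cytochrome b (mitochondrion) [Species one]\nMNNLRKTHPL\n"
short, notes = simplify_headers(demo, {"HYPO_0001.1": "Species_one"})
print(short)
print(notes)
```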