
Are Whales Hippos?

Welcome to Homework Part II. This exercise builds on all the previous sessions and provides an example of how sequences are used to answer biological questions.

Skills developed through this exercise:
  • Use of National Center for Biotechnology Information (NCBI) databases
  • Retrieval of sequences from NCBI
  • Alignment of homologous protein sequences using ClustalW
  • Using ClustalW output to prepare phylogenetic networks (trees)
  • Testing evolutionary hypotheses

 

[Figure: Tree A and Tree B]

Research Question:

Are whales and dolphins a sister group to Artiodactyls (even-toed ungulates)?  Or should they be placed within the Artiodactyls as a sister group to Hippopotami? In other words, are whales a kind of even-toed ungulate as Hippos are? For background see pages 559 - 561 of your textbook, "Whale Evolution: A Case History" (In Biological Science 2nd Ed. by Scott Freeman). There is also a more detailed discussion here.

To answer our research question we will build a phylogenetic tree of relatedness using protein sequence data from the National Center for Biotechnology Information (NCBI).

STEP ONE - Obtaining an appropriate Cytochrome b protein sequence for the analysis


Try searching "All Databases" for "Mirounga".


 


Next try clicking on "PubMed Central: free, full text journal articles."

It's easy to get side-tracked!

Go ahead and refine the search a bit by clicking "Protein" and adding the search modifier for "organism"  like this:

Mirounga [orgn]

That should reduce the number of hits a bit. Adding "cytochrome b" with quotes like this should help a lot:

Mirounga [orgn] "cytochrome b"

Finally, if you add the search modifier for "protein" like this:

Mirounga [orgn] "cytochrome b" [prot]

 ...it should knock it down to about eight hits that include the Cytochrome b sequences for Mirounga leonina and Mirounga angustirostris.
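The same query can also be assembled in a script. The sketch below is only an illustration, not part of the exercise: it builds the search term from the steps above and turns it into a URL for NCBI's E-utilities "esearch" service. The function names are made up for this example, and no request is actually sent.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities "esearch" service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_protein_query(organism, protein_name):
    """Combine an organism and a protein name using the [orgn] and [prot] modifiers."""
    return f'{organism} [orgn] "{protein_name}" [prot]'


def esearch_url(term, database="protein"):
    """Return an esearch URL for the given query (the request itself is not sent)."""
    return ESEARCH + "?" + urlencode({"db": database, "term": term})


if __name__ == "__main__":
    term = build_protein_query("Mirounga", "cytochrome b")
    print(term)
    print(esearch_url(term))
```

Pasting the printed URL into a browser should return the matching record IDs as XML, but the web page described above is all you need for this homework.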


When collecting sequences for any kind of analysis, it is preferable to use a reference sequence ("RefSeq") if one is available. Here there is one for Mirounga leonina (for more information on RefSeqs see the NCBI Handbook).

Create a folder called "SEQs" somewhere on your hard drive where you can find it again (perhaps write down the pathname).

Click on YP_778785 and make sure it is Cytochrome b from Mirounga leonina and is 379 amino acids long.

Scan the sequence record. If you ever need the DNA sequence that codes for this protein you can click on the CDS link down near the bottom by the amino acid sequence.

Change the Display from GenPept to FASTA format by clicking on the drop-down menu. Then copy and paste the FASTA format sequence to Notepad and save it in your SEQs folder as a text file named "Cytb_M_leonina.txt".
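If you want to double-check the file you just saved, a few lines of Python can read it back and count the amino acids. This is a minimal sketch: the functions parse_fasta and check_length are written for this example, and the path "SEQs/Cytb_M_leonina.txt" assumes the folder sits in the directory where you run the script.

```python
from pathlib import Path


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_length(path, expected_length):
    """True if the file holds exactly one record with the expected number of residues."""
    records = parse_fasta(Path(path).read_text())
    return len(records) == 1 and len(records[0][1]) == expected_length


if __name__ == "__main__":
    # Assumed location; adjust to wherever you created your SEQs folder.
    fasta_file = Path("SEQs") / "Cytb_M_leonina.txt"
    if fasta_file.exists():
        print("379 residues:", check_length(fasta_file, 379))
```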

You now know how to find and retrieve a sequence from the protein sequence database at NCBI.


STEP TWO - Using one sequence to obtain others.

 

We now have to retrieve about nine more sequences from the database. There is a convenient way to do this quickly.

Go back to the NCBI home page (Google "NCBI" if you have to).

Click on the "Blast" tab at the top of the NCBI home page.

The Basic Local Alignment Search Tool (BLAST) allows you to search a sequence against a sequence database to find similar sequences. It is kind of like "Googling" sequences. It is crude for alignments, and not as sensitive as some other search algorithms, but it is VERY fast.
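To get a feel for why a search like this can be fast, here is a toy illustration. It is NOT the real BLAST algorithm, which also scores near-matching words, extends hits into local alignments, and reports statistics. It only counts the short three-letter "words" that two sequences share and ranks database entries by that count. The function names are made up for this sketch, and the short fragments are the first 30 residues of three of the sequences given in STEP THREE.

```python
def words(sequence, size=3):
    """Return the set of all overlapping words of the given size in a sequence."""
    return {sequence[i:i + size] for i in range(len(sequence) - size + 1)}


def shared_word_count(query, subject, size=3):
    """Count the distinct words that occur in both sequences."""
    return len(words(query, size) & words(subject, size))


def rank_by_shared_words(query, database, size=3):
    """Sort database entries (name -> sequence) by shared words, best match first."""
    scores = {name: shared_word_count(query, seq, size)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    elephant_seal = "MTNIRKTHPLAKIINNSFIDLPTPPNISAW"
    database = {
        "Pig": "MTNIRKSHPLMKIINNAFIDLPAPSNISSW",
        "Platypus": "MNNLRKTHPLIKIVNHSFIDLPTPSNISSW",
    }
    for name, score in rank_by_shared_words(elephant_seal, database):
        print(name, score)
```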

Select Protein Blast since you will be searching with the Elephant Seal Cytochrome b protein sequence you just saved to your SEQs folder.


 

#1 Copy and paste the Cytochrome b sequence (FASTA format) into the dialog box:


 

#2 Select "Reference proteins" from the drop-down menu, instead of Human or NR (non-redundant).


 

#3 Cytochrome b is a protein found in almost all organisms. Since we will be looking at mammals to answer the question of where whales belong, it helps to narrow the search to just mammals by typing in "mammalia" here. If you are looking for the Cytochrome b of a specific organism, you can type in its Latin name here.


 

#4 Click "BLAST" and wait. This can take a couple of minutes... and you should see a couple of different screens. Be patient, there are thousands of people searching this database right now, and it is run by your tax dollars, not Google :)


 

The resulting screen should look like the one below. You can scroll down to check out the hits and alignments if you want... but to pick out a bunch of sequences for a phylogenetic study, it is VERY convenient to use the crude tree of potential taxonomic relationships that BLAST produces under "Taxonomy Reports."




 



Click "Taxonomy Reports" to cause a page like the one below to display. Here you can easily retrieve sequence data from representative taxa for your analysis. But wait... open a new tab in your browser, because we will save even more time by using preassembled sequences.



 



STEP THREE - Prepare a multiple alignment of the sequences.

Copy the ten sequences below and paste them all at once into the ClustalW Alignment Tool at the Kyoto University Bioinformatics Center in Kyoto, Japan. Then select "Execute Multiple Alignment." Wait for the result, then copy and paste the alignment into a Word or Notepad file. Next, scroll all the way to the bottom of the alignment page, select "N-J tree" (neighbor-joining tree), and click the "Exec" button. Do not click "Generate Profile HMM."

>Platypus
MNNLRKTHPLIKIVNHSFIDLPTPSNISSWWNFGSLLGLCLIIQILTGLFLAMHYTSDTSTAFSSVAHIC
RDVNYGWLIRYMHANGASLFFMCIFLHIGRGLYYGSYTQTETWNIGVVLLFTVMATAFVGYVLPWGQMSF
WGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFVIAALAVIHLLFLHETGSNNPSG
LNSDPDKIPFHPYYSVKDLVGFFMTILVLLTLVLFTPDLLGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVASILILILVPLLHTSYQRGLAFRPLTQMLFWILVTDLLTLTWIGGQPVEQPFIII
GQLASILYFLLITTLIPLTGLLENDLLKW

>Wolf
MTNIRKTHPLAKIVNNSFIDLPAPSNISAWWNFGSLLGVCLILQILTGLFLAMHYTSDTATAFSSVTHIC
RDVNYGWIIRYMHANGASMFFICLFLHVGRGLYYGSYVFMETWNIGIVLLFATMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTDLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMVHLLFLHETGSNNPSG
ITSDSDKIPFHPYYTIKDILGALLLLLILMSLVLFSPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVFSILILAFIPLLHTSKQRSMMFRPLSQCLFWLLVADLLTLTWIGGQPVEHPFIII
GQVASMLYFTILLILMPTVSVIENNLLKW

>Elephant Seal
MTNIRKTHPLAKIINNSFIDLPTPPNISAWWNFGSLLGICLILQILTGLFLAMHYTPDTTTAFSSVTHIC
RDVNYGWIIRYMHANGASMFFICLYMHMGRGLYYGSYTFTETWNIGIILLFTIMATAFMGYVLPWGQMSF
WGATVITNLLSAVPYVGDDLVQWIWGGFSIDKATLTRFFALHFILPFVALALAAVHLLFLHETGSNNPSG
IPSDSDKIPFHPYYTIKDILGALLLILTLMLLVLFSPDLLGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALILSILILAIIPLLHTSSQRGMMFRPISQCLFWLLVADLLTLTWIGGQPVEHPYIII
GQLASILYFTILLVLMPITSIIENNILKW

>Pig
MTNIRKSHPLMKIINNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHIC
RDVNYGWVIRYLHANGASMFFICLFIHVGRGLYYGSYMFLETWNIGVVLLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTDLVEWIWGGFSVDKATLTRFFAFHFILPFIITALAAVHLLFLHETGSNNPTG
ISSDMDKIPFHPYYTIKDILGALFMMLILMILVLFSPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALVASILILILMPMLHTSKQRSMMFRPLSQCLFWMLVADLITLTWIGGQPVEHPFIII
GQLASILYFLIILVLMPITSIIENNLLKW

>Orca
MTNIRKTHPLMKILNNAFIDLPTPSNISSWWNFGSLLGLCLITQILTGLLLAMHYTPDTSTAFSSVAHIC
RDVNYGWFIRYLHANGASMFFICLYAHIGRSLYYGSYMFQETWNVGVLLLLAVMATAFVGYVLPWGQMSF
WGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFIITALAAVHLLFLHETGSNNPTG
IPSNMDMIPFHPYHTIKDTLGALLLILTLLALTLFAPDLLGDPDNYTPANPLSTPAHIKPEWYFLFAYAI
LRSVPNKLGGVLALLLSILILIFIPMLQTSKQRSMMFRPFSQLLFWTLIADLLTLTWIGGQPVEHPYIIV
GQLASILYFLLILVLMPTISLIENKLLKW

>Rhinoceros
MTNIRKSHPLVKIINHSFIDLPTPSNISSWWNFGSLLGICLILQILTGLFLAMHYTPDTTTAFSSVTHIC
RDVNYGWMIRYLHANGASMFFICLFIHVGRGLYYGSYTFLETWNIGIILLFTLMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIILALAITHLLFLHETGSNNPSG
IPSNMDKIPFHPYYTIKDILGALLLILVLLILVLFFPDILGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILLLIPYLHTSKQRSMMFRPLSQCMFWLLVADLLTLTWIGGQPVEHPFIII
GQLASILYFSLILVLMPLAGIIENNLLKW

>Horse
MTNIRKSHPLIKIINHSFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHIC
RDVNYGWIIRYLHANGASMFFICLFIHVGRGLYYGSYTFLETWNIGIILLFTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFIITALVVVHLLFLHETGSNNPSG
IPSNMDKIPFHPYYTIKDILGLLLLILLLLTLVLFSPDLLGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALILSILILALIPTLHMSKQRSMMFRPLSQCVFWLLVADLLTLTWIGGQPVEHPYVII
GQLASILYFSLILIFMPLASTIENNLLKW

>Hippopotamus
MTNIRKSHPLMKIINDAFVDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTPDTLTAFSSVTHIC
RDVNYGWVIRYMHANGASIFFICLFTHVGRGLYYGSHTFLETWNIGVILLLTTMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTDLVEWIWGGFSVDKATLTRFFAFHFILPFVITALAIVHLLFLHETGSNNPTG
IPSNADKIPFHPYYTIKDILGILLLMTTLLTLTLFAPDLLGDPDNYTPANPLSTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALALSILILALIPMLHTSKQRSLMFRPLSQCLFWALIADLLTLTWIGGQPVEHPFIII
GQVASILYFLLILVLMPVAGIIENKLLKW

>Cow
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHIC
RDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTG
ISSDVDKIPFHPYYTIKDILGALLLILALMLLVLFAPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILISALIPLLHTSKQRSMMFRPLSQCLFWALVADLLTLTWIGGQPVEHPYITI
GQLASVLYFLLILVLMPTAGTIENKLLKW

>Blue Whale
MTNIRKTHPLMKIINDAFIDLPTPSNISSWWNFGSLLGLCLIVQILTGLFLAMHYTPDTMTAFSSVTHIC
RDVNYGWVIRYLHANGASMFFICLYAHMGRGLYYGSHAFRETWNIGVILLFTVMATAFVGYVLPWGQMSF
WGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFIIMALAIVHLIFLHETGSNNPTG
IPSDMDKIPFHPYYTIKDILGALLLILTLLMLTLFAPDLLGDPDNYTPANPLSTPAHIKPEWYFLFAYAI
LRSIPNKLGGVLALLLSILVLALIPMLHTSKQRSMMFRPFSQFLFWVLVADLLTLTWIGGQPVEHPYVIV
GQLASILYFLLILVLMPVTSLIENKLMKW

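A neighbor-joining tree is built from pairwise distances between the aligned sequences. The sketch below shows one simple way to get such distances, the fraction of aligned positions that differ (often called a p-distance). It is a simplified stand-in, not necessarily the calculation the ClustalW server performs, and the function names are made up for this example. The fragments are the first 70 residues of three of the sequences above, which is far too little data to settle the research question on its own.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip positions where either sequence has an alignment gap
        compared += 1
        differences += a != b
    return differences / compared if compared else 0.0


def distance_table(aligned):
    """Return {(name1, name2): p-distance} for every pair of aligned sequences."""
    return {(n1, n2): p_distance(aligned[n1], aligned[n2])
            for n1, n2 in combinations(aligned, 2)}


if __name__ == "__main__":
    fragments = {
        "Hippopotamus": "MTNIRKSHPLMKIINDAFVDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTPDTLTAFSSVTHIC",
        "Blue_Whale": "MTNIRKTHPLMKIINDAFIDLPTPSNISSWWNFGSLLGLCLIVQILTGLFLAMHYTPDTMTAFSSVTHIC",
        "Cow": "MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTAFSSVTHIC",
    }
    for pair, distance in distance_table(fragments).items():
        print(pair, round(distance, 3))
```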
STEP FOUR - Capture a copy of the alignment and the tree that results from that alignment.


Feel free to go back to the browser tab that is open to the Taxonomy Report. Check for additional interesting species. It might be worth it to go all the way back to the original search and broaden it beyond mammals. When you are setting up the Blastp parameters, change the "100 results" to "500 results" or use the elephant seal sequence to search for Aves, or Chondrichthyes, or whatever taxa you think might provide a good, unbiased root for your tree.



When preparing files for analysis you should be aware that the tree drawing program constructs a name to label the tree from the information that is on the first line of the FASTA format file. It is important to keep the accession numbers in your notes, but to remove all but the simplest description from that first line. Otherwise you get a messy tree.
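If you collect many sequences, cleaning up the first lines by hand gets tedious. The sketch below is one possible way to do it in Python: it swaps each header for a short label you choose and hands back the original header lines so the accession numbers stay in your notes. The function name and the header layout in the example (accession first, then a description) are assumptions made for illustration; check them against your own files.

```python
def simplify_headers(fasta_text, short_names):
    """Replace each FASTA header with a short label.

    short_names maps an accession (the first word of a header) to the label
    that should appear on the tree. Returns (new_text, notes), where notes
    maps each label back to the original header line.
    """
    out, notes = [], {}
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            original = line[1:].strip()
            accession = original.split()[0] if original else ""
            # Fall back to the bare accession if no short label was supplied.
            label = short_names.get(accession, accession)
            notes[label] = original
            out.append(">" + label)
        else:
            out.append(line)
    return "\n".join(out) + "\n", notes


if __name__ == "__main__":
    # Illustrative header layout; the residues are the start of the
    # elephant seal sequence from STEP THREE.
    text = ">YP_778785 cytochrome b [Mirounga leonina]\nMTNIRKTHPLAKIINNSFIDL\n"
    cleaned, notes = simplify_headers(text, {"YP_778785": "Elephant_Seal"})
    print(cleaned)
    print(notes)
```

Using an underscore instead of a space in the label is a cautious choice here, since some programs keep only the first word of the header as the name.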


 


 

 

 

 

 

 

 

Marine Biotechnology and Bioinformatics is a teacher professional development program of the Innovative Technology Experiences for Students and Teachers (ITEST) program. This material is based upon work supported by the National Science Foundation (NSF) under Grant No. 0323175 (2004-2006) and Grant No. 0525224 (2006-2009). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.