BIOL200 2013 and Qiu Lab Meetings: Difference between pages

From QiuLab
(Difference between pages)
imported>Cmartin
 
imported>Weigang
 
Line 1: Line 1:
'''EXPERIMENT # 4'''
==Projects & Goals==
* Borrelia population genomics: Recombination & Natural Selection (Published)
* Borrelia pan-genomics (Submitted)
* Positive and negative selection in Borrelia ORFs and IGS (In submission)
* Dr Bargonetti's project (Summer 2013)
* A population genomics pipeline using MUGSY-FastTree (Summer 2013): [[Population_Genomics_Course|Project page]]
* Borrelia Genome Database & Browser (Summer 2013) [[media:Web.png|Version 2 screen shot]]
* Pseudomonas population genomics (Summer 2013) [[Pseudomonas_population_genomics|Project page]]
*Hypothesis Testing: Do host-interacting genes show adaptive codon usage? (Summer 2013): [[Borrelia_codon_usage|Project page]]
* Phylogenomics browsing with JavaScript/JQuery, Ajax, and [http://www.jsphylosvg.com/ jsPhylosvg]
* Frequency distribution of ospC types in wild tick populations (Fall 2013) [[strain_natural_frequency|Project page]]
----


'''BIOL 200 Cell Biology II LAB, Spring 2013'''
==Lab meeting: June 13, 2013==
* Weigang: IGS paper submission should be done by Thursday.
* Che/Slav: Workshop update (Meeting at 3:30pm?)
* Che: SILAC project (Meeting at 4pm?)
* Zhenmao: Tick processing & paired-end Illumina sequencing
* Pedro: Updates on "ncbi-orf" table
* Girish: phyloSVG extension; QuBi video
* Saymon and Deidre: consensus start-codons
* Reeyes and Raymond: Pseudomonas DB; fleN alignment and phylogeny
* Valentyna: BLASTn results (4:30pm?)


Hunter College of the City University of New York
==Lab meeting: May 23, 2013==
* <font color="red">May 24, Friday: End of School Year Party in the Park (we leave from Hunter @ 1:30pm)</font>
* Recommended reading of the week: [http://www.genetics.org/content/194/1/199.abstract Detecting Neanderthal genes using the D' homoplasy statistic]
* Weigang: IGS paper submission
* Che: Thesis update/SILAC project/Summer teaching
* Zhenmao: Manuscript update: Material & Methods; Results (Tables and Figures)
* Pedro: Catalyst web framework
* Girish: cp26 phylogenomic analysis
* Saymon and Deidre: consensus start-codons
----


==Course information==
==Lab meeting: May 16, 2013==
'''Instructors:''' TBD
* Weigang: IGS paper submitted yet?
 
* Che: Thesis update. Chapter 3. Evolution of ospA/ospB gene family
'''Class Hours:''' Room TBD HN; TBD
* Pedro/Zhenmao: Can we wrap up the BLAST identification of ospC types?
 
* Girish: Fetch cp26 sequences from DB; Run MUGSY & FastTree
'''Office Hours:''' Room 830 HN; Thursdays 2-4pm or by appointment
* Saymon/Deidre: Identification of consensus start-codon positions
 
* Pedro/Girish: orth_get/orth_igs website development. Catalyst. Implement graphics (genome map & phylogeny) query interface
'''Contact information:'''
* Raymond: start the Pseudomonas summer project
* Dr. Weigang Qiu: weigang@genectr.hunter.cuny.edu, 1-212-772-5296
----
 
==Foundational Readings==
 
* Molecular phylogenetics
 
* Population genetics
==Experiment #4==
* Genomics
 
* Systems Biology
 
----
===<span style="color: DodgerBlue;font-weight:bold;font-size:large;">The Tree of Life and Molecular Identification of Microorganisms</span>===
==Informatics Architecture==
 
* Operating Systems: Linux OS/Ubuntu, Mac OS
===Objective===
* Programming languages: BASH, Perl/BioPerl, R
<span style="color: Crimson;font-weight:bold;">To classify microorganisms and determine their relatedness using molecular sequences.</span>
* Relational Databases: PostgreSQL
 
* Software architecture
===LAB REPORT GRADING GUIDE===
** bb2: Borrelia Genome Database
CELL BIO II Experiment #4:
** bb2i: a Perl API for bb2
*'''Introduction'''<span style="font-weight:bold;color:OrangeRed;"> 1 point</span> ''':'''
** DNATweezer: Perl wrappers of most frequently used BioPerl modules, including Bio::Seq, Bio::SimpleAlign, and Bio::Tree [https://sourceforge.net/p/dnatwizzer/home/Home/]
  Statement of objectives or aims of the experiment in the student’s own words.
** SimBac: A Perl/Moose package for simulating bacterial genome evolution [http://sourceforge.net/projects/bacsim/files/]
  (not to be copied from the Lab Manual)
** Borrelia Ortholog Retriever: Download ortholog alignments from 23 Borrelia spp genomes. Search by gene names and IDs.[http://borreliagenome.org/orth_get/]
*'''MATERIALS AND METHODS'''<span style="font-weight:bold;color:OrangeRed;"> 0 points</span> ''':'''
* Hardware Setup
  This should be a brief synopsis and must include any changes or deviations
** NSF File Server
  from the procedures outlined in the Lab Manual. Specify which organisms were
** Database and Application Server
  used to create the phylogram.
** Web Server
*'''RESULTS'''<span style="font-weight:bold;color:OrangeRed;"> 4 points</span> ''':'''
** Linux Workstations
  A print out of the phylogram will suffice.
----
*'''DISCUSSION'''<span style="font-weight:bold;color:OrangeRed;"> 4 points</span> ''':'''
==Perl Challenges==
  Responses to discussion questions.
*'''SUMMARY |CONCLUSION'''<span style="font-weight:bold;color:OrangeRed;"> 1 point</span> ''':'''
  Two sentence summary of your findings.
*'''REFERENCES'''<span style="font-weight:bold;color:OrangeRed;"> 1 point</span> ''':'''
  Credit is given for pertinent references obtained from sources other than the Lab Manual.
  This point is in addition to the 10 for the lab report.
 
===INTRODUCTION===
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Introduction
|- style="background-color:powderblue;"
| Evolution can be defined as descent with modification.  In other words, changes in the nucleotide sequence of an organism’s genomic DNA are inherited by the next generation.  According to this view, all organisms are related through descent from an ancestor that lived in the distant past.  Since that time, about 4 billion years ago, life has undergone an extensive process of change as new kinds of organisms arose from other kinds existing in the past.<br /> The evolutionary history of a group is called a phylogeny, and can be represented by a phylogram (Figure 1).  A major goal of evolutionary analysis is to understand this history.  We do not have direct knowledge of the path of evolution because, by definition, extinct organisms no longer exist.  Therefore, phylogeny must be inferred indirectly.  Originally, evolutionary analysis was based upon the organisms’ morphology and metabolism.  This is the basis for the Linnaean classification scheme (the “Five Kingdoms” scheme).  However, this method can lead to mistaken relationships.  Different species living in the same environment may have similar morphologies in order to deal with specific environmental factors.  Thus, these similarities have nothing to do with how related the organisms are, but are a direct result of shared surroundings.  With the advent of genomics, however, organisms can be grouped based upon their sequence relatedness.  Since evolution is a process of inherited nucleotide change, analyzing DNA sequence differences allows for the reconstruction of a better phylogenetic history.<br/>
|-
|[[File:TreeLife.PNG|center|alt=The Tree of Life.|Tree of life based on 16S ribosomal RNA (image credit: NR Pace, Science 1997)]]
|-style="background-color:powderblue;"
|Of course, when comparing DNA sequences, the question of which genes to use arises.  The most widely used genes are those coding for 16S rRNA in prokaryotes and 18S rRNA in eukaryotes.  These genes code for small subunit ribosomal RNA and are used for evolutionary analysis because they 1) are found in all organisms, 2) are functionally conserved, 3) vary only slightly between organisms (their nucleotide sequence changed slowly throughout evolution), and 4) have adequate length.  In this lab, you will perform an evolutionary analysis by constructing a phylogram of 15 microbes spanning bacteria, archaea and eukarya.  You will find and download rRNA sequences, align them, and use that alignment to create a phylogram.
|}
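
The idea of "sequence relatedness" can be made concrete with a small calculation. The following is a minimal Python sketch, not part of the lab procedure: it computes the proportion of aligned sites at which two sequences differ (often called the p-distance). The function name <code>p_distance</code> and the three short sequences are made up for illustration; they are not real 16S rRNA data.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


# Toy aligned fragments (hypothetical, for illustration only)
alignment = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTTC",
    "species_C": "ACGAAC-TTG",
}

names = sorted(alignment)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = p_distance(alignment[first], alignment[second])
        print(f"{first} vs {second}: {d:.3f}")
```

A smaller value means fewer differences and, under the reasoning above, a closer relationship. Tree-building programs typically start from a matrix of such pairwise distances or from the alignment itself.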
 
===MATERIALS===
*'''Required hardware:''' Computer
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Procedure
|- style="background-color:powderblue;"
|
#Examine Table I, select representative species from Bergey’s Manual.  Select 2 prokaryotic species from each group, giving 14 prokaryotic species total.  Also select the Eukaryotic representative, Saccharomyces cerevisiae.
#Access the NCBI website: http://www.ncbi.nlm.nih.gov/  
#Under the “Search” category, select “Nucleotide”
#Under the “for” category, type the accession number for your first organism, and hit the “Go” button.  This takes you to the record for the 16S rRNA of your organism.
#Download the 16S rRNA sequence for your first organism by choosing “FASTA” under the “Display” category.
#Copy and paste the entire output into a Microsoft Word file.
#Edit the sequence id to match the format of “Genus_Species_Genbank#” (e.g., > Escherichia_coli_174375).
#Repeat process for all of your organisms, pasting the sequences into the same Microsoft Word file. (note: be sure to place a blank line between each sequence entry)
#Access the EMBL CLUSTALW alignment website: http://www.ebi.ac.uk/Tools/clustalw/, and copy and paste your entire Microsoft Word file into the area which asks you to “Enter or paste a set of sequences in any supported format”. Click “Run”.  This program will make an alignment of all of your sequences.
#Click “Show as Phylogram Tree” to create a tree showing the relatedness of your organisms based on their 16S rRNA sequences.
#To print your phylogram tree:
#*a. hit the “Print Screen” button on your keyboard
#*b. open the Paint program from the “Accessories” menu on your computer
#*c. hit paste to paste your screen
#*d. “select” your phylogram tree
#*e. copy and paste it into a new Paint file
#*f. print your tree and email it to yourself
|}
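
The relabeling and combining steps of the procedure (editing each sequence id to “Genus_Species_Genbank#” and separating entries with a blank line) can also be done with a short script instead of by hand. The following is a minimal Python sketch, not part of the original protocol. The function names <code>make_label</code>, <code>relabel_fasta</code> and <code>combine_records</code>, the placeholder header, and the toy sequence are all assumptions made for illustration.

```python
def make_label(genus: str, species: str, genbank_id: str) -> str:
    """Build a sequence id in the Genus_Species_Genbank# format."""
    return f"{genus}_{species}_{genbank_id}"


def relabel_fasta(record: str, label: str) -> str:
    """Replace the header line of one FASTA record with '>label'."""
    lines = [line.strip() for line in record.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record: first line must start with '>'")
    sequence_lines = [line for line in lines[1:] if line]
    return "\n".join([f">{label}"] + sequence_lines)


def combine_records(records: list[str]) -> str:
    """Join records with one blank line between entries."""
    return "\n\n".join(records) + "\n"


# Hypothetical downloaded record: the header text and bases are placeholders
downloaded = """>placeholder header as downloaded
AAATTGAAGAGTTTGATCATGG
CTCAGATTGAACGCTGGCGGCA
"""

label = make_label("Escherichia", "coli", "174375")
print(combine_records([relabel_fasta(downloaded, label)]))
```

Saving the combined output as a plain-text file keeps the sequences free of word-processor formatting before they are pasted into the alignment form.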
 
===Table 1===
{| class="wikitable"
{| class="wikitable"
! Problem
! Input
! Output
|-
|-
| colspan="2" |
| DNA transcription
'''Volume 1A (Gram-negative bacteria)'''
| A DNA sequence, in 5'-3' direction (e.g., aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat)
 
| An RNA sequence, in 5'-3' direction
|-
|
''Escherichia coli''
 
|
ACCESSION #174375
 
|-
|
''Helicobacter pylori''
 
|
ACCESSION #402670
 
|-
|-
|
| Genetic code
''Salmonella typhi''
| None
 
| 64 codons, one per line (using loops)
|
ACCESSION #2826789
 
|-
|-
|
| Random sequence 1
''Serratia marcescens''
| None
 
| Generate a random DNA sequence (e.g., 1000 bases) with equal base frequencies
|
ACCESSION #4582213
 
|-
|-
|
| Random sequence 2
''Treponema pallidum''
| None
 
| Generate a random DNA sequence with biased base frequencies, e.g., 10% G, 10% C, 40% T, and 40% A.
|
ACCESSION #176249
 
|-
|-
| colspan="2" |
| Graphics I
Additional species: ''Agrobacterium tumefaciens, Bordetella pertussis, Thermus aquaticus, Yersinia pestis, Borrelia burgdorferi. '''''(Note: To search for unlisted 16S sequences, type key words such as “yersinia<nowiki> AND 16S [gene]” in the NCBI </nowiki>GenBank search box.)'''
| a categorical dataset, e.g., Biology
 
| a bar graph & a pie chart, using GD::Simple or Postscript::Simple
|-
| colspan="2" |
'''Volume 1B (Rickettsias and endosymbionts)'''
 
|-
|
''Bartonella bacilliformis''
 
|
ACCESSION #173825
 
|-
|
''Chlamydia trachomatis''
 
|
ACCESSION #2576240
 
|-
|
''Rickettsia rickettsii''
 
|
ACCESSION #538436
 
|-
| colspan="2" |
Additional species: ''Coxiella burnetii, Thermoplasma acidophilum''
 
|-
| colspan="2" |
'''Volume 2A (Gram-positive bacteria)'''
 
|-
|
''Bacillus subtilis''
 
|
ACCESSION #8980302
 
|-
|
''Deinococcus radiodurans''
 
|
ACCESSION #145033
 
|-
|
''Staphylococcus aureus''
 
|
ACCESSION #576603
 
|-
| colspan="2" |
Additional species: ''Bacillus anthracis, Clostridium botulinum, Lactobacillus acidophilus, Streptococcus pyogenes''
 
|-
| colspan="2" |
'''Volume 2B (Mycobacteria and nocardia)'''
 
|-
|
''Mycobacterium haemophilum''
 
|
ACCESSION #406086
 
|-
|
''Mycobacterium tuberculosis''
 
|
ACCESSION #3929878
 
|-
| colspan="2" |
Additional species: ''Mycobacterium bovis, Nocardia orientalis''
 
|-
| colspan="2" |
'''Volume 3A (Phototrophs, chemolithotrophs, sheathed bacteria, gliding bacteria)'''
 
|-
|
''Anabaena sp.''
 
|
ACCESSION #39010
 
|-
|
''Cytophaga latercula''
 
|
ACCESSION #37222646
 
|-
|
''Nitrobacter winogradskyi''
 
|
ACCESSION #402722
 
|-
| colspan="2" |
Additional species: ''Heliothrix oregonensis, Myxococcus fulvus, Thiobacillus ferrooxidans''
 
|-
| colspan="2" |
'''Volume 3B (Archaeobacteria)'''
 
|-
|
''Methanococcus jannaschii''
 
|
ACCESSION #175446
 
|-
|
''Thermotoga subterranea''
 
|
ACCESSION #915213
 
|-
| colspan="2" |
Additional species: ''Desulfurococcus mucosus, Halobacterium salinarium, Pyrococcus woesei''
 
|-
| colspan="2" |
'''Volume 4 (Actinomycetes)'''
 
|-
|
''Actinomyces bowdenii''
 
|
ACCESSION #6456800
 
|-
|
''Actinomyces neuii''
 
|
ACCESSION #433527
 
|-
|
''Actinomyces turicensis''
 
|
ACCESSION #642970
 
|-
| colspan="2" |
Eukaryotic representative (used as outgroup for rooting the phylogenetic tree)
 
|-
|
''Saccharomyces cerevisiae''
 
|
ACCESSION #172403
 
|}
 
===ANALYSIS===
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Analyzing your phylogram
|- style="background-color:powderblue;"
| A phylogram is composed of nodes and branches (Figure 2). The internal nodes represent extinct ancestors, and the tips of the branches, also called nodes, are individual strains of microorganisms that exist now, and from which the sequence data were obtained.  The internal nodes are points in evolution where an extinct ancestor diverged into two new entities, each of which began to accumulate differences during its subsequent independent evolution.<br/>
The branches define the order of descent and the ancestry of the nodes.  The branch length represents the number of changes that have occurred along that branch.  Thus, the more recently two organisms share a common ancestor, the more closely related they are. Trees can be either “unrooted” or “rooted”. Unrooted trees show the relationships among the microorganisms under study, but not the evolutionary path leading from an ancestor to a strain.<br/>
|-
|
[[ File:Phylo.PNG|center|Phylogram with internal nodes (a, b, c, d) and tips (1, 2, 3, 4, 5).  Nodes at the tips are species that exist today, and internal nodes are extinct ancestors.]]
|-style="background-color:powderblue;"
|A rooted tree shows the unique path from an ancestor (internal node) to each strain.  Trees are rooted by inclusion of an outgroup in the analysis.  An outgroup is an organism that is less closely related to the other organisms under study than the organisms are to each other.
|}
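
To make the roles of branches, internal nodes, and the most recent common ancestor concrete, here is a minimal Python sketch. It is an illustration only: the tree below is hypothetical (it is not the tree in Figure 2), and the node names, branch lengths, and function names are all assumptions. The distance between two tips is the sum of branch lengths along the path that connects them through their most recent common ancestor.

```python
# child -> (parent, branch length); a hypothetical rooted tree
TREE = {
    "n1": ("root", 0.10),
    "n2": ("root", 0.30),
    "t1": ("n1", 0.05),
    "t2": ("n1", 0.07),
    "t3": ("n2", 0.02),
    "t4": ("n2", 0.04),
}


def path_to_root(node: str) -> list[str]:
    """Nodes visited when walking from `node` up to the root (inclusive)."""
    path = [node]
    while path[-1] in TREE:
        path.append(TREE[path[-1]][0])
    return path


def most_recent_common_ancestor(a: str, b: str) -> str:
    """First node on b's path to the root that is also an ancestor of a."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes do not share a root")


def distance_to_ancestor(node: str, ancestor: str) -> float:
    """Sum of branch lengths from `node` up to `ancestor`."""
    total = 0.0
    while node != ancestor:
        parent, length = TREE[node]
        total += length
        node = parent
    return total


def tip_distance(a: str, b: str) -> float:
    """Path length between two tips through their common ancestor."""
    ancestor = most_recent_common_ancestor(a, b)
    return distance_to_ancestor(a, ancestor) + distance_to_ancestor(b, ancestor)


for pair in [("t1", "t2"), ("t1", "t3")]:
    print(pair, most_recent_common_ancestor(*pair), round(tip_distance(*pair), 2))
```

In this toy tree, t1 and t2 share the more recent ancestor (n1) and are separated by a shorter path than t1 and t3, whose common ancestor is the root.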
 
===DISCUSSION===
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Discussion Questions
|-style="background-color:powderblue;"
|
#Answer the following questions based on the Tree of Life shown in Figure 1.
#*a. What do internal and terminal nodes represent?
#*b. What do branch lengths represent? What are the unit and meaning of the scale bar?
#*c. Identify the positions of humans (Homo), corn (Zea), E. coli, and Bacillus on the tree. Use the scale bar to estimate which pair is evolutionarily more distant: human/corn or E. coli/Bacillus?
#In Figure 2, which two species are more closely related: 1 and 2, 2 and 3, or 1 and 4?  Which are more distantly related?  How did you determine this?
#In Figure 2, is 1 more, less, or equally related to 4 and 5? Explain your rationale.
#List and describe the key steps of constructing a phylogenetic tree.
#Why do we use 18S rRNA information for yeast and 16S for prokaryotes?  Could we use other molecules as phylogenetic markers?  What constitutes a “good” phylogenetic marker for building a tree of life?
#'''Bonus Question'''
#*Define 16S “phylo-species” and “metagenomics”.  Describe how PCR amplification and sequencing of 16S rRNA molecules from environmental microbial samples (e.g., sea water, soil, human gut, hot springs) can be used to define species composition of an environment.
|}
|}
===References===
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Reference & Resource
|-style="background-color:powderblue;"
|
#Jungck, J. R.; Fass, M. F.; Stanley, E. D. (eds.). 2003 (2006 Revision). Microbes Count! Problem Posing, Problem Solving, and Peer Persuasion in Microbiology. BioQUEST Curriculum Consortium. (Chapter 6, p. 191)
#Holt, J. G. (Editor-in-Chief). 1984. Bergey’s Manual of Systematic Bacteriology, Volumes 1-4. Williams & Wilkins: Baltimore. http://www.cme.msu.edu/bergeys/pubinfo.html
|}
© Weigang Qiu, Hunter College, Last Update Jan 2013

Revision as of 17:44, 11 June 2013

Projects & Goals

  • Borrelia population genomics: Recombination & Natural Selection (Published)
  • Borrelia pan-genomics (Submitted)
  • Positive and negative selection in Borrelia ORFs and IGS (In submission)
  • Dr Bargonetti's project (Summer 2013)
  • A population genomics pipeline using MUGSY-FastTree (Summer 2013): Project page
  • Borrelia Genome Database & Browser (Summer 2013) Version 2 screen shot
  • Pseudomonas population genomics (Summer 2013) Project page
  • Hypothesis Testing: Do host-interacting genes show adaptive codon usage? (Summer 2013): Project page
  • Phylogenomics browsing with JavaScript/JQuery, Ajax, and jsPhylosvg
  • Frequency distribution of ospC types in wild tick populations (Fall 2013) Project page

Lab meeting: June 13, 2013

  • Weigang: IGS paper submission should be done by Thursday.
  • Che/Slav: Workshop update (Meeting at 3:30pm?)
  • Che: SILAC project (Meeting at 4pm?)
  • Zhenmao: Tick processing & paired-end Illumina sequencing
  • Pedro: Updates on "ncbi-orf" table
  • Girish: phyloSVG extension; QuBi video
  • Saymon and Deidre: consensus start-codons
  • Reeyes and Raymond: Pseudomonas DB; fleN alignment and phylogeny
  • Valentyna: BLASTn results (4:30pm?)

Lab meeting: May 23, 2013

  • May 24, Friday: End of School Year Party in the Park (we leave from Hunter @ 1:30pm)
  • Recommended reading of the week: Detecting Neanderthal genes using the D' homoplasy statistic
  • Weigang: IGS paper submission
  • Che: Thesis update/SILAC project/Summer teaching
  • Zhenmao: Manuscript update: Material & Methods; Results (Tables and Figures)
  • Pedro: Catalyst web framework
  • Girish: cp26 phylogenomic analysis
  • Saymon and Deidre: consensus start-codons

Lab meeting: May 16, 2013

  • Weigang: IGS paper submitted yet?
  • Che: Thesis update. Chapter 3. Evolution of ospA/ospB gene family
  • Pedro/Zhenmao: Can we wrap up the BLAST identification of ospC types?
  • Girish: Fetch cp26 sequences from DB; Run MUGSY & FastTree
  • Saymon/Deidre: Identification of consensus start-codon positions
  • Pedro/Girish: orth_get/orth_igs website development. Catalyst. Implement graphics (genome map & phylogeny) query interface
  • Raymond: start the Pseudomonas summer project

Foundational Readings

  • Molecular phylogenetics
  • Population genetics
  • Genomics
  • Systems Biology

Informatics Architecture

  • Operating Systems: Linux OS/Ubuntu, Mac OS
  • Programming languages: BASH, Perl/BioPerl, R
  • Relational Databases: PostgreSQL
  • Software architecture
    • bb2: Borrelia Genome Database
    • bb2i: a Perl API for bb2
    • DNATweezer: Perl wrappers of most frequently used BioPerl modules, including Bio::Seq, Bio::SimpleAlign, and Bio::Tree [1]
    • SimBac: A Perl/Moose package for simulating bacterial genome evolution [2]
    • Borrelia Ortholog Retriever: Download ortholog alignments from 23 Borrelia spp genomes. Search by gene names and IDs.[3]
  • Hardware Setup
    • NSF File Server
    • Database and Application Server
    • Web Server
    • Linux Workstations

Perl Challenges

Problem | Input | Output
DNA transcription | A DNA sequence, in 5'-3' direction (e.g., aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat) | An RNA sequence, in 5'-3' direction
Genetic code | None | 64 codons, one per line (using loops)
Random sequence 1 | None | Generate a random DNA sequence (e.g., 1000 bases) with equal base frequencies
Random sequence 2 | None | Generate a random DNA sequence with biased base frequencies, e.g., 10% G, 10% C, 40% T, and 40% A.
Graphics I | A categorical dataset, e.g., Biology | A bar graph & a pie chart, using GD::Simple or Postscript::Simple
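
The first four challenges are meant to be solved in Perl. The sketch below only outlines the logic, written in Python for brevity, and is not a reference solution. Two assumptions are made: the input DNA is taken to be the coding (sense) strand, so transcription reduces to replacing T with U, and the function names (transcribe, all_codons, random_dna) are invented for this example.

```python
import random

BASES = "TCAG"


def transcribe(dna: str) -> str:
    """Coding-strand DNA (5'-3') to RNA (5'-3'): replace T with U."""
    return dna.upper().replace("T", "U")


def all_codons() -> list[str]:
    """Enumerate all 64 codons with three nested loops."""
    codons = []
    for first in BASES:
        for second in BASES:
            for third in BASES:
                codons.append(first + second + third)
    return codons


def random_dna(length: int, weights=None, rng=None) -> str:
    """Random DNA string; `weights` follow the order A, C, G, T.

    With weights=None every base is equally likely.
    """
    rng = rng or random
    return "".join(rng.choices("ACGT", weights=weights, k=length))


# DNA transcription, using the example sequence from the table
example = "aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat"
print(transcribe(example))

# Genetic code: one codon per line
for codon in all_codons():
    print(codon)

# Random sequence 1 (equal frequencies) and 2 (40% A, 10% C, 10% G, 40% T)
rng = random.Random(2013)  # fixed seed so repeated runs give the same output
print(random_dna(1000, rng=rng))
print(random_dna(1000, weights=[40, 10, 10, 40], rng=rng))
```

The same structure carries over to Perl: tr/// or a substitution for transcription, three nested foreach loops for the codons, and rand() for drawing bases.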