Qiu Lab Meetings

Revision as of 20:36, 31 May 2016

Summer 2016

Rules of Conduct

  1. No eating, drinking, or loud talking in the lab. Socialize in the lobby only.
  2. Be respectful to each other, regardless of level of study
  3. Be on time & responsible. Communicate with the PI if late or absent

Readings & Journal Club

  1. A short introduction to molecular phylogenetics: http://www.ncbi.nlm.nih.gov/pubmed/12801728
  2. The latest tree of life: http://www.nature.com/articles/nmicrobiol201648
  3. Microbiome Initiative: http://mbio.asm.org/content/7/3/e00714-16.full?sid=a47e19d3-10c1-408d-9d56-2cecaa73d585
  4. Cancer evolution:
    1. http://sysbio.oxfordjournals.org/content/64/1/e1.long
    2. http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001789
    3. http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0929-9

Projects

Tick work (Lia [leader], Amanda, Saymon [after first-level])

  1. Goal 1. Protocol optimization for DNA prep & PCR. Status: completed
  2. Goal 2. Protocol development: DNA prep & library construction for MiSeq. Status: to be initiated
  3. Goal 3. Tick microbiome project: design of primers for 16S rRNA and pf32. Status: to be initiated

Borrelia plasmid evolution (Saymon [leader], Sharon, Alanna)

  1. Goal 1. Reconcile pf32 tree within Bbss SNP groups
  2. Goal 2. Reconcile pf32 tree within Bbss
  3. Goal 3. Reconcile pf32 tree with Bbsl

Pseudomonas GWAS (Rayees [leader], Roy, Ishmael; with Dr Xavier of MSKCC)

  1. Goal 1. Simulate bacterial genome evolution (ms, SimPop, SimBac; SFS_CODE (http://sfscode.sourceforge.net/SFS_CODE/SFS_CODE_home/SFS_CODE_home.html); AnA-FiTS (http://www.ncbi.nlm.nih.gov/pubmed/23834340))
  2. Goal 2. Simulate phenotype (SimPheno)
  3. Goal 3. Simulate GWAS (e.g., Haploview with phylogenetic correction)

Pathogen genomics pipeline (John [leader], Zawar)

  1. Goal 1. Variant call pipeline (e.g., cortex_var)
  2. Goal 2. Variant database
  3. Goal 3. Website

Existing projects

  1. Treponema genome evolution (Amanda & Roy)
  2. PVT1 evolution & function (Jeff [after first-level])
  3. PhyloHMM algorithm (Weigang)
  4. Adaptive dynamics & effect of diversity on Borrelia virulence (Jiangtao & Sipa)

Weekly Schedule

Friday, May 27, 2016. Lab meeting

  • End-of-semester celebration
  • Finalize EEID posters
  • Summer planning

Tuesday & Wednesday, May 31 & June 1, 2016. Two Orientation Sessions

  1. Time: 1-5 pm; Room: (to be reserved & posted)
  2. Pre-orientation: Obtain lab accounts (Yozen); Obtain cluster accounts (Carlos)
  3. Day 1. 1:00 - 1:30. Lab overview
  4. Day 1. 1:30 - 2:30. Unix Part 1 (Weigang); 30 min lunch break
  5. Day 1. 2:45 - 3:20. BorreliaBase.org (Lia) [Slides]
  6. Day 1. 3:30 - 4:00. bp-utils (Saymon) [Slides]
  7. Day 1. 4:00 - 4:30. Servers & cluster usage (Rayees)
  8. Day 2. 1:30 - 2:30. Unix Part 2 (Weigang); 30 min lunch break
  9. Day 2. 2:30 - 3:00. Phylogenetics/Tree Quizzes (Weigang)
  10. Day 2. 3:00 - 3:30. SQL & SQL-embedded Perl or Python (John)
  11. Day 2. 3:30 - 4:00. R (Amanda)
  12. Day 2. 4:00 - 4:30. Lab Databases: bb3-dev, pa2, genome_var (Weigang)
  13. Assignments (Due Noon, Monday, June 7, 2016)
    1. Log in to your lab account (first to "darwin.hunter.cuny.edu", then to "wallace") and change your password (email me [weigang@genectr.hunter.cuny.edu] if you have trouble logging in)
    2. Unix file-filter exercises: U10.1, U14.1, U16.1, U18.1, U27.1 (with emacs), U29.1 & U29.2 (with emacs)
    3. Borreliabase exercises: Download B31 genome, ORF, and protein sequences; Download ospA ortholog alignments (nucleotide & protein); Download pf32 paralog alignments; BLAST. Show your directory & files
    4. An exercise on sequence manipulations: to be posted
    5. Tree Quizzes: print & hand in
    6. A scripting exercise: Write a Perl or Python script to export SNPs
    7. An R exercise in statistical analysis: Gene expression analysis using the cancer data
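For the SNP-export exercise, one possible starting point is sketched below in Python. It is only a sketch under assumptions: the alignment is assumed to be already loaded into a dict of equal-length aligned sequences (reading the FASTA file is left out), and the function names and the tab-separated output layout are hypothetical, not a lab standard.

```python
"""Sketch for the 'export SNPs' exercise (hypothetical names and format).

Assumes an in-memory alignment: a dict mapping sequence IDs to aligned,
equal-length DNA strings, e.g., parsed from an ortholog alignment file.
"""


def find_snps(alignment, ignore="-N?"):
    """Return (position, {seq_id: base}) for every variable column.

    Positions are 1-based. Any column containing a gap or ambiguous symbol
    (characters in `ignore`) is skipped, so only clean substitutions remain.
    """
    ids = list(alignment)
    seqs = [alignment[i].upper() for i in ids]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for col in range(length):
        column = [s[col] for s in seqs]
        if any(base in ignore for base in column):
            continue  # skip gapped/ambiguous columns
        if len(set(column)) > 1:  # more than one base -> polymorphic site
            snps.append((col + 1, dict(zip(ids, column))))
    return snps


def export_snps(alignment):
    """Format SNPs as tab-separated lines: position, then one base per sequence."""
    ids = list(alignment)
    lines = ["pos\t" + "\t".join(ids)]
    for pos, bases in find_snps(alignment):
        lines.append(str(pos) + "\t" + "\t".join(bases[i] for i in ids))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy alignment, not real data
    toy = {"s1": "ATGCTA", "s2": "ATGTTA", "s3": "ACGT-A"}
    print(export_snps(toy))
```

Skipping gapped columns entirely is a simplifying choice; a fuller script might instead report indels separately or keep sites where only some sequences are missing data.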

Thursday, June 2 to Sunday, June 5. Traveling to EEID meeting (Saymon, Amanda, Rayees, Roy, Weigang)

June 6-10, 2016

June 13-17, 2016

June 20-24, 2016

June 27-July 1, 2016

July 6 - July 10

July 13 - July 17, 2016. Project conclusion

July 17 - August 20, 2016. PI vacation

School Year 2015

Nov 19, 2015

  • Amanda: Summary of Pseudomonas genome variant finding with cortex_var; drafting a manuscript (starting with Materials & Methods)
  • Roy: Briefing on his Poster presentation at ABRCMS
  • Rayees: PA SNP call done. (meeting with MSKCC at 11am)
  • Weigang: ABRCMS briefing / Tools to check out
    • PRICE: a de novo genome assembler of short reads. Document Page
    • QuickGO: a web browser of GO terms.
    • Pathway Tools: for qualitative prediction of pathogenicity, operons, and pathways
    • PICRUSt: predicting the functional content of a microbial community from marker-gene (16S rRNA) surveys
  • Saymon, John & Weigang: PopGenome package of R to explore selective sweeps, linkage, and drift
  • Sipa: Presentation on mathematical models of cancer development

Sept 18, 2015

  • Journal Club: recent statistics for detecting population admixture and genome introgression (d3, f4, h4, ChromosomePainter) [1]. Presenter: Saymon
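As a reference point for this family of statistics, the classic ABBA-BABA D statistic (Patterson's D) can be sketched as below. It is not necessarily one of the statistics discussed in the paper; the function name and input format (four aligned sequences for P1, P2, P3 and an outgroup, with the outgroup base taken as ancestral) are assumptions for illustration.

```python
def patterson_d(p1, p2, p3, outgroup):
    """ABBA-BABA D statistic from four aligned sequences (a sketch).

    At each clean biallelic site the outgroup base is treated as ancestral (A)
    and the other allele as derived (B). ABBA: P2 and P3 share the derived
    allele; BABA: P1 and P3 share it. D = (ABBA - BABA) / (ABBA + BABA).
    A D near 0 is expected without gene flow between P3 and P1 or P2.
    """
    abba = baba = 0
    for a1, a2, a3, anc in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        site = {a1, a2, a3, anc}
        if len(site) != 2 or "-" in site or "N" in site:
            continue  # keep only clean biallelic sites
        if a3 == anc:
            continue  # P3 must carry the derived allele
        if a1 == anc and a2 == a3:
            abba += 1
        elif a2 == anc and a1 == a3:
            baba += 1
    if abba + baba == 0:
        return 0.0  # no informative sites
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    # Toy sequences, not real data: 2 ABBA sites, 1 BABA site
    print(patterson_d("AAGAA", "GGAAA", "GGGGA", "AAAAA"))
```

In practice the significance of D is usually assessed with a block jackknife over the genome, which is omitted here.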

Sept 11, 2015

  • Journal Club: an in-depth analysis of Staphylococcus aureus genomes. [2] Presenter: John
    • Key terms: SNP, mutation, recombination, linkage disequilibrium (LD), synonymous polymorphism (Pi[s])
    • Key methods: identify recombination (from mutation) using shape-shape changes; four-gamete test to identify breakage point; LD decay (based on r2 and probability of tree compatibility) to quantify r/m ratio
    • Key results: extensive recombination among clones; rates and tract length quantified by LD decay
    • My rating: 4/5. Rigorous analysis of recombination in bacteria, innovative methods, informative and attractive figures; however, the paper is too long, many statements are repetitive, and the effect of selection is hinted at but not explored.
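The two site-pair methods in the notes above, the four-gamete test and LD measured by r², can be sketched in textbook form as follows. This is not the paper's own pipeline; the function names are hypothetical, and each site is assumed to be biallelic and given as a string of alleles across the sampled genomes (one character per genome).

```python
from itertools import combinations


def four_gamete_fails(site_a, site_b):
    """True if two biallelic sites show all four two-site haplotypes.

    Without recombination (and under infinite sites) at most three of the
    four haplotypes can occur; all four imply a recombination event between
    the sites, or homoplasy (recurrent mutation).
    """
    return len(set(zip(site_a, site_b))) == 4


def r_squared(site_a, site_b):
    """Linkage disequilibrium r^2 = D^2 / (pA(1-pA) pB(1-pB)) for two biallelic sites."""
    n = len(site_a)
    a_allele, b_allele = site_a[0], site_b[0]  # reference allele at each site
    p_a = sum(x == a_allele for x in site_a) / n
    p_b = sum(y == b_allele for y in site_b) / n
    p_ab = sum(x == a_allele and y == b_allele for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0  # monomorphic site -> 0


def incompatible_pairs(sites):
    """Index pairs of sites that fail the four-gamete test."""
    return [(i, j) for i, j in combinations(range(len(sites)), 2)
            if four_gamete_fails(sites[i], sites[j])]


if __name__ == "__main__":
    # Toy sites across four genomes, not real data
    sites = ["AAGG", "CTCT", "CCTT"]
    print(incompatible_pairs(sites))
    print(r_squared(sites[0], sites[2]))
```

Plotting r² against the physical distance between site pairs gives the LD-decay curve used to estimate recombination parameters.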

Sept 4, 2015

  • Journal Club: a nice review of bacterial population genetics (E. coli model), from protein polymorphisms to whole-genome variation [3]. Presenter: Amanda
    • Technological history of bacterial population genetics: MLEE -> MLST -> Whole-genome
    • Key terms & concepts: clonality, linkage disequilibrium, recombination, homoplasy, r/m ratio
    • Methods for recombination detection: clustered polymorphism, homoplasy (phylogenetic inconsistency) (a Borrelia data set to understand how to identify homoplasy and recombination)
    • Tools to try: recHMM (detecting homoplastic sites, fine-grained), PHI (per-gene detection, coarse), USEARCH (alternative to BLAST)/UCLUST (alternative to CD-HIT), Distance method (? no reference given; can't understand algorithm either)
    • My rating: 4.8/5 (concise, thoughtful & solid review, covering a vast range of history, species, and theory; no apparent theoretical or visual flaws; ending a little pessimistic; implications for the broader biomedical audience are not explored)

Aug 28, 2015

  • Journal Club (12:30-1:30): a recent paper claiming widespread gene loss & pseudogenization in bacterial pathogens [4]. Presenter: Roy
    • Key terms/concepts: pan-genome, pan-genes (core/"near core"/rare), normalized identity (NI), genomic fluidity, pseudogene conservation percent (PCP), AAI (aa identity), effective population size (Ne), Muller's Ratchet
    • Key methods: FASTA for ortholog/paralog identification, PHI (pairwise homoplasy index) for detecting recombination, TFASTA for HGT (gene gain), RAST for gene calls and genome annotation
    • Key findings: bimodal distribution of pan-genes; two clonal species have high genomic fluidity despite being closely related; little HGT ("rare") but many losses ("near core") in clonal species; maintenance of pseudogenes (small Ne)
    • Pluses: large number of genomes; results broadly convincing; rigorous interpretations and discussion
    • Flaws: No phylogenetic reconstruction; no synteny verification; no gene function analysis; no statistical evaluation of the conclusion; bad presentation (figures should be tables and tables should be figures)
    • My overall rating: 3.5/5.0
  • Project updates & plans (1:30-2)
    • Weigang: design statistical tests for 2 hypotheses: (1) any co-occurrence of oc types? (2) lineage-stabilizing genes
    • Saymon: tick-bacteria gene transfer positive; PCR is working for positive controls; need to start testing nymphs
    • John & Rayees: pa2 database cleaning nearly done; start polymorphism-by-genome-location analysis
    • Amanda & Roy: Treponema project has a working database, pipeline, and preliminary validated results; start documenting protocols, tabulating results, and preparing functional analysis
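Genomic fluidity, one of the key terms in the Aug 28 paper, is usually defined as the average over all pairs of genomes of (gene families unique to either genome) divided by (total gene families in both genomes). The sketch below computes that quantity; it assumes gene families have already been assigned (e.g., by ortholog clustering), and the input format of one set of family IDs per genome is hypothetical.

```python
from itertools import combinations


def genomic_fluidity(genomes):
    """Genomic fluidity of a set of genomes (a sketch).

    `genomes` maps genome names to non-empty sets of gene-family IDs.
    For each pair (k, l): (U_k + U_l) / (M_k + M_l), where U_k is the number
    of families in k absent from l and M_k the number of families in k.
    The result is the mean over all N(N-1)/2 pairs: 0 means identical gene
    content, values near 1 mean almost no shared families.
    """
    names = list(genomes)
    if len(names) < 2:
        raise ValueError("need at least two genomes")
    pair_values = []
    for k, l in combinations(names, 2):
        gk, gl = genomes[k], genomes[l]
        unique = len(gk - gl) + len(gl - gk)
        total = len(gk) + len(gl)
        pair_values.append(unique / total)
    return sum(pair_values) / len(pair_values)


if __name__ == "__main__":
    # Toy gene-family sets, not real data
    toy = {"g1": {1, 2, 3, 4}, "g2": {1, 2, 3, 5}, "g3": {1, 2, 3, 4}}
    print(genomic_fluidity(toy))
```

Because it averages over pairs, fluidity is less sensitive to the number of sampled genomes than a raw pan-genome size, which is one reason it is used to compare species.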

Summer 2014

Projects & Goals

Name | Goal/Description | Team
Pseudomonas | Gene gain/loss; SNP analysis | Example
Borrelia intergenics | Clean up start-codon positions | Example
SNP pipeline | Example | Example
Gain/Loss pipeline | Example | Example
  • Frequency distribution of ospC types in wild tick populations (Fall 2013) Project page
  • Mutual information

Summer 2013

Projects & Goals

  • Borrelia population genomics: Recombination & Natural Selection (Published)
  • Borrelia pan-genomics (Submitted as of 5/25/2013)
  • Positive and negative selection in Borrelia ORFs and IGS (Submitted as of 6/15/2013)
  • Dr Bargonetti's project (Summer 2013)
  • A population genomics pipeline using MUGSY-FastTree (Summer 2013): Project page
  • Borrelia Genome Database & Browser (Summer 2013) Version 2 screen shot
  • Pseudomonas population genomics (Summer 2013) Project page
  • Hypothesis Testing: Do host-interacting genes show adaptive codon usage? (Summer 2013): Project page
  • Phylogenomics browsing with JavaScript/jQuery, Ajax, and jsPhyloSVG
  • Frequency distribution of ospC types in wild tick populations (Fall 2013) Project page

Lab meeting: June 13, 2013

  • Weigang: IGS paper submission should be done by Thursday.
  • Che/Slav: Workshop update (Meeting at 3:30pm?)
  • Che: SILAC project (Meeting at 4pm?)
  • Zhenmao: Tick processing & paired-end Illumina sequencing
  • Pedro: Updates on "ncbi-orf" table
  • Girish: phyloSVG extension; QuBi video
  • Saymon and Deidre: consensus start-codons
  • Rayees and Raymond: Pseudomonas DB; fleN alignment and phylogeny
  • Valentyna: BLASTn results (4:30pm?)

Lab meeting: May 23, 2013

  • May 24, Friday: End of School Year Party in the Park (we leave from Hunter @ 1:30pm)
  • Recommended reading of the week: Detecting Neanderthal genes using the D' homoplasy statistic
  • Weigang: IGS paper submission
  • Che: Thesis update/SILAC project/Summer teaching
  • Zhenmao: Manuscript update: Material & Methods; Results (Tables and Figures)
  • Pedro: Catalyst web framework
  • Girish: cp26 phylogenomic analysis
  • Saymon and Deidre: consensus start-codons

Lab meeting: May 16, 2013

  • Weigang: IGS paper submitted yet?
  • Che: Thesis update. Chapter 3. Evolution of ospA/ospB gene family
  • Pedro/Zhenmao: Can we wrap up the BLAST identification of ospC types?
  • Girish: Fetch cp26 sequences from DB; Run MUGSY & FastTree
  • Saymon/Deidre: Identification of consensus start-codon positions
  • Pedro/Girish: orth_get/orth_igs website development. Catalyst. Implement graphics (genome map & phylogeny) query interface
  • Raymond: start the Pseudomonas summer project

Foundational papers for working in Qiu Lab


Informatics Architecture

  • Operating Systems: Linux OS/Ubuntu, Mac OS
  • Programming languages: BASH, Perl/BioPerl, R
  • Relational Databases: PostgreSQL
  • Software architecture
    • bb3: Borrelia Genome Database. To access: psql -h borreliabase.org -U lab bb3
    • Pseudomonas Genome Database. To access: psql -h ortholog -U lab paerug
    • DNATweezer: Perl wrappers of most frequently used BioPerl modules, including Bio::Seq, Bio::SimpleAlign, and Bio::Tree [5]
    • SimBac: A Perl/Moose package for simulating bacterial genome evolution [6]
    • BorreliaBase

Perl Challenges

Problem | Input | Output
DNA transcription | A DNA sequence in the 5'-3' direction (e.g., aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat) | An RNA sequence in the 5'-3' direction
Genetic code | None | All 64 codons, one per line (using loops)
Count amino acids | A protein sequence | Frequency counts of individual amino acids
Count codons | A protein-coding DNA sequence | Frequency counts of individual codons
Random sequence 1 | None | A random DNA sequence (e.g., 1,000 bases) with equal base frequencies
Random sequence 2 | None | A random DNA sequence with biased base frequencies (e.g., 10% G, 10% C, 40% T, 40% A)
Graphics I | A categorical dataset (e.g., Biology) | A bar graph & a pie chart, using GD::Simple or PostScript::Simple
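The challenges are posed for Perl; since the lab also accepts Python for scripting exercises, here is one possible Python sketch of the first six problems, useful for checking answers. Assumptions: the transcription input is taken to be the coding (sense) strand, so transcription is simply T to U (a template strand would need reverse-complementing first); codon counting reads in frame from the first base; all function names are hypothetical. Graphics I is left out because it depends on a plotting library.

```python
import random
from collections import Counter

BASES = "TCAG"  # conventional genetic-code table order


def transcribe(dna):
    """Coding-strand DNA (5'-3') to RNA (5'-3'): replace T with U."""
    return dna.upper().replace("T", "U")


def all_codons():
    """The 64 codons, generated with three nested loops."""
    codons = []
    for first in BASES:
        for second in BASES:
            for third in BASES:
                codons.append(first + second + third)
    return codons


def count_amino_acids(protein):
    """Frequency of each one-letter amino-acid code in a protein sequence."""
    return Counter(protein.upper())


def count_codons(cds):
    """Frequency of each codon, reading in frame from the first base.

    A trailing partial codon (length not a multiple of 3) is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def random_dna(length, seed=None):
    """Random DNA with equal (25%) base frequencies."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def random_biased_dna(length, freqs, seed=None):
    """Random DNA drawn with the given base frequencies, e.g. {"A": 0.4, ...}."""
    rng = random.Random(seed)
    bases = list(freqs)
    weights = [freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


if __name__ == "__main__":
    print(transcribe("aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat"))
    print("\n".join(all_codons()))
    biased = random_biased_dna(1000, {"G": 0.1, "C": 0.1, "T": 0.4, "A": 0.4}, seed=1)
    print(Counter(biased))
```

Passing a seed makes the random sequences reproducible, which helps when comparing a Perl solution against this one.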