Qiu Lab Meetings: Difference between revisions

From QiuLab
imported>Weigang

Revision as of 19:48, 26 May 2016

Summer 2016

Projects

Tick work (Lia [leader], Amanda, Saymon [after first-level])

  1. Goal 1. Protocol optimization for DNA prep & PCR. Status: completed
  2. Goal 2. Protocol development: DNA prep & library construction for MiSeq. Status: to be initiated
  3. Goal 3. Tick microbiome project: design of primers for 16S rRNA and for pf32. Status: to be initiated

Borrelia plasmid evolution (Saymon [leader], Sharon, Alanna)

  1. Goal 1. Reconcile pf32 tree within Bbss SNP groups
  2. Goal 2. Reconcile pf32 tree within Bbss
  3. Goal 3. Reconcile pf32 tree with Bbsl

Pseudomonas GWAS (Rayees [leader], Roy, Ishmael)

  1. Goal 1. Simulate bacterial genome evolution (ms, SimPop, SimBac)
  2. Goal 2. Simulate phenotype (SimPheno)
  3. Goal 3. Simulate GWAS (e.g., Hapview with phylogenetic correction)

Pathogen genomics pipeline (John [leader], Zawar)

  1. Goal 1. Variant call pipeline (e.g., cortex_var)
  2. Goal 2. Variant database
  3. Goal 3. Website

Weekly Schedule

Friday, May 27, 2016. Lab meeting

  • End-of-semester celebration
  • Summer planning

Tuesday & Wednesday, May 31 & June 1, 2016. Two Orientation Sessions

  1. Time: 1-5 pm; Room: (to be reserved & posted)
  2. Obtain lab accounts (Yozen)
  3. Servers & cluster usage (Rayees)
  4. bp-utils (Saymon)
  5. Assignments
    1. Download a nucleotide alignment of ospA from borreliabase.org & turn it into a protein alignment
    2. Write a Perl or Python script to export SNPs
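
The two assignments above can be sketched in Python as follows. This is a minimal sketch, not a reference solution: it assumes the downloaded ospA alignment is in FASTA format and in frame (gaps in multiples of three), and it uses the standard genetic code. The function names (read_fasta, translate_alignment, snp_sites) are hypothetical and are not part of bp-utils.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def read_fasta(text):
    """Parse FASTA-formatted text into a dict {sequence name: sequence}."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs


def translate_alignment(aln):
    """Translate an in-frame codon alignment into a protein alignment.

    A full gap codon '---' becomes '-'; any other unrecognized codon becomes 'X'.
    """
    prot = {}
    for name, seq in aln.items():
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        codons = [seq[i:i + 3] for i in range(0, usable, 3)]
        prot[name] = "".join(
            "-" if c == "---" else CODON_TABLE.get(c, "X") for c in codons
        )
    return prot


def snp_sites(aln):
    """Return [(1-based position, {name: base})] for variable columns.

    A column counts as a SNP when it holds more than one distinct A/C/G/T;
    gaps and ambiguity codes are ignored when counting alleles.
    """
    names = list(aln)
    length = min(len(s) for s in aln.values())
    sites = []
    for i in range(length):
        column = {n: aln[n][i] for n in names}
        alleles = {b for b in column.values() if b in "ACGT"}
        if len(alleles) > 1:
            sites.append((i + 1, column))
    return sites


if __name__ == "__main__":
    demo = ">s1\nATGAAATTA\n>s2\nATGAAGTTA\n>s3\nATG---CTA\n"
    alignment = read_fasta(demo)
    print(translate_alignment(alignment))
    for pos, column in snp_sites(alignment):
        print(pos, "".join(column[n] for n in alignment))
```

For a real data set, the demo string would be replaced by the contents of the downloaded alignment file.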

Thursday, June 2 to Sunday, June 5. Traveling to EEID meeting (Saymon, Amanda, Rayees, Roy, Weigang)

June 6-10, 2016

June 13-17, 2016

June 20-24, 2016

June 27-July 1, 2016

July 6 - July 10

July 13 - July 17, 2016. Project conclusion

July 17 - August 20, 2016. PI vacation

School Year 2015

Nov 19, 2015

  • Amanda: Summary of Pseudomonas genome variant finding with cortex_var; Drafting a manuscript (starting with Materials & Methods)
  • Roy: Briefing on his Poster presentation at ABRCMS
  • Rayees: PA SNP call done. (meeting with MSKCC at 11am)
  • Weigang: ABRCMS briefing / Tools to check out
    • PRICE: a de novo genome assembler of short reads. Document Page
    • QuickGO: a web browser of GO terms.
    • Pathway Tools: for qualitative prediction of pathogenicity, operons, and pathways
    • PICRUSt: predicting functions of microbial communities based on gene contents
  • Saymon, John & Weigang: PopGenome package of R to explore selective sweeps, linkage, and drift
  • Sipa: Presentation on mathematical models of cancer development

Sept 18, 2015

  • Journal Club: latest statistics in detecting population admixture and genome introgression (d3, f4, h4, ChromosomePainter). [1] Presenter: Saymon

Sept 11, 2015

  • Journal Club: an in-depth analysis of Staphylococcus aureus genomes. [2] Presenter: John
    • Key terms: SNP, mutation, recombination, linkage disequilibrium (LD), synonymous polymorphism (Pi[s])
    • Key methods: identify recombination (from mutation) using shape-shape changes; four-gamete test to identify breakpoints; LD decay (based on r² and probability of tree compatibility) to quantify r/m ratio
    • Key results: extensive recombination among clones; rates and tract length quantified by LD decay
    • My rating: 4/5. Rigorous analysis of recombination in bacteria, innovative methods, informative and attractive figures; the paper is too long and many statements repetitive, effect of selection hinted but not explored.
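
Two of the key methods listed above, the four-gamete test and r²-based linkage disequilibrium, can be illustrated with textbook definitions. The sketch below is not the paper's implementation. It assumes each site is biallelic and is given as a string of alleles, one character per genome, in the same genome order for both sites. The function names are hypothetical.

```python
def four_gamete_test(site_a, site_b):
    """Return True if all four two-site haplotypes are observed.

    For two biallelic sites, seeing all four combinations cannot be explained
    by single mutations on one tree, so it suggests recombination (or
    recurrent mutation) between the sites.
    """
    if len(site_a) != len(site_b):
        raise ValueError("sites must cover the same genomes")
    return len(set(zip(site_a, site_b))) == 4


def r_squared(site_a, site_b):
    """Compute r^2 between two biallelic sites.

    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    The first allele seen at each site is used as the reference allele;
    the choice does not affect r^2.
    """
    if len(site_a) != len(site_b):
        raise ValueError("sites must cover the same genomes")
    n = len(site_a)
    ref_a, ref_b = site_a[0], site_b[0]
    p_a = site_a.count(ref_a) / n
    p_b = site_b.count(ref_b) / n
    p_ab = sum(1 for x, y in zip(site_a, site_b) if x == ref_a and y == ref_b) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


if __name__ == "__main__":
    # Four genomes, two sites each (toy data, not from the paper).
    print(four_gamete_test("AAGG", "CTCT"), r_squared("AAGG", "CTCT"))
    print(four_gamete_test("AAGG", "CCTT"), r_squared("AAGG", "CCTT"))
```

Plotting r² against the distance between site pairs gives an LD-decay curve. Scanning adjacent site pairs with the four-gamete test gives candidate intervals in which a recombination breakpoint may lie.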

Sept 4, 2015

  • Journal Club: a nice review of bacterial population genetics (E. coli model), from protein polymorphisms to whole-genome variations. [3] Presenter: Amanda
    • Technological history of bacterial population genetics: MLEE -> MLST -> Whole-genome
    • Key terms & concepts: clonality, linkage disequilibrium, recombination, homoplasy, r/m ratio
    • Methods for recombination detection: clustered polymorphism, homoplasy (phylogenetic inconsistency) (a Borrelia data set to understand how to identify homoplasy and recombination)
    • Tools to try: recHMM (detecting homoplastic sites, fine-grained), PHI (per gene detection, coarse), USEARCH (alternative to BLAST)/UCLUST (alternative to CD-HIT), Distance method (? no reference given; can't understand algorithm either)
    • My rating: 4.8/5 (concise, thoughtful & solid review, covering a vast range of history, species, and theory; no apparent theoretical or visual flaws; ending a little pessimistic; implications for the greater biomedical audience are not explored)

Aug 28, 2015

  • Journal Club (12:30-1:30): a recent paper claiming widespread gene loss & pseudogenization in bacterial pathogens. [4] Presenter: Roy
    • Key terms/concepts: pan-genome, pan-genes (core/"near core"/rare), normalized identity (NI), genomic fluidity, pseudogene conservation percent (PCP), AAI (aa identity), effective population size (Ne), Muller's Ratchet
    • Key methods: FASTA for ortholog/paralog identification, PHI (pairwise homoplasy index) for detecting recombination, TFASTA for HGT (gene gain), RAST for gene calls and genome annotation
    • Key findings: bi-modal distribution of pangenes; two clonal species have high genomic fluidity, despite being closely related; little HGT ("rare") but lots of losses ("near core") in clonal species; maintenance of pseudogenes (small Ne)
    • Pluses: large number of genomes; results broadly convincing; rigorous interpretations and discussion
    • Flaws: No phylogenetic reconstruction; no synteny verification; no gene function analysis; no statistical evaluation of the conclusion; bad presentation (figures should be tables and tables should be figures)
    • My overall rating: 3.5/5.0
  • Project updates & plans (1:30-2)
    • Weigang: design statistical tests for 2 hypotheses: (1) any co-occurrence of oc types? (2) lineage-stabilizing genes
    • Saymon: tick-bacteria gene transfer positive; pcr is working for positive controls; need to start testing for nymphs
    • John & Rayees: pa2 database cleaning nearly done; start polymorphism-by-genome-location analysis
    • Amanda & Roy: Treponema project has a working database, pipeline, and preliminary validated results; start documenting protocols, tabulating results, and preparing functional analysis

Summer 2014

Projects & Goals

Name | Goal/Description | Team
Pseudomonas | Gene gain/loss; SNP analysis | Example
Borrelia intergenics | Clean up start-codon positions | Example
SNP pipeline | Example | Example
Gain/Loss pipeline | Example | Example
  • Frequency distribution of ospC types in wild tick populations (Fall 2013) Project page
  • Mutual information

Summer 2013

Projects & Goals

  • Borrelia population genomics: Recombination & Natural Selection (Published)
  • Borrelia pan-genomics (Submitted as of 5/25/2013)
  • Positive and negative selection in Borrelia ORFs and IGS (Submitted as of 6/15/2013)
  • Dr Bargonetti's project (Summer 2013)
  • A population genomics pipeline using MUGSY-FastTree (Summer 2013): Project page
  • Borrelia Genome Database & Browser (Summer 2013) Version 2 screen shot
  • Pseudomonas population genomics (Summer 2013) Project page
  • Hypothesis Testing: Do host-interacting genes show adaptive codon usage? (Summer 2013): Project page
  • Phylogenomics browsing with JavaScript/JQuery, Ajax, and jsPhylosvg
  • Frequency distribution of ospC types in wild tick populations (Fall 2013) Project page

Lab meeting: June 13, 2013

  • Weigang: IGS paper submission should be done by Thursday.
  • Che/Slav: Workshop update (Meeting at 3:30pm?)
  • Che: SILAC project (Meeting at 4pm?)
  • Zhenmao: Tick processing & paired-end Illumina sequencing
  • Pedro: Updates on "ncbi-orf" table
  • Girish: phyloSVG extension; QuBi video
  • Saymon and Deidre: consensus start-codons
  • Rayees and Raymond: Pseudomonas DB; fleN alignment and phylogeny
  • Valentyna: BLASTn results (4:30pm?)

Lab meeting: May 23, 2013

  • May 24, Friday: End of School Year Party in the Park (we leave from Hunter @ 1:30pm)
  • Recommended reading of the week: Detecting Neanderthal genes using the D' homoplasy statistic
  • Weigang: IGS paper submission
  • Che: Thesis update/SILAC project/Summer teaching
  • Zhenmao: Manuscript update: Materials & Methods; Results (Tables and Figures)
  • Pedro: Catalyst web framework
  • Girish: cp26 phylogenomic analysis
  • Saymon and Deidre: consensus start-codons

Lab meeting: May 16, 2013

  • Weigang: IGS paper submitted yet?
  • Che: Thesis update. Chapter 3. Evolution of ospA/ospB gene family
  • Pedro/Zhenmao: Can we wrap up the BLAST identification of ospC types?
  • Girish: Fetch cp26 sequences from DB; Run MUGSY & FastTree
  • Saymon/Deidre: Identification of consensus start-codon positions
  • Pedro/Girish: orth_get/orth_igs website development. Catalyst. Implement graphics (genome map & phylogeny) query interface
  • Raymond: start the Pseudomonas summer project

Foundational papers for working in Qiu Lab


Informatics Architecture

  • Operating Systems: Linux OS/Ubuntu, Mac OS
  • Programming languages: BASH, Perl/BioPerl, R
  • Relational Databases: PostgreSQL
  • Software architecture
    • bb3: Borrelia Genome Database. To access: psql -h borreliabase.org -U lab bb3
    • Pseudomonas Genome Database. To access: psql -h ortholog -U lab paerug
    • DNATweezer: Perl wrappers of most frequently used BioPerl modules, including Bio::Seq, Bio::SimpleAlign, and Bio::Tree [5]
    • SimBac: A Perl/Moose package for simulating bacterial genome evolution [6]
    • BorreliaBase

Perl Challenges

Problem | Input | Output
DNA transcription | A DNA sequence, in 5'-3' direction (e.g., aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat) | An RNA sequence, in 5'-3' direction
Genetic code | None | 64 codons, one per line (using loops)
Count amino acids | A protein sequence | Frequency counts of individual amino acids
Count codons | A protein-coding DNA sequence | Frequency counts of individual codons
Random sequence 1 | None | Generate a random DNA sequence (e.g., 1000 bases) with equal base frequencies
Random sequence 2 | None | Generate a random DNA sequence with biased base frequencies, e.g., 10% G, 10% C, 40% T, and 40% A.
Graphics I | A categorical dataset, e.g., Biology | A bar graph & a pie chart, using GD::Simple or PostScript::Simple
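
The challenges are meant to be solved in Perl, but the logic of the first six can be sketched in Python, the other language the orientation assignment allows. This is one possible approach, not an answer key. It assumes the DNA input to the transcription problem is the coding (sense) strand, so the RNA is obtained by replacing T with U. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def transcribe(dna):
    """Coding-strand DNA (5'-3') to RNA (5'-3'): replace T with U."""
    return dna.upper().replace("T", "U")


def all_codons(bases="TCAG"):
    """List all 64 codons; product() stands in for three nested loops."""
    return ["".join(c) for c in product(bases, repeat=3)]


def count_residues(protein):
    """Frequency counts of individual amino acids in a protein sequence."""
    return Counter(protein.upper())


def count_codons(cds):
    """Frequency counts of codons, reading in frame from the first base."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def random_dna(length=1000, weights=None, seed=None):
    """Random DNA sequence; equal base frequencies unless weights are given.

    weights maps each of A, C, G, T to its relative frequency,
    e.g. {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}.
    """
    rng = random.Random(seed)
    bases = "ACGT"
    if weights is None:
        weights = {b: 0.25 for b in bases}
    return "".join(rng.choices(bases, weights=[weights[b] for b in bases], k=length))


if __name__ == "__main__":
    dna = "aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat"
    print(transcribe(dna))
    print("\n".join(all_codons()))
    print(count_codons(dna).most_common(5))
    biased = random_dna(1000, {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4})
    print(count_residues(biased))  # Counter works for base counts as well
```

Passing a seed to random_dna makes a run reproducible, which helps when comparing output against a Perl solution of the same challenge.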