Qiu Lab Meetings


Summer 2016

Rules of Conduct

  1. No eating, drinking, or loud talking in the lab. Socialize in the lobby only.
  2. Be respectful to each other, regardless of level of study
  3. Be on time & responsible. Communicate with the PI if late or absent

Readings & Journal Club

  1. A short introduction to molecular phylogenetics: http://www.ncbi.nlm.nih.gov/pubmed/12801728
  2. The latest tree of life: http://www.nature.com/articles/nmicrobiol201648
  3. Microbiome Initiative: http://mbio.asm.org/content/7/3/e00714-16.full?sid=a47e19d3-10c1-408d-9d56-2cecaa73d585
  4. Evolutionary mechanisms in polio viruses
    1. Fitness landscape at single-nucleotide levels: Acevedo et al (2014)
    2. Recombination facilitates adaptation of polio virus: Xiao et al (2016)
  5. Cancer evolution:
    1. http://sysbio.oxfordjournals.org/content/64/1/e1.long
    2. http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001789
    3. http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0929-9


Tick work (Lia [leader], Amanda, Saymon [after first-level])

  1. Goal 1. Protocol optimization for DNA prep & PCR. Status: completed
  2. Goal 2. Protocol development: DNA prep & library construction for MiSeq. Status: to be initiated
  3. Goal 3. Tick microbiome project: design of primers for 16S rRNA and for pf32. Status: to be initiated

Borrelia plasmid evolution (Saymon [leader], Sharon, Alanna)

  1. Goal 1. Reconcile pf32 tree within Bbss SNP groups
  2. Goal 2. Reconcile pf32 tree within Bbss
  3. Goal 3. Reconcile pf32 tree with Bbsl

Pseudomonas GWAS (Rayees [leader], Roy, Ishmael; with Dr Xavier of MSKCC)

  1. Goal 1. Simulate bacterial genome evolution (ms, SimPop, SimBac; SFS_CODE (http://sfscode.sourceforge.net/SFS_CODE/SFS_CODE_home/SFS_CODE_home.html); AnA-FiTS (http://www.ncbi.nlm.nih.gov/pubmed/23834340))
  2. Goal 2. Simulate phenotype (SimPheno)
  3. Goal 3. Simulate GWAS (e.g., Hapview with phylogenetic correction)

Pathogen genomics pipeline (John [leader], Zawar)

  1. Goal 1. Variant call pipeline (e.g., cortex_var)
  2. Goal 2. Variant database
  3. Goal 3. Website

Existing projects

  1. Treponema genome evolution (Amanda & Roy)
  2. PVT1 evolution & function (Jeff [after first-level])
  3. PhyloHMM algorithm (Weigang)
  4. Adaptive dynamics & effect of diversity on Borrelia virulence (Jiangtao & Sipa)

Weekly Schedule

Friday, May 27, 2016. Lab meeting

  • End-of-semester celebration
  • Finalize EEID posters
  • Summer planning

Tuesday, May 31, 2016. Orientation Session 1

  1. Time: 1-5 pm; Room: (to be reserved & posted)
  2. Pre-orientation: Obtain lab accounts (Yozen); Obtain cluster accounts (Carlos)
  3. Day 1. 1:00 - 1:30. Lab overview
  4. Day 1. 1:30 - 2:00. Unix Part 1 (Weigang)
  5. Day 1. 2:00 - 2:30. Lunch break
  6. Day 1. 2:45 - 3:20. BorreliaBase.org (Lia). Slides: File:BorreliaBase-intro.pptx
  7. Day 1. 3:30 - 4:00. bp-utils (Saymon): tutorials
  8. Day 1. 4:00 - 4:30. Servers & cluster usage (Rayees): Tutorial

Wed, June 1, 2016. Orientation Session 2

  1. Day 2. 1:00 - 2:00. Phylogenetics/Tree Quizzes (Weigang)
  2. Day 2. 2:00 - 2:45. Lunch break
  3. Day 2. 2:45 - 3:15. R (Amanda). Download data set from http://diverge.hunter.cuny.edu/~weigang/data-sets-for-biostat/intern_data.csv2 & save as "rna_seq.csv"
  4. Day 2. 3:30 - 4:00. SQL & SQL-embedded Perl or Python (John)
  5. Day 2. 4:00 - 4:30. Unix Part 2 (Roy)
  6. Day 2. 4:30 - 5:00. Lab Databases: bb3-dev, pa2, genome_var (Weigang)

Assignments (Q1 & Q2 due 1 pm, Wed, June 1st, 2016; the rest due noon, Monday, June 7th, 2016)

  1. Log in to your lab account (first to "darwin.hunter.cuny.edu", then to "wallace") and change your password (email me [weigang@genectr.hunter.cuny.edu] if you have trouble logging in)
  2. Unix exercises: U10.1, U14.1, U16.1, U18.1, U27.1 (with emacs or vi), U29.1 & U29.2 (with emacs or vi)
  3. Borreliabase exercises:
    1. Download B31 genome, ORF, and protein sequences
    2. Download ospA ortholog alignments (nucleotide & protein)
    3. Download pf32 paralog alignments
    4. Use BLAST to identify which gene(s) in the B31 genome contain this DNA sequence: "caagattaatattattgcaatgatattaactttaatttgcacctcatgcgcaccttttagcaaaatcgatcctaaagcaaatgcaaacactaagccaaaaaaaatcaccaatccgggggaaaacacccaaaattttgaagataaatctggagaccttagcacttctgatgaaaaaattatggaaactatcgcttcaga"
    5. Use BLAST to identify which gene(s) in the B31 proteome contain this amino-acid sequence: "MGINSTSFYSLNMKVKPLDNVKVRKALSFAIDRKTLTESVLN"
  4. Use bioseq to answer all the questions below. Submit only the command that you used to find the answers.
    1. Use accession # CP002316.1 to retrieve the genbank file from NCBI. Save the output to CP002316.1.gb file.
    2. Extract the sequences in FASTA format from file CP002316.1.gb and save the output to CP002316.1.fas file. Use the file CP002316.1.fas to answer the following question.
    3. Count the number of sequences in the file.
    4. Using a single command, pick the first 10 sequences from the file and find their lengths. (Hint: use a pipe)
    5. Using a single command, pick the third and seventh sequences from the file and then do the 3-frame translation for both sequences. Which reading frame is correct? Specify.
    6. Using a single command, get the first 100 nucleotides of all the sequences present in the file and then do a 1-frame translation for all the sub-sequences. (Hint: look in the bioseq help page for the options that extract a subsequence and do a 1-frame translation. Use a pipe)
  5. Use bioaln for the following exercises. Go to /home/shared/lab_tutorial and find the sequence alignment file named “ospC.aln”. Name the format of the alignment file. Use it to answer all the questions below. Submit only the command that you used to find the answers.
    1. Find the length of the alignment.
    2. Count the number of the sequences present in the alignment.
    3. How do you convert this alignment to PHYLIP format? Save your output.
    4. Pick “B31, N40, BOL26, JD1” from the alignment and calculate their average percent identity. (Hint: look in the bioaln help page for the options that pick specific sequences and calculate average percent identity. Use a pipe)
    5. Extract third sites from the alignment and show the alignment in match view. (Hint: look in the bioaln help page for the option that extracts third sites. Use a pipe)
    6. Remove the gaps from the alignment and show the final alignment in codon view. (Hint: look in the bioaln help page for the option that removes gaps)
  6. SQL exercises:
    1. Log in to the borreliabase.org database by typing: psql -h borreliabase.org -U lab -d genome_var
    2. Write down the command you use to retrieve each item listed below (don’t forget that each command should end with a “;”):
    3. Select all columns in the table “varlist” and show the first 10 rows
    4. From table “varlist”, select values stored in the columns “acc”, “refcodon”, “altcodon”, “protein_accession”
    5. In the “varlist” table, select all columns for rows where the “proj_id” value is “1”, and count the rows selected
    6. Select rows whose “conf” value is greater than “90” and sort your selection in ascending order
    7. From table “var”, write a query that outputs the sum of the values in column “coverage”, grouped by the values in column “genome_id” and restricted to rows where “status” is ‘f’; sort your selection in ascending order
    8. From table “genome”, select values in column “genus”; from table “var”, select values in column “var_id”, “status”, “conf”; from table “varlist”, select values in column “acc”, “refaa”, “altaa”, then join your selection together. What columns are the keys when you join the table?
  7. Tree Quizzes File:Pretest.pdf
  8. A scripting exercise: Write a Perl or Python script to export SNPs
  9. An R exercise in statistical analysis: Gene expression analysis using the cancer data
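
The SQL exercises above combine three patterns: filtering with sorting, aggregation with GROUP BY, and multi-table joins. As a rough, self-contained illustration of those patterns (not a solution key), the sketch below uses Python's built-in sqlite3 module with an in-memory database. The schema, the rows, and especially the join keys (genome_id and var_id) are assumptions made up for this example; the real genome_var tables on borreliabase.org are PostgreSQL tables and may be organized differently.

```python
import sqlite3

# Toy in-memory stand-in for the lab database.
# Table layouts, rows, and join keys are HYPOTHETICAL, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE genome  (genome_id INTEGER PRIMARY KEY, genus TEXT);
CREATE TABLE varlist (var_id INTEGER PRIMARY KEY, acc TEXT, refaa TEXT, altaa TEXT);
CREATE TABLE var     (var_id INTEGER, genome_id INTEGER, status TEXT,
                      conf INTEGER, coverage INTEGER);
INSERT INTO genome  VALUES (1, 'Borrelia'), (2, 'Pseudomonas');
INSERT INTO varlist VALUES (10, 'acc_A', 'K', 'R'), (11, 'acc_B', 'L', 'L');
INSERT INTO var     VALUES (10, 1, 'f', 95, 30), (11, 1, 'f', 80, 12),
                           (10, 2, 't', 99, 40), (11, 2, 'f', 92, 25);
""")

# Pattern 1: filter rows, then sort in ascending order.
high_conf = cur.execute(
    "SELECT var_id, genome_id, conf FROM var "
    "WHERE conf > 90 ORDER BY conf ASC"
).fetchall()

# Pattern 2: aggregate one column per group, restricted by a WHERE clause.
coverage_by_genome = cur.execute(
    "SELECT genome_id, SUM(coverage) AS total FROM var "
    "WHERE status = 'f' GROUP BY genome_id ORDER BY total ASC"
).fetchall()

# Pattern 3: join three tables on their (assumed) shared key columns.
joined = cur.execute(
    "SELECT g.genus, v.var_id, v.status, v.conf, l.acc, l.refaa, l.altaa "
    "FROM var v "
    "JOIN genome g ON v.genome_id = g.genome_id "
    "JOIN varlist l ON v.var_id = l.var_id "
    "ORDER BY g.genus, v.var_id"
).fetchall()

for row in joined:
    print(row)
```

The same SELECT statements can be typed at the psql prompt once the table and column names are replaced with the real ones; inspecting the table definitions first (for example with psql's \d command) shows which columns the tables actually share.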

Thursday, June 2 to Sunday, June 5. Traveling to EEID meeting (Saymon, Amanda, Rayees, Roy, Weigang)

June 6-10, 2016

Monday, June 6

  • project meeting: Pathogen genome pipeline
    • Team: John (leader), Zawar
  • project meeting: Treponema operon algorithm
    • Team: Amanda (leader), Roy, Fatima
    • Schedule: Monday, Wed, & Friday 12-5
  • project meeting: simulation of evolution of traits
    • Team: Rayees (leader), Ishmael, Jesam
    • Schedule: Monday, Tuesday, & Friday 12-5
  • Project: bp-utils development
    • Team: Rocky; Khalikuz

Tuesday, June 7

  • project meeting: Borrelia genomics
    • Team: Saymon (leader), Sharon
    • Schedule: Tuesday, Thursday, and Friday 12-5

June 13-17, 2016

June 20-24, 2016

June 27-July 1, 2016

July 6 - July 10

July 13 - July 17, 2016. Project conclusion

July 17 - August 20, 2016. PI vacation

School Year 2015

Nov 19, 2015

  • Amanda: Summary of Pseudomonas genome variant finding with cortex_var; Drafting a manuscript (starting with Materials & Methods)
  • Roy: Briefing on his Poster presentation at ABRCMS
  • Rayees: PA SNP call done. (meeting with MSKCC at 11am)
  • Weigang: ABRCMS briefing / Tools to check out
    • PRICE: a de novo genome assembler of short reads. Document Page
    • QuickGO: a web browser of GO terms.
    • Pathway Tools: for qualitative prediction of pathogenicity, operons, and pathways
    • PICRUSt: predicting functions of microbial communities based on gene content
  • Saymon, John & Weigang: PopGenome package of R to explore selective sweeps, linkage, and drift
  • Sipa: Presentation on Mathematics models of cancer development

Sept 18, 2015

  • Journal Club: latest statistics for detecting population admixture and genome introgression (d3, f4, h4, ChromosomePainter) [1]. Presenter: Saymon

Sept 11, 2015

  • Journal Club: an in-depth analysis of Staphylococcus aureus genomes. [2] Presenter: John
    • Key terms: SNP, mutation, recombination, linkage disequilibrium (LD), synonymous polymorphism (Pi[s])
    • Key methods: identify recombination (from mutation) using shape-shape changes; four-gamete test to identify breakage point; LD decay (based on r² and probability of tree compatibility) to quantify r/m ratio
    • Key results: extensive recombination among clones; rates and tract length quantified by LD decay
    • My rating: 4/5. Rigorous analysis of recombination in bacteria, innovative methods, informative and attractive figures; the paper is too long and many statements repetitive, effect of selection hinted but not explored.
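
Two of the key methods listed for this paper, the four-gamete test and r² as a measure of linkage disequilibrium, can be illustrated on a toy data set. The sketch below is only an illustration and not the paper's implementation: it assumes biallelic sites coded as 0/1 and observed across the same set of genomes, and the function names are made up for this example. Under an infinite-sites assumption (no recurrent mutation), seeing all four two-site haplotypes implies at least one recombination event between the two sites.

```python
def four_gamete_incompatible(site_a, site_b):
    """Return True if all four two-site haplotypes (00, 01, 10, 11) occur."""
    return len(set(zip(site_a, site_b))) == 4


def r_squared(site_a, site_b):
    """Squared allelic correlation (r^2) between two biallelic 0/1 sites."""
    n = len(site_a)
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


# Toy data: each list is one SNP site scored in six genomes (hypothetical values).
site1 = [0, 0, 1, 1, 0, 1]
site2 = [0, 0, 1, 1, 0, 1]   # identical pattern to site1: complete linkage
site3 = [0, 1, 0, 1, 0, 1]   # all four haplotypes occur with site1

print(four_gamete_incompatible(site1, site2), r_squared(site1, site2))
print(four_gamete_incompatible(site1, site3), r_squared(site1, site3))
```

Computing r² for many site pairs and plotting it against the distance between the sites gives an LD decay curve of the kind mentioned above.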

Sept 4, 2015

  • Journal Club: a nice review of bacterial population genetics (E.coli model), from protein polymorphisms to whole-genome variations. [3]. Presenter: Amanda
    • Technological history of bacterial population genetics: MLEE -> MLST -> Whole-genome
    • Key terms & concepts: clonality, linkage disequilibrium, recombination, homoplasy, r/m ratio
    • Methods for recombination detection: clustered polymorphism, homoplasy (phylogenetic inconsistency) (a Borrelia data set to understand how to identify homoplasy and recombination)
    • Tools to try: recHMM (detecting homoplastic sites, fine-grained), PHI (per gene detection, coarse), USEARCH (alternative to BLAST)/UCLUST (alternative to CD-HIT), Distance method (? no reference given; can't understand algorithm either)
    • My rating: 4.8/5 (concise, thoughtful & solid review, covering a vast range of history, species, and theory; no apparent theoretical or visual flaws; ending a little pessimistic; implications to the greater biomedical audience is not explored)

Aug 28, 2015

  • Journal Club (12:30-1:30): a recent paper claiming wide-spread gene loss & pseudogenization in bacterial pathogens. [4]. Presenter: Roy
    • Key terms/concepts: pan-genome, pan-genes (core/"near core"/rare), normalized identity (NI), genomic fluidity, pseudogene conservation percent (PCP), AAI (aa identity), effective population size (Ne), Muller's Ratchet
    • Key methods: FASTA for ortholog/paralog identification, PHI (pairwise homoplasy index) for detecting recombination, TFASTA for HGT (gene gain), RAST for gene calls and genome annotation
    • Key findings: bi-modal distribution of pangenes; two clonal species have high genomic fluidity, despite being closely related; little HGT ("rare") but lots of losses ("near core") in clonal species; maintenance of pseudogenes (small Ne)
    • Pluses: large number of genomes; results broadly convincing; rigorous interpretations and discussion
    • Flaws: No phylogenetic reconstruction; no synteny verification; no gene function analysis; no statistical evaluation of the conclusion; bad presentation (figures should be tables and tables should be figures)
    • My overall rating: 3.5/5.0
  • Project updates & plans (1:30-2)
    • Weigang: design statistical tests for 2 hypotheses: (1) any co-occurrence of oc types? (2) lineage-stabilizing genes
    • Saymon: tick-bacteria gene transfer positive; pcr is working for positive controls; need to start testing for nymphs
    • John & Rayees: pa2 database cleaning nearly done; start polymorphism-by-genome-location analysis
    • Amanda & Roy: Treponema project has a working database, pipeline, and preliminary validated results; start documenting protocols, tabulating results, and preparing functional analysis
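
Genomic fluidity, one of the key concepts in the journal club paper above, can be made concrete with a small calculation. The sketch below uses one widely used definition: for every pair of genomes, the number of gene families found in only one of the two is divided by the total number of gene families in both, and the ratios are averaged over all pairs. Whether the paper computes it in exactly this way is an assumption, and the gene-family labels are invented for the example.

```python
from itertools import combinations


def genomic_fluidity(genomes):
    """Average, over all genome pairs, of (unique gene families) / (total gene families).

    genomes: dict mapping a genome name to the set of gene-family ids it contains.
    """
    pairs = list(combinations(genomes.values(), 2))
    if not pairs:
        raise ValueError("need at least two genomes")
    total = 0.0
    for a, b in pairs:
        unique = len(a - b) + len(b - a)       # families present in only one genome
        total += unique / (len(a) + len(b))    # normalized by the combined family count
    return total / len(pairs)


# Hypothetical gene-family content of three genomes.
toy = {
    "g1": {"A", "B", "C", "D"},
    "g2": {"A", "B", "C", "E"},
    "g3": {"A", "B", "C", "D"},
}
print(genomic_fluidity(toy))
```

A value of 0 means every pair of genomes has identical gene content; higher values mean more gene families differ between a typical pair.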

Summer 2014

Projects & Goals

Name Goal/Description Team
  • Gene gain/loss
  • SNP analysis
Borrelia intergenics Clean up start-codon positions Example
SNP pipeline Example Example
Gain/Loss pipeline Example Example
  • Frequency distribution of ospC types in wild tick populations (Fall 2013) Project page
  • Mutual information

Summer 2013

Projects & Goals

  • Borrelia population genomics: Recombination & Natural Selection (Published)
  • Borrelia pan-genomics (Submitted as of 5/25/2013)
  • Positive and negative selection in Borrelia ORFs and IGS (Submitted as of 6/15/2013)
  • Dr Bargonetti's project (Summer 2013)
  • A population genomics pipeline using MUGSY-FastTree (Summer 2013): Project page
  • Borrelia Genome Database & Browser (Summer 2013) Version 2 screen shot
  • Pseudomonas population genomics (Summer 2013) Project page
  • Hypothesis Testing: Do host-interacting genes show adaptive codon usage? (Summer 2013): Project page
  • Phylogenomics browsing with JavaScript/JQuery, Ajax, and jsPhylosvg
  • Frequency distribution of ospC types in wild tick populations (Fall 2013) Project page

Lab meeting: June 13, 2013

  • Weigang: IGS paper submission should be done by Thursday.
  • Che/Slav: Workshop update (Meeting at 3:30pm?)
  • Che: SILAC project (Meeting at 4pm?)
  • Zhenmao: Tick processing & paired-end Illumina sequencing
  • Pedro: Updates on "ncbi-orf" table
  • Girish: phyloSVG extension; QuBi video
  • Saymon and Deidre: consensus start-codons
  • Rayees and Raymond: Pseudomonas DB; fleN alignment and phylogeny
  • Valentyna: BLASTn results (4:30pm?)

Lab meeting: May 23, 2013

  • May 24, Friday: End of School Year Party in the Park (we leave from Hunter @ 1:30pm)
  • Recommended reading of the week: Detecting Neanderthal genes using the D' homoplasy statistic
  • Weigang: IGS paper submission
  • Che: Thesis update/SILAC project/Summer teaching
  • Zhenmao: Manuscript update: Material & Methods; Results (Tables and Figures)
  • Pedro: Catalyst web framework
  • Girish: cp26 phylogenomic analysis
  • Saymon and Deidre: consensus start-codons

Lab meeting: May 16, 2013

  • Weigang: IGS paper submitted yet?
  • Che: Thesis update. Chapter 3. Evolution of ospA/ospB gene family
  • Pedro/Zhenmao: Can we wrap up the BLAST identification of ospC types?
  • Girish: Fetch cp26 sequences from DB; Run MUGSY & FastTree
  • Saymon/Deidre: Identification of consensus start-codon positions
  • Pedro/Girish: orth_get/orth_igs website development. Catalyst. Implement graphics (genome map & phylogeny) query interface
  • Raymond: start the Pseudomonas summer project

Foundational papers for working in Qiu Lab

Informatics Architecture

  • Operating Systems: Linux OS/Ubuntu, Mac OS
  • Programming languages: BASH, Perl/BioPerl, R
  • Relational Databases: PostgreSQL
  • Software architecture
    • bb3: Borrelia Genome Database. To access: psql -h borreliabase.org -U lab bb3
    • Pseudomonas Genome Database. To access: psql -h ortholog -U lab paerug
    • DNATweezer: Perl wrappers of most frequently used BioPerl modules, including Bio::Seq, Bio::SimpleAlign, and Bio::Tree [5]
    • SimBac: A Perl/Moose package for simulating bacterial genome evolution [6]
    • BorreliaBase

Perl Challenges

Problem | Input | Output
DNA transcription | A DNA sequence, in 5'-3' direction (e.g., aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat) | An RNA sequence, in 5'-3' direction
Genetic code | None | 64 codons, one per line (using loops)
Count amino acids | A protein sequence | Frequency counts of individual amino acids
Count codons | A protein-coding DNA sequence | Frequency counts of individual codons
Random sequence 1 | None | Generate a random DNA sequence (e.g., 1000 bases) with equal base frequencies
Random sequence 2 | None | Generate a random DNA sequence with biased base frequencies, e.g., 10% G, 10% C, 40% T, and 40% A
Graphics I | A categorical dataset, e.g., Biology | A bar graph & a pie chart, using GD::Simple or Postscript::Simple
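
The first six challenges are short enough to sketch in a few lines each. The challenges ask for Perl; the version below is a Python sketch of the same logic, given only as one possible approach. It assumes the input DNA is the coding (sense) strand, so that transcription is a T-to-U substitution (a template-strand input would need a reverse complement instead). All function names are made up for this example.

```python
import random
from collections import Counter


def transcribe(dna):
    """Coding-strand DNA (5'-3') to RNA (5'-3'): replace T with U."""
    return dna.upper().replace("T", "U")


def all_codons(bases="TCAG"):
    """Enumerate the 64 codons with three nested loops."""
    codons = []
    for first in bases:
        for second in bases:
            for third in bases:
                codons.append(first + second + third)
    return codons


def count_residues(protein):
    """Frequency counts of individual amino acids."""
    return Counter(protein.upper())


def count_codons(cds):
    """Frequency counts of codons, reading in frame 1 and ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def random_dna(length, weights=None, rng=random):
    """Random DNA; weights are relative frequencies for A, C, G, T (equal if None)."""
    return "".join(rng.choices("ACGT", weights=weights, k=length))


if __name__ == "__main__":
    example = "aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat"
    print(transcribe(example))
    print("\n".join(all_codons()))
    print(count_codons(example).most_common(3))
    print(random_dna(60))                                   # equal base frequencies
    print(random_dna(60, weights=[0.4, 0.1, 0.1, 0.4]))     # 40% A, 10% C, 10% G, 40% T
```

Counter does the tallying that a Perl hash would do in the original challenges, and random.choices accepts relative weights, which covers both the equal-frequency and the biased random-sequence cases with one function.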