| <center>'''BIOL47120 Biomedical Genomics II'''</center> | | <center>Bioinformatics Boot Camp for Ecology & Evolution: '''Pathogen Evolutionary Genomics'''</center> |
| <center>Spring 2020, Saturdays 9-12 noon, Hunter North Building 1001G</center> | | <center>Thursday, Aug 6, 2020, 2 - 3:30pm</center> |
| <center>'''Instructor:''' Weigang Qiu, Ph.D., Professor, Department of Biological Sciences, Hunter College, CUNY; '''Email:''' weigang@genectr.hunter.cuny.edu</center> | | <center>'''Instructors:''' Dr Weigang Qiu & Ms Saymon Akther</center> |
| <center>'''T.A.:''' Christopher Panlasigui; Hunter College; '''Email:''' christopher.panlasigui47@myhunter.cuny.edu</center> | | <center>'''Email:''' weigang@genectr.hunter.cuny.edu</center> |
| <center>'''Office:''' B402 Belfer Research Building, 413 East 69th Street, New York, NY 10021, USA; '''Office hour''': Wed 3-5pm</center>
| | <center>'''Lab Website:''' http://diverge.hunter.cuny.edu/labwiki/</center> |
| <center> | | <center> |
| {| class="wikitable" | | {| class="wikitable" |
| |- | | |- |
| ! MA plot !! Volcano plot !! Heat map | | ! Lyme Disease (Borreliella) !! CoV Genome Tracker !! Coronavirus evolution |
| |- | | |- |
| | [[File:GeneExp1.jpeg|300px|thumbnail| fold change (y-axis) vs. total expression levels (x-axis)]] || | | | [[File:Lp54-gain-loss.png|300px|thumbnail| Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)]] || |
| [[File:GeneExp2.jpeg|300px|thumbnail| p-value (y-axis) vs. fold change (x-axis)]] | | [[File:Cov-screenshot-1.png|300px|thumbnail| [http://cov.genometracker.org/ Haplotype network] ]] |
| || | | || |
| [[File:GeneExp3.jpeg|300px|thumbnail| genes significantly down or up-regulated (at p<1e-4)]] | | [[File:Cov-screenshot-2.png|300px|thumbnail| Spike protein alignment ]] |
| |} | | |} |
| </center> | | </center> |
| ==Course Overview==
| | ---- |
| Welcome to Introductory BioMedical Genomics, a seminar course for advanced undergraduates and graduate students. A genome is the total genetic content of an organism. Driven by breakthroughs such as the decoding of the first human genome and rapid DNA- and RNA-sequencing technologies, the biomedical sciences are undergoing a rapid and irreversible transformation into a highly data-intensive field that requires familiarity with concepts in biology, computation, and data science.
| |
|
| |
|
| Genome information is revolutionizing virtually all aspects of the life sciences, including basic research, medicine, and agriculture. Meanwhile, using genomic data requires life scientists to be familiar with concepts and skills in biology, computer science, and statistics.
| | ==Case studies from Qiu Lab== |
| | * [http://borreliabase.org Comparative genomics of worldwide Lyme disease pathogens] |
| | * [http://cov.genometracker.org Covid-19 Genome Tracker] |
|
| |
|
| This workshop is designed to introduce computational analysis of genomic data through hands-on computational exercises. Students are expected to be able to replicate key results of data analysis from published studies.
| | ==CoV genome data set== |
| | * N=565 SARS-CoV-2 genomes collected during January & February 2020. Data source & acknowledgement: [http://gisaid.org GISAID] (<em>Warning: You need to acknowledge GISAID if you reuse the data in any publication</em>) |
| | * Download file: [http://diverge.hunter.cuny.edu/~weigang/qiu-akther.tar.gz data file] |
| | * Create a directory, unzip, & un-tar |
| | <syntaxhighlight lang='bash'> |
| | mkdir QiuAkther |
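| | mv cov-camp.tar.gz QiuAkther/ # use the name of the file you downloaded if it differs (the link above points to qiu-akther.tar.gz) |
| | cd QiuAkther |
| | tar -tzf cov-camp.tar.gz # list the archive contents without extracting |
| | tar -xzf cov-camp.tar.gz # decompress (gunzip) & extract (un-tar) |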
| | </syntaxhighlight> |
| | * View files |
| | <syntaxhighlight lang='bash'> |
| | file TCS.jar |
| | ls -lrt # long listing, sorted by modification time, oldest first |
| | less Jan-Feb.mafft # an alignment of 565 CoV2 genomes in FASTA format; "q" to quit |
| | less cov-565strains-617snvs.phy # non-gapped SNV alignment in PHYLIP format |
| | wc hap.txt # geographic origins |
| | head hap.txt |
| | wc group.txt # color assignment |
| | cat group.txt |
| | less cov-565strains.gml # graph file (output) |
| | </syntaxhighlight> |
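The listing and extraction steps above can be rehearsed on a throwaway archive before touching the real download. The sketch below is only an illustration: the scratch directory, the file <code>toy.fa</code>, and the archive <code>toy.tar.gz</code> are made up for the demo and are not part of the workshop data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch area; nothing here touches the workshop files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Build a toy FASTA file and pack it into a gzip-compressed tar archive.
mkdir "$workdir/src"
printf '>s1\nACGT\n>s2\nACGA\n>s3\nACGT\n' > "$workdir/src/toy.fa"
tar -czf "$workdir/toy.tar.gz" -C "$workdir/src" toy.fa

# List the archive contents without extracting (same idea as "tar -tzf").
listing=$(tar -tzf "$workdir/toy.tar.gz")
echo "archive contains: $listing"

# Extract into a fresh directory (same idea as "tar -xzf").
mkdir "$workdir/out"
tar -xzf "$workdir/toy.tar.gz" -C "$workdir/out"

# Count FASTA records: every record starts with a ">" header line.
n_seqs=$(grep -c '^>' "$workdir/out/toy.fa")
echo "sequences: $n_seqs"
```

Listing first is a cheap way to confirm that an archive holds what you expect and to see whether it unpacks into its own subdirectory.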
|
| |
|
| The prerequisites for the course are college-level courses in molecular biology, cell biology, and genetics. Introductory courses in computer programming and statistics are preferred but not strictly required.
| | ==Bioinformatics Tools & Learning Goals== |
| | * BpWrapper: commandline tools for sequence, alignment, and tree manipulations (based on BioPerl). |
| | ** [https://github.com/bioperl/p5-bpwrapper Github Link] |
| | ** [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2074-9/figures/1 Flowchart from publication] |
| | * Haplotype network with TCS [https://pubmed.ncbi.nlm.nih.gov/11050560/ PubMed link] |
| | * Web-interactive visualization with [http://D3js.org D3js] |
| | ** [https://github.com/sairum/tcsBU Github link] |
| | ** [https://cibio.up.pt/software/tcsBU/index.html Web tool] |
| | ** [https://academic.oup.com/bioinformatics/article/32/4/627/1744448 Paper] |
|
| |
|
| ==Learning goals== | | ==Tutorial== |
| By the end of this course successful students will be able to:
| | * 2-2:30: Introduction on pathogen phylogenomics |
| * Describe next-generation sequencing (NGS) technologies & contrast them with traditional Sanger sequencing | | * 2:30-2:45: Demo: sequence manipulation with BpWrapper |
| * Explain applications of NGS technology, including pathogen genomics, cancer genomics, human genomic variation, transcriptomics, metagenomics, epigenomics, and microbiome studies.
| | <syntaxhighlight lang='bash'> |
| * Visualize and explore genomics data using R & RStudio
| | bioseq --man |
| * Replicate key results using a raw data set produced by a primary research paper
| | bioseq -n Jan-Feb.mafft |
| | | bioaln --man |
| ==Web Links==
| | bioaln -n -i'fasta' Jan-Feb.mafft |
| * Install R base: https://cloud.r-project.org
| | bioaln -l -i'fasta' Jan-Feb.mafft |
| * Install R Studio (Desktop version): http://www.rstudio.com/download
| | bioaln -n -i'phylip' cov-565strains-617snvs.phy |
| * Textbook: [http://r4all.org/#about Introduction to R for Biologists]
| | bioaln -l -i'phylip' cov-565strains-617snvs.phy |
| * Download: [http://www.r4all.org/books/datasets R datasets] | | FastTree -nt cov-565strains-617snvs.phy > cov.dnd |
| * A reference book: [https://r4ds.had.co.nz/ R for Data Science (Wickham & Grolemund)]
| | biotree --man |
| | | biotree -n cov.dnd |
| ==Quizzes and Exams==
| | biotree -l cov.dnd |
| Student performance will be evaluated by attendance, weekly assignments, quizzes, and a final report:
| | </syntaxhighlight> |
| * Attendance & In-class participation: 50 pts
| | * 2:45-3:00: build haplotype network with TCS |
| * Assignments: 5 x 10 = 50 pts
| | <syntaxhighlight lang='bash'> |
| * Quizzes: 2 x 25 pts = 50 pts
| | java -jar -Xmx1g TCS.jar |
| * Mid-term: 50 pts
| | </syntaxhighlight> |
| * Final presentation & report: 100 pts
| | * 3:00-3:15: interactive visualization with tcsBU |
| Total: 300 pts
| | * 3:15-3:30: Q & A |
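Before building a haplotype network, it can help to see what a haplotype tally looks like. The sketch below is plain awk and does not use BpWrapper or TCS; the function names <code>fasta_to_tab</code> and <code>count_haplotypes</code> and the toy alignment are invented here for illustration. It collapses identical sequences and reports how many strains share each one.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert FASTA (possibly with wrapped sequence lines) to "id<TAB>sequence".
fasta_to_tab() {
  awk '
    /^>/ { if (id != "") print id "\t" seq; id = substr($1, 2); seq = ""; next }
         { seq = seq $0 }
    END  { if (id != "") print id "\t" seq }
  ' "$1"
}

# Tally identical sequences: prints "count<TAB>sequence", most common first.
count_haplotypes() {
  fasta_to_tab "$1" | cut -f2 | sort | uniq -c |
    awk '{ print $1 "\t" $2 }' | sort -k1,1nr -k2,2
}

# Toy alignment (hypothetical data): four strains, two distinct haplotypes.
toy=$(mktemp)
trap 'rm -f "$toy"' EXIT
cat > "$toy" <<'EOF'
>strainA
ACGT
ACGT
>strainB
ACGTACGA
>strainC
ACGTACGT
>strainD
ACGTACGT
EOF

count_haplotypes "$toy"
```

In a haplotype network each distinct sequence becomes one node, and the tally gives a rough sense of how large each node would be drawn.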
| | |
| ==Tips for Success==
| |
| To maximize your experience, we strongly recommend the following strategies:
| |
| * Follow the directions for efficiently finding high-impact papers, reading scientific research papers, and preparing presentations.
| |
| * Read the papers, watch required videos and do the exercises regularly, long before you attend class.
| |
| * Attend all classes, as required. Late arrival results in loss of points.
| |
| * Keep up with online exercises. Don’t wait until the due date to start tasks.
| |
| * Take notes or annotate slides while attending the lectures.
| |
| * Listen actively and participate in class and in online discussions.
| |
| * Review and summarize material within 24 hrs after class.
| |
| * Observe the deadlines for submitting your work. Late submissions incur penalties.
| |
| * Put away cell phones; do not text, email, or play computer games in class.
| |
| | |
| ==Hunter/CUNY Policies== | |
| * Policy on Academic Integrity
| |
| Hunter College regards acts of academic dishonesty (e.g., plagiarism, cheating on homework, online exercises or examinations, obtaining unfair advantage, and falsification of records and official documents) as serious offenses against the values of intellectual honesty. The College is committed to enforcing the CUNY Policy on Academic Integrity, and we will pursue cases of academic dishonesty according to the Hunter College Academic Integrity Procedures. Students will be asked to read this statement before exams.
| |
| | |
| * ADA Policy
| |
| In compliance with the Americans with Disabilities Act of 1990 (ADA) and with Section 504 of the Rehabilitation Act of 1973, Hunter College is committed to ensuring educational parity and accommodations for all students with documented disabilities and/or medical conditions. It is recommended that all students with documented disabilities (Emotional, Medical, Physical, and/or Learning) consult the Office of AccessABILITY, located in Room E1214B, to secure necessary academic accommodations. For further information and assistance, please call: (212) 772-4857 or (212) 650-3230.
| |
| | |
| * Syllabus Policy
| |
| Except for changes that substantially affect implementation of the evaluation (grading) statement, this syllabus is a guide for the course and is subject to change with advance notice, announced in class or posted on Blackboard.
| |
| | |
| ==Course Schedule==
| |
| ===Feb 1, 2020===
| |
| * Introduction
| |
| * R Tutorial 1: User interface, basic operations, loading data. Slides: [[File:R-part-1.pdf|thumbnail]]
| |
| {| class="wikitable sortable mw-collapsible"
| |
| ! Assignment 1 (10 pts; Due next class 2/8, in hard copy)
| |
| |-
| |
| |
| |
| * (3 pts) Print a copy of your first R script, with proper annotations
| |
| * (3 pts) Transform the following "untidy/wide" table into a "tidy/tall" table (print a hard copy)
| |
| <pre>
| |
| PropertyName,Density_250m,Density_500m,Density_1000m
| |
| HighbridgePark,0.006561319,0.009462031,0.010578611
| |
| BronxRiverParkway,0.001318749,0.001978858,0.002652118
| |
| CrotonaPark,0.009412087,0.01164712,0.01202321
| |
| ClaremontPark,0.016391948,0.019972485,0.020350481
| |
| VanCortlandtPark,0.000550151,0.000979312,0.001372675
| |
| </pre>
| |
| * (4 pts) Make a single slide on a primary research paper that uses next-generation sequencing (NGS) technologies. Show the following:
| |
| ** proper citation (authors, title, year, journal, URL)
| |
| ** NGS method (Illumina, PacBio, or Nanopore)
| |
| ** NGS application (genomics, cancer, transcriptome, microbiome, proteome, metagenomics, human variation, etc)
| |
| ** a key figure, with a caption explaining x-axis, y-axis, samples, experiments
| |
| ** raw data table (show first few columns and first few rows)
| |
| ** Example: a student interested in tissue regeneration searched PubMed with the keywords "regeneration zebra fish transcriptome" and chose the following primary paper because of the high quality of the journal and the availability of raw data: https://www.ncbi.nlm.nih.gov/pubmed/28096348
| |
| |}
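The wide-to-tall reshaping asked for in Assignment 1 can be pictured on a smaller scale. The sketch below uses awk rather than R and takes only the first two parks from the table in the assignment. The function name <code>wide_to_tall</code> and the output column names <code>Buffer</code> and <code>Density</code> are choices made here for illustration, not part of the assignment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Reshape a wide CSV (one column per measurement) into a tall CSV
# (one row per PropertyName/measurement pair). Reads stdin, writes stdout.
wide_to_tall() {
  awk -F',' -v OFS=',' '
    NR == 1 {
      for (i = 2; i <= NF; i++) name[i] = $i   # remember the column headers
      print $1, "Buffer", "Density"
      next
    }
    {
      for (i = 2; i <= NF; i++) print $1, name[i], $i
    }
  '
}

wide_to_tall <<'EOF'
PropertyName,Density_250m,Density_500m,Density_1000m
HighbridgePark,0.006561319,0.009462031,0.010578611
BronxRiverParkway,0.001318749,0.001978858,0.002652118
EOF
```

Each park now contributes one row per buffer distance, so every variable sits in its own column and every observation in its own row.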
| |
| | |
| ===Feb 8, 2020===
| |
| * Introduction to NGS: [[File:Intro-NGS.pdf|thumbnail]]
| |
| * 1-slide presentations on Next-Generation Sequencing Technologies (Group I)
| |
| * R Tutorial, Part 2. Data manipulation with dplyr. Slides: [[File:R-tutorials-2.pdf|thumbnail]]
| |
| {| class="wikitable sortable mw-collapsible"
| |
| ! Assignment 2 (10 pts; Due next class 2/15, in hard copy)
| |
| |-
| |
| |
| |
| * (3 pts) Print a copy of your 2nd R script, with proper annotations
| |
| * (4 pts) Show the following commands with the chaining operator ("%>%") for the "iris" data set (4 individual commands; not a single one)
| |
| ** Select columns "Sepal.Length" & "Species"
| |
| ** Filter rows 2 through 10
| |
| ** Add a column "logSepalLength" containing the logarithm of Sepal.Length
| |
| ** Calculate mean and standard deviation of Petal.Length in each species
| |
| * (3 pts) Transform the "iris" data table into a "tidy/tall" table (manually, show first 10 rows, print a hard copy)
| |
| |}
| |
| | |
| ===Feb 15, 2020===
| |
| * NGS presentations (Group II)
| |
| * R Tutorial. Part 3. Data visualization with ggplot2. Slides: [[File:R-tutorials-3.pdf|thumbnail]]
| |
| * No assignment (go over slides and 3 tutorial scripts to prepare for Quiz next week)
| |
| | |
| ===Feb 22, 2020===
| |
| * Quiz 1 (Open Book)
| |
| * R Tutorial: Part 4. BioStat (chi-square & t-test) Lecture slides: [[File:R-tutorial-4.pdf|thumbnail]]
| |
| {| class="wikitable sortable mw-collapsible"
| |
| ! Assignment 3 (10 pts). In-class workshop. Evaluation of papers according to the following rubrics (submit by email)
| |
| |-
| |
| |
| |
| * Citation & PubMed Link
| |
| * Main research question
| |
| * Samples, sample sizes, & controls
| |
| * Omics technologies (e.g., genomics, metagenomics, microbiome, transcriptome, proteome, methylome, RNA-seq, 16S amplicon sequencing)
| |
| * Sequencing platform (e.g., Illumina, PacBio, Nanopore)
| |
| * Main computational tools (e.g., R, RStudio, QIIME)
| |
| * Main graphics (e.g., scatterplot, boxplot, heatmap, volcano plot)
| |
| * Main statistical analysis (e.g., t-test, chi-square, regression analysis) | |
| * Data set: a short description & links
| |
| |}
| |
| | |
| ===Feb 29, 2020===
| |
| * Paper evaluation & selection
| |
| * R Tutorial: Part 5. BioStat (regression & ANOVA) [[File:R-tutorial-5.pdf|thumbnail]]
| |
| | |
| ===March 7, 2020=== | |
| * Self study & prepare for mid-term (no class)
| |
| | |
| ===March 14, 2020===
| |
| * Mid-term exam (50 pts). Open Book
| |
| | |
| ===March 22, 2019===
| |
| * R tutorial: Section 5.3. t-test | |
| * Group presentations (Data visualization)
| |
|
| |
| ===March 28, 2019===
| |
| * (Self study; No live class) | |
| * Abstract (200 words; individualized; due 3/30)
| |
| * Review contingency test & two-sample t-test
| |
| * Generate preliminary graphs
| |
| | |
| ===March 30, 2019===
| |
| * 20 pts Quiz on contingency test & two-sample t-test
| |
| * Group presentations (Show preliminary graphs)
| |
| * Material & Methods (due 4/6)
| |
| | |
| ===April 4, 2019===
| |
| * 20 pts Quiz
| |
| * R tutorial: Section 5.4. Regression analysis
| |
| * Results (due 4/13)
| |
| ** Tables to show the dataset you work on (not all, but a sample)
| |
| ** Figures with legend (R methods, x & y-axis, conclusion)
| |
| ** 1-paragraph summary of your results
| |
| | |
| ===April 18, 2019===
| |
| * 20 pts Quiz. Regression analysis
| |
| * Background & Introduction (due 5/4)
| |
| | |
| ===April 25, 2019===
| |
| * Final presentation I. Graded on:
| |
| ** Objective (original & your own)
| |
| ** Material & methods (original & your own)
| |
| ** Results (your own)
| |
| ** Conclusion (your own)
| |
| ** Conclusion (due 5/11)
| |
| | |
| ===May 2, 2019===
| |
| * Self study: Prepare your 10-slide presentation
| |
| * No class (instructor travels)
| |
|
| |
| ===May 16, 2019, 9-1pm===
| |
| * Final presentation
| |
| * May 22, 2018 (Wed, 5pm) Final Report Due (hard copy; in my office or in mailbox)
| |