Southwest-University and EEB BootCamp 2020: Difference between pages

From QiuLab
(Difference between pages)
imported>Weigang
 
imported>Weigang
 
<center>'''Biomedical Genomics'''</center>
<center>Bioinformatics Boot Camp for Ecology & Evolution: '''Pathogen Evolutionary Genomics'''</center>
<center>July 8-19, 2019</center>
<center>Thursday, Aug 6, 2020, 2 - 3:30pm</center>
<center>'''Instructor:''' Weigang Qiu, Ph.D.<br>Professor, Department of Biological Sciences, City University of New York, Hunter College & Graduate Center<br>Adjunct Faculty, Department of Physiology and Biophysics,
<center>'''Instructors:''' Dr Weigang Qiu & Ms Saymon Akther</center>
Institute for Computational Biomedicine, Weill Cornell Medical College</center>
<center>'''Office:''' B402 Belfer Research Building, 413 East 69th Street, New York, NY 10021, USA</center>
<center>'''Email:''' weigang@genectr.hunter.cuny.edu</center>
<center>'''Email:''' weigang@genectr.hunter.cuny.edu</center>
<center>'''Lab Website:''' http://diverge.hunter.cuny.edu/labwiki/</center>
<center>'''Lab Website:''' http://diverge.hunter.cuny.edu/labwiki/</center>
<br>
<center>
<center>'''Host''': Shunqin Zhu (祝顺琴), Ph.D.<br>Associate Professor, School of Life Science, Southwest University</center>
----
[[File:Lp54-gain-loss.png|300px|thumbnail|Figure 1. Gains & losses of host-defense genes among Lyme pathogen genomes ([https://www.ncbi.nlm.nih.gov/pubmed/24704760 Qiu & Martin 2014])]]
==Course Overview==
Welcome to Biomedical Genomics, a computer workshop for advanced undergraduates and graduate students. A genome is the total genetic content of an organism. Driven by breakthroughs such as the decoding of the first human genome and next-generation DNA-sequencing technologies, biomedical sciences are undergoing a rapid and irreversible transformation into a highly data-intensive field.
 
Genome information is revolutionizing virtually all aspects of the life sciences, including basic research, medicine, and agriculture. Meanwhile, using genomic data requires life scientists to be familiar with concepts and skills in biology, computer science, and data analysis.
 
This workshop is designed to introduce computational analysis of genomic data through hands-on computational exercises, using published studies.
 
The prerequisites for the course are college-level courses in molecular biology, cell biology, and genetics. Introductory courses in computer programming and statistics are preferred but not strictly required.
 
==Learning goals==
By the end of this course successful students will be able to:
* Describe next-generation sequencing (NGS) technologies & contrast them with traditional Sanger sequencing
* Explain applications of NGS technology, including pathogen genomics, cancer genomics, human genomic variation, transcriptomics, meta-genomics, epi-genomics, and microbiome studies
* Visualize and explore genomics data using RStudio
* Replicate key results using a raw data set produced by a primary research paper
 
==Web Links==
* Install R base: https://cloud.r-project.org
* Install RStudio (Desktop version): http://www.rstudio.com/download
* Download: [http://www.r4all.org/books/datasets R datasets]
* A reference book: [https://r4ds.had.co.nz/ R for Data Science (Wickham & Grolemund)]
 
==Quizzes and Exams==
Student performance will be evaluated by attendance, five assignments, two quizzes, a mid-term exam, and a final presentation:
* Attendance: 50 pts
* Assignments: 5 x 10 = 50 pts
* Quizzes: 2 x 25 pts = 50 pts
* Mid-term: 50 pts
* Final presentation: 50 pts
Total: 250 pts
 
==Course Schedule==
{| class="wikitable"
{| class="wikitable"
|-
|-
! Date & Hour !! Tutorials !! Assignment !! Quiz & Exam
! Lyme Disease (Borreliella) !! CoV Genome Tracker !! Coronavirus evolution
|-
| July 8 (Mon), 8:40-12:10 || Introduction; R Tutorial I; 
[[File:R-part-1-small.pdf|thumbnail|Lecture slides]]
||
Assignment #1 (create a Word document including scripts & graphs, i.e., compile your work into a lab report; due tomorrow)
* Install R/R studio and the "tidyverse" package on your own computer
* Recreate Script 1 & Mini-Practical
* Show help page for function "seq"
* Download dataset
** Create a new folder (e.g., Desktop/rtutor)
** Create a sub-folder (e.g., Desktop/rtutor/data/)
** Download from http://www.r4all.org/the-book/datasets
** Save to the sub-folder
** Unzip the file
 
  ||
|-
|-
| July 9 (Tu), 8:40-12:10 || NGS; R Tutorial II ||
| [[File:Lp54-gain-loss.png|300px|thumbnail| Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)]] ||
Assignment #2
[[File:Cov-screenshot-1.png|300px|thumbnail| [http://cov.genometracker.org/ Haplotype network] ]]
* List pros & cons of Sanger vs NGS
* Compare accuracy, read length, and error rate between Illumina and PacBio
* Describe sequence information captured with each of the following file formats: FASTA, FASTQ, SAM, VCF
* Wide vs Tall data frames
* Variable names (informative, case sensitive)
* Read file
||  
||  
|-
  [[File:Cov-screenshot-2.png|300px|thumbnail| Spike protein alignment ]]
| July 10 (Wed), 8:40-12:10 || Microbiome I; R Tutorial III ||
Assignment #3
|| Quiz I
|-
| July 11 (Thur), 8:40-12:10 || Microbiome II; R Tutorial IV ||
Assignment #4
||
|-
| July 12 (Fri), 8:40-12:10 ||  || || Mid-term Exam
|-
| Weekend || Break
|-
| July 15 (Mon), 8:00-12:10 || Transcriptome; R Tutorial V ||
Assignment #5
  ||
|-
| July 16 (Tu), 8:00-12:10 || Proteome ||
||
|-
| July 17 (Wed), 8:00-12:10 || Genomics I ||
|| Quiz II
|-
| July 18 (Thur), 8:00-12:10 || Genomics II  || ||
|-
| July 19 (Fri), 8:00-12:10|| Presentations
|}
|}
</center>
----


==Papers & Datasets==
==Case studies from Qiu Lab==
{| class="wikitable sortable"
* [http://borreliabase.org Comparative genomics of worldwide Lyme disease pathogens]
|-
* [http://cov.genometracker.org Covid-19 Genome Tracker]
! Omics Application !! Paper link !! Data set !! NGS Technology
 
|-
==CoV genome data set==
| Microbiome || [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0193652 Rimoldi_etal_2018_PlosOne] || [https://doi.org/10.1371/journal.pone.0193652.s004 S1 Dataset] || 16S rDNA amplicon sequencing
* N=565 SARS-CoV-2 genomes collected during January & February 2020. Data source & acknowledgement: [http://gisaid.org GISAID] (<em>Warning: You need to acknowledge GISAID if you reuse the data in any publication</em>)
|-
* Download file: [http://diverge.hunter.cuny.edu/~weigang/qiu-akther.tar.gz data file]
| Transcriptome || [https://science.sciencemag.org/content/350/6264/1096 Wang_etal_2015_Science] || Tables S2 & S4 || RNA-Seq
* Create a directory, unzip, & un-tar
|-
<syntaxhighlight lang='bash'>
| Transcriptome & Regulome || [https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-019-0477-8 Nava_etal_2019_BMCGenomics] || Tables S2 & S3 || RNA-Seq & ChIP-Seq
mkdir QiuAkther
|-
mv cov-camp.tar.gz QiuAkther/
| Proteome || [https://www.ncbi.nlm.nih.gov/pubmed/28232952 Qiu_etal_2017_NPJ] || (to be posted) || SILAC
cd QiuAkther
|-
tar -tzf cov-camp.tar.gz # view files
| Population genomics (Lyme) || [https://jcm.asm.org/content/56/11/e00940-18.long Di_etal_2018_JCM] || [https://github.com/weigangq/ocseq Data & R codes] || Amplicon sequencing (antigen locus)
tar -xzf cov-camp.tar.gz # un-zip & un-tar
|-
</syntaxhighlight>
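The unpacking steps above can be rehearsed on a throwaway archive before touching the course file. This is a minimal sketch, not part of the course materials: the temporary directory and the toy archive <code>toy.tar.gz</code> are hypothetical stand-ins for <code>cov-camp.tar.gz</code>. It uses <code>tar -C</code> to extract into a named directory, which avoids the separate <code>mv</code> and <code>cd</code> steps.

```shell
# Hypothetical toy archive standing in for cov-camp.tar.gz.
work=$(mktemp -d)
mkdir "$work/src"
echo "toy" > "$work/src/hap.txt"
tar -czf "$work/toy.tar.gz" -C "$work/src" hap.txt

# List the archive contents first (-t), without extracting anything.
listing=$(tar -tzf "$work/toy.tar.gz")
echo "archive contains: $listing"

# Extract (-x) directly into a target directory with -C.
mkdir "$work/QiuAkther"
tar -xzf "$work/toy.tar.gz" -C "$work/QiuAkther"
extracted=$(cat "$work/QiuAkther/hap.txt")
echo "extracted file says: $extracted"

# Remove the scratch directory.
rm -rf "$work"
```

Listing before extracting shows whether the archive unpacks into its own folder or scatters files into the current directory.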
| Population genomics/GWAS (Human) || [https://science.sciencemag.org/content/351/6274/737.long Simonti_etal_2016_Science] || [https://science.sciencemag.org/highwire/filestream/673591/field_highwire_adjunct_files/1/aad2149-Simonti-SM.Table.S2.xlsx Table S2] || whole-genome sequencing (WGS); [http://www.internationalgenome.org/ 1000 Genome Project (IGSR)]
* View files
|-
<syntaxhighlight lang='bash'>
| TB surveillance || [https://jcm.asm.org/content/53/7/2230 Brown_etal_2015]  || [https://www.ebi.ac.uk/ena/data/view/PRJEB9206 Sequence Archives]|| Whole-genome sequencing (WGS)
file TCS.jar
|-
ls -lrt # long listing, sorted by modification time, oldest first
| Example || Example || Example || Example
less Jan-Feb.mafft # an alignment of 565 CoV2 genomes in FASTA format; "q" to quit
|-
less cov-565strains-617snvs.phy # non-gapped SNV alignment in PHYLIP format
| Example || Example || Example || Example
wc hap.txt # geographic origins
|-
head hap.txt
| Example || Example || Example || Example
wc group.txt # color assignment
|}
cat group.txt
less cov-565strains.gml # graph file (output)
</syntaxhighlight>
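Beyond paging through the files, it can help to confirm that an alignment holds the expected number of sequences. The sketch below is an illustration, not the instructors' procedure: the function names <code>count_fasta_records</code> and <code>phylip_dims</code> and the toy files are hypothetical, and it assumes a PHYLIP file whose first line gives the number of sequences followed by the number of sites.

```shell
# Count FASTA records: every record starts with a ">" header line.
count_fasta_records() {
  grep -c '^>' "$1"
}

# Print "<number of sequences> <number of sites>" from the PHYLIP header line.
phylip_dims() {
  awk 'NR == 1 { print $1, $2; exit }' "$1"
}

# Toy inputs (hypothetical stand-ins for Jan-Feb.mafft and
# cov-565strains-617snvs.phy).
demo_dir=$(mktemp -d)
printf '>s1\nACGT\n>s2\nACGA\n>s3\nACGG\n' > "$demo_dir/toy.fasta"
printf ' 3 4\ns1        ACGT\ns2        ACGA\ns3        ACGG\n' > "$demo_dir/toy.phy"

n_fasta=$(count_fasta_records "$demo_dir/toy.fasta")
dims=$(phylip_dims "$demo_dir/toy.phy")
echo "FASTA records: $n_fasta"
echo "PHYLIP sequences and sites: $dims"

# Remove the scratch directory.
rm -rf "$demo_dir"
```

On the workshop files, the same functions would be called as <code>count_fasta_records Jan-Feb.mafft</code> and <code>phylip_dims cov-565strains-617snvs.phy</code>. Judging by the file names, the expected values are 565 sequences and 617 sites.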
 
==Bioinformatics Tools & Learning Goals==
* BpWrapper: commandline tools for sequence, alignment, and tree manipulations (based on BioPerl).
** [https://github.com/bioperl/p5-bpwrapper Github Link]
** [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2074-9/figures/1 Flowchart from publication]
* Haplotype network with TCS [https://pubmed.ncbi.nlm.nih.gov/11050560/ PubMed link]
* Web-interactive visualization with [http://D3js.org D3js]
** [https://github.com/sairum/tcsBU Github link]
** [https://cibio.up.pt/software/tcsBU/index.html Web tool]
** [https://academic.oup.com/bioinformatics/article/32/4/627/1744448 Paper]
 
==Tutorial==
* 2-2:30: Introduction to pathogen phylogenomics
* 2:30-2:45: Demo: sequence manipulation with BpWrapper
<syntaxhighlight lang='bash'>
bioseq --man
bioseq -n Jan-Feb.mafft
bioaln --man
bioaln -n -i'fasta' Jan-Feb.mafft
bioaln -l -i'fasta' Jan-Feb.mafft
bioaln -n -i'phylip' cov-565strains-617snvs.phy
bioaln -l -i'phylip' cov-565strains-617snvs.phy
FastTree -nt cov-565strains-617snvs.phy > cov.dnd
biotree --man
biotree -n cov.dnd
biotree -l cov.dnd
</syntaxhighlight>
* 2:45-3:00: build haplotype network with TCS
<syntaxhighlight lang='bash'>
java -jar -Xmx1g TCS.jar
</syntaxhighlight>
* 3:00-3:15: interactive visualization with tcsBU
* 3:15-3:30: Q & A
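
The demo steps in the schedule above depend on several command-line tools and produce a Newick tree file. A minimal sketch of two sanity checks follows. The helper names <code>missing_tools</code> and <code>count_newick_leaves</code> are hypothetical, not part of BpWrapper or FastTree, and the leaf count assumes that taxon labels contain no commas (a tree with n leaves then has n - 1 commas).

```shell
# Report which of the named commands are NOT on the PATH (one per line).
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Count leaves in a Newick file as (number of commas + 1).
# Assumption: taxon labels contain no commas.
count_newick_leaves() {
  commas=$(tr -cd ',' < "$1" | wc -c)
  echo $((commas + 1))
}

# Toy tree with five leaves (hypothetical stand-in for cov.dnd).
demo_dir=$(mktemp -d)
printf '((A:0.1,B:0.2):0.05,(C:0.1,D:0.3):0.02,E:0.4);\n' > "$demo_dir/toy.dnd"
n_leaves=$(count_newick_leaves "$demo_dir/toy.dnd")
echo "leaves: $n_leaves"

# "sh" should exist on any POSIX system; the second name is deliberately bogus.
missing=$(missing_tools sh no-such-tool-for-demo)
echo "missing: $missing"

# Remove the scratch directory.
rm -rf "$demo_dir"
```

Before the session, <code>missing_tools bioseq bioaln biotree FastTree java</code> would list anything still to be installed. After running FastTree, <code>count_newick_leaves cov.dnd</code> should match the number of sequences in the input alignment.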

Revision as of 07:23, 26 July 2020
