Southwest-University and EEB BootCamp 2020: Difference between pages

Line 1: Line 1:
<center>'''Biomedical Genomics'''</center>
<center>Bioinformatics Boot Camp for Ecology & Evolution: '''Pathogen Evolutionary Genomics'''</center>
<center>July 8-19, 2019</center>
<center>Thursday, Aug 6, 2020, 2 - 3:30pm</center>
<center>'''Instructor:''' Weigang Qiu, Ph.D.<br>Professor, Department of Biological Sciences, City University of New York, Hunter College & Graduate Center<br>Adjunct Faculty, Department of Physiology and Biophysics
<center>'''Instructors:''' Dr Weigang Qiu & Ms Saymon Akther</center>
Institute for Computational Biomedicine, Weill Cornell Medical College</center>
<center>'''Office:''' B402 Belfer Research Building, 413 East 69th Street, New York, NY 10021, USA</center>
<center>'''Email:''' weigang@genectr.hunter.cuny.edu</center>
<center>'''Lab Website:''' http://diverge.hunter.cuny.edu/labwiki/</center>
<center>
{| class="wikitable"
|-
! Lyme Disease (Borreliella) !! CoV Genome Tracker !! Coronavirus evolution
|-
| [[File:Lp54-gain-loss.png|300px|thumbnail| Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)]] ||
[[File:Cov-screenshot-1.png|300px|thumbnail| [http://cov.genometracker.org/ Haplotype network] ]]
||
[[File:Cov-screenshot-2.png|300px|thumbnail| Spike protein alignment ]]
|}
</center>
----
[[File:Lp54-gain-loss.png|200px|thumbnail|Figure 1. Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)]]
==Course Overview==
Welcome to Biomedical Genomics, a computer workshop for advanced undergraduates and graduate students. A genome is the total genetic content of an organism. Driven by breakthroughs such as the decoding of the first human genome and next-generation DNA-sequencing technologies, biomedical sciences are undergoing a rapid and profound transformation into a highly data-intensive field.
Genome information is revolutionizing virtually all aspects of the life sciences, including basic research, medicine, and agriculture. Meanwhile, using genomic data requires life scientists to be familiar with concepts and skills in biology, computer science, and data analysis.
This workshop introduces computational analysis of genomic data through hands-on computational exercises.
The prerequisites for the course include college-level courses in molecular biology, cell biology, and genetics. Introductory courses in computer programming and statistics are preferred but not strictly required.


==Learning goals==
==Case studies from Qiu Lab==
By the end of this course successful students will be able to:
* [http://borreliabase.org Comparative genomics of worldwide Lyme disease pathogens]
* Describe next-generation sequencing (NGS) technologies & contrast them with traditional Sanger sequencing
* [http://cov.genometracker.org Covid-19 Genome Tracker]
* Explain applications of NGS technology, including pathogen genomics, cancer genomics, human genomic variation, transcriptomics, meta-genomics, epi-genomics, microbiome analysis, and single-cell genomics
* Visualize and explore genomics data using RStudio
* Replicate key results using a data set associated with a primary research paper


==Useful links==
==CoV genome data set==
* Install R and RStudio
* N=565 SARS-CoV-2 genomes collected during January & February 2020. Data source & acknowledgement: [http://gisaid.org GISAID] (<em>Warning: You need to acknowledge GISAID if you reuse the data in any publication</em>)
* Unix Tutorial
* Download file: [http://diverge.hunter.cuny.edu/~weigang/qiu-akther.tar.gz data file]
* Textbook
* Create a directory, unzip, & un-tar
<syntaxhighlight lang='bash'>
mkdir QiuAkther
mv cov-camp.tar.gz QiuAkther/
cd QiuAkther
tar -tzf cov-camp.tar.gz # view files
tar -xzf cov-camp.tar.gz # un-zip & un-tar
</syntaxhighlight>
* View files
<syntaxhighlight lang='bash'>
file TCS.jar
ls -lrt # long listing, sorted by modification time, oldest first
less Jan-Feb.mafft # an alignment of 565 CoV2 genomes in FASTA format; "q" to quit
less cov-565strains-617snvs.phy # non-gapped SNV alignment in PHYLIP format
wc hap.txt # geographic origins
head hap.txt
wc group.txt # color assignment
cat group.txt
less cov-565strains.gml # graph file (output)
</syntaxhighlight>
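
A quick sanity check on the files viewed above is to count the sequences they contain. The following is a minimal sketch, not part of the original tutorial: it assumes a FASTA record always begins with a ">" header line and that the first line of a PHYLIP file holds the number of taxa and the alignment length. The files <code>toy.fasta</code> and <code>toy.phy</code> and the helper names <code>count_fasta</code> and <code>phylip_dims</code> are hypothetical stand-ins so that the example runs on its own.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check alignment files by counting sequences.
# toy.fasta and toy.phy are hypothetical stand-ins for the course files.

# Count FASTA records: every record starts with a ">" header line.
count_fasta() {
  grep -c '^>' "$1"
}

# Read the PHYLIP header: the first line holds "<number of taxa> <alignment length>".
phylip_dims() {
  head -n 1 "$1" | awk '{print $1, $2}'
}

# Build two tiny example files in a temporary directory.
workdir=$(mktemp -d)
printf '>strain1\nACGT\n>strain2\nACGA\n>strain3\nACTT\n' > "$workdir/toy.fasta"
printf ' 3 4\nstrain1   ACGT\nstrain2   ACGA\nstrain3   ACTT\n' > "$workdir/toy.phy"

count_fasta "$workdir/toy.fasta"   # number of sequences in the FASTA file
phylip_dims "$workdir/toy.phy"     # taxa count and alignment length
```

Run against the course files, <code>count_fasta Jan-Feb.mafft</code> should report 565 and <code>phylip_dims cov-565strains-617snvs.phy</code> should report 565 taxa and 617 sites, if the files match their descriptions above.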


==Quizzes and Exams==
==Bioinformatics Tools & Learning Goals==
Student performance will be evaluated by attendance, four (4) quizzes, and a final report:
* BpWrapper: command-line tools for sequence, alignment, and tree manipulations (based on BioPerl).
* Attendance: 50 pts
** [https://github.com/bioperl/p5-bpwrapper Github Link]
* Quizzes: 4 x 25 pts = 100 pts
** [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2074-9/figures/1 Flowchart from publication]
* Final report: 50 pts
* Haplotype network with TCS [https://pubmed.ncbi.nlm.nih.gov/11050560/ PubMed link]
Total: 200 pts
* Web-interactive visualization with [http://D3js.org D3js]
** [https://github.com/sairum/tcsBU Github link]
** [https://cibio.up.pt/software/tcsBU/index.html Web tool]
** [https://academic.oup.com/bioinformatics/article/32/4/627/1744448 Paper]


==Course Schedule==
==Tutorial==
* July 8 (Mon), 8:40-12:10
* 2-2:30: Introduction to pathogen phylogenomics
* July 9 (Tu), 8:40-12:10
* 2:30-2:45: Demo: sequence manipulation with BpWrapper
* July 10 (Wed), 8:40-12:10
<syntaxhighlight lang='bash'>
* July 11 (Thur), 8:40-12:10
bioseq --man
* July 12 (Fri), 8:40-12:10
bioseq -n Jan-Feb.mafft
* July 15 (Mon), 8:00-12:10
bioaln --man
* July 16 (Tu), 8:00-12:10
bioaln -n -i'fasta' Jan-Feb.mafft
* July 17 (Wed), 8:00-12:10
bioaln -l -i'fasta' Jan-Feb.mafft
* July 18 (Thur), 8:00-12:10
bioaln -n -i'phylip' cov-565strains-617snvs.phy
* July 19 (Fri), 8:00-12:10
bioaln -l -i'phylip' cov-565strains-617snvs.phy
 
FastTree -nt cov-565strains-617snvs.phy > cov.dnd
==Papers & Data==
biotree --man
{| class="wikitable sortable"
biotree -n cov.dnd
|-
biotree -l cov.dnd
! Omics Application !! Paper link !! Data set !! NGS Technology
</syntaxhighlight>
|-
* 2:45-3:00: build haplotype network with TCS
| Microbiome || [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0193652 Rimoldi_etal_2018_PlosOne] || [https://doi.org/10.1371/journal.pone.0193652.s004 S1 Dataset] || 16S rDNA amplicon sequencing
<syntaxhighlight lang='bash'>
|-
java -jar -Xmx1g TCS.jar
| Transcriptome || [https://science.sciencemag.org/content/350/6264/1096 Wang_etal_2015_Science] || Tables S2 & S4 || RNA-Seq
</syntaxhighlight>
|-
* 3:00-3:15: interactive visualization with tcsBU
| Transcriptome & Regulome || [https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-019-0477-8 Nava_etal_2019_BMCMedGenomics] || Tables S2 & S3 || RNA-Seq/ChIP-Seq
* 3:15-3:30: Q & A
|}

Revision as of 07:23, 26 July 2020

Bioinformatics Boot Camp for Ecology & Evolution: Pathogen Evolutionary Genomics
Thursday, Aug 6, 2020, 2 - 3:30pm
Instructors: Dr Weigang Qiu & Ms Saymon Akther
Email: weigang@genectr.hunter.cuny.edu
Lab Website: http://diverge.hunter.cuny.edu/labwiki/
Lyme Disease (Borreliella) CoV Genome Tracker Coronavirus evolution
Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)
Spike protein alignment

Case studies from Qiu Lab

CoV genome data set

  • N=565 SARS-CoV-2 genomes collected during January & February 2020. Data source & acknowledgement: GISAID (Warning: You need to acknowledge GISAID if you reuse the data in any publication)
  • Download file: data file
  • Create a directory, unzip, & un-tar
mkdir QiuAkther
mv cov-camp.tar.gz QiuAkther/
cd QiuAkther
tar -tzf cov-camp.tar.gz # view files
tar -xzf cov-camp.tar.gz # un-zip & un-tar
  • View files
file TCS.jar
ls -lrt # long listing, sorted by modification time, oldest first
less Jan-Feb.mafft # an alignment of 565 CoV2 genomes in FASTA format; "q" to quit
less cov-565strains-617snvs.phy # non-gapped SNV alignment in PHYLIP format
wc hap.txt # geographic origins
head hap.txt
wc group.txt # color assignment
cat group.txt
less cov-565strains.gml # graph file (output)
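
The list-then-extract pattern used for the course archive (<code>tar -tzf</code> followed by <code>tar -xzf</code>) can be tried safely on a throwaway archive first. The sketch below is illustrative only: <code>demo.tar.gz</code>, its single member file, and the <code>unpacked</code> directory are hypothetical and are created inside a temporary directory.

```bash
#!/usr/bin/env bash
# Sketch: list an archive's contents before extracting it.
# demo.tar.gz and its contents are hypothetical; substitute the downloaded archive.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small gzip-compressed archive to work with.
mkdir src
echo "strain1 Asia" > src/hap.txt
tar -czf demo.tar.gz -C src hap.txt

# -t lists the members without writing anything to disk.
listing=$(tar -tzf demo.tar.gz)
echo "$listing"

# -x extracts; -C sends the files into a dedicated directory.
mkdir unpacked
tar -xzf demo.tar.gz -C unpacked
ls unpacked
```

Listing first shows whether the archive unpacks into its own subdirectory or directly into the current one, which is why the tutorial creates a dedicated directory before extracting.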

Bioinformatics Tools & Learning Goals

Tutorial

  • 2-2:30: Introduction to pathogen phylogenomics
  • 2:30-2:45: Demo: sequence manipulation with BpWrapper

bioseq --man
bioseq -n Jan-Feb.mafft
bioaln --man
bioaln -n -i'fasta' Jan-Feb.mafft
bioaln -l -i'fasta' Jan-Feb.mafft
bioaln -n -i'phylip' cov-565strains-617snvs.phy
bioaln -l -i'phylip' cov-565strains-617snvs.phy
FastTree -nt cov-565strains-617snvs.phy > cov.dnd
biotree --man
biotree -n cov.dnd
biotree -l cov.dnd

  • 2:45-3:00: build haplotype network with TCS

java -jar -Xmx1g TCS.jar

  • 3:00-3:15: interactive visualization with tcsBU
  • 3:15-3:30: Q & A
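
The tutorial steps above call several command-line programs (<code>bioseq</code>, <code>bioaln</code>, <code>biotree</code>, <code>FastTree</code>, and <code>java</code>). A small pre-flight check can confirm they are on the PATH before the session starts. This is a sketch under stated assumptions: <code>check_tools</code> is a hypothetical helper written for this page, not part of BpWrapper, FastTree, or TCS, and it only tests whether each command can be found, not whether it is the right version.

```bash
#!/usr/bin/env bash
# Sketch: report which command-line tools are missing from PATH.
# check_tools is a hypothetical helper, not part of BpWrapper, FastTree, or TCS.

check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  # Non-zero return means at least one tool was not found.
  return "$missing"
}

# Tools named in the tutorial steps.
check_tools bioseq bioaln biotree FastTree java \
  || echo "Install the missing tools before the session."
```

The function returns a non-zero status when anything is missing, so it can also gate a longer setup script.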