BigData 2020 and EEB BootCamp 2020: Difference between pages

From QiuLab
<center>[http://bigdata.citytech.cuny.edu/ City Tech/Cornell BioMedical Big Data Week 2020]: '''Pathogen Evolutionary Genomics'''</center>
<center>Bioinformatics Boot Camp for Ecology & Evolution: '''Pathogen Evolutionary Genomics'''</center>
<center>Wed, July 22, 2020, 9 am - 12 noon</center>
<center>Thursday, Aug 6, 2020, 2 - 3:30pm</center>
<center>'''Instructor:''' Dr Weigang Qiu, Professor, Department of Biological Sciences </center>
<center>'''Instructors:''' Dr Weigang Qiu & Ms Saymon Akther</center>
<center>'''Office:''' B402 Belfer Research Building</center>
<center>'''Email:''' weigang@genectr.hunter.cuny.edu</center>
<center>'''Lab Website:''' http://diverge.hunter.cuny.edu/labwiki/</center>
</center>
</center>
----
==What is evolutionary genomics?==
Genomes differ among individuals and species. Evolutionary genomics studies genome variability and genome changes using evolutionary principles. Typical applications in pathogen research include molecular epidemiology (e.g., tracing the wildlife origin of SARS-CoV-2 & tracking the spread of Covid-19), molecular evolution (e.g., identifying key genes and protein sequences contributing to virulence and immune escape), and vaccine design (e.g., influenza vaccines based on the latest circulating strains).
Genome changes are studied at two distinct levels: (1) within-species/within-population variation (e.g., genomic changes during the Covid-19 pandemic), and (2) between-species divergence (e.g., differences between SARS-CoV-1 and SARS-CoV-2).
The key to analyzing genome variation within species is "population-thinking", the idea that no single individual genome is standard, normal, or "wildtype".
The key to comparing genomes across species is "tree-thinking", the idea that evolution happens by diversification (like a branching tree), not by climbing a ladder. There is no such thing as an "advanced" or "primitive" species: all living species have had exactly the same amount of time to diverge since the origin of life.
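
Within-population variation can be made concrete by counting the alignment columns in which more than one nucleotide is observed. The following is a minimal sketch only: the file <code>toy.aln</code> and its four sequences are made up for illustration (they are not part of the course data), and the alignment is assumed to be gap-free with one sequence per line.
<syntaxhighlight lang='bash'>
# Toy alignment: four hypothetical genomes, one sequence per line (not real data)
cat > toy.aln <<'EOF'
ACGTACGTAC
ACGTACGTAC
ACGAACGTAC
ACGAACGTTC
EOF

# Count the columns in which more than one nucleotide is observed (variable sites)
n_var=$(awk '
{
    for (i = 1; i <= length($0); i++) {
        base = substr($0, i, 1)
        # record each distinct base once per column
        if (!((i, base) in seen)) { seen[i, base] = 1; alleles[i]++ }
    }
}
END {
    n = 0
    for (i in alleles) if (alleles[i] > 1) n++
    print n
}' toy.aln)

echo "variable sites: $n_var"
</syntaxhighlight>
No single row of the toy alignment serves as the reference here: every sequence contributes equally to the count, which is the point of population-thinking.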


==Case studies from Qiu Lab==
* [http://borreliabase.org Comparative genomics of worldwide Lyme disease pathogens]
* [http://cov.genometracker.org Covid-19 Genome Tracker]
 
==Essential bioinformatics skills==
* Linux command-line interface (e.g., BASH shell)
* Familiarity with a programming language (e.g., Python or Perl)
* Data visualization & statistical analysis (e.g., JavaScript; the R statistical computing environment)


==Learning Goals==
* Be able to compare evolutionary relationships using phylogenetic trees
* Be able to use command-line tools for batch-processing of genome files
* Be able to perform genome-wide association analysis on the R platform

==CoV genome data set==
* N=565 SARS-CoV-2 genomes collected during January & February 2020. Data source & acknowledgement: [http://gisaid.org GISAID] (<em>Warning: You need to acknowledge GISAID if you reuse the data in any publication</em>)
* Download file: [http://diverge.hunter.cuny.edu/~weigang/qiu-akther.tar.gz data file]
* Create a directory, unzip, & un-tar
<syntaxhighlight lang='bash'>
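mkdir QiuAkther
mv cov-camp.tar.gz QiuAkther/ # substitute the name of the archive you downloaded if it differs
cd QiuAkther
tar -tzf cov-camp.tar.gz # list the archive contents without extracting
tar -xzf cov-camp.tar.gz # decompress (un-zip) & extract (un-tar)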
</syntaxhighlight>
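The difference between listing (<code>-t</code>) and extracting (<code>-x</code>) an archive can be tried safely on a throw-away archive first. This is a sketch under stated assumptions: the names <code>toy-data</code>, <code>toy.tar.gz</code>, and <code>unpacked</code> are hypothetical and unrelated to the course files.
<syntaxhighlight lang='bash'>
# Build a small throw-away archive (all names here are hypothetical)
mkdir -p toy-data
echo "ACGT" > toy-data/seq.txt
tar -czf toy.tar.gz toy-data      # -c create, -z gzip, -f archive file name

# List the contents; nothing is written to disk
tar -tzf toy.tar.gz

# Extract into a separate directory so existing files are not overwritten
mkdir -p unpacked
tar -xzf toy.tar.gz -C unpacked   # -C: change to this directory before extracting
ls -l unpacked/toy-data
</syntaxhighlight>
Listing before extracting shows whether the archive unpacks into its own sub-directory or directly into the current one.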
* View files
<syntaxhighlight lang='bash'>
file TCS.jar # report the file type
ls -lrt # long listing sorted by modification time, oldest first
less Jan-Feb.mafft # an alignment of 565 CoV2 genomes in FASTA format; "q" to quit
less cov-565strains-617snvs.phy # non-gapped SNV alignment in PHYLIP format
wc hap.txt # geographic origins
head hap.txt
wc group.txt # color assignment
cat group.txt
less cov-565strains.gml # graph file (output)
</syntaxhighlight>
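A quick sanity check on a PHYLIP alignment is to compare the counts declared in its header line with what the file actually contains. The sketch below uses a made-up file, <code>toy.phy</code>, and assumes the sequential layout (one taxon name and one sequence per line, separated by whitespace); interleaved PHYLIP files would need a different check.
<syntaxhighlight lang='bash'>
# Toy PHYLIP file: header "n_taxa n_sites", then one "name sequence" row per taxon
# (hypothetical strains, not real data)
cat > toy.phy <<'EOF'
 3 5
strainA ACGTA
strainB ACGTC
strainC ACTTA
EOF

# Read the two numbers on the header line
read -r n_taxa n_sites < toy.phy

# Count the non-empty rows after the header
n_rows=$(tail -n +2 toy.phy | grep -c .)

# Collect the distinct sequence lengths; a valid alignment has exactly one
widths=$(awk 'NR > 1 && NF == 2 { print length($2) }' toy.phy | sort -u)

echo "header: $n_taxa taxa, $n_sites sites; rows: $n_rows; sequence length(s): $widths"
</syntaxhighlight>
If the header and the counted values disagree, the file was probably truncated or is not in the sequential layout assumed here.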


==Schedule==
* 9:00 - 9:25: Introduction; [http://rstudio.org Install R & R Studio]; Download fasta file & save as "spike.fasta" : [[File:Spike2.txt|thumbnail]]
* 9:30 - 10:00: Unix Tutorial ([http://korflab.ucdavis.edu/Unix_and_Perl/current.html#part1 Part I. Unix Basics])
* 10:05 - 10:30: Unix Tutorial ([http://korflab.ucdavis.edu/Unix_and_Perl/current.html#part2 Part II Advanced Unix])
* 10:35 - 11:00: Tree-thinking Quizzes: Slides [[File:Big-data-phylogeny.pptx|thumbnail]] & Handouts [[File:Pretest.pdf|thumbnail]]
* 11:05 - 12:00: Demo: [[Mini-Tutorals#ospC_amplicon_identification|identification of genomic mutations associated with antibiotic resistance]]

==Bioinformatics Tools & Learning Goals==
* BpWrapper: command-line tools for sequence, alignment, and tree manipulations (based on BioPerl).
** [https://github.com/bioperl/p5-bpwrapper Github Link]
** [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2074-9/figures/1 Flowchart from publication]
* Haplotype network with TCS [https://pubmed.ncbi.nlm.nih.gov/11050560/ PubMed link]
* Web-interactive visualization with [http://D3js.org D3js]
** [https://github.com/sairum/tcsBU Github link]
** [https://cibio.up.pt/software/tcsBU/index.html Web tool]
** [https://academic.oup.com/bioinformatics/article/32/4/627/1744448 Paper]


==Exercises & Challenges==
* Finish Tree Thinking Quizzes
* Unix exercises:
** count the number of sequences using "grep -c" or "wc"
** display the first 5 lines of a file
** display the last 5 lines of a file
** change upper-cases to lower-cases
** change "|" to "_"
** replace strings

==Tutorial==
* 2-2:30: Introduction on pathogen phylogenomics
* 2:30-2:45: Demo: sequence manipulation with BpWrapper
<syntaxhighlight lang='bash'>
bioseq --man
bioseq -n Jan-Feb.mafft
bioaln --man
bioaln -n -i'fasta' Jan-Feb.mafft
bioaln -l -i'fasta' Jan-Feb.mafft
bioaln -n -i'phylip' cov-565strains-617snvs.phy
bioaln -l -i'phylip' cov-565strains-617snvs.phy
FastTree -nt cov-565strains-617snvs.phy > cov.dnd
biotree --man
biotree -n cov.dnd
biotree -l cov.dnd
</syntaxhighlight>
* 2:45-3:00: build haplotype network with TCS
<syntaxhighlight lang='bash'>
java -jar -Xmx1g TCS.jar
</syntaxhighlight>
* 3:00-3:15: interactive visualization with tcsBU
* 3:15-3:30: Q & A
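
One possible way to approach the Unix exercises listed under "Exercises & Challenges" is sketched below with standard tools only. The file <code>toy.fasta</code> and its sequence names are hypothetical stand-ins; the same commands should apply to a real FASTA file, but treat this as an illustration rather than the expected answer.
<syntaxhighlight lang='bash'>
# Toy FASTA file with made-up sequence names (not real data)
cat > toy.fasta <<'EOF'
>hCoV-19/Wuhan/A|EPI_001
ACGTACGT
>hCoV-19/USA/B|EPI_002
ACGTACGA
>hCoV-19/Italy/C|EPI_003
ACGTTCGA
EOF

# Count sequences: every FASTA record starts with ">"
n_seqs=$(grep -c '^>' toy.fasta)
echo "sequences: $n_seqs"

# Display the first and the last 5 lines
head -n 5 toy.fasta
tail -n 5 toy.fasta

# Change upper case to lower case in sequence lines, leaving header lines untouched
awk '/^>/ { print; next } { print tolower($0) }' toy.fasta > toy.lower.fasta

# Change "|" to "_" (single-character translation)
tr '|' '_' < toy.fasta > toy.renamed.fasta

# Replace a string (first occurrence on each line)
sed 's/hCoV-19/SARS-CoV-2/' toy.fasta > toy.replaced.fasta
</syntaxhighlight>
Each command writes to a new file, so the original <code>toy.fasta</code> stays unchanged and the results can be compared with <code>diff</code> or <code>less</code>.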

Revision as of 07:23, 26 July 2020

Bioinformatics Boot Camp for Ecology & Evolution: Pathogen Evolutionary Genomics
Thursday, Aug 6, 2020, 2 - 3:30pm
Instructors: Dr Weigang Qiu & Ms Saymon Akther
Email: weigang@genectr.hunter.cuny.edu
Lab Website: http://diverge.hunter.cuny.edu/labwiki/
Lyme Disease (Borreliella) CoV Genome Tracker Coronavirus evolution
Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)
Spike protein alignment

Case studies from Qiu Lab

CoV genome data set

  • N=565 SARS-CoV-2 genomes collected during January & February 2020. Data source & acknowledgement: GISAID (Warning: You need to acknowledge GISAID if you reuse the data in any publication)
  • Download file: data file
  • Create a directory, unzip, & un-tar
mkdir QiuAkther
mv cov-camp.tar.gz QiuAkther/
cd QiuAkther
tar -tzf cov-camp.tar.gz # view files
tar -xzf cov-camp.tar.gz # un-zip & un-tar
  • View files
file TCS.jar
ls -lrt # long listing sorted by modification time, oldest first
less Jan-Feb.mafft # an alignment of 565 CoV2 genomes in FASTA format; "q" to quit
less cov-565strains-617snvs.phy # non-gapped SNV alignment in PHYLIP format
wc hap.txt # geographic origins
head hap.txt
wc group.txt # color assignment
cat group.txt
less cov-565strains.gml # graph file (output)

Bioinformatics Tools & Learning Goals

Tutorial

  • 2-2:30: Introduction on pathogen phylogenomics
  • 2:30-2:45: Demo: sequence manipulation with BpWrapper

<syntaxhighlight lang='bash'>
bioseq --man
bioseq -n Jan-Feb.mafft
bioaln --man
bioaln -n -i'fasta' Jan-Feb.mafft
bioaln -l -i'fasta' Jan-Feb.mafft
bioaln -n -i'phylip' cov-565strains-617snvs.phy
bioaln -l -i'phylip' cov-565strains-617snvs.phy
FastTree -nt cov-565strains-617snvs.phy > cov.dnd
biotree --man
biotree -n cov.dnd
biotree -l cov.dnd
</syntaxhighlight>

  • 2:45-3:00: build haplotype network with TCS

<syntaxhighlight lang='bash'>
java -jar -Xmx1g TCS.jar
</syntaxhighlight>

  • 3:00-3:15: interactive visualization with tcsBU
  • 3:15-3:30: Q & A