EEB BootCamp 2020: Difference between revisions

==Bioinformatics tools for genomic epidemiology==
* BpWrapper: command-line tools for manipulating sequences, alignments, and trees (based on BioPerl): [https://github.com/bioperl/p5-bpwrapper Github link]; [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2074-9/figures/1 Flowchart from publication]
* Pairwise genome alignment with MUMMER: [https://github.com/mummer4/mummer Github link]
* Multiple alignment with MAFFT: [https://github.com/GSLBiotech/mafft Github link]
* Extract SNVs with snp-sites: [https://github.com/sanger-pathogens/snp-sites Github link]
* Haplotype network with TCS: [https://pubmed.ncbi.nlm.nih.gov/11050560/ PubMed link]
* Web-interactive visualization with [http://D3js.org D3js]: [https://github.com/sairum/tcsBU Github link]; [https://cibio.up.pt/software/tcsBU/index.html Web tool]; [https://academic.oup.com/bioinformatics/article/32/4/627/1744448 Paper]

Revision as of 07:53, 26 July 2020

Bioinformatics Boot Camp for Ecology & Evolution: Genomic Epidemiology
Thursday, Aug 6, 2020, 2 - 3:30pm
Instructors: Dr Weigang Qiu & Ms Saymon Akther
Email: weigang@genectr.hunter.cuny.edu
Lab Website: http://diverge.hunter.cuny.edu/labwiki/
Lyme Disease (Borreliella); CoV Genome Tracker; Coronavirus evolution
Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)
Spike protein alignment

Case studies from Qiu Lab

CoV genome data set

  • N=565 SARS-CoV-2 genomes collected during January & February 2020. Data source & acknowledgement: GISAID (Warning: you must acknowledge GISAID if you reuse the data in any publication)
  • Download file: data file
  • Create a directory, unzip, & un-tar
mkdir QiuAkther
mv cov-camp.tar.gz QiuAkther/
cd QiuAkther
tar -tzf cov-camp.tar.gz # view files
tar -xzf cov-camp.tar.gz # un-zip & un-tar
  • View files
file TCS.jar
ls -lrt # long listing, sorted by modification time with the oldest first
less Jan-Feb.mafft # an alignment of 565 CoV2 genomes in FASTA format; "q" to quit
less cov-565strains-617snvs.phy # non-gapped SNV alignment in PHYLIP format
wc hap.txt # geographic origins
head hap.txt
wc group.txt # color assignment
cat group.txt
less cov-565strains.gml # graph file (output)
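
As a quick sanity check on the two alignment files, the number of sequences and the alignment dimensions can also be read with standard Unix tools. The sketch below runs on a toy alignment: `toy.fasta` and `toy.phy` are made-up file names, not part of the course data, and it assumes the first line of the PHYLIP file holds the number of sequences followed by the number of sites.

```shell
# Toy files for illustration only; substitute Jan-Feb.mafft and
# cov-565strains-617snvs.phy once the archive is unpacked.
printf '>s1\nACGT\nAC\n>s2\nACGA\nAC\n>s3\nA-GT\nAC\n' > toy.fasta
printf ' 3 2\ns1        TC\ns2        AC\ns3        TC\n' > toy.phy

# Count FASTA records: every record starts with ">"
nseq=$(grep -c '^>' toy.fasta)

# Collect the distinct sequence lengths; an alignment should yield exactly one
lens=$(awk '/^>/ { if (seq != "") print length(seq); seq = ""; next }
            { seq = seq $0 }
            END { if (seq != "") print length(seq) }' toy.fasta |
       sort -un | paste -sd' ' -)

# Read the PHYLIP header: number of sequences, then number of sites
read -r ntaxa nsites < toy.phy

echo "FASTA: $nseq sequences, distinct length(s): $lens"
echo "PHYLIP: $ntaxa sequences, $nsites sites"
```

If more than one length is reported for the FASTA file, the sequences are not aligned (or a record is truncated), which is worth catching before building a tree or network.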

Tutorial

  • 2-2:30: Introduction to pathogen phylogenomics
  • 2:30-2:45: Demo: sequence manipulation with BpWrapper
bioseq --man
bioseq -n Jan-Feb.mafft
bioaln --man
bioaln -n -i'fasta' Jan-Feb.mafft
bioaln -l -i'fasta' Jan-Feb.mafft
bioaln -n -i'phylip' cov-565strains-617snvs.phy
bioaln -l -i'phylip' cov-565strains-617snvs.phy
FastTree -nt cov-565strains-617snvs.phy > cov.dnd
biotree --man
biotree -n cov.dnd
biotree -l cov.dnd
  • 2:45-3:00: Build a haplotype network with TCS
# Data pre-processing
# 1. Download genomes & meta data from GISAID
# 2. Run dnadist against a reference genome
# (to be added)
# 3. Remove mis-assembled and reverse-complemented genomes
bioseq -d'file:'
# 4. Remove genomes with more than 10 non-ATCG bases
bioseq -d'ambig:10'
# 5. Run mafft (not run; takes too long)
# 6. Run snp-sites
snp-sites
java -Xmx1g -jar TCS.jar # launch TCS with a 1 GB maximum heap
  • 3:00-3:15: Interactive visualization with tcsBU
    • Load graph file
    • Load group file
    • Load haplotype file
  • 3:15-3:30: Q & A
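
Steps 4 and 6 of the data pre-processing outline (dropping genomes with too many non-ATCG bases, then extracting variable sites) can be illustrated in plain awk. This is only a sketch of the logic on a toy alignment, not the BpWrapper or snp-sites implementation. The file names, the threshold of 1 chosen for the toy data, and the decision to treat gap characters as neither ambiguous nor variable are all assumptions made for illustration.

```shell
# Toy alignment for illustration only (hypothetical file names)
cat > toy.aln.fasta <<'EOF'
>ref
ACGTACGT
>g1
ACGTACGA
>g2
ANNTACGT
>g3
ACCTACGA
EOF

max_ambig=1   # the tutorial uses 10; 1 suits the toy data

# Step 4 (sketch): keep records with at most max_ambig non-ACGT bases.
# Gap characters "-" are not counted as ambiguous (an assumption).
awk -v max="$max_ambig" '
  function emit(   tmp) {
    if (id == "") return
    tmp = seq
    # gsub returns the number of characters it removed
    if (gsub(/[^ACGTacgt-]/, "", tmp) <= max) print id "\n" seq
  }
  /^>/ { emit(); id = $0; seq = ""; next }
       { seq = seq $0 }
  END  { emit() }
' toy.aln.fasta > toy.clean.fasta

# Step 6 (sketch): keep only columns where unambiguous bases differ.
awk '
  /^>/ { n++; id[n] = substr($0, 2); next }
       { seq[n] = seq[n] toupper($0) }
  END {
    len = length(seq[1])
    for (j = 1; j <= len; j++) {
      first = ""; variable = 0
      for (i = 1; i <= n; i++) {
        b = substr(seq[i], j, 1)
        if (b !~ /^[ACGT]$/) continue      # skip gaps and ambiguity codes
        if (first == "") first = b
        else if (b != first) variable = 1
      }
      if (variable)
        for (i = 1; i <= n; i++) snv[i] = snv[i] substr(seq[i], j, 1)
    }
    for (i = 1; i <= n; i++) print ">" id[i] "\n" snv[i]
  }
' toy.clean.fasta > toy.snv.fasta

cat toy.snv.fasta
```

On the toy data, `g2` is dropped because it carries two `N` bases, and only alignment columns 3 and 8 survive as variable sites. For real data sets the dedicated tools listed in the tutorial are the better choice; the sketch is meant to show what those steps do to the alignment.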