EEB BootCamp 2020

Bioinformatics Boot Camp for Ecology & Evolution: Genomic Epidemiology
Thursday, Aug 6, 2020, 2-3:30pm
Instructors: Dr Weigang Qiu & Ms Saymon Akther
Email: weigang@genectr.hunter.cuny.edu
Lab Website: http://diverge.hunter.cuny.edu/labwiki/

Case studies from Qiu Lab

Lyme disease (Borreliella): gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)
CoV Genome Tracker: haplotype network
Coronavirus evolution: spike protein alignment

CoV genome data set

N=565 SARS-CoV-2 genomes collected during January & February 2020. Data source & acknowledgement: GISAID (Warning: you must acknowledge GISAID if you reuse the data in any publication)
Download file: data file (qiu-akther.tar.gz)
Create a directory, move the archive into it, then un-zip & un-tar:

mkdir QiuAkther
mv qiu-akther.tar.gz QiuAkther/
cd QiuAkther
tar -tzf qiu-akther.tar.gz # view files
tar -xzf qiu-akther.tar.gz # un-zip & un-tar
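If `tar` is not available (for example on Windows without a Unix shell), Python's standard `tarfile` module can do the same listing and extraction. This is a minimal sketch. It assumes the archive is gzip-compressed, as the `.tar.gz` name suggests. `list_and_extract` is a hypothetical helper name and is not part of the course materials.

```python
import tarfile


def list_and_extract(archive, dest="."):
    """List the members of a .tar.gz archive, then extract them into `dest`.

    Hypothetical helper mirroring `tar -tzf` followed by `tar -xzf`.
    Only use it on archives you trust.
    """
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()  # like `tar -tzf`: view files without extracting
        tar.extractall(path=dest)  # like `tar -xzf`: un-zip & un-tar (creates dest if needed)
    return names
```

For example, `list_and_extract("qiu-akther.tar.gz", "QiuAkther")` returns the member names and unpacks the files into `QiuAkther/`.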

View files

ls -lrt # long listing, sorted by modification time, oldest first
file TCS.jar # Java application
less ref.gb # genbank file as reference sequence
less Jan-Feb.mafft # an alignment of 565 CoV2 genomes in FASTA format; "q" to quit
less cov-565strains-617snvs.phy # non-gapped SNV alignment in PHYLIP format
wc cov-date.tsv # collection dates
head cov-date.tsv
wc cov-geo.txt # geographic origins
head cov-geo.txt
wc group.txt # color assignment
cat group.txt
less cov-565strains.gml # graph file (output)

Bioinformatics tools for genomic epidemiology

BpWrapper: command-line tools for manipulating sequences, alignments, and trees (based on BioPerl). Github Link; Flowchart from publication
Pairwise genome alignment with MUMmer: Github link
Multiple alignment with MAFFT: Github link
Extract SNVs with snp-sites: Github link
Haplotype network with TCS: PubMed link
Web-interactive visualization with D3js: Github link; Web tool; Paper

Tutorial

2-2:30: Introduction to pathogen phylogenomics
2:30-2:45: Demo: sequence manipulation with BpWrapper

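bioseq --man # full manual
bioseq -i'genbank' ref.gb > ref.fas # convert the GenBank reference to FASTA
bioseq -n Jan-Feb.mafft # number of sequences
bioaln --man
bioaln -n -i'fasta' Jan-Feb.mafft # number of sequences in the alignment
bioaln -l -i'fasta' Jan-Feb.mafft # alignment length
bioaln -n -i'phylip' cov-565strains-617snvs.phy
bioaln -l -i'phylip' cov-565strains-617snvs.phy
FastTree -nt cov-565strains-617snvs.phy > cov.dnd # approximate-ML tree from a nucleotide alignment
biotree --man
biotree -n cov.dnd # number of leaves (OTUs)
biotree -l cov.dnd # total branch length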
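The two numbers that `bioaln -n` and `bioaln -l` report can be reproduced in a few lines of Python, which helps when you check the files by hand. This sketch makes the following assumptions. FASTA IDs are taken as the first word of each header. The PHYLIP reader expects relaxed *sequential* PHYLIP: an `ntax nchar` header followed by one `name sequence` record per line. It does not handle interleaved PHYLIP. The function names are illustrative only.

```python
def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}; the ID is the first word of each header."""
    seqs = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(parts) for k, parts in seqs.items()}


def read_phylip(lines):
    """Parse relaxed sequential PHYLIP: an 'ntax nchar' header, then one 'name sequence' per line."""
    records = (line for line in lines if line.strip())
    ntax, nchar = map(int, next(records).split())
    seqs = {}
    for line in records:
        name, seq = line.split(None, 1)
        seqs[name] = "".join(seq.split())  # drop spaces between sequence blocks and the newline
    if len(seqs) != ntax or any(len(s) != nchar for s in seqs.values()):
        raise ValueError("PHYLIP header does not match the records")
    return seqs


def alignment_stats(seqs):
    """Return (number of sequences, alignment length), as bioaln -n and -l report."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) > 1:
        raise ValueError("sequences differ in length; not an alignment")
    return len(seqs), (lengths.pop() if lengths else 0)
```

For example, `with open("Jan-Feb.mafft") as fh: print(alignment_stats(read_fasta(fh)))` should print the same two numbers as `bioaln -n` and `bioaln -l`, provided the file matches the format assumed above.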

2:45-3:10: build haplotype network with TCS

# Data pre-processing
# 1. Download genomes & meta data from GISAID
# 2. Run dnadiff against a reference genome
man nucmer
dnadiff -h
dnadiff ref.fas <query FASTA>
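mkdir fasta-files # place one FASTA file per query genome here
cd fasta-files
for f in *.fas; do dnadiff -p "${f%.fas}" ../ref.fas "$f"; done # "-p" gives each genome its own report instead of overwriting out.*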
<to be added: plot in R seq diff vs collection date>
# 3. Remove mis-assembled and reverse-complemented genomes
bioseq -d'file:'
# 4. Remove genomes with more than 10 non-ATCG bases
bioseq -d'ambig:10'
# 5. Run mafft (not run; takes too long)
# 6. Run snp-sites
snp-sites
# 7. Build the haplotype network with TCS (Java heap capped at 1 GB)
java -Xmx1g -jar TCS.jar
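Steps 4 and 6 are simple enough to sketch directly. The code below keeps genomes that have at most 10 non-ACGT characters. It then keeps only the alignment columns in which at least two different bases from A/C/G/T occur. This is a simplified stand-in and does not reimplement `bioseq -d'ambig:10'` or `snp-sites`. In particular, those tools may treat gaps, N and lowercase bases differently. The `gap_free` option is an assumption meant to mirror the non-gapped SNV alignment among the data files. When enabled, it drops any variable column that also contains a gap or an ambiguous base.

```python
ACGT = set("ACGT")


def count_non_acgt(seq):
    """Count characters other than A, C, G, T (case-insensitive): N, IUPAC codes, gaps."""
    return sum(1 for base in seq.upper() if base not in ACGT)


def drop_ambiguous(seqs, max_ambig=10):
    """Step 4 (sketch): keep genomes with at most `max_ambig` non-ACGT characters."""
    return {sid: s for sid, s in seqs.items() if count_non_acgt(s) <= max_ambig}


def extract_snvs(aln, gap_free=True):
    """Step 6 (sketch): keep alignment columns where >= 2 different A/C/G/T bases occur.

    Assumes all sequences in `aln` have the same length. With gap_free=True
    (an assumption), a variable column is also dropped if any sequence has a
    gap or ambiguous base there.
    """
    if not aln:
        return {}
    seqs = list(aln.values())
    keep = []
    for i in range(len(seqs[0])):
        column = {s[i].upper() for s in seqs}
        if len(column & ACGT) < 2:
            continue  # invariant column, ignoring gaps and ambiguous bases
        if gap_free and not column <= ACGT:
            continue  # variable, but contains a gap or ambiguous base
        keep.append(i)
    return {sid: "".join(s[i] for i in keep) for sid, s in aln.items()}
```

Filtering comes first because a few heavily ambiguous genomes would otherwise add spurious variable columns or gaps to the SNV alignment.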

3:10-3:20: interactive visualization with BuTCS
- Load graph file
- Load group file
- Load haplotype file
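TCS first collapses identical sequences into haplotypes. It then links haplotypes by statistical parsimony and adds inferred intermediate haplotypes where needed. BuTCS loads the resulting graph file (presumably `cov-565strains.gml` in the data set). The sketch below illustrates only the first step and the general GML layout. It merges strains with identical SNV strings and joins haplotypes that differ at exactly one site. It is not the TCS algorithm, because it applies no parsimony probability cutoff and adds no intermediates. The node attributes (`label`, `weight`) and all function names are illustrative. They are not necessarily what TCS writes or what BuTCS expects.

```python
from collections import defaultdict
from itertools import combinations


def collapse_haplotypes(snvs):
    """Group strains with identical SNV strings: {haplotype: [strain IDs]}, first-seen order."""
    groups = defaultdict(list)
    for strain, seq in snvs.items():
        groups[seq].append(strain)
    return dict(groups)


def hamming(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def one_step_edges(haps):
    """Index pairs (i, j), i < j, of haplotypes that differ at exactly one SNV."""
    return [(i, j) for (i, a), (j, b) in combinations(enumerate(haps), 2) if hamming(a, b) == 1]


def to_gml(snvs):
    """Minimal GML text: one node per haplotype (weight = strain count), one-step edges."""
    groups = collapse_haplotypes(snvs)
    haps = list(groups)
    lines = ["graph ["]
    for i, h in enumerate(haps):
        lines.append(f'  node [ id {i} label "H{i + 1}" weight {len(groups[h])} ]')
    for i, j in one_step_edges(haps):
        lines.append(f"  edge [ source {i} target {j} ]")
    lines.append("]")
    return "\n".join(lines)
```

The input is a `{strain: SNV string}` mapping, such as the rows of `cov-565strains-617snvs.phy`. Haplotypes that are two or more steps from every other haplotype stay unconnected in this sketch. TCS would instead join them through inferred intermediates.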
3:20-3:30: Q & A
