EEB BootCamp 2020

Revision as of 03:13, 4 August 2020

Bioinformatics Boot Camp for Ecology & Evolution: Genomic Epidemiology
Thursday, Aug 6, 2020, 2 - 3:30pm
Instructors: Dr Weigang Qiu & Ms Saymon Akther
Email: weigang@genectr.hunter.cuny.edu
Lab Website: http://diverge.hunter.cuny.edu/labwiki/
CoV Genome Tracker Coronavirus evolution Lyme Disease (Borreliella)
Spike protein alignment
Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)

Case studies

CoV genome data set

  • N=100 SARS-CoV-2 genomes collected during January, February & March 2020. Data source & acknowledgement: GISAID (Warning: You need to acknowledge GISAID if you reuse the data in any publication)
  • Download the folder "bootcamp_august_6th_2020": data file
  • unzip the folder
unzip bootcamp_august_6th_2020.zip
ls -l # view files
tar -xzf qiu-akther.tar.gz # un-zip & un-tar
  • View files
ls -lrt # long listing, sorted by modification time, oldest first
file TCS.jar # Java application
less ref.gb # genbank file as reference sequence
less Jan-Feb.mafft # an alignment of 565 CoV2 genomes in FASTA format; "q" to quit
less cov-565strains-617snvs.phy # non-gapped SNV alignment in PHYLIP format
wc cov-date.tsv # collection dates
head cov-date.tsv
wc cov-geo.txt # geographic origins
head cov-geo.txt
wc group.txt # color assignment
cat group.txt
less cov-565strains.gml # graph file (output)
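
The metadata files viewed above can also be summarized directly in the shell. The following is a minimal sketch that tallies genomes per collection month. It assumes that cov-date.tsv has two tab-separated columns (strain ID, then collection date as YYYY-MM-DD); that layout is an assumption, so check it with head first. The file and strain names in the sketch are toy stand-ins, not the real data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-in for cov-date.tsv (hypothetical strain names).
# ASSUMPTION: column 1 = strain ID, column 2 = collection date (YYYY-MM-DD).
toy_dates=$(mktemp)
printf 'strainA\t2020-01-15\nstrainB\t2020-02-03\nstrainC\t2020-02-20\nstrainD\t2020-03-01\n' > "$toy_dates"

# Keep the date column, truncate it to YYYY-MM, then count each month.
month_counts=$(cut -f2 "$toy_dates" | cut -c1-7 | sort | uniq -c | awk '{print $2 "\t" $1}')

# One line per month: month, tab, number of genomes.
echo "$month_counts"
```

To try this on the real file, replace "$toy_dates" with cov-date.tsv, adjusting the column number in cut -f2 if the dates sit in a different column.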

Bioinformatics tools for genomic epidemiology

Tutorial

  • 2-2:30: Introduction to pathogen phylogenomics
  • 2:30-2:45: Demo: sequence manipulation with BpWrapper
bioseq --man
bioseq -i'genbank' ref.gb > ref.fas # convert the GenBank reference to FASTA
bioseq -n Jan-Feb.mafft
bioaln --man
bioaln -n -i'fasta' Jan-Feb.mafft
bioaln -l -i'fasta' Jan-Feb.mafft
bioaln -n -i'phylip' cov-565strains-617snvs.phy
bioaln -l -i'phylip' cov-565strains-617snvs.phy
FastTree -nt cov-565strains-617snvs.phy > cov.dnd # infer a tree from the nucleotide SNV alignment; output in Newick format
biotree --man
biotree -n cov.dnd
biotree -l cov.dnd
  • 2:45-3:10: Build a haplotype network with TCS
# Data pre-processing
# 1. Download genomes & meta data from GISAID
# 2. Run dnadiff against a reference genome
man nucmer
dnadiff -h
dnadiff ref.fas <query FASTA>
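mkdir fasta-files # holds one FASTA file per genome
cd fasta-files
for f in *.fas; do dnadiff -p "${f%.fas}" ../ref.fas "$f"; done # ref.fas is one level up; -p gives each genome its own output prefix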
<to be added: plot in R seq diff vs collection date>
# 3. Remove mis-assembled and reverse-complemented genomes
bioseq -d'file:'
# 4. Remove genomes with more than 10 non-ATCG bases
bioseq -d'ambig:10'
# 5. Run mafft (not run; takes too long)
# 6. Run snp-sites
snp-sites
# 7. Run TCS with a 1 GB Java heap
java -Xmx1g -jar TCS.jar
  • 3:10-3:20: Interactive visualization with BuTCS
    • Load graph file
    • Load group file
    • Load haplotype file
  • 3:20-3:30: Q & A
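
Step 4 of the data pre-processing above (removing genomes with more than 10 non-ATCG bases) is run with bioseq in the session. For readers who want to see the logic spelled out, here is a rough awk sketch of the same idea. It is not how bioseq is implemented, and the toy FASTA records and variable names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy FASTA input (hypothetical records): seq2 carries 12 ambiguous bases.
toy_fasta=$(mktemp)
kept_fasta=$(mktemp)
cat > "$toy_fasta" <<'EOF'
>seq1
ATGCATGCATGC
ATGCNATGC
>seq2
ATGCNNNNNN
NNNNNNATGC
>seq3
atgcatgcat
EOF

max_ambig=10   # threshold taken from step 4

# Collect each record, count characters other than A/C/G/T
# (case-insensitive), and print the record only if the count is <= max.
awk -v max="$max_ambig" '
  function emit(   s, n) {
    if (header == "") return
    s = toupper(seq)
    n = gsub(/[^ACGT]/, "", s)   # gsub returns the number of replacements
    if (n <= max) print header "\n" seq
  }
  /^>/ { emit(); header = $0; seq = ""; next }
  { seq = seq $0 }
  END { emit() }
' "$toy_fasta" > "$kept_fasta"

filtered=$(cat "$kept_fasta")
echo "$filtered"
rm -f "$toy_fasta" "$kept_fasta"
```

With the toy input, seq1 (one N) and seq3 (none) are kept and seq2 is dropped. Note that the sketch writes each kept sequence on a single line, so multi-line records are unwrapped in the output.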