Year 2020 and EEB BootCamp 2020: Difference between pages

From QiuLab
imported>Weigang
 
imported>Weigang
 
=Covid-19 outbreak=
<center>Bioinformatics Boot Camp for Ecology & Evolution: '''Pathogen Evolutionary Genomics'''</center>
[http://cov.borreliabase.org/bb-cov/ a phylo-genome browser]
<center>Thursday, Aug 6, 2020, 2 - 3:30pm</center>
<center>'''Instructors:''' Dr Weigang Qiu & Ms Saymon Akther</center>
<center>'''Email:''' weigang@genectr.hunter.cuny.edu</center>
<center>'''Lab Website:''' http://diverge.hunter.cuny.edu/labwiki/</center>
<center>
{| class="wikitable"
|-
! Lyme Disease (Borreliella) !! CoV Genome Tracker !! Coronavirus evolution
|-
| [[File:Lp54-gain-loss.png|300px|thumbnail| Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)]] ||
[[File:Cov-screenshot-1.png|300px|thumbnail| [http://cov.genometracker.org/ Haplotype network] ]]
||
[[File:Cov-screenshot-2.png|300px|thumbnail| Spike protein alignment ]]
|}
</center>
----


=Algorithm & tools for Bb plasmid nomenclature=
==Case studies from Qiu Lab==
# Reference: email exchanges with Sherwood on Feb 27-28, 2020
* [http://borreliabase.org Comparative genomics of worldwide Lyme disease pathogens]
# To reduce the amount of manual curation and judgement calls, it seems we need to automate the plasmid call using the following algorithm (which should work for the majority of cases):
* [http://cov.genometracker.org Covid-19 Genome Tracker]
## Identify PFam32 genes using BLAST or HMMER
 
## Build a NJ tree with sequences from a PFam32 database
==CoV genome data set==
## Calculate some kind of group consistency score at each clade level
* N=565 SARS-CoV-2 genomes collected during January & February 2020. Data source & acknowledgement: [http://gisaid.org GISAID] (<em>Warning: You need to acknowledge GISAID if you reuse the data in any publication</em>)
## Identify presence/absence of a cluster with other 3 partition genes
* Download file: [http://diverge.hunter.cuny.edu/~weigang/qiu-akther.tar.gz data file]
## Assign plasmid names (single names for most, a few composite names)
* Create a directory, unzip, & un-tar
# Modification (species-tree/gene-tree reconciliation): Your nicely illustrated tree reminds me that a more formal way (than a %diff cutoff) to delineate orthologous (same name) and paralogous (different names) PFam32 groups is the so-called “species tree / gene tree reconciliation” algorithm. This algorithm labels each branch on the gene tree (i.e., your tree) as due to either “duplication” (creating paralogs, long branches) or “speciation” (creating orthologs, short branches). New plasmid names are then assigned to each major/ancestral duplication branch, not counting recent duplications within a single species or strain. By this criterion, the lp56 group (node indicated by green lines) is valid, since no genome appears more than once among its descendants. All nodes indicated by blue lines are likewise valid, regardless of the level of sequence difference (e.g., VA1, cp26). We made an overcall on lp28-9 and lp28-1 (orange node), which should be a single paralogous group, since no genome appears more than once among the descendants. The somewhat deep divergence simply suggests fast evolution.
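The validity criterion in the reconciliation note above (a clade is accepted as one group only if no genome appears more than once among its descendants) can be sketched in shell. This is a minimal illustration, not the lab's pipeline: the leaf-label format <code>genome|gene</code>, the function name <code>check_clade</code>, and the toy labels are all assumptions made for the example.

```shell
# Hypothetical helper: reads the leaf labels of one clade on stdin,
# one per line, in an ASSUMED "genome|gene" format, and reports
# whether any genome occurs more than once.
check_clade() {
    local dups
    # Keep the genome part of each label, then list repeated genomes.
    dups=$(cut -d'|' -f1 | sort | uniq -d)
    if [ -z "$dups" ]; then
        echo "valid"    # every genome appears once
    else
        # Join repeated genome names with commas
        echo "duplicated: $(echo "$dups" | paste -sd, -)"
    fi
}

# Toy clades (made-up labels, for illustration only)
printf 'strainA|g1\nstrainB|g2\nstrainC|g3\n' | check_clade
printf 'strainA|g1\nstrainA|g2\nstrainB|g3\n' | check_clade
```

A full reconciliation would also need the species tree and a labeling of every gene-tree node as duplication or speciation; the check above covers only the "no genome appears twice" test applied to a single clade.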
<syntaxhighlight lang='bash'>
mkdir QiuAkther
mv cov-camp.tar.gz QiuAkther/
cd QiuAkther
tar -tzf cov-camp.tar.gz # view files
tar -xzf cov-camp.tar.gz # un-zip & un-tar
</syntaxhighlight>
* View files
<syntaxhighlight lang='bash'>
file TCS.jar
ls -lrt # long listing sorted by modification time, oldest first
less Jan-Feb.mafft # an alignment of 565 CoV2 genomes in FASTA format; "q" to quit
less cov-565strains-617snvs.phy # non-gapped SNV alignment in PHYLIP format
wc hap.txt # geographic origins
head hap.txt
wc group.txt # color assignment
cat group.txt
less cov-565strains.gml # graph file (output)
</syntaxhighlight>
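As a quick sanity check on the files listed above, the number of sequences in a FASTA alignment and the header of a PHYLIP alignment can be read with standard Unix tools. The sketch below runs on two toy files that it creates itself (<code>toy.fas</code> and <code>toy.phy</code> are made-up names). On the workshop data the same commands would be pointed at <code>Jan-Feb.mafft</code> and <code>cov-565strains-617snvs.phy</code>, where, if the file descriptions above are accurate, one would expect 565 sequences and 617 sites.

```shell
# Toy inputs, created only for this illustration
printf '>s1\nACGT\n>s2\nACGA\n>s3\nACGG\n' > toy.fas
printf ' 3 4\ns1 ACGT\ns2 ACGA\ns3 ACGG\n' > toy.phy

# Count FASTA records: each record starts with a ">" header line
n_fasta=$(grep -c '^>' toy.fas)

# The first line of a PHYLIP file holds the number of sequences and sites
read -r n_phy n_sites < toy.phy

echo "FASTA sequences: $n_fasta"
echo "PHYLIP sequences: $n_phy, sites: $n_sites"
```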
 
==Bioinformatics Tools & Learning Goals==
* BpWrapper: commandline tools for sequence, alignment, and tree manipulations (based on BioPerl).
** [https://github.com/bioperl/p5-bpwrapper Github Link]
** [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2074-9/figures/1 Flowchart from publication]
* Haplotype network with TCS [https://pubmed.ncbi.nlm.nih.gov/11050560/ PubMed link]
* Web-interactive visualization with [http://D3js.org D3js]
** [https://github.com/sairum/tcsBU Github link]
** [https://cibio.up.pt/software/tcsBU/index.html Web tool]
** [https://academic.oup.com/bioinformatics/article/32/4/627/1744448 Paper]
 
==Tutorial==
* 2-2:30: Introduction to pathogen phylogenomics
* 2:30-2:45: Demo: sequence manipulation with BpWrapper
<syntaxhighlight lang='bash'>
bioseq --man
bioseq -n Jan-Feb.mafft
bioaln --man
bioaln -n -i'fasta' Jan-Feb.mafft
bioaln -l -i'fasta' Jan-Feb.mafft
bioaln -n -i'phylip' cov-565strains-617snvs.phy
bioaln -l -i'phylip' cov-565strains-617snvs.phy
FastTree -nt cov-565strains-617snvs.phy > cov.dnd
biotree --man
biotree -n cov.dnd
biotree -l cov.dnd
</syntaxhighlight>
* 2:45-3:00: build haplotype network with TCS
<syntaxhighlight lang='bash'>
java -Xmx1g -jar TCS.jar # launch TCS with a 1 GB maximum heap
</syntaxhighlight>
* 3:00-3:15: interactive visualization with tcsBU
* 3:15-3:30: Q & A
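After the demo, the Newick tree written by the FastTree step (<code>cov.dnd</code>) can be sanity-checked without extra software. In a Newick string the number of leaves equals the number of commas plus one, provided no label contains a comma. The sketch below applies this to a toy tree (<code>toy.dnd</code> is a made-up file created for the example). On the workshop data one would expect the leaf count to match the number of strains in the alignment.

```shell
# Toy Newick tree with four leaves, created only for this illustration
printf '((a:0.1,b:0.2):0.05,c:0.3,d:0.1);\n' > toy.dnd

# Delete everything except commas, then count the remaining characters
n_commas=$(tr -cd ',' < toy.dnd | wc -c)

# leaves = commas + 1 (assumes no commas inside labels)
n_leaves=$((n_commas + 1))
echo "leaves: $n_leaves"
```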

Revision as of 07:23, 26 July 2020
