Southwest-University: Difference between revisions

Revision as of 08:52, 16 July 2019

Biomedical Genomics July 8-19, 2019 Instructor: Weigang Qiu, Ph.D.
Professor, Department of Biological Sciences, City University of New York, Hunter College & Graduate Center
Adjunct Faculty, Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weil Cornell Medical College Office: B402 Belfer Research Building, 413 East 69th Street, New York, NY 10021, USA Email: weigang@genectr.hunter.cuny.edu Lab Website: http://diverge.hunter.cuny.edu/labwiki/

Host: Shunqin Zhu (祝顺琴), Ph.D.
Associate Professor, School of Life Science, South West University

Figure 1. Gains & losses of host-defense genes among Lyme pathogen genomes (Qiu & Martin 2014)

Course Overview

Welcome to BioMedical Genomics, a computer workshop for advanced undergraduates and graduate students. A genome is the total genetic content of an organism. Driven by breakthroughs such as the decoding of the first human genome and next-generation DNA -sequencing technologies, biomedical sciences are undergoing a rapid and irreversible transformation into a highly data-intensive field.

Genome information is revolutionizing virtually all aspects of life sciences including basic research, medicine, and agriculture. Meanwhile, use of genomic data requires life scientists to be familiar with concepts and skills in biology, computer science, as well as data analysis.

This workshop is designed to introduce computational analysis of genomic data through hands-on computational exercises, using published studies.

The pre-requisites of the course are college-level courses in molecular biology, cell biology, and genetics. Introductory courses in computer programming and statistics are preferred but not strictly required.

Learning goals

By the end of this course successful students will be able to:

Describe next-generation sequencing (NGS) technologies & contrast it with traditional Sanger sequencing
Explain applications of NGS technology including pathogen genomics, cancer genomics, human genomic variation, transcriptomics, meta-genomics, epi-genomics, and microbiome.
Visualize and explore genomics data using RStudio
Replicate key results using a raw data set produced by a primary research paper

Web Links

Install R base: https://cloud.r-project.org
Install R Studio (Desktop version): http://www.rstudio.com/download
Download: R datasets
A reference book: R for Data Science (Wickharm & Grolemund)

Quizzes and Exams

Student performance will be evaluated by attendance, three (4) quizzes and a final report:

Attendance: 50 pts
Assignments: 5 x 10 = 50 pts
Open-book Quizzes: 2 x 25 pts = 50 pts
Take-home Mid-term: 50 pts
Final presentation: 50 pts

Total: 250 pts

Course Schedule

Date & Hour Tutorials & Lectures Assignment Quiz & Exam

July 8 (Mon), 8:40-12:10

Introduction; R Tutorial I;

File:R-part-1-small.pdf

Lecture slides

Assignment #1 (create a WORD document including scripts & graphs (i.e., compile your work into a lab report, due tomorrow)

Install R/R studio and the "tidyverse" package on your own computer
Recreate Script 1 & Mini-Practical
Show help page for function "seq"
Download dataset
- Create a new folder (e.g., Desktop/rtutor)
- Create a sub-folder (e.g., Desktop/rtutor/data/)
- Download from http://www.r4all.org/the-book/datasets
- Save to the sub-folder
- Unzip the file

July 9 (Tu), 8:40-12:10

R Tutorial II,

File:R-part-2.pdf

Lecture slides

Assignment #2

The following is a portion of the dataset of Mycobacterium growth (kindly shared by Aswad from Dr Xie's lab). It shows OD (optical density) values. Transform this table ("wide" format) into the "tall/tidy" format (use paper & pen, no need to use R studio or any computer program):

Hour	Control	Gene	Control.with.Arg	Gene.with.Arg
0	0.06	0.022	0.031	0.01
4	0.087	0.102	0.082	0.081
8	0.113	0.185	0.086	0.135

In R studio, read the dataset from the file "FlowerColourVisits.csv" and save it into an object named as "flower"
- Show head, tail, dimension of the data frame "flower"
- Show data summary with "summary" & "glimpse" commands. Which column is a categorical data type?
- Select the column named "colour"
- Select rows from the 3rd to the 20th
- Select the 3rd, 10th, and 20th rows
- Select only the rows that have the colour of "red" (hint use colour=="red"
- Create a new column, named "logVisit", that is log(1+number.of.visit)
- Sort the "flower" data by the column "number.of.visit"
- Perform the following data transformation using the chaining operator (i.e., "%>%"): Select rows from the 3rd to the 20th, then filter by colour of "red", and then show head
- Obtain the mean number of visit for each colour as a group (Hint: use "group_by" & "summarise")

July 10 (Wed), 8:40-12:10

R Tutorial III

File:R-part-3.pdf

Lecture slides

Assignment #3

Task	Graph
Use the "iris" dataset to reproduce the plot shown at right (Hint: load data with `data(iris)`)
Use the "flower" dataset (see Assignment #2 on how to load data) to reproduce the plot shown at right

Quiz I

July 11 (Thur), 8:40-12:10

Intro to NGS;
File:Intro-and-NGS.pdf
R Tutorial IV;
File:R-part-4.pdf
Lecture slides

Take-home mid-term (50 pts):

File:Mid-term-Chong-Qing.pdf

July 12 - 14 (Fri, Sat & Sun) (Weekend break; No class)

July 15 (Mon), 8:30-12:10

Case Study 1. Fish microbiome

Dataset:
File:Trout.txt
Slides for R Tutorial Part V:
File:R-part-5.pdf
Lecture slides
Slide for trout microbiome:
File:Case-1-Trout-Microbiome.pdf
Lecture slides

R code for today's lecture

# Case Study 1. Trout microbiome
# Date: Monday, July 15, 2019
# Author: Weigang Qiu

library(tidyverse)
# Load data
setwd("C:/Users/lai/Dropbox/Courses/ChongQing-2019/")

trout <- read_csv("Trout.txt")
glimpse(trout)

# Exercise 1. Transform into a long table
trout.long <- gather(trout, 2:29, key = "sample", value = "read.cts") 

# Exercise 2. filter out phyla < 1%
trout.ph.cts <- trout.long %>% group_by(phylum) %>% summarise(phy.sum = sum(read.cts)) # counts in each phylum
trout.ph.perc <- trout.ph.cts %>% mutate(ph.perc = phy.sum/sum(phy.sum) * 100) # get percentages
trout.ph.hi <- trout.ph.perc %>% filter(ph.perc > 1) # select phyla > 1%

# The above could be combined using pipes
# trout.ph.hi <- trout.long %>% group_by(phylum) %>% summarise(cts = sum(read.cts)) %>% mutate(perc = cts/sum(cts) * 100) %>% filter(perc >= 1)

# Exercise 3. get phylum counts in each sample
trout.ph <- trout.long %>% filter(phylum %in% trout.ph.hi$phylum) # select only the hi-frequency phyla
trout.ph <- trout.ph %>% group_by(sample, phylum) %>% summarise(total.cts = sum(read.cts)) # counts in each sample
trout.ph <- trout.ph %>% mutate(per.cts = total.cts/sum(total.cts) * 100) # add perc
trout.ph %>% group_by(sample) %>% summarise(sum(per.cts)) # check

# Exercise 4. plot by sample
ggplot(data = trout.ph, aes(x=sample, y=per.cts, fill=phylum)) + geom_bar(stat = 'identity')

# Exercise 5. group by diet
trout.ph <- trout.ph %>% mutate(diet = str_remove(sample, "_[1234]")) # add a new column "diet" use regular expression
trout.diet <- trout.ph %>% group_by(diet, phylum) %>% summarise(cts.by.diet = sum(total.cts)) %>% mutate(per.by.diet = cts.by.diet/sum(cts.by.diet)*100)
trout.diet %>% group_by(diet) %>% summarise(sum(per.by.diet)) # check
ggplot(data = trout.diet, aes(x=diet, y=per.by.diet, fill=phylum)) + geom_bar(stat = 'identity') # per, stacked

Assignment #4 (finalized at 8:25pm)

Define microbiome
Explain how 16S ribosomal RNA read counts are used to quantify bacterial composition
Run the above code (or your own code) & compose a figure legend explaining the following final graph (explain x-axis, y-axis, and what colors represent). Which diet is most similar to the control diet ("E")?

July 16 (Tu), 8:30-12:10

Case Study 2. Transcriptome

Dataset:
File:Cancer-array.txt

Today's code:

# Calculate alpha diversity (at species level)
trout <- read_csv("Trout.txt")
trout.long <- gather(trout, 2:29, key = "sample", value = "read.cts") # tranform into a long table
trout.long <- trout.long %>% mutate(diet = str_remove(sample, "_[1234]")) # add diet group variable
trout.sp <- trout.long %>% group_by(diet, species) %>% summarise(total.cts = sum(read.cts)) # count species reads in each diet
trout.sp2 <- trout.sp %>% filter(species != 's__') %>% filter(total.cts > 0) # remove species with un-specified & zeros reads
trout.sp2 %>% group_by(diet) %>% count() # count num. species per diet
trout.sp2 <- trout.sp2 %>% group_by(diet) %>% mutate(frq = total.cts/(sum(total.cts))) %>% mutate(log.frq = log2(frq)) # add columns for frequency and its log2()
trout.sp2 %>% group_by(diet) %>% summarise(-sum(frq * log.frq)) # shannon = -sum(p*log2(p)) # Shannon diversity for each diet

# ANOVA to compare two phyla: Fusobacteria and Bacteroidetes
trout <- read_csv("Trout.txt")
trout.long <- gather(trout, 2:29, key = "sample", value = "read.cts") # tranform into a long table
trout.ph <- trout.long %>% group_by(sample, phylum) %>% summarise(cts = sum(read.cts)) # count reads for each phylum in each sample
trout.ph <- trout.ph %>% group_by(sample) %>% mutate(perc = cts/sum(cts) *100) # calculate percentages
trout.fuso <- trout.ph %>% filter(phylum == 'p__Fusobacteria') # select rows for a phylum
trout.fuso <- trout.fuso %>% mutate(diet = str_remove(sample, "_[1234]")) # add diet 
ggplot(data = trout.fuso, aes(x=diet, y=perc, color=diet)) +  geom_point(size=3, alpha=0.5) + theme_bw() # plot
lm.fuso <- lm(data = trout.fuso, perc ~ diet) # run anova model
summary(lm.fuso) # show difference with reference ("dietA") and p values
anova(lm.fuso) # show overall signficance

mean.fuso <- trout.fuso %>% group_by(diet) %>% summarise(mean.prc = mean(perc)) # calculate mean percentages for each diet
ggplot(data = trout.fuso, aes(x=diet, y=perc, color=diet)) +  geom_point(size=3, alpha=0.5) + geom_point(data = mean.fuso, aes(x=diet, y=mean.prc), shape = 10, size=6) + theme_bw() # add mean values

Assignment #5 (finalized @5:00pm)

Run ANOVA to test if the percentages for the phylum "Bacteroidetes" are significantly different among the 7 diets.
Show graph with mean percentages values

July 17 (Wed), 8:30-12:10 Case Study 3. Lyme Disease Quiz II

July 18 (Thur), 8:30-12:10

Final presentations (4 slides, 5 minute)

Slide 1. (1 min; 5 pts). Introduction: background, question, & signficance
Slide 2. (1 min; 10 pts). Material & Methods: sample size, replicates, control, sequencing technology, software tools, statistical analysis
Slide 3. (2 min; 15 pts). Results: a graph: title, legends, caption, main R commands
Slide 4. (1 min; 5 pts). Discussion, conclusion & questions
Slide style (10 pts). Use more figures, less words; show bullets & outlines, not complete sentences
Presentation style (5 pts). Speak loudly, slowly, and clearly. Do not read from slides.

Papers & Datasets

Omics Application	Paper link	Data set	NGS Technology
Microbiome	Rimoldi_etal_2018_PlosOne	S1 Dataset File:Trout.txt	16S rDNA amplicon sequencing
Transcriptome	Wang_etal_2015_Science	Tables S2 & S4	RNA-Seq
Transcriptome & Regulome	Nava_etal_2019_BMCGenomics	Tables S2 & S3	RNA-Seq & CHIP-Seq
Proteome	Qiu_etal_2017_NPJ	(to be posted)	SILAC
Population genomics (Lyme)	Di_etal_2018_JCM	Data & R codes	Amplicon sequencing (antigen locus)
Population genomics/GWAS (Human)	Simonti_etal_2016_Science	Table S2	whole-genome sequencing (WGS); 1000 Genome Project (IGSR)
TB surveillance	Brow_etal_2015	Sequence Archives	Whole-genome sequencing (WGS)

@@ Line 235: / Line 235: @@
 |-
 | TB surveillance || [https://jcm.asm.org/content/53/7/2230 Brow_etal_2015]  || [https://www.ebi.ac.uk/ena/data/view/PRJEB9206 Sequence Archives]|| Whole-genome sequencing (WGS)
-|-
-| Example || Example || Example || Example
-|-
-| Example || Example || Example || Example
-|-
-| Example || Example || Example || Example
 |}

Southwest-University: Difference between revisions

Revision as of 08:52, 16 July 2019

Contents

Course Overview

Learning goals

Web Links

Quizzes and Exams

Course Schedule

Papers & Datasets

Navigation menu

Southwest-University: Difference between revisions

Revision as of 08:52, 16 July 2019

Course Overview

Learning goals

Web Links

Quizzes and Exams

Course Schedule

Papers & Datasets

Navigation menu

Search