Revision as of 18:24, 1 February 2021

Spring 2021

Participants

Project 1. Covid mutation analysis

# Shared by Afsana
# UNIX commands on the COVID-19 genome mutation set: count synonymous and non-synonymous (missense) mutations for each gene
# Column 3 of cov-snps.tsv is the gene name; uniq -c prints "count gene" for each gene
grep synonymous cov-snps.tsv | cut -f3 | sort | uniq -c > count_synonymous_covid
grep missense cov-snps.tsv | cut -f3 | sort | uniq -c > count_missense_covid
# Combine the two tables side by side: paste joins line by line, tr squeezes repeated spaces,
# sed removes the leading space, and cut keeps "missense_count gene synonymous_count"
# (assumes both files list the same genes in the same order)
paste -d ' ' count_missense_covid count_synonymous_covid | tr -s ' ' | sed "s/ //" | cut -f 1-3 -d ' ' > count_mutations_covid.tsv
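Because `paste` pairs rows by line position, the combined table is only correct when both count files list exactly the same genes in the same order; a gene with missense but no synonymous mutations (or the reverse) shifts every row after it. A minimal Python sketch that joins on the gene name instead, assuming, as the pipeline does, that column 3 of cov-snps.tsv holds the gene name and that a line counts as missense or synonymous if that word appears anywhere in it (the same rule `grep` applies). The function name `count_mutations` is hypothetical.

```python
from collections import Counter


def count_mutations(lines):
    """Return {gene: (missense, synonymous)} from tab-separated SNP lines.

    Mirrors the grep/cut pipeline: the gene is column 3, and a line is counted
    as missense or synonymous if that word appears anywhere in the line.
    """
    missense, synonymous = Counter(), Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip short or malformed lines
        gene = fields[2]
        if "missense" in line:
            missense[gene] += 1
        if "synonymous" in line:
            synonymous[gene] += 1
    genes = sorted(set(missense) | set(synonymous))
    # Counter returns 0 for missing keys, so a gene absent from one category gets 0
    return {gene: (missense[gene], synonymous[gene]) for gene in genes}
```

For example, `with open("cov-snps.tsv") as fh: counts = count_mutations(fh)` gives one `(missense, synonymous)` pair per gene, with 0 filled in where a gene has no mutations of one kind.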

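# Shared by Roman
# Compare synonymous vs. non-synonymous (missense) counts per gene
library(tidyverse)
library(ggrepel)
# Assumed input: the space-delimited count_mutations_covid.tsv from the pipeline above
# (no header; columns are missense count, gene, synonymous count)
covid <- read_delim("count_mutations_covid.tsv", delim = " ", col_names = c("Missense", "Gene", "Synonymous"))
ggplot(data = covid, aes(x = Missense, y = Synonymous, label = Gene)) +
  geom_point() +
  geom_text_repel() +
  geom_smooth(method = "lm")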
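`geom_smooth(method = "lm")` overlays an ordinary least-squares regression of the y variable on the x variable, here Synonymous on Missense. To see the slope and intercept behind that line as numbers, a minimal Python sketch of the same closed-form fit follows (the `fit_line` name is hypothetical; in R itself, `lm(Synonymous ~ Missense, data = covid)` reports the same coefficients).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired (x, y) points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```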

Fall 2020

Participants

Eamen Ho: Volunteer research assistant
Ramandeep Singh: BIOL 48002
Desiree Pante: BIOL 48001
Afsana Rahman: Volunteer research assistant
Roman Shimonov: BIOL 48002
Justin Hiraldo: BIOL 48002
Zaheen Hossain: Volunteer research assistant
Jerry Sebastian: Volunteer research assistant
Ariel Cebelinski: Volunteer research assistant

Schedule

Tuesdays, 12 noon to 2 pm, via Zoom
Sept 1, 2020. Week 1. Meet & Greet; Intro to projects
Sept 8, 2020. Week 2. Presentations (background, data, and methods), based on assigned readings

Project 1. Structure & evolution of multipartite genome of Lyme disease bacteria

Participants: Desiree & Ramon (Summer 2020), Jerry
Readings
- Review: diCenzo & Finan (2017).
Data set: lp54 & cp26 plasmids
TO DO:
- Week 1. 9/8/2020, 12 noon: 5-slide presentation on multipartite bacterial genome evolution (based on the paper above)
- Week 2. 9/15, 12 noon: Use the program codonO to calculate codon bias (SCUO) for the replicons (n=23) of the Borrelia burgdorferi B31 genome
- Week 3. 9/22, 12 noon: codonO paper presentation (Jerry)
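SCUO (synonymous codon usage order) measures how unevenly synonymous codons are used: for each amino acid with more than one codon it compares the Shannon entropy of that amino acid's codon usage with the maximum possible entropy, then averages these per-amino-acid scores weighted by how often each amino acid occurs. It runs from 0 (all synonymous codons used equally) to 1 (exactly one codon used per amino acid). A minimal Python sketch of that definition, assuming the standard genetic code and a single in-frame coding sequence; it is meant for checking intuition, not as a replacement for codonO, and how codonO pools codons across a whole replicon is not covered here.

```python
import math
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order at each position ("*" = stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Group synonymous codons by amino acid; keep only amino acids with 2+ codons (drops Met, Trp, stops)
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)
DEGENERATE = {aa: cods for aa, cods in SYNONYMS.items() if aa != "*" and len(cods) > 1}


def scuo(cds):
    """Synonymous codon usage order (SCUO) of one in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    weighted, total = 0.0, 0
    for codons in DEGENERATE.values():
        n_aa = sum(counts[c] for c in codons)
        if n_aa == 0:
            continue  # amino acid absent from this sequence: no contribution
        # Shannon entropy of codon usage for this amino acid
        h = -sum((counts[c] / n_aa) * math.log(counts[c] / n_aa) for c in codons if counts[c])
        h_max = math.log(len(codons))  # entropy if all synonymous codons were used equally
        weighted += n_aa * (h_max - h) / h_max  # normalized order, weighted by amino-acid count
        total += n_aa
    return weighted / total if total else float("nan")
```

For example, a sequence that always uses GCT for alanine scores 1, while one that uses GCT, GCC, GCA and GCG equally scores 0.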

Project 2. OspC Cross-reactivity analysis

Participants: Justin, Roman
Readings: Ivanova et al (2009)
Tool: ImageJ
Data set (to be sent)
To Do
- Week 1. 9/8/2020 12 noon: 5-slide presentation on background, materials & methods, and data capture using ImageJ
- Week 2. 9/15: Create an Excel sheet to capture immunoblot intensities for C3H mice & P. leucopus. Capture the background for each serum. Get ready to make plots in R/RStudio
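One way to use the per-serum background is to subtract it from every band intensity measured with that serum before plotting. A minimal Python sketch, assuming intensities are stored per (serum, antigen) pair; the dictionary layout, the serum and antigen labels, and the choice to clip negative values to 0 are all assumptions for illustration, not the project's protocol.

```python
def subtract_background(intensities, background):
    """Background-correct immunoblot band intensities, one background value per serum.

    intensities: {(serum, antigen): raw intensity}   (hypothetical layout)
    background:  {serum: background intensity for that serum}
    Values that fall below background are clipped to 0 (an assumed convention).
    """
    corrected = {}
    for (serum, antigen), raw in intensities.items():
        corrected[(serum, antigen)] = max(raw - background[serum], 0.0)
    return corrected
```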

Project 3. Clostridium transcriptome analysis

Participants: Eaman, Zaheen
Readings
Data set: posted on "genometracker.org"
- Wild-type transcriptome at 12 hours, paired-end read files:
- /home/azureuser/18134XR-29-01_S0_L001_R1_001.fastq.gz
- /home/azureuser/18134XR-29-01_S0_L001_R2_001.fastq.gz
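FastQC's per-base sequence quality plot summarizes the Phred quality score at each position along the reads. A minimal Python sketch of that single statistic (mean quality per position), for a quick look at the paired-end files above without FastQC; it assumes 4-line FASTQ records and Phred+33 quality encoding, and the function name is hypothetical.

```python
import gzip


def mean_quality_by_position(path, max_reads=None, offset=33):
    """Mean Phred quality score at each read position of a FASTQ (.fastq or .fastq.gz) file.

    Assumes 4-line FASTQ records and Phred+33 encoding (offset=33).
    """
    opener = gzip.open if path.endswith(".gz") else open
    totals, counts = [], []
    with opener(path, "rt") as fh:
        for line_no, line in enumerate(fh):
            if line_no % 4 != 3:
                continue  # only the 4th line of each record holds quality characters
            if max_reads is not None and line_no // 4 >= max_reads:
                break
            for pos, char in enumerate(line.rstrip("\r\n")):
                if pos == len(totals):  # first read long enough to reach this position
                    totals.append(0)
                    counts.append(0)
                totals[pos] += ord(char) - offset
                counts[pos] += 1
    return [t / n for t, n in zip(totals, counts)]
```

For example, `mean_quality_by_position("/home/azureuser/18134XR-29-01_S0_L001_R1_001.fastq.gz", max_reads=100000)` summarizes only the first 100,000 reads of the R1 file.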
To Do
- Week 1. 9/8/2020 12 noon:
  - A short presentation on the C. diff transcriptome (one of the 2 papers above)
  - Demo on read quality using FastQC and on mapping reads to reference genomes with bowtie
- Week 2. Use HTSeq to quantify RNA abundance for C. diff genes.
  - HTSeq installed
  - Try this protocol first
- Commands

According to: reference; Bowtie website

bioseq -i 'genbank' R20291.gb > ref.fa # convert the GenBank file to FASTA
bowtie2-build ref.fa index # build the bowtie2 index
# -S: write SAM output to this file (without -S, bowtie2 writes SAM to standard output)
bowtie2 -x index -S 18134XR-29-01.sam -1 ../18134XR-29-01_S0_L001_R1_001.fastq.gz -2 ../18134XR-29-01_S0_L001_R2_001.fastq.gz
# ref.gff3: first rename the sequence ID with sed "s/Chromosome/FN545816/" so it matches the reference name in the SAM file
# need to use "-i Parent"; the default ID attribute is "gene_id"
conda activate qiulab # switch to the environment that provides htseq-count
htseq-count -m union --stranded=yes -i Parent 18134XR-29-01.sam ~/xingmin-cdiff/ref.gff3 > 18134XR-29-01.counts
samtools view -b 18134XR-29-01.sam -o 18134XR-29-01.bam # convert the SAM file to compressed BAM
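htseq-count writes one tab-separated line per feature, with the count in the last column, followed by special counters whose names start with `__` (such as `__no_feature` and `__ambiguous`). A minimal Python sketch that separates the two and scales the feature counts to counts per million assigned reads; CPM is only one possible normalization and ignores gene length, and the function names are hypothetical.

```python
def read_htseq_counts(path):
    """Split htseq-count output into per-feature counts and the '__' summary counters."""
    features, summary = {}, {}
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split("\t")
            name, count = fields[0], int(fields[-1])  # feature ID first, count last
            if name.startswith("__"):
                summary[name] = count
            else:
                features[name] = count
    return features, summary


def counts_per_million(features):
    """Scale raw counts to counts per million reads assigned to features (no length normalization)."""
    total = sum(features.values())
    if total == 0:
        return {}
    return {name: count * 1e6 / total for name, count in features.items()}
```

For example, `features, summary = read_htseq_counts("18134XR-29-01.counts")` followed by `counts_per_million(features)`; `summary` shows how many reads were not assigned to any feature.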

Project 4. Protein classification using natural language processing

Participants: Afsana & Ariel
Goal: Classify protein sequences
Week 1. 9/8/2020 Readings:
- Rives et al (2019)
- Lan et al (2019)
Week 2. Find and explore ALBERT resources & tutorials
Code from Hansaim Lim
Transformer: Pretrained models in natural language processing
DNABERT paper

DNABERT: GitHub code

Including ALBERT
Google ALBERT library: GitHub
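DNABERT turns a DNA sequence into overlapping k-mers (stride 1) and treats each k-mer as a token, with BERT-style special tokens such as `[CLS]`, `[SEP]` and `[MASK]` added around them. A minimal Python sketch of that tokenization and a toy vocabulary, useful for seeing what a model input looks like; the function names and the special-token order are assumptions, not the DNABERT code. For protein sequences, `k=1` gives one token per amino acid, closer to the per-residue tokens used by Rives et al. (2019).

```python
SPECIALS = ("[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]")  # assumed order of special tokens


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    if len(seq) < k:
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(sequences, k=3, specials=SPECIALS):
    """Map special tokens, then each new k-mer in order of appearance, to integer ids."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for seq in sequences:
        for kmer in kmer_tokens(seq, k):
            vocab.setdefault(kmer, len(vocab))
    return vocab


def encode(seq, vocab, k=3):
    """Convert a sequence to token ids wrapped in [CLS] ... [SEP]; unseen k-mers become [UNK]."""
    unk = vocab["[UNK]"]
    ids = [vocab.get(tok, unk) for tok in kmer_tokens(seq, k)]
    return [vocab["[CLS]"]] + ids + [vocab["[SEP]"]]
```

For example, `kmer_tokens("ACGTA", 3)` gives `["ACG", "CGT", "GTA"]`.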

Sample Biopython script:

#!/usr/bin/env python
# Print the ID and first three residues of each sequence in a FASTA file as JSON

import sys
import json
from Bio import SeqIO

alnFile = sys.argv[1]  # FASTA file, given as the first command-line argument
seqList = []  # one dict per sequence record
for record in SeqIO.parse(alnFile, "fasta"):
    seqList.append({"id": record.id,
                    "seq": str(record.seq[0:3])  # residues 1-3; str() converts the Seq object to a plain string
                    })

print(json.dumps(seqList))  # print the list in JSON format


Undergrad Research Experience: Difference between revisions
