Lab Orientation

Session 1

  1. Pre-orientation: Obtain lab accounts (Yozen)
  2. Day 1. 12:00 - 12:30. Lab overview
  3. Day 1. 12:30 - 1:30. Unix Part 1 (Weigang)
  4. Day 1. 1:30 - 2:30. Lunch break
  5. Day 1. 2:30 - 3:00. BorreliaBase.org (Lia). Slides: File:BorreliaBase-intro.pptx
  6. Day 1. 3:00 - 4:00. bp-Wrapper (Saymon): tutorials
  7. Day 1. 4:00 - 4:30. Servers & cluster usage (Tutorial)

Session 2

  1. Day 2. 12:00 - 12:30. Phylogenetics/Tree Quizzes (Weigang)
  2. Day 2. 12:30 - 1:30. R. Download the data set from http://diverge.hunter.cuny.edu/~weigang/data-sets-for-biostat/intern_data.csv2 and save it as "rna_seq.csv"
  3. Day 2. 1:30 - 2:30. Lunch break
  4. Day 2. 2:30 - 3:30. SQL & SQL-embedded Perl or Python
  5. Day 2. 3:30 - 4:30. Unix Part 2
  6. Day 2. 4:30 - 5:00. Lab Databases: bb3-dev, pa2, genome_var (Weigang)
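
As a preview of the "SQL-embedded Python" session above, here is a minimal sketch of running a parameterized SQL query from a Python script. It is written under stated assumptions: the standard-library sqlite3 module and an in-memory database stand in for the lab's PostgreSQL servers so that the example is self-contained, and the table "reads", its columns, and its rows are hypothetical and not taken from any lab database.

```python
import sqlite3

# Toy in-memory database; the table name, columns, and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reads (sample TEXT, coverage INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO reads VALUES (?, ?, ?)",
    [("s1", 10, "f"), ("s1", 5, "f"), ("s2", 7, "f"), ("s2", 99, "t")],
)


def coverage_by_sample(connection, status):
    """Return (sample, total coverage) pairs for rows with the given status."""
    query = (
        "SELECT sample, SUM(coverage) AS total "
        "FROM reads WHERE status = ? "
        "GROUP BY sample ORDER BY total ASC"
    )
    # The ? placeholder passes the value separately from the SQL text,
    # so the driver handles quoting instead of string concatenation.
    return connection.execute(query, (status,)).fetchall()


totals = coverage_by_sample(conn, "f")
for sample, total in totals:
    print(sample, total)
```

The same pattern (connect, execute with placeholders, fetch rows) carries over to PostgreSQL drivers, although details differ; psycopg2, for instance, uses %s rather than ? as its placeholder.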

Assignments

Q1 & Q2 are due 1 pm, Wed, June 1st, 2016; the rest are due noon, Monday, June 7th, 2016.
  1. Log in to your lab account (first to "darwin.hunter.cuny.edu", then to "wallace") and change your password (email me [weigang@genectr.hunter.cuny.edu] if you have trouble logging in)
  2. Unix exercises: U10.1, U14.1, U16.1, U18.1, U27.1 (with emacs or vi), U29.1 & U29.2 (with emacs or vi)
  3. BorreliaBase exercises:
    1. Download B31 genome, ORF, and protein sequences
    2. Download ospA ortholog alignments (nucleotide & protein)
    3. Download pf32 paralog alignments
    4. Use BLAST to identify which gene(s) in the B31 genome contain this DNA sequence: "caagattaatattattgcaatgatattaactttaatttgcacctcatgcgcaccttttagcaaaatcgatcctaaagcaaatgcaaacactaagccaaaaaaaatcaccaatccgggggaaaacacccaaaattttgaagataaatctggagaccttagcacttctgatgaaaaaattatggaaactatcgcttcaga"
    5. Use BLAST to identify which gene(s) in the B31 proteome contain this amino-acid sequence: "MGINSTSFYSLNMKVKPLDNVKVRKALSFAIDRKTLTESVLN"
  4. Use bioseq to answer all the questions below. Submit only the command that you used to find the answers.
    1. Use accession # CP002316.1 to retrieve the genbank file from NCBI. Save the output to CP002316.1.gb file.
    2. Extract the sequences in FASTA format from file CP002316.1.gb and save the output to CP002316.1.fas file. Use the file CP002316.1.fas to answer the following questions.
    3. Count the number of sequences in the file.
    4. Using a single command, pick the first 10 sequences from the file and find their lengths. (Hint: use a pipe)
    5. Using a single command, pick the third and seventh sequences from the file and then do the 3-frame translation for both sequences. Which reading frame is correct? Specify.
    6. Using a single command, get the first 100 nucleotides of all the sequences present in the file and then do 1-frame translation for all the sub-sequences. (Hint: look in the bioseq help page for the options that extract a subsequence and do 1-frame translation. Use a pipe)
  5. Use bioaln for the following exercises. Go to /home/shared/lab_tutorial and find the sequence alignment file named “ospC.aln”. Name the format of the alignment file. Use it to answer all the questions below. Submit only the command that you used to find the answers.
    1. Find the length of the alignment.
    2. Count the number of the sequences present in the alignment.
    3. How do you convert this alignment to PHYLIP format? Save your output.
    4. Pick “B31, N40, BOL26, JD1” from the alignment and calculate their average percent identity. (Hint: look in the bioaln help page for the options that pick specific sequences and calculate average percent identity. Use a pipe)
    5. Extract third sites from the alignment and show the alignment in match view. (Hint: look in the bioaln help page for the option that extracts third sites. Use a pipe)
    6. Remove the gaps from the alignment and show the final alignment in codon view. (Hint: look in the bioaln help page for the option that removes gaps)
  6. SQL exercises:
    1. Log in to the borreliabase.org database by typing: psql -h borreliabase.org -U lab -d genome_var
    2. Write down the command that retrieves each item listed below (don’t forget that each command should end with a “;”):
    3. Select all columns in the table “varlist” and show the first 10 rows
    4. From table “varlist”, select values stored in the columns “acc”, “refcodon”, “altcodon”, “protein_accession”
    5. In the “varlist” table, select all columns where the “proj_id” value is “1” and count the selected rows
    6. Select those whose “conf” value is greater than “90” and arrange your selection in an ascending order
    7. For table “var”, write an expression that outputs the sum of the values in column “coverage” grouped by the values in column “genome_id”, limited to rows where “status” is ‘f’, and arrange your selection in ascending order
    8. From table “genome”, select values in column “genus”; from table “var”, select values in columns “var_id”, “status”, “conf”; from table “varlist”, select values in columns “acc”, “refaa”, “altaa”; then join your selections together. Which columns are the keys when you join the tables?
  7. Tree Quizzes File:Pretest.pdf
  8. A scripting exercise: Write a Perl or Python script to export SNPs
  9. An R exercise in statistical analysis: Gene expression analysis using the cancer data
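
For cross-checking the bioseq answers in item 4, the operations involved (counting records, measuring lengths, taking the first 100 nucleotides) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a substitute for the bioseq commands you are asked to submit; the function name read_fasta and the two toy records are made up for the example.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a list of (id, sequence) tuples."""
    records = []
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


# Made-up records; in practice you would pass an open file handle instead.
toy_fasta = """>seq1 made-up record
ATGAAA
TTT
>seq2 made-up record
ATGCC
"""

records = read_fasta(toy_fasta.splitlines())
num_seqs = len(records)                                        # number of sequences
lengths = [(seq_id, len(seq)) for seq_id, seq in records[:10]]  # first 10 lengths
prefixes = [(seq_id, seq[:100]) for seq_id, seq in records]     # first 100 nt each

print(num_seqs)
print(lengths)
```

To run it on a real file, replace the toy string with something like `with open("CP002316.1.fas") as fh: records = read_fasta(fh)`.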