BIOL200 2013 and Qiu Lab Meetings: Difference between pages

From QiuLab
==Projects & Goals==
* Borrelia population genomics: Recombination & Natural Selection (Published)
* Borrelia pan-genomics (Submitted)
* Positive and negative selection in Borrelia ORFs and IGS (In submission)
* Dr Bargonetti's project (Summer 2013)
* A population genomics pipeline using MUGSY-FastTree (Summer 2013): [[Population_Genomics_Course|Project page]]
* Borrelia Genome Database & Browser (Summer 2013): [[media:Web.png|Version 2 screen shot]]
* Pseudomonas population genomics (Summer 2013): [[Pseudomonas_population_genomics|Project page]]
* Hypothesis Testing: Do host-interacting genes show adaptive codon usage? (Summer 2013): [[Borrelia_codon_usage|Project page]]
* Phylogenomics browsing with JavaScript/jQuery, Ajax, and [http://www.jsphylosvg.com/ jsPhyloSVG]
* Frequency distribution of ospC types in wild tick populations (Fall 2013): [[strain_natural_frequency|Project page]]
----

==Lab meeting: June 13, 2013==
* Weigang: IGS paper submission should be done by Thursday.
* Che/Slav: Workshop update (Meeting at 3:30pm?)
* Che: SILAC project (Meeting at 4pm?)
* Zhenmao: Tick processing & paired-end Illumina sequencing
* Pedro: Updates on "ncbi-orf" table
* Girish: phyloSVG extension; QuBi video
* Saymon and Deidre: consensus start-codons
* Reeyes and Raymond: Pseudomonas DB; fleN alignment and phylogeny
* Valentyna: BLASTn results (4:30pm?)

==Lab meeting: May 23, 2013==
* <font color="red">May 24, Friday: End of School Year Party in the Park (we leave from Hunter @ 1:30pm)</font>
* Recommended reading of the week: [http://www.genetics.org/content/194/1/199.abstract Detecting Neanderthal genes using the D' homoplasy statistic]
* Weigang: IGS paper submission
* Che: Thesis update/SILAC project/Summer teaching
* Zhenmao: Manuscript update: Material & Methods; Results (Tables and Figures)
* Pedro: Catalyst web framework
* Girish: cp26 phylogenomic analysis
* Saymon and Deidre: consensus start-codons
----

==Lab meeting: May 16, 2013==
* Weigang: IGS paper submitted yet?
* Che: Thesis update. Chapter 3. Evolution of ospA/ospB gene family
* Pedro/Zhenmao: Can we wrap up the BLAST identification of ospC types?
* Girish: Fetch cp26 sequences from DB; Run MUGSY & FastTree
* Saymon/Deidre: Identification of consensus start-codon positions
* Pedro/Girish: orth_get/orth_igs website development. Catalyst. Implement graphics (genome map & phylogeny) query interface
* Raymond: start the Pseudomonas summer project
----

==Foundational Readings==
* Molecular phylogenetics
* Population genetics
* Genomics
* Systems Biology
----

==Informatics Architecture==
* Operating Systems: Linux OS/Ubuntu, Mac OS
* Programming languages: BASH, Perl/BioPerl, R
* Relational Databases: PostgreSQL
* Software architecture
** bb2: Borrelia Genome Database
** bb2i: a Perl API for bb2
** DNATweezer: Perl wrappers of most frequently used BioPerl modules, including Bio::Seq, Bio::SimpleAlign, and Bio::Tree [https://sourceforge.net/p/dnatwizzer/home/Home/]
** SimBac: A Perl/Moose package for simulating bacterial genome evolution [http://sourceforge.net/projects/bacsim/files/]
** Borrelia Ortholog Retriever: Download ortholog alignments from 23 Borrelia spp. genomes. Search by gene names and IDs. [http://borreliagenome.org/orth_get/]
* Hardware Setup
** NFS File Server
** Database and Application Server
** Web Server
** Linux Workstations
----

==Perl Challenges==
{| class="wikitable"
! Problem
! Input
! Output
|-
| DNA transcription
| A DNA sequence, in 5'-3' direction (e.g., aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat)
| An RNA sequence, in 5'-3' direction
|-
| Genetic code
| None
| 64 codons, one per line (using loops)
|-
| Random sequence 1
| None
| Generate a random DNA sequence (e.g., 1000 bases) with equal base frequencies
|-
| Random sequence 2
| None
| Generate a random DNA sequence with biased base frequencies, e.g., 10% G, 10% C, 40% T, and 40% A
|-
| Graphics I
| A categorical dataset, e.g., Biology
| A bar graph & a pie chart, using GD::Simple or PostScript::Simple
|}

'''EXPERIMENT # 4'''

'''BIOL 200 Cell Biology II LAB, Spring 2013'''

Hunter College of the City University of New York

==Course information==
'''Instructors:''' TBD

'''Class Hours:''' Room TBD HN; TBD

'''Office Hours:''' Room 830 HN; Thursdays 2-4pm or by appointment

'''Contact information:'''
* Dr. Weigang Qiu: weigang@genectr.hunter.cuny.edu, 1-212-772-5296

==Experiment #4==
===<span style="color: DodgerBlue;font-weight:bold;font-size:large;">The Tree of Life and Molecular Identification of Microorganisms</span>===
===Objective===
<span style="color: Crimson;font-weight:bold;">To classify microorganisms and determine their relatedness using molecular sequences.</span>

===LAB REPORT GRADING GUIDE===
CELL BIO II Experiment #4:
*'''Introduction'''<span style="font-weight:bold;color:OrangeRed;"> 1 point</span> ''':'''
*:A statement of the objectives or aims of the experiment in the student’s own words (not copied from the Lab Manual).
*'''MATERIALS AND METHODS'''<span style="font-weight:bold;color:OrangeRed;"> 0 points</span> ''':'''
*:This should be a brief synopsis and must include any changes or deviations from the procedures outlined in the Lab Manual. Specify which organisms were used to create the phylogram.
*'''RESULTS'''<span style="font-weight:bold;color:OrangeRed;"> 4 points</span> ''':'''
*:A printout of the phylogram will suffice.
*'''DISCUSSION'''<span style="font-weight:bold;color:OrangeRed;"> 4 points</span> ''':'''
*:Responses to the discussion questions.
*'''SUMMARY/CONCLUSION'''<span style="font-weight:bold;color:OrangeRed;"> 1 point</span> ''':'''
*:A two-sentence summary of your findings.
*'''REFERENCES'''<span style="font-weight:bold;color:OrangeRed;"> 1 point</span> ''':'''
*:Credit is given for pertinent references obtained from sources other than the Lab Manual. This point is in addition to the 10 points for the lab report.

===INTRODUCTION===
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Introduction
|- style="background-color:powderblue;"
| Evolution can be defined as descent with modification: changes in the nucleotide sequence of an organism’s genomic DNA are inherited by the next generation. Accordingly, all organisms are related through descent from a common ancestor that lived in the distant past. Since that time, about 4 billion years ago, life has undergone an extensive process of change as new kinds of organisms arose from kinds that existed before.<br /> The evolutionary history of a group is called a phylogeny, and it can be represented by a phylogram (Figure 1). A major goal of evolutionary analysis is to understand this history. We have no direct record of the path of evolution, because extinct organisms no longer exist for us to study; phylogeny must therefore be inferred indirectly. Originally, evolutionary analysis was based on the organisms’ morphology and metabolism. This is the basis of traditional classification schemes, from Linnaean taxonomy to the “Five Kingdoms” scheme. However, this approach can suggest mistaken relationships. Different species living in the same environment may evolve similar morphologies in response to the same environmental pressures; such similarities reflect shared surroundings rather than close relatedness. With the advent of genomics, organisms can instead be grouped by sequence relatedness. Since evolution is a process of inherited nucleotide change, analyzing DNA sequence differences allows a more reliable reconstruction of phylogenetic history.<br/>
|-
|[[File:TreeLife.png|thumb|center|alt= The Tree of Life.|Tree of life based on 16S ribosomal RNA (image credit: NR Pace, Science 1997).]]
#Display the absolute path of your home directory
#List the files in your home directory in long format, ordered by time stamp
#List the files in the "/data/yoda/b/student.accounts/bio425_2011/" directory from your home directory
#Copy the file "/data/yoda/b/student.accounts/bio425_2011/data/GBB.seq" into your home directory
#Count the number of lines in the file "GBB.seq"
#Show the first five lines of the file "GBB.seq" and save them to a file (any name will do)
#Show your last ten commands using "history"
|-style="background-color:powderblue;"
| '''Read''' Chapter 1
|}
 
===February 5===
*'''Chapter 1.''' Central Dogma & Wet Lab Tools [[Media:Molecular_Biology_and_Genomics.pdf|Lecture Slides Ch.1-Che]]
*'''Beginning Perl''' ([[Media:Bio425_beginning_perl.pdf|Beginning Perl, Part 1 Slides]])
*'''Homework:''' (This assignment '''will''' be graded.)
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #2
|- style="background-color:powderblue;"
| '''Before you begin...'''<br />
Do this ONLY ONCE: <pre>echo "source /data/yoda/b/student.accounts/bio425_2011/bio425.profile" >> ~/.bash_profile</pre>
Alternatively, open ~/.bash_profile in a text editor (ask me if you don't know how) and paste the following line at the end: <pre>source /data/yoda/b/student.accounts/bio425_2011/bio425.profile</pre>
|-style="background-color:powderblue;"
| '''Beginning Perl'''<br />
For the homework, read up to page 221 in Appendix 1. For February 26, read all of Appendix 1.
 
There are '''two choices''' for the homework. The first is recommended for novices. The second is for those who are either comfortable with Perl, or feel the need for a challenge this early. Only complete ONE of these assignments, as I will only accept one. Please follow the guidelines listed [[#Programming Assignment Expectations|above]].
 
# Copy the code from page 221 into a new file. (Remember to put the code from the slides at the beginning of the file and to declare all variables on first use!) You must alter the code so that the resulting program accomplishes the following four tasks:
##Instead of taking the average of 10 numbers, ask the user how many numbers to average and use that number instead. (Hint: see how the code asks for each number). This must be stored in a new variable.
##If the number the user gave was 0 or negative, print a message telling the user so, and exit immediately. You can exit using <pre>exit;</pre>
##The code always prints 'Enter another number:'. Change it so that on the '''first time only''' it instead prints 'Enter a number:'.
##Just before printing the average, print the message 'The numbers to average are: ', followed by all the numbers the user entered.
#More advanced programmers can try this assignment (you may wish to read all of Appendix 1 now): create a script which can take as input one or more DNA sequences from a file and translate directly to the correct amino acid sequence (single-letter format). You may implement this program in Perl however you wish, with as much complexity as you wish, as long as it meets the guidelines above and satisfies the following four criteria:
##The format of the input file it reads must be: one DNA sequence per line, so that each DNA sequence is separated by a new line character. '''Also assume you are given the coding strand.'''
##The name of the input file cannot be hard coded. You may either ask the user for the file location/name or take it as a command line argument.
##It must tolerate all upper-case, lower-case or mixed-case sequences in the input
##For every input DNA sequence, output the DNA sequence, the equivalent RNA, and the peptide sequence. The output '''must''' be informative, e.g.:
##:Input: atgcgtcgataa
##:Output: augcgucgauaa
##:Peptide: MRR*
#:Additionally, the program cannot use any outside dependencies/modules such as BioPerl (supposing you know how to use it.) Also note that STOP codons are denoted by a '<nowiki>*</nowiki>'
|-style="background-color:powderblue;"
| '''Problems'''<br />
(pg. 31-32): 1.2, 1.3, 1.5, 1.9, 1.10, 1.11
|}
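A minimal sketch of the advanced option, written in core Perl only (the `strict` and `warnings` pragmas are not outside modules). It assumes the standard genetic code and that each line is a coding strand read from its first base; the subroutine names `transcribe` and `translate` are illustrative, not required by the assignment. For the input `atgcgtcgataa` it produces the RNA `augcgucgauaa` and the peptide `MRR*`, matching the sample output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code as a codon => amino-acid hash. The 64-letter string
# lists amino acids with codons ordered U, C, A, G at each position, the
# first position varying slowest ('*' marks a stop codon).
my @bases = qw(U C A G);
my $aa    = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %code;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $code{"$b1$b2$b3"} = substr($aa, $i++, 1);
        }
    }
}

# Coding-strand DNA -> mRNA: lower-case everything, then swap t for u.
sub transcribe {
    my ($dna) = @_;
    my $rna = lc $dna;
    $rna =~ tr/t/u/;
    return $rna;
}

# mRNA -> peptide, reading codons from the first base. A trailing partial
# codon is ignored; a codon with any other character (e.g. N) becomes 'X'.
sub translate {
    my ($rna) = @_;
    my $peptide = '';
    for (my $p = 0; $p + 3 <= length $rna; $p += 3) {
        my $codon = uc substr($rna, $p, 3);
        $peptide .= exists $code{$codon} ? $code{$codon} : 'X';
    }
    return $peptide;
}

# Usage: perl translate.pl sequences.txt  (one DNA sequence per line)
if (@ARGV) {
    my $file = shift @ARGV;
    open my $in, '<', $file or die "Cannot open $file: $!\n";
    while (my $line = <$in>) {
        $line =~ s/\s+//g;    # removes the newline and any \r or spaces
        next unless length $line;
        my $rna = transcribe($line);
        print "Input: $line\nOutput: $rna\nPeptide: ", translate($rna), "\n\n";
    }
    close $in;
}
```

Building the codon table with three nested loops keeps it to one short string instead of 64 hand-typed hash entries, and mixed-case input is handled because everything is normalized before lookup.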
 
===February 12===
'''NO CLASS'''
 
(Read Chapter 6 for next class)
 
===February 19===
'''Yozen will not be lecturing'''
 
*Chapter 6. Gene and Genome Structures [[Media:Chapter_6.pdf|Lecture Slides Ch.6-Che]]
*'''Tutorial:''' ORF Prediction using GLIMMER
* '''Homework:''' This homework will be graded.
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #3
|- style="background-color:powderblue;"
| '''Bacterial gene identification using Glimmer'''<br />
Remember to first log in to mysql by doing: <pre>ssh mysql</pre>
#Copy the Lyme disease bacterium lp17 plasmid file "/data/yoda/b/student.accounts/bio425_2011/data/lp17.fas" into your home directory.
#Run long-orfs, extract, build-icm, and glimmer3.
#Show your commands and "cat" the final output.
#Describe key elements of a prokaryotic gene in addition to the open reading frame.
#Textbook Questions (pg. 152-153): 6.6, 6.9, 6.15
|-style="background-color:powderblue;"
| '''Read''' All of Appendix 1.
|}
 
===February 26===
*Appendix 1. More PERL ([[Media:Bio425_more_perl.pdf|Lecture Slides]])
*'''Homework'''
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #4
|-style="background-color:powderblue;"
| '''Beginning Perl'''<br />
This time, both novices and experienced programmers do the same homework, with one small difference in the use of the program.
 
Recall the FASTA format, introduced in the first class. In this format, sequence data is recorded as follows:
<pre>>SequenceID_info1_info2
atgcgtgatg...</pre>
 
Of course, the ID portion is itself not standardized, and the sequence can also be an amino acid sequence. For simplicity, let's assume that in the ID field, you have a "Strain" name followed by a "protein" name, separated by an underscore (_). You will write a program to read a FASTA file with the ID format described above, and a nucleotide sequence. For both novice-level and experienced level programmers, your program will:
 
# Pick out the strain name, the protein name, and the nucleotide sequence.
# Calculate the length of each sequence.
# Calculate the GC content (in percent) of each sequence.
# Calculate the percent composition of each nucleotide (base composition).
 
'''Novice-level task:'''
 
Your program will just print the above information '''for all sequences''', in a readable form. Sample output could be:
<pre>Strain: B31
Protein: ospA
Seq Length: 819
GC content: 33.58%
Base composition: A 42.98 %, T 23.44 %, C 14.77 %, G 18.80 %</pre>
If your percentages have more than 2 decimal places, '''that's OK.'''
 
'''Experienced-level task:'''
 
The only difference from the novice task is that your program will '''ask the user for the name of a strain and protein, separated by an underscore''' (e.g., B31_ospA). Once given that input, it will print exactly the same output as above, but only for the sequence with that ID. If the ID doesn't exist, it will say so and exit. Your program will '''continue to ask the user for a sequence ID''' until the user types 'quit' or gives an invalid ID. You can do this with a while loop.
 
'''Notes'''
 
Calculating the GC content and the base composition is easy if you make use of the tr (transliterate) function as described at the bottom of page 232, and divide the result by the sequence length. GC content is just the sum of total G and C nucleotides, divided by the sequence length. I do want '''percents''', so remember to multiply the results by 100 and to append a '%' at the end.
 
Getting the strain name and the protein name separately can be accomplished with the split() function (check new slides or search on the internet).
 
Test your program with the file /data/yoda/b/student.accounts/bio425_2011/data/Borrelia_osp.dna.fasta as input. You don't have to include the file itself with your homework, but do copy the program output and submit it with your assignment.
 
Again, the program cannot use any outside dependencies/modules such as BioPerl (supposing you know how to use it.) Besides that, you can implement it however you like. If you know about references, '''it is possible to do this assignment without using them.'''
|}
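The notes above (split the ID on the underscore, count bases with tr, divide by the length) can be sketched as follows. This is one possible approach in core Perl, assuming IDs of the form `Strain_protein` and allowing a sequence to span several lines; `parse_fasta` and `summarize` are hypothetical helper names. It covers the novice-level task; the experienced-level version would wrap the lookup in a while loop over user input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from a filehandle into [id, sequence] pairs. Sequence
# lines after a header are joined, so wrapped sequences are handled too.
sub parse_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;                # strip the newline (and any \r)
        next unless length $line;
        if ($line =~ /^>(\S+)/) {
            push @records, [$1, ''];
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# Summarize one record. split with a limit of 2 cuts the ID at the first
# underscore; tr/// with an empty replacement list returns a count and
# leaves the string unchanged.
sub summarize {
    my ($id, $seq) = @_;
    my ($strain, $protein) = split /_/, $id, 2;
    my $len = length $seq;
    my %count = (
        A => ($seq =~ tr/Aa//),
        T => ($seq =~ tr/Tt//),
        C => ($seq =~ tr/Cc//),
        G => ($seq =~ tr/Gg//),
    );
    my %pct;
    for my $base (keys %count) {
        $pct{$base} = $len ? 100 * $count{$base} / $len : 0;
    }
    return {
        strain  => $strain,
        protein => $protein,
        length  => $len,
        gc      => $pct{G} + $pct{C},
        pct     => \%pct,
    };
}

# Usage: perl fasta_stats.pl Borrelia_osp.dna.fasta
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    for my $rec (parse_fasta($in)) {
        my $s = summarize(@$rec);
        printf "Strain: %s\nProtein: %s\nSeq Length: %d\nGC content: %.2f%%\n",
            $s->{strain}, $s->{protein} // '', $s->{length}, $s->{gc};
        printf "Base composition: A %.2f %%, T %.2f %%, C %.2f %%, G %.2f %%\n\n",
            @{ $s->{pct} }{qw(A T C G)};
    }
    close $in;
}
```

Keeping parsing and summarizing in separate subroutines makes the experienced-level task a small change: store the records in a hash keyed by ID and call `summarize` only for the ID the user types.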
 
===March 5===
 
*Chapter 2. Data Search and Alignments [[Media:Chapter2.pdf|Lecture Slides Ch.2-Che]]
*Object-Oriented PERL & BioPerl (Link to [http://www.bioperl.org/wiki/Main_Page Bioperl] site and [http://www.bioperl.org/wiki/HOWTOs HOWTOs])
*'''Homework:'''
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #5
|-style="background-color:powderblue;"
| '''BioPerl Assignment'''
 
For this assignment, you will use the .predict file you made with glimmer in [[#February_19 | assignment 3]].
 
If connecting from home: open gedit '''before''' logging on to mysql.
 
For BioPerl to work, you '''must''' log on to mysql.
 
'''Complete the assignment by following these steps.''' Make sure each part works '''before''' trying to solve the next part:
# Make a perl script that reads each line from the .predict file that describes a gene (skip the heading line).
# Save each line ('''hint:''' array, anyone?)
# Now, in the same script, use '''Bio::SeqIO''' to read the lp17.fas file '''and get a Bio::Seq object.'''
# Go through each line saved from the .predict file. Remember: these are predicted orfs:
## For each of these, '''extract the start and stop positions and "strand" values''' (the three values following the orf name).
## If the strand starts with a '-', it means the orf is on the reverse complement, so you need to use the Bio::Seq method "revcom".
## Now, extract the orf sequence from the start & stop values using the Bio::Seq method "subseq", paying special attention to sequences on the '-' strand.
## Print both the DNA sequence AND the protein sequence.
 
See these sample scripts for how to use revcom and subseq:
<pre>../bio425_2011/sample-perl-scripts/revcom_translate_seq.pl
../bio425_2011/sample-perl-scripts/subseq.pl
</pre>
 
And I linked to the HOWTO above in case you forgot.
 
'''Output should be informative:'''
<pre>
ORF: orf00002
DNA: ...
Protein: ...
</pre>
|-style="background-color:powderblue;"
| '''Read'''
'''For next class, read CH 3'''
|}
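The assignment itself must use BioPerl's `Bio::SeqIO` and `Bio::Seq` (`subseq`, `revcom`). The sketch below uses core Perl (`substr`, `reverse`, `tr`) only to illustrate the coordinate handling those methods perform. It assumes each .predict line holds the ORF name, start, stop, a reading frame such as `+1` or `-3`, and a score, and that on the '-' strand the start coordinate is the larger one; ORFs that wrap around the end of a circular sequence are not handled. The helper names `revcom`, `subseq1` and `orf_from_predict` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse complement of a DNA string (what Bio::Seq's revcom does for DNA).
sub revcom {
    my ($dna) = @_;
    (my $rc = reverse $dna) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# 1-based, inclusive substring, mirroring the coordinate convention of
# Bio::Seq's subseq(start, end).
sub subseq1 {
    my ($seq, $from, $to) = @_;
    return substr($seq, $from - 1, $to - $from + 1);
}

# Extract one ORF from a genome string given a .predict line (assumed
# layout: name, start, stop, frame such as "+1" or "-3", score).
# For '-' frames the start coordinate is the larger one, so we take the
# forward-strand span between the two and reverse-complement it.
sub orf_from_predict {
    my ($genome, $line) = @_;
    my ($name, $start, $stop, $frame) = split ' ', $line;
    my ($lo, $hi) = $start < $stop ? ($start, $stop) : ($stop, $start);
    my $dna = subseq1($genome, $lo, $hi);
    $dna = revcom($dna) if $frame =~ /^-/;
    return ($name, $dna);
}
```

The key point carries over to the BioPerl version: always call `subseq` with the smaller coordinate first, and reverse-complement afterwards when the frame starts with '-'.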
 
===March 12===
*Chapter 3. Molecular Evolution [[Media:CH3.pdf|Lecture Slides Ch.3-Che]]
* '''Homework:''' (TBA)
 
===March 19===
*REVIEW Session for MID-TERM EXAMS
<!--*Assignment #7. '''(To be posted)'''
Questions & Problems (pg.54-55): 2.1, 2.2, 2.3, 2.4-->
 
===March 26===
*MID-TERM
<!--*Assignment #8. '''(To be posted)'''
Questions & Problems (pg.75-76): 3.1, 3.2, 3.3 (use first ten codons), 3.4, 3.5, 3.7-->
 
===April 2===
*'''Chapter 4.''' Phylogenetics I. Distance Methods  [[Media:CH4.pdf|Lecture Slides Ch.4-Che]]
*"Tree Thinking" Puzzles - ([http://diverge.hunter.cuny.edu/~weigang/lab-website/SummerWorkshop/Baum_etal05_sup_part1.pdf Download])
*'''Tutorial:''' PROTDIST and NEIGHBOR using [http://mobyle.pasteur.fr/cgi-bin/portal.py#welcome Mobyle Pasteur]
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #6
|-style="background-color:powderblue;"
| '''Chapter 4 ''' Questions & Problems (pg.95-96): 4.1, 4.3, 4.4, 4.7, 4.8
|}
|}
===April 9===
*'''Chapter 5.''' Phylogenetics II. Character-Based Methods  [[Media:CH4.pdf|Lecture Slides Ch.5-Che]]
*'''Tutorial:''' DNAML and bootstrap analysis using [http://mobyle.pasteur.fr/cgi-bin/portal.py#welcome Mobyle Pasteur]
<!--*Assignment #10. '''(To be posted)'''
Questions & Problems (pg.115-116): 5.1, 5.2, 5.3, 5.4-->
===April 16===
*'''Topic:''' Relational Database and SQL
*'''Tutorial:''' the Borrelia Genome Database
*'''Homework:''' SQL-embedded PERL
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #7
|- style="background-color:powderblue;"
| '''SQL-embedded PERL'''<br />
Continue work on the assignment we began in class. It is reproduced below, with some added functionality.
Your script will:
# Retrieve TEN orfs from the orf table that belong to the strain Pko.
# Find and store the sequences described by those orfs and their lengths.
# Determine if the orf is on the reference or reverse complement strand, and use that information to print the correct sequence.
# Print the orf name, sequence, and the length for each orf.
# '''In addition to printing the above information to the screen,''' write out the sequence information '''(in FASTA format)''' to a file called "Pko_orfs.fasta". The sequence ID should be of the form:
#:Pko_orfname
Note that the above will require the use of BioPerl.
For those looking for extra challenges, you can try adding the following:
* Ask the user for the strain and contig '''names''' that they want orfs from, and only retrieve those rows. This means you must find a way of obtaining their respective IDs from just their names. Make sure the sequence IDs are informative. They should look like this:
*:strainname_contigname_orfname
* If asking users for input, fail if they gave a strain or contig name which does not exist in the database.
* Also if asking users for input, the output file's name should be changed to reflect the chosen strain.
* Ask the user the minimum length the orf is allowed to be, and only print orfs as long, or longer, than what the user specifies.
Sample scripts will go up slowly, over time, including example SQL statements.
|-style="background-color:powderblue;"
| '''Questions from Text''' <br /> (pg.115-116): 5.1, 5.3
|}
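For reference, this is what the requested FASTA output looks like when written by hand in core Perl. It is a sketch only: the assignment asks you to use BioPerl (`Bio::SeqIO`) for this step. The 60-character line width and the helper names `fasta_record` and `write_fasta` are assumptions for illustration, and the ORF names passed in would come from your database query.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format one FASTA record: ">ID" header, then the sequence wrapped at
# $width characters per line (60 here, a common convention).
sub fasta_record {
    my ($id, $seq, $width) = @_;
    $width ||= 60;
    my $out = ">$id\n";
    for (my $i = 0; $i < length $seq; $i += $width) {
        $out .= substr($seq, $i, $width) . "\n";
    }
    return $out;
}

# Write strain-prefixed records (e.g. "Pko_orf00002") to a file.
# %orfs maps hypothetical ORF names to their sequences.
sub write_fasta {
    my ($file, $strain, %orfs) = @_;
    open my $out, '>', $file or die "Cannot write $file: $!\n";
    for my $name (sort keys %orfs) {
        print {$out} fasta_record("${strain}_$name", $orfs{$name});
    }
    close $out or die "Cannot close $file: $!\n";
}
```

Note the braces in `"${strain}_$name"`: without them Perl would look for a variable named `$strain_`.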
===April 23===
'''NO CLASSES''' (Spring recess)
===April 30===
*'''Topic:''' Statistics
*'''In-class exercise:''' [https://docs.google.com/document/d/1wq-s8WpqyURVeGiLUxhEyBvHRDrK__Cr7XjkuLicP-c/edit?hl=en&authkey=CJ2g4qsI R basics and short demonstration of a simple boxplot]
*'''Tutorial:''' Statistical Visualization using R  [[Media:R-implementations.pdf|Lecture Slides-Che]]
<!--*Assignment #12. '''(To be posted)'''
R Exercises-->
===May 7===
*'''Chapter 6''' (Gene Expression) & '''Chapter 8''' (Proteomics)
*'''Tutorial:''' Array Data Visualization and Analysis ([[Media:Array_Data_Visualization_and_Analysis.pdf| Micro-Array Analysis Slides]])
*'''Homework:''' Data Analysis using R
{| class="collapsible collapsed wikitable"
|- style="background-color:lightsteelblue;"
! Assignment #8
|-style="background-color:powderblue;"
| '''Part 1 Data Analysis:'''
For this assignment, you will use sample data to answer the question: '''Do men and women have different body temperatures?'''
The file '''temps.txt''', located in ../bio425_2011/data on eniac, contains body temperature data for a sample of adults.
Use a hypothesis test with α = 0.05 to answer the above question.
NOTE: For this part of the assignment you will need to turn in your answer to the question with p-values in addition to the R syntax used. '''Indicate your null hypothesis'''.
'''Part 2 Gene Expression Data Analysis:'''
Using the files '''GSM129276_cy3.txt''' & '''GSM129276_cy5.txt''' located in ./bio425_2011/data on eniac, conduct an analysis to produce a histogram of fold changes.
In addition to the histogram, you will need to turn in the R syntax used in every step of the analysis in R, along with an explanation as to why the step was necessary.
|-style="background-color:powderblue;"
| '''Read'''
'''For next class, read CH 7'''
|}
===May 14===
*'''Chapter 7.''' Protein Structure Prediction
<!--*Assignment #14 (Final Comprehensive Project). '''(To be posted)'''-->
===May 21===
*Final Project Due (TBA)
==Useful Links==
===Unix Tutorials===
*A very nice [http://www.ee.surrey.ac.uk/Teaching/Unix/ UNIX tutorial] (you will only need up to, and including, tutorial 4).
*FOSSWire's [http://files.fosswire.com/2007/08/fwunixref.pdf Unix/Linux command reference] (PDF). Of use to you: "File commands", "SSH", "Searching" and "Shortcuts".
===Perl Help===
* Professor Stewart Weiss has taught CSCI132, a UNIX and Perl class. His slides go into much greater detail and are an invaluable resource. They can be found on his course page [http://compsci.hunter.cuny.edu/~sweiss/course_materials/csci132/csci132_f10.php here].
* Perl documentation is at [http://perldoc.perl.org perldoc.perl.org]. You can get the same documentation without leaving the terminal by running perldoc on a function (with the -f option, e.g., perldoc -f substr) or on a module (e.g., perldoc Bio::Seq).
===Bioperl===
* BioPerl's [http://www.bioperl.org/wiki/HOWTOs HOWTOs page].
* BioPerl-live [http://doc.bioperl.org/bioperl-live developer documentation]. (We use bioperl-live in class.)
* Yozen's tutorial on [http://diverge.hunter.cuny.edu/wiki/HOWTO:Bioperl-live_on_Mac_OS_X installing bioperl-live on your own Mac OS X machine]. (Let me know if there are any issues!).
* [https://spreadsheets.google.com/pub?key=0AjfPzjrqY7BndHpyRHlDZUlGcktINm1IbXVzX1QzMXc&single=true&gid=0&output=html A small table] showing some methods for BioPerl modules with usage and return values.
===SQL===
* [https://docs.google.com/document/d/1zYLPeenwsqPYchkpXnndzphBbTKqX2GjjLHDxlBnt78/edit?hl=en&authkey=CLnh_88K SQL Primer], written by Yozen.
===R Project===
* Install location and instructions for [http://lib.stat.cmu.edu/R/CRAN/bin/windows/base/ Windows]
* Install location and instructions for [http://lib.stat.cmu.edu/R/CRAN/ Mac OS X]
* For users of Ubuntu/Debian: <pre>sudo apt-get install r-base-core</pre>
* For users of Fedora/Red Hat: <pre>su -
yum install R</pre>
===Utilities===
*An [https://chrome.google.com/webstore/detail/nlbjncdgjeocebhnmkbbbdekmmmcbfjd RSS button extension] for chrome. Can add feeds to Google Reader and others.
*A [https://chrome.google.com/webstore/detail/hcamnijgggppihioleoenjmlnakejdph similar extension] which adds a "Live bookmarks"-like feature to Chrome (like Firefox's RSS bookmarks).
===Other Resources===
* [http://www.ccrnp.ncifcrf.gov/~toms/papers/primer/primer.pdf Information Theory Primer] by Thomas D. Schneider. Useful in understanding sequence logo maps.
© Weigang Qiu, Hunter College, Last Update Jan 2013

Revision as of 17:44, 11 June 2013

Projects & Goals

  • Borrelia population genomics: Recombination & Natural Selection (Published)
  • Borrelia pan-genomics (Submitted)
  • Positive and negative selection in Borrelia ORFs and IGS (In submission)
  • Dr Bargonetti's project (Summer 2013)
  • A population genomics pipeline using MUGSY-FastTree (Summer 2013): Project page
  • Borrelia Genome Database & Browser (Summer 2013) Version 2 screen shot
  • Pseudomonas population genomics (Summer 2013) Project page
  • Hypothesis Testing: Do host-interacting genes show adaptive codon usage? (Summer 2013): Project page
  • Phylogenomics browsing with JavaScript/jQuery, Ajax, and jsPhyloSVG
  • Frequency distribution of ospC types in wild tick populations (Fall 2013) Project page

Lab meeting: June 13, 2013

  • Weigang: IGS paper submission should be done by Thursday.
  • Che/Slav: Workshop update (Meeting at 3:30pm?)
  • Che: SILAC project (Meeting at 4pm?)
  • Zhenmao: Tick processing & paired-end Illumina sequencing
  • Pedro: Updates on "ncbi-orf" table
  • Girish: phyloSVG extension; QuBi video
  • Saymon and Deidre: consensus start-codons
  • Reeyes and Raymond: Pseudomonas DB; fleN alignment and phylogeny
  • Valentyna: BLASTn results (4:30pm?)

Lab meeting: May 23, 2013

  • May 24, Friday: End of School Year Party in the Park (we leave from Hunter @ 1:30pm)
  • Recommended reading of the week: Detecting Neanderthal genes using the D' homoplasy statistic
  • Weigang: IGS paper submission
  • Che: Thesis update/SILAC project/Summer teaching
  • Zhenmao: Manuscript update: Material & Methods; Results (Tables and Figures)
  • Pedro: Catalyst web framework
  • Girish: cp26 phylogenomic analysis
  • Saymon and Deidre: consensus start-codons

Lab meeting: May 16, 2013

  • Weigang: IGS paper submitted yet?
  • Che: Thesis update. Chapter 3. Evolution of ospA/ospB gene family
  • Pedro/Zhenmao: Can we wrap up the BLAST identification of ospC types?
  • Girish: Fetch cp26 sequences from DB; Run MUGSY & FastTree
  • Saymon/Deidre: Identification of consensus start-codon positions
  • Pedro/Girish: orth_get/orth_igs website development. Catalyst. Implement graphics (genome map & phylogeny) query interface
  • Raymond: start the Pseudomonas summer project

Foundational Readings

  • Molecular phylogenetics
  • Population genetics
  • Genomics
  • Systems Biology

Informatics Architecture

  • Operating Systems: Linux OS/Ubuntu, Mac OS
  • Programming languages: BASH, Perl/BioPerl, R
  • Relational Databases: PostgreSQL
  • Software architecture
    • bb2: Borrelia Genome Database
    • bb2i: a Perl API for bb2
    • DNATweezer: Perl wrappers of most frequently used BioPerl modules, including Bio::Seq, Bio::SimpleAlign, and Bio::Tree [1]
    • SimBac: A Perl/Moose package for simulating bacterial genome evolution [2]
    • Borrelia Ortholog Retriever: Download ortholog alignments from 23 Borrelia spp. genomes. Search by gene names and IDs. [3]
  • Hardware Setup
    • NFS File Server
    • Database and Application Server
    • Web Server
    • Linux Workstations

Perl Challenges

Problem | Input | Output
DNA transcription | A DNA sequence, in 5'-3' direction (e.g., aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat) | An RNA sequence, in 5'-3' direction
Genetic code | None | 64 codons, one per line (using loops)
Random sequence 1 | None | Generate a random DNA sequence (e.g., 1000 bases) with equal base frequencies
Random sequence 2 | None | Generate a random DNA sequence with biased base frequencies, e.g., 10% G, 10% C, 40% T, and 40% A
Graphics I | A categorical dataset, e.g., Biology | A bar graph and a pie chart, using GD::Simple or PostScript::Simple
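A minimal sketch of the first four Perl Challenges in core Perl. Graphics I is omitted because GD::Simple and PostScript::Simple are CPAN modules that may not be installed. Random bases are drawn by walking a cumulative frequency table with `rand`, which is one simple approach among several; the seed (42), the 1000-base lengths, and the subroutine names are illustrative choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# DNA transcription: the mRNA has the same 5'->3' sequence as the coding
# strand, with U in place of T.
sub transcribe {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/Tt/Uu/;
    return $rna;
}

# Genetic code: all 64 codons from three nested loops over the bases.
sub all_codons {
    my @bases = qw(A C G T);
    my @codons;
    for my $x (@bases) {
        for my $y (@bases) {
            for my $z (@bases) {
                push @codons, "$x$y$z";
            }
        }
    }
    return @codons;
}

# Random sequence: draw each base from a frequency table by walking the
# cumulative distribution. Equal weights give "Random sequence 1";
# a biased table gives "Random sequence 2".
sub random_dna {
    my ($length, %freq) = @_;
    my @bases = sort keys %freq;
    my $total = 0;
    $total += $freq{$_} for @bases;
    my $seq = '';
    for (1 .. $length) {
        my $r    = rand($total);
        my $pick = $bases[-1];            # fallback guards against rounding
        for my $base (@bases) {
            if ($r < $freq{$base}) { $pick = $base; last }
            $r -= $freq{$base};
        }
        $seq .= $pick;
    }
    return $seq;
}

srand(42);    # fixed seed so repeated runs give the same sequences
print transcribe('aaatttaaaagacaaaaagactgctctaagtcttgaaaatttggttttcaaagatgat'), "\n";
print "$_\n" for all_codons();
my $even   = random_dna(1000, A => 1, C => 1, G => 1, T => 1);
my $biased = random_dna(1000, A => 0.4, T => 0.4, G => 0.1, C => 0.1);
print "$even\n$biased\n";
```

Passing the frequencies as a hash means the same subroutine serves both random-sequence challenges; the weights do not need to sum to 1 because `rand` is scaled by their total.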