# QuBi/modules/biol203-geno-pheno-association

- BIOL 203 Summer 2020 - Bioinformatics Exercises for Lab 11

## Test phenotype-genotype association

### Introduction: GWAS & Contingency Test

A genome-wide association study (GWAS) is a method for mapping phenotypes to genotypes. In a typical GWAS, allele frequencies (e.g., C or T at position 785) are determined in a sample of affected individuals (the "cases", e.g., those with a disease) and in a sample of unaffected individuals (the "controls"). For example, the following table shows the results of a hypothetical case-control study at a locus segregating two alleles (C and T):

Table 1. Sample Genotype Frequencies

| | T/T | T/C | C/C | Total |
|---|---|---|---|---|
| Case | 0 | 24 | 127 | ? |
| Control | 9 | 68 | 114 | ? |
| Total | ? | ? | ? | ? |

Association between genotype and phenotype can be assessed with a contingency table analysis. In this case, χ² = 26.4 with 2 degrees of freedom, p < 0.0005, suggesting a significant association between genotype and disease. (By comparing the expected and observed counts, one can conclude that the C/C genotype is over-represented among disease cases.)
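The contingency test on Table 1 can also be checked by hand. The following is a minimal sketch in plain Python, assuming a Pearson chi-square test without any continuity correction; the function name `chi_square` is illustrative and not part of any library. Expected counts are computed as (row total × column total) / grand total.

```python
import math

def chi_square(observed):
    """Return (statistic, degrees of freedom, expected counts) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected

# Table 1 genotype counts; columns are T/T, T/C, C/C
table1 = [
    [0, 24, 127],  # cases
    [9, 68, 114],  # controls
]
statistic, dof, expected = chi_square(table1)

# For exactly 2 degrees of freedom, the chi-square upper-tail
# probability has the closed form exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.1f}, df = {dof}, p = {p_value:.2g}")
```

The statistic rounds to 26.4, and the expected C/C count among cases (about 106.4) is lower than the observed 127, which is the over-representation noted above. The closed-form p-value holds only for 2 degrees of freedom; other table sizes need a chi-square table or a statistics package.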

1. Perform an online contingency table analysis using the hypothetical data in Table 1. Click on "other contingency tables" and run a test with 2 rows and 3 columns using the data above. Your χ² should be 26.4.

2. Using Table 1, fill in the following table with allele counts. Then perform a 2-by-2 contingency table analysis with the same online tool. For example, in the controls, the number of T alleles is 2 × 9 + 68 = 86, because each T/T homozygote carries two T alleles and each T/C heterozygote carries one.

Is there a statistically significant association between alleles and disease phenotype? Which allele (C or T) is over-represented in (i.e., statistically associated with) disease cases?

Table 2. Sample Allele Frequencies

| | T | C | Total |
|---|---|---|---|
| Case | ? | ? | ? |
| Control | ? | ? | ? |
| Total | ? | ? | ? |
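The conversion from genotype counts to allele counts can be sketched as follows. This is a small illustration, with `allele_counts` as a hypothetical helper name; it reproduces only the control row worked out in the text and leaves the case row to be filled in.

```python
def allele_counts(hom_first, het, hom_second):
    """Convert genotype counts (e.g. T/T, T/C, C/C) into counts of the two alleles.

    Each homozygote contributes two copies of its allele;
    each heterozygote contributes one copy of each allele.
    """
    first = 2 * hom_first + het
    second = 2 * hom_second + het
    return first, second

# Control row of Table 1: T/T = 9, T/C = 68, C/C = 114
control_t, control_c = allele_counts(9, 68, 114)
print(control_t)  # 86

# Sanity check: every individual carries two alleles,
# so the row total must be twice the number of individuals.
assert control_t + control_c == 2 * (9 + 68 + 114)
```

Some online calculators apply a continuity correction to 2-by-2 tables, so their χ² may be slightly smaller than an uncorrected hand calculation.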

### Test association with Locus A

Following the above two examples, perform both the genotype and allele association tests using the class data.

Table 3a. Genotype counts at Locus A

| | A1/A1 | A1/A2 | A2/A2 | Row Sum |
|---|---|---|---|---|
| Taster | ? | ? | ? | ? |
| Non-Taster | ? | ? | ? | ? |
| Column Sum | ? | ? | ? | ? |

Calculate allele counts, then test for association.

Table 3b. Allele counts at Locus A

| | A1 | A2 | Row Sum |
|---|---|---|---|
| Taster | ? | ? | ? |
| Non-Taster | ? | ? | ? |
| Column Sum | ? | ? | ? |

### Test association with Locus B

Table 4a. Genotype counts at Locus B for each phenotype

| | B1/B1 | B1/B2 | B1/B3 | B2/B2 | B2/B3 | B3/B3 | Row Sum |
|---|---|---|---|---|---|---|---|
| Taster | ? | ? | ? | ? | ? | ? | ? |
| Non-Taster | ? | ? | ? | ? | ? | ? | ? |
| Column Sum | ? | ? | ? | ? | ? | ? | ? |

Calculate allele counts, then test for association.

Table 4b. Allele counts at Locus B

| | B1 | B2 | B3 | Row Sum |
|---|---|---|---|---|
| Taster | ? | ? | ? | ? |
| Non-Taster | ? | ? | ? | ? |
| Column Sum | ? | ? | ? | ? |
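With three alleles, tallying by hand is more error-prone, so a short script may help. The sketch below assumes genotypes are labeled as in Table 4a (two allele names joined by "/"); the helper name and the counts in `example_row` are placeholders for illustration only, not class data.

```python
from collections import Counter

def allele_counts(genotype_counts):
    """Tally alleles from a mapping such as {"B1/B2": 3}.

    Every individual carries two alleles, so a homozygote like "B1/B1"
    adds two copies of B1 and a heterozygote adds one copy of each allele.
    """
    totals = Counter()
    for genotype, n_individuals in genotype_counts.items():
        for allele in genotype.split("/"):
            totals[allele] += n_individuals
    return totals

# Placeholder counts for one phenotype row (hypothetical, NOT class data)
example_row = {
    "B1/B1": 2, "B1/B2": 3, "B1/B3": 1,
    "B2/B2": 0, "B2/B3": 4, "B3/B3": 5,
}
counts = allele_counts(example_row)
print(dict(counts))

# The allele total must equal twice the number of individuals in the row.
assert sum(counts.values()) == 2 * sum(example_row.values())
```

Running the same function on the Taster and Non-Taster rows of Table 4a gives the two rows of Table 4b.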

## Web Exercise. Search for gene information using NCBI online databases

- Point your browser to the NCBI Human Genome Resource page
- Copy and paste the sequence provided on Blackboard. This is the sequence of the gene associated with the taster phenotype.
- Expand the "Algorithm parameters" tab and change "Expect threshold" to 0.00001 (1e-5). Define "expect value" in your own words after watching the linked YouTube video.
- Press "BLAST". Copy and paste the top hit into your final lab report.
- Briefly describe the function of the gene based on information gathered on the locus page
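To get a feel for how the "Expect threshold" filters hits, the sketch below uses the standard relation between a hit's bit score S' and its expect value, E = m × n / 2^S', where m and n are the query and database lengths. The function name and the lengths used here are hypothetical, and BLAST itself uses adjusted "effective" lengths, so actual reported values will differ.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n / 2**S'."""
    return query_length * database_length / 2 ** bit_score

THRESHOLD = 1e-5  # same as 0.00001, the value entered in "Expect threshold"

# Hypothetical sizes: a 1,000-base query against a 3-billion-base database
m, n = 1_000, 3_000_000_000

for score in (40, 60):
    e = expect_value(score, m, n)
    status = "reported" if e <= THRESHOLD else "filtered out"
    print(f"bit score {score}: E = {e:.2g} ({status})")
```

Higher bit scores give smaller E-values, so only strong matches pass a strict threshold such as 1e-5.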

## Lab Report IV

Your report should include the following results:

- A printout of the contingency test for Locus A, including expected counts, observed counts, chi-square statistic, degrees of freedom, and p-value
- Same as above for Locus B
- A printout of alignment for the top BLAST hits for the sequence provided

Additional questions to include in your report:

- State the *null hypothesis* and the *alternative hypothesis* in a chi-square test
- Explain what probability is represented by the p-value.
- What can you conclude when the p-value is **below** the threshold of significance (e.g., p = 0.05)?
- What would you conclude when the p-value is **above** the critical value?
- Is there a statistically significant association between one of the alleles tested and the Taster phenotype?
- Which genotype is over-represented in the Non-Tasters?
- Which allele is over-represented in the Non-Tasters?
- Are there exceptions? What are possible causes for exceptions?
- Define e-value in a BLAST search
