Data quality control in genetic case-control association studies

Carl A. Anderson, Fredrik H. Pettersson, Geraldine M. Clarke, Lon R. Cardon, Andrew P. …

Genetic factors are likely to affect the occurrence of numerous common diseases, so identifying and characterizing the associated risk or protection will be important for improving our understanding of etiology and potentially for […]. Genetic association studies apply the practice of epidemiology to information coded in genes: they test whether genetic polymorphisms (alleles) are associated with disease status. In fact, this is the sine qua non of association-based genetic studies. The traditional approach for analysis of case-control studies is prospective logistic regression. For each study design, our goal is to achieve control similar to that obtained in a family-based study, but with the convenience of a population-based one. However, performing genetic association studies correctly requires specific knowledge of genetics, statistics, and bioinformatics. Consider a locus with two alleles, N and M, where N is the normal allele and M is the high-risk allele. The probability of all 5 studies detecting the association is only 0.… (34). Understand the conditions under which population stratification can occur.
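If the 5 studies are independent and each has the same power to detect a true association, the probability that all of them detect it is that power raised to the fifth power. The sketch below rests on exactly those two assumptions; the function name `prob_all_detect` and the power values are illustrative and are not figures from the source.

```python
def prob_all_detect(power: float, n_studies: int) -> float:
    """Probability that every one of n independent studies detects a true association.

    Assumes (hypothetically) that the studies are independent and share the same power.
    """
    if not 0.0 <= power <= 1.0:
        raise ValueError("power must lie in [0, 1]")
    return power ** n_studies


if __name__ == "__main__":
    for power in (0.5, 0.8):
        print(f"power = {power}: P(all 5 detect) = {prob_all_detect(power, 5):.4f}")
```

Even at 80% power per study, the chance that all five studies detect the association is only about one in three (0.8^5 ≈ 0.33).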

The central theme in case-control genetic association studies is to efficiently identify genetic markers associated with case-control status, and powerful statistical methods are critical to accomplishing this goal. Here the basis of inference is the likelihood of the disease outcome data D conditional on covariate information X, ignoring the fact that, under the case-control sampling design, data are observed on X conditional on D.
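Prospective logistic regression of disease status D on genotype X can be sketched with a hand-rolled Newton-Raphson fit. This assumes additive coding (0, 1 or 2 copies of the risk allele) and no other covariates; `fit_logistic_additive` is a hypothetical helper, and in practice a statistics package would be used. Under case-control sampling the slope still estimates the per-allele log odds ratio, while the intercept reflects the sampling fractions rather than population risk.

```python
import math


def fit_logistic_additive(genotypes, status, max_iter=50, tol=1e-10):
    """Fit logit P(D = 1 | g) = b0 + b1 * g by Newton-Raphson.

    genotypes: risk-allele counts per subject (additive coding: 0, 1 or 2).
    status:    1 for cases, 0 for controls.
    Returns (b0, b1); exp(b1) is the per-allele odds ratio.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Accumulate the score vector (u0, u1) and the 2x2 information matrix.
        u0 = u1 = i00 = i01 = i11 = 0.0
        for g, d in zip(genotypes, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = p * (1.0 - p)
            u0 += d - p
            u1 += (d - p) * g
            i00 += w
            i01 += w * g
            i11 += w * g * g
        det = i00 * i11 - i01 * i01
        if det == 0.0:
            raise ValueError("information matrix is singular (e.g. monomorphic SNP)")
        # Newton step: solve I * delta = U using the closed-form 2x2 inverse.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

For example, with 10 cases and 20 controls carrying no risk allele and 20 cases and 10 controls carrying one copy, the fit returns b1 = ln 4, the log of the odds ratio from the corresponding 2x2 table.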

This protocol describes how to perform basic statistical analysis in a population-based genetic association case-control study. Regardless of whether a single biallelic SNP is under consideration in a candidate-gene study or thousands are under consideration in a genome-wide association study, analyses are usually carried out one SNP at a time, with subsequent adjustment for multiple testing (9, 10). In genetics, a genome-wide association study (GWA study, or GWAS), also known as a whole-genome association study (WGA study, or WGAS), is an observational study of a genome-wide set of genetic variants in different individuals, to see whether any variant is associated with a trait. Genomic control has been proposed as a new approach to genetic-based association studies.
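A simple form of the multiple-testing adjustment is the Bonferroni correction, which tests each of M SNPs at level α/M. A minimal sketch follows; the helper names are hypothetical, and Bonferroni is conservative when SNPs are correlated through linkage disequilibrium.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-SNP significance threshold that controls the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("need at least one test")
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # One million tests at alpha = 0.05 give roughly the familiar 5e-8 genome-wide threshold.
    print(bonferroni_threshold(0.05, 1_000_000))
```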

Robust trend tests, and other robust statistical tests of genetic association, have been developed for the case-control design. Describe what is meant by population stratification. Case-control association studies are widely used, including in neuropsychiatry, and a consequence of the rapid developments in the field of genetic association studies is the large number of publications. The quality control steps described here involve the identification and removal of DNA samples and markers that introduce bias to the study. While the protocol applies to genotypes after they have been determined (called) from probe intensity data, it is still important to understand how the genotype calling was conducted.
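Genomic control is one way to handle the inflation that population stratification causes in case-control test statistics: estimate an inflation factor λ as the median of the per-SNP 1-df chi-square statistics divided by the null median (about 0.455), then divide every statistic by λ. The sketch below assumes 1-df statistics and follows the common convention of not adjusting when λ < 1; the helper names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df equals (z_0.75)^2, roughly 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median statistic over the null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_adjust(chi2_stats):
    """Return (lambda, adjusted statistics); lambda is floored at 1 by convention."""
    lam = max(genomic_control_lambda(chi2_stats), 1.0)
    return lam, [x / lam for x in chi2_stats]
```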

Indeed, case-control genetic association studies have already contributed to identifying genes associated with complex disorders, as in the cases of apolipoprotein E4 with late-onset Alzheimer disease (31) and the factor V gene with venous thrombosis.

Despite the many similarities between genetic association studies and classical observational epidemiologic studies of lifestyle and environmental factors (that is, cross-sectional, case-control, and cohort studies), genetic association studies present several specific challenges, including an unprecedented volume of new data and the likelihood […]. Genetic association studies, both candidate-gene and genome-wide, often use a case-control design. Consider data for a case-control study of genetic association as in table 1. This paper aims to provide a guideline for conducting genetic analyses by introducing key concepts and by sharing scripts that can be used for data analysis.
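Data of this kind are usually a 2 x 3 table of genotype counts by case-control status. Because table 1 itself is not included, the counts below are hypothetical. One standard analysis of such a table is the Cochran-Armitage trend test; the sketch assumes genotypes are ordered by number of risk alleles and uses additive scores (0, 1, 2) by default.

```python
import math


def armitage_trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases, controls: genotype counts ordered by number of risk alleles (0, 1, 2).
    scores: genotype scores; (0, 1, 2) corresponds to an additive model.
    Returns (chi-square statistic with 1 df, p-value).
    """
    n_geno = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * n for t, n in zip(scores, n_geno))
    sum_ttn = sum(t * t * n for t, n in zip(scores, n_geno))
    numerator = n_total * (n_total * sum_tr - n_cases * sum_tn) ** 2
    denominator = n_cases * n_controls * (n_total * sum_ttn - sum_tn ** 2)
    stat = numerator / denominator
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

With cases (10, 20, 30) and controls (30, 20, 10), the statistic is exactly 20 on 1 degree of freedom.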

In general terms, quality control is the set of actions intended to measure the quality of a manufactured product and to approve or reject it. This protocol (August 26, 2010) deals with the quality control (QC) of genotype data from genome-wide and candidate-gene case-control association studies, and outlines the methods routinely used in key studies from […]. We describe how to use PLINK, a tool for handling SNP data, to assess the failure rate per individual and per SNP. Be familiar with the methods used to address population stratification.
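PLINK reports these failure rates (for example through its `--missing` option) and can filter on them with `--mind` and `--geno`. The pure-Python sketch below only mirrors that logic on a small in-memory matrix (individuals in rows, SNPs in columns, `None` for a failed call). The thresholds and the order of filtering (individuals first, then SNPs) are illustrative assumptions, not values taken from the protocol.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]  # risk-allele count 0/1/2, or None for a failed call


def missing_rates(matrix: Sequence[Sequence[Genotype]]) -> Tuple[List[float], List[float]]:
    """Return (per-individual, per-SNP) failure rates for an individuals x SNPs matrix."""
    n_ind, n_snp = len(matrix), len(matrix[0])
    per_ind = [sum(g is None for g in row) / n_snp for row in matrix]
    per_snp = [sum(row[j] is None for row in matrix) / n_ind for j in range(n_snp)]
    return per_ind, per_snp


def qc_filter(matrix, max_ind_missing=0.03, max_snp_missing=0.05):
    """Drop individuals, then SNPs, whose failure rate exceeds the given thresholds."""
    per_ind, _ = missing_rates(matrix)
    keep_ind = [i for i, rate in enumerate(per_ind) if rate <= max_ind_missing]
    if not keep_ind:
        return [], []
    # Recompute SNP failure rates on the retained individuals only.
    _, per_snp = missing_rates([matrix[i] for i in keep_ind])
    keep_snp = [j for j, rate in enumerate(per_snp) if rate <= max_snp_missing]
    return keep_ind, keep_snp
```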

For analysis of case-control genetic association studies, it has recently been shown that gene-environment independence in the population can be leveraged to increase efficiency. A genetic association case-control study compares the frequency of alleles or genotypes at genetic marker loci, usually single-nucleotide polymorphisms (SNPs; see box 1 for a glossary of terms), between cases and controls. What is a false-positive or false-negative association, and how can a genome-wide study minimize these types of errors? See also Nyholt, Human Genetics, volume 109, pages 564-565 (2001).
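Comparing allele frequencies between cases and controls is often summarized by the allelic odds ratio. The sketch below labels the alleles N (normal) and M (high-risk) and takes genotype counts in the order (NN, NM, MM). It uses Woolf's log-scale confidence interval, an assumed choice; any cell with zero alleles raises `ZeroDivisionError`, which in practice is often handled with a continuity correction. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def allele_counts(genotype_counts):
    """Convert (NN, NM, MM) genotype counts into (M allele count, N allele count)."""
    n_nn, n_nm, n_mm = genotype_counts
    return n_nm + 2 * n_mm, 2 * n_nn + n_nm


def allelic_odds_ratio(case_genotypes, control_genotypes, level=0.95):
    """Allelic odds ratio for M versus N with a Woolf (log-scale) confidence interval."""
    a, b = allele_counts(case_genotypes)     # M and N alleles in cases
    c, d = allele_counts(control_genotypes)  # M and N alleles in controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, (lower, upper)
```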

Basic statistical analysis in genetic case-control studies: Geraldine M. Clarke, Carl A. Anderson, Fredrik H. Pettersson, Lon R. Cardon, Andrew P. Morris, and Krina T. Zondervan. The Strengthening the Reporting of Genetic Association Studies (STREGA) initiative builds on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement. Statistical methods to test for association in case-control GWA studies include the allele-counting chi-square test and logistic regression, together with correction for multiple testing and power calculations. A variety of methods have been proposed to this end, mostly statistical in nature and differing in their assumptions and in the type of model employed. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the Electronic Medical Records and Genomics (eMERGE) network is using for quality assurance in GWAS data, thereby minimizing […].
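The allele-counting chi-square test collapses each group's genotype counts into allele counts and applies Pearson's test to the resulting 2 x 2 table. Because it treats the 2n alleles as independent observations, it relies on Hardy-Weinberg equilibrium. A minimal sketch, with hypothetical helper names and genotype counts ordered by number of risk alleles:

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def allelic_chi2_test(case_genotypes, control_genotypes):
    """Allele-counting test: 2 x 2 table of allele counts, chi-square with 1 df."""
    def alleles(counts):  # (0, 1, 2 risk-allele genotype counts) -> [risk, other]
        n0, n1, n2 = counts
        return [n1 + 2 * n2, 2 * n0 + n1]

    stat = pearson_chi2([alleles(case_genotypes), alleles(control_genotypes)])
    # Upper tail of a chi-square with 1 df.
    return stat, math.erfc(math.sqrt(stat / 2))
```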

Similar to previous type 1 diabetes genetic association studies (11, 18), the case-control design of our GWAS meta-analysis did not allow case subjects to be matched to control subjects within the same European population, because control samples were not available for each participating case cohort. The rapidly evolving evidence on genetic associations is crucial to integrating human genomics into the practice of medicine and public health (1, 2). This protocol details the data quality assessment and control steps that are typically carried out during case-control association studies. This article provides a broad outline of the design and analysis of such studies, focusing on case-control studies in candidate genes or regions. A popular statistical method is the model-free Pearson's chi-square test. In addition to outlining the published ideas on this method, we describe several extensions. The simulated data used here have passed standard quality control. Case-control studies are observational because no intervention is attempted and no attempt is made to alter the course of the disease.
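Applied to the full 2 x 3 genotype table, Pearson's chi-square test is model-free: it has 2 degrees of freedom and makes no assumption about dominance. A minimal sketch with a hypothetical function name; genotype columns with no observations must be removed first, since they would give a zero expected count.

```python
import math


def genotypic_chi2_test(case_genotypes, control_genotypes):
    """Model-free Pearson chi-square test on a 2 x 3 genotype table (2 df)."""
    table = [list(case_genotypes), list(control_genotypes)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
    return stat, math.exp(-stat / 2)
```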

A case-control study (also known as a case-referent study) is a type of observational study in which two existing groups differing in outcome are identified and compared on the basis of some supposed causal attribute. What is the relationship between genomic coverage and the power of genetic association studies? Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. The case-control design is often used to study rare diseases, or as a preliminary study when little is known about the association between the risk factor and the disease of interest. Genetic association studies are used to find candidate genes or genome regions that contribute to a specific disease by testing for a correlation between disease status and genetic variation. Traditional epidemiological studies focus on assessing the impact of specific risk factors on disease risk in populations.
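One common sample-QC step flags individuals with outlying genome-wide heterozygosity: an excess can indicate sample contamination and a deficit can indicate inbreeding. The sketch assumes a genotype matrix coded 0/1/2 with `None` for failed calls. The 3-SD cut-off is a common choice used here as an assumption, and at least two individuals with called genotypes are required for the standard deviation.

```python
import math
from statistics import mean, stdev


def heterozygosity_rates(matrix):
    """Per-individual proportion of called genotypes that are heterozygous (value 1)."""
    rates = []
    for row in matrix:
        called = [g for g in row if g is not None]
        rates.append(sum(g == 1 for g in called) / len(called) if called else math.nan)
    return rates


def heterozygosity_outliers(matrix, n_sd=3.0):
    """Indices of individuals whose heterozygosity rate lies more than n_sd SDs from the mean."""
    rates = heterozygosity_rates(matrix)
    valid = [r for r in rates if not math.isnan(r)]
    centre, spread = mean(valid), stdev(valid)
    return [i for i, r in enumerate(rates)
            if not math.isnan(r) and abs(r - centre) > n_sd * spread]
```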

Example outline, GWAS for multiple sclerosis (MS): data cleaning, quality control, results. The goal of a genetic association study is to establish statistical associations between […]. The genome-wide association study (GWAS) is an increasingly common experimental design for investigating the genetic basis of common diseases and complex traits in humans (Teo). Yaning Yang has written on […] samples in genetic case-control association analyses. Case-control association studies are an increasingly popular approach to identifying genes that cause neuropsychiatric disorders.
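Markers that introduce bias are often flagged by testing for departure from Hardy-Weinberg equilibrium, typically in controls only, since a true association can distort genotype frequencies in cases. A minimal 1-df chi-square version is sketched below. The function name is hypothetical, and an exact test is usually preferred when genotype counts are small.

```python
import math


def hwe_chi2_test(n_hom_ref, n_het, n_hom_alt):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # frequency of the reference allele
    q = 1.0 - p
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip empty expected classes (monomorphic SNP); their observed count is also 0.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return stat, math.erfc(math.sqrt(stat / 2))
```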

The principal line of investigation in genome-wide association studies (GWAS) is the identification of main effects, that is, individual single-nucleotide polymorphisms (SNPs) that are associated with the trait of interest independently of other factors. On the transition from genetic linkage analyses to association studies, see Risch and Merikangas (1996). We aimed to identify novel rare or low-frequency genetic markers, classified by minor allele frequency (MAF), associated with case-control status. The goal is to retrospectively determine the exposure to the risk factor of interest in each of the two groups of individuals. Discuss how population stratification may affect the interpretation of case-control genetic association studies.
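The minor allele frequency can be computed from called genotypes and used to classify markers. The cut-offs below (rare below 1%, low-frequency from 1% to 5%, common at 5% or above) are a widely used convention, assumed here because the source's own thresholds are not given. The helper names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from allele counts 0/1/2 per individual; None marks a failed call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf, rare_cutoff=0.01, low_cutoff=0.05):
    """Classify a marker by MAF using assumed (conventional) cut-offs."""
    if maf < rare_cutoff:
        return "rare"
    if maf < low_cutoff:
        return "low-frequency"
    return "common"
```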
