Screening a large reference sample to identify very low frequency sequence variants: Comparisons between two genes
Most human sequence variation takes the form of single-nucleotide polymorphisms (SNPs). It has been proposed that coding-region SNPs (cSNPs) be used in direct association studies to determine the genetic basis of complex traits. The success of such studies depends on the frequency of disease-associated alleles and on their distribution across ethnic populations. If disease-associated alleles are frequent in most populations, then direct genotyping of candidate variants could show robust associations in study samples of manageable size. This approach is less feasible if the genetic risk from a given candidate gene is due to many infrequent alleles. Previous studies of several genes demonstrated that most variants are relatively infrequent (allele frequency <0.05). These surveys genotyped small samples (n<75) and thus had limited ability to identify rare alleles. Here we evaluate the prevalence and distribution of such rare alleles by genotyping an ethnically diverse reference sample (n=450) that is more than six times larger than those used in previous studies. We screened for variants in the complete coding sequence and intron-exon junctions of two candidate genes for neuropsychiatric phenotypes: SLC6A4, which encodes the serotonin transporter, and SLC18A2, which encodes the vesicular monoamine transporter. Both genes have unique roles in neuronal transmission, and variants in either gene might be associated with neurobehavioral phenotypes.
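The effect of sample size on the ability to identify rare alleles can be illustrated with a simple calculation. The sketch below is not the authors' analysis. It assumes diploid autosomal loci, chromosomes sampled independently from a single population, and an assay that detects every carrier. Under those assumptions, the probability of observing an allele of frequency p at least once among n individuals is 1 - (1 - p)^(2n). The function name `detection_probability` and the example frequencies are illustrative choices; n=75 is used as the upper bound on the earlier sample sizes.

```python
def detection_probability(allele_freq, n_individuals, ploidy=2):
    """Probability of seeing an allele at least once in a sample.

    Assumes independently sampled chromosomes and perfect detection
    (illustrative assumptions, not taken from the study).
    """
    chromosomes = ploidy * n_individuals
    # Complement of the probability that every sampled chromosome lacks the allele.
    return 1.0 - (1.0 - allele_freq) ** chromosomes


if __name__ == "__main__":
    # Compare an earlier-style sample (upper bound n=75) with n=450.
    for n in (75, 450):
        for freq in (0.05, 0.01, 0.001):
            p = detection_probability(freq, n)
            print(f"n={n:3d}  allele frequency={freq:<6}  P(detect)={p:.3f}")
```

Under these assumptions, an allele at frequency 0.001 is more likely than not to be observed at least once among 450 individuals, but is unlikely to be seen among 75.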