Affymetrix

An evaluation of the Bayesian Robust Linear Modeling using Mahalanobis Distance (BRLMM) Genotyping Algorithm

Finny Kuruvilla, Todd Green, David Altshuler, Mark Daly, and Stacey Gabriel
Broad Institute of MIT and Harvard
April 24, 2006

Genotyping algorithms should be accurate, unbiased against heterozygotes, and afford high call rates. The genotyping algorithm currently embedded in the Affymetrix software, known as the Dynamic Modeling (DM) algorithm, has been shown to be biased against calling heterozygotes and sometimes suffers from low overall call rates. DM has a key inherent limitation: it cannot examine multiple chips at a time, which prevents it from "learning" from aggregate data sets how each SNP behaves. Rabbee and Speed have proposed a method known as Robust Linear Modeling using Mahalanobis distance (RLMM) that overcomes this and other limitations (ref. 1). By introducing a Bayesian estimation component, Affymetrix has further advanced this method, particularly for SNPs with a low minor allele frequency. We have collaborated with Affymetrix to test the Bayesian RLMM, known as BRLMM (pronounced "bee realm"), using data sets generated at the Broad Institute of MIT and Harvard.

To evaluate the BRLMM algorithm in the context of the 500K Affymetrix genotyping platform, we first examined 48 CEPH samples (from the International Hapmap consortium) that were run at the Broad Institute and called with both the DM and BRLMM algorithms (Table 1). All DM calls were made at a threshold of 0.26. The primary metrics were call rates (overall, homozygote, and heterozygote) and concordance with independently generated genotypes. Concordance is the percentage of genotypes on which DM or BRLMM agrees with the Hapmap data, and thus combines the error rates of Hapmap and the genotyping method. Table 1 summarizes the results for all SNPs for which Hapmap data were available (approximately 176,000 SNPs).

48 CEPH samples (Sty)               DM       BRLMM
Overall call rates                  95.9%    98.6%
Ave. homozygote call rates          97.7%    98.5%
Ave. heterozygote call rates        90.6%    99.0%
Overall concordance with Hapmap     98.7%    99.1%
Ave. homozygote concordance         99.0%    99.0%
Ave. heterozygote concordance       97.8%    99.4%

Table 1.

Substantial improvements in average call rates were seen with BRLMM (98.6% compared to 95.9%), as well as a lack of bias against heterozygotes (99.0% average heterozygote call rate compared to 90.6%). Overall concordance with known Hapmap genotypes improved, particularly concordance for heterozygotes. Since overall Hapmap accuracy is estimated to be about 99.5%, these concordance data suggest that BRLMM accuracy is 99.5% or slightly better.
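
As a rough illustration of how metrics like those in Table 1 can be computed, the following Python sketch derives a call rate and a concordance from two genotype lists, then backs out an implied accuracy from the concordance. The genotype encoding ("AA", "AB", "BB", with None for a no-call), the function names, and the assumption that the two sources make independent errors are ours for illustration; none of this is taken from the Affymetrix or Hapmap software.

```python
def call_rate(calls):
    """Fraction of genotypes that received a call (None marks a no-call)."""
    return sum(c is not None for c in calls) / len(calls)


def concordance(calls, reference):
    """Fraction of agreeing genotypes among positions called in both sets."""
    pairs = [(c, r) for c, r in zip(calls, reference)
             if c is not None and r is not None]
    return sum(c == r for c, r in pairs) / len(pairs)


def implied_accuracy(observed_concordance, reference_accuracy):
    """Back out one method's accuracy from its concordance with a reference.

    Assumes independent errors, so that
    concordance ~= method_accuracy * reference_accuracy
    (coincident identical errors are ignored).
    """
    return observed_concordance / reference_accuracy


# Toy data (hypothetical): 5 genotypes, one no-call, one disagreement.
calls = ["AA", "AB", None, "BB", "AB"]
hapmap = ["AA", "AB", "AB", "BB", "BB"]
print(call_rate(calls))            # 0.8
print(concordance(calls, hapmap))  # 0.75

# Figures from the text: 99.1% overall concordance, ~99.5% Hapmap accuracy.
print(round(implied_accuracy(0.991, 0.995), 4))  # 0.996
```

Under this simplification the implied BRLMM accuracy is about 99.6%, in line with the estimate of 99.5% or slightly better.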

We next tested the BRLMM and DM algorithms in a large data set without independent genotype information, in slightly over 1200 unselected samples collected for an association study. The samples were collected in trio (father, mother, child) fashion. Samples were scanned at the Broad Institute. Overall BRLMM call rates were significantly improved over those of DM (Figure 1).

Figure 1.

One of the known problems of DM is its tendency to drop heterozygote calls disproportionately as the overall call rate decreases. BRLMM was tested for this same property. As shown in Figure 2 (red circles), with DM the fraction of heterozygous calls falls as the overall call rate falls, indicating that heterozygotes are preferentially missed. When the same data are called with the BRLMM algorithm (blue circles), no such relationship is observed.

Figure 2.


Thus, consistent with the Hapmap data, BRLMM does not appear to have a calling bias against heterozygotes. The sample outliers in Figure 2 (Sty) that showed a rise in heterozygosity with lower overall call rate would be removed with the standard filter criteria of only accepting samples with call rate > 90%. It should be noted that these outliers were present with both DM and BRLMM.
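
The relationship plotted in Figure 2 can be approximated numerically. The sketch below computes, for each sample, the call rate and the fraction of called genotypes that are heterozygous, then measures how strongly the two are correlated across samples. The data, the genotype encoding, and the use of a Pearson correlation as a summary are illustrative assumptions, not the analysis used to produce the figure.

```python
from math import sqrt


def sample_summary(calls):
    """Return (call rate, heterozygote fraction among called genotypes)."""
    called = [c for c in calls if c is not None]
    rate = len(called) / len(calls)
    het_fraction = sum(c == "AB" for c in called) / len(called)
    return rate, het_fraction


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical samples: the lower the call rate, the more heterozygotes are lost.
samples = [
    ["AA", "AB", "AB", "BB", "AB", "AA", "AB", "BB"],  # all called
    ["AA", "AB", None, "BB", "AB", "AA", "AB", "BB"],  # one het missed
    ["AA", None, None, "BB", "AB", "AA", None, "BB"],  # three hets missed
]
rates, het_fractions = zip(*(sample_summary(s) for s in samples))
print(rates)
print(het_fractions)
print(pearson(rates, het_fractions) > 0.9)  # True
```

A strong positive correlation, as in this toy data, corresponds to the heterozygote dropout pattern described for DM; a correlation near zero corresponds to the pattern described for BRLMM.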

Since independent genotypes were not available for these samples, adherence to Hardy-Weinberg equilibrium (HWE) was calculated as a measure of genotype accuracy. Hardy-Weinberg performance was examined by comparing the observed p-values against the expected uniform distribution in Q-Q plots. As can be seen in Figure 3, with DM many SNPs (14.6% Nsp, 13.9% Sty) have very low HWE p-values (p < 0.01), compared with the 1% expected by chance. In contrast, with BRLMM fewer SNPs (3.4% Nsp, 4.0% Sty) have p < 0.01.

Figure 3.
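
For readers who want to reproduce this kind of check, here is a minimal sketch of a per-SNP Hardy-Weinberg test and of the fraction of SNPs falling below a p-value threshold. The text does not say which HWE test was used; the one-degree-of-freedom chi-square test below is an assumption chosen for brevity (an exact test is often preferred when allele counts are small), and the genotype counts are made up.

```python
from math import erfc, sqrt


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) test of Hardy-Weinberg proportions for one SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0                     # monomorphic SNP: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


def fraction_below(p_values, threshold=0.01):
    """Fraction of SNPs whose HWE p-value falls below the threshold."""
    return sum(p < threshold for p in p_values) / len(p_values)


# Hypothetical genotype counts (AA, AB, BB) for three SNPs.
snps = [(25, 50, 25),   # exactly in HWE
        (40, 20, 40),   # strong heterozygote deficit
        (30, 45, 25)]   # close to HWE
p_values = [hwe_p_value(*counts) for counts in snps]
print(p_values[0])               # 1.0
print(fraction_below(p_values))  # one of the three SNPs has p < 0.01
```

If genotype calls are accurate and the population is in equilibrium, about 1% of SNPs should fall below p = 0.01; a heterozygote deficit caused by dropped heterozygote calls inflates that fraction.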

Mendel errors can serve as a confirmation of sample relatedness and genotyping accuracy. After genotyping all ~1200 samples, trios with more than 4000 Mendel errors were discarded. Also discarded were SNPs with more than 4 Mendel errors, SNPs with an overall call rate below 90%, samples with a call rate below 90%, and SNPs with a Hardy-Weinberg p-value < 0.001. These standard filters gave the following final summary statistics (note that the data below are for autosomes only):

Nsp                                           DM                 BRLMM
Samples (478 before filters)                  478 (100%)         478 (100%)
SNPs (256,553 before filters)                 199,282 (77.7%)    249,636 (97.3%)
Sample call rate                              97.4%              99.2%
Mendel error rate (per genotype, per trio)    0.09%              0.10%

Sty                                           DM                 BRLMM
Samples (739 before filters)                  721 (97.5%)        726 (98.2%)
SNPs (233,477 before filters)                 195,289 (83.6%)    225,035 (96.4%)
Sample call rate                              97.6%              99.1%
Mendel error rate (per genotype, per trio)    0.15%              0.11%

Table 4.
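
The Mendel error check and the SNP-level filters described above can be sketched as follows. Genotypes are encoded here as B-allele counts (0, 1, 2), with None for a no-call; this encoding, the function names, and the treatment of no-calls as uninformative are assumptions made for the example, and the inheritance rule applies to autosomal SNPs only. The thresholds are the ones quoted in the text.

```python
def transmissible(genotype):
    """Set of B-allele counts (0 or 1) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendel_error(father, mother, child):
    """True if the child's genotype cannot arise from the parents' genotypes.

    Any trio with a no-call (None) is treated as uninformative, not an error.
    """
    if None in (father, mother, child):
        return False
    possible = {a + b
                for a in transmissible(father)
                for b in transmissible(mother)}
    return child not in possible


def mendel_errors_per_snp(trios):
    """Count Mendel errors at each SNP across trios.

    `trios` is a list of (father, mother, child) tuples; each member is a
    list of genotypes indexed by SNP.
    """
    n_snps = len(trios[0][0])
    return [sum(is_mendel_error(f[i], m[i], c[i]) for f, m, c in trios)
            for i in range(n_snps)]


def keep_snp(mendel_errors, call_rate, hwe_p,
             max_mendel=4, min_call_rate=0.90, min_hwe_p=0.001):
    """Apply the SNP-level thresholds quoted in the text."""
    return (mendel_errors <= max_mendel
            and call_rate >= min_call_rate
            and hwe_p >= min_hwe_p)


# Two hypothetical trios genotyped at three SNPs.
trios = [
    ([0, 1, 2], [0, 1, 0], [0, 2, 0]),     # SNP 3: BB x AA cannot give AA
    ([0, 0, 2], [2, 0, None], [1, 1, 1]),  # SNP 2: AA x AA cannot give AB
]
print(mendel_errors_per_snp(trios))  # [0, 1, 1]
print(keep_snp(1, 0.95, 0.2))        # True
print(keep_snp(5, 0.99, 0.5))        # False: too many Mendel errors
```

The trio-level filter (more than 4000 Mendel errors) and the sample-level call rate filter would be applied in the same way, by summing over SNPs within each trio or sample instead of over trios within each SNP.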

In conclusion, BRLMM significantly outperformed the DM algorithm: it achieved higher call rates, improved concordance with independent genotypes (Hapmap), decreased bias against heterozygotes, fewer SNPs out of Hardy-Weinberg equilibrium, and fewer samples and SNPs lost after the application of standard filter criteria. Its Mendel error rate was lower on Sty (0.11% compared to 0.15%) and comparable on Nsp (0.10% compared to 0.09%). Based on these improvements across the quality control metrics examined, we see no reason not to replace DM with BRLMM immediately in all applications.


1. Rabbee N and Speed TP. A genotype calling algorithm for Affymetrix SNP arrays. Bioinformatics 22:7-12 (2006).