Cross-Reactive DNA Microarray Probes Lead to False Discovery of Autosomal Sex-Associated DNA Methylation

2012 
To the Editor: Most of the significant sex-associated DNA-methylation sites at autosomal CpG loci reported by Numata et al.1 do not reflect a true biological phenomenon. Rather, they reflect a technical artifact created by cross-reactive autosomal probes that hybridize to both autosomal and sex-chromosome sequences. Numata et al.1 used the Illumina Infinium HumanMethylation27K microarray to assess genome-wide DNA methylation. This microarray uses 50 nt probes to target 27,578 CpG sites covering ∼13,000 genes. To distinguish methylated from unmethylated alleles, DNA is first treated with sodium bisulfite, which converts unmethylated cytosines to uracil; PCR amplification then converts uracil to thymine. Methylated cytosines, in contrast, remain cytosines. On this microarray, two probes are designed for each CpG site: one for the methylated allele (cytosine) and one for the unmethylated allele (thymine). However, a subset of probes on the array targets autosomal loci but cross-reacts with genomic regions on the sex chromosomes. Because one of the two X chromosomes in females is heavily methylated as a result of X inactivation,2 probes that cross-hybridize to these heavily methylated X-linked loci produce spurious signals, so that the autosomal CpG loci they target appear more methylated in females than in males. Conversely, autosomal probes that cross-hybridize to unmethylated X-linked loci that escape X inactivation show lower methylation in females than in males. Likewise, probes that cross-hybridize to the Y chromosome shift the measured DNA-methylation level of the targeted autosomal CpG loci and produce a spurious increase in methylation signal in males relative to females.
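This artifact can be illustrated with a toy mixing model. The sketch does not come from the letter. It assumes three things. First, a cross-reactive probe adds off-target signal in proportion to a hypothetical hybridization efficiency (`cross_weight`). Second, an X-linked off-target locus subject to X inactivation is about 50% methylated in females (one inactive, methylated copy) and unmethylated in males. Third, the beta value follows the commonly cited definition with an offset of 100. All parameter values are illustrative.

```python
# Toy model (hypothetical parameters): a cross-reactive autosomal probe that also
# captures signal from an X-linked locus shows a sex difference even though the
# autosomal target itself is methylated identically in females and males.


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Beta value from methylated (M) and unmethylated (U) probe intensities.

    Assumes the commonly used definition beta = M / (M + U + offset).
    """
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return m / (m + u + offset)


def observed_beta(auto_meth: float, x_copies: int, x_meth: float,
                  cross_weight: float, signal_per_copy: float = 1000.0) -> float:
    """Beta reported for an autosomal CpG whose probe cross-hybridizes to the X.

    auto_meth    -- true methylated fraction at the autosomal target (2 copies)
    x_copies     -- X chromosomes present: 2 in females, 1 in males
    x_meth       -- methylated fraction at the X-linked off-target locus
    cross_weight -- hypothetical off-target hybridization efficiency (0 to 1)
    """
    auto_m = 2 * auto_meth * signal_per_copy
    auto_u = 2 * (1 - auto_meth) * signal_per_copy
    off_m = cross_weight * x_copies * x_meth * signal_per_copy
    off_u = cross_weight * x_copies * (1 - x_meth) * signal_per_copy
    return beta_value(auto_m + off_m, auto_u + off_u)


# Same autosomal methylation (20%) in both sexes; the X-linked off-target locus
# is ~50% methylated in females (one inactive X) and unmethylated in males.
female = observed_beta(auto_meth=0.2, x_copies=2, x_meth=0.5, cross_weight=0.5)
male = observed_beta(auto_meth=0.2, x_copies=1, x_meth=0.0, cross_weight=0.5)
print(f"female beta = {female:.3f}, male beta = {male:.3f}")
# female beta = 0.290, male beta = 0.154
```

The autosomal methylation level is identical in both sexes. The female value comes out higher than the male value only because of the off-target X-linked signal.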
We identified cross-reactive probes on the 27K microarray as those with near-identical matches to nontargeted loci, first by mapping probe sequences against the in silico sodium-bisulfite-converted reference genome (hg18) with BLAT.3 Array signals are derived from single-base extension of fluorescently tagged nucleotides at the end of the probe that corresponds to the targeted CpG. Cross-hybridization therefore also requires that the terminal nucleotide of the probe match the corresponding nucleotide at the nontargeted locus. In Table 1, we list the potential cross-hybridizing targets of probes corresponding to the top ten autosomal genes reported by Numata et al.1 as having the most significant sex differences. Using the same microarray platform, Liu et al. and Adkins et al.4–6 also reported an overlapping set of autosomal sex-associated DNA-methylation sites, which we have found to result from the same technical artifact.3 Because autosomal probes cross-hybridize to the sex chromosomes, the claim by Numata et al.1 that 5% of autosomal loci (1,333 CpGs) show significant sex-associated differential methylation is likely to be an overestimate. The full list of CpG sites proposed to have significant sex differences was not published, so it could not be evaluated. Of course, this does not exclude the existence of true autosomal sex-associated DNA-methylation sites in humans. Indeed, two of the top ten autosomal sex-associated CpG sites reported by Numata et al.1 are not targeted by cross-reactive probes and so cannot be explained by this artifact.
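The probe-screening logic can be sketched as follows. This is a simplified stand-in, not the published pipeline. An exhaustive ungapped scan replaces BLAT, and only the forward strand is bisulfite-converted; a full analysis would also convert the reverse strand. `min_identity` is a hypothetical threshold. The end-base rule is applied to the probe's last character, which stands in here for the base adjacent to the single-base extension.

```python
# Simplified sketch (assumptions: forward strand only, ungapped scan instead of
# BLAT, hypothetical identity threshold).


def bisulfite_convert(seq: str, methylated_cpg: bool) -> str:
    """In silico bisulfite conversion of the forward strand.

    Every C becomes T, except that C in a CpG context is kept when
    methylated_cpg is True (the methylated allele).
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        in_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (methylated_cpg and in_cpg):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def candidate_hits(probe: str, genome: str, min_identity: float = 0.9):
    """Return (position, identity) for windows that could cross-hybridize.

    A window qualifies only if its last base equals the probe's last base
    (the end-base requirement) and its identity reaches min_identity.
    """
    k = len(probe)
    hits = []
    for pos in range(len(genome) - k + 1):
        window = genome[pos:pos + k]
        if window[-1] != probe[-1]:
            continue  # terminal base differs: no extension signal expected
        identity = sum(a == b for a, b in zip(probe, window)) / k
        if identity >= min_identity:
            hits.append((pos, identity))
    return hits


if __name__ == "__main__":
    probe = bisulfite_convert("ACGCA", methylated_cpg=True)  # "ACGTA"
    genome = bisulfite_convert("ACGTAGACGTT", methylated_cpg=True)
    print(candidate_hits(probe, genome, min_identity=0.8))  # [(0, 1.0)]
```

In the example, the window `ACGTT` matches the probe at 4 of 5 bases. It is still rejected because its terminal base differs.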
Notably, several other studies have observed true autosomal sex-associated DNA methylation by using targeted molecular approaches.7,8
Table 1 Cross-reactive Autosomal Probes Lead to False Discovery of Sex-Associated DNA Methylation
Our recognition of falsely discovered autosomal sex-associated DNA-methylation sites led us to perform a series of bioinformatic analyses to identify other potential cross-hybridizing targets. We found 6%–10% of the 27,578 probes on the Illumina Infinium 27K microarray to be cross-reactive. These probes can generate false positives and reduce the power of downstream analyses.3 Numata et al.1 reported the top 100 most significant CpG loci associated with developmental stage, age, expression, cis-mQTLs (methylation quantitative trait loci), or trans-mQTLs. Using our previously published list of cross-reactive probes,3 we found a substantial overlap between their cis- and trans-mQTLs and CpG loci targeted by cross-reactive probes (Table 2). The proportions of cross-reactive probes among the cis- and trans-mQTLs (23% and 28%, respectively)1 were significantly higher than in the entire array (6%–10%).3 This raises the possibility that the cross-hybridizing targets overlap SNPs and thereby create spurious signals whose intensities depend on SNP genotype. Cross-reactive probes have multiple targets, so they are more likely than uniquely targeting probes to hybridize to a sequence containing a SNP variant, which could explain the observed enrichment.
Table 2 Proportion of Significant CpGs Targeted by Cross-reactive Probes or SNPs in the Numata et al. Study
Another potential source of error in data from the 27K microarray is probes that target polymorphic CpGs (i.e., CpGs with a SNP at either the cytosine or the guanine).3 In total, this microarray targets 907 (3%) CpG sites that overlap SNPs in dbSNP build 132 (see Web Resources).
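The letter reports the enrichment as significant but does not name the test. One simple way to check such a claim is a one-sided exact binomial test; this is an assumption, not the authors' method. The test treats each top-100 list as 100 draws from the array and uses the conservative 10% upper bound of the array-wide cross-reactive fraction as the null rate.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly by summing the pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts out of the top 100 CpGs, taken from the percentages in the letter.
# The null rate is the upper end (10%) of the 6%-10% array-wide estimate.
NULL_RATE = 0.10
for label, hits in [("cis-mQTL", 23), ("trans-mQTL", 28)]:
    p_value = binomial_upper_tail(hits, 100, NULL_RATE)
    print(f"{label}: {hits}/100 cross-reactive, one-sided P = {p_value:.2e}")
```

Under these assumptions, both tail probabilities fall below 0.001. A stricter analysis would account for the fact that top-ranked CpGs are not a random sample of the array.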
Notably, a large proportion (29%) of the top 100 most significant cis-mQTLs reported by Numata et al.1 correspond to probes that target polymorphic CpGs (i.e., CpG sites overlapped by SNPs) (Table 2). In these cases, the apparent methylation changes are likely to reflect the underlying polymorphism. Non-CG variants of polymorphic CpGs are detected as unmethylated loci, so the measured methylation level tracks the alternate haplotypes of in-cis-associated SNPs.3 Further, the proportion of polymorphic CpGs is 2–4× higher in the top 100 most significant trans-mQTLs (14%), mQTLs in African Americans (9%), and mQTLs in individuals of European descent (referred to as Caucasian in Numata et al.1) (8%) than in the entire array (3%) (Table 2). This further demonstrates the effect of polymorphisms at CpG loci on the evaluation of DNA methylation. It also suggests that any quantitative association involving these CpG loci (e.g., between methylation and gene expression) could be greatly perturbed by the genotypes of the population studied. The existence of cross-reactive probes and polymorphic CpGs on the Illumina Infinium 27K microarray reflects the natural diversity of the human genome, which contains homologous and repetitive sequences and SNPs. Investigators should therefore exercise caution when significant associations are found at CpG sites that are either polymorphic or targeted by cross-reactive probes. Biological interpretation requires validation of the detected methylation by an independent approach such as sodium-bisulfite pyrosequencing. It is concerning that Numata et al.1 and three other studies4–6 reported findings parallel to those previously shown to result from cross-reactive probes.3 False discoveries can lead to inappropriately generated hypotheses or unwarranted inferences of biological significance.
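The simplified model below shows why genotype alone can masquerade as an mQTL at a polymorphic CpG. It assumes, as the letter describes, that an allele carrying a CpG-destroying variant (here a C>T change, written TG) is read as unmethylated. It ignores the intensity offset, probe-specific effects, and strand. The function names are illustrative only.

```python
# Simplified diploid model (an assumption for illustration): an allele whose SNP
# destroys the CpG reads like a bisulfite-converted, unmethylated cytosine.


def overlaps_cpg(cpg_c_pos: int, snp_positions: set[int]) -> bool:
    """True if a SNP falls on the C (cpg_c_pos) or the G (cpg_c_pos + 1) of a CpG."""
    return cpg_c_pos in snp_positions or (cpg_c_pos + 1) in snp_positions


def apparent_beta(true_meth: float, cpg_alleles: int) -> float:
    """Expected beta when only `cpg_alleles` of the two alleles retain the CpG."""
    if cpg_alleles not in (0, 1, 2):
        raise ValueError("cpg_alleles must be 0, 1 or 2 for a diploid genotype")
    return true_meth * cpg_alleles / 2


# A fully methylated CpG looks like a strong "cis-mQTL" driven purely by genotype.
for genotype, cpg_alleles in [("CG/CG", 2), ("CG/TG", 1), ("TG/TG", 0)]:
    print(genotype, apparent_beta(1.0, cpg_alleles))
```

Consider a site that is fully methylated wherever the CpG exists. The three genotypes give betas of 1.0, 0.5, and 0.0, so any SNP in linkage disequilibrium with this variant would appear to be a cis-mQTL.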
Because Illumina Infinium platforms are among the most widely used DNA-methylation microarrays, we hope this letter will serve as a cautionary note for researchers who use them.