Abstract: The relative scarcity of results reported by genetic association studies (GASs) has prompted many research directions. Despite the centrality of the concept of association in GASs, refined concepts of association are missing; meanwhile, various feature subset selection methods have become de facto standards for defining multivariate relevance. On the other hand, probabilistic graphical models, including Bayesian networks (BNs), are increasingly popular, as they can learn nontransitive, multivariate, nonlinear relations between complex phenotypic descriptors and heterogeneous explanatory variables. To integrate the advantages of Bayesian statistics and BNs, the Bayesian network based Bayesian multilevel analysis of relevance (BN-BMLA) was proposed. This approach allows the processing of multiple target variables while ensuring scalability and providing a multilevel view of the results of multivariate analysis. This chapter discusses the use of BN-based Bayesian analysis of relevance in exploratory data analysis, optimal decision and study design, and knowledge fusion in the context of GASs.
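In BN-based relevance analysis, strong multivariate relevance to a target is commonly formalized as membership in the target's Markov blanket (its parents, its children, and the children's other parents). The sketch below is a minimal illustration of that idea, not the BN-BMLA implementation: it assumes that posterior DAG samples (for example, from an MCMC sampler over network structures) are already available, represents each DAG as a hypothetical node-to-parents dictionary, and estimates the posterior probability of Markov blanket membership as a sample frequency.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Hypothetical representation: a DAG maps each node to the set of its parents.
DAG = Dict[str, Set[str]]


def markov_blanket(dag: DAG, target: str) -> Set[str]:
    """Parents, children, and the children's other parents of `target`."""
    parents = set(dag.get(target, set()))
    children = {node for node, pa in dag.items() if target in pa}
    spouses: Set[str] = set()
    for child in children:
        spouses |= dag[child]
    return (parents | children | spouses) - {target}


def mbm_posteriors(samples: Iterable[DAG], target: str) -> Dict[str, float]:
    """Estimate P(X in MB(target) | data) as the fraction of sampled DAGs
    in which X belongs to the Markov blanket of `target`."""
    counts: Counter = Counter()
    n = 0
    for dag in samples:
        n += 1
        counts.update(markov_blanket(dag, target))
    return {var: c / n for var, c in counts.items()}


if __name__ == "__main__":
    # Three made-up posterior samples over SNP1..SNP3 and a phenotype "Y".
    samples = [
        {"Y": {"SNP1"}, "SNP1": set(), "SNP2": set(), "SNP3": set()},
        {"Y": {"SNP1", "SNP2"}, "SNP1": set(), "SNP2": set(), "SNP3": set()},
        {"Y": set(), "SNP1": set(), "SNP2": set(), "SNP3": {"Y", "SNP2"}},
    ]
    print(mbm_posteriors(samples, "Y"))
```

In the third sample SNP2 enters the blanket only as a co-parent of SNP3, which is the kind of multivariate, non-marginal relevance that a univariate association test would miss.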
Heroin dependence is a debilitating psychiatric disorder with complex inheritance. Because the dopaminergic system plays a key role in the brain's reward mechanism, which is directly or indirectly targeted by most drugs of abuse, we focus on the effects of, and interactions among, dopaminergic gene variants.
Summary. The greatest challenge facing the health care systems of developed societies is posed by ageing-related, age-dependent diseases. To understand the role of individual genetic variants in the development of an age-dependent disease, we need to become familiar with the ageing process itself, with the variants associated with healthy longevity, and with the variants characteristic of the given population. Within the framework of the National Bionics Program, the Institute of Genomic Medicine and Rare Disorders of Semmelweis University set out to establish the Hungarian Genomic Data Warehouse (Magyar Genomikai Egészségtárház) by cataloging and analyzing the whole-genome sequences and related phenotype data of volunteers who remain healthy despite their advanced age, thereby creating the first Hungarian whole-genome reference database. An important consideration was that the research should also connect to international projects studying healthy ageing, making it possible to harmonize and jointly analyze data from different countries. Of the study participants, 49% are 70–80 years old, 36% are 81–90, and 14% are over 90; the ratio of men to women was 44% to 56%. Nearly half of the participants (46%) live alone. The proportion with higher education is high (46%), 61% of the participants did sports over a long period, and 70% never smoked. The participants' parents also reached advanced ages: the mean age at death was 74.3 years for fathers and 80.47 years for mothers. Our data warehouse is the first that plans to provide access to a Hungarian whole-genome reference database, which is of fundamental importance both in research on genetically determined diseases and phenotypes and in clinical practice. The project's bioinformatic developments support multilevel access to genetic/genomic information by means of privacy-preserving statistical analysis and artificial intelligence methods. Orv Hetil. 2021; 162(27): 1079–1088. Summary.
Genetics has proven to be a successful approach in the study of ageing. To understand the role of each genetic variant in the development of an age-dependent disease, we need to become familiar with the ageing process itself and with population-specific variants. Within the framework of the National Bionics Program, the Institute of Genomic Medicine and Rare Disorders of Semmelweis University set up a data collection, the Hungarian Genomic Data Warehouse, by cataloging and analyzing the complete genome sequences and related phenotype data of healthy volunteers; the collection also serves as a national Hungarian reference genomic database. The structure of the data warehouse allows interoperability with the most important international research projects on ageing. Of the participants in the Hungarian Genomic Data Warehouse, 49% were 70–80 years old, 36% were 81–90, and 14% were over 90. The ratio of men to women was 44% to 56%. The proportion of people with higher education is high (46%), 61% of the participants played sports for a long time, and 70% never smoked. The participants' parents also lived to an advanced age, with an average age at death of 74.3 years for fathers and 80.47 years for mothers. The Hungarian Genomic Data Warehouse can provide vital and timely support for personalized medicine, especially in the research and diagnosis of genetically inherited disorders. The long-term goal of these bioinformatic developments is to provide multilevel access to genomic data using privacy-preserving data analysis methods.
Electronic, nanopore-based, single-molecule, real-time DNA sequencing technology offers very long, albeit lower-accuracy, reads, in sharp contrast to existing next-generation sequencing methods, which offer short, high-accuracy reads in abundance. We provide a systematic review of the error characteristics of this new sequencing platform and demonstrate the most challenging aspects of whole-gene sequencing through the human HLA-DQA2 gene, using long-range PCR products from multiplexed samples. We consider the limitations these errors impose on applications of this technology, and also indicate prospective improvements and expected thresholds with respect to these errors.
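Read-level error characteristics are usually reported by separating substitutions from insertions and deletions. A minimal sketch, assuming the reads have already been aligned to the reference (for example, the HLA-DQA2 amplicon) and that the aligner emits SAM extended CIGAR strings, in which `=` marks a matching base and `X` a mismatch; the rates are normalized by alignment columns, which is one of several possible conventions. The function name is hypothetical.

```python
import re
from typing import Dict

# SAM CIGAR operations: M, I, D, N, S, H, P, =, X
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def error_profile(cigar: str) -> Dict[str, float]:
    """Per-column error rates from an extended CIGAR using '=' and 'X'."""
    counts = {op: 0 for op in "MIDNSHP=X"}
    consumed = 0
    for length, op in CIGAR_RE.findall(cigar):
        counts[op] += int(length)
        consumed += len(length) + 1
    if consumed != len(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    if counts["M"]:
        # Plain 'M' mixes matches and mismatches, so substitutions are hidden.
        raise ValueError("'M' hides mismatches; use an extended CIGAR with '='/'X'")
    # Denominator: alignment columns (matches, mismatches, insertions, deletions);
    # soft/hard clips and skipped regions are excluded.
    columns = counts["="] + counts["X"] + counts["I"] + counts["D"]
    if columns == 0:
        raise ValueError("no aligned columns")
    return {
        "substitution": counts["X"] / columns,
        "insertion": counts["I"] / columns,
        "deletion": counts["D"] / columns,
        "identity": counts["="] / columns,
    }


if __name__ == "__main__":
    print(error_profile("10S90=5X3I2D"))
```

Keeping the three error classes apart matters for nanopore data because insertions and deletions, not substitutions, tend to dominate and call for different downstream corrections.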
Abstract: Synaptosomal-associated protein 25 (SNAP-25) plays a crucial role in exocytosis. Single nucleotide polymorphisms (rs3746544 and rs1051312) in the 3′ untranslated region of the SNAP-25 gene have been reported to be associated with attention-deficit hyperactivity disorder. As the disorder affects millions of children worldwide, understanding its genetic background is of crucial importance. Efficient and reliable PCR-RFLP protocols were developed for genotyping the rs3746544 and rs1051312 SNPs, employing a high-throughput capillary electrophoresis method for fragment analysis. A novel real-time PCR-based technique applying sequence-specific TaqMan probes was used to haplotype the two SNPs, and the G–C haplotype could not be detected in a large Caucasian population (N = 1376). These findings were confirmed by molecular biology tools as well as by the PHASE Bayesian computational approach. In silico analyses suggested that the two SNPs might alter microRNA binding and thus affect SNAP-25 production. We demonstrated that this biological information can be revealed only by direct haplotype analysis, emphasizing the importance of our novel molecular haplotype analysis protocol. The results of this study of the two SNPs might shed light on the association of SNAP-25 variants with pathological phenotypes at the molecular level.
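Why direct haplotyping matters: when two SNPs are genotyped separately, an individual who is heterozygous at both loci is compatible with two different phasings, so the presence or absence of a particular haplotype cannot be read off unphased genotypes and must either be inferred statistically (as PHASE does) or measured directly. The sketch below enumerates the phasings consistent with a two-SNP genotype and lists haplotypes never seen in a phased sample. The allele labels (G/T and C/T) and function names are illustrative assumptions, not the actual alleles or tools used in the study.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Genotype = Tuple[str, str]   # unordered alleles at one SNP, e.g. ("G", "T")
Haplotype = Tuple[str, str]  # (allele at SNP 1, allele at SNP 2) on one chromosome


def compatible_phasings(g1: Genotype, g2: Genotype) -> Set[Tuple[Haplotype, ...]]:
    """All unordered haplotype pairs consistent with unphased genotypes g1, g2."""
    phasings = set()
    for i, j in product(range(2), repeat=2):
        h1 = (g1[i], g2[j])
        h2 = (g1[1 - i], g2[1 - j])
        # Sort so that (h1, h2) and (h2, h1) count as the same phasing.
        phasings.add(tuple(sorted((h1, h2))))
    return phasings


def absent_haplotypes(
    phased: Iterable[Tuple[Haplotype, Haplotype]], alleles1: str, alleles2: str
) -> Set[Haplotype]:
    """Haplotypes possible from the allele sets but never observed in phased data."""
    observed = {h for pair in phased for h in pair}
    return {h for h in product(alleles1, alleles2) if h not in observed}


if __name__ == "__main__":
    # Illustrative alleles only: SNP 1 = G/T, SNP 2 = C/T.
    print(len(compatible_phasings(("G", "T"), ("C", "T"))))  # double heterozygote
    print(len(compatible_phasings(("G", "G"), ("C", "T"))))  # single heterozygote
```

The double heterozygote yields two candidate phasings (G–C with T–T, or G–T with T–C), which is exactly the ambiguity that a probe-based direct haplotyping assay resolves.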
The low concordance between different variant calling methods still poses a challenge for the widespread application of next-generation sequencing in research and clinical practice. A wide range of variant annotations can be used to filter call sets in order to improve the precision of the variant calls, but choosing appropriate filtering thresholds is not straightforward. Variant quality score recalibration provides an alternative to hard filtering, but it requires large-scale genomic data. We evaluated germline variant calling pipelines based on the BWA and Bowtie 2 aligners in combination with the GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes and SAMtools variant callers, using simulated and real benchmark sequencing data (NA12878 with Illumina Platinum Genomes). We argue that these pipelines are not merely discordant but extract complementary, useful information. We introduce VariantMetaCaller to test the hypothesis that automated fusion of measurement-related information yields better performance than the recommended hard-filtering settings, recalibration, or fusion of the individual call sets without annotations. VariantMetaCaller uses Support Vector Machines to combine multiple information sources generated by variant calling pipelines and estimates the probability of each variant. This novel method had significantly higher sensitivity and precision than the individual variant callers for all target region sizes, ranging from a few hundred kilobases to whole exomes. We also demonstrated that VariantMetaCaller supports quantitative, precision-based filtering of variants under a wider range of conditions. Specifically, the computed variant probabilities can be used to rank the variants and, for a given threshold, to estimate precision.
Precision can then be translated directly into the number of true called variants or, equivalently, the number of false calls, which makes it possible to find a problem-specific balance between sensitivity and precision. VariantMetaCaller can be applied to small target regions as well as whole exomes, and it can be used for organisms for which highly accurate variant call sets are not yet available; it can therefore be a viable alternative to hard filtering in cases where variant quality score recalibration cannot be used. VariantMetaCaller is freely available at http://bioinformatics.mit.bme.hu/VariantMetaCaller .
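The step from ranked probabilities to precision and to counts of true and false calls can be made concrete. A minimal sketch, not VariantMetaCaller's code: it assumes that each call already carries a well-calibrated probability of being a true variant, so that among the calls kept at a threshold the expected number of true calls is the sum of their probabilities, the expected number of false calls is the remainder, and the expected precision is their mean. Both helper names are hypothetical.

```python
import math
from typing import Dict, Sequence


def expected_call_quality(probs: Sequence[float], threshold: float) -> Dict[str, float]:
    """Expected true/false calls and precision for calls with probability >= threshold.

    Assumes calibrated probabilities: E[#true] = sum of kept probabilities.
    """
    kept = [p for p in probs if p >= threshold]
    if not kept:
        return {"n_called": 0, "expected_true": 0.0,
                "expected_false": 0.0, "expected_precision": math.nan}
    expected_true = sum(kept)
    return {
        "n_called": len(kept),
        "expected_true": expected_true,
        "expected_false": len(kept) - expected_true,
        "expected_precision": expected_true / len(kept),
    }


def max_calls_at_precision(probs: Sequence[float], target: float) -> int:
    """Largest number of top-ranked calls whose expected precision is >= target."""
    ranked = sorted(probs, reverse=True)
    best_k, running = 0, 0.0
    for k, p in enumerate(ranked, start=1):
        running += p
        # The running mean of a descending list never increases, so the last
        # k that meets the target is the largest admissible call set.
        if running / k >= target:
            best_k = k
    return best_k


if __name__ == "__main__":
    probs = [0.99, 0.95, 0.9, 0.6, 0.2]  # made-up variant probabilities
    print(expected_call_quality(probs, 0.5))
    print(max_calls_at_precision(probs, 0.9))
```

Choosing a target precision rather than a raw score cutoff is what makes the trade-off problem-specific: a clinical panel might demand a high target and accept fewer calls, while an exploratory screen can lower it to gain sensitivity.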