Abdallah Amr Mahmoud

Author Statistics

Papers

Citation

H-Index

i-10 index

Co-Authors

Biljana Novković

Hokkaido University

Andrew Terpolovsky

Varuna Bamunusinghe

Karatuğ Ozan Bircan

European Bioinformatics Institute

Puya G. Yazdi

University of California, Irvine

Madhuchanda Bose

Augusta University

Adriano De Marino

Igenomix

Umar S. Khan

National University of Sciences and Technology

Sandra Bohn

University of Southern Mississippi

Manfred Grabherr

Inland Norway University of Applied Sciences

Cooperative Institutions

University of Florida

Harvard University

National University of Medical Sciences

Scripps Research Institute

Scripps (United States)

Scripps Institution of Oceanography

La Jolla Alcohol Research

Scripps Health

European Bioinformatics Institute

University of Southern Mississippi

Research Field

SumStatsRehab: an efficient algorithm for GWAS summary statistics assessment and restoration

BMC Bioinformatics (2022)

Mykyta Matushyn, Madhuchanda Bose, Abdallah Amr Mahmoud, Lewis Cuthbertson, Carlos Tello

Generating polygenic risk scores for diseases and complex traits requires high quality GWAS summary statistic files. Often, these files can be difficult to acquire either as a result of unshared or incomplete data. To date, bioinformatics tools which focus on restoring missing columns containing identification and association data are limited, which has the potential to increase the number of usable GWAS summary statistics files. SumStatsRehab was able to restore rsID, effect/other alleles, chromosome, base pair position, effect allele frequencies, beta, standard error, and p-values to a better extent than any other currently available tool, with minimal loss. SumStatsRehab offers a unique tool utilizing both functional programming and pipeline-like architecture, allowing users to generate accurate data restorations for incomplete summary statistics files. This, in turn, increases the number of usable GWAS summary statistics files, which may be invaluable for less researched health traits.

Genome-wide Association Study

Summary statistics

Statistic

USable

Identification

10.1186/s12859-022-04920-7

Citations (6)

SumStatsRehab: An Efficient Algorithm for GWAS Summary Statistics Assessment and Restoration

Research Square (2022)

Puya Yazdi, Manfred Grabherr, Biljana Novković, Umar Khan, Varuna Bamunusinghe

Background: Generating polygenic risk scores for diseases and complex traits requires high quality GWAS summary statistic files. Often, these files can be difficult to acquire either as a result of unshared or incomplete data. To date, bioinformatics tools which focus on restoring missing columns containing identification and association data are limited, which has the potential to increase the number of usable GWAS summary statistics files. Results: SumStatsRehab was able to restore rsID, effect/other alleles, chromosome, base pair position, effect allele frequencies, beta, standard error, and p-values to a better extent than any other currently available tool, with minimal loss. Conclusions: SumStatsRehab offers a unique tool utilizing both functional programming and pipeline-like architecture, allowing users to generate accurate data restorations for incomplete summary statistics files. This, in turn, increases the number of usable GWAS summary statistics files, which may be invaluable for less researched health traits.

Genome-wide Association Study

Summary statistics

Statistic

USable

Identification

10.21203/rs.3.rs-1359902/v1

Citations (0)

A comparative analysis of current phasing and imputation software

PLoS ONE (2022)

Adriano De Marino, Abdallah Amr Mahmoud, Madhuchanda Bose, Karatuğ Ozan Bircan, Andrew Terpolovsky

Whole-genome data has become significantly more accessible over the last two decades. This can largely be attributed to both reduced sequencing costs and imputation models which make it possible to obtain nearly whole-genome data from less expensive genotyping methods, such as microarray chips. Although there are many different approaches to imputation, the Hidden Markov Model (HMM) remains the most widely used. In this study, we compared the latest versions of the most popular HMM-based tools for phasing and imputation: Beagle5.4, Eagle2.4.1, Shapeit4, Impute5 and Minimac4. We benchmarked them on four input datasets with three levels of chip density. We assessed each imputation software on the basis of accuracy, speed and memory usage, and showed how the choice of imputation accuracy metric can result in different interpretations. The highest average concordance rate was achieved by Beagle5.4, followed by Impute5 and Minimac4, using a reference-based approach during phasing and the highest density chip. IQS and R² metrics revealed that Impute5 and Minimac4 obtained better results for low frequency markers, while Beagle5.4 remained more accurate for common markers (MAF>5%). Computational load as measured by run time was lower for Beagle5.4 than Minimac4 and Impute5, while Minimac4 utilized the least memory of the imputation tools we compared. ShapeIT4 used the least memory of the phasing tools examined with genotype chip data, while Eagle2.4.1 used the least memory phasing WGS data. Finally, we determined the combination of phasing software, imputation software, and reference panel best suited for different situations and analysis needs, and created an automated pipeline that provides a way for users to create customized chips designed to optimize their imputation results.

Imputation (statistics)

10.1371/journal.pone.0260177

Citations (20)

A comparative analysis of current phasing and imputation software

bioRxiv (Cold Spring Harbor Laboratory) (2021)

Adriano De Marino, Abdallah Amr Mahmoud, Madhuchanda Bose, Karatuğ Ozan Bircan, Andrew Terpolovsky

Whole-genome data has become significantly more accessible over the last two decades. This can largely be attributed to both reduced sequencing costs and imputation models which make it possible to obtain nearly whole-genome data from less expensive genotyping methods, such as microarray chips. Although there are many different approaches to imputation, the Hidden Markov Model remains the most widely used. In this study, we compared the latest versions of the most popular Hidden Markov Model-based tools for phasing and imputation: Beagle 5.2, Eagle 2.4.1, Shapeit 4, Impute 5 and Minimac 4. We benchmarked them on three input datasets with three levels of chip density. We assessed each imputation software on the basis of accuracy, speed and memory usage, and showed how the choice of imputation accuracy metric can result in different interpretations. The highest average concordance rate was achieved by Beagle 5.2, followed by Impute 5 and Minimac 4, using a reference-based approach during phasing and the highest density chip. IQS and R² metrics revealed that IMPUTE5 obtained better results for low frequency markers, while Beagle 5.2 remained more accurate for common markers (MAF>5%). Computational load as measured by run time was lower for Beagle 5.2 than Impute 5 and Minimac 4, while Minimac utilized the least memory of the imputation tools we compared. ShapeIT 4 used the least memory of the phasing tools examined, even with the highest density chip. Finally, we determined the combination of phasing software, imputation software, and reference panel best suited for different situations and analysis needs, and created an automated pipeline that provides a way for users to create customized chips designed to optimize their imputation results.

Imputation (statistics)

Beagle

Concordance

Phaser

10.1101/2021.11.04.467340

Citations (4)

Empowering GWAS Discovery through Enhanced Genotype Imputation

medRxiv (Cold Spring Harbor Laboratory) (2023)

Adriano De Marino, Abdallah Amr Mahmoud, Sandra Bohn, Jon Lerga-Jaso, Biljana Novković

Genotype imputation, crucial in genomics research, often faces accuracy limitations, notably for rarer variants. Leveraging data from the 1000 Genomes Project, TOPMed and UK Biobank, we demonstrate that Selphi, our novel imputation method, significantly outperforms Beagle5.4, Minimac4 and IMPUTE5 across various metrics (12.5%-26.5% as measured by error count) and allele frequencies (13.0%-27.1% for low-frequency variants). This improvement in accuracy boosts variant discovery in GWAS and improves polygenic risk scores.

Imputation (statistics)

Genome-wide Association Study

1000 Genomes Project

Minor allele frequency

10.1101/2023.12.18.23300143

Citations (0)

Retracing Human Genetic Histories and Natural Selection Using Precise Local Ancestry Inference

bioRxiv (Cold Spring Harbor Laboratory) (2023)

Jon Lerga-Jaso, Biljana Novković, Deepu Unnikrishnan, Varuna Bamunusinghe, Marcelinus R. Hatorangan

In an increasingly diverse world, including admixed individuals in genomic studies is imperative for equity and portability. A crucial first step is precise local ancestry inference (LAI). We have developed Orchestra, a LAI model with unprecedented accuracy, and trained on over 10,000 single-origin individuals from 35 worldwide populations. We employed Orchestra to delve into genetic relationships and demographic histories, with a focus on Latin Americans, a prime example of admixture, and the Ashkenazi Jewish, whose origins have long been debated. Finally, Orchestra enabled us to map signatures of selection, notably identifying trace Scandinavian ancestry in British samples and unveiling an immune-rich region linked to respiratory infections. Our work advances the field of LAI and holds promise for improvements in future applications for admixed populations.

One-Sentence Summary: Orchestra unveils Latino and Ashkenazi ancestral roots and a candidate Viking locus under selection in the British population.

10.1101/2023.09.11.557177

Citations (1)