Data from: The architecture of an empirical genotype-phenotype map

2018 
Recent advances in high-throughput technologies are bringing the study of empirical genotype-phenotype (GP) maps to the fore. Here, we use data from protein binding microarrays to study an empirical GP map of transcription factor (TF) binding preferences. In this map, each genotype is a DNA sequence. The phenotype of this DNA sequence is its ability to bind one or more TFs. We study this GP map using genotype networks, in which nodes represent genotypes with the same phenotype, and edges connect nodes if their genotypes differ by a single small mutation. We describe the structure and arrangement of genotype networks within the space of all possible binding sites for 525 TFs from three eukaryotic species encompassing three kingdoms of life (animal, plant, and fungi). We thus provide a high-resolution depiction of the architecture of an empirical GP map. Among a number of findings, we show that these genotype networks are “small-world” and assortative, and that they ubiquitously overlap and interface with one another. We also use polymorphism data from Arabidopsis thaliana to show how genotype network structure influences the evolution of TF binding sites in vivo. We discuss our findings in the context of regulatory evolution.

This DRYAD package contains files from: Aguilar-Rodriguez, J., Peel, L., Stella, M., Wagner, A., and Payne, J. L. The architecture of an empirical genotype-phenotype map. This package contains the network files in GML format for the genotype space of transcription factor (TF) binding sites ('genotype_space.gml'), 525 genotype networks of TF binding sites, and 66 genotype networks of DNA binding domains. The genotype networks of TF binding sites are organized into three directories by species of origin ('Arabidopsis_thaliana', 'Mus_musculus', and 'Neurospora_crassa'). Each network file is named after its TF. More information about these networks can be found in Table S1.
The genotype networks of DNA binding domains are in a 'domains' sub-folder inside each of the three species folders. Each file is named after its DNA binding domain class.

Each network file has the following vertex attributes:
- id: vertex identification number.
- sequence: the nucleotide sequence of the binding site.
- reversecomplement: the reverse complement of 'sequence'.

Genotype networks of TF binding sites have the following additional vertex attributes:
- Escore: the enrichment score of the sequence in protein binding microarrays.
- PartitionSBM: the stochastic block model partition group in which the vertex is found: '0', '1', or 'None'. 'None' marks vertices not found in the dominant genotype network.
- PartitionBA: the binding affinity partition group in which the vertex is found: '0', '1', or 'None'. 'None' marks vertices not found in the dominant genotype network.

For questions regarding these data, contact Joshua Payne at joshua.payne@env.ethz.ch or Andreas Wagner at andreas.wagner@ieu.uzh.ch.

dryad.zip
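
To make the vertex attributes and the edge definition concrete, here is a minimal sketch in plain Python. It is not the authors' code and does not read the GML files. The function names (reverse_complement, hamming, point_mutation_edges) are hypothetical, and the toy sequences are made up rather than taken from the data set. As a simplifying assumption, it treats a "single small mutation" as exactly one nucleotide substitution. The mutation types used to build the published networks may be broader.

```python
from itertools import combinations

# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def point_mutation_edges(vertices):
    """Connect vertices whose sequences differ by exactly one substitution.

    Assumption: only substitutions count as a single small mutation here.
    """
    return [
        (u["id"], v["id"])
        for u, v in combinations(vertices, 2)
        if hamming(u["sequence"], v["sequence"]) == 1
    ]


# Made-up sequences for illustration only; not taken from the data set.
toy_vertices = [
    {"id": 0, "sequence": "ACGTACGT"},
    {"id": 1, "sequence": "ACGTACGA"},
    {"id": 2, "sequence": "ACGTTCGA"},
    {"id": 3, "sequence": "TTTTTTTT"},
]

# Fill in the 'reversecomplement' attribute the same way the files describe it.
for v in toy_vertices:
    v["reversecomplement"] = reverse_complement(v["sequence"])

edges = point_mutation_edges(toy_vertices)
print(edges)
```

In this toy example, vertices 0 and 1 and vertices 1 and 2 are neighbors, while vertex 3 is isolated. The pairwise comparison is quadratic in the number of vertices, which is fine for a sketch but would be slow for a full genotype space.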