This page offers all data associated with our paper "Expanding the Atlas of Functional Missense Variants in Human Genes" (Weile et al., submitted) for download. If you are interested in using any of our data, please cite our paper and feel free to contact us. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The authors gratefully acknowledge funding by the National Institutes of Health and the National Human Genome Research Institute (NIH/NHGRI) Center of Excellence in Genomic Science (CEGS) Initiative, the Canadian Excellence Research Chair (CERC), the Ontario Ministry of Research and Innovation (MRI), and the Canadian Institute for Advanced Research (CIFAR).
Although we now routinely sequence human genomes, we still cannot confidently identify which variants are functional. Here we developed a deep mutational scanning framework that produces exhaustive maps for human missense variants by combining random codon mutagenesis and multiplexed functional variation assays with computational imputation and regularization. We applied this framework to four proteins corresponding to six human genes: UBE2I (encoding the SUMO E2 conjugase), SUMO1 (small ubiquitin-like modifier), TPK1 (thiamin pyrophosphokinase), and CALM1/2/3 (three genes encoding the protein calmodulin). The resulting maps recapitulate known protein features and can confidently identify pathogenic variation. Analysis of large-scale phenotypic screens suggests that assays potentially amenable to deep mutational scanning are already available for 57% of human disease genes.
We employ a modular workflow encompassing five phases: (1) Mutagenesis; (2) Library generation; (3) Selection of functional variants; (4) Assay readout; and (5) Imputation and Regularization. Mutagenesis was performed using our custom POPCode mutagenesis protocol, which ensures even coverage across all possible amino acid changes, including those not reachable by single-nucleotide variants. Libraries were then generated by en-masse Gateway cloning into pools of barcoded complementation and yeast two-hybrid (Y2H) vectors. The genotype and barcode identity of each clone in the library were then determined using kiloSEQ. Functional variants were selected through yeast complementation and Y2H assays, which couple yeast fitness to, respectively, the overall functionality of the protein and its ability to interact with other proteins. The readout of these pooled competitive growth assays was then obtained via two different sequencing methods: Barcode Sequencing (BarSEQ) and TileSEQ. Finally, fitness scores were calculated and brought to a common scale, missing data were imputed using machine learning, and less confidently measured data points were regularized. See here for extended methodological details.
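The final scoring phase can be illustrated with a minimal Python sketch. It assumes that a raw fitness score is the log-ratio of a variant's frequency after versus before selection, that the common scale maps the median nonsense variant to 0 and the median synonymous variant to 1, and that regularization is an inverse-variance weighted average of a measured and an imputed score. These modeling choices, the function names (`raw_score`, `rescale`, `regularize`), the pseudocount and the toy counts are all hypothetical; this is not presented as the exact procedure used in the paper.

```python
import math
from statistics import median


def raw_score(pre_count, post_count, pre_total, post_total, pseudo=0.5):
    """Log-ratio of a variant's frequency after vs. before selection.

    The pseudocount (an assumed value) avoids taking the log of zero.
    """
    pre_freq = (pre_count + pseudo) / pre_total
    post_freq = (post_count + pseudo) / post_total
    return math.log(post_freq / pre_freq)


def rescale(scores, nonsense_ids, synonymous_ids):
    """Affinely map scores so the median nonsense variant is 0 and the
    median synonymous variant is 1 (assumed common scale)."""
    null = median(scores[v] for v in nonsense_ids)
    wt = median(scores[v] for v in synonymous_ids)
    return {v: (s - null) / (wt - null) for v, s in scores.items()}


def regularize(measured, measured_se, imputed, imputed_se):
    """Inverse-variance weighted average of a measured and an imputed score
    (assumed scheme); returns the combined value and its standard error."""
    w_m = 1.0 / measured_se ** 2
    w_i = 1.0 / imputed_se ** 2
    value = (w_m * measured + w_i * imputed) / (w_m + w_i)
    return value, (w_m + w_i) ** -0.5


# Hypothetical toy counts: variant -> (count before selection, count after selection).
counts = {
    "syn1": (100, 100), "syn2": (120, 118),
    "stop1": (90, 9), "stop2": (110, 12),
    "mis1": (80, 40),
}
pre_total = sum(pre for pre, _ in counts.values())
post_total = sum(post for _, post in counts.values())

raw = {v: raw_score(pre, post, pre_total, post_total) for v, (pre, post) in counts.items()}
scaled = rescale(raw, ["stop1", "stop2"], ["syn1", "syn2"])
print({v: round(s, 3) for v, s in scaled.items()})

# A noisy measurement (large SE) is pulled toward a more confident imputed value.
print(regularize(measured=0.2, measured_se=0.5, imputed=0.8, imputed_se=0.1))
```

Because the rescaling is affine, it preserves the ordering of variants while making scores from different assays or genes comparable on a shared null-to-wild-type axis; the weighting in `regularize` lets well-measured variants keep their experimental values while poorly measured ones lean on the imputation.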
The data below is free to use under a Creative Commons Attribution-ShareAlike 4.0 International License. Software elements are free to use under the GNU Lesser General Public License 3.0.
UBE2I Complementation Timeseries BarSEQ counts
UBE2I Complementation TileSEQ counts
SUMO1 Complementation TileSEQ counts
TPK1 Complementation TileSEQ counts
CALM1/2/3 Complementation TileSEQ counts
UBE2I Y2H BarSEQ scores (per clone)
UBE2I Y2H BarSEQ scores (per mutation)
UBE2I Complementation Timeseries BarSEQ scores (per clone)
UBE2I Complementation Timeseries BarSEQ scores (per mutation)
UBE2I Complementation TileSEQ scores
SUMO1 Complementation TileSEQ scores
UBE2I Complementation TileSEQ scores re-scaled
SUMO1 Complementation TileSEQ scores re-scaled
TPK1 Complementation TileSEQ scores re-scaled
CALM1/2/3 Complementation TileSEQ scores re-scaled
UBE2I Complementation joint scores
UBE2I Complementation machine learning features
UBE2I Complementation scores (imputed and regularized)
UBE2I Complementation scores (flipped, imputed and regularized)
SUMO1 Complementation machine learning features
SUMO1 Complementation scores (imputed and regularized)
SUMO1 Complementation scores (flipped, imputed and regularized)
TPK1 Complementation machine learning features
TPK1 Complementation scores (imputed and regularized)
TPK1 Complementation scores (flipped, imputed and regularized)
CALM1/2/3 Complementation machine learning features
CALM1/2/3 Complementation scores (imputed and regularized)
CALM1/2/3 Complementation scores (flipped, imputed and regularized)
Calmodulin variant pathogenicity calls
UBE2I intragenic epistasis
UBE2I intramolecular distance matrix
UBE2I genetic interactions vs. intramolecular distances
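The last three datasets relate pairwise genetic interactions within UBE2I to the distances between residues in the folded protein. The kind of analysis they support can be sketched in Python as follows. The sketch assumes a multiplicative null model (epistasis is the observed double-mutant fitness minus the product of the two single-mutant fitness values) and distances measured between one representative 3D coordinate per residue. The coordinates, fitness values, the 8 Å cutoff and the helper names are hypothetical, and the paper's exact definitions may differ.

```python
import math
from statistics import mean


def distance_matrix(coords):
    """Pairwise Euclidean distances between per-residue 3D coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def epistasis(w_a, w_b, w_ab):
    """Deviation of double-mutant fitness from the multiplicative expectation
    (assumed null model)."""
    return w_ab - w_a * w_b


# Hypothetical coordinates (in angstroms) for four residues, one point each.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 12.0, 0.0)]
dist = distance_matrix(coords)

# Hypothetical single- and double-mutant fitness values, keyed by residue index.
single = {0: 0.8, 1: 0.5, 2: 0.9, 3: 0.7}
double = {(0, 1): 0.1, (0, 3): 0.55, (1, 2): 0.2, (2, 3): 0.62}

pairs = []  # (distance, epistasis) for each double mutant
for (i, j), w_ij in double.items():
    eps = epistasis(single[i], single[j], w_ij)
    pairs.append((dist[i][j], eps))
    print(f"residues {i}-{j}: distance {dist[i][j]:.1f} A, epistasis {eps:+.3f}")

# Compare interaction strength for nearby vs. distant pairs (8 A cutoff is arbitrary).
close = [abs(e) for d, e in pairs if d < 8.0]
far = [abs(e) for d, e in pairs if d >= 8.0]
print(f"mean |epistasis|: close {mean(close):.3f}, far {mean(far):.3f}")
```

In this toy example the strong negative interactions were deliberately placed on adjacent residues, so the nearby pairs show larger mean absolute epistasis; with real data, a comparison like this is one simple way to ask whether strong genetic interactions concentrate between residues that are close in the structure.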