CryoDRGN-AI: Neural ab initio reconstruction of challenging cryo-EM and cryo-ET datasets

1: Department of Electrical Engineering, Stanford University, Stanford, CA, USA
2: SLAC National Accelerator Laboratory, Menlo Park, CA, USA
3: Department of Computer Science, Princeton University, Princeton, NJ, USA
4: Department of Physiology and Cellular Biophysics, Columbia University, New York, NY, USA
5: Department of Anesthesiology, Columbia University Irving Medical Center, New York, NY, USA
6: Irving Institute for Clinical and Translational Research, Columbia University, New York, NY, USA
7: Structural Biology Initiative, CUNY Advanced Science Research Center, New York, NY, USA
Code bioRxiv User Guide
teaser

CryoDRGN-AI is a deep learning system for ab initio reconstruction of dynamic biomolecular complexes from cryo-EM and cryo-ET imaging datasets.

Abstract

Proteins and other biomolecules form dynamic macromolecular machines that are tightly orchestrated to move, bind, and perform chemistry. Cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET) can access the intrinsic heterogeneity of these complexes and are therefore key tools for understanding their function. However, 3D reconstruction of the collected imaging data presents a challenging computational problem, especially without any starting information, a setting termed ab initio reconstruction. Here, we introduce cryoDRGN-AI, a method leveraging an expressive neural representation and combining an exhaustive search strategy with gradient-based optimization to process challenging heterogeneous datasets. Using cryoDRGN-AI, we reveal new conformational states in large datasets, reconstruct previously unresolved motions from unfiltered datasets, and demonstrate ab initio reconstruction of biomolecular complexes from in situ data. With this expressive and scalable model for structure determination, we hope to unlock the full potential of cryo-EM and cryo-ET as a high-throughput tool for structural biology and discovery.

Method

method

Architecture overview. CryoDRGN-AI uses an expressive neural network model for ab initio heterogeneous reconstruction. For each observed image, a reconstructed image is generated by a differentiable forward model. The latent variables (poses and conformations) are stored in look-up tables. The whole pipeline can therefore be seen as an autodecoder. Poses are first estimated using hierarchical pose search (HPS) and then refined by stochastic gradient descent (SGD).
pose error

Two-step strategy for pose estimation. We plot the pose error during training on a synthetic 80S ribosome dataset (100,000 particles, 128 × 128, 3.77 Å/pix). Pose error is quantified by the mean out-of-plane angular error (mean and standard deviation over 5 runs). HPS is used during the first five epochs, to provide a good initialization for SGD. Gradient descent is 20x faster and leads to more accurate poses since it is not limited by the resolution of the search grid.

Results

Pre-catalytic Spliceosome (EMPIAR-10180 [1])

spliceosome

CryoDRGN-AI reconstructs the continuous bending motion of the pre-catalytic spliceosome while clustering a set of particle images corresponding to broken or denatured density maps in the latent space. A histogram of the two first principal components of the predicted conformations is shown. . Density maps 1-3 show a continuous bending motion of the molecule. An outlier cluster resembling broken or junk particles and two randomly selected particles images from that cluster. Dashed lines indicate outlines of the extended spliceosome.
[1] Plaschka, C., Lin, P. C., & Nagai, K. (2017). Structure of a pre-catalytic spliceosome. Nature, 546(7660), 617-621.

Assembling Bacterial Large Ribosomal Subunit (EMPIAR-10076 [2])

50S

CryoDRGN-AI recovers the compositional heterogeneity in the assembly states of the bacterial large ribosomal subunit, while also grouping previously unclassified and discarded particles in the latent space. Conformations are embedded in a two-dimensional space with UMAP. The distribution of the published labels is shown in this space with a Gaussian kernel density estimation (KDE) and four of the maps obtained by 𝑘-means clustering (𝑘 = 6). Dashed lines indicate outlines of the fully mature 50S ribosome.
[2] Davis, J. H., Tan, Y. Z., Carragher, B., Potter, C. S., Lyumkis, D., & Williamson, J. R. (2016). Modular assembly of the bacterial large ribosomal subunit. Cell, 167(6), 1610-1622.

SARS-CoV-2 Spike [3]

spike

CryoDRGN-AI reconstructs both the closed and open states of the spike protein fully ab initio and resolve the conformational change of the receptor binding domain. Predicted conformations are clustered with 𝑘-means and the distribution of two first principal components of each cluster is approximated with Gaussian KDE. Two sampled density maps correspond to the closed and open states of the receptor binding domain (RBD) with docked atomic models (PDB:6VXX, PDB:6VYB).
[3] Walls, A. C., Park, Y. J., Tortorici, M. A., Wall, A., McGuire, A. T., & Veesler, D. (2020). Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell, 181(2), 281-292.

DSL1/Snare Complex [4]


CryoDRGN-AI ab initio reconstruction of a highly heterogeneous dataset of the DSL1/SNARE complex. The trajectory was generated by simulating a brownian motion in the PC1/2 space, biased towards the origin with a harmonic potential.
[4] DAmico, K. A., Stanton, A. E., Shirkey, J. D., Travis, S. M., Jeffrey, P. D., & Hughson, F. M. (2024). Structure of a membrane tethering complex incorporating multiple SNAREs. Nature Structural & Molecular Biology, 1-9.

Ankyrin Supercomplex [5]

ankyrin

Heterogeneous reconstruction of the human erythrocyte ankyrin-1 complex and identification of a new "supercomplex" state. PCA on the latent space of predicted conformations and sampled density maps from their corresponding location in latent space. Conformations can be distinguished according to micelle composition. The six classes from Vallese et al. [5] (1 to 6) are recovered. CryoDRGN-AI also reveals the presence of a new "supercomplex" state (7), which simultaneously contains the rhesus heterotrimer (Rh), the aquaporin (AQP1), the band 3-I (B3-I), the band 3-II (B3-II), the band 3-III (B3-III) dimers and an unknown protein Y in the micelle. Y could correspond to the unknown protein reported in Vallese et al. [5].
[5] Vallese, F., Kim, K., Yen, L. Y., Johnston, J. D., Noble, A. J., Calì, T., & Clarke, O. B. (2022). Architecture of the human erythrocyte ankyrin-1 complex. Nature structural & molecular biology, 29(7), 706-718.

BibTeX

...