Accurate molecular property prediction is important across all fields of chemistry. Deep neural networks (DNNs) have become increasingly popular due to their ability to train automatically, avoiding the incredibly tedious process of constructing and extending traditional property estimation schemes. However, DNNs require large amounts of training data, are challenging to interpret, require large amounts of memory to load even during inference, and have severe difficulties incorporating qualitative chemical knowledge, which are often desired for molecular property prediction tasks. Here we present PySIDT (https://github.com/zadorlab/PySIDT), a software for training and running inference on Subgraph Isomorphic Decision Trees (SIDTs). SIDTs are graph-based decision trees made of nodes associated with molecular substructures. Inference is done by descending target molecular structures down the decision tree to nodes with matching subgraph isomorphic substructures and making predictions based on the final (most specific) nodes matched. SIDTs scale down well to dataset sizes much smaller than is feasible for DNNs. As trees of molecular substructures, SIDTs are inherently readable and easy to visualize, making them easy to analyze. They are also straightforward to extend and retrain, facilitate uncertainty estimation, and enable easy integration of expert knowledge. We demonstrate the SIDT approach discussing its application to a diverse range of molecular prediction tasks: rate coefficient estimation, diffusion coefficient estimation, thermochemistry estimation, transition state bond stretch prediction, pKa prediction, stability of molecular structures, stability of surface structures, and prediction of surface lateral interaction energetics. Additionally, we demonstrate the power of the SIDT algorithms in two direct learning curve vanilla comparisons with the popular DNN-based software Chemprop and the popular gradient boosted trees-based software XGBoost on enthalpy of formation and rate coefficient prediction tasks. In particular, in the enthalpy of formation case, vanilla PySIDT is able to outperform vanilla Chemprop and XGBoost across the full range of training/validation set sizes out to 11,560 data points.
Johnson, M. S., Pang, H.-W., Doner, A. C., Green, W. H., Zádor, J.: PySIDT: Subgraph Isomorphic Decision Trees for Molecular Property Prediction. J. Phys. Chem. A, 2025, 129, 44, 10228–10239 https://doi.org/10.1021/acs.jpca.5c04483
Microkinetic models (MKMs) are widely used within the computational heterogeneous catalysis community to investigate complex reaction mechanisms, to rationalize experimental trends, and to accelerate the rational design of novel catalysts. However, constructing these models requires computationally expensive and manually tedious density functional theory (DFT) calculations for identifying transition states for each elementary reaction within the MKM. To address these challenges, we demonstrate a novel protocol that uses the open-source kinetics workflow tool Pynta to automate the iterative training of a reactive machine learning potential (rMLP). Specifically, using the silver-catalyzed partial oxidation of methanol as a prototypical example, we first demonstrate our workflow by training an rMLP to accelerate the parallel calculation of DFT-quality transition states for all 53 reactions, achieving a 7× speedup compared to a DFT-only strategy. Detailed analysis of our training curriculum reveals the shortcomings of using an adaptive sampling scheme with a single rMLP model to describe all reactions within the MKM simultaneously. We show that these limitations can be overcome using a balanced “reaction class” approach that uses multiple rMLP models, each describing a single class of similar transition states. Finally, we demonstrate that our Pynta-based workflow is also compatible with large pretrained foundational models. For example, by fine-tuning a top-performing graph neural network potential trained on the OC20 dataset, we observe an impressive 20× speedup with an 89% success rate in identifying transition states. This work highlights the synergistic potential of integrating automated tools with machine learning to advance catalysis research.
Price, T., Sivakumar, S., Johnson, M. S., Zádor, J., Kulkarni, A.: An Automated Pynta-based Curriculum for ML-Accelerated Calculation of Transition States. J. Phys. Chem. C 2025, 129, 16, 7751–7761 https://doi.org/10.1021/acs.jpcc.5c00305
A proper representation of chemical kinetics is vital to understanding, modeling, and optimizing many important chemical processes. In liquid and surface phases, where diffusion is slow, the rate at which the reactants diffuse together limits the overall rate of many elementary reactions. Commonly, the textbook Smoluchowski theory is utilized to estimate effective rate coefficients in the liquid phase. On surfaces, modelers commonly resort to much more complex and expensive Kinetic Monte Carlo (KMC) simulations. Here, we extend the Smoluchowski model to allow the diffusing species to undergo chemical reactions and derive analytical formulas for the diffusion-limited rate coefficients for 3D, 2D, and 2D/3D interface cases. With these equations, we are able to demonstrate that when species react faster than they diffuse they can react orders of magnitude faster than predicted by Smoluchowski theory, through what we term “the reactive transport effect”. We validate the derived steady-state equations against particle Monte Carlo (PMC) simulations, KMC simulations, and non-steady-state solutions. Furthermore, using PMC and KMC simulations, we propose corrections that agree with all limits and the computed data for the 2D and 2D/3D interface steady-state equations, accounting for unique limitations in the associated derived equations. Additionally, we derive equations to handle couplings between diffusion-limited rate coefficients in reaction networks. We believe these equations should make it possible to run much more accurate mean-field simulations of liquids, surfaces, and liquid–surface interfaces accounting for diffusion limitations and the reactive transport effect.
Johnson, M. S., Mueller, J. N., Daniels, C., Najm, H. N., Zádor, J.: Diffusion limited kinetics in reactive systems. J. Phys. Chem. A, 2024, 128, 18, 3685–3702 https://doi.org/10.1021/acs.jpca.4c00727
We present TChem, a performance portable software toolkit to enable the analysis of complex kinetic mechanisms. The software provides tools for gas-phase and surface chemistry, thermodynamic properties, and implements formulations for several canonical reactor models. Analytical derivatives necessary to construct Jacobian matrices corresponding to all implemented functionalities are available through automatic differentiation. TChem uses the Kokkos framework to achieve portability across multiple heterogeneous computing platforms with a single version of the code. We implement a hierarchical parallelism framework to enable efficient chemical source term and thermodynamic property evaluations over the number of samples assigned to the local computing device. We analyze parallel efficiency results extracted from test cases for thermodynamic properties, source terms, and Jacobians evaluations on Intel Xeon CPUs and NVIDIA Volta GPUs.
Kim, K, Díaz-Ibarra, O. H., Najm, H. N., Zádor, J., Safta, C.: TChem: A performance portable parallel software toolkit for analyzing complex kinetic models. Computer Physics Communication, 2023 285 108628. https://doi.org/10.1016/j.cpc.2022.108628
Presentation: https://cassyni.com/events/NKR7P8BSG38tMNrUjVHPwj
The Pynta workflow code accelerates reaction discovery for heterogeneous catalysis. Pynta automates many of the usual, but tedious and error prone steps of this work. More crucially, it also features a new saddle point estimating algorithm, termed as harmonically forced saddle point (HFSP) search, which allows the fast generation of many site-specific saddle point guesses. There are filtered with a harmonic filter that aims to keep the lower energy ones only, to be submitted for a full DFT calculation.
Johnson, M. S., Gierada, M., Hermes, E. D., Bross, D. H., Sargsyan, K., Najm, H. N., Zádor, J.: Pynta - An automated workflow for calculation of surface and gas-surface kinetics. Journal of Chemical Information and Modeling, 2023. https://doi.org/10.1021/acs.jcim.3c00948
Scientific Achievement
A method to compute anharmonic adsorbate partition functions via phase space integration has been extended to include frustrated rotation. It has been applied to CHOH on Cu(111) and CO on Pt(111). The results, which agree with the quantum state counting benchmark, indicate that anharmonicity contributes significantly to the thermodynamic properties of both systems, although more so for the weakly bound CHOH-Cu(111). Obtaining more accurate thermophysical properties favorably impacts microkinetic mechanisms, as surface coverages depend heavily on these estimates.
Significance and Impact
The presented method provides accurate free energies of adsorbates, which improve microkinetic mechanisms and subsequent reactor models. With the increasing availability and progression of supercomputing resources, advanced methods beyond the standard analytical approximations become relevant and useful tools to considerably improve chemical process predictions.
Research Details
We produced training data for the potential energy surfaces by performing FT calculations on the ANL Theta system.
Minima-preserving neural network (MP-NN) surrogates are constructed and used within the integration routine.
The method has been implemented in our open-source package, ADTHERM.
Katrin Blöndal, Khachik Sargsyan, David H. Bross, Branko Ruscic and
C. Franklin Goldsmith, ACS Catal. 13, 19-32 (2023) https://doi.org/10.1021/acscatal.2c04246
Scientific Achievement
Optimization of molecular geometries is best done in internal coordinates: bonds lengths, bond angles, and dihedral angles. However, these coordinates are coupled, creating complicated constraints and consequently loss of efficiency for widely used optimization algorithms. Our new optimization strategy interprets the space of all possible molecular geometries as a manifold, and molecular geometry displacements are taken along the geodesics on these manifold, inherently satisfying the necessary constraints. Compared to the traditional approach, this substantially reduces the number of steps required to reach convergence on a molecular geometry optimization benchmark
Significance and Impact
Geometry optimization is a key aspect in any computational study of molecules. This step may be ratelimiting in automatic potential energy surface exploration frameworks suitable for use on exascale computational resources. Our geodesic optimization method dramatically speeds up optimization of molecules, thereby increasing the throughput in such applications.
Research Details
Our method has been implemented in our open-source geometry optimization code, Sella.
The method is a drop-in replacement for existing optimization codes using redundant internal coordinates.
It can also be used in saddle point optimization for reaction network exploration.
Eric D. Hermes, Khachik Sargsyan, Habib N. Najm, and Judit Zádor, The Journal of Chemical Physics, 2021, 155, 094105. https://doi.org/10.1063/5.0060146
Scientific Achievement
We found that contributions of translational anharmonicity are significant for H adsorbed on a Cu(111 ) surface, by computing thermodynamic properties via a phase space integration (PSI) approach. The approach is in excellent agreement with a quantum state counting benchmark over the temperature range of interest, especial ly compared to the most commonly used models, the harmonic oscillator (HO) and the free translator (FT). Obtaining more accurate adsorbate thermophysical properties favorably impacts microk inetic mechanisms for heterogeneous catalysis, as surface coverages depend heavily on these estimates.
Significance and Impact
The method is to be a part of the Sandia National Laboratories’ ECC computational framework for automated chemistry, an effort to provide accurate microkinetic mechanisms. As we transition into the exascale, accurate methods that require computational power become relevant and useful tools to considerably improve chemical process predictions.
Research Details
We produced training data for the potential energy surface by performing DFT calculations on the ANL Theta system.
A minima-preserving neural network (MP-NN) is constructed and used within the PSI routine.
Our method has been implemented in our open-source phase space integration code, AdTherm.
Katrín Blöndal, Khachik Sargsyan, David H. Bross, Branko Ruscic and C. Franklin Goldsmith, The Journal of Physical Chemistry C, doi: 10.1021/acs.jpcc.1c04009
Scientific Achievement
A new method has been developed for locating saddle points on potential energy surfaces in atomistic simulations using iterative Hessian diagonalization. Our method scales better with respect to system size and converges more reliably than traditional approaches. The method is implemented in a new open-source software package, Sella, which provides a convenient way to use our method in combination with more than 40 electronic structure packages for molecules, solids and other atomic systems.
Significance and Impact
Automated workflows to explore reactive potential energy landscapes combined with current peta- and upcoming exascale computing resources promise an unprecedented acceleration of the rate at which we can understand complex chemical phenomena at the atomic scale. Robust and efficient optimizers are the linchpins of these frameworks. Especially difficult in this context is to have reliable search methods for first order saddle points, these lowest energy passage points between the intermediates of a reaction networks, which ultimately determine the importance and rate of the various chemical processes.
Research Details
We use iterative diagonalization to update an approximate Hessian. Efficiency is gained by using it both to progress towards the saddle point and as a preconditioner for subsequent Hessians. Balancing the cost of diagonalization and the desired accuracy to make the right geometry steps results in a substantial reduction in the total cost of optimization compared to other low-scaling algorithms. Sella is being incorporated into automated potential energy landscape exploration codes, including KinBot (for gas-phase molecules) and the newly-developed pynta (for heterogeneous catalysis).
Eric D. Hermes, Khachik Sargsyan, Habib N. Najm, and Judit Zádor, Journal of Chemical Theory and Computation, 2019, 15, 6536-6549.
Scientific Achievement
A sophisticated microkinetic model of the CO2 methanation for a Ni(111) facet was automatically compiled using the Reaction Mechanism Generator (RMG). 5000 mechanisms were generated in a probabilistic approach by perturbing all energetic parameters in a correlated manner over their entire uncertainty range to explore all the possible chemistry. Propagation of this uncertainty through microkinetic modeling reveals a broad spectrum of possible activity of the Ni(111) facet for the CO2 methanation, with some microkinetic models in quantitative agreement with experimental results.
Significance and Impact
The study highlights the effect of DFT-based uncertainties on the construction of the mechanism. Further, we demonstrate that with the current error bars of ab-initio heats of formation or activation barriers, it is challenging to make conclusive statements about the activity of a metal facet or the rate-determining step. However, the study reveals that only a few adsorbates and elementary steps govern the activity in the uncertainty range. Computationally expensive higher-level theory methods can be employed to determine their energetic parameters more accurately. The procedure provides a universal framework for investigating other heterogeneously catalyzed reactions.
Research Details
Mechanism Generation for CO2 methanation on Ni(111) with RMG using linear-scaling and Brønsted-Evans-Polanyi relations
Mean-field microkinetic modeling and comparison to temperature scanning methanation experiments from a Ni/SiO2 catalyst
Degree of rate control and global uncertainty analysis reveal important adsorbates and elementary steps
Kreitz, B.; Sargsyan, K.; Blöndal, K.; Mazeau, E. J.; West, R. H.; Wehinger, G. D.; Turek, T.; Goldsmith, C. F., Quantifying the Impact of Parametric Uncertainty on Automatic Mechanism Generation for CO2 Hydrogenation on Ni(111). JACS Au 2021, 1, 1656-1673., https://doi.org/10.1021/jacsau.1c00276
Scientific Achievement
The computational singular perturbation (CSP) method has been applied in gas phase chemical kinetic modeling, as well as biochemical modeling, but not hitherto in heterogeneous catalysis. It has been used for diagnosis of dynamical system response, time integration of stiff ordinary differential equations (ODEs), and for providing means for chemical model reduction. In this work, we extend its formulation to handle differential-algebraic equation (DAE) systems that are typically used in heterogeneous catalysis models. Using this new formulation, we demonstrated dynamical analysis of a catalytic system, highlighting important fast/slow processes, and identifying cause-and-effect relationships. We also used the DAE analysis results to examine the quality of different quasi steady state approximations, identifying choices resulting in minimal errors relative to the ODE formulation.
Significance and Impact
CSP analysis is useful for dynamical systems that exhibit stiffness resulting from a wide range of time scales, as is the case e.g. in heterogeneous catalysis. Its main strategy is to decouple fast and slow dynamics through an alternative representation of the original system of equations, thereby providing a framework for dynamical analysis and reduction. Its application in catalytic DAE systems enables improved precision in the analysis of system dynamics, which facilitates diagnostic studies of chemical kinetic model behavior, assessment of the quality of DAE approximations, extraction of understanding of system response, and chemical model reduction in catalytic systems.
Research Details
CSP analysis for DAE/ODE of catalysis systems has been implemented in open-source code, CSPlib: https://github.com/sandialabs/CSPlib
CSP computations can be executed on modern computing platforms, i.e, GPUs and many-core CPUs
Oscar Díaz-Ibarra, Kyungjoo Kim, Cosmin Safta, Judit Zádor & Habib N. Najm (2021) Using computational singular perturbation as a diagnostic tool in ODE and DAE systems: A case study in heterogeneous catalysis, Combustion Theory and Modelling, https://doi.org/10.1080/13647830.2021.2002417