Our paper, "Efficient high-dimensional variational data assimilation with machine-learned reduced-order models", was just accepted for publication in Geoscientific Model Development (GMD). Thanks to our Argonne National Laboratory collaborators: Romit Maulik, Vishwas Rao, Jiali Wang, Emil Constantinescu, Bethany Lusch, Prasanna Balaprakash, Ian Foster, and Rao Kotamarthi.

Plain-language description


A physical system, such as the atmosphere, can be characterized by our prior knowledge, in the form of a mathematical model, plus a set of observations from various sensors. Data assimilation (DA) combines that prior knowledge (the mathematical model) with the observations to estimate the state of the system at a desired time. In this paper, we propose a data assimilation approach that uses a machine learning emulator to replace the (usually expensive) mathematical model used in four-dimensional variational data assimilation (also referred to as 4D-Var). Our results indicate that machine-learning-assisted data assimilation is faster than traditional model-based data assimilation by four orders of magnitude, allowing computations to be performed on a workstation rather than a dedicated high-performance computer.

Abstract


Data assimilation (DA) in geophysical sciences remains the cornerstone of robust forecasts from numerical models. Indeed, DA plays a crucial role in the quality of numerical weather prediction and is a crucial building block that has allowed dramatic improvements in weather forecasting over the past few decades. DA is commonly framed in a variational setting, where one solves an optimization problem within a Bayesian formulation using raw model forecasts as a prior and observations as likelihood. This leads to a DA objective function that needs to be minimized, where the decision variables are the initial conditions specified to the model. In traditional DA, the forward model is numerically and computationally expensive. Here we replace the forward model with a low-dimensional, data-driven, and differentiable emulator. Consequently, gradients of our DA objective function with respect to the decision variables are obtained rapidly via automatic differentiation. We demonstrate our approach by performing an emulator-assisted DA forecast of geopotential height. Our results indicate that emulator-assisted DA is faster than traditional equation-based DA forecasts by 4 orders of magnitude, allowing computations to be performed on a workstation rather than a dedicated high-performance computer. In addition, we describe accuracy benefits of emulator-assisted DA when compared to simply using the emulator for forecasting (i.e., without DA). Our overall formulation is denoted AIEADA (Artificial Intelligence Emulator-Assisted Data Assimilation).
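
To make the idea in the abstract concrete, here is a minimal JAX sketch of variational DA with a differentiable emulator. It is not the code from the paper: the two-dimensional `emulator_step`, the scalar error variances `b_var` and `r_var`, the plain gradient-descent loop, and all numerical values are assumptions chosen only for illustration. The sketch shows the structure described above: a cost function with a background (prior) term and an observation (likelihood) term, the initial condition as the decision variable, and the gradient obtained by automatic differentiation through the emulator rollout.

```python
import jax
import jax.numpy as jnp


def emulator_step(z):
    # Hypothetical stand-in for a trained low-dimensional emulator:
    # a fixed, mildly nonlinear map on a 2-D latent state.
    A = jnp.array([[0.9, -0.2], [0.2, 0.9]])
    return jnp.tanh(A @ z)


def rollout(z0, n_steps):
    # Advance the emulator n_steps times and return every forecast state.
    def body(z, _):
        z_next = emulator_step(z)
        return z_next, z_next

    _, trajectory = jax.lax.scan(body, z0, None, length=n_steps)
    return trajectory  # shape (n_steps, state_dim)


def cost(z0, z_background, observations, b_var, r_var):
    # Variational objective: misfit to the background (prior) plus
    # misfit to the observations (likelihood) over the window.
    trajectory = rollout(z0, observations.shape[0])
    background_term = jnp.sum((z0 - z_background) ** 2) / b_var
    observation_term = jnp.sum((trajectory - observations) ** 2) / r_var
    return 0.5 * (background_term + observation_term)


# Gradient with respect to the initial condition via automatic differentiation.
grad_cost = jax.jit(jax.grad(cost))


def assimilate(z_background, observations, b_var=1.0, r_var=0.1,
               lr=0.01, n_iters=500):
    # Plain gradient descent, used here only to keep the sketch short.
    z0 = z_background
    for _ in range(n_iters):
        z0 = z0 - lr * grad_cost(z0, z_background, observations, b_var, r_var)
    return z0


# Toy experiment: synthetic, noise-free observations from a known "truth".
z_true = jnp.array([0.5, -0.3])
observations = rollout(z_true, 5)
z_background = z_true + jnp.array([0.2, -0.2])  # deliberately wrong first guess
analysis = assimilate(z_background, observations)

print("background error:", float(jnp.linalg.norm(z_background - z_true)))
print("analysis error:  ", float(jnp.linalg.norm(analysis - z_true)))
```

In this toy setup the analysis should end up closer to the true initial condition than the background guess. A real application would use a trained emulator, full error covariance matrices, an observation operator, and a quasi-Newton optimizer rather than fixed-step gradient descent.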