## Abstract

Surrogate models play a vital role in overcoming the computational challenge in designing and analyzing nonlinear dynamic systems, especially in the presence of uncertainty. This paper presents a comparative study of different surrogate modeling techniques for nonlinear dynamic systems. Four surrogate modeling methods, namely, Gaussian process (GP) regression, a long short-term memory (LSTM) network, a convolutional neural network (CNN) with LSTM (CNN-LSTM), and a CNN with bidirectional LSTM (CNN-BLSTM), are studied and compared. All these model types can predict the future behavior of dynamic systems over long periods based on training data from relatively short periods. The multi-dimensional inputs of surrogate models are organized in a nonlinear autoregressive exogenous model (NARX) scheme to enable recursive prediction over long periods, where current predictions replace inputs from the previous time window. Three numerical examples, including one mathematical example and two nonlinear engineering analysis models, are used to compare the performance of the four surrogate modeling techniques. The results show that the GP-NARX surrogate model tends to have more stable performance than the other three deep learning (DL)-based methods for the three particular examples studied. The tuning effort of GP-NARX is also much lower than its deep learning-based counterparts.

## 1 Introduction

Modeling of dynamic systems plays a vital role in the analysis, design, health monitoring, and control of various dynamic engineering systems, such as chemical processes [1], civil infrastructure [2], battery systems [3], and vibratory mechanical systems [4]. A dynamic system can be linear or nonlinear. The modeling of nonlinear dynamic systems, in general, is more complicated than that of linear systems.

In order to describe the complex nonlinear behaviors of dynamic systems, various approaches have been developed for different types of systems using analytical methods or sophisticated computer simulations. Analytical models obtained through theoretical modeling based on simplifications and assumptions usually have a low requirement on the computational effort. However, the prediction accuracy is usually sacrificed due to model simplifications [5]. With the development of high-performance computing and advanced computational methods, high-fidelity computer simulations are becoming more common in the design and analysis of various dynamic systems. They play an essential role in the design of reliable complex nonlinear engineering systems, such as hypersonic aircraft [6], off-road vehicles [7], and large civil infrastructure [8]. While the sophisticated computer simulation models significantly increase the prediction accuracy, the required computational effort is also drastically increased, which poses challenges to the associated analysis, design, model updating (e.g., a digital twin), and model predictive control [9].

To overcome the computational challenges introduced by sophisticated computer simulation models, surrogate models constructed using machine learning techniques are often used to substitute the original simulation models or experiments. As data-driven techniques, surrogate models approximate the original model by establishing a relationship between the inputs and outputs of the complex dynamic system based on data. They provide a promising way of efficiently predicting the nonlinear dynamic behavior of various systems without sacrificing accuracy, and they allow the nonlinear dynamic behavior to be forecast over a reasonable future time horizon based on a certain amount of historical data. Due to these advantages of dynamic surrogate models, their applications can be seen not only in engineering design but also in many other fields where dynamic predictive models play an essential role, including disease transmission modeling [10] and medical and environmental science [11], among others.

In the past decades, various surrogate modeling techniques have been developed to model nonlinear dynamic behavior. According to the fundamental differences in the modeling forms, the current approaches can be classified into two groups, namely, *input–output* models and *state-space* models [12]. For surrogate models in input–output forms, they are constructed by directly describing the relationship between the inputs and outputs based on observation data [12]. For state-space models, the surrogate models are represented in state-space forms [12]. The state-space model-based methods can be further classified into two categories: linear state-space models and nonlinear state-space models. Various mature techniques have been developed for the learning of linear state-space models in the field of dynamic system identification [13]. For the learning of nonlinear state-space models, it usually requires a nonlinear transformation of a vector of past input/output samples to a state vector [14]. Due to the importance of state-space models in control, numerous approaches have been developed in recent years using machine learning and/or optimization-based methods to learn nonlinear state-space models. For instance, Masti and Bemporad developed an autoencoder-based approach for the learning of nonlinear state-space models [15]; Deshmukh and Allison proposed a derivative function-based surrogate modeling method to learn the nonlinear state-space models [16]; Gedon et al. proposed a deep state-space model method for learning of nonlinear temporal models in the state-space form [17], and an optimization-based approach is suggested in Ref. [18] for system identification of nonlinear state-space models. Both the input–output form and state-space form of surrogate models are widely used in the analysis and modeling of nonlinear dynamic systems. They have their own advantages and disadvantages. 
The input–output form of surrogate models does not require an explicit definition of a Markovian state or any information about the internal states. It is applicable to any type of dynamic system with different complexities since it directly builds a function for the inputs and outputs. The disadvantage of the input–output form of surrogate models is that the order of the model is high due to the number of lags required to capture the relationship between inputs and outputs [18]. The nonlinear state-space form of surrogate models usually has a lower order (i.e., lower number of inputs) and is more efficient than its input–output counterparts. But it typically requires the definition of internal states and the surrogate modeling methods are much more complicated to implement in practice [18].

This paper focuses on surrogate models of nonlinear dynamic systems in the *input–output* form. In the past decades, a large group of methods have been developed for the modeling of dynamic systems in this form. For instance, the autoregressive integrated moving average (ARIMA) model [19] has gained considerable attention due to its predictive capability for time series. ARIMA is derived from autoregressive models [20], moving average (MA) models [21], and autoregressive moving average (ARMA) models [22]. In general, the main limitations of these model classes are the linear regression they impose on the past history and their inability to account for input variables other than time. Nonlinear autoregressive models with exogenous input (NARX), which can overcome these limitations, have been proposed in the dynamics field based on artificial neural networks (ANNs) [23] and Gaussian process (GP) regression [24]. For example, one class of approaches builds the surrogate model with nonlinear function approximators, such as ANNs, support vector regression [25], and kernel support vector machines [26], while another relies on tree-based ensemble learning algorithms [27], such as decision trees [28] and random forests [28,29]. In recent years, ANNs with robust nonlinear approximation capability, such as the multi-layer perceptron network [30], have been extensively used in nonlinear dynamic predictive models for various applications. However, the lack of dependencies between inputs in sequence processing affects the accuracy in long-term sequence prediction tasks. In contrast, GP models overcome the problems mentioned above and provide more accurate predictions in the NARX scheme. In the new era of deep learning (DL), various approaches have been proposed recently to construct surrogate models of nonlinear dynamic systems.
DL surrogate models include, but are not limited to, recurrent neural network (RNN) [31], long short-term memory (LSTM) which is a type of RNN [32], convolutional neural network (CNN) [33], and hybrid structures that combine different deep neural networks (DNNs) [34].

Even though the methods mentioned earlier have shown good performance in different applications, there is no generic surrogate modeling method that is applicable to all nonlinear dynamic systems and across different domains. Selecting an appropriate surrogate modeling method for a specific nonlinear dynamic system remains an issue that needs to be addressed. With a focus on data-driven approaches, this paper performs a comprehensive comparative study of different surrogate modeling methods under the NARX scheme to investigate the predictive capability of different methods. Four widely used approaches, namely, GP-NARX, LSTM, convolutional neural network combined with long short-term memory (CNN-LSTM), and CNN with bidirectional LSTM (CNN-BLSTM), are extensively compared using three numerical examples to investigate the advantages and disadvantages of the different approaches. The three examples include a simple mathematical model, a Duffing oscillator model, and a Bouc-Wen model. It is expected that the findings from this research will provide guidance and reference for the selection of surrogate models to reduce the computational effort in building predictive models for the design and analysis of nonlinear dynamic systems.

The remainder of this paper is organized as follows. Section 2 presents a generalized description of nonlinear dynamic models. Section 3 introduces surrogate models studied in this paper. Section 4 presents the comparative studies of different surrogate models. Finally, Sec. 5 summarizes the results and draws the conclusions.

## 2 Background

This section first presents the generalized dynamic predictive models in the NARX form. Then, various forecasting strategies, including one-step or multi-step methods, are briefly discussed.

### 2.1 Nonlinear Dynamic Predictive Model.

A nonlinear dynamic predictive model can be written in the general form [24]

$$y_i = f(\mathbf{x}_i;\,\boldsymbol{\theta}) + \nu_i \tag{1}$$

where *y*_{i} is the output at time-step *t*_{i}, *f*(·) is a nonlinear mapping from the regression vector $\mathbf{x}_i$ to the output space, $\boldsymbol{\theta}$ is an *s*-dimensional parameter vector included in *f*(·), and the noise *ν*_{i} is used to account for the fact that the output *y*_{i} will not, in practice, be a deterministic function of past data. The regression vector $\mathbf{x}_i$ may be rewritten as [24]

$$\mathbf{x}_i = \varphi\left(y_{i-1}, y_{i-2}, \ldots, y_{i-p},\; u_i, u_{i-1}, u_{i-2}, \ldots, u_{i-q}\right) \tag{2}$$

where *y*_{i−j}, *j* = 1, 2, …, *p* are the delayed samples of the measured output signal at time-steps *t*_{i−1}, *t*_{i−2}, …, *t*_{i−p}; *u*_{i−k}, *k* = 1, 2, …, *q* are the delayed samples of the measured input signal at time-steps *t*_{i−1}, *t*_{i−2}, …, *t*_{i−q}; *u*_{i} is the measured input value at the current time-step; *φ*(·) is a nonlinear mapping from the measured input and output values to the regression vector; and *p* and *q* are, respectively, the numbers of lags in the outputs and inputs.

According to the choice of regressors, current nonlinear dynamic predictive models in Eq. (1) can be classified into six main categories: (1) nonlinear finite impulse response models, which use only delayed input values *u*_{i−k} as regressors and are always stable due to the deterministic input values [35]; (2) NARX models, which use both delayed output signals *y*_{i−j} and input signals *u*_{i−k} as regressors and are also called series-parallel models [36]; (3) nonlinear output error models, which use the estimate $\hat{y}_{i-j}$ of the output value *y*_{i−j} and input signals *u*_{i−k} as regressors [37]; (4) nonlinear autoregressive and moving average models with exogenous input (NARMAX), which use *y*_{i−j}, *u*_{i−k}, and the prediction error as regressors [38]; (5) nonlinear Box-Jenkins models [39]; and (6) nonlinear state-space models [18].

### 2.2 Predictive Model Forecast Strategies.

In order to obtain accurate predictions of future nonlinear dynamic behavior, various strategies have been developed to build dynamic predictive models. According to the number of future prediction time-step(s), the current strategies can be classified into two groups: one-step approaches and multi-step approaches. The one-step approach predicts only the next time-step of a time series, while the multi-step approach may predict any number of forward time-steps. In general, the multi-step prediction approach is more likely to be adopted due to its capability of providing longer-term predictions, especially for reliability analysis/prognostics purposes.

The traditional direct multi-step forecasting is usually replaced by recursive strategies to improve the prediction accuracy over a long time horizon. Forecasting models with the recursive strategy are, in general, better than direct multi-step forecasting in terms of prediction accuracy and stability [40]. However, to overcome the limitation of error accumulation, some new techniques have been developed by integrating different types of dynamic models, such as multiple output strategies for multi-step forecasting [41] and direct-recursive hybrid methods [42]. In what follows, we briefly review four classical multi-step time series forecasting strategies.

#### 2.2.1 Direct Multi-Step Forecast Strategy.

The direct multi-step forecast strategy builds a separate single-output model for each future time-step. Given the measured data up to time-step *t*_{i}, the prediction at *h* steps ahead is

$$\hat{y}_{i+h} = f_h\left(y_{i-1}, y_{i-2}, \ldots, y_{i-p}\right), \quad h = 1, 2, \cdots, H \tag{3}$$

where $f_h(\cdot),\, h = 1, 2, \cdots, H$ is the *h*th single-output model from a total of *H* models, $y_{i-j},\, j = 1, 2, \cdots, p$ are the measured output signals at time-steps $t_{i-j},\, j = 1, 2, \cdots, p$, and *p* is the number of time lags, as mentioned above. Through Eq. (3), predictions of maximally *H* time-steps are obtained by combining the predictions from each single-output model. For example, to obtain the predictions of the following two steps *t*_{i+1} and *t*_{i+2}, we may develop one model for forecasting at *t*_{i+1} and a separate model for forecasting at *t*_{i+2} as

$$\hat{y}_{i+1} = f_1\left(y_{i-1}, \ldots, y_{i-p}\right), \qquad \hat{y}_{i+2} = f_2\left(y_{i-1}, \ldots, y_{i-p}\right)$$

The drawback of the direct multi-step forecast strategy in Eq. (3) is that the *H* independent models result in predictions $\hat{y}_{i+1}, \hat{y}_{i+2}, \cdots, \hat{y}_{i+H}$ without statistical dependencies. It is also more complicated than the recursive strategy and requires more computational time because of the number of predictive models required.

#### 2.2.2 Recursive Multi-Step Forecast Strategy.

Instead of building *H* separate models, the recursive multi-step strategy develops a single one-step forecasting model *f*(·) and applies it iteratively. For example, to obtain the predictions at time-steps *t*_{i+1} and *t*_{i+2}, the model first forecasts $\hat{y}_{i+1}$ at time-step *t*_{i+1} and then adds $\hat{y}_{i+1}$ into the input vector for the next forecast at *t*_{i+2}:

$$\hat{y}_{i+1} = f\left(y_{i-1}, \ldots, y_{i-p}\right), \qquad \hat{y}_{i+2} = f\left(\hat{y}_{i+1}, y_{i-1}, \ldots, y_{i-p+1}\right)$$

The recursive multi-step forecast strategy eliminates the limitations of the direct multi-step forecast method. However, the estimation error will accumulate over time since the predicted values are used as inputs instead of the actual ones. The error accumulation would generally degrade the accuracy and amplify the uncertainty with the size of the prediction time horizon.
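The recursive scheme above can be sketched in a few lines of Python; `one_step_model` is a hypothetical stand-in for any trained one-step predictor (a toy linear recurrence here, not one of the paper's models):

```python
import numpy as np

def recursive_forecast(one_step_model, history, horizon, p=2):
    """Recursive multi-step forecasting: each one-step prediction is fed
    back into the lag vector to produce the next prediction."""
    hist = list(history)
    preds = []
    for _ in range(horizon):
        lags = np.array(hist[-p:])        # most recent p output values
        y_next = one_step_model(lags)     # one-step-ahead prediction
        preds.append(y_next)
        hist.append(y_next)               # prediction replaces a measurement
    return preds

# Illustration with a toy one-step model y_{i+1} = 0.6*y_i + 0.3*y_{i-1}
toy_model = lambda lags: 0.6 * lags[-1] + 0.3 * lags[-2]
forecast = recursive_forecast(toy_model, history=[1.0, 0.8], horizon=5)
```

Because predicted values replace measured ones in `hist`, any one-step error propagates into later lag vectors, which is exactly the error-accumulation behavior discussed above.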

#### 2.2.3 Direct-Recursive Hybrid Strategies.

The direct-recursive hybrid strategy combines the two strategies above: a separate model is built for each horizon, as in the direct strategy, but the predictions from earlier horizons are added to the input vector, as in the recursive strategy. For example,

$$\hat{y}_{i+1} = f_1\left(y_{i-1}, \ldots, y_{i-p}\right), \qquad \hat{y}_{i+2} = f_2\left(\hat{y}_{i+1}, y_{i-1}, \ldots, y_{i-p}\right)$$

where *y*_{i−j} are the measured outputs at time-steps *t*_{i−j}, $\hat{y}_{i+1}$ and $\hat{y}_{i+2}$ are the predictions at time-steps *t*_{i+1} and *t*_{i+2}, and *f*_{1}(·) and *f*_{2}(·) are separate predictive models.

#### 2.2.4 Multiple Output Strategy.

The multiple output strategy develops a single model that forecasts the entire prediction sequence in one step, $[\hat{y}_{i+1}, \hat{y}_{i+2}, \ldots, \hat{y}_{i+r}] = F\left(y_{i-1}, \ldots, y_{i-p}\right)$, where *r* is the total number of predicted time-steps. Practical applications show that multi-output models are complex and do not have enough flexibility since the stochastic dependency behavior between outputs must be learned.

## 3 Surrogate Modeling for Nonlinear Dynamic Systems

This section first discusses the generation of training data for surrogate modeling of nonlinear dynamic systems. Following that, we briefly review four surrogate modeling techniques for nonlinear dynamic systems, including GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM.

### 3.1 Training Data Generation for Surrogate Modeling of Nonlinear Dynamic Systems.

After considering the benefits of different forecasting strategies and nonlinear dynamic structures, this paper investigates dynamic surrogate modeling for long-term prediction using the recursive multi-step forecasting strategy based on short-term training data. The dynamic output is characterized by the controllable inputs and the multi-step outputs of previous time-steps, and it is iterated using the single-period prediction result. All data in this paper are collected synthetically through simulations for verification and validation purposes. Uniform time-steps are used in the predictive model and the controllable inputs; however, the time-step sizes differ between case studies.

Figure 1 shows an overview of surrogate modeling for nonlinear dynamic systems. As shown in this figure, there are five main steps, namely: (a) generation of input training data, which generates training data for dynamic system model parameters and excitations; (b) training data collection, which obtains data of output based on simulations of dynamic systems; (c) data preprocessing, which processes the data into the format that is needed for the training of various surrogate models; (d) surrogate modeling training using the processed data; and (e) prediction and validation, which uses the trained surrogate model for prediction under new conditions and excitations. Next, we briefly explain the first three steps, and following that, we will discuss the training of various surrogate models in Secs. 3.2–3.5.

As shown in Fig. 1, training data generation is essential in training surrogate models for nonlinear dynamic systems. This paper first generates $\vartheta$ training samples for the model parameters $\boldsymbol{\theta}$ using Latin hypercube sampling [44]. Let the generated training samples be $\boldsymbol{\theta}_i,\, i = 1, 2, \cdots, \vartheta$, where $\boldsymbol{\theta}_i$ represents the *i*th training sample. Then, for each sample $\boldsymbol{\theta}_i$, as shown in Fig. 1(b), we obtain a time series response *y*_{i,j}, *j* = 1, 2, ⋯, *N*_{i} with controllable time series inputs *u*_{i,j}, *j* = 1, 2, ⋯, *N*_{i}, in which *N*_{i} is the length of the controllable input excitations of the *i*th training sample.

in which $N_T = \sum_{i=1}^{\vartheta}(N_i - p)$ is the total number of training points and *s* is the number of dimensions of $\boldsymbol{\theta}$.
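One possible way to assemble the lagged training pairs from a simulated trajectory is sketched below; the column ordering and the function name are our illustration, not necessarily the paper's exact layout:

```python
import numpy as np

def build_narx_data(u, y, p, q):
    """Assemble NARX training pairs: each row of X stacks q input values
    (u_i, ..., u_{i-q+1}) and p output lags (y_{i-1}, ..., y_{i-p});
    the corresponding target is y_i."""
    start = max(p, q)
    X, t = [], []
    for i in range(start, len(y)):
        row = np.concatenate([u[i - q + 1:i + 1][::-1],   # u_i ... u_{i-q+1}
                              y[i - p:i][::-1]])          # y_{i-1} ... y_{i-p}
        X.append(row)
        t.append(y[i])
    return np.array(X), np.array(t)

u = np.arange(10, dtype=float)       # toy controllable input series
y = np.sin(0.5 * np.arange(10))      # toy response series
X, t = build_narx_data(u, y, p=2, q=2)
```

Each trajectory of length *N*_{i} yields *N*_{i} − *p* usable rows (for *p* ≥ *q*), which is consistent with the count $N_T = \sum_i (N_i - p)$ above.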

The above-generated training data are then used to train the various surrogate models for the nonlinear dynamic system in Secs. 3.2–3.5. Figure 2 presents a schematic of the single-step univariate dynamic predictive model after surrogate modeling (i.e., Fig. 1(e)). As indicated in this figure, the emulator predicts future values while observed values are replaced by the forecasts from the previous period. For example, $\hat{y}_{i+2}$ is the predicted output in the second iteration, $\hat{y}_{i+2} = f\left(u_{i+2}, u_{i+1}, \cdots, u_{i-q+3}, \hat{y}_{i+1}, y_i, \cdots, y_{i-p+2}\right) + \nu_{i+2}$, where the first-iteration output $\hat{y}_{i+1}$ takes the place of *y*_{i+1}. The output feedback values switch from the actual values to the estimated ones as the incremental recursive scheme advances, and predictions are iterated in the recursive procedure together with the input variables, which addresses error accumulation to a certain extent.

Next, we will briefly review the four types of surrogate modeling techniques studied in this comparative paper, including GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM.

### 3.2 GP-NARX.

#### 3.2.1 Brief Review of GP-NARX.

The NARX model follows Eqs. (1) and (2), where the noise is assumed to follow a normal distribution, $\nu_i \sim N(0, \sigma_v^2)$, with standard deviation *σ*_{v}, and *p* and *q* are, respectively, the lags in the output and input values.

Since the nonlinear mapping function *f*(·) is usually a simulation model that requires substantial computational effort to evaluate, the direct application of the NARX model to dynamic system analysis, design, and control is computationally prohibitive. To address the computational challenge, *f*(·) can be substituted with a computationally “cheaper” surrogate model. In the GP-NARX surrogate model, the probabilistic and nonparametric Gaussian process (GP) model is used to approximate the nonlinear mapping function *f*(·).

Let $\mathbf{Z} = [\mathbf{z}_1, \mathbf{z}_2, \cdots, \mathbf{z}_{N_T}]$ collect the training inputs, where $\mathbf{z}_i$ is the *i*th input vector at a certain discrete time-step of the *N*_{T} discrete time-steps. We then have the prior GP for the function $f(\mathbf{Z})$ as

$$\mathbf{y} \sim N\!\left(m_f(\mathbf{Z}),\; K(\mathbf{Z}, \mathbf{Z}) + \sigma_v^2 \mathbf{I}_{N_T}\right)$$

where $m_f(\cdot)$ is the mean function, $K(\cdot,\cdot)$ is the covariance function, $\sigma_v^2$ is the variance of the noise *ν*_{i}, and $\mathbf{I}_{N_T}$ is an *N*_{T} × *N*_{T} unit matrix. We can generally have $m_f \equiv 0$, especially when we have no prior information. After observing the measured output data of $\mathbf{Z}$, i.e., $\mathcal{D} = \{(\mathbf{Z}, \mathbf{y})\}$, where $\mathbf{y} = [y_1, \cdots, y_i, \cdots, y_{N_T}]^T$, the posterior distribution of the function $f(\mathbf{z})$ at a new input $\mathbf{z}$ given the measured data $\mathcal{D}$ and hyperparameters Θ is Gaussian with mean and variance

$$\mu(\mathbf{z}) = K(\mathbf{z}, \mathbf{Z})\left[K(\mathbf{Z}, \mathbf{Z}) + \sigma_v^2 \mathbf{I}_{N_T}\right]^{-1}\mathbf{y}$$

$$\sigma^2(\mathbf{z}) = K(\mathbf{z}, \mathbf{z}) - K(\mathbf{z}, \mathbf{Z})\left[K(\mathbf{Z}, \mathbf{Z}) + \sigma_v^2 \mathbf{I}_{N_T}\right]^{-1}K(\mathbf{Z}, \mathbf{z})$$
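A minimal GP-NARX sketch is shown below, using scikit-learn's `GaussianProcessRegressor` as a stand-in for the GP model; the toy recurrence, lag layout, and kernel choice are ours for illustration and are not the paper's examples:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy one-dimensional NARX data: y_i depends on its two previous values.
rng = np.random.default_rng(0)
y = [0.5, 0.4]
for _ in range(60):
    y.append(0.7 * y[-1] - 0.2 * y[-2] + 0.05 * rng.normal())
y = np.array(y)

# Lagged regression matrix [y_{i-1}, y_{i-2}] with target y_i (p = 2 lags).
X = np.column_stack([y[1:-1], y[:-2]])
t = y[2:]

# Zero prior mean; the WhiteKernel term plays the role of sigma_v^2 * I.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, t)

# Recursive multi-step prediction: the posterior mean is fed back as a lag.
hist = list(y[:2])
for _ in range(10):
    mu = gp.predict(np.array([[hist[-1], hist[-2]]]))
    hist.append(float(mu[0]))
preds = hist[2:]
```

The kernel hyperparameters are optimized automatically during `fit`, which is why the GP-NARX tuning effort reported later is limited to the kernel form, the noise term, and the number of lags.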

#### 3.2.2 Subsampling for GP-NARX Construction With a Large Volume of Data.

The subsampling criterion is distance based, where ‖·‖ denotes the *l*_{2}-norm of a vector. Thus, this algorithm allows us to evenly fill the design domain with a small number of representative points of the original training data.

To sub-select *N*_{S} samples from the *N*_{T} available samples given in Eq. (8), we first randomly select *N*_{in} samples and denote them as $\mathbf{z}^t = [\mathbf{z}_1^t, \mathbf{z}_2^t, \cdots, \mathbf{z}_{N_{in}}^t]$, where $\mathbf{z}_j^t$ is the *j*th selected point. To sub-select the other *N*_{S} − *N*_{in} samples from the *N*_{T} − *N*_{in} available samples, we first compute the minimum distances between the samples in $\mathbf{z}^t$ and the samples in $\mathbf{Z}$ (i.e., Eq. (8)) as follows:

$$d_k = \min_{j=1,\ldots,N_{in}} \left\| \mathbf{z}_k - \mathbf{z}_j^t \right\|, \quad k = 1, 2, \cdots, N_T$$

where $\mathbf{z}_k$ is the *k*th sample in $\mathbf{Z}$.
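A greedy sketch of this distance-based subsampling is given below (function and variable names are ours); at each step the candidate farthest from the already selected set is added, so the selected points spread evenly over the domain:

```python
import numpy as np

def distance_subsample(Z, n_s, n_init=5, seed=0):
    """Greedy distance-based subsampling: start from a few random points,
    then repeatedly add the candidate whose minimum l2-distance to the
    already selected points is largest."""
    rng = np.random.default_rng(seed)
    selected = list(rng.choice(len(Z), size=n_init, replace=False))
    remaining = [k for k in range(len(Z)) if k not in selected]
    while len(selected) < n_s:
        # minimum distance from each remaining candidate to the selected set
        d = np.array([min(np.linalg.norm(Z[k] - Z[j]) for j in selected)
                      for k in remaining])
        pick = remaining.pop(int(np.argmax(d)))   # farthest candidate
        selected.append(pick)
    return np.array(selected)

Z = np.random.default_rng(1).uniform(size=(200, 3))   # toy training inputs
idx = distance_subsample(Z, n_s=30)
```

This reduces the GP training set from *N*_{T} to *N*_{S} points, which matters because GP training scales as O(n³) in the number of points.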

### 3.3 Long Short-Term Memory.

Recently, DNNs, such as LSTM and CNN, have emerged as new methods for modeling temporal or sequential characteristics. They have been widely applied in various domains, such as product search [48], advertising [49], and image recognition [50].

Even though LSTM, CNN-LSTM, and CNN-BLSTM build on different types of RNNs, they resemble a mapping feedforward neural network and use similar algorithms. For different databases, the models have different hyperparameters and weights in different layers. This paper uses a many-to-one RNN for prediction. Figure 3 shows an example of a many-to-one RNN with one hidden layer. In this figure, *a* = *σ*(*wx* + *b*), where *w* and *b* are the weight and bias, *σ*(·) is the activation function (the sigmoid function in this example), $[x_1, x_2, \ldots, x_{N_T}]$ are the inputs to the input layer, and $\hat{y}$ is the output of the output layer. As mentioned above, this paper investigates various methods to construct accurate surrogate models for nonlinear dynamic systems. In the following, we briefly introduce the LSTM method, which is one of the commonly used variants of RNN.

LSTM, as a variant of RNN, is widely used for solving classification and regression problems, such as PM2.5 prediction [51] and traffic flow prediction [52]. It not only performs well in capturing short-term data dependencies for prediction but also adequately accounts for the explicit dependencies of multiple outputs over a long time horizon. Compared with the traditional RNN, LSTM overcomes the RNN's shortcoming of insufficient long-term dependence caused by vanishing or exploding gradients during gradient propagation. Figure 4 shows the structure of an LSTM cell. As shown in this figure, each LSTM cell contains an input gate, a forget gate, and an output gate. The forget gate determines which information should be discarded; the input gate determines the new information, which consists of the actual observed values after iterations of the dynamic model; and the output gate decides the values passed to the next neuron, computed using the activation function.

Consider time-step *t* (*t* = 1, 2, …, *n*, where *n* is the number of time-steps) and network layer *L* (*L* = 1, 2, …, *N*_{L}, where *N*_{L} is the number of LSTM layers). Let us denote, at time-step *t*_{i}, the input gate, forget gate, and output gate as *i*_{i}, *f*_{i}, and *o*_{i}, respectively, *h*_{i} as the hidden layer output, and *c*_{i} as the memory cell state. The relationships among the above variables for the *l*th layer, with input $z_i^l$, are given as follows:

$$i_i^l = \sigma\!\left(W_{xi}^{(l)} z_i^l + W_{hi}^{(l)} h_{i-1}^l + b_i^{(l)}\right)$$

$$f_i^l = \sigma\!\left(W_{xf}^{(l)} z_i^l + W_{hf}^{(l)} h_{i-1}^l + b_f^{(l)}\right)$$

$$o_i^l = \sigma\!\left(W_{xo}^{(l)} z_i^l + W_{ho}^{(l)} h_{i-1}^l + b_o^{(l)}\right)$$

$$\hat{c}_i^l = \tanh\!\left(W_{xc}^{(l)} z_i^l + W_{hc}^{(l)} h_{i-1}^l + b_c^{(l)}\right)$$

$$c_i^l = f_i^l \odot c_{i-1}^l + i_i^l \odot \hat{c}_i^l$$

$$h_i^l = o_i^l \odot \tanh\!\left(c_i^l\right)$$

where the *W* matrices with two-letter subscripts are the gate weights: *W*_{αβ} is the weight matrix associated with input type *α* and gate *β*, and $b_\beta$ is the bias vector, where *α* ∈ {*x*, *h*} and *β* ∈ {*f*, *i*, *c*, *o*}. For example, $W_{hf}^{(l)}$ represents the *l*th LSTM layer's weight matrix applied to the hidden-state input *h*_{i−1} within the forget gate *f*_{i}, and $b_f^{(l)}$ denotes the *l*th LSTM layer's bias vector corresponding to the forget gate.

In the first step, the input data $\{z_i^l, h_{i-1}^l\}$ passes through the forget gate (the sigmoid function *σ*), which outputs a vector of values between 0 and 1 that determines which past information is retained: the value $c_{i-1}^l$ is forgotten where the vector value is 0 and passed through where the vector value is 1. The next step determines what information can be added to $\hat{c}_i$ by using the hyperbolic tangent function. Afterward, the elementwise Hadamard product combines $\hat{c}_i$ and *i*_{i} to update the new memory state output $c_i^l$. Lastly, the LSTM cell's output $h_i^l$ is obtained through the Hadamard product of the updated memory output $c_i^l$ (scaled to a value between −1 and 1 by the hyperbolic tangent) and the output gate result $o_i^l$.
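To make the gate computations concrete, here is a minimal NumPy sketch of a single LSTM cell step; the function name, weight layout, and toy dimensions are ours for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(z, h_prev, c_prev, W, b):
    """One LSTM cell step. W[alpha][beta] are weight matrices for
    alpha in {'x', 'h'} and beta in {'i', 'f', 'o', 'c'}; b[beta] are biases."""
    i = sigmoid(W['x']['i'] @ z + W['h']['i'] @ h_prev + b['i'])  # input gate
    f = sigmoid(W['x']['f'] @ z + W['h']['f'] @ h_prev + b['f'])  # forget gate
    o = sigmoid(W['x']['o'] @ z + W['h']['o'] @ h_prev + b['o'])  # output gate
    c_hat = np.tanh(W['x']['c'] @ z + W['h']['c'] @ h_prev + b['c'])
    c = f * c_prev + i * c_hat        # Hadamard products update the memory
    h = o * np.tanh(c)                # cell output, scaled to (-1, 1)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {a: {g: rng.normal(size=(n_hid, n_in if a == 'x' else n_hid)) * 0.1
         for g in 'ifoc'} for a in 'xh'}
b = {g: np.zeros(n_hid) for g in 'ifoc'}
h, c = lstm_cell(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

Because the output is gated through tanh, each component of `h` stays strictly inside (−1, 1), matching the scaling described above.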

Figure 5 depicts a three-layer LSTM architecture. In the three layers of Fig. 5, the input states are denoted as $\mathbf{z} = [x_1, x_2, \ldots, x_j, \ldots, x_{N_T}]$, and the output is $\hat{y}$. *p*_{1}, *p*_{2}, and *p*_{3} are, respectively, the numbers of neurons in layers one, two, and three. In each LSTM layer, the LSTM cell output $\{c_i^l, h_i^l\}$ passes through *p*_{l} nodes according to the LSTM cell algorithms, and the output $y_i^l$ equals the *l*th hidden-state response $h_i^l$, which is transferred to the next LSTM layer as its input state. Finally, the fully connected layer connects the LSTM and output layers to obtain the output features.

### 3.4 Convolutional Neural Network Combined With Long Short-Term Memory.

CNN-LSTM is a hybrid model that combines two different deep learning models and can integrate large data features and even spatial data structures with more than two dimensions. CNN is designed to perform sequential predictions using inputs such as pictures and videos, which cannot be modeled directly by the standard LSTM [53]. In CNN-LSTM, the jointly constructed model works in three steps to perform dynamic surrogate modeling. First, the CNN extracts features of the dynamic models. The features collected by the CNN are then transferred to the following layers, such as LSTM layers or a deeper LSTM network for bidirectional feature learning (BLSTM) in the forward and backward directions. Finally, the dense layer represents the features and transfers them to the prediction layer.

Unlike the standard RNN, CNNs are constructed with neurons that, like LSTM neurons, can capture features through learnable biases and weights. As shown in Fig. 7, the difference between a CNN and a traditional ANN is that the input of a CNN can have three dimensions (width, height, and depth), which is widely used when images are the inputs. In this paper, the CNN transfers the inputs from one dimension into two or three dimensions to achieve a higher prediction accuracy. Moreover, the final output layer of the CNN reduces the multiple vectors into a single vector, so that all the temporal features of the dynamic models can be preserved for the following neural networks (i.e., LSTM).

In the convolutional layer, *O* filters with weights {*W*_{1}, *W*_{2}, …, *W*_{O}} are implemented in the convolution operations. The process of convolution learning is written as

$$WL = \varpi\left(W * z + b_c\right)$$

where *WL* is the feature map of the convolutional layer, $\varpi$ is the activation function (the ReLU function, max(0, *x*)), *b*_{c} is the bias of the convolutional layer, and * represents the convolution operation.

Then all the collected convolutional feature maps *WL* ∈ ℝ^{1×κ} (through *κ* time-steps) are transferred into the LSTM layers discussed in Sec. 3.3. As shown in Fig. 8, a CNN model is adopted for feature extraction in the first part, while an LSTM model is adopted for prediction in the second part, with the dataset extracted by the CNN layers as inputs. Through this hybrid structure, the prediction for the current step is added to the observed values as part of the following control input, according to the recursive prediction scheme discussed in Sec. 2, to perform long-term prediction of future time-steps.
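The convolution-plus-ReLU feature extraction above can be sketched in NumPy as follows (valid-mode 1-D convolution; the filter values and names are illustrative, not the trained filters of the paper):

```python
import numpy as np

def conv1d_feature_maps(z, filters, b_c):
    """Valid-mode 1-D convolution followed by ReLU, sketching the
    feature-extraction step WL = relu(W * z + b_c) of the CNN front end."""
    relu = lambda x: np.maximum(0.0, x)
    maps = []
    for W in filters:                         # one feature map per filter
        k = len(W)
        out = np.array([np.dot(W, z[s:s + k]) for s in range(len(z) - k + 1)])
        maps.append(relu(out + b_c))
    return np.array(maps)                     # shape: (n_filters, n_positions)

z = np.sin(0.3 * np.arange(20))               # toy input sequence
filters = [np.array([1.0, -1.0]),             # difference-like filter
           np.array([0.5, 0.5])]              # smoothing-like filter
WL = conv1d_feature_maps(z, filters, b_c=0.0)
```

Each row of `WL` is one feature map; in CNN-LSTM these maps become the input sequences of the subsequent LSTM layers.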

### 3.5 CNN-Bidirectional Long Short-Term Memory.

Bidirectional long short-term memory (BLSTM) is an extension of traditional LSTM. Instead of having only one forward direction like LSTM, it establishes two training directions, as shown in Fig. 9, where one is in the input order, and the other one is in a reversed order of the initial one.

where $h_t^l$ is the *l*th layer's bidirectional response at time-step *t*, $\overrightarrow{h}$ is the forward direction of the bidirectional layer, $\overleftarrow{h}$ is the backward direction of the bidirectional layer, $\iota$ is the LSTM cell function, and *ρ* is the function that combines the forward $(\overrightarrow{h})$ and backward $(\overleftarrow{h})$ calculation sequences.

The surrogate model constructed using CNN-BLSTM is similar to that from CNN-LSTM.

### 3.6 Summary of Models and Implementation Procedure.

In this paper, all four models above are implemented in a Python environment. The GP-NARX model is developed based on the *scikit-learn* package, while the LSTM, CNN-LSTM, and CNN-BLSTM models are implemented using the *TensorFlow* and *Keras* packages. In the spirit of the "Occam's razor" principle, when presented with competing models of similar predictive power, the "optimal" model should be the one with the least complexity by some measure, which we interpret for the purpose of this study as the number of tunable parameters. To this end, we summarize the tunable parameters, advantages, and disadvantages of GP-NARX and the deep learning-based models in Table 1 for the selection of models.

| Model | Tunable parameters | Interpretability | Dataset size |
|---|---|---|---|
| GP-NARX | Covariance function, likelihood covariance noise, and number of lags | Easy to interpret | Computational complexity is O(n^{3}) |
| LSTM, CNN-LSTM, CNN-BLSTM | Number of layers, number of neurons, batch size, number of epochs, dropout ratio, activation function, weight decay ratio, and number of lags | Ongoing research topic to develop interpretable models | Can handle "big data"/very large datasets |

Next, we will use three numerical examples to perform a comprehensive comparative study of the aforementioned surrogate modeling techniques.

## 4 Comparative Studies

Three examples are used in this section to compare the performance of the different surrogate modeling methods. The first one is a mathematical example with a two-step lag input and a one-step-ahead prediction. The second one is a Duffing oscillator model. The last one is the Bouc-Wen nonlinear dynamic model. The example complexity increases from Example 1 to Example 3.

The mean squared error (MSE) is used to quantify the accuracy of the surrogate models:

$$\mathrm{MSE} = \frac{1}{N_d}\sum_{i=1}^{N_d}\left(y_i - \hat{y}_i\right)^2$$

where *N*_{d} is the total number of test data points, and *y*_{i} and $\hat{y}_i,\, \forall i = 1, \cdots, N_d$, are, respectively, the observed and predicted values.
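The metric is straightforward to compute; a one-line NumPy sketch with toy values:

```python
import numpy as np

def mse(y_obs, y_pred):
    """Mean squared error over the N_d test points."""
    y_obs, y_pred = np.asarray(y_obs), np.asarray(y_pred)
    return np.mean((y_obs - y_pred) ** 2)

err = mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])  # (0.25 + 0 + 1) / 3
```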

In addition, the deep learning models (i.e., LSTM, CNN-LSTM, and CNN-BLSTM) in all three examples are tuned to achieve the best performance. The dropout rate is tuned in the range of [0.001, 0.1] by comparing the performance of four different dropout rates: 0.001, 0.005, 0.01, and 0.1. Four different batch sizes (16, 32, 64, and 128) are also compared to determine the best batch size for each example. The performances of LSTM models with different numbers of layers (2, 3, 4, and 5) are likewise compared in each example to select the best number of layers. The learning rate is tuned using 0.0001, 0.001, 0.01, and 0.1, and the number of neurons is tuned with 30, 40, 60, 80, and 100 to select the optimal value. For the CNN model, two layers are used for convolution and pooling. The pooling size for the CNN layer is selected as 2 after comparing the performance of three different sizes, namely, 2, 3, and 6. Even though the hyperparameters of the deep learning models might differ for different problems, the optimal hyperparameters for the three studied problems turn out to be the same: dropout rate 0.01, batch size 32, number of hidden layers 3, learning rate 0.001, and number of neurons 80. The number of epochs is also tuned for each example to ensure the convergence of the loss functions. The required tuning effort for the deep learning models is in general much higher than that of GP-NARX.
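The tuning grids reported above can be enumerated as a simple cross product; the sketch below only builds the candidate configurations (the training-and-scoring loop over each configuration, which would select the lowest-MSE model, is omitted):

```python
from itertools import product

# Candidate values from the tuning study described above.
grid = {
    'dropout': [0.001, 0.005, 0.01, 0.1],
    'batch_size': [16, 32, 64, 128],
    'n_layers': [2, 3, 4, 5],
    'learning_rate': [0.0001, 0.001, 0.01, 0.1],
    'n_neurons': [30, 40, 60, 80, 100],
}

# Every combination of the candidate values; each dict is one configuration
# that would be trained and scored by MSE on held-out data.
configs = [dict(zip(grid, vals)) for vals in product(*grid.values())]
```

Even this modest grid yields 1280 configurations, which illustrates why the tuning effort of the deep learning models far exceeds that of GP-NARX.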

### 4.1 Example 1: A Mathematical Example.

where $v_i$ is noise that follows the normal distribution $v_i \sim N(0, \sigma_v^2)$ with $\sigma_v = 0.1$, and $y_i$ is the response at the $i$-th time-step.

As shown in Eq. (28), it is a two-step lag nonlinear dynamic model. Since the input includes only outputs from previous time-steps, it can be considered a one-dimensional NARX model. In order to compare the performance of the four surrogate modeling methods, the predictive model given in Eq. (28) is assumed to be unknown, and surrogate models are constructed using training data generated from it. In this case study, five trajectories of time series data are generated using Eq. (28) as the training data. The first two time-step responses of the five time series are, respectively, [0.6, 1.2], [1.2, −0.6], [0, 0], [−1.2, −1.2], and [−0.6, 0.6]. The time series lengths are 20, 50, 50, 30, and 50, respectively. From the five training time series, 195 training samples are obtained. Figure 11 presents the five time series of training data.
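The conversion of a response time series into lagged training pairs can be sketched as follows (a minimal illustration, assuming a two-step lag and one-step-ahead targets as in Eq. (28); the exact sample-counting convention of the paper may differ):

```python
import numpy as np

def make_narx_pairs(y, lag=2):
    """Build (input, target) pairs for a lag-`lag` autoregressive model.

    Each input row holds the previous `lag` responses; the target is the
    next response. A series of length n yields n - lag pairs.
    """
    y = np.asarray(y, dtype=float)
    X = np.stack([y[i:i + lag] for i in range(len(y) - lag)])
    t = y[lag:]
    return X, t
```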

Based on the training data given in Fig. 11, four surrogate models are constructed using GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM, respectively, following the methods discussed in Sec. 3. Two scenarios, namely, *with subsampling* and *without subsampling*, are considered to compare the performance of the surrogate models. For surrogate modeling with subsampling, 150 training points are sub-selected from the available data. Since the total number of training data is small, the GP-NARX model can be trained using all the training data for the scenario without subsampling. To quantitatively assess the accuracy of the surrogate models, Table 2 gives the MSE comparison of the four surrogate models with subsampling for 14 testing cases, and Table 3 lists the MSE comparison for the scenario without subsampling.
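A minimal Gaussian process regression, the building block of GP-NARX, can be sketched in NumPy. This is an illustrative posterior-mean computation with a squared-exponential kernel and fixed hyperparameters, not the tuned model used in the paper:

```python
import numpy as np

def sq_exp_kernel(A, B, length=1.0, var=1.0):
    """Squared-exponential covariance between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

def gp_predict(X_train, y_train, X_test, noise=1e-4, length=1.0, var=1.0):
    """Posterior mean of a zero-mean GP at X_test given noisy training data."""
    K = sq_exp_kernel(X_train, X_train, length, var)
    K += noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)          # (K + noise*I)^{-1} y
    K_star = sq_exp_kernel(X_test, X_train, length, var)
    return K_star @ alpha
```

In GP-NARX, the rows of `X_train` would be the lagged input/output vectors and `y_train` the next-step responses; in practice the kernel hyperparameters are estimated by maximizing the marginal likelihood rather than fixed as here.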

Case | Data dimension | GP-NARX | LSTM | CNN-LSTM | CNN-BLSTM | BEST
---|---|---|---|---|---|---
1 | (40,2) | 0.15142 | 0.02912 | 0.02915 | **0.02147** | 0.02147
2 | (70,2) | **0.00620** | 0.01923 | 0.01288 | 0.01113 | 0.00620
3 | (66,2) | 0.02138 | 0.01661 | 0.01479 | **0.01278** | 0.01278
4 | (80,2) | 0.09544 | 0.01809 | 0.01409 | **0.01303** | 0.01303
5 | (30,2) | 0.01953 | 0.02891 | 0.01390 | **0.01288** | 0.01288
6 | (190,2) | **0.00344** | 0.01414 | 0.01486 | 0.01359 | 0.00344
7 | (150,2) | 0.01348 | **0.00885** | 0.01007 | 0.00923 | 0.00885
8 | (200,2) | 0.01510 | **0.01208** | 0.01358 | 0.01309 | 0.01208
9 | (150,2) | **0.00694** | 0.01402 | 0.01628 | 0.01513 | 0.00694
10 | (300,2) | **0.00321** | 0.01367 | 0.01317 | 0.01270 | 0.00321
11 | (50,2) | 0.01329 | **0.01227** | 0.01510 | 0.01516 | 0.01227
12 | (30,2) | **0.00588** | 0.01020 | 0.01003 | 0.00781 | 0.00588
13 | (45,2) | 0.24415 | 0.04063 | 0.03120 | **0.01823** | 0.01823
14 | (60,2) | 0.00668 | 0.01395 | 0.01108 | 0.01012 | 0.00668
Average | (104,2) | 0.04330 | 0.01798 | 0.01573 | **0.01331** | 0.01331


Note: The best model, which has the lowest MSE, is highlighted in bold font.

Case | Data dimension | GP-NARX | LSTM | CNN-LSTM | CNN-BLSTM | BEST
---|---|---|---|---|---|---
1 | (40,2) | 0.0354 | **0.0094** | 0.0162 | 0.0110 | 0.0094
2 | (70,2) | 0.0200 | 0.0096 | 0.0130 | **0.0092** | 0.0092
3 | (66,2) | 0.0222 | **0.0112** | 0.0128 | 0.0118 | 0.0112
4 | (80,2) | 0.0231 | 0.0115 | 0.0145 | **0.0100** | 0.0100
5 | (30,2) | **0.0071** | 0.0104 | 0.0120 | 0.0120 | 0.0071
6 | (190,2) | **0.0116** | 0.0127 | 0.0145 | 0.0120 | 0.0116
7 | (150,2) | **0.0106** | 0.0139 | 0.0156 | 0.0125 | 0.0106
8 | (200,2) | **0.0038** | 0.0145 | 0.0136 | 0.0122 | 0.0038
9 | (150,2) | 0.0123 | 0.0118 | 0.0141 | **0.0111** | 0.0111
10 | (300,2) | **0.0076** | 0.0118 | 0.0144 | 0.0103 | 0.0076
11 | (50,2) | **0.0047** | 0.0104 | 0.0144 | 0.0090 | 0.0047
12 | (30,2) | 0.0520 | 0.0156 | 0.0197 | **0.0140** | 0.0140
13 | (45,2) | 0.0870 | 0.0105 | 0.0104 | **0.0086** | 0.0086
14 | (60,2) | 0.0273 | 0.0083 | 0.0125 | **0.0081** | 0.0081
Average | (104,2) | 0.0232 | 0.0115 | 0.0141 | **0.0108** | 0.0108


Note: The best model, which has the lowest MSE, is highlighted in bold font.

Comparing the results in Tables 2 and 3 shows that increasing the number of training data can, in general, increase the accuracy of all four types of surrogate models. Since the GP-NARX model can be trained using all the training data, we focus on the results in Table 3 for the comparative study. The results in Table 3 indicate that GP-NARX generally has a higher prediction accuracy than the RNN models (i.e., LSTM, CNN-LSTM, and CNN-BLSTM) when the period of dynamic prediction is long. For example, the MSE of the GP-NARX prediction is 0.0038 while that of the RNN models is over 0.01 when the prediction period is 200 time-steps (i.e., Case 8). In contrast, when the prediction period is 40 time-steps (i.e., Case 1), the MSE of GP-NARX is over 0.03 while that of the RNN models is less than 0.015, as shown in Table 3.

Figure 12 presents a comparison of the surrogate model predictions and the true response for one testing case under the scenario without subsampling, and Fig. 13 gives the corresponding prediction errors of the four surrogate models for that case. The results plotted in Figs. 12 and 13 show that the predictive model constructed using LSTM has the highest prediction accuracy for this particular testing case.

It can be concluded from this particular mathematical example that the GP-NARX model constructed from the time series given in Fig. 11 is more suitable than the RNN models for predictions over long periods. On the other hand, the RNN models, including LSTM, CNN-LSTM, and CNN-BLSTM, are more suitable for predictions whose data shape or distribution is similar to that of the training data. For instance, the lengths of the training time series are between 20 and 50, and the RNN models have a higher prediction accuracy than GP-NARX when the number of prediction time-steps is close to that range. In addition, the results in Table 3 show that the prediction capability of the RNN models is more robust than that of GP-NARX for this particular example, which is manifested as a lower average MSE value and a more stable performance.
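The recursive multi-step prediction shared by all four NARX-style surrogates can be sketched as follows. This is a minimal illustration in which `one_step_model` stands in for any trained one-step-ahead predictor (GP-NARX or a deep learning model); current predictions replace the oldest inputs, as described in the NARX scheme:

```python
import numpy as np

def recursive_predict(one_step_model, y_init, n_steps, lag=2):
    """Roll a one-step-ahead model forward, feeding predictions back as inputs."""
    history = list(y_init[-lag:])          # most recent `lag` observed responses
    preds = []
    for _ in range(n_steps):
        y_next = one_step_model(np.array(history[-lag:]))
        preds.append(y_next)
        history.append(y_next)             # the prediction becomes a future input
    return np.array(preds)
```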

### 4.2 Example 2: An Asymmetric Duffing Oscillator.

where $u(t)$ is the input excitation given by Eq. (30), $c = 10$, $k = 2 \times 10^{4}$, $k_2 = 10^{7}$, and $k_3 = 5 \times 10^{9}$. For demonstration purposes, $a$, $b$, and $c$ are treated as controllable model parameters that can be changed for different experiments, and thus we have $\boldsymbol{\theta} = [a, b, c]$.

In order to generate training data for the surrogate models, we first generate 60 training samples of $\boldsymbol{\theta}$ between a lower bound $\boldsymbol{\theta}_L = \{2, 0.1, 0.1\}$ and an upper bound $\boldsymbol{\theta}_U = \{25, 10, 15\}$ using the Latin hypercube sampling method, and we have $\vartheta = 60$ in Eq. (8). This range of $\boldsymbol{\theta}$ is used only for demonstration purposes in this paper; in probabilistic analysis or design of nonlinear dynamic systems, the range needs to be determined according to the distributions of $\boldsymbol{\theta}$. For each training sample of $\boldsymbol{\theta}$, excitations $u(t)$ are generated after adding noise to Eq. (30) based on the values of $a$ and $b$. The length of each $u(t)$ is around 2000, and therefore we have $N_i = \{n_i \mid 1200 \le n_i \le 2500\}$, $\forall i = 1, 2, \cdots, 60$ in Eq. (29). For each excitation and training value of $c$, we solve Eq. (29) using the fourth-order fixed-step Runge–Kutta algorithm and obtain the response $y$. Figure 14 shows one example of the 60 training excitations and the corresponding response $y$.
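The fourth-order fixed-step Runge–Kutta integration mentioned above can be sketched as below. The state-space form of the oscillator and the unit mass are assumptions for illustration; the paper's exact Eq. (29) may differ:

```python
import numpy as np

# Illustrative Duffing-type oscillator (assumed unit mass):
#   y'' + C*y' + K*y + K2*y**2 + K3*y**3 = u(t)
C, K, K2, K3 = 10.0, 2e4, 1e7, 5e9

def duffing_rhs(t, state, u):
    """Right-hand side of the first-order state form [y, y']."""
    y, v = state
    return np.array([v, u(t) - C * v - K * y - K2 * y**2 - K3 * y**3])

def rk4_solve(u, y0=0.0, v0=0.0, dt=1e-4, n_steps=2000):
    """Classic fixed-step fourth-order Runge-Kutta integration of the response."""
    state = np.array([y0, v0])
    out = np.empty(n_steps + 1)
    out[0] = y0
    for i in range(n_steps):
        t = i * dt
        k1 = duffing_rhs(t, state, u)
        k2 = duffing_rhs(t + dt / 2, state + dt / 2 * k1, u)
        k3 = duffing_rhs(t + dt / 2, state + dt / 2 * k2, u)
        k4 = duffing_rhs(t + dt, state + dt * k3, u)
        state = state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[i + 1] = state[0]
    return out
```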

Through cross-validation, and after comparing the performance of surrogate models with lag lengths ranging from 5 to 15, it is found that a time lag of nine for both the input excitation *u*(*t*) and the output *y*(*t*) gives the best prediction accuracy for all four types of surrogate models. We therefore have *p* = 9 and *q* = 9. Following the procedure discussed in Sec. 3.1, we obtain training data for surrogate modeling. From the 60 training time series, 100,211 training samples are obtained. The large volume of training data makes the training of GP-NARX computationally very challenging. We therefore perform subsampling of the training data using the approach discussed in Sec. 3.1.2, from which 900 training data are sub-selected for the training of GP-NARX while maintaining the space-filling property. Figure 15 shows a comparison between the sub-selected samples and the original samples for two dimensions of the training data (i.e., *u*_{1} versus *y*_{1} and *u*_{3} versus *y*_{3}). As shown in this figure, the subsampling method maintains the space-filling property well while reducing the number of training data.
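A space-filling subset like the one described can be chosen with a greedy farthest-point (maximin) selection. This is a generic sketch of the idea, not necessarily the exact algorithm of Sec. 3.1.2:

```python
import numpy as np

def maximin_subsample(X, n_select, seed=0):
    """Greedily pick n_select rows of X that spread out over the data.

    Each new point is the one farthest from the already-chosen set,
    which approximately maximizes the minimum pairwise distance.
    """
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]            # random starting point
    d_min = np.linalg.norm(X - X[chosen[0]], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(d_min))                 # farthest from chosen set
        chosen.append(nxt)
        d_min = np.minimum(d_min, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)
```

Because near-duplicate samples are never both selected, the subset retains coverage of the input space with far fewer points, which is what makes training GP-NARX tractable.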

The sub-selected training data are used to train the GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM models. Additionally, another set of LSTM, CNN-LSTM, and CNN-BLSTM models is trained using all the training data. We therefore have one GP-NARX model trained with subsampling and two sets of LSTM, CNN-LSTM, and CNN-BLSTM models trained, respectively, with and without subsampling. After training the surrogate models, we randomly generate eight samples of **θ** = [*a, b, c*] according to the lower and upper bounds. Based on these, eight random input excitations are obtained using Eq. (30). We then assume that the original dynamic model is unknown and compare the predictions of the surrogate models with the responses of the original dynamic model for the eight testing datasets. Since the results of the LSTM, CNN-LSTM, and CNN-BLSTM models with subsampling are much worse than those of the deep learning models without subsampling, GP-NARX with subsampling is directly compared with the LSTM, CNN-LSTM, and CNN-BLSTM models trained using all available training data. Table 4 gives the MSE comparison of the four surrogate models for all eight testing cases. The average MSE over these eight test datasets is 2.29 × 10^{−8} for GP-NARX, the best among the four studied surrogate models. In addition, the deep learning models (LSTM, CNN-LSTM, and CNN-BLSTM) have similar performance. Thus, the results in Table 4 show that GP-NARX has a higher prediction accuracy than the other methods for this particular example and maintains a stable performance even when the data changes drastically over long periods.

All MSE values are ×10^{−8}.

Case | Data dimension | GP-NARX^{a} | LSTM | CNN-LSTM | CNN-BLSTM | BEST
---|---|---|---|---|---|---
1 | (3000,9,3) | **1.29** | 28.3 | 9.59 | 18.3 | 1.29
2 | (2500,9,3) | **0.31** | 22.5 | 5.02 | 12.5 | 0.31
3 | (2700,9,3) | **3.44** | 31.8 | 10.6 | 32.6 | 3.44
4 | (2000,9,3) | **2.27** | 45.8 | 10.4 | 32.3 | 2.27
5 | (3100,9,3) | **2.75** | 42.7 | 10.4 | 36.3 | 2.75
6 | (1800,9,3) | **2.69** | 44.5 | 10.1 | 32.2 | 2.69
7 | (1900,9,3) | **2.68** | 44.9 | 11.5 | 32.1 | 2.68
8 | (2100,9,3) | **2.93** | 42.5 | 10.1 | 34.7 | 2.93
Average | (2750,9,3) | **2.29** | 37.9 | 9.7 | 28.9 | 2.29


Note: The best model, which has the lowest MSE, is highlighted in bold font.

^{a}GP-NARX is trained based on subsampling.

Figure 16 presents a comparison of the predictions and the true responses for one of the eight testing cases, and Fig. 17 shows the corresponding kernel density estimates of the error distributions for the four surrogate models. The results in Figs. 16 and 17 show that the predictive model constructed using GP-NARX has the highest prediction accuracy for this testing case (i.e., Case 1 in Table 4).

### 4.3 Example 3: The Bouc-Wen Model.

In structural engineering, many nonlinear inelastic material behaviors are of interest in structures such as reinforced concrete, steel, base isolation systems, and damping devices [58]. The Bouc-Wen model, proposed in Refs. [59,60] and extended in Ref. [61], is used in this work as the third example due to its flexibility in capturing the behavior of many inelastic materials by simply changing its tunable parameters.

where $x(t)$, $\dot{x}(t)$, and $\ddot{x}(t)$ are, respectively, the displacement, velocity, and acceleration responses of the SDOF system. Also, $\upsilon$, $\psi$, and $\alpha(t)$ represent the mass, damping coefficient, and excitation force, respectively. The term $F(t)$ is the restoring force, which, according to the Bouc-Wen model, is expressed by Eq. (32) as the sum of a linear and a nonlinear function, where $\eta$ denotes the ratio between the post-yield stiffness and the pre-yield stiffness (i.e., $k_i$), and $z(t)$ denotes a nonobservable hysteretic displacement. The quantities $\beta$, $\gamma$, and $\varsigma$ are the Bouc-Wen tunable parameters that are used to model several different materials; for this work, these values are set as constants. For more details on applying the Bouc-Wen model to an MDOF system, the reader can refer to Ref. [4]. Figure 18 shows an example of an input excitation and the corresponding dynamic responses for a two-storey-building nonlinear dynamic system.
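For reference, a commonly cited single-degree-of-freedom Bouc-Wen form consistent with the definitions above reads as follows. This standard textbook form is given as an assumption; the paper's Eqs. (31)–(33) may use slightly different notation:

```latex
% Equation of motion, restoring force, and hysteretic evolution law
\upsilon\,\ddot{x}(t) + \psi\,\dot{x}(t) + F(t) = \alpha(t), \\
F(t) = \eta\,k_i\,x(t) + (1-\eta)\,k_i\,z(t), \\
\dot{z}(t) = \dot{x}(t)
  - \beta\,\lvert \dot{x}(t)\rvert\,\lvert z(t)\rvert^{\varsigma-1} z(t)
  - \gamma\,\dot{x}(t)\,\lvert z(t)\rvert^{\varsigma}
```

The exponent $\varsigma$ controls the sharpness of the transition from elastic to plastic response, while $\beta$ and $\gamma$ shape the hysteresis loop.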

In this example, we first generate 100 random excitations as the training time series. Based on the training excitations, 29,011 training samples are obtained. Using the subsampling approach discussed in Sec. 3.1.2, 2000 training data are sub-selected. Similar to Example 2, a GP-NARX model is trained using the sub-selected training data, and two sets of LSTM, CNN-LSTM, and CNN-BLSTM models are trained, respectively, with the sub-selected data and with all available training data. The performance of the LSTM, CNN-LSTM, and CNN-BLSTM models trained with subsampling is again much worse than that of the models trained with all available data. We therefore again compare only the GP-NARX model trained using the sub-selected data with the deep learning models trained using all available data.

Tables 5 and 6 give the MSE comparison of the four surrogate models for all eight testing cases for the drifts of degrees of freedom (DOF) 1 and 2. The average MSE of GP-NARX over the eight test datasets is 0.011 and 0.017, respectively, for the drifts of the two DOFs. This implies that GP-NARX is the best among the four studied surrogate models for this particular example based on this group of training data. While LSTM has the best performance for test case 1, GP-NARX has a better overall performance than the other methods.

Case | Data dimension | GP-NARX | LSTM | CNN-LSTM | CNN-BLSTM | BEST
---|---|---|---|---|---|---
1 | (520,2) | 0.025 | **0.015** | 0.178 | 0.160 | 0.015
2 | (455,2) | **0.022** | 0.023 | 0.079 | 0.085 | 0.022
3 | (470,2) | **0.007** | 0.027 | 0.116 | 0.124 | 0.007
4 | (438,2) | **0.006** | 0.021 | 0.074 | 0.107 | 0.006
5 | (289,2) | **0.009** | 0.010 | 0.070 | 0.071 | 0.009
6 | (407,2) | **0.005** | 0.013 | 0.085 | 0.103 | 0.005
7 | (518,2) | **0.006** | 0.013 | 0.089 | 0.110 | 0.006
8 | (393,2) | **0.008** | 0.029 | 0.060 | 0.049 | 0.008
Average | (436,2) | **0.011** | 0.019 | 0.094 | 0.101 | 0.011


Note: The best model, which has the lowest MSE, is highlighted in bold font.

Case | Data dimension | GP-NARX | LSTM | CNN-LSTM | CNN-BLSTM | BEST
---|---|---|---|---|---|---
1 | (520,2) | 0.031 | **0.013** | 0.110 | 0.169 | 0.013
2 | (455,2) | **0.034** | 0.060 | 0.259 | 0.207 | 0.034
3 | (470,2) | **0.016** | 0.084 | 0.151 | 0.168 | 0.016
4 | (438,2) | **0.012** | 0.067 | 0.121 | 0.130 | 0.012
5 | (289,2) | **0.014** | 0.020 | 0.209 | 0.194 | 0.014
6 | (407,2) | **0.010** | 0.012 | 0.073 | 0.108 | 0.010
7 | (518,2) | **0.012** | 0.022 | 0.124 | 0.120 | 0.012
8 | (393,2) | **0.006** | 0.023 | 0.075 | 0.228 | 0.006
Average | (436,2) | **0.017** | 0.038 | 0.140 | 0.166 | 0.017


Note: The best model, which has the lowest MSE, is highlighted in bold font.

Figure 19 gives the random input excitation of a case study that is used to verify the accuracy of the surrogate models. Figures 20 and 21 then present the comparison of the surrogate model predictions and the true response (i.e., force–displacement hysteresis) for the testing case (i.e., Case 4 in Tables 5 and 6). Figures 22–25 show the comparison of the drifts from the surrogate model predictions and the true response. To compare the accuracy of the different methods more intuitively, Fig. 26 depicts the prediction error distributions of the drifts for the four surrogate models.

## 5 Discussion and Conclusion

This paper performs a comparative study of surrogate modeling for nonlinear dynamic systems using four types of surrogate models, namely, GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM. The performance of the surrogate models for multi-step-ahead prediction is compared using three examples. The results show that (1) GP-NARX in general performs better than the other three types of surrogate modeling methods for the three particular examples; (2) the performance of GP-NARX is also relatively more robust than that of the other methods; (3) for some cases, the deep learning-based surrogate modeling methods (i.e., LSTM, CNN-LSTM, and CNN-BLSTM) perform better than GP-NARX, but their required tuning effort is higher than that of GP-NARX; (4) deep learning-based surrogate modeling methods are more suitable for predictions whose data shape is similar to that of the training datasets; and (5) all the studied surrogate modeling methods can perform long-term multi-step prediction based on training data from short periods. The study also demonstrates the promising potential of various machine learning methods for surrogate modeling of nonlinear dynamic systems.

Even though deep learning methods are becoming more and more popular in many fields, they do not provide a universal solution to all problems. For instance, deep learning-based methods are better than the conventional GP-NARX for several testing cases in the first example, but GP-NARX in general performs better than the deep learning-based methods in the second and third examples. We therefore do not suggest substituting conventional machine learning methods with deep learning methods for surrogate modeling in engineering applications without detailed investigation. For problems that can be solved using conventional methods such as GP-NARX, following the principle of Occam's razor, such methods should be used, since their required tuning effort is much lower. Deep learning methods (i.e., LSTM, CNN-LSTM, and CNN-BLSTM) have a significant advantage over GP-NARX in dealing with big data, such as videos, images, and high-dimensional data. Even for big data problems, however, the GP-NARX method is still worth investigating after applying dimension reduction techniques, such as an autoencoder, or the subsampling method discussed in this paper. Our guidance for choosing surrogate models is to start from the classical GP-NARX model whenever it is applicable, since it tends to be more robust and easier to tune than deep learning methods. In some situations, deep learning models can be used in conjunction with GP-NARX to improve the predictive capability. For instance, a CNN model can be used to reduce a high-dimensional problem to a low-dimensional one in the latent space, within which GP-NARX models can capture the temporal variability over time.

This paper compared the performance of the surrogate modeling methods only for deterministic predictions. Since the confidence of a dynamic prediction is usually of interest to decision makers, the capability of the methods to provide probabilistic predictions is worth investigating in the future. In addition, the quality of surrogate models is affected by the data used for training; how to collect the most effective data for training surrogate models of nonlinear dynamic systems is also a very interesting research topic for further study.

## Acknowledgment

The US Army Corps of Engineers provided funding for this work through the U.S. Army Engineer Research and Development Center Research Cooperative Agreement W912HZ-17-2-0024. The support is gratefully acknowledged. LA-UR-21-31231 approved for public release; distribution is unlimited.

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request. The authors attest that all data for this study are included in the paper.

## References
