
  • Open access
  • Published: 15 November 2022

Closed-form continuous-time neural networks

  • Ramin Hasani (ORCID: orcid.org/0000-0002-9889-5222) 1,
  • Mathias Lechner 1,2,
  • Alexander Amini 1,
  • Lucas Liebenwein (ORCID: orcid.org/0000-0002-3229-6665) 1,
  • Aaron Ray 1,
  • Max Tschaikowski 3,
  • Gerald Teschl (ORCID: orcid.org/0000-0002-1036-9173) 4 &
  • Daniela Rus 1

Nature Machine Intelligence volume 4, pages 992–1003 (2022)



A Publisher Correction to this article was published on 29 November 2022

This article has been updated

A preprint version of the article is available at arXiv.

Continuous-time neural networks are a class of machine learning systems that can tackle representation learning on spatiotemporal decision-making tasks. These models are typically represented by continuous differential equations. However, their expressive power when they are deployed on computers is bottlenecked by numerical differential equation solvers. This limitation has notably slowed down the scaling and understanding of numerous natural physical phenomena such as the dynamics of nervous systems. Ideally, we would circumvent this bottleneck by solving the given dynamical system in closed form. This is known to be intractable in general. Here, we show that it is possible to closely approximate the interaction between neurons and synapses—the building blocks of natural and artificial neural networks—constructed by liquid time-constant networks efficiently in closed form. To this end, we compute a tightly bounded approximation of the solution of an integral appearing in liquid time-constant dynamics that has had no known closed-form solution so far. This closed-form solution impacts the design of continuous-time and continuous-depth neural models. For instance, since time appears explicitly in closed form, the formulation relaxes the need for complex numerical solvers. Consequently, we obtain models that are between one and five orders of magnitude faster in training and inference compared with differential equation-based counterparts. More importantly, in contrast to ordinary differential equation-based continuous networks, closed-form networks can scale remarkably well compared with other deep learning instances. Lastly, as these models are derived from liquid networks, they show good performance in time-series modelling compared with advanced recurrent neural network models.


Continuous neural network architectures built by ordinary differential equations (ODEs) 2 are expressive models useful in modelling data with complex dynamics. These models transform the depth dimension of static neural networks and the time dimension of recurrent neural networks (RNNs) into a continuous vector field, enabling parameter sharing, adaptive computations and function approximation for non-uniformly sampled data.

These continuous-depth (time) models have shown promise in density estimation applications 3 , 4 , 5 , 6 , as well as modelling sequential and irregularly sampled data 1 , 7 , 8 , 9 .

While ODE-based neural networks with careful memory and gradient propagation design 9 perform competitively with advanced discretized recurrent models on relatively small benchmarks, their training and inference are slow owing to the use of advanced numerical differential equation (DE) solvers 10 . This becomes even more troublesome as the complexity of the data, task and state space increases (that is, requiring more precision) 11 , for instance, in open-world problems such as medical data processing, self-driving cars, financial time-series and physics simulations.

The research community has developed solutions for resolving this computational overhead and for facilitating the training of neural ODEs, for instance by relaxing the stiffness of a flow by state augmentation techniques 4 , 12 , reformulating the forward pass as a root-finding problem 13 , using regularization schemes 14 , 15 , 16 or improving the inference time of the network 17 .

Here, we derive a closed-form continuous-depth model that has the modelling capabilities of ODE-based models but does not require any solver to model data (Fig. 1 ).

Figure 1

A postsynaptic neuron receives the stimuli I ( t ) through a nonlinear conductance-based synapse model. Here, S ( t ) stands for the synaptic current. The dynamics of the membrane potential of this postsynaptic neuron are given by the DE presented in the middle. This equation is a fundamental building block of LTC networks 1 , for which there is no known closed-form expression. Here, we provide an approximate solution for this equation which shows the interaction of nonlinear synapses with postsynaptic neurons in closed form.

Intuitively, in this work, we replace the integration (that is, solution) of a nonlinear DE describing the interaction of a neuron with its input nonlinear synaptic connections, with their corresponding nonlinear operators. This could be achieved in principle using functional Taylor expansions (in the spirit of the Volterra series) 18 . However, in the particular case of liquid time-constant (LTC) networks, we can leverage a closed-form expression for the system’s response to input. This allows one to evaluate the system’s response to exogenous input ( I ) and recurrent inputs from hidden states ( x ) as a function of time. One way of looking at this is to regard the closed-form solution as the application of a nonlinear forward operator to the inputs of each hidden state or neuron in the network, where the outputs of one neuron constitute the inputs for others. Effectively, this rests on approximating a conductance-based model with a neural mass model, of the kind used in the dynamic causal modelling of real neuronal networks 19 .

The proposed continuous neural networks yield considerably faster training and inference speeds while being as expressive as their ODE-based counterparts. We provide a derivation for the approximate closed-form solution to a class of continuous neural networks that explicitly models time. We demonstrate how this transformation can be formulated into a novel neural model and scaled to create flexible, performant and fast neural architectures on challenging sequential datasets.

Deriving an approximate closed-form solution for neural interactions

Two neurons interact with each other through synapses as shown in Fig. 1 . There are three principal mechanisms for information propagation in natural brains that are abstracted away in the current building blocks of deep learning systems: (1) neural dynamics are typically continuous processes described by DEs (see the dynamics of x ( t ) in Fig. 1 ), (2) synaptic release is much more than scalar weights, involving a nonlinear transmission of neurotransmitters, the probability of activation of receptors and the concentration of available neurotransmitters, among other nonlinearities (see S ( t ) in Fig. 1 ) and (3) the propagation of information between neurons is induced by feedback and memory apparatuses (see how I ( t ) stimulates x ( t ) through a nonlinear synapse S ( t ) which also has a multiplicative difference of potential to the postsynaptic neuron accounting for a negative feedback mechanism). One could read I ( t ) as a mixture of exogenous input to the (neural) network and presynaptic inputs from other neurons that result in a depolarization x ( t ). This depolarization is mediated by the current S ( t ) that depends upon depolarization and a reversal threshold A . LTC networks 1 , which are expressive continuous-depth models obtained by a bilinear approximation 20 of a neural ODE formulation 2 , are designed on the basis of these mechanisms. Correspondingly, we take their ODE semantics and approximate a closed-form solution for the scalar case of a postsynaptic neuron receiving an input stimulus from a presynaptic source through a nonlinear synapse.

To this end, we apply the theory of linear ODEs 21 to analytically solve the dynamics of an LTC DE as shown in Fig. 1 . We then simplify the solution to the point where there is one integral left to solve. This integral compartment, \(\int\nolimits_{0}^{t}f(I(s))\,{\mathrm{d}}s\) in which f is a positive, continuous, monotonically increasing and bounded nonlinearity, is challenging to solve in closed form since it has dependencies on an input signal I ( s ) that is arbitrarily defined (such as real-world sensory readouts). To approach this problem, we discretize I ( s ) into piecewise constant segments and obtain the discrete approximation of the integral in terms of the sum of piecewise constant compartments over intervals. This piecewise constant approximation inspired us to introduce an approximate closed-form solution for the integral \(\int\nolimits_{0}^{t}f(I(s))\,{\mathrm{d}}s\) that is provably tight when the integral appears as the exponent of an exponential decay, which is the case for LTCs. We theoretically justify how this closed-form solution represents LTCs’ ODE semantics and is as expressive (Fig. 1 ).
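As a rough numerical illustration of this tightness (a sketch of our own, not the authors' code), the snippet below integrates the symmetric scalar LTC dynamics dx/dt = −(w_τ + f(I(t)))(x(t) − A) for an arbitrary sinusoidal input and compares the trajectory with a closed-form approximation of the form (x_0 − A) e^{−(w_τ + f(I(t)))t} f(−I(t)) + A, with f a sigmoid; all parameter values are illustrative.

import numpy as np

def f(z):
    # bounded, positive, monotonically increasing nonlinearity (sigmoid)
    return 1.0 / (1.0 + np.exp(-z))

w_tau, A, x0 = 0.5, 1.0, 0.0           # illustrative parameters
I = lambda t: np.sin(t)                # arbitrary continuous input signal

ts = np.linspace(0.0, 10.0, 2001)
dt = ts[1] - ts[0]

# reference trajectory of the (symmetric) scalar LTC ODE via forward Euler
x_ode = np.empty_like(ts)
x_ode[0] = x0
for k in range(len(ts) - 1):
    x_ode[k + 1] = x_ode[k] - dt * (w_tau + f(I(ts[k]))) * (x_ode[k] - A)

# closed-form approximation evaluated directly at each time point
x_cf = (x0 - A) * np.exp(-(w_tau + f(I(ts))) * ts) * f(-I(ts)) + A

# the gap stays below the theoretical bound |x0 - A| e^{-w_tau t}
# (a small tolerance absorbs the Euler discretization error)
bound = np.abs(x0 - A) * np.exp(-w_tau * ts)
print(np.all(np.abs(x_ode - x_cf) <= bound + 1e-2))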

Explicit time dependence

We then dissect the properties of the obtained closed-form solution and design a new class of neural network models we call closed-form continuous-depth networks (CfC). CfCs have an explicit time dependence in their formulation and do not require a numerical ODE solver to obtain their temporal rollouts. Thus, they sidestep the trade-off between accuracy and efficiency imposed by numerical solvers. Formally, this property corresponds to a lower time complexity without numerical instabilities and errors, as illustrated in Table 1 (left). For example, Table 1 (left) shows that the complexity of a p th-order numerical ODE solver is \({{{\mathcal{O}}}}(Kp)\), where K is the number of ODE steps, whereas a CfC system (which has explicit time dependence) requires \({{{\mathcal{O}}}}(\tilde{K})\), where \(\tilde{K}\) is the number of exogenous input time steps, which is typically one to three orders of magnitude smaller than K. Moreover, the approximation error of a p th-order numerical ODE solver scales with \({{{\mathcal{O}}}}({\epsilon }^{p+1})\), whereas CfCs are closed-form continuous-time systems, so the notion of solver approximation error does not apply to them.

This explicit time dependence allows CfCs to perform computations at least one order of magnitude faster in terms of training and inference time compared with their ODE-based counterparts, without loss of accuracy.

Sequence and time-step prediction efficiency

In sequence modelling tasks, one can perform predictions based on an entire sequence of observations, or perform auto-regressive modelling where the model predicts the next time-step output given the current time-step input. Table 1 (right) depicts the time complexity of different neural network instances at inference, for a given sequence of length n and a neural network with k hidden units. We observe that the complexity of ODE-based networks and Transformer modules is at least an order of magnitude higher than that of discrete RNNs and CfCs in both sequence prediction and auto-regressive modelling (time-step prediction) frameworks.

This is desirable because CfCs not only establish a continuous flow similar to ODE models 1 , achieving better expressivity in representation learning, but also do so with the efficiency of discrete RNN models.

CfCs: flexible deep models for sequential tasks

Additionally, CfCs are equipped with novel time-dependent gating mechanisms that explicitly control their memory. CfCs are as expressive as their ODE-based peers and can be supplied with mixed memory architectures 9 to avoid gradient issues in sequential data processing applications with long-range dependences. Beyond accuracy and performance metrics, our results indicate that, when considering accuracy per compute time, CfCs exhibit over 150-fold improvements over their ODE-based counterparts. We perform a diverse set of advanced time-series modelling experiments and present the performance and speed gains achievable by using CfCs in tasks with long-term dependences, irregular data and modelling of physical dynamics, among others.

Deriving a closed-form solution

In this section, we derive an approximate closed-form solution for LTC networks, an expressive subclass of time-continuous models. We discuss how the scalar closed-form expression derived from a small LTC system can inspire the design of CfC models. In this regard, we define the LTC semantics. We then state the main theorem that computes a closed-form approximation of a given LTC system for the scalar case. To prove the theorem, we first find the integral solution of the given LTC ODE system. We then compute a closed-form analytical solution for the integral solution for the case of piecewise constant inputs. Afterward, we generalize the closed-form solution of the piecewise constant inputs to the case of arbitrary inputs with our novel approximation and finally provide sharpness results (that is, measure the rate and accuracy of an approximation error) for the derived solution.

The hidden state of an LTC network is determined by the solution of the following initial value problem (IVP) 1 :
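(The display equation is not reproduced here; following the LTC formulation of ref. 1 and the variable definitions below, it reads)

$$\frac{{\mathrm{d}}x(t)}{{\mathrm{d}}t}=-\left[{w}_{\tau }+f(x(t),I(t);\theta )\right]\odot x(t)+A\odot f(x(t),I(t);\theta ),\quad x(0)={x}_{0}\qquad (1)$$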

where, at a time step t, x ( D ×1) ( t ) defines the hidden state of an LTC layer with D cells, and I ( m ×1) ( t ) is an exogenous input to the system with m features. Here, \({w}_{\tau }^{(D\times 1)}\) is a time-constant parameter vector, A ( D ×1) is a bias vector, f is a neural network parametrized by θ and ⊙ is the Hadamard product. The dependence of f (.) on x ( t ) denotes the possibility of having recurrent connections.

The full proof of theorem 1 is given in Methods . The theorem formally demonstrates that the approximated closed-form solution for the given LTC system is given by equation ( 2 ) and that this approximation is tightly bounded with bounds given in the proof.

In the following, we show an illustrative example of this tightness result in practice. To do this, we first present an instantiation of LTC networks and their approximate closed-form expressions. Extended Data Fig. 1 shows a liquid network with two neurons and five synaptic connections. The network receives an input signal I ( t ). Extended Data Fig. 1 further derives the DE expression for the network along with its closed-form approximate solution. In general, it is possible to compile an LTC network into its closed-form expression as illustrated in Extended Data Fig. 1 . This compilation can be performed using Algorithm 1 provided in Methods .

Given an LTC system determined by the IVP in equation ( 1 ), constructed by one cell, receiving a single-dimensional time-series exogenous input I(t) with no self-connections, the following expression is an approximation of its closed-form solution:
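(A reconstruction of the display equation, consistent with the derivation in Methods and with Algorithm 1:)

$$x(t)\approx ({x}_{0}-A)\,{{\mathrm{e}}}^{-\left[{w}_{\tau }+f(I(t);\theta )\right]t}\,f(-I(t);\theta )+A\qquad (2)$$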

Tightness of the closed-form solution in practice

Figure 2 shows an LTC-based network trained for autonomous driving 22 . The figure further illustrates how close the proposed solution fits the actual dynamics exhibited from a single-neuron ODE given the same parametrization. The details of this experiment are given in Methods .

Figure 2

We approximate a closed-form solution for LTC networks 1 while largely preserving the trajectories of their equivalent ODE systems. We develop our solution into CfC models that are at least 100-fold faster than neural ODEs at both training and inference on complex time-series prediction tasks.

We next show how to design a novel neural network instance inspired by this closed-form solution that has well-behaved gradient properties and approximation capabilities.

Designing CfC models from the solution

Leveraging the scalar closed-form solution expressed by equation ( 2 ), we can now distil this model into a neural network model that can be trained at scale. The solution provides a grounded theoretical basis for solving scalar continuous-time dynamics, and it is important to translate this theory into a practical neural network model which can be integrated into larger representation learning systems equipped with gradient descent optimizers. Doing so requires careful attention to potential gradient and expressivity issues that can arise during optimization, which we will outline in this section.

Formally, the hidden states, x ( t ) ( D ×1) with D hidden units at each time step t , can be obtained explicitly as
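(A reconstruction of the display equation, consistent with the variable definitions below:)

$$x(t)=B\odot {{\mathrm{e}}}^{-\left[{w}_{\tau }+f(x,I;\theta )\right]\odot t}\odot f(-x,-I;\theta )+A\qquad (3)$$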

where B ( D ) collapses ( x 0  −  A ) of equation ( 2 ) into a parameter vector. A ( D ) and \({w}_{\tau }^{(D)}\) are the system's parameter vectors, while I ( t ) ( m ×1) is an m -dimensional input at each time step t , f is a neural network parametrized by \(\theta =\{{W}_{Ix}^{(m\times D)},{W}_{xx}^{(D\times D)},{b}_{x}^{(D)}\}\) and ⊙ is the Hadamard (element-wise) product. While the neural network presented in equation ( 3 ) can be proven to be a universal approximator as it is an approximation of an ODE system 1 , 2 , in its current form it has trainability issues, which we point out and resolve shortly.

Resolving the gradient issues

The exponential term in equation ( 3 ) drives the system's first part (exponentially fast) to 0 and the entire hidden state to A. This issue becomes more apparent when there are recurrent connections and causes vanishing gradient factors when trained by gradient descent 23 . To reduce this effect, we replace the exponential decay term with a reversed sigmoidal nonlinearity σ (.). This nonlinearity is approximately 1 at t  = 0 and approaches 0 in the limit t  →  ∞ . However, unlike exponential decay, its transition happens much more smoothly, yielding a better-conditioned loss surface.

Replacing biases by learnable instances

Next, we consider the bias parameter B to be part of the trainable parameters of the neural network f ( −  x , −  I ;  θ ) and choose to use a new network instance instead of f (presented in the exponential decay factor). We also replace A with another neural network instance, h (. ) to enhance the flexibility of the model. To obtain a more general network architecture, we allow the nonlinearity f (− x , − I ;  θ ) present in equation ( 3 ) to have both shared (backbone) and independent ( g (. )) network compartments.

Gating balance

The time-decaying sigmoidal term can play a gating role if we additionally multiply h (. ) with (1 −  σ (. )). This way, the time-decaying sigmoid function stands for a gating mechanism that interpolates between the two limits of t  → − ∞ and t  →  ∞ of the ODE trajectory.

Instead of learning all three neural network instances f ,  g and h separately, we have them share the first few layers in the form of a backbone that branches out into these three functions. As a result, the backbone allows our model to learn shared representations, thereby speeding up and stabilizing the learning process. More importantly, this architectural prior enables two simultaneous benefits: (1) through the shared backbone, a coupling between the time constant of the system and its state nonlinearity is established that exploits the causal representation learning evident in liquid neural networks 1 , 24 ; (2) through separate head network layers, the system has the ability to explore temporal and structural dependences independently of each other.

These modifications result in the CfC neural network model:
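(A reconstruction of the display equation, consistent with the gating and head descriptions above:)

$$x(t)=\sigma \left(-f(x,I;{\theta }_{f})\,t\right)\odot g(x,I;{\theta }_{g})+\left[1-\sigma \left(-f(x,I;{\theta }_{f})\,t\right)\right]\odot h(x,I;{\theta }_{h})\qquad (4)$$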

The CfC architecture is illustrated in Extended Data Fig. 2 . The neural network instances could be selected arbitrarily. The time complexity of the algorithm is equivalent to that of discretized recurrent networks 25 , being at least one order of magnitude faster than ODE-based networks.
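The following is a minimal sketch of a CfC cell in the spirit of equation ( 4 ), with a shared backbone branching into the three heads f, g and h. It is our own illustration rather than the released implementation; the layer sizes, tanh activations on the state heads and all names are assumptions.

import torch
import torch.nn as nn

class CfCCell(nn.Module):
    """Minimal closed-form cell: x = sigma(-f*t) * g + (1 - sigma(-f*t)) * h."""
    def __init__(self, input_size, hidden_size, backbone_units=64):
        super().__init__()
        # shared backbone over the concatenated input and hidden state
        self.backbone = nn.Sequential(
            nn.Linear(input_size + hidden_size, backbone_units), nn.Tanh())
        # three heads branching out of the backbone
        self.f = nn.Linear(backbone_units, hidden_size)  # time-constant head
        self.g = nn.Linear(backbone_units, hidden_size)  # state head gated by sigma(-f*t)
        self.h = nn.Linear(backbone_units, hidden_size)  # state head gated by 1 - sigma(-f*t)

    def forward(self, x, I, t):
        # x: (batch, hidden), I: (batch, input), t: (batch, 1) elapsed time
        z = self.backbone(torch.cat([I, x], dim=-1))
        gate = torch.sigmoid(-self.f(z) * t)             # time-decaying sigmoidal gate
        return gate * torch.tanh(self.g(z)) + (1.0 - gate) * torch.tanh(self.h(z))

# illustrative unrolling over an irregularly sampled sequence
cell = CfCCell(input_size=3, hidden_size=8)
x = torch.zeros(1, 8)
for I_t, dt in [(torch.randn(1, 3), torch.tensor([[0.5]])),
                (torch.randn(1, 3), torch.tensor([[1.7]]))]:
    x = cell(x, I_t, dt)
print(x.shape)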

The procedure to account for the explicit time dependence

CfCs are continuous-depth models that can set their temporal behaviour based on the task under test. For time-variant datasets (for example, irregularly sampled time series, event-based data and sparse data), the t for each incoming sample is set based on its time stamp or order. For sequential applications where the time of occurrence of a sample does not matter, t is sampled as many times as the batch length, at equidistant intervals between two hyperparameters a and b, as sketched below.
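A short sketch of how the time input t could be constructed under the two regimes described above (our interpretation; a and b are the hyperparameters mentioned in the text).

import numpy as np

def time_inputs(timestamps=None, seq_len=None, a=0.0, b=1.0):
    # time-variant data: use the (possibly irregular) time stamps directly
    if timestamps is not None:
        return np.asarray(timestamps, dtype=float)
    # time-invariant sequences: equidistant samples between a and b
    return np.linspace(a, b, seq_len)

print(time_inputs(timestamps=[0.0, 0.4, 1.9]))  # irregularly sampled case
print(time_inputs(seq_len=4, a=0.0, b=1.0))     # equidistant case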

Experiments with CfCs

We now assess the performance of CfCs in a series of sequential data processing tasks compared with advanced, recurrent models. We first approach solving conventional sequential data modelling tasks (for example, bit-stream prediction, sentiment analysis on text data, medical time-series prediction, human activity recognition, sequential image processing and robot kinematics modelling), and compare CfC variants with an extensive set of advanced RNN baselines. We then evaluate how CfCs compare with LTC-based neural circuit policies (NCPs) 22 in real-world autonomous lane-keeping tasks.

CfC network variants

To evaluate the proposed modifications we applied to the closed-form solution network described by equation ( 3 ), we test four variants of the CfC architecture: (1) the closed-form solution network (Cf-S) obtained by equation ( 3 ), (2) the CfC without the second gating mechanism (CfC-noGate), a variant that does not have the 1 −  σ instance shown in Extended Data Fig. 2, (3) the CfC model (CfC) expressed by equation ( 4 ) and (4) the CfC wrapped inside a mixed memory architecture (that is, where the CfC defines the memory state of an RNN, for instance a long short-term memory (LSTM)), a variant we call CfC-mmRNN. Each of these four variants leverages our closed-form solution and is thus at least one order of magnitude faster than continuous-time ODE models.

To investigate their representation learning power, in the following we extensively evaluate CfCs on a series of sequence modelling tasks. The objective is to test the effectiveness of the CfCs in learning spatiotemporal dynamics, compared with a wide range of advanced models.

We compare CfCs with a diverse set of advanced algorithms developed for sequence modelling by both discretized and continuous mechanisms. These baselines are given in full in Methods .

Human activity recognition

The human activity dataset 7 contains 6,554 sequences of humans demonstrating activities such as walking, lying and sitting. The input space consists of 561-dimensional inertial sensor measurements per time step, recorded from the user's smartphone 26 , and the output is one of six activity categories per time step.

We set up our dataset split (training, validation and test) to carefully reflect the modifications made by Rubanova et al. 7 for this task. The results of this experiment are reported in Table 2. We observe that not only do the CfC variants Cf-S, CfC-noGate and CfC-mmRNN outperform other models by a large margin, but they do so with a speed-up of more than 8,752% over the best-performing ODE-based instance (Latent-ODE-ODE). The reason for such a large speed difference is the complexity of the dataset dynamics, which forces the ODE solvers of models such as Latent-ODE-ODE to take many solver steps on stiff dynamics. This issue does not exist for closed-form models, as they do not use any ODE solver to account for the dynamics. The hyperparameter details of this experiment are provided in Extended Data Fig. 3.

Physical dynamics modelling

The Walker2D dataset consists of kinematic simulations of the MuJoCo physics engine 27 (see Methods for more details). As shown in Table 3, CfCs outperform the other baselines by a large margin, supporting their strong capability to model irregularly sampled physical dynamics with missing phases. It is worth mentioning that, on this task, CfCs even outperform transformers by a considerable margin of 18%. The hyperparameter details of this experiment are provided in Extended Data Fig. 3.

Event-based sequential image processing

We next assess the performance of CfCs on a challenging sequential image processing task. This task is generated from the sequential modified National Institute of Standards and Technology (MNIST) dataset following the steps described in Methods . Moreover, the hyperparameter details of this experiment are provided in Extended Data Fig. 4 .

Table 4 summarizes the results on this event-based sequence classification task. We observe that models such as ODE-RNN, CT-RNN, GRU-ODE and LSTMs struggle to learn a good representation of the input data and therefore show poor performance. In contrast, RNNs endowed with explicit memory, such as bi-directional RNNs, GRU-D, Lipschitz RNN, coRNN, CT-LSTM and ODE-LSTM, perform well on this task. All CfC variants perform well and establish the state of the art on this task, with CfC-mmRNN achieving 98.09% and CfC-noGate achieving 96.99% accuracy in classifying irregularly sampled sequences. It is worth mentioning that they do so around 200–400% faster than ODE-based models such as GRU-ODE and ODE-RNN.

Regularly and irregularly sampled bit-stream XOR

The bit-stream XOR dataset 9 considers the classification of bit streams by implementing an XOR function in time. That is, each item in the sequence contributes equally to the correct output. The details are given in Methods .

Extended Data Fig. 5 compares the performance of many RNN baselines. Many architectures such as Augmented LSTM, CT-GRU, GRU-D, ODE-LSTM, coRNN and Lipschitz RNN, and all variants of CfC, can successfully solve the task with 100% accuracy when the bit-stream samples are equidistant from each other. However, when the bit-stream samples arrive at non-uniform distances, only architectures that are immune to the vanishing gradient in irregularly sampled data can solve the task. These include GRU-D, ODE-LSTM, CfC and CfC-mmRNNs. ODE-based RNNs cannot solve the event-based encoding tasks regardless of their choice of solvers, as they have vanishing/exploding gradient issues 9 . The hyperparameter details of this experiment are provided in Extended Data Fig. 4 .

PhysioNet Challenge

The PhysioNet Challenge 2012 dataset considers the prediction of the mortality of 8,000 patients admitted to the intensive care unit. The features represent time series of medical measurements taken during the first 48 h after admission. The data are irregularly sampled in time and over features, that is, only a subset of the 37 possible features is given at each time point. We perform the same test–train split and preprocessing as in ref. 7 , and report the area under the curve (AUC) on the test set as a metric in Extended Data Fig. 6 . We observe that CfCs perform competitively to other baselines while performing 160 times faster in terms of training time compared with ODE-RNN and 220 times compared with continuous latent models. CfCs are also, on average, three times faster than advanced discretized gated recurrent models. The hyperparameter details of this experiment are provided in Extended Data Fig. 7 .

Sentiment analysis using IMDB

The Internet Movie Database (IMDB) sentiment analysis dataset 28 consists of 25,000 training and 25,000 test sentences (see Methods for more details). Extended Data Fig. 8 shows how CfCs equipped with mixed memory instances outperform advanced RNN benchmarks. The hyperparameter details of this experiment are provided in Extended Data Fig. 7 .

Performance of CfCs in autonomous driving

In this experiment, our objective is to evaluate how robustly CfCs learn to perform autonomous navigation in comparison with their ODE-based counterparts, LTC networks. The task is to map incoming high-dimensional pixel observations to steering curvature commands. The details of this experiment are given in Methods .

We observe that CfCs, similar to NCPs, demonstrate a consistent attention pattern in each subtask and maintain their attention profile under heavy noise, as depicted in Extended Data Fig. 10c. In contrast, the attention profile of other networks such as CNNs and LSTMs is disrupted by added input noise (Extended Data Fig. 10c).

This experiment empirically validates that CfCs possess similar robustness properties to their ODE counterparts, that is, LTC-based networks. Moreover, similar to NCPs, CfCs are parameter efficient. They performed the end-to-end autonomous lane-keeping task with around 4,000 trainable parameters in their RNN component (Extended Data Fig. 9 ).

Scope, discussion and conclusions

We introduce a closed-form continuous-time neural model, built from an approximate closed-form solution of LTC networks, that possesses the strong modelling capabilities of ODE-based networks while being notably faster, more accurate and stable. These closed-form continuous-time models achieve this through explicit time-dependent gating mechanisms and a liquid time constant modulated by neural networks. A discussion of related research on continuous-time models is given in Methods.

For large-scale time-series prediction tasks, and where closed-loop performance matters 24 , CfCs can bring great value. This is because they capture the flexible, causal and continuous-time nature of ODE-based networks, such as LTC networks, while being more efficient. A discussion on how to use different variants of CfCs is provided in Methods. On the other hand, implicit ODE- and partial differential equation-based models 17 , 29 , 30 , 31 can be beneficial in solving continuously defined physics problems and control tasks. Moreover, for generative modelling, continuous normalizing flows built from ODEs 2 are a suitable choice of model as they ensure invertibility, unlike CfCs. This is because DEs guarantee invertibility (that is, under uniqueness conditions 6 , one can run them backwards in time), whereas CfCs only approximate ODEs and therefore no longer necessarily form a bijection 32 .

What are the limitations of CfCs?

CfCs might exhibit vanishing gradient problems. To avoid this, for tasks that require long-term dependences, it is better to use them together with mixed memory networks 9 (as in the CfC variant CfC-mmRNN) or with proper parametrization of their transition matrices 33 , 34 . Moreover, we speculate that inferring causality from ODE-based networks might be more straightforward than from a closed-form solution 24 . It would also be beneficial to assess whether verifying a continuous neural flow 35 is more tractable using an ODE representation of the system or its closed form.

For problems such as language modelling where a large amount of sequential data and substantial computational resources are available, transformers 36 and their variants are great choices of models. CfCs could bring value when: (1) data have limitations and irregularities (for example, medical data, financial time series, robotics 37 and closed-loop control, and multi-agent autonomous systems in supervised and reinforcement learning schemes 38 ), (2) the training and inference efficiency of a model is important (for example, embedded applications 39 , 40 , 41 ) and (3) interpretability matters 42 .

Ethics statement

All authors acknowledge the Global Research Code on the development, implementation and communication of this research. For the purpose of transparency, we have included this statement on inclusion and ethics. This work cites a comprehensive list of research from around the world on related topics.

Proof of theorem 1

Proof. In the single-dimensional case, the IVP in equation ( 1 ) becomes linear in x as follows:

Therefore, we can use the theory of linear ODEs to obtain an integral closed-form solution (section 1.10 in ref. 21 ) consisting of two nested integrals. The inner integral can be eliminated by means of integration by substitution 43 . The remaining integral expression can then be solved in the case of piecewise constant inputs and approximated in the case of general inputs. The three steps of the proof are outlined below.

Integral closed-form solution of LTC

We consider the ODE semantics of a single neuron that receives some arbitrary continuous input signal I and has a positive, bounded, continuous and monotonically increasing nonlinearity f :
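(A reconstruction of the display equation, written with the symmetry assumption discussed next:)

$$\frac{{\mathrm{d}}x}{{\mathrm{d}}t}=-\left[{w}_{\tau }+f(I(t))\right]x(t)+\left[{w}_{\tau }+f(I(t))\right]A,\quad x(0)={x}_{0}\qquad (5)$$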

Assumption . We assumed a second constant value w τ in the above representation of a single LTC neuron. This is done to introduce symmetry in the structure of the ODE, yielding a simpler expression for the solution. The inclusion of this second constant may appear to profoundly alter the dynamics. However, as shown below, numerical experiments suggest that this simplifying assumption has a marginal effect on the ability to approximate LTC cell dynamics.

Using the variation of constants formula (section 1.10 in ref. 21 ), we obtain after some simplifications:
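(A reconstruction of the display equation; this is the integral form used by the error bound of lemma 1 below:)

$$x(t)=\left(x(0)-A\right)\,{{\mathrm{e}}}^{-{w}_{\tau }t-\int\nolimits_{0}^{t}f(I(s))\,{\mathrm{d}}s}+A\qquad (6)$$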

Analytical LTC solution for piecewise constant inputs

The derivation of a useful closed-form expression of x requires us to solve the integral expression \(\int\nolimits_{0}^{t}f(I(s))\,{\mathrm{d}}s\) for any t  ≥ 0. Fortunately, the integral \(\int\nolimits_{0}^{t}f(I(s))\,{\mathrm{d}}s\) enjoys a simple closed-form expression for piecewise constant inputs I . Specifically, assume that we are given a sequence of time points
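(A reconstruction of the omitted display line:)

$$0={\tau }_{0} < {\tau }_{1} < \cdots < {\tau }_{n-1} < {\tau }_{n}=\infty $$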

such that \({\tau }_{1},\ldots ,{\tau }_{n-1}\in {\mathbb{R}}\) and I ( t ) =  γ i for all t   ∈  [ τ i ;  τ i +1 ) with 0 ≤  i  ≤ n  − 1. Then, it holds that
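(A reconstruction of the display equation, which follows directly from the piecewise constancy of I:)

$$\int\nolimits_{0}^{t}f(I(s))\,{\mathrm{d}}s=\mathop{\sum }\limits_{i=0}^{k-1}f({\gamma }_{i})({\tau }_{i+1}-{\tau }_{i})+f({\gamma }_{k})(t-{\tau }_{k})\qquad (7)$$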

when τ k  ≤  t  <  τ k +1 for some 0 ≤  k  ≤  n  − 1 (as usual, one defines \(\mathop{\sum }\nolimits_{i = 0}^{-1}:= 0\) ). With this, we have
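(A reconstruction of the display equation, obtained by inserting equation (7) into equation (6):)

$$x(t)=\left(x(0)-A\right)\,{{\mathrm{e}}}^{-{w}_{\tau }t-\mathop{\sum }\limits_{i=0}^{k-1}f({\gamma }_{i})({\tau }_{i+1}-{\tau }_{i})-f({\gamma }_{k})(t-{\tau }_{k})}+A\qquad (8)$$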

when τ k  ≤  t  <  τ k +1 for some 0 ≤  k  ≤  n  − 1. While any continuous input can be approximated arbitrarily well by a piecewise constant input 43 , a tight approximation may require a large number of discretization points τ 1 , …,  τ n . We address this next.

Analytical LTC approximation for general inputs

Inspired by equations ( 7 ) and ( 8 ), the next result provides an analytical approximation of x ( t ).

For any Lipschitz continuous, positive, monotonically increasing and bounded f and continuous input signal I(t), we approximate x(t) in equation ( 6 ) as
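(A reconstruction of the display equation, consistent with the bound and the proof of lemma 1 below:)

$$\tilde{x}(t)=\left(x(0)-A\right)\,{{\mathrm{e}}}^{-\left[{w}_{\tau }+f(I(t))\right]t}\,f(-I(t))+A\qquad (9)$$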

Then, \(| x(t)-\tilde{x}(t)| \le | x(0)-A| {{\mathrm{e}}}^{-{w}_{\tau }t}\) for all t   ≥   0. Writing c   =   x(0)   −   A for convenience, we can obtain the following sharpness results, additionally:

For any t   ≥   0, we have \(\sup \left\{ \frac{1}{c}(x(t)-\tilde{x}(t))| I:[0;t]\to {\mathbb{R}} \right\}={{\mathrm{e}}}^{-{w}_{\tau }t}\) .

For any t   ≥   0, we have \(\inf \left\{ \frac{1}{c}(x(t)-\tilde{x}(t))| I:[0;t]\to {\mathbb{R}} \right\}={{\mathrm{e}}}^{-{w}_{\tau }t}({{\mathrm{e}}}^{-t}-1)\) .

Above, the supremum and infimum are meant to be taken across all continuous input signals. These statements settle the question about the worst-case errors of the approximation. The first statement implies, in particular, that our bound is sharp.

The full proof is given in the next section. Lemma 1 demonstrates that the integral solution we obtained in equation (6) is tightly approximated by the closed-form expression we proposed in equation (9). Note that, as w τ is positive, the derived bound between equations (6) and (9) ensures an exponentially decaying error as time goes by. Therefore, we have the statement of the theorem. □

Proof of lemma 1

We start by noting that
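(A reconstruction of the display equation, implied by equations (6) and (9):)

$$x(t)-\tilde{x}(t)=\left(x(0)-A\right)\,{{\mathrm{e}}}^{-{w}_{\tau }t}\left[{{\mathrm{e}}}^{-\int\nolimits_{0}^{t}f(I(s))\,{\mathrm{d}}s}-{{\mathrm{e}}}^{-f(I(t))t}\,f(-I(t))\right]$$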

Since 0 ≤  f  ≤ 1, we conclude that \({{\mathrm{e}}}^{-\int\nolimits_{0}^{t}f(I(s)){\mathrm{d}}s}\in [0;1]\) and e − f ( I ( t )) t f (− I ( t ))  ∈  [0; 1]. This shows that \(| x(t)-\tilde{x}(t)| \le | c| {{\mathrm{e}}}^{-{w}_{\tau }t}\) . To see the sharpness results, pick some arbitrary small ε  > 0 and a sufficiently large C  > 0 such that f (− C ) ≤  ε and 1 −  ε  ≤  f ( C ). With this, for any 0 <  δ  <  t , we consider the piecewise constant input signal I such that I ( s ) = − C for s   ∈  [0;  t  −  δ ] and I ( s ) =  C for s   ∈  ( t  −  δ ;  t ]. Then, it can be noted that

Statement 1 follows by noting that there exists a family of continuous signals \({I}_{n}:[0;t]\to {\mathbb{R}}\) such that ∣ I n (  ⋅  ) ∣  ≤  C for all n  ≥ 1 and I n  →  I pointwise as n  →  ∞ . This is because

where L is the Lipschitz constant of f , and the last identity is due to the dominated convergence theorem 43 . To see statement 2, we first note that the negation of the signal − I provides us with

if ε ,  δ  → 0. The fact that the left-hand side of the last inequality must be at least e − t  − 1 follows by observing that \({{\mathrm{e}}}^{-t}\le {{\mathrm{e}}}^{-\int\nolimits_{0}^{t}f(I^{\prime} (s)){\mathrm{d}}s}\) and e − f ( I ″ ( t )) t f ( −  I ″ ( t )) ≤ 1 for any \(I^{\prime} ,I^{\prime\prime} :[0;t]\to {\mathbb{R}}\) . □

Compiling LTC architectures into their closed-form equivalent

In general, it is possible to compile the architecture of an LTC network into its closed-form version. This compilation allows us to speed up the training and inference time of ODE-based networks as the closed-form variant does not require complex ODE solvers to compute outputs. Algorithm 1 provides the instructions on how to transfer the architecture of an LTC network into its closed-form variant. Here, W Adj corresponds to the adjacency matrix that maps exogenous inputs to hidden states and the coupling among hidden states. This adjacency matrix can have an arbitrary sparsity (that is, there is no need to use a directed acyclic graph for the coupling between neurons).

Algorithm 1

Translate the architecture of an LTC network into its closed-form variant

  Inputs: LTC inputs I ( N × T ) ( t ), the activity x ( H × T ) ( t ) and initial states x ( H ×1) (0) of LTC neurons and the adjacency matrix for synapses \({W}_{Adj}^{[(N+H)* (N+H)]}\)

 LTC ODE solver with step of Δ t

 time-instance vectors of inputs, \({{{{\bf{t}}}}}_{I(t)}^{(1\times T)}\)

 time-instance of LTC neurons t x ( t )     ∇ time might be sampled irregularly

 LTC neuron parameter τ ( H ×1)

 LTC network synaptic parameters { σ ( N × H ) , μ ( N × H ) , A ( N × H ) }

  Outputs: LTC closed-form approximation of hidden state neurons, \({\hat{{{{\bf{x}}}}}}^{(N\times T)}(t)\)

  x pre ( t ) =  W Adj  × [ I 0 … I N ,  x 0 … x H ]    ∇ all presynaptic signals to nodes

  for i th neuron in neurons 1 to H do

   for j in Synapses to i th neuron do

   \({\hat{x}}_{i}+=({x}_{0}-{A}_{ij})\,{\mathrm{e}}^{-{{{{\bf{t}}}}}_{x(t)}\odot \left(1/{\tau }_{i}+\frac{1}{1+{\mathrm{e}}^{-{\sigma }_{ij}({x}_{{\mathrm{pre}}_{ij}}-{\mu }_{ij})}}\right)}\odot \frac{1}{1+{\mathrm{e}}^{{\sigma }_{ij}({x}_{{\mathrm{pre}}_{ij}}-{\mu }_{ij})}}+{A}_{ij}\)

  return \(\hat{{{{\bf{x}}}}}(t)\)
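The following NumPy sketch is our reading of the per-synapse update in Algorithm 1 (not the released implementation); the shapes, parameter values and the scalar initial state x0 are illustrative.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def closed_form_ltc(x_pre, t, x0, tau, sigma, mu, A):
    """x_pre: (H, S) presynaptic signals per neuron; t: scalar time instant.
    tau: (H,) neuron time constants; sigma, mu, A: (H, S) synaptic parameters."""
    f_pos = sigmoid(sigma * (x_pre - mu))        # f(x_pre) per synapse
    f_neg = sigmoid(-sigma * (x_pre - mu))       # f(-x_pre) per synapse
    decay = np.exp(-t * (1.0 / tau[:, None] + f_pos))
    # accumulate the closed-form contribution of every synapse into its postsynaptic neuron
    return np.sum((x0 - A) * decay * f_neg + A, axis=1)

H, S = 2, 3                                      # neurons and synapses per neuron (illustrative)
rng = np.random.default_rng(0)
x_hat = closed_form_ltc(x_pre=rng.standard_normal((H, S)), t=0.1, x0=0.0,
                        tau=np.ones(H), sigma=np.ones((H, S)),
                        mu=np.zeros((H, S)), A=rng.standard_normal((H, S)))
print(x_hat)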

Experimental details of the tightness experiment

We took a trained NCP 22 , which consists of a perception module and an LTC-based network 1 with 19 neurons and 253 synapses. The network was trained to steer a self-driving vehicle autonomously. We used recorded real-world test runs of the vehicle for a lane-keeping task governed by this network. The records included the inputs, outputs and all the LTC neurons' activities and parameters. To evaluate numerically whether our proposed closed-form solution for LTC neurons holds up in practice, we inserted the parameters of the individual neurons and synapses of the DEs into the closed-form solution (similar to the representations shown in Extended Data Fig. 1b,c) and emulated the structure of the ODE-based LTC networks. We then visualized the dynamics of the output neuron for the ODE (in blue) and for the closed-form solution (in red). As illustrated in Fig. 2, we observed that the behaviour of the ODE is captured by the closed-form solution with a mean squared error of 0.006. This experiment provides numerical evidence for the tightness results presented in our theory. Hence, the closed-form solution retains the main properties of liquid networks in approximating dynamics.

Baseline models

The example baseline models considered include some variations of classical auto-regressive RNNs, such as an RNN with concatenated Δ t (RNN-Δ t ), a recurrent model with moving average on missing values (RNN-impute), RNN-Decay 7 , LSTMs 44 and gated recurrent units (GRUs) 45 . We also report results for a variety of encoder–decoder ODE-RNN-based models, such as RNN-VAE, latent variable models with RNNs, and with ODEs, all from ref. 7 .

Furthermore, we include models such as interpolation prediction networks (IP-Net) 46 , set functions for time series (SeFT) 47 , CT-RNN 48 , CT-GRU 49 , CT-LSTM 50 , GRU-D 51 , PhasedLSTM 52 and bi-directional RNNs 53 . Finally, we benchmarked CfCs against competitive recent RNN architectures with the premise of tackling long-term dependences, such as Legendre memory units 54 , high-order polynomial projection operators (HiPPO) 55 , orthogonal recurrent models (expRNNs) 56 , mixed memory RNNs such as ODE-LSTMs 9 , coupled oscillatory RNNs (coRNN) 57 and Lipschitz RNN 58 .

Experimental details for the Walker2D dataset

This task is designed based on the Walker2d-v2 OpenAI gym 59 environment using data from four different stochastic policies. The objective is to predict the physics state in the next time step. The training and testing sequences are provided at irregularly sampled intervals. We report the squared error on the test set as a metric.

Description of the event-based MNIST experiment

We first sequentialize each image by transforming each 28 × 28 image into a long series of length 784. The objective is to predict the class corresponding to each image from the long input sequence. Advanced sequence modelling frameworks such as coRNN 57 , Lipschitz RNN 58 and mixed memory ODE-LSTM 9 can solve this task very well with accuracy of up to 99.0%. However, we aim to make the task even more challenging by sparsifying the input vectors with an event-like, irregular-sampling mechanism. To this end, in each vector input (that is, flattened image), we transform each run of consecutive equal values into one event. For instance, within the long binary vector of an image, the sequence 1, 1, 1, 1 is transformed to (1,  t  = 4) (ref. 9 ). This way, sequences of length 784 are condensed into event-based irregularly sampled sequences of length 256 that are far more challenging to handle than equidistant input signals. A recurrent model now has to learn to memorize input information of length 256 while keeping track of the time lags between the events.
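A sketch of the event-based encoding described above, as we interpret the procedure from ref. 9: runs of consecutive equal values are merged into a single (value, elapsed-time) event.

def to_events(bits):
    """Run-length encode a binary sequence into (value, duration) events."""
    events = []
    for b in bits:
        if events and events[-1][0] == b:
            events[-1] = (b, events[-1][1] + 1)  # extend the current run
        else:
            events.append((b, 1))                # start a new event
    return events

print(to_events([1, 1, 1, 1, 0, 0, 1]))  # -> [(1, 4), (0, 2), (1, 1)]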

Description of the event-based XOR encoding experiment

The bit streams are provided in densely sampled and event-based sampled formats. The densely sampled version simply represents an incoming bit as an input event. The event-based sampled version transmits only bit changes to the network, that is, multiple equal bits are packed into a single input event. Consequently, the densely sampled variant is a regular sequence classification problem, whereas the event-based encoding variant represents an irregularly sampled sequence classification problem.

Experimental details of the IMDB dataset experiment

Each sentence corresponds to either positive or negative sentiment. We tokenize the sentences in a word-by-word fashion with a vocabulary consisting of the 20,000 words occurring most frequently in the dataset. We map each token to a vector using trainable word embedding. The word embedding is initialized randomly. No pretraining of the network or word embedding is performed.

Setting of the driving experiment

It has been shown that models based on LTC networks are more robust when trained on offline demonstrations and tested online in closed loop with their environments, in many end-to-end robot control tasks such as mobile robots 60 , autonomous ground vehicles 22 and autonomous aerial vehicles 24 , 61 . This robustness in decision-making (that is, their flexibility in learning and executing the task from demonstrations despite environmental or observational disturbances and distributional shifts) originates from their model semantics, which formally reduces to dynamic causal models 20 , 24 . Intuitively, LTC-based networks learn to extract a good representation of the task they are given (for example, their attention maps indicate that they have learned to focus on the road, with more attention on the road's horizon) and maintain this understanding under heavy distribution shifts. An example is illustrated in Extended Data Fig. 10.

In this experiment, we aim to investigate whether CfC models and their variants, such as CfC-mmRNN, possess this robustness characteristic (maintaining their attention map under distribution shifts and added noise), similar to their ODE counterparts (LTC-based networks called NCPs 22 ).

We start by training neural network architectures that stack a convolutional head with a chosen RNN. The RNN compartment of the networks is instantiated as LSTM networks, NCPs 22 , Cf-S, CfC-NoGate or CfC-mmRNN. We also trained a fully convolutional neural network for the sake of proper comparison. Our training pipeline followed an imitation learning approach with paired pixel-control data from a 30 Hz BlackFly PGE-23S3C red–green–blue camera, collected by a human expert driver across a variety of rural driving environments, including different times of day, weather conditions and seasons of the year. The original 3 h dataset was further augmented to include off-orientation recovery data using a privileged controller 62 and a data-driven view synthesizer 63 . The privileged controller enabled the training of all networks using guided policy learning 64 . After training, all networks were transferred on-board our full-scale autonomous vehicle (Lexus RX450H, retrofitted with drive-by-wire capability). The vehicle was consistently started at the centre of the lane, initialized with each trained model and run to completion at the end of the road. If the model exited the bounds of the lane, a human safety driver intervened and restarted the model from the centre of the road at the intervention location. All models were tested with and without noise added to the sensory inputs to evaluate robustness.

The testing environment consisted of 1 km of private test road with unlabelled lane markers, and we observed that all trained networks were able to successfully complete the lane-keeping task at a constant velocity of 30 km h −1 . Extended Data Fig. 10 provides an insight into how these networks reach driving decisions. To this end, we computed the attention of each network while driving by using the VisualBackProp algorithm 65 .

Related works on continuous-time models

Continuous-time models.

Machine learning, control theory and dynamical systems merge at models with continuous-time dynamics 60 , 66 , 67 , 68 , 69 . In a seminal work, Chen et al. 2 , 7 revived the class of continuous-time neural networks 48 , 70 with neural ODEs. These continuous-depth models give rise to vector-field representations and classes of functions that could not previously be generated with discrete neural networks. These capabilities enabled flexible density estimation 3 , 4 , 5 , 71 , 72 as well as performant modelling of sequential and irregularly sampled data 1 , 7 , 8 , 9 , 58 . In this paper, we showed how to relax the need for an ODE solver to realize an expressive continuous-time neural network model for challenging time-series problems.

Improving neural ODEs

ODE-based neural networks are as good as their ODE solvers. As the complexity or the dimensionality of the modelling task increases, ODE-based networks demand a more advanced solver that largely impacts their efficiency 17 , stability 13 , 15 , 73 , 74 , 75 and performance 1 . A large body of research has studied how to improve the computational overhead of these solvers, for example, by designing hypersolvers 17 , deploying augmentation methods 4 , 12 , pruning 6 or regularizing the continuous flows 14 , 15 , 16 . To enhance the performance of an ODE-based model, especially in time-series modelling tasks 76 , solutions for stabilizing their gradient propagation have been provided 9 , 58 , 77 . In this work, we showed that CfCs improve the scalability, efficiency and performance of continuous-depth neural models.

Which CfC variants to choose in different applications

Our extensive experimental results demonstrate that different CfC variants, namely Cf-S, CfC-noGate, vanilla CfC and CfC-mmRNN, achieve comparable results to each other, while one comes out on top depending on the nature of the dataset. We suggest using CfC in most cases where the sequence length is up to a couple of hundred steps. To capture longer-range dependences, we recommend CfC-mmRNN. The Cf-S variant is effective when we aim to obtain the fastest inference time. CfC-noGate could be tested as a hyperparameter when using the vanilla CfC as the primary choice of model.

Description of hyperparameters

The hyperparameters used in our experimental results are as follows:

clipnorm: the gradient clipping norm (that is, the global norm clipping threshold)

optimizer: the weight update preconditioner (for example, Adam, Stochastic Gradient Descent with momentum, etc.)

batch_size: the number of samples used to compute the gradients

hidden size: the number of RNN units

epochs: the number of passes over the training dataset

base_lr: the initial learning rate

decay_lr: the factor by which the learning rate is multiplied after each epoch

backbone_activation: the activation function of the backbone layers

backbone_dr: the dropout rate of the backbone layers

forget_bias: the forget gate bias (for mmRNN and LSTM)

backbone_units: the number of hidden units per backbone layer

backbone_layers: the number of backbone layers

weight_decay: the L2 weight regularization factor

τ data : the constant factor by which the elapsed time input is multiplied (default value 1)

init: the gain of the Xavier uniform distribution for the weight initialization (default value 1)

Data availability

All data and materials used in the analysis are openly available at https://github.com/raminmh/CfC under an Apache 2.0 license for the purposes of reproducing and extending the analysis.

Code availability

All code and materials used in the analysis are openly available at https://github.com/raminmh/CfC under an Apache 2.0 license for the purposes of reproducing and extending the analysis ( https://doi.org/10.5281/zenodo.7135472 ).

Change history

29 November 2022

A Correction to this paper has been published: https://doi.org/10.1038/s42256-022-00597-y

Hasani, R., Lechner, M., Amini, A., Rus, D. & Grosu, R. Liquid time-constant networks. In Proc. of AAAI Conference on Artificial Intelligence 35(9), 7657–7666 (AAAI, 2021).

Chen, T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations. In Proc. of Advances in Neural Information Processing Systems (Eds. Bengio, S. et al.) 6571–6583 (NeurIPS, 2018).

Grathwohl, W., Chen, R. T., Bettencourt, J., Sutskever, I. & Duvenaud, D. Ffjord: free-form continuous dynamics for scalable reversible generative models. In International Conference on Learning Representations (2018). https://openreview.net/forum?id=rJxgknCcK7

Dupont, E., Doucet, A. & Teh, Y. W. Augmented neural ODEs. In Proc. of Advances in Neural Information Processing Systems (Eds. Wallach, H. et al.) 3134–3144 (NeurIPS, 2019).

Yang, G. et al. Pointflow: 3D point cloud generation with continuous normalizing flows. In Proc. of the IEEE/CVF International Conference on Computer Vision 4541–4550 (IEEE, 2019).

Liebenwein, L., Hasani, R., Amini, A. & Daniela, R. Sparse flows: pruning continuous-depth models. In Proc. of Advances in Neural Information Processing Systems (Eds. Ranzato, M. et al.) 22628–22642 (NeurIPS, 2021).

Rubanova, Y., Chen, R. T. & Duvenaud, D. Latent Neural ODEs for irregularly-sampled time series. In Proc. of Advances in Neural Information Processing Systems (Eds. Wallach, H. et al.) 32 (NeurIPS, 2019).

Gholami, A., Keutzer, K. & Biros, G. ANODE: unconditionally accurate memory-efficient gradients for neural ODEs. In Proceedings of the 28th International Joint Conference on Artificial Intelligence 730–736 (IJCAI, 2019).

Lechner, M. & Hasani, R. Learning long-term dependencies in irregularly-sampled time series. Preprint at https://arxiv.org/abs/2006.04418 (2020).

Prince, P. J. & Dormand, J. R. High order embedded Runge–Kutta formulae. J. Comput. Appl. Math. 7 , 67–75 (1981).


Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378 , 686–707 (2019).

Massaroli, S., Poli, M., Park, J., Yamashita, A. & Asama, H. Dissecting neural ODEs. In Proc. of 33rd Conference on Neural Information Processing Systems (Eds. Larochelle, H. et al.) (NeurIPS, 2020).

Bai, S., Kolter, J. Z. & Koltun, V. Deep equilibrium models. Adv. Neural Inform. Process. Syst. 32 , 690–701 (2019).


Finlay, C., Jacobsen, J.-H., Nurbekyan, L. & Oberman, A. M. How to train your neural ODE: the world of Jacobian and kinetic regularization. In International Conference on Machine Learning (Eds. Daumé III, H. & Singh, A.) 3154–3164 (PMLR, 2020).

Massaroli, S. et al. Stable Neural Flows. Preprint at https://arxiv.org/abs/2003.08063 (2020).

Kidger, P., Chen, R. T. & Lyons, T. “Hey, that’s not an ODE”: Faster ODE Adjoints via Seminorms. In Proceedings of the 38th International Conference on Machine Learning (Eds. Meila, M. & Zhang, T.) 139 (PMLR, 2021).

Poli, M. et al. Hypersolvers: toward fast continuous-depth models. In Proc. of Advances in Neural Information Processing Systems (Eds. Larochelle, H.) 21105–21117 (NeurIPS, 2020).

Schumacher, J., Haslinger, R. & Pipa, G. Statistical modeling approach for detecting generalized synchronization. Phys. Rev. E 85 , 056215 (2012).


Moran, R., Pinotsis, D. A. & Friston, K. Neural masses and fields in dynamic causal modeling. Front. Comput. Neurosci. 7 , 57 (2013).

Friston, K. J., Harrison, L. & Penny, W. Dynamic causal modelling. Neuroimage 19 , 1273–1302 (2003).

Perko, L. Differential Equations and Dynamical Systems (Springer-Verlag, 1991).


Lechner, M. et al. Neural circuit policies enabling auditable autonomy. Nat. Mach. Intell. 2 , 642–652 (2020).

Hochreiter, S. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Technische Universität München 91 (1991).

Vorbach, C., Hasani, R., Amini, A., Lechner, M. & Rus, D. Causal navigation by continuous-time neural networks. In Proc. of Advances in Neural Information Processing Systems (Eds. Ranzato, M. et al.) 12425–12440 (NeurIPS, 2021).

Hasani, R. et al. Response characterization for auditing cell dynamics in long short-term memory networks. In Proc. of 2019 International Joint Conference on Neural Networks 1–8 (IEEE, 2019).

Anguita, D., Ghio, A., Oneto, L., Parra Perez, X. & Reyes Ortiz, J. L. A public domain dataset for human activity recognition using smartphones. In Proc. of the 21st International European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning 437–442 (i6doc, 2013).

Todorov, E., Erez, T. & Tassa, Y. MuJoCo: a physics engine for model-based control. In Proc. of 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems 5026–5033 (IEEE, 2012).

Maas, A. et al. Learning word vectors for sentiment analysis. In Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies 142–150 (ACM, 2011).

Lu, L., Jin, P., Pang, G., Zhang, Z. & Karniadakis, G. E. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nat. Mach. Intell. 3 , 218–229 (2021).

Karniadakis, G. E. et al. Physics-informed machine learning. Nat. Rev. Phys. 3 , 422–440 (2021).

Wang, S., Wang, H. & Perdikaris, P. Learning the solution operator of parametric partial differential equations with physics-informed deeponets. Sci. Adv. 7 , eabi8605 (2021).

Rezende, D. & Mohamed, S. Variational inference with normalizing flows. In Proc. of International Conference on Machine Learning (Eds. Bach, F. & Blei, D.) 1530–1538 (PMLR, 2015).

Gu, A., Goel, K. & Re, C. Efficiently modeling long sequences with structured state spaces. In Proc. of International Conference on Learning Representations (2022). https://openreview.net/forum?id=uYLFoz1vlAC

Hasani, R. et al. Liquid structural state-space models. Preprint at https://arxiv.org/abs/2209.12951 (2022).

Grunbacher, S. et al. On the verification of neural ODEs with stochastic guarantees. Proc. AAAI Conf. Artif. Intell. 35 , 11525–11535 (2021).

Vaswani, A. et al. Attention is all you need. In Proc. of Advances in Neural Information Processing Systems (Eds. Guyon, I. et al.) 5998–6008 (NeurIPS, 2017).

Lechner, M., Hasani, R., Grosu, R., Rus, D. & Henzinger, T. A. Adversarial training is not ready for robot learning. In 2021 IEEE International Conference on Robotics and Automation (ICRA) 4140–4147 (IEEE, 2021).

Brunnbauer, A. et al. Latent imagination facilitates zero-shot transfer in autonomous racing. In 2022 International Conference on Robotics and Automation (ICRA) 7513–7520 (IEEE, 2021).

Hasani, R. M., Haerle, D. & Grosu, R. Efficient modeling of complex analog integrated circuits using neural networks. In Proc. of 12th Conference on Ph.D. Research in Microelectronics and Electronics 1–4 (IEEE, 2016).

Wang, G., Ledwoch, A., Hasani, R. M., Grosu, R. & Brintrup, A. A generative neural network model for the quality prediction of work in progress products. Appl. Soft Comput. 85 , 105683 (2019).

DelPreto, J. et al. Plug-and-play supervisory control using muscle and brain signals for real-time gesture and error detection. Auton. Robots 44 , 1303–1322 (2020).

Hasani, R. Interpretable Recurrent Neural Networks in Continuous-Time Control Environments . PhD dissertation, Technische Univ. Wien (2020).

Rudin, W. Principles of Mathematical Analysis, 3rd edn. (McGraw-Hill, 1976).

Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9 , 1735–1780 (1997).

Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. Preprint at https://arxiv.org/abs/1412.3555 (2014).

Shukla, S. N. & Marlin, B. Interpolation–prediction networks for irregularly sampled time series. In Proc. of International Conference on Learning Representations (2018). https://openreview.net/forum?id=r1efr3C9Ym

Horn, M., Moor, M., Bock, C., Rieck, B. & Borgwardt, K. Set functions for time series. In Proc. of International Conference on Machine Learning (Eds. Daumé III, H. & Singh, A.) 4353–4363 (PMLR, 2020).

Funahashi, K.-i & Nakamura, Y. Approximation of dynamical systems by continuous time recurrent neural networks. Neural Netw. 6 , 801–806 (1993).

Mozer, M. C., Kazakov, D. & Lindsey, R. V. Discrete event, continuous time RNNs. Preprint at https://arxiv.org/abs/1710.04110 (2017).

Mei, H. & Eisner, J. The neural Hawkes process: a neurally self-modulating multivariate point process. In Proc. of 31st International Conference on Neural Information Processing Systems (Eds. Guyon, I. et al.) 6757–6767 (Curran Associates Inc., 2017).

Che, Z., Purushotham, S., Cho, K., Sontag, D. & Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 8 , 1–12 (2018).

Neil, D., Pfeiffer, M. & Liu, S.-C. Phased LSTM: accelerating recurrent network training for long or event-based sequences. In Proc. of 30th International Conference on Neural Information Processing Systems (Eds. Lee, D. D. et al.) 3889–3897 (Curran Associates Inc., 2016).

Schuster, M. & Paliwal, K. K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45 , 2673–2681 (1997).

Voelker, A. R., Kajić, I. & Eliasmith, C. Legendre memory units: continuous-time representation in recurrent neural networks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (Eds. Wallach, H. et al.) 15570–15579 (ACM, 2019).

Gu, A., Dao, T., Ermon, S., Rudra, A. & Ré, C. Hippo: recurrent memory with optimal polynomial projections. In Proc. of Advances in Neural Information Processing Systems (Eds. Larochelle, H. et al.) 1474–1487 (NeurIPS, 2020).

Lezcano-Casado, M. & Martínez-Rubio, D. Cheap orthogonal constraints in neural networks: a simple parametrization of the orthogonal and unitary group. In Proc. of International Conference on Machine Learning (Eds. Chaudhuri, K. & Salakhutdinov, R.) 3794–3803 (PMLR, 2019).

Rusch, T. K. & Mishra, S. Coupled oscillatory recurrent neural network (coRNN): an accurate and (gradient) stable architecture for learning long time dependencies. In Proc. of International Conference on Learning Representations (2021). https://openreview.net/forum?id=F3s69XzWOia

Erichson, N. B., Azencot, O., Queiruga, A., Hodgkinson, L. & Mahoney, M. W. Lipschitz recurrent neural networks. In Proc. of International Conference on Learning Representations (2021). https://openreview.net/forum?id=-N7PBXqOUJZ

Brockman, G. et al. OpenAI gym. Preprint at https://arxiv.org/abs/1606.01540 (2016).

Lechner, M., Hasani, R., Zimmer, M., Henzinger, T. A. & Grosu, R. Designing worm-inspired neural networks for interpretable robotic control. In Proc. of International Conference on Robotics and Automation 87–94 (IEEE, 2019).

Tylkin, P. et al. Interpretable autonomous flight via compact visualizable neural circuit policies. IEEE Robot. Autom. Lett. 7 , 3265–3272 (2022).

Amini, A. et al. Vista 2.0: An open, data-driven simulator for multimodal sensing and policy learning for autonomous vehicles. In 2022 International Conference on Robotics and Automation (ICRA) 2419–2426 (IEEE, 2022).

Amini, A. et al. Learning robust control policies for end-to-end autonomous driving from data-driven simulation. IEEE Robot. Autom. Lett. 5 , 1143–1150 (2020).

Levine, S. & Koltun, V. Guided policy search. In Proc. of International Conference on Machine Learning (Eds. Dasgupta, S. & McAllester, D.) 1–9 (PMLR, 2013).

Bojarski, M. et al. VisualBackProp: efficient visualization of CNNs for autonomous driving. In Proc. of IEEE International Conference on Robotics and Automation 1–8 (IEEE, 2018).

Zhang, H., Wang, Z. & Liu, D. A comprehensive review of stability analysis of continuous-time recurrent neural networks. IEEE Trans. Neural Netw. Learn. Syst. 25, 1229–1262 (2014).

Weinan, E. A proposal on machine learning via dynamical systems. Commun. Math. Stat. 5 , 1–11 (2017).

Lu, Z., Pu, H., Wang, F., Hu, Z. & Wang, L. The expressive power of neural networks: a view from the width. In Proc. of Advances in Neural Information Processing Systems (Eds. Guyon, I. et al.) 30 (Curran Associates, Inc., 2017).

Li, Q., Chen, L., Tai, C. et al. Maximum principle based algorithms for deep learning. J. Mach. Learn. Res. 18 , 5998–6026 (2018).

Cohen, M. A. & Grossberg, S. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Trans. Syst. Man Cybern. 5 , 815–826 (1983).

Mathieu, E. & Nickel, M. Riemannian continuous normalizing flows. In Proc. of Advances in Neural Information Processing Systems Vol. 33 (eds Larochelle et al.) 2503–2515 (Curran Associates, Inc., 2020).

Hodgkinson, L., van der Heide, C., Roosta, F. & Mahoney, M. W. Stochastic normalizing flows. In Proc. of Advances in Neural Information Processing Systems (Eds. Larochelle, H. et al.) 5933–5944 (NeurIPS, 2020).

Haber, E., Lensink, K., Treister, E. & Ruthotto, L. IMEXnet: a forward stable deep neural network. In Proc. of International Conference on Machine Learning (Eds. Chaudhuri, K. & Salakhutdinov, R.) 2525–2534 (PMLR, 2019).

Chang, B., Chen, M., Haber, E. & Chi, E. H. AntisymmetricRNN: a dynamical system view on recurrent neural networks. In International Conference on Learning Representations (2018). https://openreview.net/forum?id=ryxepo0cFX

Lechner, M., Hasani, R., Rus, D. & Grosu, R. Gershgorin loss stabilizes the recurrent neural network compartment of an end-to-end robot learning scheme. In Proc. of IEEE International Conference on Robotics and Automation 5446–5452 (IEEE, 2020).

Gleeson, P., Lung, D., Grosu, R., Hasani, R. & Larson, S. D. c302: a multiscale framework for modelling the nervous system of Caenorhabditis elegans . Philos.Trans. R. Soc. B 373 , 20170379 (2018).

Li, X., Wong, T.-K. L., Chen, R. T. & Duvenaud, D. Scalable gradients for stochastic differential equations. In Proc. of International Conference on Artificial Intelligence and Statistics 3870–3882 (PMLR, 2020).

Shukla, S. N. & Marlin, B. M. Multi-time attention networks for irregularly sampled time series. In International Conference on Learning Representations (2020). https://openreview.net/forum?id=4c0J6lwQ4_

Xiong, Y. et al. Nyströmformer: a Nyström-based algorithm for approximating self-attention. In Proceedings of the AAAI Conference on Artificial Intelligence Vol. 35, No. 16, pp. 14138–14148 (AAAI, 2021).

Acknowledgements

This research was supported in part by the AI2050 program at Schmidt Futures (grant G-22-63172), the Boeing Company, and the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator and was accomplished under cooperative agreement number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes, notwithstanding any copyright notation herein. This work was further supported by The Boeing Company and Office of Naval Research grant N00014-18-1-2830. M.T. is supported by the Poul Due Jensen Foundation, grant 883901. M.L. was supported in part by the Austrian Science Fund under grant Z211-N23 (Wittgenstein Award). A.A. was supported by the National Science Foundation Graduate Research Fellowship Program. We thank T.-H. Wang, P. Kao, M. Chahine, W. Xiao, X. Li, L. Yin and Y. Ben for useful suggestions and for testing of CfC models to confirm the results across other domains.

Author information

These authors contributed equally: Ramin Hasani, Mathias Lechner.

Authors and Affiliations

Massachusetts Institute of Technology, Cambridge, MA, USA

Ramin Hasani, Mathias Lechner, Alexander Amini, Lucas Liebenwein, Aaron Ray & Daniela Rus

Institute of Science and Technology Austria, Klosterneuburg, Austria

Mathias Lechner

Aalborg University, Aalborg, Denmark

Max Tschaikowski

University of Vienna, Vienna, Austria

Gerald Teschl

Contributions

R.H. and M.L. conceptualized the study, proved the theory, designed and performed the research, and analysed data. A.A. contributed to designing the research, data curation, research implementation and new analytical tools, and analysed data. L.L. and A.R. contributed to the refinement of the theory and the research implementation. M.T. and G.T. proved theory and analysed correctness. D.R. helped with the design of the research, and guided and supervised the work. All authors wrote the paper.

Corresponding author

Correspondence to Ramin Hasani .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Machine Intelligence thanks Karl Friston and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Instantiation of LTCs in ODE and closed-form representations.

a) A sample LTC network with two nodes and five synapses. b) The ODE representation of this two-neuron system. c) The approximate closed-form representation of the network.
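
For readers who want to connect panels b and c, the sketch below evaluates a single scalar LTC neuron in both forms: the ODE right-hand side that a numerical solver would have to integrate, and an approximate closed-form state of the form x(t) ≈ (x0 − A) e^{−(w_τ + f(x, I)) t} f(−x, −I) + A, in which time appears explicitly. The sigmoidal head f, all constants and the scalar setting are illustrative assumptions for this sketch, not values or code from the paper.

```python
import numpy as np

def f(x, I, w_x=0.4, w_i=0.6, b=-0.2):
    # Illustrative sigmoidal head standing in for the trained network f(x, I; theta).
    return 1.0 / (1.0 + np.exp(-(w_x * x + w_i * I + b)))

def ltc_ode_rhs(x, I, A=1.0, w_tau=0.5):
    # Panel b: right-hand side of the LTC ODE
    # dx/dt = -(w_tau + f(x, I)) * x + f(x, I) * A,
    # which an ODE solver would integrate step by step.
    fx = f(x, I)
    return -(w_tau + fx) * x + fx * A

def ltc_closed_form(x0, I, t, A=1.0, w_tau=0.5):
    # Panel c: approximate closed-form state with time appearing explicitly,
    # x(t) ~ (x0 - A) * exp(-(w_tau + f(x0, I)) * t) * f(-x0, -I) + A.
    return (x0 - A) * np.exp(-(w_tau + f(x0, I)) * t) * f(-x0, -I) + A

x0, I = 0.0, 1.0
print("dx/dt at the initial state:", round(ltc_ode_rhs(x0, I), 3))
print("closed-form state at t = 0.5, 1, 2:",
      [round(ltc_closed_form(x0, I, t), 3) for t in (0.5, 1.0, 2.0)])
```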

Extended Data Fig. 2 Closed-form Continuous-depth neural architecture.

A backbone neural network layer delivers the input signals into three head networks g, f and h. f acts as a liquid time-constant for the sigmoidal time-gates of the network. g and h construct the nonlinearities of the overall CfC network.
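
A minimal PyTorch sketch of this head arrangement is given below, assuming one dense backbone layer, tanh nonlinearities on the g and h heads, and an explicitly supplied elapsed time per step; all layer sizes and the exact gate form are illustrative assumptions rather than the reference CfC implementation.

```python
import torch
import torch.nn as nn

class CfCCellSketch(nn.Module):
    # Sketch of the Fig. 2 arrangement: a shared backbone feeds three heads
    # g, f and h; f parameterizes a sigmoidal time gate that blends g and h.
    def __init__(self, input_size: int, hidden_size: int, backbone_size: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(input_size + hidden_size, backbone_size),
            nn.Tanh(),
        )
        self.g = nn.Linear(backbone_size, hidden_size)  # nonlinearity head g
        self.f = nn.Linear(backbone_size, hidden_size)  # liquid time-constant head f
        self.h = nn.Linear(backbone_size, hidden_size)  # nonlinearity head h

    def forward(self, x, hidden, t):
        # x: (batch, input_size); hidden: (batch, hidden_size); t: (batch, 1) elapsed time.
        z = self.backbone(torch.cat([x, hidden], dim=-1))
        gate = torch.sigmoid(-self.f(z) * t)  # time enters the state update explicitly
        return gate * torch.tanh(self.g(z)) + (1.0 - gate) * torch.tanh(self.h(z))

# Usage on an irregularly sampled sequence (batch of 4, made-up shapes).
cell = CfCCellSketch(input_size=8, hidden_size=16)
hidden = torch.zeros(4, 16)
for _ in range(5):
    x_t = torch.randn(4, 8)   # observation at this time step
    dt = torch.rand(4, 1)     # time elapsed since the previous observation
    hidden = cell(x_t, hidden, dt)
print(hidden.shape)           # torch.Size([4, 16])
```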

Extended Data Fig. 3 Hyperparameters for Human activity and Walker.

List of hyperparameters used to obtain results in Human activity and Walker2D Experiments.

Extended Data Fig. 4 Hyperparameters for ET-sMNIST and Bit-stream XOR.

List of hyperparameters used to obtain results in Event-based MNIST and Bit-stream XOR Experiments.

Extended Data Fig. 5 Bit-stream XOR sequence classification.

The performance values (accuracy %) for all baseline models are reproduced from ref. 9. Numbers represent mean ± standard deviation (n = 5). Note: the performance of models marked by † is reported from ref. 9. Bold indicates the highest accuracy and the best time per epoch (min).

Extended Data Fig. 6 PhysioNet.

AUC stands for area under the curve. Numbers represent mean ± standard deviation (n = 5). Note: the performance of the models marked by † is reported from ref. 7 and that of the models marked by * from ref. 78. Bold indicates the highest AUC score and the best time per epoch (min).

Extended Data Fig. 7 Hyperparameters for Physionet and IMDB.

List of hyperparameters used to obtain results in Physionet and IMDB sentiment classification experiments.

Extended Data Fig. 8 Results on the IMDB datasets.

The experiment is performed without any pretraining or pretrained word embeddings. We therefore excluded advanced attention-based models (refs. 78,79), such as Transformers (ref. 36), and RNN structures that use pretraining. Numbers represent mean ± standard deviation (n = 5). Note: the performance of the models marked by † is reported from ref. 55 and that of the models marked by * from ref. 57. An n/a standard deviation denotes that the original report of these experiments did not provide the statistics of their analysis. Bold indicates the highest accuracy and the best time per epoch (min).

Extended Data Fig. 9 Lane-keeping models’ parameter count.

CfC and NCP networks perform lane-keeping in unseen scenarios with a compact representation.

Extended Data Fig. 10 Attention Profile of networks.

Trained networks receive unseen inputs (first column in each tab) and generate acceleration and steering commands. We use the VisualBackProp algorithm (ref. 65) to compute the saliency maps of the convolutional part of each network. a) Results for networks tested on data collected in summer. b) Results for networks tested on data collected in winter. c) Results for inputs corrupted by zero-mean Gaussian noise with variance σ² = 0.35.
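
As a rough guide to how such saliency maps can be produced, the sketch below implements a VisualBackProp-style mask computation: channel-average each convolutional activation, then propagate the deepest averaged map back to input resolution by repeated upsampling and pointwise multiplication. Bilinear interpolation stands in for the deconvolution-with-ones used in the original algorithm, and the layer shapes are invented for the example; this is not the code used in the paper.

```python
import torch
import torch.nn.functional as F

def visualbackprop_sketch(feature_maps):
    # feature_maps: list of conv activations ordered shallow -> deep,
    # each of shape (batch, channels, H, W).
    averaged = [fm.mean(dim=1, keepdim=True) for fm in feature_maps]
    mask = averaged[-1]
    for prev in reversed(averaged[:-1]):
        # Upsample the running mask to the previous layer's spatial size
        # and combine it with that layer's averaged activation.
        mask = F.interpolate(mask, size=prev.shape[-2:], mode="bilinear",
                             align_corners=False)
        mask = mask * prev
    # Normalize to [0, 1] for visualization.
    mask = mask - mask.amin(dim=(-2, -1), keepdim=True)
    return mask / (mask.amax(dim=(-2, -1), keepdim=True) + 1e-8)

# Illustrative activations from a three-layer convolutional stack.
fms = [torch.relu(torch.randn(1, c, s, s)) for c, s in [(16, 64), (32, 32), (64, 16)]]
print(visualbackprop_sketch(fms).shape)  # torch.Size([1, 1, 64, 64])
```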

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Hasani, R., Lechner, M., Amini, A. et al. Closed-form continuous-time neural networks. Nat Mach Intell 4 , 992–1003 (2022). https://doi.org/10.1038/s42256-022-00556-7

Received: 23 March 2022

Accepted: 05 October 2022

Published: 15 November 2022

Issue Date: November 2022

DOI: https://doi.org/10.1038/s42256-022-00556-7
