What is the Bellman equation?

Reinforcement Learning - Interview Questions

According to the Bellman Equation, long-term- reward in a given action is equal to the reward from the current action combined with the expected reward from the future actions taken at the following time. Let’s try to understand first.

Let’s take an example :

Here we have a maze which is our environment and the sole goal of our agent is to reach the trophy state (R = 1) or to get Good reward and to avoid the fire state because it will be a failure (R = -1) or will get Bad reward.