Correct Answer : 4
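Explanation : An MDP is defined by a 4-tuple (S, A, P_a, R_a):
* S, a finite set of states.
* A, a finite set of actions.
* P_a(s, s'), the probability that taking action a in state s leads to state s'.
* R_a(s, s'), the reward received after transitioning from state s to state s' due to action a.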
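The four components can be written down directly as a small data structure. The following is a minimal sketch, not a standard library API: the `MDP` class, the `is_valid` helper, and the two-state "cool"/"hot" example are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: frozenset    # S: finite set of states
    actions: frozenset   # A: finite set of actions
    transitions: dict    # P_a(s, s'): maps (s, a) -> {s': probability}
    rewards: dict        # R_a(s, s'): maps (s, a, s') -> reward


def is_valid(mdp, tol=1e-9):
    """Check that each (s, a) entry is a probability distribution over known states."""
    for (s, a), dist in mdp.transitions.items():
        if s not in mdp.states or a not in mdp.actions:
            return False
        if any(s2 not in mdp.states or p < 0 for s2, p in dist.items()):
            return False
        if abs(sum(dist.values()) - 1.0) > tol:
            return False
    return True


# Hypothetical two-state example (values are illustrative only).
toy = MDP(
    states=frozenset({"cool", "hot"}),
    actions=frozenset({"slow", "fast"}),
    transitions={
        ("cool", "slow"): {"cool": 1.0},
        ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
        ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
        ("hot", "fast"): {"hot": 1.0},
    },
    rewards={
        ("cool", "slow", "cool"): 1.0,
        ("cool", "fast", "cool"): 2.0,
        ("cool", "fast", "hot"): 2.0,
        ("hot", "slow", "cool"): 1.0,
        ("hot", "slow", "hot"): 1.0,
        ("hot", "fast", "hot"): -10.0,
    },
)
```

Calling `is_valid(toy)` confirms that the probabilities for every state-action pair sum to 1, which is the constraint that makes P_a a transition distribution.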
Correct Answer : Discount factor
Explanation : Gamma (γ) in the Bellman equation is known as the discount factor. It weights future rewards relative to immediate rewards.
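To see what the discount factor does, it can help to compute a discounted return by hand. The sketch below assumes a finite list of rewards and a γ in [0, 1]; the function name `discounted_return` is a hypothetical helper, not part of any library.

```python
def discounted_return(rewards, gamma):
    """Return sum over k of gamma**k * rewards[k].

    The sum is accumulated from the last reward backwards,
    using G = r + gamma * G at each step.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With γ = 0 only the first reward counts, and with γ = 1 all rewards count equally, so `discounted_return([3, 5, 7], 0.0)` gives 3.0 and `discounted_return([3, 5, 7], 1.0)` gives 15.0.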
Correct Answer : Markov state
Explanation : In reinforcement learning, the agent state is represented by a Markov state.
Explanation : A state S_t is a Markov state if and only if P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t]. In other words, the future depends on the history only through the current state.
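The Markov condition can be checked exactly on a small example. The sketch below assumes a hypothetical two-state chain (states "A" and "B", with made-up transition probabilities and a uniform distribution for S_1) and compares P[S_3 | S_2] with P[S_3 | S_1, S_2] using exact fractions. Because the joint distribution is built from a Markov chain, the two are expected to agree; the code only illustrates what the equality means.

```python
from fractions import Fraction
from itertools import product

# Hypothetical chain: P[s][s2] is the probability of moving from s to s2.
P = {
    "A": {"A": Fraction(9, 10), "B": Fraction(1, 10)},
    "B": {"A": Fraction(1, 2), "B": Fraction(1, 2)},
}
# Assumed distribution of the first state S_1.
START = {"A": Fraction(1, 2), "B": Fraction(1, 2)}


def joint(s1, s2, s3):
    """P[S_1 = s1, S_2 = s2, S_3 = s3] for the chain above."""
    return START[s1] * P[s1][s2] * P[s2][s3]


def cond_on_current(s3, s2):
    """P[S_3 = s3 | S_2 = s2], marginalising over S_1."""
    num = sum(joint(s1, s2, s3) for s1 in P)
    den = sum(joint(s1, s2, x) for s1 in P for x in P)
    return num / den


def cond_on_history(s3, s1, s2):
    """P[S_3 = s3 | S_1 = s1, S_2 = s2]."""
    den = sum(joint(s1, s2, x) for x in P)
    return joint(s1, s2, s3) / den


# True when conditioning on the full history changes nothing.
markov_holds = all(
    cond_on_current(s3, s2) == cond_on_history(s3, s1, s2)
    for s1, s2, s3 in product(P, repeat=3)
)
```

Here `markov_holds` evaluates to `True`: knowing S_1 in addition to S_2 does not change the distribution of S_3.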