Which of the following correctly states the difference between Q-learning and SARSA?

A)  In comparison to Q-learning (QL), SARSA directly learns the optimal policy, whereas QL learns a policy that is "near" the optimal.
B)  In comparison to SARSA, Q-learning (QL) directly learns the optimal policy, whereas SARSA learns a policy that is "near" the optimal.
C)  Both (A) and (B)
D)  None of the above

Correct Answer : B) In comparison to SARSA, QL directly learns the optimal policy, whereas SARSA learns a policy that is "near" the optimal.


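Explanation : Q-learning (QL) is an off-policy method. Its update bootstraps from the maximum estimated value over the actions available in the next state, so it estimates the optimal action-value function directly, regardless of the exploratory policy used to collect experience (given sufficient exploration and suitable step sizes). SARSA is an on-policy method. Its update uses the action the agent actually takes in the next state, so it learns the value of the policy it is following, exploration included. With a fixed exploratory policy such as epsilon-greedy, SARSA therefore learns a policy that is "near" the optimal, and it approaches the optimal policy only as exploration is reduced over time.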
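
The difference comes down to a single term in the update target. The sketch below is a minimal illustration, not a full training loop: the two-state table, the action names, the reward, and the values of `alpha` and `gamma` are all made up for the example, and `make_table`, `q_learning_update`, and `sarsa_update` are hypothetical helper names.

```python
def make_table():
    # Hypothetical action-value table: state -> action -> estimated value.
    return {
        "s0": {"left": 0.0, "right": 0.0},
        "s1": {"left": 1.0, "right": 4.0},
    }


def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    # Off-policy target: bootstrap from the best action in s_next,
    # whatever action the agent actually takes there.
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    # On-policy target: bootstrap from a_next, the action the agent
    # actually takes in s_next (possibly an exploratory one).
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


if __name__ == "__main__":
    # Same transition for both: from s0 take "right", receive reward 1,
    # land in s1, then explore with the lower-valued action "left".
    ql_value = q_learning_update(make_table(), "s0", "right", 1.0, "s1")
    sarsa_value = sarsa_update(make_table(), "s0", "right", 1.0, "s1", "left")
    print("Q-learning:", ql_value)
    print("SARSA:     ", sarsa_value)
```

On this transition the Q-learning estimate moves to roughly 2.3, because it assumes the best next action ("right", worth 4.0), while the SARSA estimate moves to roughly 0.95, because it accounts for the exploratory next action ("left", worth 1.0). That gap is why SARSA's learned values reflect the exploring policy rather than the optimal one.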