Correct Answer : Off-policy
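Explanation : In an off-policy learning algorithm, the target policy (the policy being learned and evaluated) differs from the behavior policy (the policy that selects actions and generates the experience). Q-learning is a standard example: it learns about the greedy policy while typically acting with an exploratory one such as epsilon-greedy.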
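The distinction can be illustrated with a minimal sketch of tabular Q-learning. It is not taken from the original question: the function names, the two-state value table, and the parameter values (`alpha`, `gamma`, `epsilon`) are all hypothetical and chosen only for illustration. The action is picked by an epsilon-greedy behavior policy, while the update bootstraps from the greedy target policy.

```python
import random

def behavior_action(Q, state, epsilon, rng):
    """Behavior policy (assumed epsilon-greedy): picks the action actually taken."""
    n_actions = len(Q[state])
    if rng.random() < epsilon:
        return rng.randrange(n_actions)  # explore: uniform random action
    return max(range(n_actions), key=lambda a: Q[state][a])  # exploit

def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Target policy is greedy: the bootstrap uses the max over next actions,
    no matter which action the behavior policy takes in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]

# Hypothetical value table: 2 states x 2 actions.
Q = [[0.0, 0.0], [1.0, 2.0]]
rng = random.Random(0)

a = behavior_action(Q, 0, epsilon=0.1, rng=rng)
new_value = q_learning_update(Q, s=0, a=a, r=1.0, s_next=1)
```

Because the update uses `max(Q[s_next])` rather than the value of the action the behavior policy will actually choose next, the learned values describe the greedy policy even though the data comes from the exploratory one. Replacing that `max` with the value of the next action actually taken would give an on-policy update (SARSA) instead.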