What's the difference between on-policy and off-policy evaluation?


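On-policy evaluation estimates how good a policy is using data generated by that same policy. You run the policy in the environment, record the returns it obtains, and average them. It is the most direct form of evaluation in reinforcement learning, but it requires actually executing the policy being evaluated.

Off-policy evaluation estimates the value of a target policy using data collected by a different behavior policy, without running the target policy itself. Because the logged actions were chosen by another policy, the observed rewards must be reweighted (for example with importance sampling) to reflect how often the target policy would have taken those actions. This is useful when deploying a new policy is costly or risky, or when only previously logged data is available.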
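To make the distinction concrete, here is a minimal Python sketch on a hypothetical two-armed bandit. The environment, reward values, and function names are illustrative assumptions, not a standard API. The on-policy estimator runs the target policy directly. The off-policy estimator only sees actions drawn from a behavior policy and corrects for the mismatch with ordinary importance sampling, which is one common off-policy evaluation technique among several.

```python
import random

# Hypothetical one-step environment (a two-armed bandit): the expected
# rewards below are illustrative assumptions, not from any real task.
EXPECTED_REWARD = {0: 1.0, 1: 0.0}


def step(action, rng):
    """Return a noisy reward for the chosen action."""
    return EXPECTED_REWARD[action] + rng.gauss(0.0, 0.1)


def sample_action(policy, rng):
    """Draw an action from a policy given as {action: probability}."""
    actions = list(policy)
    return rng.choices(actions, weights=[policy[a] for a in actions])[0]


def on_policy_estimate(target, n, rng):
    """On-policy: run the target policy itself and average its rewards."""
    total = 0.0
    for _ in range(n):
        action = sample_action(target, rng)
        total += step(action, rng)
    return total / n


def off_policy_estimate(target, behavior, n, rng):
    """Off-policy: estimate the target policy's value from data generated by
    a different behavior policy, using ordinary importance sampling.
    Assumes behavior[a] > 0 for every action with target[a] > 0."""
    total = 0.0
    for _ in range(n):
        action = sample_action(behavior, rng)  # logged by the behavior policy
        reward = step(action, rng)
        # Importance ratio pi(a) / b(a); actions the target never takes get 0.
        weight = target.get(action, 0.0) / behavior[action]
        total += weight * reward
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    target = {0: 0.9, 1: 0.1}    # policy we want to evaluate
    behavior = {0: 0.5, 1: 0.5}  # policy that generated the data
    # True value of the target policy: 0.9 * 1.0 + 0.1 * 0.0 = 0.9,
    # so both estimates should land close to 0.9.
    print("on-policy :", on_policy_estimate(target, 100_000, rng))
    print("off-policy:", off_policy_estimate(target, behavior, 100_000, rng))
```

Ordinary importance sampling is unbiased when the behavior policy covers every action the target policy can take, but its variance grows as the two policies diverge. In multi-step episodes the weight becomes the product of these per-step ratios along the trajectory, which makes the variance problem worse.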