Reinforcement Learning - Quiz(MCQ)

Reinforcement Learning (RL) : Reinforcement Learning (RL) is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and observing their results. For each good action the agent receives positive feedback, and for each bad action it receives negative feedback, or a penalty.

In reinforcement learning, the agent learns automatically from feedback, without any labeled data, unlike supervised learning.

1 .
Reinforcement Learning is a ___________.
A)
Feedback-based learning technique
B)
Prediction-based learning technique
C)
History results-based learning technique
D)
None of the above

Correct Answer :   Feedback-based learning technique


Explanation : Reinforcement Learning is a feedback-based Machine learning technique in which an agent learns to behave in an environment by performing the actions and seeing the results of actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or penalty.

2 .
In which type of machine learning does an agent learn to make decisions by interacting with an environment?
A)
Supervised learning
B)
Unsupervised learning
C)
Reinforcement learning
D)
Semi-supervised learning

Correct Answer :   Reinforcement learning


Explanation : Reinforcement learning is a type of machine learning in which an agent learns to make decisions by interacting with an environment.

3 .
Which of the following is an application of reinforcement learning?
A)
Robotics
B)
video games
C)
self-driving cars
D)
All of the above

Correct Answer :   All of the above


Explanation : All of the above are applications of Reinforcement learning.

4 .
Which of the following is an application of reinforcement learning?
A)
Topic modeling
B)
Pattern recognition
C)
Image classification
D)
Recommendation system

Correct Answer :   Recommendation system

5 .
What does reinforcement learning rely on to learn?
A)
Pre-trained models
B)
Interaction with an environment
C)
Batch processing
D)
Supervised labeling

Correct Answer :   Interaction with an environment

6 .
Q-learning works on which equation?
A)
Bellman-equation
B)
KNN-equation
C)
Naïve bayes equation
D)
None of the above

Correct Answer :   Bellman-equation


Explanation : Q-learning works on the Bellman equation.
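The Bellman-equation update behind Q-learning can be sketched as follows. This is a minimal illustration only; the state names, action names, and hyperparameter values are assumed, not part of the quiz:

```python
# Q-learning update rule (Bellman equation form):
# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to the table Q (a dict of dicts)."""
    best_next = max(Q[s_next].values())  # bootstrap on the best next action
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

# Tiny two-state example
Q = {"s0": {"left": 0.0, "right": 0.0},
     "s1": {"left": 0.0, "right": 1.0}}
new_q = q_update(Q, "s0", "right", r=1.0, s_next="s1")
# new_q = 0.1 * (1.0 + 0.9 * 1.0 - 0.0) = 0.19
```

Repeating this update over many interactions drives each Q(s, a) toward the fixed point of the Bellman equation.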

7 .
Does reinforcement learning require any previous training?
A)
Yes
B)
No
C)
Can Not Say
D)
None of the above

Correct Answer :   No


Explanation : No, reinforcement learning does not require any previous training.

8 .
In Q-learning, what does the "Q" stand for?
A)
Quick
B)
Query
C)
Quantify
D)
Quality

Correct Answer :   Quality


Explanation : In Q-learning "Q" stands for quality.

9 .
The matrix created during the Q-learning algorithm is commonly known as the ___________.
A)
Table
B)
Query-table
C)
Q-table
D)
Quick-matrix

Correct Answer :   Q-table


Explanation : The matrix created during the Q-learning algorithm is commonly known as the q-table.
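A Q-table can be represented as a simple nested mapping from states to actions to values. The state and action names below are made up for illustration:

```python
# A Q-table maps every (state, action) pair to its learned Q-value.
states = ["s0", "s1", "s2"]
actions = ["up", "down"]

# Initialise all entries to zero
Q = {s: {a: 0.0 for a in actions} for s in states}

# The greedy action in a state is the argmax over that state's row
Q["s0"]["up"] = 0.5
greedy = max(Q["s0"], key=Q["s0"].get)  # "up"
```

For large or continuous state spaces the table is replaced by a function approximator (as in DQN), but the role is the same: store an estimated quality for each state-action pair.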

10 .
Q-learning is which type of learning algorithm?
A)
Model-free
B)
Model-based
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   Model-free


Explanation : Q-learning is a model-free learning algorithm.

11 .
Which algorithm is faster: Q-learning (QL) or SARSA?
A)
QL
B)
SARSA
C)
Can Not Say
D)
None of the above

Correct Answer :   SARSA


Explanation : SARSA is faster.

12 .
Which algorithm gives a better final performance?
A)
QL
B)
SARSA
C)
Can Not Say
D)
None of the above

Correct Answer :   QL


Explanation : Q-learning (QL) gives a better final performance.

A)
Policy optimization
B)
Feature extraction
C)
Reinforcement learning
D)
Hyperparameter tuning

Correct Answer :   Policy optimization

14 .
In reinforcement learning, what is an agent?
A)
A labeled data point
B)
A neural network architecture
C)
A software program making decisions
D)
A person supervising the learning process

Correct Answer :   A software program making decisions

15 .
How many types of feedback does reinforcement learning give?
A)
1
B)
2
C)
3
D)
4

Correct Answer :   2


Explanation : Reinforcement learning gives two types of feedback: positive and negative.

16 .
Which type of data does reinforcement learning use?
A)
Labeled data
B)
Unlabelled data
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   None of the above


Explanation : Reinforcement learning does not use a pre-collected labeled or unlabeled dataset; the agent generates its own experience by interacting with the environment.

17 .
Reinforcement learning learns through ___________.
A)
Predictions
B)
Experience
C)
Analyzing the data
D)
None of the above

Correct Answer :   Experience


Explanation : Reinforcement learning learns through experience.

18 .
___________ is defined as an event that, occurring as a result of a particular behavior, increases the strength and frequency of that behavior.
A)
Positive Reinforcement
B)
Negative Reinforcement
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   Positive Reinforcement


Explanation : Positive reinforcement is defined as an event that, occurring as a result of a particular behavior, increases the strength and frequency of that behavior.

19 .
What does an agent in reinforcement learning receive from the environment as feedback?
A)
Labels
B)
Clusters
C)
Predictions
D)
Rewards or penalties

Correct Answer :   Rewards or penalties


Explanation : An agent in reinforcement learning receives rewards or penalties from the environment.

20 .
What is the Q-learning algorithm used for in reinforcement learning?
A)
Clustering
B)
Image recognition
C)
Finding the optimal decision-making strategy
D)
Natural language processing

Correct Answer :   Finding the optimal decision-making strategy


Explanation : The Q-learning algorithm is used in reinforcement learning to find the optimal decision-making strategy.

A)
Probabilistic algorithm
B)
Based on Bayes inference rule
C)
Reinforcement learning algorithm
D)
All of the above

Correct Answer :   All of the above

22 .
What does DQN stand for?
A)
Deep Q-neural network
B)
Dynamic Q-neural network
C)
Dynamic Q-learning network
D)
None of the above

Correct Answer :   Deep Q-neural network


Explanation : DQN stands for Deep Q-neural network.

23 .
Which of the following statements about QL and SARSA is correct?
A)
In comparison to QL, SARSA directly learns the optimal policy, whereas QL learns a policy that is "near" the optimal.
B)
In comparison to SARSA, QL directly learns the optimal policy, whereas SARSA learns a policy that is "near" the optimal
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   In comparison to SARSA, QL directly learns the optimal policy, whereas SARSA learns a policy that is "near" the optimal


Explanation : In comparison to SARSA, QL directly learns the optimal policy, whereas SARSA learns a policy that is "near" the optimal.
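The difference comes down to the bootstrap target of the two updates. A minimal sketch, with toy Q-values and illustrative names:

```python
# Q-learning (off-policy): the target bootstraps on the best next action.
def ql_target(Q, s_next, r, gamma=0.9):
    return r + gamma * max(Q[s_next].values())

# SARSA (on-policy): the target bootstraps on the action actually taken next.
def sarsa_target(Q, s_next, a_next, r, gamma=0.9):
    return r + gamma * Q[s_next][a_next]

Q = {"s1": {"left": 0.2, "right": 0.8}}
t_ql = ql_target(Q, "s1", r=0.0)                 # 0.9 * 0.8 = 0.72
t_sarsa = sarsa_target(Q, "s1", "left", r=0.0)   # 0.9 * 0.2 = 0.18
```

Because Q-learning always bootstraps on the greedy action, it learns the optimal policy directly, while SARSA learns the value of the (exploratory) policy it is actually following.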

24 .
How many types of machine learning are there?
A)
2
B)
3
C)
4
D)
5

Correct Answer :   4


Explanation : There are four types of machine learning: supervised, unsupervised, semi-supervised, and reinforcement.

25 .
Which of the following is a product of reinforcement learning concepts?
A)
Driverless cars
B)
Text classification
C)
Market basket analysis
D)
House pricing prediction

Correct Answer :   Driverless cars


Explanation : Driverless cars are the product of reinforcement learning concepts.

26 .
In reinforcement learning, what is the environment?
A)
Environment is similar to feedback
B)
Environment is a situation in which an agent is present.
C)
Environment is a situation that is based on the current state
D)
Environment is a situation that the agent returns as a result.

Correct Answer :   Environment is a situation in which an agent is present.


Explanation : Environment is a situation in which an agent is present.

27 .
In reinforcement learning, what are actions?
A)
Actions are the feedback that an agent provides.
B)
Actions are the function that the environment takes.
C)
Actions are the moves that the agent takes inside the environment.
D)
None of the above

Correct Answer :   Actions are the moves that the agent takes inside the environment.


Explanation : Actions are the moves that the agent takes inside the environment.

28 .
In reinforcement learning, what is a state?
A)
State is a situation in which an agent is present.
B)
A state is the simple value of reinforcement learning.
C)
A state is a result returned by the environment after an agent takes an action.
D)
None of the above

Correct Answer :   A state is a result returned by the environment after an agent takes an action.


Explanation : A state is a result returned by the environment after an agent takes an action.

29 .
In reinforcement learning, what is a reward?
A)
Environment gives value in return which is known as a reward.
B)
An agent's action is evaluated based on feedback returned from the environment.
C)
A reward is a result returned by the environment after an agent takes an action.
D)
None of the above

Correct Answer :   An agent's action is evaluated based on feedback returned from the environment.


Explanation : The feedback returned from the environment to evaluate an agent's action is known as a reward.

30 .
In reinforcement learning, what does the agent's policy determine?
A)
The agent's policy determines what action to take based on the current state.
B)
The agent's policy determines what the state reward would be.
C)
The agent's policy determines what environment model should be decided
D)
None of the above

Correct Answer :   The agent's policy determines what action to take based on the current state.


Explanation : The agent's policy determines what action to take based on the current state.

31 .
Which trade-off is fundamental to reinforcement learning?
A)
Model validation
B)
Feature extraction
C)
Dimensionality reduction
D)
Exploration vs. exploitation

Correct Answer :   Exploration vs. exploitation
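The standard way to handle the exploration-exploitation trade-off is epsilon-greedy action selection. A minimal sketch, with an assumed epsilon value and made-up action names:

```python
import random

def epsilon_greedy(q_row, epsilon=0.1, rng=random):
    """With probability epsilon explore (random action); otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(list(q_row))    # explore: try a random action
    return max(q_row, key=q_row.get)      # exploit: take the best-known action

q_row = {"left": 0.1, "right": 0.9}
action = epsilon_greedy(q_row, epsilon=0.0)  # epsilon=0 -> pure exploitation
```

In practice epsilon is often decayed over training so the agent explores heavily at first and exploits its learned Q-values later.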

32 .
Which of the following algorithms directly optimizes the agent's policy?
A)
Q-Learning
B)
Policy Gradient
C)
Deep Q-Network (DQN)
D)
Monte Carlo Tree Search (MCTS)

Correct Answer :   Policy Gradient

33 .
Does reinforcement learning follow the hit-and-try method?
A)
Yes
B)
No
C)
Can Not Say
D)
None of the above

Correct Answer :   Yes


Explanation : Yes, reinforcement learning follows the concept of the hit-and-try method.

34 .
In how many ways can we implement reinforcement learning?
A)
1
B)
2
C)
3
D)
4

Correct Answer :   3


Explanation : Reinforcement learning can be implemented in three ways:

* Value-based
* Policy-based
* Model-based

35 .
Q-learning is based on which type of learning algorithm?
A)
On-policy
B)
Off-policy
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   Off-policy


Explanation : Q-learning is based on an off-policy learning algorithm.

36 .
SARSA is based on which type of learning algorithm?
A)
On-policy
B)
Off-policy
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   On-policy


Explanation : SARSA is based upon an on-policy learning algorithm.

37 .
Which type of policy is a learning algorithm in which the same policy is evaluated and improved?
A)
On-policy
B)
Off-policy
C)
Target policy
D)
behavior policy

Correct Answer :   On-policy


Explanation : On-policy is a learning algorithm in which the same policy is evaluated and improved.

38 .
Which of the following types of policy is a learning algorithm that evaluates and improves a policy different from the policy used for action selection?
A)
On-policy
B)
Target policy
C)
behavior policy
D)
Off-policy

Correct Answer :   Off-policy


Explanation : Off-policy is a learning algorithm that evaluates and improves a policy different from the policy used for action selection.

39 .
In which type of learning algorithm is the target policy not equal to the behavior policy?
A)
On-policy
B)
Off-policy
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   Off-policy


Explanation : In an off-policy learning algorithm target policy is not equal to behavior policy.

40 .
In which type of learning algorithm is the target policy equal to the behavior policy?
A)
On-policy
B)
Off-policy
C)
Can Not Say
D)
None of the above

Correct Answer :   On-policy


Explanation : In the on-policy learning algorithm target policy is equal to behavior policy.

41 .
How many types of policy-based approaches are there?
A)
1
B)
2
C)
3
D)
4

Correct Answer :   2


Explanation : There are two types of policy-based approaches:

* Deterministic
* Stochastic

42 .
How many main elements of reinforcement learning are there?
A)
1
B)
2
C)
3
D)
4

Correct Answer :   4


Explanation : Mainly, there are four elements of reinforcement learning :

* Policy
* Reward Signal
* Value Function
* Model of the environment

43 .
The agent's main objective is to ___________ the total number of rewards.
A)
Null
B)
Minimize
C)
Maximize
D)
None of the above

Correct Answer :   Maximize


Explanation : The agent's main objective is to maximize the total number of rewards for good actions.

44 .
Which term is a synonym for random and probabilistic?
A)
Deterministic
B)
Stochastic
C)
Can Not Say
D)
None of the above

Correct Answer :   Stochastic


Explanation : Stochastic is a synonym for random or probabilistic.

45 .
In which approach to reinforcement learning do we find the optimal value function?
A)
Value-based
B)
Policy-based
C)
Model-based
D)
None of the above

Correct Answer :   Value-based


Explanation : In a Value-based approach to reinforcement learning, we find the optimal value function.

46 .
In which approach to reinforcement learning is a virtual model created for the environment?
A)
Model-based
B)
Value-based
C)
Policy-based
D)
None of the above

Correct Answer :   Model-based


Explanation : In the model-based approach to reinforcement learning, a virtual model of the environment is created.

47 .
What does SARSA stand for?
A)
State act reward act
B)
State act reward achievement
C)
State achievement rewards state action
D)
State action reward state action

Correct Answer :   State action reward state action


Explanation : SARSA stands for State action reward state action.

48 .
Which type of policy is the policy that an agent is trying to learn?
A)
On-policy
B)
Off-policy
C)
Target policy
D)
behavior policy

Correct Answer :   Target policy


Explanation : A target policy is a type of policy that an agent is trying to learn.

49 .
Which type of policy is used by an agent for action selection?
A)
On-policy
B)
Off-policy
C)
Target policy
D)
behavior policy

Correct Answer :   behavior policy


Explanation : Behavior policy is used by an agent for action selection.

50 .
How many tuples does a Markov decision process (MDP) consist of?
A)
3
B)
4
C)
5
D)
6

Correct Answer :   4


Explanation : An MDP consists of a 4-tuple :

* A set of finite states S
* A set of finite actions A
* A reward received after transitioning from state S to state S' due to action a
* A transition probability Pa of moving from state S to state S' under action a
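The 4-tuple above can be written down directly in code. A two-state toy problem, with all state and action names made up for illustration:

```python
# An MDP as the tuple (S, A, P, R): states, actions,
# transition probabilities P[s][a] -> {s': prob}, and rewards R[(s, a, s')].
S = {"s0", "s1"}                                      # finite states
A = {"stay", "go"}                                    # finite actions
P = {"s0": {"go": {"s1": 1.0}, "stay": {"s0": 1.0}},
     "s1": {"go": {"s0": 1.0}, "stay": {"s1": 1.0}}}  # deterministic transitions
R = {("s0", "go", "s1"): 1.0}                         # reward for s0 -> s1 via "go"

# Sanity check: probabilities out of each (state, action) must sum to 1
ok = all(abs(sum(d.values()) - 1.0) < 1e-9
         for per_action in P.values() for d in per_action.values())
```

A reinforcement learning algorithm such as Q-learning then interacts with this MDP without ever reading P or R directly, which is what makes it model-free.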

51 .
Why do we use the Markov decision process (MDP)?
A)
We use MDP to formalize the reinforcement learning problems.
B)
We use MDP to predict reinforcement learning problems.
C)
We use MDP to analyze the reinforcement learning problems.
D)
None of the above

Correct Answer :   We use MDP to formalize the reinforcement learning problems.


Explanation : We use MDP to formalize the reinforcement learning problems.

52 .
Which of the following algorithms finds the best course of action based on the agent's current state, without using a model (off-policy reinforcement learning)?
A)
Q-learning
B)
Markov property
C)
Deep Q neural network
D)
State action reward state action

Correct Answer :   Q-learning


Explanation : The Q-learning algorithm finds the best course of action based on the agent's current state, without using a model, via off-policy reinforcement learning.

53 .
What does MDP stand for?
A)
Markov discount process
B)
Markov deciding procedure
C)
Markov decision process
D)
Markov discount procedure

Correct Answer :   Markov decision process


Explanation : MDP stands for Markov decision process.

54 .
Reinforcement learning is defined by which element?
A)
Policy
B)
Reward Signal
C)
Value Function
D)
Model of the environment

Correct Answer :   Reward Signal


Explanation : The goal of reinforcement learning is defined by the reward signal.

55 .
Which element in reinforcement learning defines the behavior of the agent?
A)
Policy
B)
Reward Signal
C)
Value Function
D)
Model of the environment

Correct Answer :   Policy


Explanation : Policy elements in reinforcement learning define the behavior of the agent.

56 .
On which element does the reward that the agent can expect depend?
A)
Policy
B)
Reward Signal
C)
Model of the environment
D)
Value Function

Correct Answer :   Value Function


Explanation : The reward that the agent can expect depends on the value function.

57 .
Who introduced the Bellman equation?
A)
Alfonso Shimbel
B)
Edsger W. Dijkstra
C)
Richard Ernest Bellman
D)
None of the above

Correct Answer :   Richard Ernest Bellman


Explanation : Richard Ernest Bellman introduced the Bellman equation.

58 .
P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t]. In this condition, what is the meaning of S_t?
A)
State factor
B)
Markov state
C)
Discount factor
D)
None of the above

Correct Answer :   Markov state


Explanation : In the condition P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t], S_t represents the Markov state.

59 .
What represents the agent's state in reinforcement learning?
A)
Markov state
B)
Discount state
C)
Discount factor
D)
None of the above

Correct Answer :   Markov state


Explanation : The Markov state represents the agent's state in reinforcement learning.

60 .
Gamma (γ) in the Bellman equation is known as the ___________.
A)
Value factor
B)
Discount factor
C)
Environment factor
D)
None of the above

Correct Answer :   Discount factor


Explanation : Gamma (γ) in the bellman equation is known as the Discount factor.
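The role of the discount factor can be shown with the discounted return. A minimal sketch; the reward sequence and gamma value are assumed for illustration:

```python
# The discount factor gamma weights future rewards in the return:
# G = r0 + gamma*r1 + gamma^2*r2 + ...
def discounted_return(rewards, gamma=0.9):
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

G = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
```

A gamma near 0 makes the agent short-sighted (only immediate rewards matter); a gamma near 1 makes it value long-term rewards almost as much as immediate ones.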