Reinforcement Learning - Quiz(MCQ)

Reinforcement Learning (RL) : Reinforcement Learning (RL) is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and observing their results. For each good action the agent receives positive feedback, and for each bad action it receives negative feedback, or a penalty.

In reinforcement learning, the agent learns automatically from feedback, without any labeled data, unlike supervised learning.

1 .
Reinforcement Learning is a ___________.
A)
Feedback-based learning technique
B)
Prediction-based learning technique
C)
History results-based learning technique
D)
None of the above

Correct Answer :   Feedback-based learning technique


Explanation : Reinforcement Learning is a feedback-based Machine learning technique in which an agent learns to behave in an environment by performing the actions and seeing the results of actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or penalty.

2 .
In which type of machine learning does an agent learn to make decisions by interacting with an environment?
A)
Supervised learning
B)
Unsupervised learning
C)
Reinforcement learning
D)
Semi-supervised learning

Correct Answer :   Reinforcement learning


Explanation : Reinforcement learning is a type of machine learning in which an agent learns to make decisions by interacting with an environment.

3 .
Which of the following is an application of reinforcement learning?
A)
Robotics
B)
video games
C)
self-driving cars
D)
All of the above

Correct Answer :   All of the above


Explanation : All of the above are applications of Reinforcement learning.

4 .
Which of the following is an application of reinforcement learning?
A)
Topic modeling
B)
Pattern recognition
C)
Image classification
D)
Recommendation system

Correct Answer :   Recommendation system

5 .
What does reinforcement learning rely on to learn?
A)
Pre-trained models
B)
Interaction with an environment
C)
Batch processing
D)
Supervised labeling

Correct Answer :   Interaction with an environment

6 .
Q-learning works on which equation?
A)
Bellman-equation
B)
KNN-equation
C)
Naïve bayes equation
D)
None of the above

Correct Answer :   Bellman-equation


Explanation : Q-learning works on the Bellman equation.
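The Bellman-equation update behind Q-learning can be sketched as follows. This is a minimal illustration only; the state names, action names, and hyperparameter values are assumed, not part of the quiz:

```python
# Q-learning update rule (Bellman equation form):
# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to the table Q (a dict of dicts)."""
    best_next = max(Q[s_next].values())  # bootstrap on the best next action
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

# Tiny two-state example
Q = {"s0": {"left": 0.0, "right": 0.0},
     "s1": {"left": 0.0, "right": 1.0}}
new_q = q_update(Q, "s0", "right", r=1.0, s_next="s1")
# new_q = 0.1 * (1.0 + 0.9 * 1.0 - 0.0) = 0.19
```

Repeating this update over many interactions drives each Q(s, a) toward the fixed point of the Bellman equation.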

7 .
Does reinforcement learning require any previous training?
A)
Yes
B)
No
C)
Can Not Say
D)
None of the above

Correct Answer :   No


Explanation : No, reinforcement learning does not require any previous training.

8 .
In Q-learning, what does the "Q" stand for?
A)
Quick
B)
Query
C)
Quantify
D)
Quality

Correct Answer :   Quality


Explanation : In Q-learning "Q" stands for quality.

9 .
The matrix created during the Q-learning algorithm is commonly known as the ___________.
A)
Table
B)
Query-table
C)
Q-table
D)
Quick-matrix

Correct Answer :   Q-table


Explanation : The matrix created during the Q-learning algorithm is commonly known as the q-table.
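A Q-table can be represented as a simple nested mapping from states to actions to values. The state and action names below are made up for illustration:

```python
# A Q-table maps every (state, action) pair to its learned Q-value.
states = ["s0", "s1", "s2"]
actions = ["up", "down"]

# Initialise all entries to zero
Q = {s: {a: 0.0 for a in actions} for s in states}

# The greedy action in a state is the argmax over that state's row
Q["s0"]["up"] = 0.5
greedy = max(Q["s0"], key=Q["s0"].get)  # "up"
```

For large or continuous state spaces the table is replaced by a function approximator (as in DQN), but the role is the same: store an estimated quality for each state-action pair.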

10 .
Q-learning is which type of learning algorithm?
A)
Model-free
B)
Model-based
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   Model-free


Explanation : Q-learning is a model-free learning algorithm.

11 .
Which algorithm is faster: Q-learning (QL) or SARSA?
A)
QL
B)
SARSA
C)
Can Not Say
D)
None of the above

Correct Answer :   SARSA


Explanation : SARSA is faster.

12 .
Which algorithm gives a better final performance?
A)
QL
B)
SARSA
C)
Can Not Say
D)
None of the above

Correct Answer :   QL


Explanation : Q-learning (QL) gives a better final performance.

A)
Policy optimization
B)
Feature extraction
C)
Reinforcement learning
D)
Hyperparameter tuning

Correct Answer :   Policy optimization

14 .
In reinforcement learning, what is an agent?
A)
A labeled data point
B)
A neural network architecture
C)
A software program making decisions
D)
A person supervising the learning process

Correct Answer :   A software program making decisions

15 .
How many types of feedback does reinforcement learning give?
A)
1
B)
2
C)
3
D)
4

Correct Answer :   2


Explanation : Reinforcement learning gives two types of feedback: positive and negative.

16 .
Which type of data does reinforcement learning use?
A)
Labeled data
B)
Unlabelled data
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   None of the above


Explanation : Reinforcement learning does not use a pre-collected labeled or unlabeled dataset; the agent generates its own experience by interacting with the environment.

17 .
Reinforcement learning learns through ___________.
A)
Predictions
B)
Experience
C)
Analyzing the data
D)
None of the above

Correct Answer :   Experience


Explanation : Reinforcement learning learns through experience.

18 .
___________ is defined as an event that, occurring as a result of a particular behavior, increases the strength and frequency of that behavior.
A)
Positive Reinforcement
B)
Negative Reinforcement
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   Positive Reinforcement


Explanation : Positive reinforcement is defined as an event that, occurring as a result of a particular behavior, increases the strength and frequency of that behavior.

19 .
What does an agent in reinforcement learning receive from the environment as feedback?
A)
Labels
B)
Clusters
C)
Predictions
D)
Rewards or penalties

Correct Answer :   Rewards or penalties


Explanation : An agent in reinforcement learning receives rewards or penalties from the environment.

20 .
What is the Q-learning algorithm used for in reinforcement learning?
A)
Clustering
B)
Image recognition
C)
Finding the optimal decision-making strategy
D)
Natural language processing

Correct Answer :   Finding the optimal decision-making strategy


Explanation : The Q-learning algorithm is used in reinforcement learning to find the optimal decision-making strategy.

A)
Probabilistic algorithm
B)
Based on Bayes inference rule
C)
Reinforcement learning algorithm
D)
All of the above

Correct Answer :   All of the above

22 .
What does DQN stand for?
A)
Deep Q-neural network
B)
Dynamic Q-neural network
C)
Dynamic Q-learning network
D)
None of the above

Correct Answer :   Deep Q-neural network


Explanation : DQN stands for Deep Q-neural network.

23 .
Which of the following statements about QL and SARSA is correct?
A)
In comparison to QL, SARSA directly learns the optimal policy, whereas QL learns a policy that is "near" the optimal.
B)
In comparison to SARSA, QL directly learns the optimal policy, whereas SARSA learns a policy that is "near" the optimal
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   In comparison to SARSA, QL directly learns the optimal policy, whereas SARSA learns a policy that is "near" the optimal


Explanation : In comparison to SARSA, QL directly learns the optimal policy, whereas SARSA learns a policy that is "near" the optimal.
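The difference comes down to the bootstrap target of the two updates. A minimal sketch, with toy Q-values and illustrative names:

```python
# Q-learning (off-policy): the target bootstraps on the best next action.
def ql_target(Q, s_next, r, gamma=0.9):
    return r + gamma * max(Q[s_next].values())

# SARSA (on-policy): the target bootstraps on the action actually taken next.
def sarsa_target(Q, s_next, a_next, r, gamma=0.9):
    return r + gamma * Q[s_next][a_next]

Q = {"s1": {"left": 0.2, "right": 0.8}}
t_ql = ql_target(Q, "s1", r=0.0)                 # 0.9 * 0.8 = 0.72
t_sarsa = sarsa_target(Q, "s1", "left", r=0.0)   # 0.9 * 0.2 = 0.18
```

Because Q-learning always bootstraps on the greedy action, it learns the optimal policy directly, while SARSA learns the value of the (exploratory) policy it is actually following.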

24 .
How many types of machine learning are there?
A)
2
B)
3
C)
4
D)
5

Correct Answer :   4


Explanation : There are four types of machine learning: supervised, unsupervised, semi-supervised, and reinforcement.

25 .
Which of the following is a product of reinforcement learning concepts?
A)
Driverless cars
B)
Text classification
C)
Market basket analysis
D)
House pricing prediction

Correct Answer :   Driverless cars


Explanation : Driverless cars are the product of reinforcement learning concepts.

26 .
In reinforcement learning, what is the environment?
A)
Environment is similar to feedback
B)
Environment is a situation in which an agent is present.
C)
Environment is a situation that is based on the current state
D)
Environment is a situation that the agent returns as a result.

Correct Answer :   Environment is a situation in which an agent is present.


Explanation : Environment is a situation in which an agent is present.

27 .
In reinforcement learning, what are actions?
A)
Actions are the feedback that an agent provides.
B)
Actions are the function that the environment takes.
C)
Actions are the moves that the agent takes inside the environment.
D)
None of the above

Correct Answer :   Actions are the moves that the agent takes inside the environment.


Explanation : Actions are the moves that the agent takes inside the environment.

28 .
In reinforcement learning, what is a state?
A)
State is a situation in which an agent is present.
B)
A state is the simple value of reinforcement learning.
C)
A state is a result returned by the environment after an agent takes an action.
D)
None of the above

Correct Answer :   A state is a result returned by the environment after an agent takes an action.


Explanation : A state is a result returned by the environment after an agent takes an action.

29 .
In reinforcement learning, what is a reward?
A)
Environment gives value in return which is known as a reward.
B)
An agent's action is evaluated based on feedback returned from the environment.
C)
A reward is a result returned by the environment after an agent takes an action.
D)
None of the above

Correct Answer :   An agent's action is evaluated based on feedback returned from the environment.


Explanation : The feedback returned from the environment to evaluate an agent's action is known as a reward.

30 .
In reinforcement learning, what does the agent's policy determine?
A)
The agent's policy determines what action to take based on the current state.
B)
The agent's policy determines what the state reward would be.
C)
The agent's policy determines what environment model should be decided
D)
None of the above

Correct Answer :   The agent's policy determines what action to take based on the current state.


Explanation : The agent's policy determines what action to take based on the current state.

31 .
Which trade-off is fundamental to reinforcement learning?
A)
Model validation
B)
Feature extraction
C)
Dimensionality reduction
D)
Exploration vs. exploitation

Correct Answer :   Exploration vs. exploitation
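The standard way to handle the exploration-exploitation trade-off is epsilon-greedy action selection. A minimal sketch, with an assumed epsilon value and made-up action names:

```python
import random

def epsilon_greedy(q_row, epsilon=0.1, rng=random):
    """With probability epsilon explore (random action); otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(list(q_row))    # explore: try a random action
    return max(q_row, key=q_row.get)      # exploit: take the best-known action

q_row = {"left": 0.1, "right": 0.9}
action = epsilon_greedy(q_row, epsilon=0.0)  # epsilon=0 -> pure exploitation
```

In practice epsilon is often decayed over training so the agent explores heavily at first and exploits its learned Q-values later.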

32 .
Which of the following algorithms directly optimizes the agent's policy?
A)
Q-Learning
B)
Policy Gradient
C)
Deep Q-Network (DQN)
D)
Monte Carlo Tree Search (MCTS)

Correct Answer :   Policy Gradient

33 .
Does reinforcement learning follow the hit-and-try method?
A)
Yes
B)
No
C)
Can Not Say
D)
None of the above

Correct Answer :   Yes


Explanation : Yes, reinforcement learning follows the concept of the hit-and-try method.

34 .
In how many ways can we implement reinforcement learning?
A)
1
B)
2
C)
3
D)
4

Correct Answer :   3


Explanation : Reinforcement learning can be implemented in three ways:

* Value-based
* Policy-based
* Model-based

35 .
Q-learning is based on which type of learning algorithm?
A)
On-policy
B)
Off-policy
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   Off-policy


Explanation : Q-learning is based on an off-policy learning algorithm.

36 .
SARSA is based on which type of learning algorithm?
A)
On-policy
B)
Off-policy
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   On-policy


Explanation : SARSA is based upon an on-policy learning algorithm.

37 .
Which type of policy is a learning algorithm in which the same policy is evaluated and improved?
A)
On-policy
B)
Off-policy
C)
Target policy
D)
behavior policy

Correct Answer :   On-policy


Explanation : On-policy is a learning algorithm in which the same policy is evaluated and improved.

38 .
Which of the following types of policy is a learning algorithm that evaluates and improves a policy different from the policy used for action selection?
A)
On-policy
B)
Target policy
C)
behavior policy
D)
Off-policy

Correct Answer :   Off-policy


Explanation : Off-policy is a learning algorithm that evaluates and improves a policy different from the policy used for action selection.

39 .
In which type of learning algorithm is the target policy not equal to the behavior policy?
A)
On-policy
B)
Off-policy
C)
Both (A) and (B)
D)
None of the above

Correct Answer :   Off-policy


Explanation : In an off-policy learning algorithm target policy is not equal to behavior policy.

40 .
In which type of learning algorithm is the target policy equal to the behavior policy?
A)
On-policy
B)
Off-policy
C)
Can Not Say
D)
None of the above

Correct Answer :   On-policy


Explanation : In the on-policy learning algorithm target policy is equal to behavior policy.

41 .
How many types of policy-based approaches are there?
A)
1
B)
2
C)
3
D)
4

Correct Answer :   2


Explanation : There are two types of policy-based approaches:

* Deterministic
* Stochastic

42 .
How many main elements of reinforcement learning are there?
A)
1
B)
2
C)
3
D)
4

Correct Answer :   4


Explanation : Mainly, there are four elements of reinforcement learning :

* Policy
* Reward Signal
* Value Function
* Model of the environment

43 .
The agent's main objective is to ___________ the total number of rewards.
A)
Null
B)
Minimize
C)
Maximize
D)
None of the above

Correct Answer :   Maximize


Explanation : The agent's main objective is to maximize the total number of rewards for good actions.

44 .
Which term is a synonym for random and probabilistic?
A)
Deterministic
B)
Stochastic
C)
Can Not Say
D)
None of the above

Correct Answer :   Stochastic


Explanation : Stochastic is a synonym for random or probabilistic.

45 .
In which approach to reinforcement learning do we find the optimal value function?
A)
Value-based
B)
Policy-based
C)
Model-based
D)
None of the above

Correct Answer :   Value-based


Explanation : In a Value-based approach to reinforcement learning, we find the optimal value function.

46 .
In which approach to reinforcement learning is a virtual model created for the environment?
A)
Model-based
B)
Value-based
C)
Policy-based
D)
None of the above

Correct Answer :   Model-based


Explanation : In the model-based approach to reinforcement learning, a virtual model of the environment is created.

47 .
What does SARSA stand for?
A)
State act reward act
B)
State act reward achievement
C)
State achievement rewards state action
D)
State action reward state action

Correct Answer :   State action reward state action


Explanation : SARSA stands for State action reward state action.

48 .
Which type of policy is the policy that an agent is trying to learn?
A)
On-policy
B)
Off-policy
C)
Target policy
D)
behavior policy

Correct Answer :   Target policy


Explanation : A target policy is a type of policy that an agent is trying to learn.

49 .
Which type of policy is used by an agent for action selection?
A)
On-policy
B)
Off-policy
C)
Target policy
D)
behavior policy

Correct Answer :   behavior policy


Explanation : Behavior policy is used by an agent for action selection.

50 .
How many tuples does a Markov decision process (MDP) consist of?
A)
3
B)
4
C)
5
D)
6

Correct Answer :   4


Explanation : An MDP consists of a 4-tuple :

* A set of finite states S
* A set of finite actions A
* A reward received after transitioning from state S to state S' due to action a
* A transition probability Pa of moving from state S to state S' under action a
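The 4-tuple above can be written down directly in code. A two-state toy problem, with all state and action names made up for illustration:

```python
# An MDP as the tuple (S, A, P, R): states, actions,
# transition probabilities P[s][a] -> {s': prob}, and rewards R[(s, a, s')].
S = {"s0", "s1"}                                      # finite states
A = {"stay", "go"}                                    # finite actions
P = {"s0": {"go": {"s1": 1.0}, "stay": {"s0": 1.0}},
     "s1": {"go": {"s0": 1.0}, "stay": {"s1": 1.0}}}  # deterministic transitions
R = {("s0", "go", "s1"): 1.0}                         # reward for s0 -> s1 via "go"

# Sanity check: probabilities out of each (state, action) must sum to 1
ok = all(abs(sum(d.values()) - 1.0) < 1e-9
         for per_action in P.values() for d in per_action.values())
```

A reinforcement learning algorithm such as Q-learning then interacts with this MDP without ever reading P or R directly, which is what makes it model-free.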

51 .
Why do we use the Markov decision process (MDP)?
A)
We use MDP to formalize the reinforcement learning problems.
B)
We use MDP to predict reinforcement learning problems.
C)
We use MDP to analyze the reinforcement learning problems.
D)
None of the above

Correct Answer :   We use MDP to formalize the reinforcement learning problems.


Explanation : We use MDP to formalize the reinforcement learning problems.

52 .
Which of the following algorithms finds the best course of action based on the agent's current state, without using a model (off-policy reinforcement learning)?
A)
Q-learning
B)
Markov property
C)
Deep Q neural network
D)
State action reward state action

Correct Answer :   Q-learning


Explanation : The Q-learning algorithm finds the best course of action based on the agent's current state, without using a model, via off-policy reinforcement learning.

53 .
What does MDP stand for?
A)
Markov discount process
B)
Markov deciding procedure
C)
Markov decision process
D)
Markov discount procedure

Correct Answer :   Markov decision process


Explanation : MDP stands for Markov decision process.

54 .
Reinforcement learning is defined by which element?
A)
Policy
B)
Reward Signal
C)
Value Function
D)
Model of the environment

Correct Answer :   Reward Signal


Explanation : The goal of reinforcement learning is defined by the reward signal.

55 .
Which element in reinforcement learning defines the behavior of the agent?
A)
Policy
B)
Reward Signal
C)
Value Function
D)
Model of the environment

Correct Answer :   Policy


Explanation : Policy elements in reinforcement learning define the behavior of the agent.

56 .
On which element does the reward that the agent can expect depend?
A)
Policy
B)
Reward Signal
C)
Model of the environment
D)
Value Function

Correct Answer :   Value Function


Explanation : The reward that the agent can expect depends on the value function.

57 .
Who introduced the Bellman equation?
A)
Alfonso Shimbel
B)
Edsger W. Dijkstra
C)
Richard Ernest Bellman
D)
None of the above

Correct Answer :   Richard Ernest Bellman


Explanation : Richard Ernest Bellman introduced the Bellman equation.

58 .
P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t]. In this condition, what is the meaning of S_t?
A)
State factor
B)
Markov state
C)
Discount factor
D)
None of the above

Correct Answer :   Markov state


Explanation : In the condition P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t], S_t represents the Markov state.

59 .
What represents the agent's state in reinforcement learning?
A)
Markov state
B)
Discount state
C)
Discount factor
D)
None of the above

Correct Answer :   Markov state


Explanation : The Markov state represents the agent's state in reinforcement learning.

60 .
Gamma (γ) in the Bellman equation is known as the ___________.
A)
Value factor
B)
Discount factor
C)
Environment factor
D)
None of the above

Correct Answer :   Discount factor


Explanation : Gamma (γ) in the bellman equation is known as the Discount factor.
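The role of the discount factor can be shown with the discounted return. A minimal sketch; the reward sequence and gamma value are assumed for illustration:

```python
# The discount factor gamma weights future rewards in the return:
# G = r0 + gamma*r1 + gamma^2*r2 + ...
def discounted_return(rewards, gamma=0.9):
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

G = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
```

A gamma near 0 makes the agent short-sighted (only immediate rewards matter); a gamma near 1 makes it value long-term rewards almost as much as immediate ones.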