Google News
Deep Learning Interview Questions
Deep learning is a subset of machine learning that is entirely based on artificial neural networks. These neural networks attempt to simulate the behavior of the human brain—albeit far from matching its ability—allowing it to “learn” from large amounts of data. While a neural network with a single layer can still make approximate predictions, additional hidden layers can help to optimize and refine for accuracy.
Deep learning drives many artificial intelligence (AI) applications and services that improve automation, performing analytical and physical tasks without human intervention. Deep learning technology lies behind everyday products and services (such as digital assistants, voice-enabled TV remotes, and credit card fraud detection) as well as emerging technologies (such as self-driving cars).
If you are a deep learning engineer, you need to have a thorough understanding not only of coding but also of each of the components that go into creating a successful deep learning algorithm. 
Example: "The primary function of a neural network is to receive a set of inputs, perform complex calculations and then use the output to solve the problem. A neural network is used for a range of applications. One example is classification; there are many classifiers available today, such as random forest, decision trees, support vector, logistic regression and so on, and of course neural networks."
The most common Neural Networks consist of three network layers :
* An input layer
* A hidden layer (this is the most important layer where feature extraction takes place, and adjustments are made to train faster and function better)
* An output layer 
Each sheet contains neurons called “nodes,” performing various operations. Neural Networks are used in deep learning algorithms like CNN, RNN, GAN, etc.
Following are some of the applications of deep learning :
* Pattern recognition and natural language processing.
* Recognition and processing of images.
* Automated translation.
* Analysis of sentiment.
* System for answering questions.
* Classification and Detection of Objects.
* Handwriting Generation by Machine.
* Automated text generation.
* Colorization of Black and White images.
Following are the advantages of neural networks :
* Neural networks are extremely adaptable, and they may be used for both classification and regression problems, as well as much more complex problems. Neural networks are also quite scalable. We can create as many layers as we wish, each with its own set of neurons. When there are a lot of data points, neural networks have been shown to generate the best outcomes. They are best used with non-linear data such as images, text, and so on. They can be applied to any data that can be transformed into a numerical value.
* Once the neural network mode has been trained, they deliver output very fast. Thus, they are time-effective.
Following are the disadvantages of neural networks :
* The "black box" aspect of neural networks is a well-known disadvantage. That is, we have no idea how or why our neural network produced a certain result. When we enter a dog image into a neural network and it predicts that it is a duck, we may find it challenging to understand what prompted it to make this prediction.

* It takes a long time to create a neural network model.

* Neural networks models are computationally expensive to build because a lot of computations need to be done at each layer.

* A neural network model requires significantly more data than a traditional machine learning model to train.
Supervised learning is a system in which both input and desired output data are provided. Input and output data are labeled to provide a learning basis for future data processing.

Unsupervised procedure does not need labeling information explicitly, and the operations can be carried out without the same. The common unsupervised learning method is cluster analysis. It is used for exploratory data analysis to find hidden patterns or grouping in data.
Both shallow and deep networks are good enough and capable of approximating any function. But for the same level of accuracy, deeper networks can be much more efficient in terms of computation and number of parameters. Deeper networks can create deep representations. At every layer, the network learns a new, more abstract representation of the input.
Overfitting is the most common issue which occurs in deep learning. It usually occurs when a deep learning algorithm apprehends the sound of specific data. It also appears when the particular algorithm is well suitable for the data and shows up when the algorithm or model represents high variance and low bias.
Backpropagation is a training algorithm which is used for multilayer neural networks. It transfers the error information from the end of the network to all the weights inside the network. It allows the efficient computation of the gradient.
Backpropagation can be divided into the following steps : 
* It can forward propagation of training data through the network to generate output.
* It uses target value and output value to compute error derivative concerning output activations.
* It can backpropagate to compute the derivative of the error concerning output activations in the previous layer and continue for all hidden layers.
* It uses the previously calculated derivatives for output and all hidden layers to calculate the error derivative concerning weights.
* It updates the weights.
Machine Learning forms a subset of Artificial Intelligence, where we use statistics and algorithms to train machines with data, thereby, helping them improve with experience.
Deep Learning is a part of Machine Learning, which involves mimicking the human brain in terms of structures called neurons, thereby, forming neural networks.
A perceptron is similar to the actual neuron in the human brain. It receives inputs from various entities and applies functions to these inputs, which transform them to be the output.
A perceptron is mainly used to perform binary classification where it sees an input, computes functions based on the weights of the input, and outputs the required transformation.
Machine Learning is powerful in a way that it is sufficient to solve most of the problems. However, Deep Learning gets an upper hand when it comes to working with data that has a large number of dimensions. With data that is large in size, a Deep Learning model can easily work with it as it is built to handle this.
Artificial Intelligence is a technique that enables machines to mimic human behavior.
Machine Learning is a subset of AI technique which uses statistical methods to enable machines to improve with experience.

Differentiate between AI, Machine Learning and Deep Learning.

Deep learning is a subset of ML which make the computation of multi-layer neural network feasible. It uses Neural networks to simulate human-like decision making.
Activation function translates the inputs into outputs. Activation function decides whether a neuron should be activated or not by calculating the weighted sum and further adding bias with it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.
There can be many Activation functions like :
* Linear or Identity
* Unit or Binary Step
* Sigmoid or Logistic
* Tanh
* ReLU
* Softmax
Data visualisation libraries help in understanding complex ideas by using visual elements such as graphs, charts, maps and more. The visualisation tools help you to recognise patterns, trends, outliers and more, making it possible to design your data according to the requirement. Popular data visualisation libraries include D3, React-Vis, Chart.js, vx, and more.
Overfitting is a type of modelling error which results in the failure to predict future observations effectively or fit additional data in the existing model. It occurs when a function is too closely fit to a limited set of data points and usually ends with more parameters than the data can accommodate. It is common for huge data sets to have some anomalies, so when this data is used for any kind of modelling, it can result in inaccuracies in the analysis.
What is overfitting?
Overfitting can be prevented by following a few methods namely :
Cross-validation : Where the initial training data is split into several mini-test sets and each mini data set is used to tune the model.

Remove features :
Remove irrelevant features manually from the algorithms and use feature selection heuristics to identify the important features

Regularisation :
This involves various ways of making your model simpler so that there’s little room for error due to obscurity. Adding penalty parameters and pruning your decision tree are ways of doing that.

Ensembling :
These are machine learning techniques for combining multiple separate predictions. The most popular methods of ensembling are bagging and boosting.
The deep learning frameworks and tools are :
* Keras
* TensorFlow
* PyTorch
* Theano
* Caffe2
* MXNet
The supervised learning algorithms are :
* Artificial Neural Network (ANN)
* Perceptron (single and multi-layer)
* Convolution Neural Network (CNN)
* Recurrent Neural Network (RNN)

The unsupervised learning algorithms are :
* Autoencoders
* Self-Organizing Maps (SOMs)
* Boltzmann Machine
* Generative adversarial networks (GANs)
Single layer perceptron is the first proposed neural model created. The content of the local memory of the neuron consists of a vector of weights. The computation of a single layer perceptron is performed over the calculation of sum of the input vector each with the value multiplied by corresponding element of vector of the weights. The value which is displayed in the output will be the input of an activation function.
Single Layer Perceptron
As in Neural Networks, MLPs have an input layer, a hidden layer, and an output layer. It has the same structure as a single layer perceptron with one or more hidden layers. A single layer perceptron can classify only linear separable classes with binary output (0,1), but MLP can classify nonlinear classes.

Multi Layer Perceptron

Except for the input layer, each node in the other layers uses a nonlinear activation function. This means the input layers, the data coming in, and the activation function is based upon all nodes and weights being added together, producing the output. MLP uses a supervised learning method called “backpropagation.” In backpropagation, the neural network calculates the error with the help of cost function. It propagates this error backward from where it came (adjusts the weights to train the model more accurately).
One of the most basic Deep Learning models is a Boltzmann Machine, resembling a simplified version of the Multi-Layer Perceptron. This model features a visible input layer and a hidden layer -- just a two-layer neural net that makes stochastic decisions as to whether a neuron should be on or off. Nodes are connected across layers, but no two nodes of the same layer are connected.
Also referred to as “loss” or “error,” cost function is a measure to evaluate how good your model’s performance is. It’s used to compute the error of the output layer during backpropagation. We push that error backward through the neural network and use that during the different training functions.

Cost Function
Gradient Descent is an optimal algorithm to minimize the cost function or to minimize an error. The aim is to find the local-global minima of a function. This determines the direction the model should take to reduce the error.

Gradient Descent
This network is made up of numerous tiny neural networks, rather than being a single network. The sub-networks combine to form a larger neural network, which operates independently to achieve a common goal. These networks are extremely useful for breaking down a large-small problem into smaller chunks and then solving it.

Modular Neural Network
Convolutional Neural Networks are mostly used in computer vision. In contrast to fully linked layers in MLPs, one or more convolution layers extract simple characteristics from input by performing convolution operations in CNN models. Each layer is made up of nonlinear functions of weighted sums at various coordinates of spatially close subsets of the previous layer's outputs, allowing the weights to be reused.

The AI system learns to automatically extract the properties of these inputs to fulfill a specific task, such as picture classification, face identification, and image semantic segmentation, given a sequence of images or videos from the actual world.

Convolutional Neural Network (CNN)
Recurrent Neural Networks were created to solve the sequential input data time-series problem. RNN's input is made up of the current input and prior samples. As a result, the node connections create a directed graph. Furthermore, each neuron in an RNN has an internal memory that stores the information from previous samples' computations. Because of their superiority in processing data with a variable input length, RNN models are commonly employed in natural language processing (NLP). The goal of AI in this case is to create a system that can understand human-spoken natural languages, such as natural language modeling, word embedding, and machine translation.

Each successive layer in an RNN is made up of nonlinear functions of weighted sums of outputs and the preceding state. As a result, the basic unit of RNN is termed "cell," and each cell is made up of layers and a succession of cells that allow recurrent neural network models to be processed sequentially.

Recurrent Neural Network (RNN)
Fourier transform package is highly efficient for analyzing, maintaining, and managing a large databases. The software is created with a high-quality feature known as the special portrayal. One can effectively utilize it to generate real-time array data, which is extremely helpful for processing all categories of signals.
In neural networking, weight initialization is one of the essential factors. A bad weight initialization prevents a network from learning. On the other side, a good weight initialization helps in giving a quicker convergence and a better overall error. Biases can be initialized to zero. The standard rule for setting the weights is to be close to zero without being too small.
If the set of weights in the network is put to a zero, then all the neurons at each layer will start producing the same output and the same gradients during backpropagation.
As a result, the network cannot learn at all because there is no source of asymmetry between neurons. That is the reason why we need to add randomness to the weight initialization process.
Now, this can be answered in two ways. If you are on a phone interview, you cannot perform all the calculus in writing and show the interviewer. In such cases, it best to explain it as such:
Forward propagation : The inputs are provided with weights to the hidden layer. At each hidden layer, we calculate the output of the activation at each node and this further propagates to the next layer till the final output layer is reached. Since we start from the inputs to the final output layer, we move forward and it is called forward propagation

Backpropagation : We minimize the cost function by its understanding of how it changes with changing the weights and biases in a neural network. This change is obtained by calculating the gradient at each hidden layer (and using the chain rule). Since we start from the final cost function and go back each hidden layer, we move backward and thus it is called backward propagation
Deep Learning goes right from the simplest data structures like lists to complicated ones like computation graphs.
Here are the most common ones :
List : An ordered sequence of elements (You can also mention NumPy ndarrays here)

Matrix : An ordered sequence of elements with rows and columns

Dataframe : A dataframe is just like a matrix, but it holds actual data with the column names and rows denoting each datapoint in your dataset. If marks of 100 students, their grades, and their details are stored in a dataframe, their details are stored as columns. Each row will represent the data of each of the 100 students

Tensors : You will work with them on a daily basis if you have ventured into deep learning. Used both in PyTorch and TensorFlow, tensors are like the basic programming unit of deep learning. Just like multidimensional arrays, we can perform numerous mathematical operations on them. Read more about tensors here

Computation Graphs : Since deep learning involves multiple layers and often hundreds, if not thousands of parameters, it is important to understand the flow of computation. A computation graph is just that. A computation graph gives us the sequence of operations performed with each node denoting an operation or a component in the neural network
The softmax function is used to calculate the probability distribution of the event over 'n' different events. One of the main advantages of using softmax is the output probabilities range. The range will be between 0 to 1, and the sum of all the probabilities will be equal to one. When the softmax function is used for multi-classification model, it returns the probabilities of each class, and the target class will have a high probability.
Swish is a new, self-gated activation function. Researchers at Google discovered the Swish function. According to their paper, it performs better than ReLU with a similar level of computational efficiency.
Autoencoder is an artificial neural network. It can learn representation for a set of data without any supervision. The network automatically learns by copying its input to the output; typically,internet representation consists of smaller dimensions than the input vector. As a result, they can learn efficient ways of representing the data. Autoencoder consists of two parts; an encoder tries to fit the inputs to the internal representation, and a decoder converts the internal state to the outputs.
Dropout is a cheap regulation technique used for reducing overfitting in neural networks. We randomly drop out a set of nodes at each training step. As a result, we create a different model for each training case, and all of these models share weights. It's a form of model averaging.
Tensors are nothing but a de facto for representing the data in deep learning. They are just multidimensional arrays, which allows us to represent the data having higher dimensions. In general, we deal with high dimensional data sets where dimensions refer to different features present in the data set.
A Boltzmann machine (also known as stochastic Hopfield network with hidden units) is a type of recurrent neural network. In a Boltzmann machine, nodes make binary decisions with some bias. Boltzmann machines can be strung together to create more sophisticated systems such as deep belief networks. Boltzmann Machines can be used to optimize the solution to a problem.
Some important points about Boltzmann Machine :
* It uses a recurrent structure.
* It consists of stochastic neurons, which include one of the two possible states, either 1 or 0.
* The neurons present in this are either in an adaptive state (free state) or clamped state (frozen state).
* If we apply simulated annealing or discrete Hopfield network, then it would become a Boltzmann Machine.
The loss function is used as a measure of accuracy to see if a neural network has learned accurately from the training data or not. This is done by comparing the training dataset to the testing dataset.
The loss function is a primary measure of the performance of the neural network. In Deep Learning, a good performing network will have a low loss function at all times when training.
Autoencoders are artificial neural networks that learn without any supervision. Here, these networks have the ability to automatically learn by mapping the inputs to the corresponding outputs.
Autoencoders, as the name suggests, consist of two entities :
* Encoder : Used to fit the input into an internal computation state
* Decoder : Used to convert the computational state back into the output
There are five main steps that are used to initialize and use the gradient descent algorithm :
* Initialize biases and weights for the network
* Send input data through the network (the input layer)
* Calculate the difference (the error) between expected and predicted values
* Change values in neurons to minimize the loss function
* Multiple iterations to determine the best weights for efficient working
Hyperparameters are variables used to determine the structure of a neural network. They are also used to understand parameters, such as the learning rate and the number of hidden layers, and more, present in the neural network.
Hyperparameters can be trained using four components as shown below :
Batch size : This is used to denote the size of the input chunk. Batch sizes can be varied and cut into sub-batches based on the requirement.

Epochs : An epoch denotes the number of times the training data is visible to the neural network so that it can train. Since the process is iterative, the number of epochs will vary based on the data.

Momentum : Momentum is used to understand the next consecutive steps that occur with the current data being executed at hand. It is used to avoid oscillations when training.

Learning rate : Learning rate is used as a parameter to denote the time required for the network to update the parameters and learn.
Transfer learning is a learning technique that allows data scientists to use what they've learned from a previous machine learning model that was used for a similar task. The ability of humans to transfer their knowledge is used as an example in this learning. You can learn to operate other two-wheeled vehicles more simply if you learn to ride a bicycle. A model trained for autonomous automobile driving can also be used for autonomous truck driving. The features and weights can be used to train the new model, allowing it to be reused. When there is limited data, transfer learning works effectively for quickly training a model.
transfer learning
In the above image, the first diagram represents training a model from scratch while the second diagram represents using a model already trained on cats and dogs to classify the different class of vehicles, thereby representing transfer learning.
Following are the advantages of transfer learning :
Better initial model : In other methods of learning, you must create a model from scratch. Transfer learning is a better starting point because it allows us to perform tasks at a higher level without having to know the details of the starting model.

Higher learning rate : Because the problem has already been taught for a similar task, transfer learning allows for a faster learning rate during training.

Higher accuracy after training : Transfer learning allows a deep learning model to converge at a higher performance level, resulting in more accurate output, thanks to a better starting point and higher learning rate.
A tensor is a multidimensional array that represents a generalization of vectors and matrices. It is one of the key data structures used in deep learning. Tensors are represented as n-dimensional arrays of base data types. The data type of each element in the Tensor is the same, and the data type is always known. It's possible that only a portion of the shape (that is, the number of dimensions and the size of each dimension) is known. Most operations yield fully-known tensors if their inputs are likewise fully known, however, in other circumstances, the shape of a tensor can only be determined at graph execution time.

Tensor in Deep Learning
The LSTM model is considered a special case of RNNs. The problems of vanishing gradients and exploding gradients we saw earlier are a disadvantage while using the plain RNN model.
In LSTMs, we add a forget gate, which is basically a memory unit that retains information that is retained across timesteps and discards the other information that is not needed. This also necessitates the need for input and output gates to include the results of the forget gate as well.
As you can see, the LSTM model can become quite complex. In order to still retain the functionality of retaining information across time and yet not make a too complex model, we need GRUs.
Basically, in GRUs, instead of having an additional Forget gate, we combine the input and Forget gates into a single Update Gate :
It is this reduction in the number of gates that makes GRU less complex and faster than LSTM.
An optimization algorithm that is used to minimize some function by repeatedly moving in the direction of steepest descent as specified by the negative of the gradient is known as gradient descent. It's an iteration algorithm, in every iteration algorithm, we compute the gradient of a cost function, concerning each parameter and update the parameter of the function via the following formula:
Gradient Descent
Θ - is the parameter vector,
α - learning rate,
J(Θ) - is a cost function
In machine learning, it is used to update the parameters of our model. Parameters represent the coefficients in linear regression and weights in neural networks.
Stochastic Gradient Descent : Stochastic gradient descent is used to calculate the gradient and update the parameters by using only a single training example.

Batch Gradient Descent :
Batch gradient descent is used to calculate the gradients for the whole dataset and perform just one update at each iteration.

Mini-batch Gradient Descent :
Mini-batch gradient descent is a variation of stochastic gradient descent. Instead of a single training example, mini-batch of samples is used. Mini-batch gradient descent is one of the most popular optimization algorithms.
* It is computationally efficient compared to stochastic gradient descent.
* It improves generalization by finding flat minima.
* It improves convergence by using mini-batches. We can approximate the gradient of the entire training set, which might help to avoid local minima.
TensorFlow has numerous advantages, and some of them are as follows :
* Open-source
* Trains using CPU and GPU
* Has a large community
* High amount of flexibility and platform independence
* Supports auto differentiation and its features
* Handles threads and asynchronous computation easily
A Restricted Boltzmann Machine, or RBM for short, is an undirected graphical model that is popularly used in Deep Learning today. It is an algorithm that is used to perform:
* Dimensionality reduction
* Regression
* Classification
* Collaborative filtering
* Topic modeling
Leaky ReLU, also called LReL, is used to manage a function to allow the passing of small-sized negative values if the input value to the network is less than zero.
With the use of sequential processing, programmers were up against :
* The usage of high processing power
* The difficulty of parallel execution
* This caused the rise of the transformer architecture. Here, there is a mechanism called attention mechanism, which is used to map all of the dependencies between * sentences, thereby making huge progress in the case of NLP models.
The procedure of developing an assumption structure involves three specific actions.
* The first step contains algorithm development. This particular process is lengthy.
* The second step contains algorithm analyzing, which represents the in-process methodology.
* The third step is about implementing the general algorithm in the final procedure. The entire framework is interlinked and required for throughout the process.
An epoch is a terminology used in deep learning that refers to the number of passes the deep learning algorithm has made across the full training dataset. Batches are commonly used to group data sets (especially when the amount of data is very large). The term "iteration" refers to the process of running one batch through the model.
The number of epochs equals the number of iterations if the batch size is the entire training dataset. This is frequently not the case for practical reasons. Several epochs are used in the creation of many models.
There is a general relation which is given by :
d * e = i * b
d - is the dataset size
e - is the number of epochs
i - is the number of iterations
b - is the batch size
Yes, if the problem is represented by a linear equation, deep networks can be built using a linear function as the activation function for each layer. A problem that is a composition of linear functions, on the other hand, is a linear function, and there is nothing spectacular that can be accomplished by implementing a deep network because adding more nodes to the network will not boost the machine learning model's predictive capacity.
Backpropagation in Recurrent Neural Networks differ from that of Artificial Neural Networks in the sense that each node in Recurrent Neural Networks has an additional loop as shown in the following image:
Recurrent Neural Network
This loop, in essence, incorporates a temporal component into the network. This allows for the capture of sequential information from data, which is impossible with a generic artificial neural network.
Following are the applications of autoencoders :
Image Denoising : Denoising images is a skill that autoencoders excel at. A noisy image is one that has been corrupted or has a little amount of noise (that is, random variation of brightness or color information in images) in it. Image denoising is used to gain accurate information about the image's content.

Dimensionality Reduction : The input is converted into a reduced representation by the autoencoders, which is stored in the middle layer called code. This is where the information from the input has been compressed, and each node may now be treated as a variable by extracting this layer from the model. As a result, we can deduce that by removing the decoder, an autoencoder can be utilised for dimensionality reduction, with the coding layer as the output.

Feature Extraction : The encoding section of Autoencoders aids in the learning of crucial hidden features present in the input data, lowering the reconstruction error. During encoding, a new collection of original feature combinations is created.

Image Colorization : Converting a black-and-white image to a coloured one is one of the applications of autoencoders. We can also convert a colourful image to grayscale.

Data Compression : Autoencoders can be used for data compression. Yet they are rarely used for data compression because of the following reasons:

* Lossy compression : The autoencoder's output is not identical to the input, but it is a near but degraded representation. They are not the best option for lossless compression.

* Data-specific : Autoencoders can only compress data that is identical to the data on which they were trained. They differ from traditional data compression algorithms like jpeg or gzip in that they learn features relevant to the provided training data. As a result, we can't anticipate a landscape photo to be compressed by an autoencoder trained on handwritten digits.
Capsules are a vector specifying the features of the object and its likelihood. These features can be any of the instantiation parameters like position, size, orientation, deformation, velocity, hue, texture and much more.
capsule neural network
A capsule can also specify its attributes like angle and size so that it can represent the same generic information. Now, just like a neural network has layers of neurons, a capsule network can have layers of capsules.
The layer between the encoder and decoder, ie. the code is also known as Bottleneck. This is a well-designed approach to decide which aspects of observed data are relevant information and what aspects can be discarded.


It does this by balancing two criteria :
* Compactness of representation, measured as the compressibility.
* It retains some behaviourally relevant variables from the input.
Autoencoder is a simple 3-layer neural network where output units are directly connected back to input units. Typically, the number of hidden units is much less than the number of visible ones. The task of training is to minimize an error or reconstruction, i.e. find the most efficient compact representation for input data.

Restricted Boltzmann Machine

RBM shares a similar idea, but it uses stochastic units with particular distribution instead of deterministic distribution. The task of training is to find out how these two sets of variables are actually connected to each other.
One aspect that distinguishes RBM from other autoencoders is that it has two biases. The hidden bias helps the RBM produce the activations on the forward pass, while The visible layer’s biases help the RBM learn the reconstructions on the backward pass.
Generative adversarial networks are used to achieve generative modeling in Deep Learning. It is an unsupervised task that involves the discovery of patterns in the input data to generate the output.
The generator is used to generate new examples, while the discriminator is used to classify the examples generated by the generator.
Generative adversarial networks are used for a variety of purposes. In the case of working with images, they have a high amount of traction and efficient working.
Creation of art : GANs are used to create artistic images, sketches, and paintings.

Image enhancement :
They are used to greatly enhance the resolution of the input images.

Image translation :
They are also used to change certain aspects, such as day to night and summer to winter, in images easily.
Traditional machine learning refers to a set of algorithms and approaches that have been widely used for many years in a variety of applications, such as linear regression, logistic regression, decision trees, and random forests, among others. These algorithms make use of hand-engineered features and rely on feature engineering to extract relevant information from the data.

Deep learning, on the other hand, is a subfield of machine learning that uses artificial neural networks with multiple layers (hence "deep") to learn complex representations of the input data. In deep learning, the features are learned automatically by the network, rather than being hand-engineered by the programmer. This allows deep learning models to automatically extract high-level features from raw data, such as images, audio, and text, and to make predictions based on those features.
The main difference between deep learning and traditional machine learning is the level of abstraction in the learned representations. Deep learning models learn hierarchical representations of the data, with each layer learning increasingly higher-level features. In contrast, traditional machine learning models typically only learn a single level of features.

Another key difference is the amount of data required to train a model. Deep learning models require large amounts of data to train effectively, while traditional machine learning models can often be trained with smaller amounts of data.

Simple Answer :  Deep learning is a subfield of machine learning that uses deep artificial neural networks to learn high-level representations of the data, while traditional machine learning algorithms make use of hand-engineered features and require relatively small amounts of data to train.