PyTorch Interview Questions and Answers

PyTorch is a computer software package based on the Torch library, which is an open-source machine learning library for Python. It is a deep learning framework developed by the Facebook artificial intelligence research group. It is used for applications such as Natural Language Processing and Computer Vision.
The features of PyTorch are as follows :
 
* Easy interface : PyTorch offers an easy-to-use API, and it is straightforward to operate and run on Python. Code execution is smooth.
* Python usage : This library is considered Pythonic and integrates smoothly with the Python data science stack.
* Computational Graphs : PyTorch provides an excellent platform which offers dynamic computational graphs, so a user can change them during runtime. This is especially useful when a developer does not know in advance how much memory will be required for creating a neural network model.
* Imperative Programming : PyTorch performs computations as it steps through each line of the written code. This is similar to how a Python program executes.
There are three levels of abstraction, which are as follows :
 
* Tensor : A tensor is an imperative n-dimensional array which can run on a GPU.
* Variable : A variable is a node in the computational graph. It stores data and gradient.
* Module : A neural network layer, which may store state or learnable weights.
Tensors play an important role in deep learning with PyTorch; in simple words, the framework is completely based on tensors. A tensor is treated as a generalized matrix. It could be a 1D tensor (vector), a 2D tensor (matrix), a 3D tensor (cube), or a 4D tensor (a vector of cubes).
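For illustration, here is a minimal sketch of tensors of each of these dimensionalities (the shapes are made up):

import torch

v = torch.rand(3)           # 1D tensor (vector)
m = torch.rand(3, 4)        # 2D tensor (matrix)
c = torch.rand(3, 4, 5)     # 3D tensor (cube)
h = torch.rand(2, 3, 4, 5)  # 4D tensor
print(v.ndim, m.ndim, c.ndim, h.ndim)   # 1 2 3 4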
There are three ways in which we can create a tensor, each using a different method. Tensors are created as follows (a short example follows the list) :
 
* Create Tensor from an array
* Create Tensor with all ones and random number
* Create Tensor from numpy array
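A small sketch of the three creation methods (the values and shapes are illustrative):

import torch
import numpy as np

# 1. create a tensor from an array
t1 = torch.tensor([[1, 2], [3, 4]])

# 2. create tensors filled with all ones or with random numbers
t2 = torch.ones(2, 2)
t3 = torch.rand(2, 2)

# 3. create a tensor from a numpy array
t4 = torch.from_numpy(np.array([1.0, 2.0, 3.0]))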
We can't say that a tensor and a matrix are the same. A tensor has some properties through which we can say the two have some similarities, such as the fact that we can perform all the mathematical operations of a matrix on a tensor.
 
A tensor is a mathematical entity which lives in a structure and interacts with other mathematical entities. If we transform the other entities in the structure in a regular way, then the tensor will obey a related transformation rule. This dynamical property of the tensor makes it different from a matrix.
The torch.from_numpy() function is one of the important functions of torch, and it plays an important role in tensor programming. It is used to create a tensor from a numpy.ndarray. The ndarray and the returned tensor share the same memory, so if we make any changes to the returned tensor, they will be reflected in the ndarray as well.
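A short sketch of this memory-sharing behaviour (the values are illustrative):

import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(a)   # t shares memory with a
t[0] = 10.0               # changing the tensor ...
print(a)                  # ... changes the ndarray too: [10.  2.  3.]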
PyTorch is an essential part of the deep learning toolkit. Deep learning is a subset of machine learning whose algorithms are inspired by the workings of the human brain; these algorithms are known as artificial neural networks. Such networks are used for tasks like image classification and include architectures such as Convolutional Neural Networks and Recurrent Neural Networks. Unlike other libraries such as TensorFlow, PyTorch does not require you to first define an entire computation graph before you can run your model.
* Stochastic Gradient Descent : Here, we use only a single training example for the calculation of the gradient and the parameter update.
* Batch Gradient Descent : We calculate the gradient for the whole dataset and perform the update at each iteration.
* Mini-batch Gradient Descent : It's a variant of Stochastic Gradient Descent; here, instead of a single training example, a mini-batch of samples is used (a small sketch follows this list).
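One way to see the difference in PyTorch (an illustrative sketch; the dataset and batch sizes are made up) is through the batch_size argument of a DataLoader:

import torch
from torch.utils.data import TensorDataset, DataLoader

X, y = torch.rand(100, 3), torch.rand(100, 1)
dataset = TensorDataset(X, y)

sgd_loader   = DataLoader(dataset, batch_size=1)             # stochastic: one example per update
batch_loader = DataLoader(dataset, batch_size=len(dataset))  # batch: the whole dataset per update
mini_loader  = DataLoader(dataset, batch_size=16)            # mini-batch: a small group per update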
Backpropagation is a training algorithm used for a multilayer neural network. In this method, we move the error from the end of the network back to all the weights inside the network, allowing efficient calculation of the gradient.
 
It is divided into several steps as follows (a short sketch follows the list) :
 
* Forward propagation of training data to generate output.
* Then, using the target value and the output value, the error derivative is computed with respect to the output activations.
* Then we backpropagate to compute the derivative of the error with respect to the output activations of the previous layer, and continue this for all the hidden layers.
* Using the previously computed derivatives for the output and all the hidden layers, we calculate the error derivatives with respect to the weights.
* And then we update the weights.
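A minimal sketch of these steps for a tiny two-layer network (all names, shapes, and the learning rate are made up for illustration):

import torch

x = torch.rand(10, 4)                      # toy training data
target = torch.rand(10, 1)
w1 = torch.rand(4, 5, requires_grad=True)  # weights of the two layers
w2 = torch.rand(5, 1, requires_grad=True)

output = torch.relu(x @ w1) @ w2           # forward propagation
loss = ((output - target) ** 2).mean()     # error between output and target
loss.backward()                            # backpropagate the error derivatives

with torch.no_grad():                      # update the weights
    w1 -= 0.01 * w1.grad
    w2 -= 0.01 * w2.grad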
Variable is a package which is used to wrap a tensor. The autograd.Variable is the central class of the package. The torch.autograd package provides classes and functions implementing automatic differentiation of arbitrary scalar-valued functions. It requires minimal changes to the existing code: we only need to declare the tensors for which gradients should be computed with the requires_grad=True keyword.
The derivatives of a function are calculated with the help of the gradient. There are four simple steps through which we can calculate a derivative easily.
 
These steps are as follows (a worked example follows the list) :
 
* Initialization of the function for which we will calculate the derivatives.
* Set the value of the variable which is used in the function.
* Compute the derivative of the function by using the backward() method.
* Print the value of the derivative using the grad attribute.
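A worked example of the four steps (the function y = x**2 + 3x is made up for illustration):

import torch

# 1. initialize the function: y = x**2 + 3x
# 2. set the value of the variable used in the function
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x

# 3. compute the derivative using the backward() method
y.backward()

# 4. print the value of the derivative: dy/dx = 2x + 3 = 7 at x = 2
print(x.grad)   # tensor(7.)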
The benefits of mini-batch gradient descent are as follows :
 
* It is more efficient compared to stochastic gradient descent.
* Generalization is maintained by finding flat minima.
* Mini-batches help to approximate the gradient of the entire training set, which helps us to avoid local minima.
An auto-encoder is a self-supervised machine learning algorithm that uses the backpropagation principle, where the target values are set equal to the inputs provided. Internally, it has a hidden layer that manages a code used to represent the input.
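A minimal auto-encoder sketch (the layer sizes are illustrative, not prescribed):

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(784, 32)   # hidden layer: compresses the input to a code
        self.decoder = nn.Linear(32, 784)   # reconstructs the input from the code

    def forward(self, x):
        code = torch.relu(self.encoder(x))
        return self.decoder(code)

model = AutoEncoder()
x = torch.rand(8, 784)
loss = nn.MSELoss()(model(x), x)   # the target values equal the inputs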
Gradient descent is an optimization algorithm which is used to learn the values of the parameters that minimize the cost function. It is an iterative algorithm which moves in the direction of steepest descent, as defined by the negative of the gradient.
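A single gradient-descent step, sketched in plain Python (the cost function w**2 and the learning rate are made up):

w = 5.0
learning_rate = 0.1
gradient = 2 * w                    # derivative of cost(w) = w**2
w = w - learning_rate * gradient    # step in the direction of the negative gradient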
Mathematics plays a vital role in any machine learning algorithm, and many core concepts of mathematics are needed to get the right algorithm.

The essential elements of machine learning and data science are as follows :

Vectors : A vector is an array of numbers, either continuous or discrete, and the space which consists of vectors is called a vector space.
Scalars : A scalar has zero dimensions and contains only one value. In PyTorch, a scalar is represented by a zero-dimensional tensor (very early releases of PyTorch did not have a dedicated tensor with zero dimensions).
Matrices : Most structured data, such as tables, is usually represented in the form of matrices.
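These elements map directly onto tensor dimensionalities; a small sketch (the values are illustrative):

import torch

s = torch.tensor(3.14)              # scalar: a zero-dimensional tensor
v = torch.tensor([1.0, 2.0, 3.0])   # vector: a 1D tensor
m = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])      # matrix: a 2D tensor
print(s.ndim, v.ndim, m.ndim)       # 0 1 2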
PyTorch uses an automatic differentiation technique. This technique is especially powerful when we are building a neural network: a recorder records the operations we have performed, and then replays them backwards to compute our gradients.
Linear Regression is a technique for finding the linear relationship between the dependent variable and the independent variables by minimizing the distance. It is a supervised machine learning approach used for predicting continuous, ordered values rather than discrete categories.
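A minimal linear-regression sketch in PyTorch (the data, learning rate, and iteration count are made up):

import torch
import torch.nn as nn

x = torch.rand(100, 1)
y = 2 * x                   # toy linear relation to recover

model = nn.Linear(1, 1)     # one weight and one bias
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()      # minimizes the squared distance

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()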
The loss function is the bread and butter of machine learning. It is quite simple to understand and is used to evaluate how well our algorithm models our dataset. If our prediction is completely off, the function will output a higher number; otherwise, it will output a lower number.
torch.optim is a module that implements various optimization algorithms used for building neural networks. Most of the commonly used methods are already supported.
 
Below is the code for the Adam optimizer (model and learning_rate are assumed to be defined) :

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
MSELoss stands for Mean Squared Error loss, which is used to create a criterion that measures the mean squared error between each element in an input x and a target y. CTCLoss stands for Connectionist Temporal Classification Loss, which is used to calculate the loss between a continuous time series and a target sequence. BCELoss (Binary Cross Entropy) is used to create a criterion that measures the binary cross entropy between the target and the output.
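A short sketch of two of these criteria in use (the prediction and target values are made up; CTCLoss is omitted because its inputs are more involved):

import torch
import torch.nn as nn

pred = torch.tensor([0.2, 0.8, 0.5])
target = torch.tensor([0.0, 1.0, 1.0])

mse = nn.MSELoss()(pred, target)   # mean of the squared differences
bce = nn.BCELoss()(pred, target)   # binary cross entropy; inputs must lie in [0, 1]
print(mse, bce)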
The torch.nn package provides us many classes and modules to implement and train neural networks. The torch.nn.functional package contains some useful functions, like activation functions and convolution operations, which we can use directly. However, these are not full layers, so if we want to define a layer of any kind, we have to use torch.nn.
nn module : The nn package defines a set of modules, which can be thought of as neural network layers that produce output from input and may have some trainable weights.
 
A Parameter is a kind of tensor that is considered a module parameter; Parameter is a subclass of Tensor. The example below is a fully connected ReLU network with one hidden layer, trained to predict y from x by minimizing the squared distance.
 
Example :
import torch

# made-up layer sizes for illustration
input_num_units, hidden_num_units, output_num_units = 10, 5, 2

# define model
model = torch.nn.Sequential(
    torch.nn.Linear(input_num_units, hidden_num_units),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden_num_units, output_num_units),
)
loss_fn = torch.nn.CrossEntropyLoss()
Neural networks and deep neural networks are similar and do the same thing. The difference between an NN and a DNN is that a plain neural network may have only one hidden layer, while a deep neural network has more than one hidden layer. Hidden layers play an important role in making accurate predictions.
Installing PyTorch with Anaconda and Conda
 
* Download and install Anaconda (go with the latest Python version).
* Go to the Getting Started section on the PyTorch website through pytorch.org.
* Generate the appropriate configuration options for your particular environment. For example:

   * OS : Windows
   * Package Manager : conda
   * Python : 3.6
   * CUDA : 9.0

* Run the below command in the terminal (CMD) to install PyTorch.

For example, for the configuration we specified in step (3), we have the following commands:
> conda install pytorch -c pytorch
> pip3 install torchvision
A PyTorch implementation of a neural network looks just like a NumPy implementation. The motive of this section is to showcase the similar nature of PyTorch and NumPy. For example: create a three-layered network having five nodes in the input layer, three in the hidden layer, and one in the output layer.
import torch
n_input, n_hidden, n_output = 5, 3, 1
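Continuing the sketch, a NumPy-like manual forward pass (the random weights and the sigmoid activation are illustrative choices):

x = torch.rand(n_input)
w1 = torch.rand(n_input, n_hidden)
w2 = torch.rand(n_hidden, n_output)

hidden = torch.sigmoid(x @ w1)       # input layer -> hidden layer
output = torch.sigmoid(hidden @ w2)  # hidden layer -> output layer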
An array is a collection of objects stored in a well-defined order. A linked list is also a set of objects, but they are not kept in a well-defined sequence in memory; instead, each element holds a pointer to the next one, which is not the case with an array.
To determine the output of a neural network, we use an activation function. Its main task is to map the resulting values into a range such as 0 to 1 or -1 to 1. Activation functions are basically divided into two types (a small example follows the list) :
 
* Linear Activation Function
* Non-linear Activation Function
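A small sketch of some common non-linear activation functions in PyTorch (the input values are made up):

import torch

x = torch.tensor([-1.0, 0.0, 1.0])
print(torch.relu(x))      # max(0, x)
print(torch.sigmoid(x))   # squashes values into (0, 1)
print(torch.tanh(x))      # squashes values into (-1, 1)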
There is no big difference between the three of them. Conv1d and Conv2d are used to apply 1D and 2D convolution, respectively. Conv3d is used to apply 3D convolution over an input signal composed of several input planes.
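A sketch showing the expected input shapes of the three layers (the channel counts and sizes are made up):

import torch
import torch.nn as nn

conv1d = nn.Conv1d(in_channels=16, out_channels=33, kernel_size=3)
conv2d = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
conv3d = nn.Conv3d(in_channels=1, out_channels=4, kernel_size=3)

out1 = conv1d(torch.rand(1, 16, 50))         # (batch, channels, length)
out2 = conv2d(torch.rand(1, 3, 28, 28))      # (batch, channels, height, width)
out3 = conv3d(torch.rand(1, 1, 10, 28, 28))  # (batch, channels, depth, height, width)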
"Feed-Forward" is a process through which we receive an input to produce some kind of output to make some kind of prediction. It is the core of many other important neural networks such as convolution neural network and deep neural network.
 
In a feed-forward neural network, there are no feedback loops or connections in the network. There is simply an input layer, a hidden layer, and an output layer.
A deep neural network is a kind of neural network with many layers; "deep" means that the network has a large stack of layers. A convolutional neural network is another kind of deep neural network. A convolutional neural network has convolution layers, which use filters to convolve over an area of the input data, reducing it to a smaller area and detecting important or specific features within the area. Convolution can be used on images as well as text.
Follow these steps to check GPU usage :
 
* Use the Windows key + R to open the Run command box.
* Type the dxdiag.exe command and press Enter to open the DirectX Diagnostic Tool.
* Click on the Display tab.
* Under Drivers, on the right side, check the Driver Model information.
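From within PyTorch itself, GPU availability can also be checked programmatically, as in this short sketch:

import torch

print(torch.cuda.is_available())           # True if a CUDA-capable GPU is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # name of the first GPU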
It is a collection of color images which is commonly used to train machine learning and computer vision algorithms. The CIFAR-10 dataset contains 50000 training images and 10000 validation images, which can be classified into 10 different classes.
The CIFAR-10 dataset contains 50000 training images and 10000 validation images, classified into 10 different classes. On the other hand, CIFAR-100 has 100 classes, each containing 600 images: 100 testing images and 500 training images per class.
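CIFAR-10 can be loaded directly through torchvision; a minimal sketch (the root directory is an arbitrary choice):

import torchvision
import torchvision.transforms as transforms

# downloads CIFAR-10 to ./data on first use
train_set = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True,
    transform=transforms.ToTensor())

print(len(train_set))      # 50000 training images
print(train_set.classes)   # the 10 class names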
The pooling layer plays a crucial role in the pre-processing of an image. It reduces the number of parameters when the images are too large. Pooling is the "downscaling" of the image obtained from the previous layers.
Max pooling is a sample-based discretization process whose main objective is to down-scale an input representation, reducing its dimensionality and allowing assumptions to be made about the features contained in the binned sub-regions.
In average pooling, down-scaling is performed by dividing the input into rectangular pooling regions and computing the average value of each region.
For sum (or mean) pooling, the sub-regions are set the same as for max pooling, but instead of using the max function we use sum or mean.
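A short sketch contrasting max and average pooling (the input shape and kernel size are made up):

import torch
import torch.nn as nn

x = torch.rand(1, 1, 4, 4)              # (batch, channels, height, width)
max_pool = nn.MaxPool2d(kernel_size=2)  # keeps the maximum of each 2x2 region
avg_pool = nn.AvgPool2d(kernel_size=2)  # keeps the average of each 2x2 region

print(max_pool(x).shape)   # torch.Size([1, 1, 2, 2])
print(avg_pool(x).shape)   # torch.Size([1, 1, 2, 2])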
The MNIST dataset is used in image recognition. It is a database of various handwritten digits. The MNIST dataset contains a large amount of data and is commonly used to demonstrate the true power of deep neural networks.