Neural Networks Interview Questions
There is considerable overlap between the fields of neural networks and statistics. Statistics is concerned with data analysis. In neural network terminology, statistical inference means learning to generalize from noisy data. Some neural networks are not concerned with data analysis (e.g., those intended to model biological systems) and therefore have little to do with statistics. Some neural networks do not learn (e.g., Hopfield nets) and therefore have little to do with statistics.

Some neural networks can learn successfully only from noise-free data (e.g., ART or the perceptron rule) and therefore would not be considered statistical methods. But most neural networks that can learn to generalize effectively from noisy data are similar or identical to statistical methods. For example:
* Feedforward nets with no hidden layer (including functional-link neural nets and higher-order neural nets) are basically generalized linear models.
* Feedforward nets with one hidden layer are closely related to projection pursuit regression.
* Probabilistic neural nets are identical to kernel discriminant analysis.
* Kohonen nets for adaptive vector quantization are very similar to k-means cluster analysis.
* Kohonen self-organizing maps are discrete approximations to principal curves and surfaces.
* Hebbian learning is closely related to principal component analysis.
* Convolutional neural networks (CNNs) are mostly used for image classification, but they can also be used for NLP.
* For NLP tasks, sentences are represented as matrices. Each row of the matrix corresponds to one token (a word or a character).
* The filters of the CNN slide over whole rows of the matrix, so each filter is as wide as the matrix.
* The filter height (the number of rows covered) may vary, but sliding windows over 2-5 words is typical.
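
As a rough illustration of the sentence-matrix idea, the sketch below slides one filter over consecutive rows of a small matrix and takes a dot product at each position. The function name `conv_over_tokens`, the 3-dimensional token vectors, and the filter values are all hypothetical, chosen only to keep the arithmetic easy to follow; a real model would learn the filter weights and usually add a bias, an activation, and pooling.

```python
def conv_over_tokens(sentence_matrix, conv_filter):
    """Slide a filter that spans len(conv_filter) full rows (tokens) down the matrix."""
    window = len(conv_filter)  # filter height = number of tokens covered
    feature_map = []
    for start in range(len(sentence_matrix) - window + 1):
        total = 0.0
        rows = sentence_matrix[start:start + window]
        for row, filter_row in zip(rows, conv_filter):
            # The filter is as wide as the matrix, so it covers each whole row.
            total += sum(x * w for x, w in zip(row, filter_row))
        feature_map.append(total)
    return feature_map


# Hypothetical 5-token sentence; each row is one token as a 3-dimensional vector.
sentence = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]

# Hypothetical filter of height 2 (a window over 2 tokens) and width 3.
bigram_filter = [
    [1.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]

feature_map = conv_over_tokens(sentence, bigram_filter)
print(feature_map)  # one value per window position
```

With 5 tokens and a window of 2, the filter fits at 4 positions, so the feature map has 4 entries.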
Convolutional Neural Networks
* A very strong correlation between the new feature and an existing feature is a fairly good sign that the new feature provides little new information. A low correlation between the new feature and existing features is likely preferable.
* A strong linear correlation between the new feature and the predicted variable is a good sign that the new feature will be valuable, but the absence of a high correlation is not necessarily a sign of a poor feature, because neural networks are not restricted to linear combinations of variables.
* If the new feature was manually constructed from a combination of existing features, consider leaving it out. The beauty of neural networks is that little feature engineering and preprocessing is required -- features are instead learned by intermediate layers. Whenever possible, prefer learning features to engineering them.
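
The correlation checks described above can be sketched in a few lines. This is a minimal example with made-up numbers: `pearson` is a hand-written helper, and the feature and target values are hypothetical, picked so that one candidate is perfectly redundant and one target depends on the feature only non-linearly.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data, for illustration only.
existing_feature = [1.0, 2.0, 3.0, 4.0, 5.0]
candidate_feature = [2.0, 4.0, 6.0, 8.0, 10.0]  # exactly 2 * existing_feature
target = [4.0, 1.0, 0.0, 1.0, 4.0]              # exactly (existing_feature - 3) ** 2

# Correlation near 1: the candidate adds little new information.
r_redundant = pearson(existing_feature, candidate_feature)

# Correlation near 0, even though the target is fully determined by the feature.
r_nonlinear = pearson(existing_feature, target)

print(r_redundant, r_nonlinear)
```

The second case shows why a low linear correlation with the target does not rule a feature out: the relationship here is quadratic, which a network with hidden layers could still pick up.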
* A multilayer perceptron is a type of neural network made up of several layers of perceptrons stacked on top of each other.

* Mathematically, multilayer perceptrons with enough hidden units can approximate any continuous mapping function to arbitrary accuracy; this result is known as the universal approximation theorem.

* A single-layer perceptron can only learn linear patterns, while a multilayer perceptron can learn complex relationships. This predictive capability comes from the multi-layered structure of the network, which allows features to be combined into higher-order features.
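
XOR is a common way to see the difference between the two. The sketch below is an illustration under stated assumptions, not a training procedure: the weights of the two-layer network are picked by hand, the step activation is one possible choice, and the grid search only covers a small, arbitrary range of weights for the single perceptron.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted sum is positive."""
    return 1 if z > 0 else 0


def perceptron(x1, x2, w1, w2, bias):
    return step(w1 * x1 + w2 * x2 + bias)


def xor_mlp(x1, x2):
    """Two hidden perceptrons feeding one output perceptron (hand-picked weights)."""
    h_or = perceptron(x1, x2, 1, 1, -0.5)    # fires if x1 OR x2
    h_and = perceptron(x1, x2, 1, 1, -1.5)   # fires if x1 AND x2
    return perceptron(h_or, h_and, 1, -1, -0.5)  # OR but not AND


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_targets = [0, 1, 1, 0]

mlp_outputs = [xor_mlp(a, b) for a, b in inputs]

# Try every single perceptron on a coarse weight grid from -2.0 to 2.0.
grid = [i / 2 for i in range(-4, 5)]
single_layer_solutions = [
    (w1, w2, b)
    for w1 in grid for w2 in grid for b in grid
    if [perceptron(a, c, w1, w2, b) for a, c in inputs] == xor_targets
]

print(mlp_outputs)                  # matches xor_targets
print(len(single_layer_solutions))  # no single perceptron on the grid matches
```

The hidden units build intermediate features (OR and AND), and the output unit combines them, which is the "higher-order features" idea in miniature. The empty search result reflects the fact that XOR is not linearly separable.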

Multilayer Perceptron over a Single-layer Perceptron

Neural networks require large amounts of data to train. A classification network may need thousands of examples of a single class before it can identify that class in unseen data. Because of this, it is sometimes not feasible to build ANN models for fringe applications.

Neural networks are not interpretable. The user feeds data into the network and receives an output, but the processing that turns the input into that output is not understandable to human beings.

Training a neural network consumes far more power than the human brain uses (around 20 watts) to do almost the same things, such as image classification.
Parametric : SVMs and neural networks are both parametric, but for different reasons.
* For SVMs, the typical parameters are the soft-margin parameter (C) and the parameter of the kernel function (gamma).
* Neural networks also have parameters, but many more than SVMs. Some NN parameters are the number of layers and their sizes, the number of training epochs, and the learning rate.

Embedding Non-Linearity : Both methods can embed non-linear functions.
* SVMs do this through the kernel method.
* Neural Networks embed non-linearity using non-linear activation functions.
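
The two mechanisms can be put side by side in a small sketch. The RBF kernel and the tanh activation are assumptions made for this example (the text does not name a specific kernel or activation), and the function names and input values are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: similarity that decays with squared distance, scaled by gamma."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def hidden_unit(x, weights, bias):
    """One neural-network unit: weighted sum passed through a non-linear activation."""
    return math.tanh(sum(w * a for w, a in zip(weights, x)) + bias)


# SVM side: the kernel compares pairs of points non-linearly.
k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)  # identical points
k_far = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.5)   # points 5 units apart

# NN side: doubling the input does not double the output, so the unit is non-linear.
a_one = hidden_unit([1.0], [1.0], 0.0)
a_two = hidden_unit([2.0], [1.0], 0.0)

print(k_same, k_far)
print(a_one, a_two)
```

In the SVM the non-linearity lives in how pairs of points are compared; in the network it lives inside each unit.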

Comparable Accuracy :
* If an SVM and a neural network are trained on the same dataset, with the same training time and the same computation power, they have comparable accuracy.
* If neural networks are given as much computation power and training time as possible, they outperform SVMs.
Recurrent Neural Networks (RNN) :
* RNNs are ideal for solving problems where the sequence is more important than the individual items themselves.
* An RNN is essentially a fully connected neural network in which some of the layers are refactored into a loop. That loop typically iterates over the addition or concatenation of two inputs, a matrix multiplication, and a non-linear function.
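
The loop described above can be sketched in plain Python. This is a minimal illustration, assuming concatenation (rather than addition) of the two inputs and tanh as the non-linear function; the names `rnn_step` and `run_rnn`, the layer sizes, and the weight values are all hypothetical.

```python
import math


def rnn_step(x_t, h_prev, weights, bias):
    """One pass through the loop: concatenate, matrix-multiply, apply tanh."""
    combined = x_t + h_prev  # list concatenation of current input and previous state
    return [
        math.tanh(sum(w * v for w, v in zip(row, combined)) + b)
        for row, b in zip(weights, bias)
    ]


def run_rnn(sequence, weights, bias, hidden_size):
    """Feed the sequence item by item, carrying the hidden state forward."""
    h = [0.0] * hidden_size
    for x_t in sequence:
        h = rnn_step(x_t, h, weights, bias)
    return h


# Hypothetical sizes: 2 input features + 2 hidden units, so each weight row has 4 entries.
weights = [
    [0.5, -0.5, 0.1, 0.0],
    [0.0, 0.3, 0.0, 0.2],
]
bias = [0.0, 0.0]

seq_ab = [[1.0, 0.0], [0.0, 1.0]]
seq_ba = [[0.0, 1.0], [1.0, 0.0]]  # same items, reversed order

h_ab = run_rnn(seq_ab, weights, bias, hidden_size=2)
h_ba = run_rnn(seq_ba, weights, bias, hidden_size=2)

print(h_ab)
print(h_ba)
```

The two sequences contain the same items, yet the final hidden states differ, which is the sense in which the sequence matters more than the individual items.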

Natural Language Processing (NLP) :
* Natural Language Processing (NLP) is a sub-field of computer science and artificial intelligence that deals with processing and generating natural language data. Although some research still falls outside machine learning, most NLP is now based on language models produced by machine learning.
* NLP is a good use case for RNNs and is used in the source article to explain how RNNs can be constructed.

Source : Towardsdatascience

| Biological Neurons | Artificial Neurons |
| --- | --- |
| Major components: axons, dendrites, synapses | Major components: nodes, inputs, outputs, weights, bias |
| Information from other neurons, in the form of electrical impulses, enters the dendrites at connection points called synapses. The information flows from the dendrites to the cell, where it is processed. The output signal, a train of impulses, is then sent down the axon to the synapses of other neurons. | The arrangement and connections of the neurons make up the network, which has three layers. The first layer is called the input layer and is the only layer exposed to external signals. The input layer transmits signals to the neurons in the next layer, which is called a hidden layer. The hidden layer extracts relevant features or patterns from the received signals. Those features or patterns that are considered important are then directed to the output layer, which is the final layer of the network. |
| A synapse is able to increase or decrease the strength of the connection. This is where information is stored. | The artificial signals can be changed by weights in a manner similar to the physical changes that occur in the synapses. |
| Approx. 10^11 neurons | 10^2 to 10^4 neurons with current technology |

| Human Brain (Biological Neuron Network) | Computers (Artificial Neuron Network) |
| --- | --- |
| The human brain works asynchronously. | Computers (ANNs) work synchronously. |
| Biological neurons compute slowly (several ms per computation). | Artificial neurons compute fast (<1 nanosecond per computation). |
| The brain represents information in a distributed way because neurons are unreliable and could die at any time. | In computer programs, every bit has to function as intended; otherwise the programs would crash. |
| The brain changes its connectivity over time to represent new information and the requirements imposed on us. | The connectivity between the electronic components in a computer never changes unless we replace its components. |
| Biological neural networks have complicated topologies. | ANNs are often arranged in a tree structure. |
| Researchers have yet to find out how the brain actually learns. | ANNs use gradient descent for learning. |

A Multi Layer Perceptron (MLP) contains one or more hidden layers (apart from one input and one output layer). While a single layer perceptron can only learn linear functions, a multi layer perceptron can also learn non-linear functions.
Figure 4 shows a multi layer perceptron with a single hidden layer. Note that all connections have weights associated with them, but only three weights (w0, w1, w2) are shown in the figure.
Input Layer : The Input layer has three nodes. The Bias node has a value of 1. The other two nodes take X1 and X2 as external inputs (which are numerical values depending upon the input dataset). As discussed above, no computation is performed in the Input layer, so the outputs from nodes in the Input layer are 1, X1 and X2 respectively, which are fed into the Hidden Layer.
Hidden Layer : The Hidden layer also has three nodes, with the Bias node having an output of 1. The output of the other two nodes in the Hidden layer depends on the outputs from the Input layer (1, X1, X2) as well as the weights associated with the connections (edges). Figure 4 shows the output calculation for one of the hidden nodes (highlighted). The output from the other hidden node can be calculated in the same way. Remember that f refers to the activation function. These outputs are then fed to the nodes in the Output layer.
[Figure 4: Multi Layer Perceptron with a single hidden layer]
Output Layer : The Output layer has two nodes which take inputs from the Hidden layer and perform similar computations as shown for the highlighted hidden node. The values calculated (Y1 and Y2) as a result of these computations act as outputs of the Multi Layer Perceptron.
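
The layer-by-layer computation described above can be sketched as a forward pass. This is an illustration only: the text does not say which activation function f is, so a sigmoid is assumed here, and every weight value and the helper name `layer_forward` are hypothetical. Each weight row is ordered (w0, w1, w2), with w0 multiplying the bias node's constant output of 1.

```python
import math


def sigmoid(z):
    """Assumed activation function f; the text leaves f unspecified."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weight_rows, activation=sigmoid):
    """Prepend the bias node's output (1), then compute f(w0*1 + w1*a1 + w2*a2) per node."""
    with_bias = [1.0] + inputs
    return [
        activation(sum(w * v for w, v in zip(row, with_bias)))
        for row in weight_rows
    ]


# Hypothetical weights: one row (w0, w1, w2) per non-bias node in the layer.
hidden_weights = [
    [0.0, 1.0, -1.0],
    [0.5, -0.5, 0.5],
]
output_weights = [
    [0.0, 1.0, 1.0],
    [0.0, -1.0, -1.0],
]

x1, x2 = 1.0, 1.0  # external inputs X1 and X2

hidden = layer_forward([x1, x2], hidden_weights)  # two hidden node outputs
y1, y2 = layer_forward(hidden, output_weights)    # outputs Y1 and Y2

print(hidden)
print(y1, y2)
```

The input layer performs no computation, so it appears only as the list `[x1, x2]` with the bias value prepended inside `layer_forward`.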
Given a set of features X = (x1, x2, …) and a target y, a Multi Layer Perceptron can learn the relationship between the features and the target, for either classification or regression.
Let's take an example to understand Multi Layer Perceptrons better. Suppose we have the following student-marks dataset:

[Table: student-marks dataset with columns Hours Studied, Mid Term Marks, and Final Result]
The two input columns show the number of hours the student has studied and the mid term marks obtained by the student. The Final Result column can take two values, 1 or 0, indicating whether or not the student passed the final term. For example, we can see that a student who studied 35 hours and obtained 67 marks in the mid term ended up passing the final term.
Now, suppose, we want to predict whether a student studying 25 hours and having 70 marks in the mid term will pass the final term.

[Table: new data point with 25 hours studied and 70 mid term marks; Final Result to be predicted]
This is a binary classification problem where a multi layer perceptron can learn from the given examples (training data) and make an informed prediction given a new data point. We will see below how a multi layer perceptron learns such relationships.

Source : ujjwalkarn