The benefits of mini-batch gradient descent are as follows:
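* It is more computationally efficient than stochastic gradient descent, because each parameter update processes a batch of samples with vectorized operations instead of one sample at a time.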
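* It tends to preserve generalization: the noise in mini-batch gradient estimates is thought to steer training toward flat minima, which are associated with better generalization.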
* Each mini-batch gives a cheap approximation of the gradient over the entire training set, and the noise in that approximation can help the optimizer escape poor local minima.
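
To make the points above concrete, here is a minimal sketch of mini-batch gradient descent on a one-dimensional linear regression, written in plain Python. The function names `gradient` and `minibatch_gd`, the synthetic data (`y = 3x + 1` plus small noise), and the hyperparameters (`batch_size=16`, `lr=0.05`, `epochs=200`) are all assumptions chosen for illustration and do not come from any particular library.

```python
import random

def gradient(w, b, batch):
    """Gradient of the mean squared error for y ~ w*x + b over the given samples."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb

def minibatch_gd(data, batch_size=16, lr=0.05, epochs=200, seed=0):
    """Illustrative mini-batch gradient descent; hyperparameters are assumed values."""
    rng = random.Random(seed)
    data = list(data)          # work on a copy so the caller's list is not shuffled
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)      # draw a fresh partition into mini-batches each epoch
        for i in range(0, len(data), batch_size):
            gw, gb = gradient(w, b, data[i:i + batch_size])
            w -= lr * gw       # one parameter update per mini-batch
            b -= lr * gb
    return w, b

# Synthetic data (assumed for this sketch): y = 3x + 1 with small Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1, 1) for _ in range(256)]
data = [(x, 3 * x + 1 + data_rng.gauss(0, 0.1)) for x in xs]

w, b = minibatch_gd(data)
print(f"learned w={w:.3f}, b={b:.3f}")

# A single mini-batch gradient is a noisy estimate of the full-data gradient.
full_gw, full_gb = gradient(0.0, 0.0, data)
batch_gw, batch_gb = gradient(0.0, 0.0, data[:16])
print(f"full gradient w.r.t. w: {full_gw:.3f}, one mini-batch: {batch_gw:.3f}")
```

Each update here touches only 16 samples, so an epoch performs 16 updates instead of one full-batch update. The final two lines compare a single mini-batch gradient with the full-data gradient at the same point. The two generally differ, and that difference is the noise the list above refers to. In a real setting the per-batch sums would typically be replaced by vectorized array operations, which is where the efficiency gain over per-sample updates comes from.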