SciPy Interview Questions
SciPy is a scientific computation library that uses NumPy underneath. SciPy stands for Scientific Python. It provides additional utility functions for optimization, statistics and signal processing.
 
* Like NumPy, SciPy is open source so we can use it freely.
* SciPy was created by NumPy's creator Travis Oliphant.
* SciPy contains a variety of sub-packages that help solve the most common issues related to scientific computation.
* The SciPy package is one of the most used scientific libraries in Python, second only to the GNU Scientific Library for C/C++ and MATLAB.
* It is easy to use and understand, and offers fast computational power.
* It can operate on NumPy arrays.
Numpy :
* NumPy is written in C and is used for mathematical and numeric calculations.
* It is faster than many other Python libraries.
* NumPy is one of the most useful libraries in data science for performing basic calculations.
* NumPy essentially provides the array data type, which supports basic operations such as sorting, reshaping, indexing, etc.
 
SciPy :
* SciPy is built on top of NumPy.
* SciPy offers a fully-featured linear algebra module, while NumPy contains only a few linear algebra features.
* Most new data science features are added to SciPy rather than NumPy; a short example of SciPy's linear algebra routines follows below.
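 
For illustration, here is a minimal sketch (the 2x2 matrix is an arbitrary example) showing scipy.linalg routines operating directly on a NumPy array:
 
import numpy as np
from scipy import linalg

# A small NumPy array; SciPy routines operate directly on NumPy arrays
a = np.array([[1., 2.], [3., 4.]])

# scipy.linalg offers a fuller linear algebra toolbox than numpy.linalg
print(linalg.det(a))   # determinant: -2.0
print(linalg.inv(a))   # matrix inverse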
K-means clustering is a method for finding clusters and cluster centers in a set of unlabelled data. Intuitively, we might think of a cluster as a group of data points whose inter-point distances are small compared with the distances to points outside the cluster. Given an initial set of K centers, the K-means algorithm iterates the following two steps :
 
* For each center, the subset of training points (its cluster) that is closer to it than to any other center is identified.
 
* The mean of each feature for the data points in each cluster is computed, and this mean vector becomes the new center for that cluster.
 
 
K-Means Implementation in SciPy : We will now see how to implement K-Means in SciPy.
 
Import K-Means : We will see the implementation and usage of each imported function.
 
from scipy.cluster.vq import kmeans, vq, whiten

Data generation : We have to simulate some data to explore the clustering.
 
from numpy import vstack,array
from numpy.random import rand

# data generation with three features
data = vstack((rand(100,3) + array([.5,.5,.5]),rand(100,3)))
 
Now we can inspect the data. Since the data is random, the above program will generate output similar to the following.
 
array([[ 1.48598868e+00, 8.17445796e-01, 1.00834051e+00],
       [ 8.45299768e-01, 1.35450732e+00, 8.66323621e-01],
       [ 1.27725864e+00, 1.00622682e+00, 8.43735610e-01],
............
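 
Building on the imports and the data generated above, a minimal sketch of the remaining K-Means steps (whitening, computing centroids, assigning labels); K = 2 is an arbitrary choice for illustration:
 
# Normalize each feature by its standard deviation (whitening)
data = whiten(data)

# Compute K-Means with K = 2 clusters
centroids, distortion = kmeans(data, 2)

# Assign each observation to its nearest centroid
labels, _ = vq(data, centroids)

print(centroids)
print(labels)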
The scipy.constants package provides a wide range of constants, which are used extensively in the scientific field. It contains various physical and mathematical constants and units; we can import the required constants and use them as needed.
 
The scipy.constants module provides mathematical constants such as the following.
* pi
* golden
 
Here we compare the value of 'pi' imported from different modules.
 
# Import the constants module from scipy
import scipy.constants
# Import the math module
import math
# Compare the two pi values
print("sciPy - pi Value = %.18f" % scipy.constants.pi)
print("math - pi Value = %.18f" % math.pi)
 
Output :
 
sciPy - pi Value = 3.141592653589793116
math - pi Value = 3.141592653589793116
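
 
Beyond the mathematical constants, a short sketch of a few physical constants exposed by scipy.constants (values are in SI units):
 
from scipy import constants

# A few physical and mathematical constants from scipy.constants
print(constants.c)        # speed of light in vacuum, m/s
print(constants.h)        # Planck constant, J*s
print(constants.golden)   # golden ratio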

 

FFT stands for Fast Fourier Transform. The Fourier transform converts a time-domain signal into the frequency domain. It breaks a waveform (a function or signal) into an alternative representation characterized by sines and cosines. It can convert periodic time signals, whereas the Laplace transform converts both periodic and aperiodic signals.
 
A limitation of the Fourier transform is that it can only convert stable time signals. SciPy provides the fftpack module, which is used to calculate Fourier transforms.
 
Fast Fourier Transform : 
 
The FFT of a length-N sequence x[n] is calculated by the fft() function, and the inverse transform is calculated using ifft().
 
# Importing the fft function from the fftpack module
from scipy.fftpack import fft
# Importing numpy
import numpy as np
# Create a sample input array of length 5
x = np.array([1.0, 2.0, 1.0, -1.0, 1.5])
# Applying the fft function
y = fft(x)
print (y)
 
Output :
 
[ 4.5       +0.j        ,  2.08155948-1.65109876j,
       -1.83155948+1.60822041j, -1.83155948-1.60822041j,
        2.08155948+1.65109876j]
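
 
As noted above, ifft() computes the inverse transform; a minimal sketch applying it to y from the previous example (the result recovers the original signal up to floating-point round-off):
 
# Importing the inverse fft function
from scipy.fftpack import ifft

# Applying ifft to the transform recovers the original array
print(ifft(y))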

 

Sparse data is data that has mostly unused elements (elements that don't carry any information).
 
It can be an array like this one :
 
[1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 0]
 
Sparse Data : a data set where most of the item values are zero.
Dense Array : the opposite of a sparse array; most of the values are non-zero.
 
In scientific computing, when we are dealing with partial derivatives in linear algebra we will come across sparse data.
SciPy has a module, scipy.sparse, that provides functions to deal with sparse data. There are two types of sparse matrices that we mainly use :
 
CSC : Compressed Sparse Column. For efficient arithmetic and fast column slicing.
CSR : Compressed Sparse Row. For fast row slicing and faster matrix-vector products.
 
CSR Matrix : We can create a CSR matrix by passing an array to the function scipy.sparse.csr_matrix().
 
Create a CSR matrix from an array :
 
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])
print(csr_matrix(arr))


Output :
 

(0, 5) 1
(0, 6) 1
(0, 8) 2
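
Once created, a minimal sketch of a few methods available on the CSR matrix object (continuing from the array above):
 
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])
mat = csr_matrix(arr)

# Stored non-zero values and their count
print(mat.data)             # [1 1 2]
print(mat.count_nonzero())  # 3

# Convert back to a dense representation
print(mat.todense())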

SciPy is the scientific computing module of Python, providing built-in routines for many well-known mathematical operations. The scipy.integrate sub-package provides several integration techniques including an ordinary differential equation integrator.
 
Numerical integration is the approximate computation of an integral using numerical techniques. Methods for integrating a function, given a function object (see the example after this list) :
 
* quad : General Purpose Integration
* dblquad : General Purpose Double Integration
* nquad : General Purpose n-fold Integration
* fixed_quad : Gaussian quadrature, order n
* quadrature : Gaussian quadrature to tolerance
* romberg : Romberg integration
* trapz : Trapezoidal rule
* cumtrapz : Trapezoidal rule to cumulatively compute integral
* simps : Simpson’s rule
* romb : Romberg integration using samples
* polyint : Analytical polynomial integration (NumPy)
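 
For example, a minimal sketch using quad, the general-purpose integrator from the list above (f(x) = x**2 over [0, 3] is an arbitrary example):
 
from scipy import integrate

# Integrate f(x) = x**2 over the interval [0, 3]
result, abs_error = integrate.quad(lambda x: x**2, 0, 3)

print(result)     # approximately 9.0, the exact value of the integral
print(abs_error)  # estimate of the absolute error
 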
SciPy Interpolation is defined as finding a value between two points on a line or a curve. The prefix "inter" means "between", which indicates that we look inside the data. In other words, "the estimation of an intermediate value between precise data points is called interpolation". Interpolation is very useful in statistics, science, and business, or whenever there is a need to predict a value that lies between two existing data points.
 
Let's have a look at how interpolation works using the scipy.interpolate package.

import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
x = np.linspace(0, 4, 12)
y = np.cos(x**2/3+4)
print(x, y)


Output :

(
   array([0.,  0.36363636,  0.72727273,  1.09090909,  1.45454545, 1.81818182, 
          2.18181818,  2.54545455,  2.90909091,  3.27272727,  3.63636364,  4.]),   

   array([-0.65364362,  -0.61966189,  -0.51077021,  -0.31047698,  -0.00715476,
            0.37976236,   0.76715099,   0.99239518,   0.85886263,   0.27994201,
           -0.52586509,  -0.99582185])
)
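
Continuing with x and y from above, a minimal sketch that interpolates between the sampled points with scipy.interpolate.interp1d (the 30-point grid is an arbitrary choice):
 
from scipy.interpolate import interp1d

# Build 1-D interpolating functions from the sampled points
f_linear = interp1d(x, y)                # linear interpolation (default)
f_cubic = interp1d(x, y, kind='cubic')   # cubic interpolation

# Evaluate both interpolants on a denser grid inside [0, 4]
xnew = np.linspace(0, 4, 30)
print(f_linear(xnew))
print(f_cubic(xnew))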

 

The scipy.stats module contains a large number of statistical and probability distribution functions. The list of statistics functions can be obtained with info(stats). A list of random variables can also be obtained from the docstring of the stats sub-package.
 
rv_continuous : A generic continuous random variable class meant for subclassing
rv_discrete : A generic discrete random variable class meant for subclassing
rv_histogram : Generates a distribution given by a histogram
 
 
Example : 
 
from scipy.stats import norm  
import numpy as np  
print(norm.cdf(np.array([3,-1., 0, 1, 2, 4, -2, 5])))
 
Output :
 
[0.9986501  0.15865525 0.5        0.84134475 0.97724987 0.99996833
 0.02275013 0.99999971]
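
 
A minimal sketch of rv_histogram from the list above, building a distribution from a histogram of 1000 standard-normal samples (sample size and bin count are arbitrary):
 
import numpy as np
from scipy.stats import rv_histogram

# Build a distribution object from a histogram of random data
samples = np.random.standard_normal(1000)
hist = np.histogram(samples, bins=50)
dist = rv_histogram(hist)

# The resulting object behaves like any other distribution
print(dist.cdf(0.0))   # roughly 0.5 for standard-normal samples
print(dist.pdf(0.0))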

 

Sparse matrix data structures allow us to store large sparse matrices and provide the functionality to perform complex matrix computations.
 
* Suppose you have a 2-D matrix with hundreds of elements, where only a few of them contain a non-zero value. When storing this matrix with the standard (dense) approach, we would waste a lot of space on zeros.
 
* The sparse data structure allows us to store only the non-zero values, assuming the rest of them are zeros.
SciPy provides the following seven (7) sparse matrix types :
 
* Block Sparse Row matrix (BSR)
* Coordinate list matrix (COO)
* Compressed Sparse Column matrix (CSC)
* Compressed Sparse Row matrix (CSR)
* Sparse matrix with Diagonal storage (DIA)
* Dictionary Of Keys based sparse matrix (DOK)
* Row-based linked list sparse matrix (LIL)
Python’s SciPy provides tools for creating sparse matrices using multiple data structures, as well as tools for converting a dense matrix to a sparse matrix. The function csr_matrix() is used to create a sparse matrix in compressed sparse row format, whereas csc_matrix() is used to create a sparse matrix in compressed sparse column format (see the sketch after the parameter list below).
 
Syntax :

scipy.sparse.csr_matrix(shape=None, dtype=None) 
 
Parameters :

* shape : Shape of the matrix to create
* dtype : Data type of the matrix
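 
A minimal sketch showing both constructors: converting a small dense matrix (values chosen arbitrarily) to CSR and CSC formats, and creating an empty matrix from a shape and dtype:
 
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

dense = np.array([[0, 0, 3],
                  [4, 0, 0],
                  [0, 5, 0]])

# Convert the dense matrix to CSR and CSC formats
row_major = csr_matrix(dense)
col_major = csc_matrix(dense)

# Create an empty 3x4 CSR matrix using the shape/dtype form of the constructor
empty = csr_matrix((3, 4), dtype=np.int8)

print(row_major)
print(col_major)
print(empty.shape, empty.dtype)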
 
Optimizers are a set of procedures defined in SciPy that either find the minimum value of a function, or the root of an equation.
 
Optimizing Functions : Essentially, all of the algorithms in Machine Learning are nothing more than a complex equation that needs to be minimized with the help of given data.
 
Roots of an Equation : NumPy is capable of finding roots for polynomials and linear equations, but it cannot find roots for non-linear equations such as x + cos(x). For that, you can use SciPy's optimize.root function.
 
This function takes two required arguments :

fun :  a function representing an equation.
x0 : an initial guess for the root.
 
The function returns an object with information regarding the solution.
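 
A minimal sketch using optimize.root to solve the equation x + cos(x) = 0 mentioned above (x0 = 0 is an arbitrary initial guess):
 
from scipy.optimize import root
import numpy as np

# The non-linear equation x + cos(x) = 0
def eqn(x):
    return x + np.cos(x)

# 0 is an initial guess for the root
myroot = root(eqn, 0)

print(myroot.x)    # approximate root, near -0.739
print(myroot.fun)  # residual at the solution, close to 0
 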
SciPy provides the ndimage (n-dimensional image) package, which contains a number of general image processing and analysis functions. It is dedicated to image processing. We can perform several image processing tasks such as image input/output, classification, feature extraction, registration, etc.
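 
A minimal sketch of a few scipy.ndimage operations applied to a randomly generated 64x64 array standing in for a real image (the array size and filter parameters are arbitrary):
 
import numpy as np
from scipy import ndimage

# A random 2-D "image" used only for illustration
image = np.random.rand(64, 64)

# Smooth the image with a Gaussian filter
smoothed = ndimage.gaussian_filter(image, sigma=2)

# Shift and rotate are examples of the geometric operations ndimage offers
shifted = ndimage.shift(image, (5, 5))
rotated = ndimage.rotate(image, 45)

print(smoothed.shape, shifted.shape, rotated.shape)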