SciPy Interview Questions
SciPy (short for Scientific Python) is a scientific computation library built on top of NumPy. It provides additional utility functions for optimization, statistics, and signal processing.
 
* Like NumPy, SciPy is open source so we can use it freely.
* SciPy was created by NumPy's creator, Travis Oliphant.
* SciPy contains a variety of sub-packages that help solve the most common problems in scientific computation.
* SciPy is one of the most widely used scientific libraries, comparable in role to the GNU Scientific Library for C/C++ or MATLAB's toolboxes.
* It is easy to use and understand, and computationally fast.
* It operates on NumPy arrays.
NumPy :
* NumPy is largely written in C and is used for mathematical and numeric calculation.
* Its array operations are faster than equivalent pure-Python loops.
* NumPy is the most useful library for Data Science when performing basic calculations.
* NumPy is centered on the array data type, which supports the most basic operations such as sorting, shaping, indexing, etc.
 
SciPy :
* SciPy is built on top of NumPy.
* SciPy contains a fully featured linear algebra module, while NumPy contains only a few linear algebra features.
* Most new Data Science features are available in SciPy rather than NumPy.
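As a rough illustration of the linear algebra point above, the sketch below computes a determinant with both libraries and then an LU decomposition, which scipy.linalg offers and numpy.linalg does not. The matrix values are made up for the example.

```python
import numpy as np
from scipy import linalg

# Example matrix (values chosen only for illustration)
a = np.array([[4.0, 3.0],
              [6.0, 3.0]])

# Both libraries can compute a determinant
print(np.linalg.det(a))
print(linalg.det(a))

# LU decomposition is available in scipy.linalg: a = p @ l @ u
p, l, u = linalg.lu(a)
print(np.allclose(p @ l @ u, a))
```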
K-means clustering is a method for finding clusters and cluster centers in a set of unlabelled data. Intuitively, we might think of a cluster as a group of data points whose inter-point distances are small compared with the distances to points outside the cluster. Given an initial set of K centers, the K-means algorithm iterates the following two steps :
 
* For each center, the subset of training points (its cluster) that is closer to it than to any other center is identified.
 
* The mean of each feature for the data points in each cluster is computed, and this mean vector becomes the new center for that cluster.
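The two steps above can be sketched in plain NumPy as follows. This is a minimal illustration of a single iteration, not SciPy's own implementation; the function name kmeans_step and the sample points are hypothetical.

```python
import numpy as np

def kmeans_step(points, centers):
    """Run one K-means iteration (hypothetical helper for illustration)."""
    # Step 1: assign each point to its nearest center (Euclidean distance)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Step 2: move each center to the mean of the points assigned to it
    new_centers = centers.copy()
    for k in range(len(centers)):
        members = points[labels == k]
        if len(members) > 0:  # leave a center unchanged if its cluster is empty
            new_centers[k] = members.mean(axis=0)
    return labels, new_centers

# Four made-up points forming two obvious groups
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])

labels, centers = kmeans_step(points, centers)
print(labels)
print(centers)
```

In practice the step is repeated until the centers stop moving or a maximum number of iterations is reached.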
 
 
K-Means Implementation in SciPy :  We will understand how to implement K-Means in SciPy.
 
Import K-Means : We will see the implementation and usage of each imported function.
 
from scipy.cluster.vq import kmeans, vq, whiten

Data generation : We have to simulate some data to explore the clustering.
 
from numpy import vstack,array
from numpy.random import rand

# data generation with three features
data = vstack((rand(100,3) + array([.5,.5,.5]),rand(100,3)))
 
Now we check the data. Displaying data in an interactive session produces output like the following; the exact values differ on every run because the data are random.
 
array([[ 1.48598868e+00, 8.17445796e-01, 1.00834051e+00],
       [ 8.45299768e-01, 1.35450732e+00, 8.66323621e-01],
       [ 1.27725864e+00, 1.00622682e+00, 8.43735610e-01],
............
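One plausible way to continue from the simulated data is sketched below, using the three functions imported earlier: whiten to rescale the features, kmeans to fit the centroids, and vq to assign each observation to a centroid. The choice of two clusters is an assumption made for this example.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Same kind of simulated data as above: two groups of 100 points, three features
data = np.vstack((np.random.rand(100, 3) + np.array([0.5, 0.5, 0.5]),
                  np.random.rand(100, 3)))

# Rescale each feature to unit variance before clustering
whitened = whiten(data)

# Fit 2 centroids (assumed); kmeans returns the centroids and the mean distortion
centroids, distortion = kmeans(whitened, 2)

# Assign every observation to its nearest centroid
codes, dists = vq(whitened, centroids)

print(centroids)
print(codes[:10])
```

Here codes holds the index of the nearest centroid for each row of the whitened data, and dists holds the corresponding distances.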
The scipy.constants package provides a wide range of constants that are used extensively in the scientific field. It includes various physical and mathematical constants and units, which we can import and use as needed.
 
The scipy.constants package provides the following mathematical constants.
* pi
* golden
 
Here we compare the 'pi' value provided by two different modules.
 
# Import the scipy.constants module
import scipy.constants
# Import the math module
import math
# Compare the two pi values
print("sciPy - pi Value = %.18f" % scipy.constants.pi)
print("math - pi Value = %.18f" % math.pi)
 
Output :
 
sciPy - pi Value = 3.141592653589793116
math - pi Value = 3.141592653589793116
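Beyond pi, the golden ratio and the physical constants and unit factors can be read the same way. The short sketch below picks a few attributes of scipy.constants as examples; which ones to show is an arbitrary choice.

```python
import math
from scipy import constants

# Mathematical constant: the golden ratio
print(constants.golden)

# Physical constant: speed of light in vacuum, in metres per second
print(constants.c)

# Unit factors: SI prefix "kilo" and one mile expressed in metres
print(constants.kilo)
print(constants.mile)
```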

 

FFT stands for Fast Fourier Transform, an efficient algorithm for computing the discrete Fourier transform. The Fourier transform converts a time-domain signal into the frequency domain: it breaks a waveform (a function or signal) into an alternate representation characterized by sines and cosines.
 
One limitation of the Fourier transform is that it applies only to stable signals, whereas the Laplace transform can also handle unstable ones. SciPy provides the fftpack module for computing Fourier transforms; newer SciPy releases also offer scipy.fft, which the SciPy documentation recommends over the legacy fftpack.
 
Fast Fourier Transform : 
 
The FFT of length N sequence x[n] is calculated by fft() function and the inverse transform is calculated using ifft().
 
# Import the fft function from scipy.fftpack
from scipy.fftpack import fft
# Import numpy
import numpy as np
# Create an array of five sample values
x = np.array([1.0, 2.0, 1.0, -1.0, 1.5])
# Apply the fft function
y = fft(x)
print(y)
 
Output :
 
[ 4.5       +0.j        ,  2.08155948-1.65109876j,
       -1.83155948+1.60822041j, -1.83155948-1.60822041j,
        2.08155948+1.65109876j]
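The inverse transform mentioned above can be checked with a round trip: applying ifft() to the result of fft() should recover the original sequence up to floating-point error. A minimal sketch, reusing the same sample values:

```python
import numpy as np
from scipy.fftpack import fft, ifft

x = np.array([1.0, 2.0, 1.0, -1.0, 1.5])

# Forward transform, then inverse transform
y = fft(x)
x_back = ifft(y)

# ifft returns complex values; the imaginary parts should be numerically zero
print(np.allclose(x_back.real, x))
print(np.allclose(x_back.imag, 0.0))
```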

 

Sparse data is data that has mostly unused elements (elements that don't carry any information).
 
It can be an array like this one :
 
[1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 0]
 
Sparse Data : is a data set where most of the item values are zero.
Dense Array : is the opposite of a sparse array: most of the values are not zero.
 
In scientific computing, we often come across sparse data, for example when dealing with partial derivatives in linear algebra.
SciPy has a module, scipy.sparse, that provides functions to deal with sparse data. Two types of sparse matrices are used most often :
 
CSC : Compressed Sparse Column. For efficient arithmetic, fast column slicing.
CSR : Compressed Sparse Row. For fast row slicing, faster matrix vector products
 
CSR Matrix : We can create a CSR matrix by passing an array into the function scipy.sparse.csr_matrix().
 
Create a CSR matrix from an array :
 
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])
print(csr_matrix(arr))


Output :
 

(0, 5) 1
(0, 6) 1
(0, 8) 2
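A few other common operations on a CSR matrix are sketched below: reading the stored values, counting non-zeros, multiplying by a vector, and converting to CSC. The 3x3 array is made up for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small 2-D array with mostly zero entries (illustrative values)
arr = np.array([[0, 0, 0],
                [0, 0, 1],
                [1, 0, 2]])
mat = csr_matrix(arr)

# Only the non-zero values are stored
print(mat.data)
print(mat.count_nonzero())

# Matrix-vector product, one of the operations CSR is suited for
v = np.array([1, 1, 1])
print(mat @ v)

# Convert to CSC when column slicing matters more
csc = mat.tocsc()
print(csc.toarray())
```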

SciPy is Python's scientific computing package, providing built-in implementations of many well-known mathematical functions. The scipy.integrate sub-package provides several integration techniques, including an ordinary differential equation integrator.
 
Numerical integration is the approximate computation of an integral using numerical techniques. Methods for integrating a function, given either a function object or fixed samples :
 
* quad : General-purpose integration
* dblquad : General-purpose double integration
* nquad : General-purpose n-fold integration
* fixed_quad : Gaussian quadrature, order n
* quadrature : Gaussian quadrature to tolerance
* romberg : Romberg integration of a function object
* trapz : Trapezoidal rule on sampled data (named trapezoid in recent SciPy releases)
* cumtrapz : Trapezoidal rule to cumulatively compute the integral (named cumulative_trapezoid in recent SciPy releases)
* simps : Simpson's rule on sampled data (named simpson in recent SciPy releases)
* romb : Romberg integration of sampled data
* polyint : Analytical polynomial integration (NumPy)
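As a small illustration of the first two entries in the list, the sketch below integrates x**2 over [0, 1] with quad and x*y over a rectangle with dblquad. The integrands and limits are chosen only because their exact answers (1/3 and 1) are easy to verify by hand.

```python
from scipy.integrate import quad, dblquad

# Single integral of x**2 from 0 to 1; quad returns (value, error estimate)
value, abs_err = quad(lambda x: x**2, 0, 1)
print(value, abs_err)

# Double integral of x*y for x in [0, 1] and y in [0, 2].
# dblquad expects the integrand as func(y, x) and the y-limits as functions of x.
value2, abs_err2 = dblquad(lambda y, x: x * y, 0, 1, lambda x: 0, lambda x: 2)
print(value2, abs_err2)
```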
Interpolation is defined as finding a value between two points on a line or a curve. The prefix "inter" means "between", which tells us to look inside the data. In other words, interpolation is the estimation of intermediate values between known data points. It is very useful in statistics, science, and business, or whenever there is a need to predict a value that lies between two existing data points.
 
Let's look at how interpolation works using the scipy.interpolate package.

import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
x = np.linspace(0, 4, 12)
y = np.cos(x**2/3+4)
print(x, y)


Output :

(
   array([0.,  0.36363636,  0.72727273,  1.09090909,  1.45454545, 1.81818182, 
          2.18181818,  2.54545455,  2.90909091,  3.27272727,  3.63636364,  4.]),   

   array([-0.65364362,  -0.61966189,  -0.51077021,  -0.31047698,  -0.00715476,
            0.37976236,   0.76715099,   0.99239518,   0.85886263,   0.27994201,
           -0.52586509,  -0.99582185])
)
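With the sample points in hand, one possible next step is to build interpolating functions and evaluate them on a finer grid. The sketch below assumes interp1d from scipy.interpolate with linear and cubic kinds; the 30-point grid is an arbitrary choice.

```python
import numpy as np
from scipy import interpolate

# Same sample points as above
x = np.linspace(0, 4, 12)
y = np.cos(x**2 / 3 + 4)

# Build a linear and a cubic interpolating function
f_linear = interpolate.interp1d(x, y, kind="linear")
f_cubic = interpolate.interp1d(x, y, kind="cubic")

# Evaluate both on a finer grid inside the original range
x_new = np.linspace(0, 4, 30)
y_linear = f_linear(x_new)
y_cubic = f_cubic(x_new)

print(y_linear[:3])
print(y_cubic[:3])
```

Both functions reproduce the original y values at the original x points and differ only in how they fill the gaps between them.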