NumPy Interview Questions
NumPy, short for Numerical Python, is an open-source, general-purpose Python package for array processing. It lets Python serve as a high-level language for manipulating numerical data, much like IDL, MATLAB, or Yorick.
It is known for its high performance, its powerful N-dimensional array object, and the tools it provides for working with arrays.
NumPy is a Python package used for scientific computing. Its central data structure, the ndarray (NumPy array), is a multidimensional array that stores values of the same data type. Like other Python sequences, these arrays are indexed starting from zero.
* Python's lists are efficient general-purpose containers. They support (fairly) efficient insertion, deletion, appending, and concatenation, and Python's list comprehensions make them easy to construct and manipulate.

* They have certain limitations: they don't support "vectorized" operations like elementwise addition and multiplication, and the fact that they can contain objects of differing types means that Python must store type information for every element and must execute type-dispatching code when operating on each element.

* NumPy is not just more efficient; it is also more convenient. We get a lot of vector and matrix operations for free, which sometimes lets us avoid unnecessary work, and these operations are efficiently implemented.

* NumPy arrays are faster, and a lot comes built in with NumPy: FFTs, convolutions, fast searching, basic statistics, linear algebra, histograms, etc.
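To make the contrast concrete, here is a minimal sketch of the same elementwise multiplication done with plain lists and with arrays. The names prices and quantities and their values are made up for illustration.

```python
import numpy as np

prices = [1.5, 2.0, 3.25]      # hypothetical sample data
quantities = [4, 10, 2]

# Plain lists: every pair of elements is handled in a Python-level loop.
totals_list = [p * q for p, q in zip(prices, quantities)]

# NumPy arrays: a single vectorized expression, evaluated in compiled code.
totals_arr = np.array(prices) * np.array(quantities)

print(totals_list)
print(totals_arr)
```

Both versions produce the same three totals; the array version simply expresses the whole operation at once.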
1D array creation :
import numpy as np
one_dimensional_list = [1,2,4]
one_dimensional_arr = np.array(one_dimensional_list)
print("1D array is : ",one_dimensional_arr)
 
2D array creation :
import numpy as np
two_dimensional_list=[[1,2,3],[4,5,6]]
two_dimensional_arr = np.array(two_dimensional_list)
print("2D array is : ",two_dimensional_arr)
 
3D array creation :
import numpy as np
three_dimensional_list=[[[1,2,3],[4,5,6],[7,8,9]]]
three_dimensional_arr = np.array(three_dimensional_list)
print("3D array is : ",three_dimensional_arr)
 
ND array creation : This can be achieved by passing the ndmin argument to np.array(). The example below demonstrates the creation of a 6D array:
import numpy as np
ndArray = np.array([1, 2, 3, 4], ndmin=6)
print(ndArray)
print('Dimensions of array:', ndArray.ndim)
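As a small follow-up sketch, the shape of such an array shows where the extra dimensions go: ndmin prepends axes of length 1. The reshape call is only an equivalent way of spelling out the same target shape.

```python
import numpy as np

nd_array = np.array([1, 2, 3, 4], ndmin=6)
print(nd_array.shape)   # (1, 1, 1, 1, 1, 4)

# Spelling out the target shape explicitly gives the same array.
same = np.array([1, 2, 3, 4]).reshape(1, 1, 1, 1, 1, 4)
print(np.array_equal(nd_array, same))   # True
```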
5 .
Given an array a = [[1,2,3],[3,4,5],[23,45,1]], find the sum of every row in array a.
To solve this problem, we first need to understand what the question asks. An array whose elements are addressed with two subscripts is two-dimensional; with three subscripts it is three-dimensional, and so on. The elements of a two-dimensional array are arranged in rows and columns. Setting axis to 1 sums along the columns, which yields the sum of the elements in each row, and that is what the question requires.
import numpy as np
a = np.array([[1,2,3], [3,4,5], [23, 45,1]])
print(a.sum(axis=1))  # row sums: 6, 12 and 69
Conversely, for column sum,
a = np.array([[1,2,3], [3,4,5], [23, 45,1]])
print(a.sum(axis=0))

This should print [27 51  9]
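The axis argument can be double-checked against plain Python. The sketch below is only a cross-check: it computes both sums with NumPy and compares them with sums taken over the rows and over the transposed rows of the equivalent nested list.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [23, 45, 1]])

row_sums = a.sum(axis=1)   # collapse the columns: one value per row
col_sums = a.sum(axis=0)   # collapse the rows: one value per column

# The same totals computed with plain Python, as a cross-check.
assert row_sums.tolist() == [sum(row) for row in a.tolist()]
assert col_sums.tolist() == [sum(col) for col in zip(*a.tolist())]

print(row_sums)
print(col_sums)
```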
As a powerful open-source package used for array-processing, NumPy has various useful features. They are:
 
* Contains an N-dimensional array object
* It is interoperable and compatible with many hardware and computing platforms
* Works extremely well with other array libraries, including sparse, distributed, and GPU arrays
* Ability to perform sophisticated (broadcasting) operations
* Tools that enable integration with C or C++ and Fortran code 
* Ability to perform high-level mathematical functions like statistics, Fourier transform, sorting, searching, linear algebra, etc 
* It can also behave as a multi-dimensional container for generic data
* Supports scientific and financial calculations
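Broadcasting is the least self-explanatory item in this list, so here is a minimal sketch of it. The array names and values are illustrative only.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)
row_offsets = np.array([10, 20, 30])    # shape (3,)
col_scale = np.array([[1], [100]])      # shape (2, 1)

# The (3,) array is stretched across both rows of the matrix.
shifted = matrix + row_offsets

# The (2, 1) array is stretched across all three columns.
scaled = matrix * col_scale

print(shifted)
print(scaled)
```

No explicit loop or copy of the smaller array is written; NumPy matches the shapes dimension by dimension.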
Python’s lists, although efficient general-purpose containers, have several limitations when compared to NumPy arrays. They do not support vectorised operations such as element-wise addition and multiplication.
 
Because lists can hold objects of different types, Python must store the type information of every element and execute type-dispatching code each time an operation is performed on an element. Every iteration also incurs type checks and Python API bookkeeping, so very little of the work is carried out by C loops.
The dtype attribute identifies the data type of a NumPy array. Assuming num, num2 and num3 are existing arrays:
 
print('\n data type num 1 ', num.dtype)
 
print('\n data type num 2 ', num2.dtype)
 
print('\n data type num 3 ', num3.dtype)
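A runnable sketch of the dtype lookup follows. The three arrays are hypothetical stand-ins, since the original does not show how they were created.

```python
import numpy as np

# Hypothetical arrays; the source does not define them.
num = np.array([1, 2, 3])                    # integers, platform default width
num2 = np.array([1.5, 2.5])                  # floating point
num3 = np.array([1, 2, 3], dtype=np.uint8)   # explicitly requested type

print(num.dtype)    # often int64, but this depends on the platform
print(num2.dtype)   # float64
print(num3.dtype)   # uint8
```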
9 .
How do you calculate the memory size occupied by a NumPy array? Given an array a= [14, 98, 87]. Find the memory size it occupies. 
The elements of the array are integers. Assuming they are stored as 4-byte integers (int32), each of the elements 14, 98 and 87 occupies 4 bytes, so the array data occupies 12 bytes in total. Note that on many platforms NumPy's default integer type is int64, in which case each element takes 8 bytes and the total is 24 bytes. For an array holding a large data set, working this out by hand would be tedious.
 
However, NumPy has attributes such as size, itemsize and nbytes to help solve this problem.
 
size attribute : returns the total number of elements in a NumPy array.
itemsize attribute : returns the memory size in bytes of an individual element in the NumPy array.
nbytes attribute : returns the total bytes consumed by all the elements in the NumPy array.
 
With these attributes, the problem can be solved using two methods or approaches. 
 
Approach 1 :
import numpy as np
import sys
a = [14, 98, 87]
# rough estimate for a Python list: size of one int object times the number of elements
print(sys.getsizeof(a[0]) * len(a))
or 
a = np.array([14, 98, 87])
print("The size in bytes of one element in array: ", a.itemsize)
print("The size of array: ", a.size)
print("The bytes of all elements in a: ", a.size * a.itemsize)

 

Approach 2 : We will use the nbytes attribute.
import numpy as np
a= np.array([14,98,87])
print("The total size of space or memory the array occupies:  ", a.nbytes)
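Because the answer depends on the element type, a sketch with the dtype stated explicitly may be clearer. The choice of int32 and int64 here is only for illustration.

```python
import numpy as np

a32 = np.array([14, 98, 87], dtype=np.int32)
a64 = np.array([14, 98, 87], dtype=np.int64)

for arr in (a32, a64):
    # nbytes is always the element count times the bytes per element.
    assert arr.nbytes == arr.size * arr.itemsize
    print(arr.dtype, arr.itemsize, arr.nbytes)
```

The same three values take 12 bytes as int32 and 24 bytes as int64.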
Given array :
[[35 53 63]
[72 12 22]
[43 84 56]]
New Column values :
[  
   20 
   30 
   40
]
Solution :
import numpy as np
#inputs
inputArray = np.array([[35,53,63],[72,12,22],[43,84,56]])
new_col = np.array([[20,30,40]])
# delete 2nd column
arr = np.delete(inputArray , 1, axis = 1)
#insert new_col to array
arr = np.insert(arr , 1, new_col, axis = 1)
print (arr)
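When the goal is only to replace a column, slice assignment is a possible alternative to the delete-then-insert approach. This is a sketch of that alternative, not the method used above; the variable names are illustrative.

```python
import numpy as np

input_array = np.array([[35, 53, 63], [72, 12, 22], [43, 84, 56]])
new_col = np.array([20, 30, 40])

result = input_array.copy()   # keep the original array untouched
result[:, 1] = new_col        # overwrite the second column
print(result)
```

This avoids building two intermediate arrays, at the cost of needing an explicit copy if the original must be preserved.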
We can use the numpy.loadtxt() function, which can skip a file's header lines and any comment lines. (numpy.genfromtxt() can additionally skip footer lines.)
 
loadtxt() is adequate for most text files. If it proves too slow, the data should be stored in a more efficient format than plain text. Various alternatives can be considered depending on the version of NumPy used.
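A minimal sketch of reading text data with loadtxt() is shown here. An in-memory StringIO object with made-up contents stands in for a real file.

```python
import io
import numpy as np

# In-memory stand-in for a text file (hypothetical contents).
text = io.StringIO(
    "x y z\n"          # header line, skipped via skiprows
    "# a comment\n"    # comment line, ignored because of comments="#"
    "1 2 3\n"
    "4 5 6\n"
)

data = np.loadtxt(text, skiprows=1, comments="#")
print(data.shape)
print(data)
```

The values come back as floats because that is loadtxt()'s default dtype.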
 
Following are the file formats that are supported :
 
* Text files : These are generally slow to read and large, but portable and human-readable.
* Raw binary : These files carry no metadata and are not portable, but they are fast.
* Pickle : These are borderline slow and portable, but depend on the NumPy version.
* HDF5 : Known as the High-Powered Kitchen Sink format, it is supported by both PyTables and h5py.
* .npy : This is NumPy's native binary data format, which is extremely simple, efficient and portable.
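As an illustration of the .npy format, the sketch below saves an array and loads it back. The file name example.npy and the temporary directory are assumptions made so the example cleans up after itself.

```python
import os
import tempfile
import numpy as np

arr = np.arange(6, dtype=np.float64).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.npy")   # hypothetical file name
    np.save(path, arr)         # stores dtype and shape along with the data
    restored = np.load(path)

print(restored.dtype, restored.shape)
print(np.array_equal(arr, restored))
```

Unlike a text file, the round trip preserves the dtype and shape without any parsing options.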
The main difference is that range is built into Python and generates a sequence of integer values within a given range, whereas arange is a function of the NumPy library, so you have to install and import NumPy to use it. Both take the same basic parameters (start, stop, and step), as shown below. The range function accepts only integer arguments and raises a TypeError otherwise, while arange also accepts floating-point arguments and returns an instance of the NumPy ndarray.
 
* range([start], stop[, step])
 
* numpy.arange([start, ]stop, [step, ]dtype=None)
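The difference in accepted arguments and return types can be seen in a short sketch. The particular start, stop and step values are arbitrary.

```python
import numpy as np

print(list(range(0, 10, 2)))        # a lazy range object, shown here as a list
print(np.arange(0, 10, 2))          # an ndarray

# range rejects non-integer arguments.
try:
    range(0, 1, 0.25)
except TypeError as exc:
    print("range failed:", exc)

# arange accepts them and returns an ndarray of floats.
fractions = np.arange(0, 1, 0.25)
print(type(fractions).__name__, fractions.tolist())
```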
You can count the number of times a given value appears using the bincount() function. Note that bincount() accepts only non-negative integers (or booleans) as input; negative integers cannot be used.
 
Use numpy.bincount(). Element i of the resulting array is the number of times the value i occurs in the input.

import numpy
arr = numpy.array([0, 5, 4, 0, 4, 4, 3, 0, 0, 5, 2, 1, 1, 9])
 
numpy.bincount(arr)
# array([4, 2, 1, 1, 3, 2, 0, 0, 0, 1])
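For data that may contain negative integers, one possible alternative is np.unique() with return_counts=True. The sketch below uses made-up values and first shows bincount() rejecting them.

```python
import numpy as np

values = np.array([-2, 0, 3, -2, 3, 3])   # hypothetical data with negatives

# bincount refuses negative input.
try:
    np.bincount(values)
except ValueError as exc:
    print("bincount failed:", exc)

# unique returns the distinct values and how often each one occurs.
uniq, counts = np.unique(values, return_counts=True)
print(dict(zip(uniq.tolist(), counts.tolist())))
```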
This can be achieved by using the genfromtxt() method by setting the delimiter as a comma.
 
from numpy import genfromtxt

csv_data = genfromtxt('sample_doc.csv', delimiter=',')
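A self-contained variant of the same call is sketched below, with an in-memory string standing in for sample_doc.csv. The header row and the empty field are made up to show two common options.

```python
import io
import numpy as np

# In-memory stand-in for a CSV file (hypothetical contents).
csv_text = io.StringIO(
    "a,b,c\n"     # header row, skipped with skip_header
    "1,2,3\n"
    "4,,6\n"      # the empty field is read as nan
)

csv_data = np.genfromtxt(csv_text, delimiter=",", skip_header=1)
print(csv_data)
```

Missing fields becoming nan is one reason genfromtxt() is often preferred over loadtxt() for imperfect CSV data.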
For example, consider an array arr.
arr = np.array([[8, 3, 2],
[3, 6, 5],
[6, 1, 4]])
 
Let us try to sort the rows by the 2nd column so that we get :
[[6, 1, 4],
[8, 3, 2],
[3, 6, 5]]
 
We can do this by using the sort() function in numpy as :
import numpy as np
arr = np.array([[8, 3, 2],
[3, 6, 5],
[6, 1, 4]], dtype=np.int64)
#view each row as one record with fields f0, f1, f2 and sort the records by f1
arr = np.sort(arr.view('i8,i8,i8'),
order=['f1'],
axis=0).view(np.int64)
 
We can also sort in place by doing :
arr.view('i8,i8,i8').sort(order=['f1'], axis=0)
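A commonly used alternative, sketched here, is to take the argsort of the chosen column and use it to reorder the rows. It does not depend on the array's integer width, unlike the structured view.

```python
import numpy as np

arr = np.array([[8, 3, 2],
                [3, 6, 5],
                [6, 1, 4]])

order = arr[:, 1].argsort()   # row indices that put the 2nd column in order
sorted_rows = arr[order]      # reorder whole rows with fancy indexing
print(sorted_rows)
```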
If the variable is an array, you can check whether it is empty by using the size attribute. If the variable may be a list or another sequence type, you can use len() instead.
 
The preferable way to check for an empty array is the size attribute. This is because : 
import numpy as np
a = np.zeros((1,0))
a.size
0
 
whereas

len(a)
1
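The two cases can be folded into one check. The helper below is a hypothetical convenience function, not part of NumPy: it converts its argument to an array and tests the size attribute.

```python
import numpy as np

def is_empty(obj):
    """Return True if obj holds no elements (hypothetical helper)."""
    return np.asarray(obj).size == 0

print(is_empty(np.zeros((1, 0))))   # True, although len() would report 1
print(is_empty([]))                 # True
print(is_empty([0]))                # False
```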
We can use the argmin() method of numpy as shown below :
 
import numpy as np
def find_nearest_value(arr, value):
   arr = np.asarray(arr)
   idx = (np.abs(arr - value)).argmin()
   return arr[idx]
#Driver code
arr = np.array([ 0.21169,  0.61391, 0.6341, 0.0131, 0.16541,  0.5645,  0.5742])
value = 0.52
print(find_nearest_value(arr, value)) # Prints 0.5645
We can use the shape attribute of the numpy array to find the shape. It returns a tuple giving the length of the array along each dimension; for a 2-D array this is the row count and the column count.
 
import numpy as np
arr_two_dim = np.array([("x1","x2", "x3","x4"),("x5","x6", "x7","x8" )])
arr_one_dim = np.array([3,2,4,5,6])
# find and print shape
print("2-D Array Shape: ", arr_two_dim.shape)
print("1-D Array Shape: ", arr_one_dim.shape)
"""
Output :
2-D Array Shape:  (2, 4)
1-D Array Shape:  (5,)
"""
import numpy as np
a = np.arange(15)
 
Method 1 :
index = np.where((a >= 5) & (a <= 10))
a[index]
 
Method 2 :
index = np.where(np.logical_and(a>=5, a<=10))
a[index]
# array([ 5,  6,  7,  8,  9, 10])
 
Method 3 : 
a[(a >= 5) & (a <= 10)]
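To see how the three methods relate, the sketch below separates the boolean mask from the index tuple that np.where() builds from it. The variable names are illustrative.

```python
import numpy as np

a = np.arange(15)

mask = (a >= 5) & (a <= 10)   # boolean array with the same shape as a
index = np.where(mask)        # tuple holding one index array per dimension

print(mask.sum())             # number of elements that satisfy the condition
print(index[0].tolist())      # positions of those elements
print(a[mask].tolist())       # the elements themselves
```

Indexing with the mask directly and indexing with the tuple from np.where() select the same elements.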

 

Create a numpy array of length 10 that starts at 5 and has a step of 3 between consecutive numbers.
 
import numpy as np
length = 10
start = 5
step = 3

def seq(start, length, step):
    end = start + (step*length)
    return np.arange(start, end, step)

seq(start, length, step)
# array([ 5,  8, 11, 14, 17, 20, 23, 26, 29, 32])
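Another way to build the same sequence, sketched here as an alternative, is to scale and shift np.arange(length). The function name seq_by_count is hypothetical. With a non-integer step this form fixes the number of elements up front instead of relying on a computed end point.

```python
import numpy as np

def seq_by_count(start, length, step):
    """Return `length` values beginning at `start`, spaced by `step`."""
    return start + step * np.arange(length)

print(seq_by_count(5, 10, 3))
print(len(seq_by_count(0.0, 10, 0.1)))   # always exactly `length` elements
```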

 

Create a 4x2 array whose elements are of type unsigned int16, and print the following attributes :
 
* The shape of the array.
* The array dimensions.
* The length of each element of the array in bytes.
 
import numpy

firstArray = numpy.empty([4,2], dtype = numpy.uint16) 
print("Printing Array")
print(firstArray)

print("Printing numpy array Attributes")
print("1> Array Shape is: ", firstArray.shape)
print("2>. Array dimensions are ", firstArray.ndim)
print("3>. Length of each element of array in bytes is ", firstArray.itemsize)

 

import numpy

print("Creating 5X2 array using numpy.arange")
sampleArray = numpy.arange(100, 200, 10)
sampleArray = sampleArray.reshape(5,2)
print (sampleArray)

 

import numpy
sampleArray = numpy.array([[34,43,73],[82,22,12],[53,94,66]]) 
newColumn = numpy.array([[10,10,10]])
 
Expected Output :
 
Printing Original array
[[34 43 73]
 [82 22 12]
 [53 94 66]]

Array after deleting column 2 on axis 1

[[34 73]
[82 12]
[53 66]]

Array after inserting column 2 on axis 1

[[34 10 73]
[82 10 12]
[53 10 66]]
 
Solution :
 
import numpy
print("Printing Original array")
sampleArray = numpy.array([[34,43,73],[82,22,12],[53,94,66]]) 
print (sampleArray)
print("Array after deleting column 2 on axis 1")
sampleArray = numpy.delete(sampleArray , 1, axis = 1) 
print (sampleArray)
arr = numpy.array([[10,10,10]])
print("Array after inserting column 2 on axis 1")
sampleArray = numpy.insert(sampleArray , 1, arr, axis = 1) 
print (sampleArray)

 

Convert numpy’s datetime64 object to datetime’s datetime object
 
import numpy as np

# Input : a numpy datetime64 object
dt64 = np.datetime64('2018-02-25 22:10:10')

# Solution
from datetime import datetime
dt64.tolist()

# or

dt64.astype(datetime)
# datetime.datetime(2018, 2, 25, 22, 10, 10)
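One caveat worth sketching: the conversion depends on the precision of the datetime64 value. With nanosecond precision, tolist() returns an integer count of nanoseconds rather than a datetime, so casting to microseconds first is one way around it. The variable names below are illustrative.

```python
from datetime import datetime
import numpy as np

dt64_s = np.datetime64('2018-02-25T22:10:10')          # second precision
dt64_ns = np.datetime64('2018-02-25T22:10:10', 'ns')   # nanosecond precision

print(type(dt64_s.tolist()))    # a datetime.datetime object
print(type(dt64_ns.tolist()))   # an int: nanoseconds since the epoch

# Casting to microseconds first keeps the result a datetime object.
as_datetime = dt64_ns.astype('datetime64[us]').tolist()
print(as_datetime)
```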