# A complete guide on NumPy for data science

## A guide to learning and using NumPy, from basic to advanced level, for exploratory data analysis.

# What is NumPy?

NumPy is a Python library for numerical computing that helps in analyzing data, and it is widely used in data science. It provides a fast array type together with linear algebra routines, and much of it is implemented in C, which makes it really fast.

# How to install NumPy?

To install NumPy using pip:

pip install numpy

To install NumPy using Anaconda:

conda install numpy

# What are NumPy Arrays?

While working with NumPy for data science, we mostly deal with NumPy arrays. These arrays usually come in two types:

1. Matrices

Matrices are usually two-dimensional, but they can still have only one row or only one column.

2. Vectors

Vectors, on the other hand, are strictly one-dimensional.

# How to create NumPy Arrays using lists?

**→ Importing the library**

>>> import numpy as np

**→ Creating a list and then converting it into an array of 1 dimension.**

>>> list1=[11,23,34,56]

>>> list1

[11, 23, 34, 56]

>>> np.array(list1)

array([11, 23, 34, 56])

>>> array1=np.array(list1)

>>> array1

array([11, 23, 34, 56])

**→ Creating a list of lists and converting it into an array of 2 dimensions.**

`>>> list2 = [[11,22,33],[55,66,77],[88,99,100]]`

>>> np.array(list2)

array([[ 11, 22, 33],

[ 55, 66, 77],

[ 88, 99, 100]])

As seen above, the array has two dimensions, i.e. rows and columns. The number of dimensions is also indicated by the number of opening square brackets at the start of the printed array. Here the output starts with two square brackets, therefore the array has 2 dimensions.
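Rather than counting brackets, the number of dimensions can also be read from the array itself. The sketch below uses the `ndim` and `shape` attributes; the variable names `arr_2d` and `arr_1d` are only illustrative.

```python
import numpy as np

# Same nested list as in the example above
list2 = [[11, 22, 33], [55, 66, 77], [88, 99, 100]]
arr_2d = np.array(list2)

# ndim is the number of dimensions, shape is the size along each one
print(arr_2d.ndim)   # 2
print(arr_2d.shape)  # (3, 3)

# A flat list gives a 1-D array
arr_1d = np.array([11, 23, 34, 56])
print(arr_1d.ndim)   # 1
```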

# How to create NumPy Arrays using built-in methods?

**→ Creating an array with the arange method, which is similar to Python's range. The arguments are start, stop and step. The values begin at start and end before stop (stop itself is excluded), just like the range function.**

>>> np.arange(0,10,1)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> np.arange(0,10,2)

array([0, 2, 4, 6, 8])
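As a quick check of the comparison with `range`, the following sketch builds the same sequence both ways. It also shows one difference: `arange` accepts a non-integer step. The names `from_range`, `from_arange` and `halves` are made up for this illustration.

```python
import numpy as np

# arange with integer arguments mirrors Python's built-in range
from_range = list(range(0, 10, 2))
from_arange = np.arange(0, 10, 2)
print(from_arange.tolist() == from_range)  # True

# Unlike range, arange also accepts a non-integer step
halves = np.arange(0, 2, 0.5)
print(halves.tolist())  # [0.0, 0.5, 1.0, 1.5]
```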

**→ Creating an array of all zeros.**

>>> np.zeros(5)

array([0., 0., 0., 0., 0.])

>>> np.zeros((3,2))

array([[0., 0.],

[0., 0.],

[0., 0.]])

**→ Creating an array of all ones.**

>>> np.ones(5)

array([1., 1., 1., 1., 1.])

>>> np.ones((3,2))

array([[1., 1.],

[1., 1.],

[1., 1.]])

**→ Creating an array where the values are spaced equally in an interval. It takes the arguments: start, stop, number of values.**

`>>> np.linspace(1,20,5)`

array([ 1. , 5.75, 10.5 , 15.25, 20. ])

As seen above, it returns 5 numbers in the interval 1 to 20 which are evenly spaced.
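To see where the value 4.75 between neighbours comes from, here is a small sketch of the arithmetic. It assumes the default behaviour of `linspace`, where both endpoints are included.

```python
import numpy as np

start, stop, num = 1, 20, 5
values = np.linspace(start, stop, num)

# With both endpoints included, the gap between neighbours is
# (stop - start) / (num - 1) = 19 / 4 = 4.75
step = (stop - start) / (num - 1)
print(step)             # 4.75
print(np.diff(values))  # every gap equals the step
```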

**→ Creating an identity matrix.**

`>>> np.eye(3)`

array([[1., 0., 0.],

[0., 1., 0.],

[0., 0., 1.]])

**→ Creating an array with random numbers of uniform distribution (0–1).**

>>> np.random.rand(3)

array([0.13426 , 0.22672772, 0.98574852])

>>> np.random.rand(3,2)

array([[0.13636649, 0.3366877 ],

[0.36993761, 0.02392286],

[0.20869183, 0.59256244]])

**→ Creating an array with random numbers from the standard normal distribution (mean 0, standard deviation 1).**

>>> np.random.randn(3)

array([ 0.71105797, -0.33395766, 0.67756835])

>>> np.random.randn(4,2)

array([[ 1.21447908, 0.6830743 ],

[-0.28203856, 0.16459752],

[-0.32451067, -0.1618622 ],

[-0.9331776 , 0.6281955 ]])

**→ Creating an array with random integers using randint() where the arguments to be passed are low, high and size. Low is inclusive and high is exclusive.**

>>> np.random.randint(1,50)

49

>>> np.random.randint(1,50,5)

array([27, 43, 44, 39, 16])
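The random outputs above will differ on every run. If reproducible numbers are needed, one option is a seeded generator from `np.random.default_rng`, which is a different interface from the `rand`, `randn` and `randint` functions used above. A minimal sketch, where the seed value 42 is an arbitrary choice:

```python
import numpy as np

# default_rng with a fixed seed gives a reproducible stream of numbers.
# The seed value 42 is arbitrary.
rng = np.random.default_rng(42)
rng_again = np.random.default_rng(42)

uniform = rng.random(3)                 # uniform in [0, 1), like rand(3)
normal = rng.standard_normal((4, 2))    # standard normal, like randn(4, 2)
integers = rng.integers(1, 50, size=5)  # low inclusive, high exclusive

# Same seed and same first call give the same numbers
print(np.array_equal(uniform, rng_again.random(3)))  # True
```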

# What are the attributes and methods of NumPy Array?

`>>> arr1 = np.arange(10,35)`

>>> arr1

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

**→ Reshape method to change the array into a new shape.**

`>>> arr1.reshape(5,5)`

array([[10, 11, 12, 13, 14],

[15, 16, 17, 18, 19],

[20, 21, 22, 23, 24],

[25, 26, 27, 28, 29],

[30, 31, 32, 33, 34]])

If the new shape cannot hold exactly all the elements of the array, reshape raises an error. Make sure that the number of rows multiplied by the number of columns equals the number of elements in the array.

>>> arr1.reshape(3,3)

ValueError Traceback (most recent call last)

<ipython-input-26-2c9beb517969> in <module>

----> 1 arr1.reshape(3,3)

ValueError: cannot reshape array of size 25 into shape (3,3)
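Two ways of staying clear of this error are sketched below. Passing -1 for one dimension lets NumPy work out that dimension, and comparing the product of the target shape with `size` checks the rule before reshaping. The names `arr`, `five_rows`, `rows` and `cols` are illustrative only.

```python
import numpy as np

arr = np.arange(10, 35)  # 25 elements, same as arr1 above

# -1 asks NumPy to work out that dimension from the array size
five_rows = arr.reshape(5, -1)
print(five_rows.shape)  # (5, 5)

# Checking the size first avoids the ValueError shown above
rows, cols = 3, 3
if rows * cols != arr.size:
    print(f"cannot reshape {arr.size} elements into ({rows}, {cols})")
```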

**→ Finding the maximum and minimum values in the array.**

>>> arr1

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

>>> arr1.max()

34

>>> arr1.min()

10

To know the index at which the max or min value is present, use argmax() or argmin().

>>> arr1.argmax()

24

>>> arr1.argmin()

0
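On a 2-D array the same methods accept an `axis` argument, which is not used in the examples above. The following sketch shows it on a 5x5 version of the same numbers; the name `grid` is hypothetical.

```python
import numpy as np

grid = np.arange(10, 35).reshape(5, 5)

# axis=0 collapses the rows, giving one result per column;
# axis=1 collapses the columns, giving one result per row
print(grid.max(axis=0))  # [30 31 32 33 34]
print(grid.min(axis=1))  # [10 15 20 25 30]

# Without an axis, argmax indexes into the flattened array
flat_index = grid.argmax()
print(flat_index)  # 24

# unravel_index converts the flat index back to (row, column): row 4, column 4
row, col = np.unravel_index(flat_index, grid.shape)
```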

**→ The shape attribute gives the shape of the array. It is an attribute, not a method, so it is written without parentheses.**

`>>> arr1.shape`

(25,)

This denotes that the array is 1-D with 25 elements.

`>>> arr2 = arr1.reshape(5,5)`

>>> arr2.shape

(5, 5)

This denotes that the array is 2-D and has 5 rows and 5 columns.

**→ Finding the datatype of the elements in the array.**

`>>> arr1.dtype`

dtype('int32')
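The default integer type depends on the platform and the NumPy version, so the same code may report int64 elsewhere. A short sketch of requesting a dtype explicitly and converting with `astype`, using the illustrative names `ints` and `floats`:

```python
import numpy as np

# Requesting a dtype explicitly makes the result platform-independent
ints = np.arange(10, 35, dtype=np.int64)
print(ints.dtype)  # int64

# astype returns a converted copy; the original keeps its dtype
floats = ints.astype(np.float64)
print(floats.dtype)  # float64
print(ints.dtype)    # int64
```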

# How to perform indexing and selection of elements in 1-D NumPy Array?

`>>> arr1`

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

**→ Using the slicing notation to pick elements from the array.**

>>> arr1[3]

13

>>> arr1[1:5]

array([11, 12, 13, 14])

**→ Using slicing to change values in an array, i.e. broadcasting a single value across the slice.**

`>>> arr1[1:5] = 50`

>>> arr1

array([10, 50, 50, 50, 50, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

Now, let's slice the array and broadcast a value into the slice. A slice is a view of the original data, not a copy, so notice how the values change both in the sliced array and in the original array.

>>> arr2=arr1[10:15]

>>> arr2

array([20, 21, 22, 23, 24])

>>> arr2[:]=25

>>> arr2

array([25, 25, 25, 25, 25])

>>> arr1

array([10, 50, 50, 50, 50, 15, 16, 17, 18, 19, 25, 25, 25, 25, 25, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

To avoid this, make a copy of the array and then broadcast.

>>> arr_copy=arr1.copy()

>>> arr_copy

array([10, 50, 50, 50, 50, 15, 16, 17, 18, 19, 25, 25, 25, 25, 25, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

>>> arr_copy[:]=1

>>> arr_copy

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])>>> arr1

array([10, 50, 50, 50, 50, 15, 16, 17, 18, 19, 25, 25, 25, 25, 25, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])
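One way to confirm whether two arrays share the same underlying data is `np.shares_memory`. A small sketch on a fresh array, with the illustrative names `view` and `copy`:

```python
import numpy as np

arr = np.arange(10, 35)

view = arr[10:15]          # slice: a view onto arr's data
copy = arr[10:15].copy()   # explicit copy: independent data

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False

copy[:] = 0    # does not touch arr
view[:] = 25   # writes through to arr
print(arr[10:15])  # [25 25 25 25 25]
```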

# How to perform indexing and selection of elements in 2-D NumPy Array?

`>>> arr2 = np.array([[1,2,3],[4,5,6],[7,8,9]])`

>>> arr2

array([[1, 2, 3],

[4, 5, 6],

[7, 8, 9]])

**→ Indexing using brackets**

For indexing, pass row value and then the column value.

`>>> arr2[0][1]`

2

You can also use the below notation where the row and column values are written separated by a comma.

`>>> arr2[0,1]`

2

**→ Getting the subpart of a matrix.**

To get a submatrix, use slicing. The example below selects rows 0 and 1 and columns 1 and 2.

`>>> arr2[:2,1:]`

array([[2, 3],

[5, 6]])
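The same notation can pick out whole rows, whole columns, or elements counted from the end. A few more selections on the same matrix, as a sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

second_row = arr[1]        # a single index selects a whole row
first_col = arr[:, 0]      # ":" keeps every row, 0 picks the first column
last_item = arr[-1, -1]    # negative indices count from the end
lower_left = arr[1:, :2]   # rows 1 and 2, columns 0 and 1

print(second_row)  # [4 5 6]
print(first_col)   # [1 4 7]
print(last_item)   # 9
print(lower_left)
```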

# How to perform conditional selection using a boolean array?

>>> arr1

array([10, 50, 50, 50, 50, 15, 16, 17, 18, 19, 25, 25, 25, 25, 25, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

>>> bool_arr=arr1>20

>>> bool_arr

array([False, True, True, True, True, False, False, False, False, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True])

**→ Getting the values where the boolean value is True.**

>>> arr1[bool_arr]

array([50, 50, 50, 50, 25, 25, 25, 25, 25, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

>>> arr1[arr1<20]

array([10, 15, 16, 17, 18, 19])
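Conditions can also be combined. The sketch below uses `&` and `|` on a shorter made-up array, and shows that a boolean mask works for assignment as well as selection. The names `between`, `outside` and `capped` are illustrative.

```python
import numpy as np

arr = np.array([10, 50, 15, 16, 25, 26, 34])

# Each comparison needs its own parentheses; use & for "and", | for "or"
between = arr[(arr > 15) & (arr < 30)]
print(between)  # [16 25 26]

outside = arr[(arr < 15) | (arr > 30)]
print(outside)  # [10 50 34]

# A boolean mask can also be used to assign
capped = arr.copy()
capped[capped > 30] = 30
print(capped)  # [10 30 15 16 25 26 30]
```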

# How to perform Array with Array operations?

`>>> arr3 = np.arange(0,5)`

>>> arr3

array([0, 1, 2, 3, 4])

**→ Elementwise operations**

>>> arr3+arr3

array([0, 2, 4, 6, 8])

>>> arr3-arr3

array([0, 0, 0, 0, 0])

>>> arr3*arr3

array([ 0, 1, 4, 9, 16])
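Note that `*` multiplies element by element and is not a dot product. The sketch below contrasts it with the `@` operator, and shows what happens when two 1-D arrays have different lengths. The arrays `a` and `b` are made up for the illustration.

```python
import numpy as np

a = np.arange(0, 5)
b = np.array([10, 20, 30, 40, 50])

elementwise = a * b  # products pair by pair: 0, 20, 60, 120, 200
dot = a @ b          # sum of those products: 400
print(elementwise)
print(dot)  # 400

# Elementwise operations need compatible shapes
try:
    a + np.arange(0, 3)
except ValueError as err:
    print("shape mismatch:", err)
```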

# How to perform Array with Scalar operations?

Scalar means just a single number. So when dealing with scalar and array operations, NumPy broadcasts the scalar into an array and performs the elementwise operations.

>>> arr3

array([0, 1, 2, 3, 4])

>>> arr3+5

array([5, 6, 7, 8, 9])

>>> arr3-2

array([-2, -1, 0, 1, 2])

>>> arr3*3

array([ 0, 3, 6, 9, 12])

>>> arr3/6

array([0. , 0.16666667, 0.33333333, 0.5 , 0.66666667])

>>> arr3**3

array([ 0, 1, 8, 27, 64], dtype=int32)

In plain Python, 0/0 raises a ZeroDivisionError, but in NumPy it only gives a warning and returns nan (not a number). Likewise, dividing a nonzero number by zero gives a warning and returns inf (infinity).

>>> arr3

array([0, 1, 2, 3, 4])

>>> arr3/arr3

D:\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: invalid value encountered in true_divide

array([nan, 1., 1., 1., 1.])

>>> 1/arr3

D:\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide

array([ inf, 1. , 0.5 , 0.33333333, 0.25 ])
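If these warnings are expected, one possible way to handle them is to silence them locally with `np.errstate` and then detect or replace the nan and inf entries. This is a sketch; replacing them with 0 is an arbitrary choice made for the illustration.

```python
import numpy as np

arr = np.arange(0, 5)

# errstate silences the warnings inside the with-block only
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr    # 0/0 -> nan
    inverse = 1 / arr    # 1/0 -> inf

print(np.isnan(ratio))    # True only at index 0
print(np.isinf(inverse))  # True only at index 0

# Replace the non-finite entries with 0 before further analysis
cleaned = np.where(np.isfinite(inverse), inverse, 0.0)
print(cleaned)
```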

# How to use NumPy universal functions on arrays?

**→ Finding the square root of each element in the array.**

`>>> np.sqrt(arr3)`

array([0. , 1. , 1.41421356, 1.73205081, 2. ])

**→ Finding the exponential of each element in the array.**

`>>> np.exp(arr3)`

array([ 1. , 2.71828183, 7.3890561 , 20.08553692, 54.59815003])

**→ Finding the maximum of the array.**

`>>> arr3.max()`

4

**→ Trigonometric functions**

>>> np.sin(arr3)

array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])

>>> np.cos(arr3)

array([ 1. , 0.54030231, -0.41614684, -0.9899925 , -0.65364362])

**→ Logarithmic function**

`>>> np.log(arr3)`

D:\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in log

array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436])
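To make clear what a universal function does, the sketch below compares `np.sqrt` with an explicit Python loop over the same elements, and applies `np.log` to a 2-D array. The names `vectorized`, `looped` and `grid` are illustrative.

```python
import math
import numpy as np

arr = np.arange(0, 5)

# A ufunc applies the operation to every element in one call...
vectorized = np.sqrt(arr)

# ...which matches an explicit Python loop over the elements
looped = np.array([math.sqrt(x) for x in arr])
print(np.allclose(vectorized, looped))  # True

# ufuncs keep the input's shape, so they work on 2-D arrays too
grid = np.arange(1, 7).reshape(2, 3)
print(np.log(grid).shape)  # (2, 3)
```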

For more detailed information on the various methods and functions of NumPy, check the official NumPy documentation.

Refer to the notebook for code here.
