Sorting is one of the most commonly used operations in any programming language, be it SAS, R or Python. Sorting is easy to understand, but applying it correctly can sometimes be subtle. Here is a quick example of what I mean. Suppose you need to sort an array. You would probably reach for the ndarray.sort() method, and it would do the job, but it would also modify the original array, because it sorts in place. That is a real concern in a professional setting, where one usually does not want to make permanent changes to the original data.
This blog is a short, quick overview of a few sorting techniques available in NumPy. First, a simple example of the ndarray.sort() method, which sorts an array in place instead of producing a new array.
import numpy as np
from numpy import random
# Generate a random array and sort it in place with ndarray.sort()
array = random.rand(5)
array.sort()
array
#output:
array([0.03034467, 0.06341531, 0.18354663, 0.65587457, 0.73773692])
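One pitfall worth spelling out is that ndarray.sort() works in place, so it returns None. Writing `sorted_arr = arr.sort()` therefore silently loses the result. The minimal sketch below uses a small hand-picked array. The values are illustrative and do not come from the random example above.

```python
import numpy as np

# A small fixed array so the result is reproducible (illustrative values)
a = np.array([3.0, 1.0, 2.0])

# ndarray.sort() sorts the array in place and returns None, not the sorted array
result = a.sort()

print(result)  # None
print(a)       # [1. 2. 3.]
```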
Keep in mind that sorting in place also changes the original data when the array being sorted is a view on another ndarray. Here is an example. The code below generates and prints an array of random numbers, and then sorts only the second column, array[:, 1]. That column slice is a view of array, and ndarray.sort() sorts in place without making a copy. As a result, printing array afterwards shows that the original has been modified: the second column is now sorted and no longer lines up with the other values in each row.
# Created array of random values and printed
array = random.rand(4,3)
array
#output:
array([[0.89938883, 0.68389035, 0.10326969],
[0.5449251 , 0.53512185, 0.83822619],
[0.12357088, 0.18909243, 0.70470795],
[0.21587635, 0.83896895, 0.33216934]])
#Sorted only the second column values in place
array[:,1].sort()
array
#output:
array([[0.89938883, 0.18909243, 0.10326969],
[0.5449251 , 0.53512185, 0.83822619],
[0.12357088, 0.68389035, 0.70470795],
[0.21587635, 0.83896895, 0.33216934]])
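To make the view behaviour concrete, the sketch below uses a small fixed array with illustrative values. It uses np.shares_memory to confirm that a column slice is a view, and then shows two ways to get a sorted column without touching the original: numpy.sort, or an explicit .copy() before sorting. The last step sorts the view itself and reproduces the problem described above.

```python
import numpy as np

# Small fixed 3x2 array (illustrative values)
data = np.array([[9, 3],
                 [8, 1],
                 [7, 2]])

col = data[:, 1]                     # basic slicing returns a view, not a copy
print(np.shares_memory(col, data))   # True: col and data share the same buffer

# Option 1: np.sort returns a sorted copy; data is untouched
sorted_col = np.sort(data[:, 1])
print(sorted_col)                    # [1 2 3]
print(data[:, 1])                    # [3 1 2]

# Option 2: copy the slice first, then sort the copy in place
col_copy = data[:, 1].copy()
col_copy.sort()
print(col_copy)                      # [1 2 3]
print(data[:, 1])                    # still [3 1 2]

# Sorting the view itself writes the sorted values back into data
col.sort()
print(data)
# [[9 1]
#  [8 2]
#  [7 3]]
```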
To avoid this issue, use numpy.sort instead. It returns a new sorted copy of the array and leaves the original unchanged.
# Created array of random values and printed
arr=random.rand(8)
arr
#output:
array([0.34900169, 0.96101593, 0.83975277, 0.21423816, 0.42274396,
0.82977306, 0.5146234 , 0.76650139])
# used np.sort function to avoid making changes in the original array
np.sort(arr)
#output:
array([0.21423816, 0.34900169, 0.42274396, 0.5146234 , 0.76650139,
0.82977306, 0.83975277, 0.96101593])
# Print the original which is not modified
arr
#output:
array([0.34900169, 0.96101593, 0.83975277, 0.21423816, 0.42274396,
0.82977306, 0.5146234 , 0.76650139])
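numpy.sort also handles multi-dimensional input. By default it sorts along the last axis (axis=-1), and with axis=None it flattens the array before sorting. In both cases the original is left as it was. The sketch below shows this on a small fixed 2x2 array with illustrative values.

```python
import numpy as np

m = np.array([[3, 1],
              [2, 4]])

flat_sorted = np.sort(m, axis=None)   # axis=None flattens the array, then sorts
print(flat_sorted)                    # [1 2 3 4]

rows_sorted = np.sort(m)              # default axis=-1: each row is sorted independently
print(rows_sorted)
# [[1 3]
#  [2 4]]

print(m)                              # the original array is unchanged
# [[3 1]
#  [2 4]]
```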
Both ndarray.sort() and numpy.sort also accept an axis argument, which sorts each slice of the data independently along the chosen axis.
# Created array1 of random values and printed
array1 = random.rand(3,5)
array1
#output
array([[0.2410326 , 0.59743713, 0.45146179, 0.6153204 , 0.8049864 ],
[0.32326701, 0.19897843, 0.96835549, 0.60135318, 0.65751711],
[0.70769511, 0.42551675, 0.53747005, 0.94710747, 0.60169363]])
# axis=0 sorts each column independently, in place. Try axis=1 or axis=-1 (the default; the same axis for a 2-D array) to sort each row instead
array1.sort(axis=0)
array1
#output:
array([[0.2410326 , 0.19897843, 0.45146179, 0.60135318, 0.60169363],
[0.32326701, 0.42551675, 0.53747005, 0.6153204 , 0.65751711],
[0.70769511, 0.59743713, 0.96835549, 0.94710747, 0.8049864 ]])
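Random values make it hard to see which numbers moved, so here is the same idea on a small fixed array with illustrative values. It puts axis=0 (sort down each column) next to axis=1 (sort across each row). np.sort is used so that the input stays unchanged between the two calls.

```python
import numpy as np

grid = np.array([[5, 2, 8],
                 [1, 9, 3]])

by_column = np.sort(grid, axis=0)   # sort down each column
by_row = np.sort(grid, axis=1)      # sort across each row (same as axis=-1 for 2-D)

print(by_column)
# [[1 2 3]
#  [5 9 8]]
print(by_row)
# [[2 5 8]
#  [1 3 9]]
```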
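Quick Tip: NumPy's sort has no descending option. To get descending order, reverse the sorted array along the axis you sorted. array1 was sorted with axis=0 (each column ascending), so reverse its rows with array1[::-1]. For an array sorted along the last axis, use arr[:, ::-1] instead.
# Quick tip using the previously sorted array1: reversing the rows puts each column in descending order
array1[::-1]
#output:
array([[0.70769511, 0.59743713, 0.96835549, 0.94710747, 0.8049864 ],
[0.32326701, 0.42551675, 0.53747005, 0.6153204 , 0.65751711],
[0.2410326 , 0.19897843, 0.45146179, 0.60135318, 0.60169363]])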
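A slightly more general sketch of descending sorts follows, using a small fixed array with illustrative values. The rule is to sort ascending along an axis and then reverse that same axis. For signed numeric data, negating before and after sorting gives the same result. The negation trick is a common idiom rather than a NumPy feature, and it does not suit unsigned integers, which wrap around when negated.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8]])

# Descending along rows (last axis): sort ascending, then reverse the last axis
desc_rows = np.sort(a, axis=1)[:, ::-1]
print(desc_rows)
# [[3 2 1]
#  [9 8 7]]

# Descending along columns (axis 0): sort ascending, then reverse axis 0
desc_cols = np.sort(a, axis=0)[::-1, :]
print(desc_cols)
# [[9 7 8]
#  [3 1 2]]

# For signed numeric data, negating before and after sorting also works
desc_rows_neg = -np.sort(-a, axis=1)
print(np.array_equal(desc_rows, desc_rows_neg))  # True
```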
argsort and lexsort, also known as indirect sorts, return the indices that would sort the data rather than the sorted data itself. They are commonly used when a dataset has to be reordered by one or more variables (keys). For example, sorting clinical data first by patient last name and then by first name uses two keys, last name and first name. argsort returns an array of integer indices that would sort an array, and indexing the array (or any parallel array) with those indices gives the values in sorted order. lexsort is similar, except that it takes several key arrays and performs a lexicographic sort on them, treating the last key as the primary sort key.
# argsort
val=np.array([5,0,1,3,2])
val
#output
array([5, 0, 1, 3, 2])
# argsort returns the indices that would sort val
index = val.argsort()
index
#output
array([1, 2, 4, 3, 0])
# Now use the sorted index to create the sorted array
val[index]
# output
array([0, 1, 2, 3, 5])
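The main payoff of argsort is that the same indices can reorder other arrays that line up with the sorted one. The sketch below uses hypothetical patient IDs and ages, which are illustrative data and not from the original. Passing kind='stable' keeps tied ages in their original relative order.

```python
import numpy as np

# Hypothetical parallel arrays: patient IDs and their ages (illustrative data)
patient_ids = np.array(['P3', 'P1', 'P4', 'P2'])
ages = np.array([42, 35, 58, 35])

order = np.argsort(ages, kind='stable')  # stable: tied ages keep their original order
print(order)               # [1 3 0 2]
print(patient_ids[order])  # ['P1' 'P2' 'P3' 'P4']
print(ages[order])         # [35 35 42 58]
```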
# lexsort
F_name = np.array(['Pran', 'Shakti', 'Mogambo', 'Shakal'])
L_name = np.array(['Amar', 'Akhbar', 'Anthony', 'Prem'])
# lexsort treats the last key (L_name) as the primary key; F_name breaks ties
Sorter = np.lexsort((F_name, L_name))
Sorter
#output:
array([1, 0, 2, 3])
print([L_name[i] + "," + F_name[i] for i in Sorter])
#output:
['Akhbar,Shakti', 'Amar,Pran', 'Anthony,Mogambo', 'Prem,Shakal']
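The example above has no repeated last names, so the first-name key never decides anything. The sketch below uses made-up names that share last names, so the secondary key has to break ties. It also checks that lexsort gives the same order as two chained stable argsorts: sort by the secondary key first, then by the primary key.

```python
import numpy as np

# Illustrative names with repeated last names to show tie-breaking
first = np.array(['Ravi', 'Anil', 'Meena', 'Bina'])
last = np.array(['Shah', 'Shah', 'Iyer', 'Iyer'])

# lexsort sorts by the LAST key first: last name is primary, first name breaks ties
order = np.lexsort((first, last))
print(order)  # [3 2 1 0]
print([f"{last[i]},{first[i]}" for i in order])
# ['Iyer,Bina', 'Iyer,Meena', 'Shah,Anil', 'Shah,Ravi']

# Equivalent with two stable argsorts: sort by the secondary key, then by the primary key
step1 = np.argsort(first, kind='stable')
step2 = step1[np.argsort(last[step1], kind='stable')]
print(np.array_equal(order, step2))  # True
```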
Conclusion:
Choosing the appropriate sorting technique can make tedious, time-consuming data preprocessing and analysis easier and faster.