Analytic Functions
Analytic Functions help while working with complex data sets. So it is important to understand it correctly. Also, it helps to optimize the query.
lets see about these functions one by one first for that , lets create a table called employees with schema employee_id, job_title, employee_name, salary, salary_rank .
after inserting the values the table will look like the one below for now the rank column will be null as the values as rank will be generated using the functions
RANK FUNCTIONS:
RANK()
ROW NUMBER()
DENSE RANK()
NTILE()
RANK()
Here we use the Rank function to get the same rank for the rows having similar function based on the job_title and ordering it by salary descending order
Engineer with same salary gets the same rank ,HR with same salary gets same rank and the 2 manager get the rank 1 and the one based on the salary he gets rank 3 and the other 3 HR based on the salary gets rank 13 and for 2 same salaries rank would be 18. so here the order done by job title and ranking is based on the salary .
RANK() PARTITION BY
In below query i have used PARTITION BY function it performs the calculation for each group (job title) descending order of the salary .each subset would get the ranking based on the salary
below example ,developer gets 3 different salary it is been ranked 1,2 and 3 but the testers have 2 different salary where 3 testers has same and another tester has different salary here the ranking is 1,1,1,4 .coming to HR they have 3 different salaries and they are ranked accordingly.
DENSE RANK()
Here overall ranking is given based on the job title and salary descendingly , based the duplicate value the ranking varies
in the below output ,for same duplicate values same rank will be there in the below example Developer has 3 different values so different rank is given,receptionist has the same 2 values and gets 10 th rank HR gets 3 different values and the ranking is based on that .
DENSE_RANK OVER PARTITION BY
when we use PARTITION BY along with DENSE_RANK() same like before i order by salary and partition by job title so the ranking work be based on the salary among the job
ROW NUMBER ()
Here ,order by salary and the unique sequential number is generated as the as the salaryrank. Rows having the same value also gets the different rank.
NTILE() - ranking will be based on how many group of results specified.
tool divides the row into 3 buckets each .
in the query i have give NTILE(3) where it partition and divides into 3 groups or buckets
while using the PARTITION BY along with NTILE
here the result of rank is like using the job title it partitioned among the jobtitle, the ranking is given for developer is 1,2,3 where as when the partition is not used in previous case it was like on the whole the ranking was done into 3 buckets.
UNION
Unions helps to combine two tables together, but it has the condition like number of column in each data table must match , also it does not require common key ,requires only the attributes, and while merging it will ignore the duplicates.
Example :
lets consider a product table :
here i have 2 queries like collecting 5 products with highest price and 5 highest products with price /weight ratio and using union
when I run the queries separately i get the below results
when we join 2 queries with the function UNION it takes the result of one query and joins the other automatically
if union keyword checks for an identical row in both the list there it gets joined it removes the duplicate i the above table we have the row called incredible granite mouse department home in both union removes that , we need the duplicate rows also in some cases in that UNION ALL keyword can me used so it will display duplicate rows also.
INTERSECT
when we use INTERSECT instead of UNION or UNION ALL in the above query
it will keep the rows in common to both queries.
(SELECT * FROM products
ORDER BY price DESC
LIMIT 5)
INTERSECT
(SELECT * FROM products
ORDER BY price/weight DESC
LIMIT 5);
Incase we use the INTERSECT ALL keyword we would except the the common row to be displayed twice as it has same id but we still get the same result,from both queries it takes either form the below oneor from the above one .
EXCEPT
Lets assume the same query when we use EXCEPT keyword insted of INTERSECT out there
it finds the rows that are present in first query but not second query, removes the duplicate
lets see the screenshot of the queries result when run separately and the analyse the result of EXCEPT
note: this is the result of the 1st query
this is the result of second query
here the marked ones are only common ,it removes that also it completely removes the second table ,it simply find the rows of first query but not the second query below is the result of that
RESULT of EXCEPT
to be get understood better lets inter change the queries
it will consider the price /weight table as first query this time and gives the result without second table now
CONCLUSION:
RANK() - specifies the rank for each rows.
DENSE_RANK() -in case there are duplicate values, different ranks will be assigned for that .
ROW_NUMBER () - rank will be different for same values in unique sequential numbers.
NTILE() - rank will be based the grouping mentioned as numbers
UNION: Combines 2 tables with the attribute, removes the duplicate
UNION ALL: Join together results of two queries
INTERSECT: Shows only the common rows ,removes duplicate
EXCEPT: Finds row first query not in second ,Generate unique records removes duplicate.
EXCEPT ALL: Finds the rows from first query not in second query