Oct 15 min read

Exploring the Subtle Differences in SQL Statements and Functions: A Beginner's Guide

DISTINCT VS GROUP BY

The DISTINCT statement retrieves unique values by removing duplicate rows from the result set based on the specified columns.

SELECT DISTINCT location_id, department_name

FROM departments;

The GROUP BY clause groups rows based on shared values in specified columns. It is typically used with aggregate functions to compute measures for each group. When no aggregate function is present, it yields the same result as DISTINCT and can filter out duplicates during the grouping process, depending on the query.

SELECT location_id, department_name,

COUNT (*) AS Count

FROM departments

GROUP BY location_id,department_name;

WHERE VS HAVING

The WHERE clause filters records based on specified condition and appears before the GROUP BY Clause.

It cannot be used with aggregate functions in the SQL Statement.

SELECT location_id,department_name,

COUNT (*) AS department_Count

FROM departments

WHERE department_name = 'IT'

GROUP BY location_id,department_name;

The HAVING clause is commonly used with the GROUP BY clause to filter groups based on specified conditions, and it can include aggregate functions in the SELECT statement.

SELECT location_id,department_name,

COUNT (*) AS department_Count

FROM departments

GROUP BY location_id,department_name

HAVING COUNT (department_name) > 2;

JOINS /UNION / UNION ALL

The JOINS combine all records and may include duplicates into new columns from multiple tables based on related columns.

SELECT department_name,city,country_name

FROM departments d

JOIN locations l

ON l.location_id = d.location_id

JOIN countries c

ON l.country_id = c.country_id;

The UNION Operator concatenates result sets from two or more SELECT statements, combining the records into new distinct rows.

SELECT country_name

FROM countries

UNION

SELECT country_name

FROM new_countries;

In Contrast, The UNION ALL Operator concatenates result sets from multiple SELECT statements while including duplicate records.

SELECT country_name

FROM countries

UNION ALL

SELECT country_name

FROM new_countries;

BETWEEN / IN / EXISTS

The BETWEEN Operator selects a range of data between two values (begin and end values).

SELECT country_name FROM countries

WHERE region_id

BETWEEN 1 AND 3;

Whereas the IN Operator specifies multiple values in the WHERE Clause.

SELECT country_name FROM countries

WHERE region_id IN (1, 2, 3);

The EXISTS Operator returns True or False based on the existence of the records in a subquery.

SELECT country_name FROM countries c

WHERE EXISTS (SELECT * FROM regions r WHERE c.region_id=r.region_id);

DROP/ DELETE /TRUNCATE

The DROP command is a part of Data Definition Language (DDL), removes a table from the database.

DROP TABLE regions;

The DELETE command is a part of Data Manipulation Language (DML), removes one or more records from a table.

DELETE FROM regions WHERE region_id = 2;

The TRUNCATE command, also a DDL operation, deletes all rows from an existing table.

TRUNCATE TABLE regions;

COALESCE VS NULLIF

The COALESCE () function returns the first non-null value from a list of arguments. While it can accept multiple arguments, they must all be of the same data type.

SELECT COALESCE(NULL,NULL,'US','UK') AS Country;

On the other hand, The NULLIF () function takes two values and returns the first value unless the two values are equal, in which case it returns NULL.

SELECT NULLIF ('US','US') AS Country;

AGGREGATE VS WINDOW FUNCTIONS

SQL aggregate functions are essential for performing calculations on groups of data.

AVG () computes the average of a numeric column.

SELECT ROUND (AVG (min_salary),2) FROM jobs;

SUM () adds up all the values in a numeric column.

SELECT ROUND(SUM(max_salary),2) AS Total_Max_salary FROM jobs;

MAX () retrieves the highest value in a specified column.

SELECT ROUND(MAX(max_salary),2) AS Highest_Max_Salary FROM jobs;

MIN () finds the lowest value in a specified column.

SELECT ROUND(MIN(max_salary),2) AS Lowest_Min_Salary FROM jobs;

COUNT () counts the number of rows in a result set.

SELECT COUNT(*) FROM jobs;

The above functions can be used with GROUP BY to perform calculations on subsets of data as well.

In SQL, WINDOW FUNCTIONS operate on a set of rows called a window frame. They return a single value for each row from the underlying query. A window function allows you to perform calculations across a set of rows related to the current row, and defining that set is done with the OVER () clause.

We can use AGGREGATE FUNCTION in window functions.

SELECT

job_id, min_salary,

SUM(min_salary) OVER (ORDER BY job_id) AS running_total

FROM jobs;

The PARTITION BY clause in a window function allows to define distinct groups (or partitions) within the result set.

SELECT

job_id, max_salary,

ROUND(AVG(max_salary) OVER (PARTITION BY job_id)::numeric,2) AS avg_max_salary

FROM jobs;

The ORDER BY clause within the OVER () clause sorts the result set in ascending or descending order.

SELECT

job_id,

max_salary,

RANK () OVER (ORDER BY max_salary DESC) AS max_salary_rank

FROM jobs;

The ROW_NUMBER () window function assigns a unique sequential integer to each row within a partition of a result set, starting at 1 for the first row. This numbering is based on the order specified in the ORDER BY clause within the OVER () statement.

SELECT

job_id,

max_salary,

ROW_NUMBER () OVER (ORDER BY max_salary DESC) AS max_salary_row_num

FROM jobs;

Both RANK () and DENSE_RANK () is used for ranking rows within a partition, but they handle ties (duplicate values) differently compared to ROW_NUMBER ()

In RANK () When there are identical values, they receive the same rank. If two rows are tied for rank 1, the next rank will be 3.

SELECT

job_id,

min_salary,

RANK() OVER (ORDER BY min_salary DESC) AS min_salary_rank

FROM jobs;

In Contrast, DENSE_RANK () identical rows receive the same rank and the next rank after ties will continue sequentially. So, if two rows are tied for rank 1, the following row will be ranked 2.

SELECT

job_id,

min_salary,

DENSE_RANK() OVER (ORDER BY min_salary DESC) AS min_salary_rank

FROM jobs;

The NTILE () function is a powerful window function in SQL that divides the result set into n buckets (or tiles) and assigns a bucket number to each row. This is particularly useful for identifying percentiles, quartiles, or any other subdivision of your data.

SELECT

job_id,

max_salary,

NTILE(4) OVER (ORDER BY max_salary) AS max_salary_quartile

FROM jobs;

LAG () function allows you to access data from a previous row in the same result set. It’s useful for comparing the current row’s value with a preceding row's value. Conversely, LEAD () function allows you to access data from a following row. It’s helpful when you want to compare the current row’s value with a subsequent row's value.

SELECT job_id, min_salary,

LAG(min_salary) OVER (ORDER BY min_salary desc) AS previous_min_salary,

LEAD(min_salary) OVER (ORDER BY min_salary DESC) AS next_min_salary

FROM jobs

ORDER BY min_salary desc;

CONCLUSION

SQL is an invaluable tool for data analysis, but its vast array of statements and functions can be daunting for beginners. In this guide, we have explored some fundamental concepts that often lead to confusion. Understanding these distinctions is crucial for effective data manipulation. Happy querying!

Exploring the Subtle Differences in SQL Statements and Functions: A Beginner's Guide

Recent Posts