Photo by Olav Ahrens Røtne on Unsplash
The GROUP BY clause group the data in the rows by a specific column/multiple columns. It is often used along with aggregate functions like COUNT(), SUM(), AVG(), MIN(), MAX(). The GROUP BY clause helps in summarizing a large amount of data in a short time.
Here I have used the sample superstore database to describe the concept.
Query:
SELECT
customer_name,
SUM(sales)
FROM superstore
GROUP BY customer_name
Here, the GROUP BY clause returned the total amount of sales grouped by the customer name.
Now, lets see how to GROUP BY multiple columns.
Query:
SELECT
category,
sub_category,
SUM(sales)
FROM superstore
GROUP BY category, sub_category
ORDER BY 1, 2
This output showed the sales of unique category and sub_category combination.
Now, lets see how to get category and sub_category wise overall sales in a single query. Lets work on that by making individual grouping first and joining them.
Query 1: (category-wise sales)
SELECT
category,
SUM(sales)
FROM superstore
GROUP BY category
Query 2: (sub_category-wise sales)
SELECT
sub_category,
SUM(sales)
FROM superstore
GROUP BY sub_category
Query 3: (category and sub_category-wise sales)
SELECT
category,
sub_category,
SUM(sales)
FROM superstore
GROUP BY category, sub_category
order by 1, 2
To make all these groupings into one single query, we are using UNION. And while using UNION all queries should have same columns. Here, The first query has only category column, so we are adding NULL clause for sub_category . Similarly, we will add NULL clause for category in the second query and the order of columns should be same in all the queries.
Query: (1 & 2)
SELECT
category,
NULL as sub_category,
SUM(sales)
FROM superstore
GROUP BY category
UNION
SELECT
NULL as category,
sub_category,
SUM(sales)
FROM superstore
GROUP BY sub_category
ORDER BY 1, 2
This output shows category-wise and sub_category-wise sales separately. Lets join the third query along with this to see the overall sub_category and category wise sales and the category-sub_category combination sales.
Query: (1, 2 & 3)
SELECT
category,
NULL as sub_category,
SUM(sales)
FROM superstore
GROUP BY category
UNION
SELECT
NULL as category,
sub_category,
SUM(sales)
FROM superstore
GROUP BY sub_category
UNION
SELECT
category,
sub_category,
SUM(sales)
FROM superstore
GROUP BY category, sub_category
order by 1, 2
Here, we can see that this query has returned the unique combination of category-sub_category sales, the category-wise sales and the sub_category-wise sales.
However, this query using UNION is very complicated as it involves a lot of grouping. An easier method to return this output is by using GROUPING SETS clause.
GROUPING SETS:
GROUPING SETS is used when we want to group by different sets of columns.
Syntax:
GROUP BY
GROUPING SETS (Column 1, Column 2, ….)
Here, we need to give the combinations of GROUP BY category in the GROUPING SETS clause.
Query:
SELECT
category,
sub_category,
SUM(sales)
FROM superstore
GROUP BY
GROUPING SETS (
(category),
(sub_category),
(category, sub_category)
)
ORDER BY 1, 2
Here, we have received the same output with the simpler query.
Now, lets see about the two common types of GROUPING SETS: ROLLUP and CUBE. These have predefined grouping sets.
ROLLUP:
ROLLUP function creates a hierarchy between the columns. Order is very important in ROLLUP function because Column 1 will be the highest level of hierarchy. ROLLUP function is mostly used in generating sub-totals and grand totals.
Syntax:
GROUP BY
ROLLUP(Column 1, Column 2, Column 3)
This works like:
(Column1, Column 2, Column 3),
(Column 1, Column 2),
(Column 1),
()
Here, ROLLUP function makes 4 grouping sets of these 3 columns.
In this case, () gives the overall total of the grouping set, the next level of hierarchy gives the overall total of Column 1, the next level gives the sub_total of Column 1 and Column 2 and the lowest level gives the individual value of that column.
Now, lets see this with an example of year, quarter, month and day-wise sales .
Query:
SELECT
EXTRACT(year from order_date) as year,
'Q' || EXTRACT(quarter from order_date) as quarter,
EXTRACT(month from order_date) as month,
DATE(order_date) as day,
SUM(sales)
FROM superstore
GROUP BY
ROLLUP (year, quarter, month, day)
ORDER BY 1,2,3,4
I will share the output of this query in parts.
The first part shows the daywise sales of the 1st month (January) of 2018 and the overall sales of January, 2018.
Here, we can see the overall sales of the march month and the overall sales of the 1st quarter of 2018.
In this part, we can see the overall sales of the December month of 2018, the overall sales of the last quarter of 2018 and the overall sales of 2018.
Likewise, it has returned the total sales per day, month, quarter and year for all the years (2018-2021) available in the dataset.
This last part of the output shows the total sales of December 2021 , the total sales of the 4th quarter of 2021, the total sales of 2021 and the overall sales of all 4 years (2018-2021).
ROLLUP function is mostly used in cases of natural hierarchy like
Year, Quarter, Month, Day
Country, Region, State, City
CUBE:
CUBE function in GROUP BY clause will give all possible combinations of columns. CUBE function is mostly used in cases where we want analyze data based on multiple perspectives.
Syntax:
GROUP BY
CUBE(Column 1, Column 2, Column 3)
This works like:
(Column1, Column 2, Column 3),
(Column 1, Column 2),
(Column 2, Column 3),
(Column 1, Column 3),
(Column 1),
(Column 2),
(Column 3),
()
Here, CUBE function makes 8 possible grouping sets of these 3 columns.
Query:
SELECT
customer_name,
EXTRACT(year from order_date)as year,
TO_CHAR(order_date, 'month') as month,
SUM(sales)
FROM superstore
GROUP BY
CUBE(customer_name, year, month)
ORDER BY 1,2,3
Lets look at the purchase details of customer 'Adrian Barton'. The CUBE function has first returned the purchase amount of the month of December(2018) and the overall purchase details of 2018. Likewise, it has reported purchase details for all the individual months of the year, the individual years, the total purchase for the months of all 4 years (For eg. if you check row number 117, it has given the total amount spent in the month of December 11000.4 (=1107.66 + 9892.74) and the overall purchase amount made by the customer 'Adrian Barton'.