top of page
hand-businesswoman-touching-hand-artificial-intelligence-meaning-technology-connection-go-

Grouping Sets in PostgreSQL


The GROUP BY clause group the data in the rows by a specific column/multiple columns. It is often used along with aggregate functions like COUNT(), SUM(), AVG(), MIN(), MAX(). The GROUP BY clause helps in summarizing a large amount of data in a short time.

Here I have used the sample superstore database to describe the concept.


Query:

 

SELECT

              customer_name,

              SUM(sales)

              FROM superstore

GROUP BY customer_name



Here, the GROUP BY clause returned  the total amount of sales grouped by the customer name.

 

Now, lets see how to GROUP BY multiple columns.


Query:


SELECT

              category,

              sub_category,

              SUM(sales)

              FROM superstore

GROUP BY category, sub_category

ORDER BY 1, 2

 


 

This output showed the sales of unique category and sub_category combination.

Now, lets see how to get category and sub_category wise overall sales in a single query. Lets work on that by making individual grouping first and joining them.


Query 1: (category-wise sales)


SELECT

              category,

              SUM(sales)

FROM superstore

GROUP BY category



 

Query 2: (sub_category-wise sales)

SELECT

              sub_category,

              SUM(sales)

FROM superstore

GROUP BY sub_category



 

Query 3: (category and sub_category-wise sales)

SELECT

              category,

              sub_category,

SUM(sales)

FROM superstore

GROUP BY category, sub_category

              order by 1, 2



 

To make all these groupings into one single query, we are using UNION. And while using UNION all queries should have same columns. Here, The first query has only category column, so we are adding NULL clause for sub_category . Similarly, we will add NULL clause for category in the second query and the order of columns should be same in all the queries.

 

Query: (1 & 2)

SELECT

              category,

              NULL as sub_category,

              SUM(sales)

FROM superstore

GROUP BY category

UNION

SELECT

              NULL as category,

              sub_category,

              SUM(sales)

FROM superstore

GROUP BY sub_category

ORDER BY 1,  2


This output shows category-wise and sub_category-wise sales separately. Lets join the third query along with this to see the overall sub_category and category wise sales and the category-sub_category combination sales.

 

Query: (1, 2 & 3)

SELECT

              category,

              NULL as sub_category,

              SUM(sales)

FROM superstore

GROUP BY category

UNION

SELECT

              NULL as category,

              sub_category,

              SUM(sales)

FROM superstore

GROUP BY sub_category

UNION

SELECT

              category,

              sub_category,

SUM(sales)

FROM superstore

GROUP BY category, sub_category

              order by 1, 2

 



  Here, we can see that this query has returned  the unique combination of category-sub_category sales, the category-wise sales and the sub_category-wise sales.

However, this query using UNION is very complicated as it involves a lot of grouping. An easier method to return this output is by using GROUPING SETS clause.


GROUPING SETS:


GROUPING SETS is used when we want to group by different sets of columns.

 

Syntax:


GROUP BY

GROUPING SETS (Column 1, Column 2, ….)

 

Here, we need to give the combinations of GROUP BY category in the GROUPING SETS clause.


Query:


SELECT

category,

sub_category,

SUM(sales)

FROM superstore

GROUP BY

GROUPING SETS (

             (category),

              (sub_category),

              (category, sub_category)

)

ORDER BY 1, 2

 


Here, we have received the same output with the simpler query.

 

Now, lets see about the two common types of GROUPING SETS: ROLLUP and CUBE. These have predefined grouping sets.


ROLLUP:


ROLLUP function creates a hierarchy between the columns. Order is very important in ROLLUP function because Column 1 will be the highest level of hierarchy. ROLLUP function is mostly used in generating sub-totals and grand totals.

 

Syntax:


GROUP BY

ROLLUP(Column 1, Column 2, Column 3)

 

This works like:

 

(Column1, Column 2, Column 3),

(Column 1, Column 2),

(Column 1),

()


Here, ROLLUP function makes 4 grouping sets of these 3 columns.

In this case, () gives the overall total of the grouping set, the next level of hierarchy gives the overall total of Column 1, the next level gives the sub_total of Column 1 and Column 2  and the lowest level gives the individual value of that column.

Now, lets see this with an example of year, quarter, month and day-wise sales .


Query:


SELECT

EXTRACT(year from order_date) as year,

'Q' ||  EXTRACT(quarter from order_date) as quarter,

EXTRACT(month from order_date) as month,

              DATE(order_date) as day,

SUM(sales)

FROM superstore

GROUP BY

ROLLUP (year, quarter, month, day)

ORDER BY 1,2,3,4


I will share the output of this query in parts.



The first part shows the daywise sales of the 1st month (January) of 2018 and the overall sales of January, 2018.



Here, we can see the overall sales of the march month and the overall sales of the 1st quarter of 2018.

 

 


In this part, we can see the overall sales of the December month of 2018, the overall sales of the last quarter of 2018 and the overall sales of 2018.

Likewise, it has returned the total sales per day, month, quarter and year for all the years (2018-2021) available in the dataset.

 








This last part of the output shows the total sales of December 2021 , the total sales of the 4th quarter of 2021, the total sales of 2021 and the overall sales of all 4 years (2018-2021).


ROLLUP function is mostly used in cases of natural hierarchy like

  • Year, Quarter, Month, Day

  • Country, Region, State, City



CUBE:


CUBE function in GROUP BY clause will give all possible combinations of columns. CUBE function is mostly used in cases where we want analyze data based on multiple perspectives.


Syntax:


GROUP BY

CUBE(Column 1, Column 2, Column 3)

 

This works like:

 

(Column1, Column 2, Column 3),

(Column 1, Column 2),

(Column 2, Column 3),

(Column 1, Column 3),

(Column 1),

(Column 2),

(Column 3),

()

 

Here, CUBE function makes 8 possible grouping sets of these 3 columns.


Query:


SELECT

customer_name,

EXTRACT(year from order_date)as year,

TO_CHAR(order_date, 'month') as month,

SUM(sales)

FROM superstore

GROUP BY

CUBE(customer_name, year, month)

ORDER BY 1,2,3


Lets look at the purchase details of customer 'Adrian Barton'. The CUBE function has first returned the purchase amount of the month of December(2018) and the overall purchase details of 2018. Likewise, it has reported purchase details for all the individual months of the year, the individual years, the total purchase for the months of all 4 years (For eg. if you check row number 117, it has given the total amount spent in the month of December 11000.4 (=1107.66 + 9892.74) and the overall purchase amount made by the customer 'Adrian Barton'.



7 views0 comments

Recent Posts

See All

Comments

Rated 0 out of 5 stars.
No ratings yet

Add a rating
bottom of page