Common Table Expressions (CTE) in PostgreSQL

In this blog, we will see about CTEs in PostgreSQL.

What is CTE?

The Common Table Expressions (CTE) were introduced into standard SQL in order to simplify various classes of SQL Queries for which a derived table was just unsuitable. A Common Table Expression (CTE) is the result set of a query which exists temporarily and for use only within the context of a larger query. Much like a derived table, the result of a CTE is not stored and exists only for the duration of the query. Each auxiliary statement in a WITH clause can be a SELECT, INSERT, UPDATE, or DELETE; and the WITH clause itself is attached to a primary statement that can be a SELECT, INSERT, UPDATE, DELETE, or MERGE.

Basic syntax for CTE:

Image source: Google

Description of the syntax:

WITH clause: indicates that the following code segment is common table expression (CTE).
CTE name: Specify the name of the CTE. Scope of the CTE is within the query. CTE name should be unique within the query in which it is used.
Column List (optional): Specify the list of column names within the parentheses after the CTE name. Column names from SELECT statement inside the CTE will be taken if not specified.
AS keyword: Indicates the beginning of the CTE definition.
CTE query: CTE will be defined by this query which may include main SQL clauses likes GROUP BY and other valid SQL constructs.
Main query: After defining the CTE, you can reference it in the main query by its name. In the main query, you can use the CTE as if it were a regular table, simplifying the structure of complex queries.

Next will see some examples for CTE. Examples are based on the Maternal Health database.

Basic PostgreSQL CTE

In the following example, we have a CTE named gestational_dm and the code is to get the patient count with gestational_diabetes_mellitus=1, also how much is the percentage of patients with gdm and among that what percentage has complications in delivery. Outside the CTE, the column names are directly returned from CTE.

WITH gestational_dm AS (
SELECT 
SUM(CASE WHEN gestational_diabetes_mellitus=1 THEN 1 ELSE 0 END) AS gdm_count,
ROUND(SUM(CASE WHEN gestational_diabetes_mellitus=1 THEN 1 ELSE 0 END)*100.0/272,2) AS gdm_percent,
SUM(CASE WHEN fhr.gestational_diabetes_mellitus=1 AND hl.delivery_mode='Cesarean Section' THEN 1 ELSE 0 END) AS gdm_complex_count
FROM hospitalization_labor hl
	JOIN fetal_health_risk fhr
	ON hl.caseid=fhr.caseid
)
SELECT gdm_percent,
	   gdm_count,
	   gdm_complex_count,
	   ROUND(((gdm_complex_count*100.0)/gdm_count),2) AS percent_with_gdm_with_del_complex
FROM gestational_dm;

Example for joining a CTE with a table

In the below example, the CTE has the logic to find out the patient ID with diabetes and who had C-Section delivery. That CTE is joined outside with another table prior_gestational_health to get the patients with more than 2 deliveries.

WITH gestational_dm AS(
	SELECT
		hl.caseid AS patientID
	FROM hospitalization_labor hl
	JOIN fetal_health_risk fhr
	ON hl.caseid=fhr.caseid
	WHERE fhr.gestational_diabetes_mellitus=1 AND hl.delivery_mode='Cesarean Section'
)
SELECT patientID
FROM gestational_dm gd
JOIN prior_gestational_health pgh
ON gd.patientID=pgh.caseid
WHERE pgh.past_pregnancies_number>2;

Example for how to use multiple CTEs

The below example has two CTEs and in the outer SELECT those two CTEs are used.

-- This calculates the patient count with preeclamsia_record_pregnancy=1 and hospital_hypertension=1
WITH hypertension_preeclampsia AS
(
	SELECT COUNT(*) AS hypertension_count,
	SUM(CASE WHEN preeclampsia_record_pregnancy=1 THEN 1 ELSE 0 END) AS BP_preeclampsia_count
	FROM hospitalization_labor
	WHERE hospital_hypertension=1
),
-- This calculates the patient count with preeclamsia_record_pregnancy=1 and hospital_hypertension=0
no_hypertension_preeclampsia AS
(
	SELECT COUNT(*) AS hypertension_count,
	SUM(CASE WHEN preeclampsia_record_pregnancy=1 THEN 1 ELSE 0 END) AS BP_preeclampsia_count
	FROM hospitalization_labor
	WHERE hospital_hypertension=0
)
SELECT
	ROUND((A.BP_preeclampsia_count*100.0/A.hypertension_count),2) AS percent_with_bp_preeclampsia,
	ROUND((B.BP_preeclampsia_count*100.0/B.hypertension_count),2) AS precent_with_no_bp_preeclampsia,
	ROUND((((A.BP_preeclampsia_count*100.0/A.hypertension_count)-(B.BP_preeclampsia_count*100.0/B.hypertension_count))/
		  (B.BP_preeclampsia_count*100.0/B.hypertension_count)),2) AS percent_increase
FROM hypertension_preeclampsia A,no_hypertension_preeclampsia B;

Pros of using CTE:

Below are the main advantages of using CTE in PostgreSQL.

Simplicity: CTEs make complex queries easier to read and maintain by breaking them into logical, named blocks.
Reusability: CTEs can be referenced multiple times within a single query, reducing redundancy and improving query structure.
Recursive Queries: CTEs support recursive queries, allowing hierarchical data to be queried and manipulated efficiently.
Performance: PostgreSQL can optimize CTEs, potentially leading to improved query performance compared to using subqueries or temporary tables.
A useful property of WITH queries is that they are normally evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work.

Cons of using CTE:

Limited Scope: CTEs only exist within the context of the WITH clause in which they are defined. They cannot be referenced outside that query.
Memory Usage: WITH queries may consume more resources compared to alternative query structures. This is especially true when dealing with executing multiple CTEs simultaneously on large datasets.

Next will see what is a recursive CTE.

A recursive CTE allows you to perform recursion within a query using the WITH RECURSIVE syntax. A recursive CTE is referred as a recursive query.

Syntax for recursive query:

cte_name: Name of the CTE.
column1, column2, … Specify the columns selected in both the anchor and recursive members.
Anchor member: This forms the base result set of the CTE structure.
Recursive member: Refer to the CTE name itself. It combines with the anchor member using the UNION or UNION ALL operator.
recursive_condition: Is a condition used in the recursive member that determines how the recursion stops.

Recursive CTE is executed in the following sequence:

First, execute the anchor member to create the base result set (R0).
Second, execute the recursive member with Ri as an input to return the result set Ri+1 as the output.
Third, repeat step 2 until an empty set is returned. (termination check)
Finally, return the final result set that is a UNION or UNION ALL of the result sets R0, R1, … Rn.

A recursive CTE can be useful when dealing with hierarchical or nested data structures, such as trees or graphs.

Thank You for reading!

Common Table Expressions (CTE) in PostgreSQL

Recent Posts