Nov 59 min read

COMMON TABLE EXPRESSION (CTE)

What is CTE? Why do We need to use CTE?

A Common Table Expression (CTE) is a temporary result set in SQL that allows for better readability and organization of complex queries.

Purpose of the CTE?

Break Down Complex Queries: CTEs help simplify complicated queries by allowing you to break them into smaller, easier-to-understand parts. Instead of writing long and confusing nested subqueries, you can define each part separately and refer to them when needed.

Improve Readability: Since CTEs are declared at the beginning of a query, they make the code more organized and easier to read. Each CTE has a clear name, so it’s simpler to understand what each section does, which is helpful when multiple developers are working together.

Support Recursive Queries: Recursive CTEs allow a query to reference itself, which is useful for dealing with hierarchical data (like family trees or organizational structures). This means you can handle complex relationships without writing overly complicated code.

CTEs can be used with various SQL commands like SELECT, INSERT, UPDATE, and DELETE. This makes them a flexible option for different data manipulation tasks, whether you’re generating reports, updating data, or creating temporary datasets.

Key Points of the CTE

It is available only for the duration of the main query. It Acts like a temporary table for the main query.

CTE Syntax and structure

Basic CTE Syntax

WITH cte_name AS ( SELECT columns FROM tableWHERE condition)SELECT * FROM cte_name;

The WITH keyword starts the CTE. It signals that we’re creating a temporary result set, making the query clearer and more organized.
This CTE Name acts as a label for the temporary data, so it should be descriptive to enhance understanding.
The AS keyword defines the structure of the CTE. The actual query for the CTE is placed inside parentheses (e.g., (SELECT query…)), specifying the data we want.
Inside the parentheses, you write a standard SQL SELECT statement to choose the columns, tables, and any conditions needed to filter the data.
After defining the CTE, you can use it in your main query just like a regular table. For example, you can use SELECT * FROM cte_name; to access the results of the CTE and even join it with other tables.

Example and Explanation of Basic CTEs

Query to list movies with a rental duration longer than the average:

WITH avg_duration AS (

SELECT AVG(rental_duration) AS avg_duration FROM film )

SELECT title, rental_durationFROM film, avg_durationWHERE rental_duration > avg_duration;

This query finds all movies with a rental duration longer than the average rental duration by using a CTE to first calculate the average duration.

Syntax: WITH avg_duration AS ( … )

We create a CTE called avg_duration to store the average rental duration for movies.
Using a CTE makes it easy to separate the average calculation and use it later in the main query.
Inside the CTE, we calculate the average rental duration with:

SELECT AVG(rental_duration) AS avg_duration FROM film;

This gives us a single value, avg_duration, representing the average rental time for all movies.
After defining the CTE, we can reference it in the main query like a table.
The main query retrieves the movie titles and rental durations from the film table.
It uses both the film table and the avg_duration CTE in the FROM clause, allowing us to access the average duration.
The WHERE clause filters movies with a rental duration greater than the average:

WHERE rental_duration > avg_duration;

This leaves us with only the movies that exceed the average rental time.
The query results will display the titles and rental durations for each movie that has a longer rental duration than the average.

Output:

Recursive CTEs

Recursive CTEs allow you to repeat a process until a condition is met, making them useful for working with hierarchical or sequential data. They consist of two main parts: the anchor member (the starting point) and the recursive member (the part that keeps referencing the CTE itself).

Recursive CTE Syntax

WITH RECURSIVE cte_name AS ( 

— Anchor memberSELECT columnsFROM table

WHERE condition UNION ALL

— Recursive member

SELECT columnsFROM tableJOIN cte_name ON condition

) SELECT * FROM cte_name;

WITH RECURSIVE Clause:

This part introduces the recursive CTE and gives it a name (e.g., cte_name) for later reference in the query.
The anchor member is the base case for the recursion. Here, you define the initial dataset using a SELECT statement that specifies columns from a table and conditions to filter the data (like selecting top-level employees without managers). This part runs once to set the foundation for the recursion.
UNION ALL:

This operator combines the anchor and recursive members, allowing the recursion to repeat. It keeps all rows from both parts, helping to build the hierarchy or sequence step by step.
The recursive member references the CTE itself (e.g., cte_name) to perform further steps. This part typically involves joining the table with the CTE to find new rows based on what has already been processed (like finding employees who report to the previous level of managers). This process continues to build out the hierarchy.
There is an implied stop condition that tells PostgreSQL to stop executing the recursive part when no more rows are returned from the recursive step.
After defining the CTE, you can retrieve all the generated data with SELECT * FROM cte_name. This final output contains the complete result set, which can represent a hierarchy, sequence, or any structure built by the CTE.

Example of recursive CTEs

Find an employee hierarchy

WITH RECURSIVE employee_hierarchy AS (SELECT employee_id, full_name, manager_id, 1 AS level FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.employee_id, e.full_name, e.manager_id, eh.level + 1
FROM employees eJOIN employee_hierarchy eh ON e.manager_id =
eh.employee_id)
SELECT * FROM employee_hierarchy ORDER BY level;

For this example, We want to create a report showing each employee and their managers in a company hierarchy, from the bottom to the top. Recursive CTEs are perfect for handling this type of hierarchical data.

Recursive CTEs can reference themselves, allowing data processing in a loop-like manner. They consist of two main parts: the base case (starting point) and the recursive part (which continues the search).

The base case selects the top-level employees who have no managers. This is where the recursion starts.

WITH RECURSIVE employee_hierarchy AS (SELECT employee_id, name, manager_id, 1 AS level FROM employees WHERE manager_id IS NULL  
-- selects top-level employees
)

This query finds employees without managers and adds a level column to track hierarchy depth, starting at 1 for top-level employees.

The recursive part references the CTE itself to find the next level of employees.

UNION ALL SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
FROM employees e INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id

This part joins the employees table with the previous results (from the base case) to find employees who report to the previously selected managers.
Each recursive step increases the level by 1, indicating how deep we are in the hierarchy.
The UNION ALL combines the base case with the recursive steps.
Recursive CTEs need a stop condition to avoid infinite loops. PostgreSQL automatically stops the recursion when no new rows are returned or when it reaches a system limit (default is 100 iterations, which can be adjusted).

Execution Flow of the Query : Base Case Execution:

The query retrieves top-level employees without managers and assigns them level = 1.

Recursive Execution: For each employee found, the query retrieves their direct reports, increases their level by 1, and repeats this process until no new rows are added.
Stop Condition: The recursion stops when all levels of the hierarchy have been retrieved, either when there are no more employees to add or the maximum limit is reached.

Output:

Advanced CTE Usage in SQL Statements

INSERT with CTE:

Populate new tables based on CTE results.

— Step 1: Create a new table to store active customers

CREATE TABLE active_customers ( customer_id INT PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50), rental_count INT
);

— Step 2: Use CTE to identify recent customers and insert into active_customers

WITH recent_customers AS ( SELECT

c.customer_id,c.first_name,c.last_name,COUNT(r.rental_id) AS rental_count

FROM customer c

JOINrental r ON c.customer_id = r.customer_id

WHEREr.rental_date > CURRENT_DATE — INTERVAL ’26 years’ -- Last 26 years

GROUP BYc.customer_id, c.first_name, c.last_name)INSERT INTO active_customers (customer_id, first_name, last_name, rental_count) SELECT * FROM recent_customers;

In this example, we will use a Common Table Expression (CTE) to find active customers who have rented movies in the last 26 years and insert their information into a new table called active_customers. This method helps us handle complex conditions and grouping more efficiently.

Step 1: Create the Target Table

First, we need to create a table to store the active customers’ data:

CREATE TABLE active_customers (
customer_id INT PRIMARY KEY,first_name VARCHAR(50),last_name VARCHAR(50),rental_count INT);

This table includes columns for customer ID, first name, last name, and rental count.

Step 2: Define the CTE

Next, we define a CTE named recent_customers to find customers with recent rentals. We join the customer and rental tables, count the rentals for each customer, and group the results by customer ID and name.

WITH recent_customers AS (
SELECT c.customer_id, c.first_name, c.last_name,COUNT(r.rental_id) AS rental_count FROM customer c JOIN rental r ON c.customer_id = r.customer_id WHERE r.rental_date > CURRENT_DATE — INTERVAL ’26 years’GROUP BY c.customer_id, c.first_name, c.last_name
)

The WHERE clause filters rentals to include only those from the last 26 years.

Step 3: Insert CTE Results into the Target Table

Now that we have identified the recent customers, we can insert this data into the active_customers table:

INSERT INTO active_customers (customer_id, first_name, last_name, rental_count) SELECT * FROM recent_customers;

This statement selects all records from the recent_customers CTE and inserts them into the active_customers table.

Benefits:

Using a CTE makes the query more organized and easier to read. By separating the logic for identifying active customers, we can efficiently reuse that information for the insertion, simplifying the overall query management.

Output:

UPDATE with CTE

Update existing records based on calculations.

— Step 1: Add a rental_count column to the customer table (if it doesn’t already exist)

ALTER TABLE customerADD COLUMN rental_count INT DEFAULT 0;

— Step 2: Use a CTE to calculate rental counts and update the customer table

WITH rental_stats AS (

SELECTcustomer_id,COUNT(*) AS rental_count

FROM rental

GROUP BY customer_id )UPDATE customerSET rental_count = rental_stats.rental_countFROM rental_statsWHERE customer.customer_id = rental_stats.customer_id;

In this example, we’ll use a Common Table Expression (CTE) to update the rental count for each customer in the customer table. This method makes updating data easier by calculating values in a CTE before applying them in an UPDATE statement.

Step 1: Add the rental_count Column

First, we need to ensure the customer table has a column to store the rental count. We can add this column using the following SQL command:

ALTER TABLE customer ADD COLUMN rental_count INT DEFAULT 0;

This new rental_count column will start with a default value of 0 for all customers and will be updated based on their rental activity.

Step 2: Calculate Rental Counts Using a CTE

Next, we define a CTE called rental_stats to calculate how many rentals each customer has made:

WITH rental_stats AS (
SELECT customer_id, COUNT(*) AS rental_count FROM rental GROUP BY customer_id)

This CTE gathers the rental counts for each customer, grouping the results by customer_id.

Step 3: Update the Customer Table Using CTE Results

Finally, we use the UPDATE statement to set the rental_count in the customer table based on the values from the rental_stats CTE:

UPDATE customer
SET rental_count = rental_stats.rental_count
FROM rental_stats
WHERE customer.customer_id = rental_stats.customer_id;

This query updates each customer’s rental_count using the data from the rental_stats CTE.

Benefits

Using a CTE for this update process offers several advantages:

Efficiency: The rental count calculation is performed once, making the update more efficient.
Readability: Breaking the calculation into a CTE makes the query clearer and easier to troubleshoot.
Modularity: If we need to use the rental count data in other statements, the CTE allows for easy reuse.

Output:

Best Practices and Optimization Tips

Limit Recursion

Explanation: Recursive CTEs can run indefinitely without a stop condition.
Solution: Always include a condition to stop recursion, like a maximum recursion limit or a natural endpoint in the hierarchy (e.g., a top-level node with no parent).
Example: When building a hierarchy, ensure there’s a stop condition like MAXRECURSION to prevent infinite loops.

Avoid Overusing CTEs for Simple Queries

Explanation: CTEs are useful for complex queries but can slow down performance for simple tasks.
Guideline: Use regular SELECT or UPDATE statements instead of CTEs for straightforward operations.
Example: If you only need to filter or update a few rows, a simple SELECT or UPDATE might be more efficient than using a CTE.

Break Down Complex Queries with Multiple CTEs

Explanation: Splitting large queries into multiple CTEs can enhance readability and make troubleshooting easier.
Tip: Treat each CTE as a logical step in a process, like data cleaning, transformation, and aggregation.
Example: When creating a complex report, use separate CTEs for data cleaning, joining tables, and calculating summaries, each step being clear and manageable.

Use Temporary Tables for Performance-Intensive Operations

Explanation: Temporary tables can be more efficient than CTEs for large datasets accessed multiple times.
Guideline: If a CTE is used repeatedly or involves heavy calculations, consider using a temporary table to store results.
Example: If you frequently join a large table, loading it into a temporary table and indexing it can be faster than recalculating it with a CTE each time.

COMMON TABLE EXPRESSION (CTE)

What is CTE? Why do We need to use CTE?

Purpose of the CTE?

Key Points of the CTE

CTE Syntax and structure

Basic CTE Syntax

Example and Explanation of Basic CTEs

Output:

Recursive CTEs

Recursive CTE Syntax

Example of recursive CTEs

Execution Flow of the Query : Base Case Execution:

Output:

Advanced CTE Usage in SQL Statements

UPDATE with CTE

Best Practices and Optimization Tips

Recent Posts