top of page
Writer's pictureSruthi Sudhagaran

Back to basics - SQL Joins

SQL (Structured Query Language) is the core of relational database systems that helps in handling of data.

JOIN statement is used to combine rows from two tables based on a common column and selects records that have matching values in these columns. They are primarily used when a user needs to extract data from tables which have one-to-many or many-to-many relationships between them.


Why Joins?


SQL JOINS are used when user needs to retrieve data from multiple tables that are related by a common column or key. Common scenarios for using JOIN are as below:


1.Retrieving data from related tables: When you have data distributed across multiple tables and you need to combine it into a single result set for analysis or reporting.

2.Creating relationships: To establish connections between tables using keys or foreign keys, enabling you to retrieve data from one table based on data in another.

3.Normalizing data: When data is divided into multiple tables to avoid redundancy, JOINs help to reconstruct the data into a complete form when querying.

4.Combining data: When you need to bring together data from different sources or databases for a unified view.


Basic Types of SQL Joins


  • Inner Join

  • Full Outer Join

  • Left Join

  • Right Join


All concepts in this blog will be explained with example using PostgreSQL. The below tables will be used for the same.


Customers Table
Orders Table












Shipping Table

Inner Join

The SQL INNER JOIN statement joins two tables based on a common column and selects rows that have matching values in these columns.   

Inner join can be used when user needs results that have matching values in both tables.

INNER JOIN should be avoided when user wants to retrieve rows from one table that may or may not have corresponding rows in another table.


Syntax:


SELECT column1. table1,column2.table2  [columns from both tables]

FROM table1

INNER JOIN table2

ON table1.column1 = table2.column2


Example Query:


To display the Customers who have an Order, an Inner Join on Orders and Customers table is performed.


Customers Table
Orders Table













Inner Join

Inner Join Use Cases


Retrieving data from related tables: When you need to retrieve only common data from two tables into a single result set excluding the records that do not match.


Establishing relationships: Fetching only common primary key data between two tables.


Normalizing data:  Moving data into multiple tables helps avoid redundancy by excluding rows that do not have matching primary key values.


Combining data: Inner join brings together data from different sources by preserving only matching records.



Full Outer Join


The SQL FULL OUTER JOIN statement joins two tables based on a common column. It selects records that have matching values in these columns and the remaining rows from both of the tables.

This type of join includes rows in a result set even though there may not be a match between rows in the two tables being joined.

An outer join should be used when user wants to retrieve all records from both tables, regardless of whether there are matching records. This join combines the functionality of a Left join and a Right Join.


Syntax:


SELECT  FROM column1.table1,column2.table2 [Columns from both tables]

FULL OUTER JOIN table2

ON table1.column1 = table2.column2;


Example Query:


To preserve the data from both Customer and Orders Tables. Also display combined records from both these tables.




Customers Table
Orders Table












Full Outer Join

Full Outer Join Use cases


Merging data: When you need to combine data from two tables into a single result set while preserving all records.


Handling missing data: In scenarios where data might be missing or incomplete in one or both tables.


Comparing data: For data analysis, auditing, or quality control to compare and identify differences between two data sources.


Reporting exceptions: To identify and report data discrepancies or anomalies across tables.


Left Join


The SQL LEFT JOIN combines two tables based on a common column. It then selects records having matching values in these columns and the remaining rows from the left table.

In a join query, the left table is the one that appears leftmost in the JOIN clause, and the right table is the one that appears rightmost.

If there is a row in the left table but no match in the right table, then the associated result row contains NULL values for all columns coming from the right table.

A left join can be used to retrieve all rows from the left table and the matched rows from the right table, with NULL values for non-matching rows.


Syntax:


SELECT  column1. table1,column2.table2 [columns from both tables]

FROM table1

LEFT JOIN table2

ON table1.column1 = table2.column2


Example Query:


To view the Order status of customers in the Orders Table.


It is to be noted that the table name to the leftmost of the query is considered the Left Table and the rightmost table will be considered the right table. In this case Orders table is in the leftmost position and all records from Orders table needs to be fetched with the corresponding matching rows from Shipping Table. Records that don't have a matching ID will be populated as null values.


Orders Table

Shipping Table











Left Join

Right Join


The SQL RIGHT JOIN combines two tables based on a common column. It then selects records having matching values in these columns and the remaining rows from the right table.

In a join query, the right table is the one that appears rightmost in the JOIN clause, and the left table is the one that appears leftmost.

If there is a row in the right table but no match in the left table, then the associated result row contains NULL values for all columns coming from the left table.

A right join can be used to retrieve all rows from the right table and the matched rows from the left table, with NULL values for non-matching rows.


Syntax:


SELECT column1. table1,column2.table2 [columns from both tables]

FROM table1

RIGHT JOIN table2

ON table1.column1 = table2.column2


Example Query:


To find the Customers who have not placed any Orders.


In this case Orders table is in the leftmost position and all records from Orders table needs to be fetched with the corresponding matching rows from Customers Table. Records that don't have a matching ID will be populated as null values.


Customers Table

Orders Table









Right Join

Left or Right Join?


It is observed that Left and Right Joins function in the same manner but in exactly opposite ways. In other words, Left Join can be termed as the mirror image of Right join. So. in scenarios where one needs to choose between a Left and Right join, it has toe be remembered that a Left Join can be written as Right Join to give the same output given as the Left Join.


Example Query:


Considering the last example of Right Join shown above, this same Right Join query can be written as Left Join as shown below, by switching the position of the tables and changing Right Join to Left Join. This will return the same result.


So when it comes to Left or Right Join, the position of the table from which all records are required , along with the appropriate Left or Right Join clause can fetch the desired results either ways.


Handling Null values in Left or Right Join


When performing a LEFT OR RIGHT JOIN, it's common to encounter NULL values in the result set for rows in the join table that don't have a matching row in the corresponding table.


To handle these NULL values, we can use the COALESCE() function.


Syntax:


SELECT Table1Alias.Column1, COALESCE(Table2Alias.column2, 'No Orders') AS order_status

FROM Table1 AS Table1Alias

LEFT JOIN Table2 AS Table2Alias

ON Table1Alias.Column1 = Table2Alias.Column2

ORDER BY Table1Alias.Column1;


The COALESCE() function ensures that if the status column is NULL, it displays ‘No Orders’ . Also aliases are used

in queries as shown above, to shorten our Query and make it more easily readable.


Customers Table
Shipping Table











Left Join with Coalesce

Using Joins with WHERE clause


We can use the LEFT JOIN statement with an optional WHERE clause.


This helps in further filtering the data as per requirement.


Syntax:


SELECT Table1.Column1,Table2.Column2

FROM Table1

LEFT JOIN Table2

ON Table1.Coumn1 = Table2.Column2

WHERE Table2.Column1 >= 100;


Example Query:


Fetch all Customers from Customer table, who have Orders above the amount 100.




Customers Table
Orders Table











Left Join with "WHERE" clause

Left and Right Join Use cases


Displaying all items: When you want to display all items from the right table, even if there are no corresponding items in the left table.


Handling missing data: When you need to handle situations where data might be missing or incomplete in one of the tables.


Optional relationships: In scenarios where relationships between tables are optional, and you still want to display data from the secondary table.


Combining data: When you want to combine data from multiple tables while preserving all records from one of the tables.


Few additional types of Joins


  • Natural Join

  • Cross Join

  • Self Join


Natural Join


Natural Join in SQL refers to joining two or more tables based on common columns, which have the same name and data type. We do not need to specify the column used for joining two tables in natural join.

The associated tables have one or more pairs of identically named columns.

The columns must be the same data type.

Don’t use ON clause in a natural join.


Syntax:

SELECT * FROM TABLE1

NATURAL JOIN TABLE2.


Example Query:


Natural Join Orders with Shipping Table

Shipping Table
Orders Table












Natural Join

When Natural Join is performed on Shipping and Orders table, the matching column is automatically identified and the tables are combined based on that, without the need for an explicit ON clause in the query. It is observed that the cust_id column is displayed only once, even when this column is present in both the tables. In the above query, since only Natural Join is mentioned, it performs an inner Join by default. Incase other Joins need to be performed, it can be mentioned as "Natural Left Join", "Natural Right Join" and so on.


Self Join


A Self join is where a table is joined with itself.


It is useful when you have a table with a hierarchical relationship or when you need to compare rows within the same table.


Syntax:

SELECT * FROM table1 a, table1 b

WHERE a.common_field = b.common_field;


Example Query:


To create a comparison table of order amounts to identify customers who purchased more and less compared to each other. Here the Self Join compares Customers by their purchase amount to generate a comparison table, displaying corresponding Customer Ids and Amounts.


Orders Table

Self Join

Cross Join


CROSS JOINs are used to combine each row of one table with each row of another table.


 It returns the Cartesian product of the sets of rows from the tables that are joined.


Syntax:

SELECT ColumnName_1, ColumnName_2, ColumnName_N

FROM [Table_1]

CROSS JOIN [Table_2]


OR


SELECT ColumnName_1, ColumnName_2, ColumnName_N

FROM [Table_1],[Table_2]


Example Query:


Cross Join of Customers and Orders

Customers Table
Orders Table












Cross Join

Using Joins in Three Tables


We can use join in three tables.


In the below example the query returns the rows where there is a match between column values in both join conditions.


Syntax:


SELECT Table1Alias1.Coumn1, Table2Alias.Column2, Table3Alias.Column3

FROM Table1 AS Table1Alias

INNER JOIN Table2 AS Table2Alias

ON Table1Alias.Column1 = Table2Alias.Column2

INNER JOIN Table3 AS Table3Alias

ON Table1Alias.Column1  = Table3Alias.Column3;


Example Query:


To display the Customer who has a record in Customers Table, and has an Order placed with a Shipping Status. Here the three tables are Joined together using an Inner Join based on the Use case.


Customers Table
Orders Table
Shipping Table








Join of three tables

Limitations of Joins

Query readability

        Joins can make queries more difficult to read and understand, especially when using  multiple tables or different types of joins.

Database server workload

      More joins in a query can increase the amount of work the database server needs to do, which can make it take longer to retrieve data.

Unwanted rows or columns

     Joins can generate unnecessary or duplicate rows or columns, which can impact the size and quality of the result set.

Database schema and join condition

     Poorly structured or indexed tables or columns can slow down or prevent join operations from working. Incorrect joins can also lead to inaccurate query results.

Join type

     Different types of joins can make it difficult to determine which one to use to get the desired results. For example, natural joins rely on column names and data types to determine the join condition, which can lead to incorrect results if the column names or data types aren't consistent.

Dataset size

     Self-joins can be inefficient when working with large datasets. Full joins can also retrieve all rows from tables, which can lead to very large result sets.


Conclusion

Joins in SQL are very helpful components in extracting data based on user requirements. When used in the correct approach, Joins can prove to be a significant tool in reducing the load on the database and minimize query retrieval time when used effectively at the right context by understanding its limitations and advantages.


96 views

Recent Posts

See All
bottom of page