top of page
Writer's pictureSheba Alice Prathab

A Comprehensive Guide to Indexes in SQL: From Basics to Advanced Usage


Indexes are vital components in SQL databases, enabling faster data retrieval and improving query performance. This blog explores what indexes are, different indexing mechanisms, and delves into the default indexing mechanism. We will also cover various ways to create indexes, how to drop them when no longer needed, and understand the benefits of index usage through practical examples.



What is an Index?


  • An Index is a structure or object that helps retrieve specific rows or data faster by providing a shortcut to locate information efficiently.


  • Indexes are data structures built and assigned to an existing table. They analyze and summarize the table's data, creating paths for quicker access


Why Use Indexes?


Indexes significantly improve the performance of SELECT queries, especially those involving WHERE clauses, JOIN operations, and ORDER BY statements. However, there is a trade-off: while indexes speed up read operations, they can slow down INSERT, UPDATE, and DELETE operations because the indexes must be updated as well.


Role of Indexes in Databases


In SQL, when queries involve searching, sorting, or filtering large datasets, indexes serve as essential tools for optimization. Without indexes, a database engine must scan the entire table (a full table scan) to find the requested data, which is time-consuming as the dataset grows.


With an index, the database engine uses the pre-constructed data structure to locate rows faster.



Types of Indexing Mechanisms in SQL Server


There are several types of indexing mechanisms (PostgreSQL and MySQL), each suited to different use cases:


  • B-Tree Indexes: The most common type of index, which organizes data in a balanced tree structure to optimize search and range queries.


  • Hash Indexes: Primarily used for exact match queries, these indexes use a hash function to map values to a location in memory.


  • GiST (Generalized Search Tree): Used for indexing more complex data types, such as geometric or full-text data, allowing flexibility in data retrieval.


  • SP-GiST (Space-Partitioned GiST): Similar to GiST but optimized for partitioned data, typically used for non-balanced data types like spatial data.


  • GIN (Generalized Inverted Index): Useful for indexing composite data types like arrays or JSON, allowing for full-text searches and partial matching.


  • BRIN (Block Range INdexes): Efficient for large datasets with sequential data, as they store metadata for ranges of rows rather than individual rows, optimizing scan times on large tables.



Understanding B-tree Indexing: The Default Index Type in PL/SQL


In database systems like PL/SQL, the B-tree is the default indexing mechanism. B-trees provide an efficient way to organize and retrieve data, especially when working with large datasets.


What is a B-tree?


A B-tree (short for Balanced Tree) is a special type of tree structure that databases use to store and retrieve data quickly. It keeps data sorted and ensures that the tree remains balanced as data is inserted, searched for, or deleted. This balanced structure ensures that even as the amount of data grows, operations like searching for a particular row or inserting new data remain efficient.


How Does a B-tree Work?


At its core, a B-tree organizes data into small groups (or nodes) that are connected like branches of a tree. Each node can hold multiple values, and those values are always kept in sorted order. These nodes are arranged in levels, starting from the topmost node (the root) and branching down to other nodes (called children).


Searching in a B-tree

When you search for a value in a B-tree, you start at the root and work your way down through the nodes, following pointers to the correct child node based on comparisons. Since each node is sorted, you can decide which path to take at each level without checking all values. This means you can find what you’re looking for faster than if you searched through everything sequentially.


Inserting into a B-tree

When you insert a new value into a B-tree, it finds the correct node where the value should go and inserts it in sorted order. However, there’s a limit to how many values can fit into a single node. If a node gets too full after inserting a new value, the tree automatically splits the node to keep everything balanced.


The Concept of Splitting in a B-tree


A node can only hold a fixed number of values (determined by the B-tree's order). If a new value is added and the node already holds the maximum number of values, the B-tree decides it’s "too full" and performs a split.


Here’s how the split works:


  1. The middle value of the full node is moved up to the parent node.

  2. The values smaller than the middle value are placed in a new node on the left.

  3. The values larger than the middle value are placed in a new node on the right.


This splitting process ensures that the B-tree remains balanced, preventing it from becoming too deep or unbalanced, which would slow down data retrieval.


Example: How a B-tree Splits


Let’s use a simple example with numbers to explain how a B-tree splits.


Assume we have a B-tree that can hold 3 values per node.


  1. Insert 10 – The tree is empty, so 10 becomes the first value in the root node:

    • [10]


  2. Insert 5 – Since 5 is less than 10, it is placed to the left of 10 inside the same node:

    • [5, 10]


  3. Insert 20 – Since 20 is greater than 10, it goes to the right, and the node now holds:

    • [5, 10, 20]

    • At this point, the node is full (it has reached its capacity of 3 values).


  4. Insert 15 – Now, when we try to insert 15, the node is full, so the B-tree triggers a split.

    • The middle value 10 is moved up to create a new root.

    • The values smaller than 10 (5) stay in a node on the left.

    • The values larger than 10 (15, 20) are placed in a node on the right.


Now the B-tree looks like this:

  • Root node: [10]

  • Left node: [5]

  • Right node: [15, 20]


By splitting the full node and promoting the middle value, the B-tree keeps its structure balanced, ensuring that searching for values remains efficient.



Different Ways to Create B-tree Indexes in PL/SQL


1. Single-Column Index


A single-column index is created on a single field or column in a table. This is useful when you frequently query the table using that specific column in the WHERE clause.


Syntax:

CREATE INDEX index_name ON table_name (column_name);

Example:

CREATE INDEX customer_name_index ON Customers(name);

In this case, an index is created on the name column of the Customers table, which speeds up queries that filter by the customer's name.



2. Multicolumn Index (Composite Index)


A composite index is created on multiple columns. This index is beneficial when queries involve conditions on more than one column.


Syntax:

CREATE INDEX index_name ON table_name (column1_name, column2_name);

Example:

CREATE INDEX customer_name_email_index ON Customers(name, email);

This index will optimize queries that filter by both the name and email columns.



3. Unique Index


A unique index ensures that the values in the indexed column(s) are unique. This is useful for fields like email addresses or usernames, where duplicates are not allowed.


Syntax:

CREATE UNIQUE INDEX index_name ON table_name (column_name);

Example:

CREATE UNIQUE INDEX customer_email_index ON Customers(email);

This index will prevent duplicate email addresses in the Customers table.



4. Partial Index


A partial index is created based on a condition. It indexes only the rows that meet a specific condition, which is useful when a large portion of the data doesn't need indexing.


Syntax:

CREATE INDEX index_name ON table_name (column_name) WHERE condition;

Example:

CREATE INDEX active_customers_index ON Customers(customer_id) WHERE status = 'active';

This partial index will only index rows where the customer status is 'active,' optimizing queries that search for active customers.



5. Expression Index


An expression index is created based on the result of an expression rather than just the values of a specific column. This type of index can be useful for optimizing queries that use calculated values or functions.


Syntax:

CREATE INDEX index_name ON table_name ((expression));

Example:

CREATE INDEX idx_lower_name ON Customers (LOWER(name));

This expression index will index the lowercase version of the names in the Customers table, allowing for faster searches when querying with case-insensitive comparisons, such as:


SELECT * FROM Customers WHERE LOWER(name) = 'john doe';

By using an expression index, the database can quickly locate matching entries without needing to convert each name to lowercase during the search.



6.Clustered Index


By default, all indexes are non-clustered, meaning they do not change the physical order of the data in the table; instead, they retain the order of the table itself. In contrast, clustered indexes sort the data according to the indexed column.


You can create a clustered index using the CLUSTER command, which sorts the data physically in the table based on the indexed column(s).


Syntax:

CREATE INDEX index_name ON table_name (column_name); 
CLUSTER table_name USING index_name;

Example:

CREATE INDEX idx_customer_name ON Customers (name); 
CLUSTER Customers USING idx_customer_name;


Practical Example:

Creating an Index , Analyzing and Understanding its Importance


Step 1: Create the Table


CREATE TABLE Products ( 
product_id INT PRIMARY KEY, 
product_name VARCHAR(100), 
price DECIMAL(10, 2), 
category VARCHAR(50) );

Step 2: Insert Sample Data


You can insert some sample data into the Products table for testing purposes. Here’s how to insert multiple rows:

INSERT INTO Products (product_id, product_name, price, category) VALUES (1, 'Laptop', 999.99, 'Electronics'), 
(2, 'Smartphone', 699.99, 'Electronics'), 
(3, 'Tablet', 299.99, 'Electronics'), 
(4, 'Chair', 89.99, 'Furniture'), 
(5, 'Table', 199.99, 'Furniture'), 
(6, 'Desk', 399.99, 'Furniture'), 
(7, 'Headphones', 49.99, 'Electronics'), 
(8, 'Monitor', 249.99, 'Electronics'), 
(9, 'Printer', 149.99, 'Electronics'), 
(10, 'Mouse', 19.99, 'Electronics');

Step 3: Analyze Query Performance (Before Indexing)


You can use EXPLAIN ANALYZE to understand the performance of a sample query. For example, let’s analyze the time taken to fetch all products with a specific category:


EXPLAIN ANALYZE SELECT * FROM Products WHERE category = 'Electronics';

This command will return the execution plan of the query along with the execution time. Note down the time taken for this query.


Step 4: Create an Index


Now, let’s create an index on the category column to optimize queries filtering by category:


CREATE INDEX idx_category ON Products (category);

Step 5: Analyze Query Performance (After Indexing)


Run the same EXPLAIN ANALYZE command to see how the performance changes after creating the index:


EXPLAIN ANALYZE SELECT * FROM Products WHERE category = 'Electronics';

Step 6: Compare Results


After executing the above commands, you should compare the execution times before and after creating the index.


  • Before Indexing: The execution plan will likely show a sequential scan of the Products table, which could be slower, especially with larger datasets.

  • After Indexing: The execution plan should show that it uses the index for faster lookups, which should result in reduced execution time.


Example Output Analysis


  • Before Indexing:

    • Example output might show a Seq Scan (sequential scan) on the Products table.

    • Execution time could be, say, 15 ms.

  • After Indexing:

    • Example output might show an Index Scan on idx_category.

    • Execution time could reduce to, say, 5 ms.


This difference highlights how indexing improves query performance by reducing the amount of data that needs to be scanned, particularly for filtering operations.


Dropping an Index


If an index is no longer needed, it can be dropped using the following command:


Syntax:

DROP INDEX index_name;

Example:

DROP INDEX idx_product_name;

This command removes the index on the product_name column from the Products table.



Conditional Index Drop


You can also drop an index with a condition if it applies to certain criteria:


DROP INDEX IF EXISTS idx_product_name;

This ensures that the index is dropped only if it exists.



Important Points to Keep in Mind for Indexing


  • Balancing Speed vs. Space: While indexes can significantly improve query performance, they require additional storage and can slow down write operations.


  • Limit the Number of Indexes: Too many indexes can negatively impact performance, especially during data modification (INSERT, UPDATE, DELETE).


  • Choose Columns Carefully: Create indexes on columns that are frequently queried or used in JOIN and WHERE clauses.


  • Monitor Index Usage: Regularly check if indexes are being used and maintain them by rebuilding or reorganizing them to avoid fragmentation.


Summarizing, Here is a table for different indexing mechanisms in PostgreSQL and MySQL:

Indexing Mechanism

Syntax

Example

Meaning

B-tree Index

CREATE INDEX index_name ON table_name (column_name);

CREATE INDEX idx_customer_name ON Customers (name);

The default index mechanism. B-tree indexes are used for efficient searching and sorting, especially with comparison operators like =, <, >, BETWEEN.

Hash Index

CREATE INDEX index_name ON table_name USING HASH (column_name);

CREATE INDEX idx_customer_hash ON Customers USING HASH (customer_id);

A hash index is useful for exact matches using the = operator but does not support range queries or ordering efficiently.

GIN (Generalized Inverted Index)

CREATE INDEX index_name ON table_name USING GIN (column_name);

CREATE INDEX idx_fulltext_search ON Articles USING GIN (to_tsvector('english', content));

GIN indexes are used for full-text searches, particularly when dealing with documents or large text fields where tokenized text needs to be indexed for fast lookups.

GiST (Generalized Search Tree)

CREATE INDEX index_name ON table_name USING GiST (column_name);

CREATE INDEX idx_geom_gist ON SpatialData USING GiST (geometry_column);

GiST indexes are used for spatial data types, range queries, and for finding closest matches. Ideal for geometric data like points, lines, and polygons.

SP-GiST (Space-Partitioned Generalized Search Tree)

CREATE INDEX index_name ON table_name USING SPGIST (column_name);

CREATE INDEX idx_spgist_geom ON SpatialData USING SPGIST (point_column);

SP-GiST indexes are used for partitioned search trees, optimized for non-balanced data like unstructured geometric data, text, or hierarchies.

BRIN (Block Range INdex)

CREATE INDEX index_name ON table_name USING BRIN (column_name);

CREATE INDEX idx_brin_time ON Orders USING BRIN (order_time);

BRIN indexes are optimized for large tables where data is stored in sequential blocks. Useful for columns with naturally ordered data, like timestamps.

Each of these indexing mechanisms is optimized for different types of queries and data structures, making them useful in specific scenarios such as text search, spatial data, or large sequential datasets.


Here's the summary of the index creation ways,

Index Creation Way

Syntax

Example

Meaning

Single-Column Index

CREATE INDEX index_name ON table_name (column_name);

CREATE INDEX idx_customer_name ON Customers (name);

An index created on a single column, speeding up searches and queries that filter by that specific column.

Multicolumn Index

CREATE INDEX index_name ON table_name (column1_name, column2_name);

CREATE INDEX idx_customer_fullname ON Customers (first_name, last_name);

An index created on multiple columns, which is useful for queries that filter by more than one column.

Unique Index

CREATE UNIQUE INDEX index_name ON table_name (column_name);

CREATE UNIQUE INDEX idx_unique_email ON Customers (email);

Ensures that all values in the indexed column are unique, preventing duplicate entries.

Partial Index

CREATE INDEX index_name ON table_name (column_name) WHERE condition;

CREATE INDEX active_customers_index ON Customers (customer_id) WHERE status = 'active';

An index that only includes rows meeting a specific condition, reducing index size and improving performance when only a subset of rows is frequently queried.

Expression Index

CREATE INDEX index_name ON table_name ((expression));

CREATE INDEX idx_lower_name ON Customers (LOWER(name));

An index based on the result of an expression rather than just column values, allowing for optimized queries using the expression.

Clustered Index

CREATE INDEX index_name ON table_name (column_name); CLUSTER table_name USING index_name;

CREATE INDEX idx_customer_name ON Customers (name); CLUSTER Customers USING idx_customer_name;

An index that determines the physical order of data rows in the table based on the indexed column, improving performance for range queries.


Indexes play a crucial role in database optimization, improving data retrieval speed and ensuring efficient query performance. Whether you're using single-column, composite, unique, or partial indexes, understanding how and when to use them can significantly enhance the performance of your SQL operations. Remember to monitor index usage, drop unnecessary indexes, and strike a balance between faster read operations and potential overhead on data modification operations.


By following best practices and applying the knowledge shared in this guide, you'll be able to use indexes to their full potential and keep your SQL databases running smoothly.


Happy Indexing!

47 views
bottom of page