Jun 19, 20246 min read

Similar yet different concepts in postgreSQL

With my recent encounter with SQL, I understood the importance of appropriate use of functions wherever possible. It increases the code productivity, readability and optimization. We don’t have to write multiple lines of codes when the same motive can be achieved by a shorter query. Having said that, it is also very important to use correct functions according to our requirement. PostgreSQL offers a plethora of functions broadly categorized into string, math, window, aggregate functions and many more. Some of them are very similar yet a little different.

Trunc() vs Ntile() vs Width_Bucket()

Bucketing or binning the data helps us divide a huge amount of information into specific groups. Especially when the data is numeric, then it is often easier to categorize the entire data based on distinct criteria which is defined by that numeric column. The requirement for grouping data into buckets or bins is that the column on which grouping needs to be done should be continuous.

For example, when the BMI value of a large set of individuals is mentioned then it only makes sense to use this detail when grouped into Underweight, Normal, Overweight and Obese categories. This will make even a lay person understand the analysis better rather than using the number values.

Trunc()

This function truncated any decimal number to either a whole number or a decimal number cut short to specified number of decimal places. To group the data, we have to use Group By clause here.

Select count(payment_id) as num_payment_transactions,trunc(amount) as truncated_amnt from payment group by trunc(amount);

The above query will return the number of transactions made for a particular amount where amount value has been truncated to a whole number.

Width_Bucket()

This window function creates buckets where we have to implicitly specify the boundary values. The lower bound value remains inclusive and the outer bound value is exclusive which means any value exceeding the outer bound value, will not be included in that bucket. Also, here we need to mention how many buckets we want to create.

Select count(payment_id),amount,width_bucket(amount,0.00,12,5) as bucket_num from payment group by amount order by width_bucket(amount,0.00,12,5);

This query divides the distinct values in amount column into 5 buckets and returns the number of transactions performed for each amount and what bucket it falls in. The size of buckets may or may not be equal in this case. Bucket 1 contains 4 values but bucket 2 contains only 3. Buckets 3,4 and 5 have 4 values each.

Ntile()

Ntile() divides the data into as many equal buckets as possible. This function also requires us to mention the number of buckets. It uses over() to identify the column on which buckets are to be generated.

Select count(payment_id),amount,ntile(5) over(order by amount) as ntile_bucket from payment group by amount;

This query does the same task as the one with width_bucket() but only thing is the buckets size, and values are different.

Rank() vs Dense_Rank()

Rank functions are used to designate ranks to individual rows as per a category. Both these functions use Over() clause to define a column on which the ranks are assigned. We can also use a Partition By() clause to divide the resulting data into subsets of rows.

Rank()

Rank() function assigns ranks to each row. If there are multiple rows with same rank then the next number is skipped and the number after that is given as rank to the next row.

Select film_id,title,release_year,length,rank() over(partition by release_year order by length desc) as rank_on_length from film;

Here, top 10 rows have been assigned rank 1 because all the 10 films are of same length. Hence, ranks 2-10 are skipped and the next films in line are assigned rank 11.

Dense_Rank()

This function does not skip any rank number even if 10 rows have been allocated rank 1, the next films in line will be given rank2 and so on.

Select film_id,title,release_year,length,dense_rank() over(partition by release_year order by length desc) as rank_on_length from film;

Concat() vs Concat_ws()

Concat() and Concat_ws() are string functions. They are used to join two or more strings together. Usually, to concatenate two strings we can use the concatenation operator ‘||’. But when one of the strings is null, the entire concatenation results in null. To deal with this we use Concat() and Concat_ws() functions. They accept any string type like Char, Varchar, Text. If one of the strings is null, then the functions ignore null values and return the concatenated strings.

Concat()

In this function, we have to pass the delimiter or separator as a string itself. The separator is not necessary. We may or may not specify any.

Select actor_id, concat(first_name,' ',last_name) as full_name from actor;

Concat_ws()

When we integrate two or more strings using concat_ws(), we have to specify a non-null separator.

select actor_id, concat_ws(', ',first_name,last_name) as full_name from actor;

Array_Agg() vs String_Agg()

The above aggregation functions return a list of elements we called for.

Array_Agg()

The array aggregate function reverts a set of elements in the form of an array. Generally, like any other aggregate function, array_agg() is also used with group by clause. So, the elements pertaining to one group are displayed as an array in the output.

Select title, array_agg(first_name||' '||last_name) as full_name

from film_actor inner join actor on actor.actor_id = film_actor.actor_id

inner join film on film_actor.film_id = film.film_id group by title;

String_Agg()

The only thing that string aggregate function does different from the array aggregate function is that the string_agg() returns elements as a concatenation of strings. It uses a separator in the middle of each string but not at the end.

Select title, array_agg(first_name||' '||last_name) as full_name

from film_actor inner join actor on actor.actor_id = film_actor.actor_id

inner join film on film_actor.film_id = film.film_id group by title;

Nullif() vs Coalesce()

While working with a dataset that had many null values, I learnt early that it is very important to handle nulls if we wish for better results and analysis. There is no point in displaying rows that have null values in the result pane. To deal with these null values we use nullif() and coalesce() functions wherever applicable.

Nullif()

This function works like a comparison operator. It establishes a comparison between two arguments and if they are equal it returns null otherwise it returns the first argument. This function does not literally omit the null values rather it is used in calculations that might result in error. For example, division by zero. When we divide anything by zero, it results in a divide by zero error. To rectify this, we can use nullif(). The division will be executed now without any errors but the result will still be null.

Select 2/0;

Select 2/nullif(0,0) as div_result;

Nullif() compares if 0 is equal to 0 and returns null. Division by null gives the end result as null.

Coalesce()

Coalesce() takes multiple arguments and returns the first value that is not null. If al the parameters entered in the function are null, then it ends in null value.

Select coalesce(null,null,2,4,1,null);

Substring() vs Trim() vs Split_Part()

There are times when we want to extract certain characters or even a smaller part of a string. For this purpose, we use the three functions mentioned above. But again all three have a hairline difference among them.

Substring()

As the name suggests, substring() is used to pull out a part of string from a specified start position for a specified length.

Select first_name,substring(first_name,1,3) as subname from actor;

The first names and the first 3 characters of their first names are returned through this query.

Trim()

Trim() function cuts short a string by trimming specified number of characters either from the beginning or from the end. We can mention the parameters Leading if we want something removed from the beginning, Trailing if we want something removed from the end and Both if we want something removed from both ends of the main string. If nothing is mentioned in the syntax, the function removes all the leading and trailing spaces from the source string (if it has any) by default.

Select ' Trim function in PostgreSQL ' as with_spaces,trim(' Trim function in PostgreSQL ') as without_spaces;

Split_Part()

Split_part() makes use of a delimiter to split the string and returns the substring that is present in the specified position.

Select '2-12-2024' as date,split_part('2-12-2024','-',3) as year;

Hope this article helps you learn one more thing about postgreSQL and help you write better queries.

Similar yet different concepts in postgreSQL

Recent Posts