top of page
Swapna Lakhpati

SQL for Data Analysis-(Filtering Data)Part 3

Filtering Data :

As we all know when we get any data for analysis, its huge and we need to filter data according to our requirements. One of the most powerful feature of using SQL is the ability to rapidly filter data to match certain criteria you are searching for. The whole process of Analysis is based on filtering data.

let's see all possible ways to filter data. It includes Distinct, limit, fetch first row, last row, and many more.

Even select, if we select only 2 column names from the table, the data is filtered according to these two columns only.

1. WHERE

WHERE clause is used to specify a condition while fetching the data from single table or joining with multiple tables.

Using the WHERE filters with a query, allows us to only return results that match a set condition. We can use this together with conditional operators like =, >, <, >=, <=, etc.

If the given condition is satisfied, only then it returns specific value from the table. You can filter out rows that you do not want included in the result-set by using the WHERE clause.


select title,rating

from public.film

where rating = 'PG-13';

AND and OR Conjunctive Operators

The PostgreSQL AND and OR operators are used to combine multiple conditions to narrow down selected data in a PostgreSQL statement. These two operators are called conjunctive operators.

2. AND

We can combine two or more conditions into a single query by using the AND command. All of the conditions must be met for the result to be returned.

The AND operator allows the existence of multiple conditions in a PostgreSQL statement's WHERE clause. While using AND operator, complete condition will be assumed true when all the conditions are true. For example [condition1] AND [condition2] will be true only when both condition1 and condition2 are true.

If we want to see a movie which is PG 13 and it is not greater than 100 min. we can run this query to find the movie list.


select title,rating,length

from public.film

where rating = 'PG-13' AND length < 100

order by length desc

limit 10;


Just by doing order by we can see the shortest or longest PG 13 movie.


select title,rating, length

from public.film

where rating = 'PG-13' AND length < 100

order by length

limit 10;


3. OR

The OR operator is also used to combine multiple conditions in SQL statement's WHERE clause. While using OR operator, complete condition will be assumed true when at least any of the conditions is true. For example [condition1] OR [condition2] will be true if either condition1 or condition2 is true.

You can combine N number of conditions using OR operator. For an action to be taken by the PostgreSQL statement, whether it be a transaction or query, only any ONE of the conditions separated by the OR must be TRUE.

If we want to see a movie whose rental rate < 3 or length > 90 min or whose rating is 'PG' .If either of the condition is satisfied it will return the desired outcome. As we reduce the condition in the form of OR clause, the results will be less.(Number of movies)


SELECT title

from public.film

WHERE rental_rate < 3 or rating ='PG' or length >90;

with these 3 or conditions we are getting 916 options .

As we reduce one of the OR condition , the options will be reduced.

There are total 732 options available.

4. BETWEEN

We can specify a range using the BETWEEN filter. For example creating a length range for listings as follows.

expression BETWEEN low AND high;


SELECT title

from public.film

WHERE length BETWEEN 30 and 60;


5. LIKE

We can search for patterns within our data using the LIKE command.

The PostgreSQL Like condition is used to fetch data from a table where the defined condition satisfies the LIKE condition. The result contains strings, which are case-sensitive and follow the specified pattern.

In other words, we can also say that the Like condition is used to perform pattern matching for identifying the exact outcome.

To serve the purpose of patterns matching, the PostgreSQL provides two different wildcard characters, which are as follows:

  • Percent (%) wildcard: To match any sequence of zero or more characters.

  • Underscore (_) wildcard: To match any single character.

Other operators for LIKE:

  • %x — will select all values that ends with x

  • %x% — will select all values that include x

  • x% — will select all values that start with x

  • x%y — will select all values that begin with x and end with y

  • _x% — will select all values have x as the second character

  • x% — will select all values that begin with x and are at least two characters long. You can add additional characters to extend the length requirement, i.e. x___%

Note:

  • If we do not want to specify the escape_character, PostgreSQL assumes that "\" is the escape_character.

  • The Like operator behaves like the equal "=" Operator if the pattern does not have any wildcard character.

In this example, we are looking for movies that start with the letter 'Ha ' and whatever is ahead of it.

It should start with the letter 'Ha'.

SELECT title

FROM public.film

WHERE title LIKE 'Ha%';

Movies whose title ends with 'es'.


SELECT title

FROM public.film

WHERE title LIKE '%s';


filters movies with title ' The' anywhere.

SELECT title

FROM public.film

WHERE title LIKE '%The%';


film title starts with N and ends with s


SELECT title

FROM public.film

WHERE title LIKE 'N%s';


-- _x% — will select all values have x as the second character

SELECT title

FROM public.film

WHERE title LIKE '_h%';


-- x% — will select all values that begin with x and are at least two characters long.

SELECT title

FROM public.film

WHERE title LIKE 'C%__';


SELECT title

FROM public.film

WHERE title LIKE 'Y%___';


6. IN

By using the IN command we can specify multiple values we want to select from with the WHERE command.

The SQL IN condition is used within the WHERE clause to get those data that matches any data in a list. In other words, we can say that the IN condition is used to reduce multiple OR conditions.


select title,rating

from public.film

WHERE rating in ('PG','PG-13');

Query 1

select title,rating

from public.film

WHERE rating in ('PG','PG-13')

order by film_id desc;


Query 2

select title,rating

from public.film

WHERE rating = 'PG' or rating ='PG-13'

order by film_id desc;

Query 1 and Query 2 both returns the same outcome. So in a nutshell IN clause is same as OR operator.

7. IS NULL & IS NOT NULL

IS NULL will return only rows with a NULL value, while IS NOT NULL does the opposite returning only rows without a NULL value.


select title,rating

from public.film

WHERE description IS NULL;




select first_name

from public.staff

where email is not null;


In the staff table, there is no null in an email column.

I tried to cover the majority of filtering functions.

Thank you ...

32 views

Recent Posts

See All
bottom of page