This article talks about the multiple ways (Relationships, Joins, and Blends) provided by Tableau to combine data from source(s) depending on the use case.
Source: www.unsplash.com
In today's world, everything is driven by data and it has become imperative to have a better understanding of the vast amount of data available to us. Businesses at every level need to understand data and its impact. This is why data visualization is one of the most important tools today.
Tableau is an industry-leading data analytics and visualization tool. Tableau ensures the responsible use of data and drives better business outcomes with fully-integrated data management and governance, visual analytics and data storytelling, and collaboration. Tableau is easy to learn and one of the many reasons for its success. Mastering Tableau gives an added advantage to any person or business looking to scale up.
To begin with visualization, we need to first connect to the data source. The data source can be an MS Excel file, text file, CSV file, or any database like Amazon Aurora, Redshift, Google Cloud, MariaDB, IBM DB2, Oracle, etc. Once the connection to the data source is established, Tableau provides several ways to combine the data and use them for visualizations. In this blog, we will be looking at each of them one by one.
Relationships are a dynamic, flexible way to combine data from multiple tables for analysis. A relationship helps to relate two or more tables based on common fields without merging the tables. Thus, the tables remain separate and no data is lost. Today, relationships are the default method and can be used in most instances, including across tables with different levels of detail. The following image shows how two tables (Orders and Returns) from the Superstore sample are related using relationships.
There are many advantages of using relationships:
There is no need to configure join types between tables. The user needs to only select the fields based on which relationship is to be established.
The tables remain separate and are not merged into a single table.
Although relationships use joins, Tableau automatically selects the join types based on the fields being used in the visualization.
Tables at different levels of detail are supported in a single data source.
There is no accidental loss of data since unmatched measure values are not dropped.
Data duplication and filtering issues are avoided that sometimes result from joins.
There are certain limitations to the usage of relationships:
You can't create relationships between tables from published data sources.
Dirty data in tables can increase the complexity of multi-table analysis.
Tableau's ability to do join culling in the data will be limited if data source filters are used.
Since a publishedrelationship is based on common fields in the tables, the presence of a lot of unmatched values will deter the analysis.
Joins are a more static way to combine data. Joins are created between physical tables in the physical layer of the data source. Joins must be defined between physical tables before analysis. Joins merge data from two tables into a single table. This can lead to duplicate data or loss of data from one or both tables. The below diagram shows the inner join between two tables Orders and Returns from the Superstore sample.
Even though Tableau recommends the use of relationships, joins may be required under the following scenarios:
Use of a data source from a pre-2020.2 version of Tableau
Use of a specific join type
Use a data model that supports shared dimensions
Limitations of using joins:
Joins cannot be changed without impacting all sheets using that data source.
Joins cannot be used in a published data source.
Loss of data or data duplication can occur if the data model is not well constructed.
Data blending is used to combine data from multiple sources. In data blending, there is a primary data source and a secondary data source. Data from a secondary source is displayed along with the primary source data. Data blending is used when the linking fields need to vary on a sheet-by-sheet basis for different visualizations. Also, data blending is useful for combining data from published data sources. The below diagram shows the blend between the two data sources.
Blends do not combine the data directly. Instead, each data source is queried independently, results are aggregated to the right level and then the results are presented in the view. Hence, blends can handle different levels of detail. Data blending acts like a traditional left join. The main difference between the two is when the aggregation is performed. A blend aggregates data and then combines whereas a join combines data and then aggregates.
The limitations of data blending are:
Data blending may result in some missing data from the secondary data source.
Blended data sources cannot be published as a single unit.
Data from secondary sources must be aggregated in calculations.
Non-additive aggregates such as COUNTD, MEDIAN, and RAWSQLAGG will have limited usage in data blending.
Let me summarize the 3 ways to combine data tables in Tableau:
Relationships | Joins | Blend |
Used when combining data from different levels of detail | Used to add more columns of data across the same row structure | Used when combining data from different levels of detail |
Requires matching fields between two logical tables | Requires common fields between two physical tables | Requires fields to be linked in the primary and secondary data source |
Automatically uses correct aggregations and contextual joins based on the fields relations and usage in the visualization | Requires establishing a join type and join clause | Once fields are linked, Tableau automatically aggregates data from the secondary source |
Cannot be used to relate published data source | Cannot be used to join published data source | Can be used with published data source |
The related tables are separate and hence no data loss occurs | The joined tables are merged into one table increasing the chances of data loss or duplication. | The blend mimics a left join and hence there can be some data loss from the secondary data source. |
Conclusion: This article talks about the multiple ways (Relationships, Joins, and Blends) provided by Tableau to combine data from source(s) depending on the use case. current use case to select a relationship, join or blend. Tableau recommends relationships as it is the most flexible and has auto functionalities by which Tableau can change the join types based on the fields used in the respective visualization worksheet. Joins have limited usage since most use cases will be covered by relationships. Data blending is usually done when the relationship needs to change from sheet to sheet. Whatever method you choose, Tableau ensures that you are equipped with the right tools to get the right results.