Sheba Alice Prathab

Understanding and Creating a Sankey Chart in Tableau: A Comprehensive Guide - I


In today’s data-driven world, visualizing complex relationships is more crucial than ever. Did you know that over 80% of decision-makers find data visualization essential for understanding and communicating insights effectively? Among the myriad of visualization techniques, Sankey charts stand out for their unique ability to illustrate the flow of information or resources between categories.


A Sankey chart is a type of flow diagram that visually represents the flow of data between a source dimension and a target dimension. In this chart, the width of the curves is proportional to the quantity or flow rate being represented, allowing viewers to easily understand the relationships and transitions between different categories. Sankey charts are particularly effective for illustrating how values move between groups, such as sales moving from regions to customer segments or resources flowing between different processes.


This guide will show you how to build a Sankey chart in Tableau, covering the creation of essential calculated fields like ToPad, T, Rank1, Rank2, Size, Curve, and the Sigmoid function for smoother curves.



Step 1: Data Union as the Primary Step for a Sankey Chart


Unioning the table to itself is the first step in building a Sankey chart because it duplicates every record. The two copies act as the two ends of each flow, and the extra points needed for smooth curves between categories are generated between them.


Scenario Setup


We have three tables in the data source:

  • Parent Table: Orders

  • Child Table 1: People

  • Child Table 2: Returns


The Parent Table (Orders) forms relationships with Child Table 1 (People) and Child Table 2 (Returns). We’ll union the Parent Table (Orders) to itself to generate duplicates that can be used in the Sankey chart.


How to Union Tables in Tableau


Data Source Tab: Drag the Orders table onto the canvas.



Establish Relationships: Before unioning, create necessary relationships with related tables (like People and Returns).



Unioning: Right-click on the Orders table (the base table) and click "Convert to Union".



Drag the Orders table again to the Union Window that appears.



Click "OK". You should now see the table name listed below.




Remember, only the base table should be unioned, not the related tables.


This method ensures cleaner data management and maintains the integrity of your visualizations!
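As a rough illustration of what the self-union produces, the sketch below stacks a tiny, made-up Orders table on top of itself in Python and tags each copy with a table name. The sample rows and the "Orders1" label for the second copy are assumptions for illustration only; check the Table Name column in your own data source for the actual values.

```python
# Hypothetical sample rows standing in for the Orders table.
orders = [
    {"Region": "East", "Segment": "Consumer", "Sales": 120.0},
    {"Region": "West", "Segment": "Corporate", "Sales": 80.0},
]


def self_union(rows, first_name="Orders", second_name="Orders1"):
    """Stack a table on top of itself, tagging each copy with a table name."""
    first = [dict(row, **{"Table Name": first_name}) for row in rows]
    second = [dict(row, **{"Table Name": second_name}) for row in rows]
    return first + second


unioned = self_union(orders)
for row in unioned:
    print(row)
```

Every record now appears twice, which is why measures such as Sales double if they are summed over the unioned table without care.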


Step 2: Create Calculated Fields for the Sankey Chart


Once the union is done, you will need to create several calculated fields to build the Sankey chart. Each calculated field serves a specific role in generating the bar charts and smooth curves.


1. ToPad Field


When creating Sankey charts in Tableau, the ToPad field plays a critical role in controlling the smoothness of the flow between different nodes (categories or dimensions).


Purpose: The ToPad field helps to create smooth curves between categories or dimensions in Sankey charts by duplicating data points. These duplicated points ensure that transitions between the different nodes in the chart are visually appealing and fluid.


Example calculated field for ToPad (assuming the base table is named Orders):


IF [Table Name] = "Orders" THEN 1 ELSE 49 END


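  • When the row comes from the "Orders" table, ToPad is set to 1, which marks the start of the padding range.

  • For rows from any other table (here, the second copy of Orders created by the union), ToPad is set to 49, which marks the end of the range. The points in between are filled in by the bins created in the next step, giving each flow up to 49 points.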
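To make the logic of the ToPad formula concrete, here is a minimal Python sketch of the same IF/ELSE rule. The function name `to_pad` and the table name "Orders1" for the second copy are hypothetical and chosen only for this example.

```python
def to_pad(table_name, base_table="Orders", pad_max=49):
    """Mirror of: IF [Table Name] = "Orders" THEN 1 ELSE 49 END."""
    return 1 if table_name == base_table else pad_max


# Hypothetical table names as they might appear after a self-union.
table_names = ["Orders", "Orders1"]
pad_values = {name: to_pad(name) for name in table_names}
print(pad_values)
```

Each record therefore carries one of only two ToPad values, 1 or 49, depending on which copy of the table it came from.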





You can customize the ToPad value depending on your dataset’s needs. Instead of using a fixed value like 49, you can base the ToPad value on business logic, such as regions or sales thresholds.


For example, you can modify the formula like this:


IF [Region] = "East" THEN 60 ELSEIF [Region] = "West" THEN 40 ELSE 20 END


  • East region: Use 60 points for smoother curves because this region is more significant.

  • West region: Use 40 points to create a moderately smooth flow.

  • Other regions: Use 20 points for less important regions, where the smoothness is less critical.



2. Creating Bins for ToPad


Creating a Bin:


After defining the ToPad field, the next step is to create a bin to manage how data points are grouped.


  • Right-click on the ToPad field and select Create > Bins.

  • Set the bin size to 1.




Impact of Bin Size:


  • Bin Size of 1: This allows each of the 49 duplicated points for each segment to be visualized individually, providing maximum granularity.

  • Larger Bin Sizes (e.g., 5 or 10): Larger bins reduce the granularity, grouping the data into broader intervals. While this may simplify the visualization, it can also obscure finer details of the flow.


Setting the bin size to 1 is generally recommended for Sankey charts, as it ensures each of the duplicated points from ToPad is clearly represented.
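The effect of the bin size can be sketched in Python. This example assumes that Tableau fills in the empty bins between the smallest and largest ToPad values when missing values are shown for the bin field; the `densify` helper is a hypothetical stand-in for that behavior, not a Tableau function.

```python
def densify(pad_values, bin_size=1):
    """Return every bin start between the smallest and largest ToPad value."""
    low, high = min(pad_values), max(pad_values)
    # Assumption: a bin starts at the nearest multiple of bin_size at or below the value.
    first_bin = (low // bin_size) * bin_size
    return list(range(first_bin, high + 1, bin_size))


points = densify([1, 49], bin_size=1)
coarse = densify([1, 49], bin_size=10)
print(len(points), "points with bin size 1")
print(len(coarse), "points with bin size 10")
```

With a bin size of 1 the curve gets 49 points to draw through; with a bin size of 10 it gets only a handful, which would make the curve visibly angular.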



3. The T Variable


Purpose: The T variable converts the position of each padded point (generated from the ToPad bins) into an evenly spaced value centered on 0. The Sigmoid field later maps these values into the range 0 to 1, which is what produces the smooth curves in the Sankey chart.


Default Calculated Field Value:


(INDEX()-25)/4


  • INDEX() generates a sequential number for each point within the partition, starting from 1.

  • Subtracting 25 centers the sequence on 0, because 25 is the middle of the 49 padded points.

  • Dividing by 4 scales the result so that T covers a range wide enough for the sigmoid curve to flatten out at both ends.


For example, if the INDEX() values range from 1 to 49, this formula yields T values from -6 to 6.
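The arithmetic behind T can be checked with a short Python sketch. It assumes 49 padded points, so that the index runs from 1 to 49; the function name `t_value` is hypothetical.

```python
def t_value(index, offset=25, scale=4):
    """Mirror of the Tableau expression (INDEX() - 25) / 4."""
    return (index - offset) / scale


# Assumption: 49 padded points, indexed from 1 to 49.
t_values = [t_value(i) for i in range(1, 50)]
print(min(t_values), max(t_values))  # -6.0 6.0
```

The middle point (index 25) lands exactly on T = 0, and neighboring points are 0.25 apart.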



You can customize the T values as needed by adjusting the normalization process or applying dynamic factors based on specific conditions in your dataset.



4. The Rank1 and Rank2 Fields:


Purpose: In Sankey charts, rank is used to order the flow between two dimensions based on a measure, such as sales. Ranking is essential to ensure that the flow between the source and target dimensions is visually meaningful and accurately represents the data. In a Sankey chart, two ranks are needed—one for the source dimension and one for the target dimension—because we are visualizing the transition between two distinct entities.


These two ranks are necessary because Sankey charts show the flow from one group (source) to another (target), and without ranking both sides, the flow may not properly align, making the visualization unclear.


Example Calculated Field for Ranks:


Rank1: RUNNING_SUM(SUM([Sales]))/TOTAL(SUM([Sales]))

Rank2: RUNNING_SUM(SUM([Sales]))/TOTAL(SUM([Sales]))


  • Rank1 positions each flow on the source side (Region). It is the running share of total sales, so a flow's position reflects the cumulative percentage of sales reached at that point on the source axis.

  • Rank2 positions each flow on the target side (Segment) in the same way.

  • Because the two formulas are identical, the difference between them comes from how each table calculation is computed (its Compute Using settings) once the fields are placed in the view.
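The running-share formula used by both rank fields can be sketched in Python as follows. The regional sales figures are made up for illustration, and the order of the entries stands in for whatever ordering the table calculation uses in the view.

```python
from itertools import accumulate


def running_share(values):
    """Mirror of RUNNING_SUM(SUM([Sales])) / TOTAL(SUM([Sales]))."""
    values = list(values)
    total = sum(values)
    return [running / total for running in accumulate(values)]


# Hypothetical sales per region, in the order they are processed.
region_sales = {"East": 40.0, "West": 30.0, "Central": 20.0, "South": 10.0}
shares = dict(zip(region_sales, running_share(region_sales.values())))
for region, share in shares.items():
    print(region, round(share, 2))
```

The result is a cumulative fraction that climbs to 1 at the last entry, which is why these fields behave like vertical positions rather than ordinal ranks.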




We can customize the rank based on our requirements by adjusting the ranking criteria, such as using different measures or applying filters, to highlight specific data points or segments that are most relevant to our analysis.


5. Sigmoid Field


Purpose: The Sigmoid Function is crucial for creating smooth transitions in visualizations like Sankey diagrams. It provides a natural flow between dimensions, making it essential for accurate data representation.


Default Calculated Field Value:

1 / (1 + EXP(1)^-[T])


  • The sigmoid function outputs values between 0 and 1.

  • At T = -6 (the first point): the sigmoid value is approximately 0.0025, close to 0.

  • At T = 0 (the middle point): the value is exactly 0.5, the midpoint.

  • At T = 6 (the last point): the value is approximately 0.9975, close to 1.
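As a quick check of the formula 1 / (1 + EXP(1)^-[T]), the Python sketch below evaluates the standard logistic sigmoid at the two ends and the middle of the T range. It assumes T runs from -6 to 6, as produced by (INDEX()-25)/4 over 49 points.

```python
import math


def sigmoid(t):
    """Mirror of the Tableau expression 1 / (1 + EXP(1)^-[T])."""
    return 1 / (1 + math.exp(-t))


# Assumption: T spans -6 to 6.
for t in (-6, 0, 6):
    print(t, round(sigmoid(t), 4))
```

The formula as written returns values between 0 and 1, with 0.5 at T = 0 and values very close to 0 and 1 at the two ends.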



You can modify parameters, such as the steepness of the curve, to tailor the sigmoid characteristics to your dataset.


6. Curve Field


Purpose: The Curve Field is derived from the Sigmoid Function and is essential for smooth transitions.


Default Calculated Field Value:

[Rank1]+([Rank2]-[Rank1])*[Sigmoid]


  • The Curve field produces values between Rank1 and Rank2.

  • At T = -6: approximately Rank1, because the sigmoid is close to 0.

  • At T = 0: exactly halfway between Rank1 and Rank2.

  • At T = 6: approximately Rank2, because the sigmoid is close to 1.
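The Curve formula [Rank1]+([Rank2]-[Rank1])*[Sigmoid] is an interpolation between the two rank values, weighted by the sigmoid. The Python sketch below shows this with hypothetical rank values of 0.2 and 0.8 and assumes T runs from -6 to 6.

```python
import math


def sigmoid(t):
    """Standard logistic function, as in 1 / (1 + EXP(1)^-[T])."""
    return 1 / (1 + math.exp(-t))


def curve(rank1, rank2, t):
    """Mirror of [Rank1] + ([Rank2] - [Rank1]) * [Sigmoid]."""
    return rank1 + (rank2 - rank1) * sigmoid(t)


# Hypothetical source and target positions for one flow.
rank1, rank2 = 0.2, 0.8
for t in (-6, -2, 0, 2, 6):
    print(t, round(curve(rank1, rank2, t), 4))
```

The flow starts near the Rank1 position, passes the halfway mark at T = 0, and settles near the Rank2 position, which gives the familiar S-shaped band.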



Modify parameters to fit specific visualization needs, such as the steepness of the curve.


7. SIZE Field


Purpose: The SIZE Field adjusts the width of flows in the Sankey chart, representing the magnitude of relationships between dimensions.


Example Calculated Field Value:

RUNNING_AVG(SUM([Sales]))



Adjust the formula to emphasize specific segments or apply filters, tailoring flow widths to your analysis needs.
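A running average can be sketched in Python as shown below. This example rests on two assumptions that you should verify in your own workbook: that only the two real records at the ends of a flow carry a Sales value while the padded points in between are empty, and that empty points are skipped when the running average is computed. The sales figure is made up.

```python
def running_avg(values):
    """Running average that skips missing (None) entries."""
    result, total, count = [], 0.0, 0
    for value in values:
        if value is not None:
            total += value
            count += 1
        result.append(total / count if count else None)
    return result


# Assumption: 49 points per flow, with Sales present only at both ends.
path = [120.0] + [None] * 47 + [120.0]
sizes = running_avg(path)
print(sizes[0], sizes[24], sizes[-1])
```

Under these assumptions, every point along the flow ends up with the same size value, so the band keeps a constant width from source to target.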


| Field | Purpose | Values (can be changed based on the requirement) |
| --- | --- | --- |
| ToPad | Duplicates data points for smoother transitions. | IF [Table Name] = "Orders" THEN 1 ELSE 49 END |
| Bins for ToPad | Groups duplicated points for granularity. | Set bin size to 1 |
| T Variable | Normalizes data points for smooth curve plotting. | (INDEX()-25)/4 |
| Rank1 | Ranks the source dimension based on a measure. | RUNNING_SUM(SUM([Sales]))/TOTAL(SUM([Sales])) |
| Rank2 | Ranks the target dimension based on a measure. | RUNNING_SUM(SUM([Sales]))/TOTAL(SUM([Sales])) |
| Sigmoid | Creates smooth transitions between dimensions. | 1/(1+EXP(1)^-[T]) |
| Curve | Derives values for natural flow between dimensions. | [Rank1]+([Rank2]-[Rank1])*[Sigmoid] |
| Size | Adjusts the width of the flow. | RUNNING_AVG(SUM([Sales])) |


You can use different names for the calculated fields as long as they are clearly defined and understood in the context of your analysis. Consistent naming helps maintain clarity, especially when sharing your work with others. However, feel free to customize names to suit your specific project or preferences.


Before embarking on your Sankey chart visualization, ensure that all necessary calculated fields—such as ToPad, T, Rank1, Rank2, Sigmoid, Curve, and SIZE—are meticulously prepared. Having these values in place streamlines the plotting process, allowing for efficient and effective visual representation of your data. By organizing your calculations ahead of time, you can create clear and impactful Sankey charts that convey meaningful insights with ease.


With a comprehensive understanding of the essential calculated fields required for creating a Sankey chart, you are now prepared to develop a basic visualization that effectively illustrates the flow between categories. In our next installment, we will explore a practical example, demonstrating how to apply these concepts using a real-world case study. Let’s embark on this journey together!
