Sep 25 · 6 min read

Mastering the Art of Sankey Charts: Creating Smooth Transitions with Rank and Sigmoid Functions

Sankey charts are a powerful tool for visualizing flows and relationships between different nodes, providing insights into how data moves from one point to another. A crucial aspect of creating effective Sankey charts is generating smooth curves between source and target nodes. This is achieved through calculated fields that leverage the concepts of Rank and Sigmoid functions. In this article, we will explore the components involved in creating these smooth transitions, emphasizing the importance of using cumulative values and percentages.

Why is Unioning Tables a Mandatory Step for Sankey Charts?

Unioning tables is essential in Sankey charts because it combines two sets of related data—such as source and target nodes—into a single dataset. This allows us to visualize the flow of values between these nodes effectively. Without unioning, the Sankey chart wouldn’t have a complete structure to visualize the relationship between data points.

Example: Sales Between Regions and Segments

Let's say we have two tables: one representing Regions and another representing Segments. The Region table contains sales data by geographical regions (e.g., North, South, East, West), while the Segment table contains sales by customer segments (e.g., Consumer, Corporate, Home Office).

Region Table:

Region	Sales
North	1000
South	1500
East	800
West	1200

Segment Table:

Segment	Sales
Consumer	2000
Corporate	1000
Home Office	1500

Without unioning these tables, you would have two separate datasets, and a single Sankey chart could not show the sales flow between regions and segments. By unioning the Region and Segment tables, we create one combined dataset that holds both ends of the flow: the regions on the source side and the customer segments on the target side.

Unioned Table:

Region/Segment	Sales
North	1000
South	1500
East	800
West	1200
Consumer	2000
Corporate	1000
Home Office	1500

This unioned dataset allows us to visualize the flow of sales from Regions to Segments in a single Sankey chart. For instance, the chart could show how much of the North region's sales went to the Consumer segment, provided the underlying data records both the region and the segment for each sale. The totals shown above do not determine that split on their own.

By unioning the tables, we ensure that the flow of data between Regions and Segments is displayed in a continuous and accurate manner in the Sankey chart. Without it, the relationships between the regions and segments would be disjointed and unclear.
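As a rough illustration of the unioning step outside of any BI tool, the sketch below stacks the two example tables into one list of rows. The `source_table` field and the `union_tables` helper are hypothetical names added for illustration; they do not come from the original tables.

```python
# Region and Segment totals copied from the example tables above.
region_rows = [("North", 1000), ("South", 1500), ("East", 800), ("West", 1200)]
segment_rows = [("Consumer", 2000), ("Corporate", 1000), ("Home Office", 1500)]


def union_tables(regions, segments):
    """Stack both tables into a single list of row dictionaries.

    "source_table" is a hypothetical label recording which table each
    row came from, so the two sides of the flow stay distinguishable.
    """
    unioned = [
        {"source_table": "Region", "name": name, "sales": sales}
        for name, sales in regions
    ]
    unioned += [
        {"source_table": "Segment", "name": name, "sales": sales}
        for name, sales in segments
    ]
    return unioned


unioned = union_tables(region_rows, segment_rows)
for row in unioned:
    print(row["source_table"], row["name"], row["sales"])
```

Both sides of the example add up to the same total (4500), which is what lets the flows leaving the regions match the flows arriving at the segments.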

Key Components of Smooth Curve Generation

To create smooth curves in Sankey charts, several calculated fields play distinct roles. The following table outlines each field's purpose, contribution, and its position in the graph.

Field	Purpose/Contribution	X-Axis or Y-Axis	Explanation
ToPad	Generates additional data points by multiplying the original dataset, ensuring smoother curves.	-	Helps in creating extra rows to increase the granularity of data points between the source and target, facilitating smooth transitions.
Bins	Applied to the data point values to segment the data for smoother curve plotting.	X-Axis (indirectly)	Segments the dataset into smaller intervals, helping to control the number of intermediate points and creating a smoother curve.
T	A calculated field that controls the smooth horizontal transition from source to target.	X-Axis	Defines the x-axis values for smooth movement from source to target, typically a sequential range like -6 to 6 for mapping.
Rank1	Defines the rank of the starting node (source) based on a dimension like People or Orders.	Y-Axis	Positions the start of the flow on the y-axis based on the rank of the source node.
Rank2	Defines the rank of the destination node (target) based on a dimension like People or Orders.	Y-Axis	Positions the end of the flow on the y-axis based on the rank of the target node.
Sigmoid	Applies the sigmoid function 1 / (1 + e^(−T)) to create the smooth "S"-shaped curve.	Y-Axis (curve shape)	Converts linear T values into smooth curves, enabling the flow lines to transition smoothly from source to target.
Curve	Combines T, Sigmoid, Rank1, and Rank2 to define the exact curve path between the nodes.	Y-Axis (typically)	Defines the vertical path of the flow between source and target, creating the "S" shape.
Size	Determines the width (thickness) of the flow, representing a KPI (e.g., Sales, Orders, etc.).	Width of the flow	Controls the thickness of the flow between nodes, indicating the relative size of the connection. Larger flows signify stronger relationships.
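To make the roles of ToPad, Bins, and T more concrete, here is a minimal Python sketch of the same idea: one flow record is repeated once per T value so that the curve has enough points to look smooth. The function names, the choice of 49 points, and the North to Consumer pairing are assumptions made for illustration, not settings taken from the workbook.

```python
def t_values(points=49, t_min=-6.0, t_max=6.0):
    """Return `points` evenly spaced T values from t_min to t_max inclusive."""
    step = (t_max - t_min) / (points - 1)
    return [t_min + i * step for i in range(points)]


def pad_flow(flow, points=49):
    """Repeat one flow record once per T value.

    This mimics the effect of the padding and binning step: a single
    source/target pair becomes many rows, each with its own T.
    """
    return [{**flow, "T": t} for t in t_values(points)]


# Hypothetical flow used only to show the shape of the padded data.
flow = {"source": "North", "target": "Consumer"}
padded = pad_flow(flow)

print(len(padded), padded[0]["T"], padded[-1]["T"])
```

More points give a smoother line at the cost of more rows, so the count is a trade-off rather than a fixed rule.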

The Importance of Using Cumulative Values and Percentages

Using cumulative (running-total) values for Rank1 and Rank2 stacks each node on top of the ones before it, so a node's vertical position reflects how much it contributes to the overall flow. When these cumulative values are expressed as percentages of the total, the ranks also show each node's relative significance.

Example of Percentage Usage

Consider a scenario where Node A generates 100 units, Node B generates 200 units, and Node C generates 700 units. Instead of displaying these absolute values, presenting them as percentages of the total (1000 units) offers better insight:

Node A:
- Value: 100 units
- Percentage: 100/1000×100=10%
Node B:
- Value: 200 units
- Percentage: 200/1000×100=20%
Node C:
- Value: 700 units
- Percentage: 700/1000×100=70%

Using percentages facilitates relative comparisons, making it easier to identify trends and contributions across different nodes in the Sankey chart.
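The percentage arithmetic above, together with the cumulative values discussed earlier, can be sketched in a few lines of Python. The helper names are hypothetical, and the running total shown here is only one plausible way to derive cumulative positions for the ranks.

```python
from itertools import accumulate

# Node values from the example above.
node_values = {"Node A": 100, "Node B": 200, "Node C": 700}


def percent_shares(values):
    """Express each node's value as a percentage of the total."""
    total = sum(values.values())
    return {name: value * 100 / total for name, value in values.items()}


def cumulative_shares(values):
    """Running total of the percentage shares, in insertion order."""
    shares = percent_shares(values)
    return dict(zip(shares.keys(), accumulate(shares.values())))


shares = percent_shares(node_values)
cumulative = cumulative_shares(node_values)

for name in node_values:
    print(name, shares[name], cumulative[name])
```

The last cumulative value always reaches 100, which gives every chart the same vertical scale regardless of the absolute totals.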

Utilizing Rank1, Rank2, and Sigmoid

In a Sankey chart, Rank1, Rank2, and Sigmoid can be utilized independently on the Y-axis, but they are typically combined to generate smooth curves between the source and target nodes. Let’s break down their separate uses:

Rank1: This can be represented alone on the Y-axis to show the starting position of the source node.
Rank2: Similarly, Rank2 can indicate the position of the target node on the Y-axis.
Sigmoid: On its own, this function produces an "S-shaped" curve between 0 and 1 on the Y-axis, but without the ranks the curve is not anchored to any node position.

While individual usage of Rank1, Rank2, and Sigmoid can achieve basic functionality, they do not create the visually appealing smooth curve that defines effective Sankey charts. This is why they are often combined into a calculated field to achieve that seamless flow.

Understanding the Curve Formula in a Sankey Chart

The formula used to create smooth transitions in a Sankey chart is:

Curve = [Rank1] + ([Rank2] − [Rank1]) × [Sigmoid]

Breaking Down the Formula

Rank1: Represents the starting position of the source node on the Y-axis. For example, if Rank1 equals 2, the flow begins at position 2.
Rank2: Represents the ending position of the target node on the Y-axis. For example, if Rank2 equals 5, the flow ends at position 5.
Sigmoid: The Sigmoid function smooths the transition between the source (Rank1) and the target (Rank2). It maps the T value (which typically ranges from -6 to +6) to a value between 0 and 1. When T = -6, Sigmoid is close to 0, so the curve sits near Rank1 (the source); as T approaches +6, Sigmoid approaches 1 and the curve moves smoothly toward Rank2. The Sigmoid formula is:

Sigmoid(T) = 1 / (1 + e^(−T))

This function helps create the smooth "S" curve between the two nodes.
Combining Them: The final formula adjusts the Y-axis position of the curve based on the progression of T. Initially, the curve starts at Rank1 (when the Sigmoid value is near 0). As T increases, the curve gradually moves toward Rank2. The formula used to combine Rank1, Rank2, and Sigmoid is:

Curve = Rank1 + (Rank2 − Rank1) × Sigmoid(T)
- Rank1 defines where the flow starts on the Y-axis.
- (Rank2 − Rank1) determines how far the curve needs to move from the source to the target.
- Sigmoid(T) controls how smoothly the curve transitions from Rank1 to Rank2, based on the value of T.
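To see why a T range of -6 to +6 is usually enough, the short Python sketch below evaluates the Sigmoid function at the two ends and at the midpoint. It is an illustration of the formula only; the function name `sigmoid` is chosen here for convenience.

```python
import math


def sigmoid(t):
    """Standard logistic function: maps any real T to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


for t in (-6, 0, 6):
    print(t, round(sigmoid(t), 4))
# -6 0.0025
# 0 0.5
# 6 0.9975
```

At the ends of the range the function is within about a quarter of a percent of 0 and 1, so the curve effectively starts at Rank1 and ends at Rank2 even though it never reaches them exactly.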

Example:

Let’s say:

Rank1 = 2
Rank2 = 5
We evaluate Sigmoid(T) at T = 0 (which gives a value of 0.5)

Using the formula:

Curve = 2 + (5 − 2) × Sigmoid(0) = 2 + 3 × 0.5 = 2 + 1.5 = 3.5

At T = 0, the curve is halfway between Rank1 and Rank2, at position 3.5 on the Y-axis, representing the smooth transition between source and target. The Sigmoid function ensures this curve moves gradually from Rank1 to Rank2 as T progresses from -6 to +6.
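The worked example can be checked, and extended to other T values, with a small Python sketch. The function names and the sampled T values (-6, -3, 0, 3, 6) are choices made here for illustration.

```python
import math


def sigmoid(t):
    """Standard logistic function: maps any real T to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def curve(rank1, rank2, t):
    """Y position of the flow at T: Rank1 + (Rank2 - Rank1) * Sigmoid(T)."""
    return rank1 + (rank2 - rank1) * sigmoid(t)


# Trace the example flow (Rank1 = 2, Rank2 = 5) at a few T values.
path = [(t, curve(2, 5, t)) for t in range(-6, 7, 3)]
for t, y in path:
    print(t, round(y, 4))
```

The Y values rise steadily from just above 2 to just below 5, passing through 3.5 at T = 0, which is the "S" shape the flow line follows.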

The combination of Rank1, Rank2, and the Sigmoid function is essential for crafting the smooth curves characteristic of Sankey charts. The formula ensures a fluid transition from source to target, producing visually appealing flows that effectively illustrate relationships and transitions within the data. By understanding the contributions of each component and utilizing cumulative percentages, you can create Sankey charts that not only inform but also captivate your audience.

This understanding will not only enhance your Sankey chart creation skills but also improve the clarity and impact of the insights you communicate through your visualizations. If you're interested in learning more about Sankey charts, I invite you to explore my other blogs. They provide practical insights and examples that can help you deepen your understanding and create your own beautiful visualizations!