Sankey charts are a powerful tool for visualizing flows and relationships between different nodes, providing insights into how data moves from one point to another. A crucial aspect of creating effective Sankey charts is generating smooth curves between source and target nodes. This is achieved through calculated fields that leverage the concepts of Rank and Sigmoid functions. In this article, we will explore the components involved in creating these smooth transitions, emphasizing the importance of using cumulative values and percentages.
Why is Unioning Tables a Mandatory Step for Sankey Charts?
Unioning tables is essential in Sankey charts because it combines two sets of related data, such as source and target nodes, into a single dataset. Having both sets of nodes in one dataset is what allows the chart to draw flows between them. Without the union, the Sankey chart would have no single structure in which to connect the two sides.
Example: Sales Between Regions and Segments
Let's say we have two tables: one representing Regions and another representing Segments. The Region table contains sales data by geographical regions (e.g., North, South, East, West), while the Segment table contains sales by customer segments (e.g., Consumer, Corporate, Home Office).
Region Table:
| Region | Sales |
| --- | --- |
| North | 1000 |
| South | 1500 |
| East | 800 |
| West | 1200 |
Segment Table:
| Segment | Sales |
| --- | --- |
| Consumer | 2000 |
| Corporate | 1000 |
| Home Office | 1500 |
Without unioning these tables, you would have two separate datasets, making it impossible to show regions and segments together in one Sankey chart. By unioning the Region and Segment tables, we create a combined dataset that contains both the regions (the sources) and the customer segments (the targets) of the sales flow.
Unioned Table:
| Region/Segment | Sales |
| --- | --- |
| North | 1000 |
| South | 1500 |
| East | 800 |
| West | 1200 |
| Consumer | 2000 |
| Corporate | 1000 |
| Home Office | 1500 |
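This unioned dataset places Regions and Segments in a single column, so both sets of nodes can be drawn in one Sankey chart. For instance, the chart could show how much of the total sales in the North region went to the Consumer segment. Note that this requires the underlying rows to record both the Region and the Segment of each sale; the totals alone do not determine how much of each region's sales went to each segment.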
By unioning the tables, we ensure that the flow of data between Regions and Segments is displayed in a continuous and accurate manner in the Sankey chart. Without it, the relationships between the regions and segments would be disjointed and unclear.
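The union step can be sketched in Python to show what the combined dataset looks like. This is a minimal illustration using the Region and Segment figures from the tables above, not how Tableau performs a union internally; the `union_tables` helper and the `source_table` tag are hypothetical names, standing in for the kind of table-name column a union typically adds.

```python
# Minimal sketch of unioning the Region and Segment tables from the example.
# Hypothetical names: union_tables() and the "source_table" tag, which records
# which input table each row came from.

region_table = [
    ("North", 1000),
    ("South", 1500),
    ("East", 800),
    ("West", 1200),
]

segment_table = [
    ("Consumer", 2000),
    ("Corporate", 1000),
    ("Home Office", 1500),
]


def union_tables(**tables):
    """Stack several (node, sales) tables into one list of row dicts."""
    unioned = []
    for table_name, rows in tables.items():
        for node, sales in rows:
            unioned.append({"node": node, "sales": sales, "source_table": table_name})
    return unioned


unioned = union_tables(region=region_table, segment=segment_table)

for row in unioned:
    print(f"{row['node']:<12} {row['sales']:>5}  ({row['source_table']})")

# Total sales on each side of the union.
region_total = sum(r["sales"] for r in unioned if r["source_table"] == "region")
segment_total = sum(r["sales"] for r in unioned if r["source_table"] == "segment")
print(region_total, segment_total)
```

Both halves of the union add up to 4500, which is consistent with the regions and segments being two ends of the same total sales.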
Key Components of Smooth Curve Generation
To create smooth curves in Sankey charts, several calculated fields play distinct roles. The following table outlines each field's purpose, contribution, and its position in the graph.
| Field | Purpose/Contribution | Position in the View | Explanation |
| --- | --- | --- | --- |
| ToPad | Generates additional data points by multiplying the original dataset, ensuring smoother curves. | - | Helps in creating extra rows to increase the granularity of data points between the source and target, facilitating smooth transitions. |
| Bins | Applied to the data point values to segment the data for smoother curve plotting. | X-Axis (indirectly) | Segments the dataset into smaller intervals, helping to control the number of intermediate points and creating a smoother curve. |
| T | A calculated field that controls the smooth horizontal transition from source to target. | X-Axis | Defines the x-axis values for smooth movement from source to target, typically a sequential range like -6 to 6 for mapping. |
| Rank1 | Defines the rank of the starting node (source) based on a dimension like People or Orders. | Y-Axis | Positions the start of the flow on the y-axis based on the rank of the source node. |
| Rank2 | Defines the rank of the destination node (target) based on a dimension like People or Orders. | Y-Axis | Positions the end of the flow on the y-axis based on the rank of the target node. |
| Sigmoid | Applies the sigmoid function $\frac{1}{1 + e^{-T}}$ to create the smooth "S"-shaped curve. | Y-Axis (curve shape) | Converts linear T values into smooth curves, enabling the flow lines to transition smoothly from source to target. |
| Curve | Combines T, Sigmoid, Rank1, and Rank2 to define the exact curve path between the nodes. | Y-Axis (typically) | Defines the vertical path of the flow between source and target, creating the "S" shape. |
| Size | Determines the width (thickness) of the flow, representing a KPI (e.g., Sales, Orders, etc.). | Width of the flow | Controls the thickness of the flow between nodes, indicating the relative size of the connection. Larger flows signify stronger relationships. |
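The padding roles of ToPad, Bins, and T can be approximated outside Tableau to see how many points each flow ends up with. The sketch below assumes 49 points per flow with T stepping evenly from -6 to +6 (steps of 0.25); the point count, the `pad_row` helper, and the North-to-Consumer row are illustrative assumptions, not values taken from the article's workbook.

```python
# Minimal sketch of padding one flow into many points along T.
# Assumptions: NUM_POINTS = 49 and the hypothetical pad_row() helper;
# the -6..+6 range for T comes from the article.

NUM_POINTS = 49            # hypothetical number of padded points per flow
T_MIN, T_MAX = -6.0, 6.0   # T range described in the article


def pad_row(row, num_points=NUM_POINTS):
    """Return num_points copies of row, each tagged with a bin index and a T value."""
    step = (T_MAX - T_MIN) / (num_points - 1)
    padded = []
    for index in range(1, num_points + 1):  # bins numbered 1..num_points
        t = T_MIN + (index - 1) * step
        padded.append({**row, "bin": index, "T": t})
    return padded


points = pad_row({"source": "North", "target": "Consumer"})
print(len(points))
print(points[0]["T"], points[24]["T"], points[-1]["T"])
```

More points give a smoother curve at the cost of more rows. With this choice of 49 points, the middle point (bin 25) lands at T = 0, where the sigmoid equals exactly 0.5.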
The Importance of Using Cumulative Values and Percentages
Using cumulative values for Rank1 and Rank2 gives a more intuitive picture of how each node contributes to the overall flow: because each node's position is the running total of the nodes before it, the nodes stack one after another without overlapping. Expressing these values as percentages of the total provides clearer context for how significant each node is.
Example of Percentage Usage
Consider a scenario where Node A generates 100 units, Node B generates 200 units, and Node C generates 700 units. Instead of displaying these absolute values, presenting them as percentages of the total (1000 units) offers better insight:
Node A:
Value: 100 units
Percentage: 100/1000×100=10%
Node B:
Value: 200 units
Percentage: 200/1000×100=20%
Node C:
Value: 700 units
Percentage: 700/1000×100=70%
Using percentages facilitates relative comparisons, making it easier to identify trends and contributions across different nodes in the Sankey chart.
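The percentages above, together with the cumulative positions discussed earlier, can be computed in a few lines. This sketch assumes that a node's band starts at the running total of the percentages of the nodes before it, which is one common way to build cumulative Rank1/Rank2 values; the variable names are illustrative.

```python
# Minimal sketch: convert node values to percentages and cumulative positions.
# Assumption: each node's band starts at the running total of earlier nodes.

nodes = [("Node A", 100), ("Node B", 200), ("Node C", 700)]
total = sum(value for _, value in nodes)  # 1000 units

running = 0.0
positions = []  # (name, percentage, band start, band end)
for name, value in nodes:
    pct = value * 100 / total   # share of the total, in percent
    start = running             # cumulative percentage before this node
    running += pct
    positions.append((name, pct, start, running))

for name, pct, start, end in positions:
    print(f"{name}: {pct:.0f}%  spans {start:.0f}% to {end:.0f}%")
```

Node A spans 0% to 10%, Node B 10% to 30%, and Node C 30% to 100%, so the three bands fill the axis end to end without overlapping.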
Utilizing Rank1, Rank2, and Sigmoid
In a Sankey chart, Rank1, Rank2, and Sigmoid can be utilized independently on the Y-axis, but they are typically combined to generate smooth curves between the source and target nodes. Let’s break down their separate uses:
Rank1: This can be represented alone on the Y-axis to show the starting position of the source node.
Rank2: Similarly, Rank2 can indicate the position of the target node on the Y-axis.
Sigmoid: On its own, this function plots a single smooth "S"-shaped curve rising from 0 to 1 on the Y-axis, without reference to the ranks, so it is not tied to the position of either node.
Used individually, Rank1, Rank2, and Sigmoid each show only part of the picture: the ranks give fixed start and end positions, while the sigmoid gives a curve that is not anchored to either node. This is why they are combined into a single calculated field, which produces a curve that starts at the source and ends at the target.
Understanding the Curve Formula in a Sankey Chart
The formula used to create smooth transitions in a Sankey chart is:
Curve = [Rank1] + ([Rank2] − [Rank1]) × [Sigmoid]
Breaking Down the Formula
Rank1: Represents the starting position of the source node on the Y-axis. For example, if Rank1 equals 2, the flow begins at position 2.
Sigmoid: The Sigmoid function smooths the transition between the source (Rank1) and the target (Rank2). It maps the T value (which typically ranges from -6 to +6) to a value between 0 and 1. At T = -6 the Sigmoid value is about 0.0025, so the curve starts essentially at Rank1 (the source); as T approaches +6 the value approaches 1 (about 0.9975 at T = +6), and the curve transitions smoothly toward Rank2. The Sigmoid formula is:
$$\text{Sigmoid}(T) = \frac{1}{1 + e^{-T}}$$
This function helps create the smooth "S" curve between the two nodes.
Combining Them: The final formula adjusts the Y-axis position of the curve based on the progression of T. Initially, the curve starts at Rank1 (when the Sigmoid value is near 0). As T increases, the curve gradually moves toward Rank2. The formula used to combine Rank1, Rank2, and Sigmoid is:
$$\text{Curve} = \text{Rank1} + (\text{Rank2} - \text{Rank1}) \times \text{Sigmoid}(T)$$
Rank1 defines where the flow starts on the Y-axis.
(Rank2 - Rank1) determines how far the curve needs to move from the source to the target.
Sigmoid (T) controls how smoothly the curve transitions from Rank1 to Rank2, based on the value of T.
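The Sigmoid field can be evaluated numerically to confirm how it maps T into the range 0 to 1. This is a plain-Python sketch of the formula; in Tableau the equivalent calculated field would use the `EXP` function, for example `1/(1+EXP(-[T]))`, assuming the T field is named `[T]` as in the table of fields.

```python
import math


def sigmoid(t: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-t)); maps any real t into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


# Sample the sigmoid across the -6..+6 range used for T.
for t in (-6, -3, 0, 3, 6):
    print(f"T = {t:>2}: Sigmoid(T) = {sigmoid(t):.4f}")
```

The values sit very close to 0 at T = -6 (about 0.0025) and very close to 1 at T = +6 (about 0.9975), which is why the -6 to +6 range is enough for the flow to start at Rank1 and finish at Rank2.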
Example:
Let’s say:
Rank1 = 2
Rank2 = 5
We evaluate Sigmoid(T) at T = 0, which gives 1/(1 + e^0) = 1/2 = 0.5
Using the formula:
$$\text{Curve} = 2 + (5 - 2) \times \text{Sigmoid}(0) = 2 + 3 \times 0.5 = 2 + 1.5 = 3.5$$
At T = 0, the curve is halfway between Rank1 and Rank2, at position 3.5 on the Y-axis, representing the smooth transition between source and target. The Sigmoid function ensures this curve moves gradually from Rank1 to Rank2 as T progresses from -6 to +6.
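The full Curve calculation can be sketched the same way to reproduce the worked example and trace the whole path. The `sigmoid` and `curve` functions are illustrative stand-ins for the Sigmoid and Curve calculated fields, and integer T steps from -6 to +6 are assumed for brevity.

```python
import math


def sigmoid(t: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def curve(rank1: float, rank2: float, t: float) -> float:
    """Curve = Rank1 + (Rank2 - Rank1) * Sigmoid(T)."""
    return rank1 + (rank2 - rank1) * sigmoid(t)


# The worked example: Rank1 = 2, Rank2 = 5, T = 0.
print(curve(2, 5, 0))  # 3.5

# The whole path from T = -6 to T = +6 in integer steps.
path = [(t, curve(2, 5, t)) for t in range(-6, 7)]
for t, y in path:
    print(f"T = {t:>2}: y = {y:.3f}")
```

The first and last points land within about 0.01 of Rank1 and Rank2, and each step moves upward, which traces the smooth S-shaped transition described above.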
The combination of Rank1, Rank2, and the Sigmoid function is essential for crafting the smooth curves characteristic of Sankey charts. The formula ensures a fluid transition from source to target, producing visually appealing flows that effectively illustrate relationships and transitions within the data. By understanding the contributions of each component and utilizing cumulative percentages, you can create Sankey charts that not only inform but also captivate your audience.
This understanding will not only enhance your Sankey chart creation skills but also improve the clarity and impact of the insights you communicate through your visualizations. If you're interested in learning more about Sankey charts, I invite you to explore my other blogs. They provide practical insights and examples that can help you deepen your understanding and create your own beautiful visualizations!