top of page
hand-businesswoman-touching-hand-artificial-intelligence-meaning-technology-connection-go-

Understanding and Creating a Dendrogram Chart in Tableau: A Comprehensive Guide - I


In today's era of data analytics, visualizing hierarchical relationships within datasets is crucial for understanding how categories or data points are organized. Dendrograms, which are tree-like structures that reveal the arrangement of clusters, provide a powerful tool for visualizing these hierarchical relationships. By effectively presenting clusters, branches, and leaves, dendrograms assist in uncovering hidden structures in data, such as customer segments, sales trends, or product hierarchies.


This guide will walk you through the process of creating a dendrogram in Tableau, focusing on the calculated fields and formulas essential for constructing a clear, meaningful dendrogram visualization. 


Let’s consider an example where the dendrogram visually represents how sales are distributed across different categories and subcategories, offering insights into patterns or imbalances in sales performance across various levels of the hierarchy. We will work through a practical scenario involving a dataset with orders, people, and returns data, ensuring a clear understanding of how each calculated field fits into the bigger picture.


Step 1: Dataset and Union of Tables


For our example, the dataset includes:


  • Primary Table: Orders

  • Secondary Tables: People and Returns


Unioning the Orders table to itself helps to generate the duplicates needed for creating a smooth, structured dendrogram.  ( Detailed Steps, Please refer The Sankey charts Part 1 blog)


Step 2: Creating Essential Calculated Fields


Once the union is done, you will need several calculated fields to build a dendrogram. Each field plays a role in visualizing the hierarchical structure (branches) and the individual data points (leaves) in the tree. Here in our case,


  • Branches represent the hierarchical connections or relationships between the different levels, acting as the links between categories and subcategories.

  • Leaves at the higher level (Category) are the Category Types (e.g., "Electronics," "Furniture").

  • Leaves at the lower level (Subcategory) are the Subcategory Types (e.g., "Laptops," "Chairs") under each main category.

  • The Sales values can be visualized along the branches, indicating the distribution of sales across these hierarchical levels.



1. Calculated Field: ToPad

  • Purpose: To identify the source of the data (Orders or others) and ensure a uniform starting point for the hierarchy.

  • Applied to: Category

  • Explanation: The formula IF [Table Name]="Orders" THEN 0 ELSE 200 END sets a value of 0 for Orders and 200 for other data entries. This helps in managing how the hierarchy starts, allowing for proper alignment of data points. Right click the ToPad, Use the Create Bins option to group sales values into appropriate ranges (bin size 1) . This can help to smooth out the dendrogram by ensuring that similar sales figures are represented together, providing a clearer picture of trends and relationships.



2. Calculated Field: T

  • Purpose: To generate a numerical sequence that serves as a positioning factor along the curve of the dendrogram.

  • Applied to: Overall Flow

  • Explanation: The formula (INDEX()-1) * 0.12 - 6 calculates a linear progression that positions each element along the X-axis. It ensures that the points are spread evenly for better visualization of relationships.



3. Calculated Field: Rank1

  • Purpose: To calculate a cumulative sum of sales, divided by 2, for ranking purposes.

  • Applied to: Category

  • Explanation: The formula WINDOW_SUM(SUM([Sales]) / 2 aggregates the sales values at the category level, allowing for a balanced representation that aids in understanding the sales distribution. WINDOW_SUM(SUM([Sales]) / 2) is used instead of SUM([Sales]) to aggregate sales over a range of data points, providing context within the hierarchical structure of the dendrogram. This approach helps visualize relationships across categories and subcategories. The division by 2 serves as a scaling factor, normalizing the values for better visual representation. It prevents overcrowding in the graph and maintains clarity between branches and leaves. Overall, this method enhances interpretability while capturing cumulative sales data effectively.



4. Calculated Field: Rank2

  • Purpose: To function similarly to Rank1, helping to define hierarchical levels among subcategories.

  • Applied to: Subcategory

  • Explanation: This field uses the same formula as Rank1 (WINDOW_SUM(SUM([Sales]) / 2), allowing for the same cumulative summation at the subcategory level, thereby maintaining consistent scaling in the visualization.



5. Calculated Field: Rank

  • Purpose: To assign a unique rank based on Rank1 for establishing a clear hierarchical order.

  • Applied to: Category

  • Explanation: The formula RANK_UNIQUE([Rank1], "desc") ranks the categories uniquely in descending order based on their sales figures, establishing which categories are performing better.

  • Why no calculated field to Rank2: In constructing a dendrogram, the use of Rank1 is crucial as it uniquely ranks the overall sales figures for each category, effectively establishing a clear hierarchical structure at the primary level. By not applying a rank to Rank2, which represents the sales at the subcategory level, we avoid complicating the hierarchy with multiple ranks within each category. This streamlined approach allows for a more coherent visualization, enabling us to gain insights into the distribution of sales across categories while still maintaining clarity regarding their respective subcategories.



6. Calculated Field: Sigmoid

  • Purpose: To create a smooth curve for visual representation based on the variable T.

  • Applied to: Leaves (Overall Smoothing). 

  • Explanation: The formula 1 / (1 + EXP(1)^-[T]) generates a sigmoid function that helps in creating smooth transitions between branches in the dendrogram. (detailed explanation of the formula given in Mastering Sankey Chats blog )



7. Calculated Field: Curve

  • Purpose: To adjust the visual curve by applying the sigmoid transformation and scaling it according to rank.

  • Applied to: Leaves

  • Explanation: The formula [Sigmoid] * ([Rank] - (WINDOW_MAX([Rank]) + 1) / 2) / 100 combines the sigmoid output with the rank, creating a visual curve that illustrates the sales performance along the branches.


    • Sigmoid Scaling: The [Sigmoid] component ensures that values are normalized between 0 and 1, which is essential for consistent visual representation, especially when dealing with varying scales of data.

    • Centering the Rank: The expression [Rank]−(WINDOW_MAX([Rank])+1)/2 centers the rank values around zero. This allows for a better distribution of ranks, making it easier to see how each subcategory compares relative to the highest rank.

    • Adjusting Range: Dividing by 100 scales down the resultant values to fit within a desired range for visualization purposes. This prevents any excessively large values that could skew the visual representation of the dendrogram.


Together, this formula effectively emphasizes the relative significance of leaves in the hierarchical structure while maintaining clarity and coherence in the visualization.


The sigmoid function (Calculated Field 6)  scales the values of the leaves, creating a smooth curve that ranges between 0 and 1 for better visual interpretation. The curve calculation (Calculated Field 7) adjusts this output based on the rank of each leaf, emphasizing higher-ranked subcategories within the dendrogram to reflect their significance.



8. Calculated Field: Percentage

  • Purpose: To define the proportion of sales based on rank comparisons.

  • Applied to: Overall Flow

  • Explanation: The formula [Rank1] / [Rank2] calculates the percentage of sales, allowing for a clear understanding of sales distribution relative to the overall ranking.



9. Calculated Field: Percentage Adjusted

  • Purpose: To normalize the Percentage field to ensure consistent scaling across the visualization.

  • Applied to: Overall Flow

  • Explanation: The formula [Percentage] / WINDOW_MAX([Percentage]) ensures that the percentages are relative to the maximum value, providing a better visual context for sales comparisons across categories and subcategories. WINDOW_MAX is used here to normalize each individual percentage against the highest percentage in the dataset, allowing for a relative comparison.



Thickness of Branches: The width of the branches might correspond to the percentage values, with thicker branches representing higher sales or performance metrics.


Positioning of Leaves: The adjusted percentages can affect how leaves are positioned, ensuring that the overall layout reflects their proportional contributions while maintaining visual clarity.


Together, these elements help viewers quickly identify key insights within the hierarchical structure of the data presented in the dendrogram.



10. Calculated Field: Dendrogram Size

  • Purpose: To determine whether a point should be part of the dendrogram based on adjusted percentages.

  • Applied to: Leaves

  • Explanation: The formula IF [T] >= 6 AND [T] <= 6 + (10 * [Percentage Adjusted]) THEN 1 ELSE 0 END assesses the conditions under which a data point appears in the dendrogram, ensuring only relevant points are visualized.



Step 3: Putting It All Together


With all the calculated fields in place, the final step is to plot the dendrogram in Tableau. Here's a summary of each calculated field and its role:

Calculated Field Name

Purpose/Function

Applied on (Category or Subcategory)

Axis (X or Y)

Additional Values/Formula

ToPad

Identifies the source of the data (Orders or others) and adjusts for uniformity.

Category

X

IF [Table Name]="Orders" THEN 0 ELSE 200 END

Bins

Categorizes sales data into specified intervals.

Sales

X

Right click ToPad and Create Bins (Bin Size 1)

T

Generates a numerical sequence for positioning points along the curve.

Overall flow

X

(INDEX()-1) * 0.12 - 6

Rank1

Calculates the cumulative sum of sales, divided by 2, for ranking purposes.

Category

Y

WINDOW_SUM(SUM([Sales]) / 2)

Rank2

Functions similarly to Rank1, helping define hierarchical levels.

Subcategory

Y

WINDOW_SUM(SUM([Sales]) / 2)

Rank

Assigns a unique rank based on Rank1 for hierarchical order.

Category

Y

RANK_UNIQUE([Rank1], "desc")

Sigmoid

Creates a smooth curve for the visualization based on T.

Leaf (Overall smoothing)

X

1 / (1 + EXP(1)^-[T])

Curve

Adjusts the curve by applying the sigmoid transformation and scaling based on rank.

Leaf

Y

[Sigmoid] * ([Rank] - (WINDOW_MAX([Rank]) + 1) / 2) / 100

Percentage

Defines the proportion of sales based on rank comparisons.

Overall flow

Y

[Rank1] / [Rank2]

Percentage Adjusted

Normalizes Percentage to ensure consistent scaling across the visualization.

Overall flow

Y

[Percentage] / WINDOW_MAX([Percentage])

Dendrogram Size

Determines if a point should be part of the dendrogram based on adjusted percentages.

Leaf

Y

IF [T] >= 6 AND [T] <= 6 + (10 * [Percentage Adjusted]) THEN 1 ELSE 0 END

All the values in the table can be adjusted to meet your specific needs, allowing you to customize your dendrogram according to the dataset.


This guide provided a detailed walkthrough for creating a dendrogram in Tableau, highlighting the essential calculated fields that contribute to its structure and visualization. Importantly, all the values in the table can be adjusted to meet your specific needs, allowing for customization based on your data analysis objectives. Hope this guide enhances your understanding of dendrograms and their applications in data visualization. Be sure to check out the other dendrogram blogs in the future for more insights and techniques!

27 views0 comments

Comentários

Avaliado com 0 de 5 estrelas.
Ainda sem avaliações

Adicione uma avaliação
bottom of page